determining snapshot size in BTRFS

After switching my backup hard drive from zfs-fuse to btrfs, one of the features I missed most was the extra info ‘zfs list -t snapshot’ gives, specifically the size of each snapshot.

It’s helpful if you want to know how much space you’d get back by deleting a snapshot, but it’s also a decent indicator of how much the fs changed between snapshots. This second point has helped me catch large files being backed up when they shouldn’t be (like when I accidentally put a video file in the wrong folder).

There’s currently no built-in way to get this information, so after asking Google and #btrfs on freenode.net, I decided to try writing a Python script to figure it out.

Approach

From my limited understanding, data in btrfs is stored in extents, and snapshots with identical data just point to the same extents. Therefore, to determine the unique data on a snapshot, you just have to find the extents that are referenced by that snapshot and by no other snapshot (or subvolume).
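
As a toy illustration of that set logic (a minimal sketch with made-up extent addresses, not how the actual script below is organized):

# Hypothetical per-snapshot sets of physical extent addresses (made-up numbers).
extents = {
    'snap1': {0x1000, 0x2000, 0x3000},
    'snap2': {0x1000, 0x2000, 0x4000},   # one extent was rewritten
}

for snap, exts in extents.items():
    # union of the extents referenced by every *other* snapshot
    others = set().union(*(e for s, e in extents.items() if s != snap))
    unique = exts - others               # extents only this snapshot references
    print('{0}: {1} unique extent(s)'.format(snap, len(unique)))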

Sounds easy, but scanning and keeping track of all the extents on a btrfs fs takes forever, and then you run out of memory. Someone in #btrfs gave me the idea to use ‘btrfs subvolume find-new’ to find all the files that changed between snapshots. Scanning the extents in only these files should be enough to identify unique extents, since the extents in all the other files are obviously shared. This only works for successive snapshots of the same set of files (like in a backup scheme), but that’s what I’m using it for. It also makes it easy to tell how much changed between snapshots, as long as they are scanned in order of creation (i.e. in order of generation id).
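
For reference, the lines of find-new output that describe files start with ‘inode’ and include both the generation id and the path, which is what the script below pulls out of them. Here is a rough sketch of that parsing (the sample line is made up, but the field positions are the ones the script relies on):

# An illustrative find-new output line; the values are made up.
line = ('inode 257 file offset 0 len 4096 disk start 12582912 '
        'offset 0 gen 108 flags NONE some/dir/file.txt')

fields = line.split()
gen = int(fields[13])            # generation id of this change
path = ' '.join(fields[16:])     # path, re-joined in case it contains spaces
print('{0} {1}'.format(gen, path))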

This method doesn’t account for different files in the same snapshot that share extents, so if there are a lot of hardlinks it may not be accurate. In fact, I don’t really have a way to check its accuracy, but it seems to give reasonable numbers.

Anyway, here’s the code:

#!/usr/bin/python
# Brian Parma
#
# script to get snapshot size info from a btrfs mount

import sys
import os
import stat
import subprocess as sp
import fiemap
import time
import shelve
import json
from collections import defaultdict


# if you set this to false, you may run out of memory
SKIP_FIRST = True
#SKIP_FIRST = False

# this function gets a list of subvolumes from a btrfs fs
def get_subvolume_list(path):
    out = sp.Popen('btrfs subvolume list'.split()+[path], 
                    stdout=sp.PIPE).communicate()[0]
                    
    return sorted( ' '.join(x.split()[6:]) 
                            for x in out.split('\n') if x != '')

# this function gets the last genid present in a subvolume (i think)
def get_genid_old(sv):
    out = sp.Popen(('btrfs subvolume find-new {0} 999999999'.format(sv)).split(),
                    stdout=sp.PIPE).communicate()[0]
                    
    return int(out.split()[-1])

# new function pulls the genid from the list of files
def get_genid(sv):
    out = sp.Popen(('btrfs subvolume find-new {0} 1'.format(sv)).split(),
                        stdout=sp.PIPE).communicate()[0]
                        
    return max( [int(row.split()[13]) for row in out.split('\n') 
                                            if row.startswith('inode')] )
    


# get full file list
def get_all_files(sv):
    out = sp.Popen(('find {0} -xdev -type f'.format(sv)).split(), 
                        stdout=sp.PIPE).communicate()[0]
                        
    return set( os.path.relpath(file, sv) 
                    for file in out.split('\n') if file != '' )


# this function gets the files in a subvolume changed since genid (i think)
def get_new_files(sv, genid):
    out = sp.Popen(('btrfs subvolume find-new {0} {1}'.format(sv, genid)).split(), 
                    stdout=sp.PIPE).communicate()[0]
                    
    return set( ' '.join(x.split()[16:]) for x in out.split('\n') 
                                                    if x.startswith('inode'))

# this func tries to determine extent info for a path
#TODO: use array.array for db, only storing extent address?
#TODO: maybe use an on-disk db
def check(path, exdb):
    # db keeps track of the extents in path
    db = set()
    try:
        st = os.lstat(path)
        if stat.S_ISLNK(st.st_mode):
            # don't follow symlinks
            return db
            
        try:
            # get fiemap info
            res = fiemap.fiemap(path)[0]
            for ex in res['extents']:
                # add extent to db
                db.add(ex['physical'])
                
                # check for extent in exdb
                pex = exdb.get(ex['physical'], [])
                if st.st_dev not in pex:
                    # keep track of the different devices that ref this extent 
                    #  (limited to same path on alternate device)
                    # store size as first element
                    if len(pex) == 0:
                        pex.append(int(ex['length']))
                        
                    pex.append(st.st_dev)
                    exdb[ex['physical']] = pex
                
        except Exception, s:
            print 'could not fiemap: {0}'.format(path)
            pass

    except OSError, e:
        pass

    return db
        
# found this on stack overflow
import math
def filesizeformat(bytes, precision=2):
    """Returns a humanized string for a given amount of bytes"""
    bytes = int(bytes)
    if bytes == 0:
        return '0bytes'
    log = math.floor(math.log(bytes, 1024))
    return "%.*f%s" % (
                        precision,
                        bytes / math.pow(1024, log),
                        ['bytes', 'kb', 'mb', 'gb', 'tb','pb', 'eb', 'zb', 'yb']
                        [int(log)]
                        )
_ = filesizeformat


def main(root, path=None):
    # need a trailing /
    if root[-1] != '/':
        root += '/'
    path = path if path is not None else root
    if path[-1] == '/':
        path = path[:-1]
    
    # list of subvols in path
    sv_list = [root+x for x in get_subvolume_list(root) if path in (root+x)]
    if len(sv_list) == 0:
        print 'No subvolumes found with (root,path) of ({0},{1})'.format(root,path)
        return
        
    # device id -> subvol dict
    sv_dict = dict([(os.stat(x).st_dev, x) for x in sv_list])
    
    # subvolume -> genid dict (genids not necessarily unique)
    sv_glist = sorted([ (get_genid(x), x) for x in sv_list])
#    sv_glist = sorted([ (get_genid_old(x), x) for x in sv_list])

    
    # database of {physical address : (extent size, devices...)} for extents
    exdb = defaultdict(list)

    print 'Building db of extents...'
    t = time.time()
    
    # subvolume -> delta size
    sv_delta = {sv_glist[0][1]:0}
    
    # generate list of files that need to be checked
    file_dict = defaultdict(set)
    gid_old, sv_old = sv_glist[0]
    ofiles = get_all_files(sv_old)
    for j in xrange(len(sv_glist)-1):
        gid_new, sv_new = sv_glist[j+1]
        
        nfiles = get_all_files(sv_new)
        nfiles_changed = get_new_files(sv_new, gid_old+1)
        nfiles_removed = ofiles - nfiles
        # files added with cp --reflink don't get a new genid, but don't 
        # take up extra space, should we count them in delta?? TODO
        nfiles_added = nfiles - ofiles
        
        # old subvolume, check changed files + removed files
        file_dict[sv_old].update(set.union(nfiles_changed, nfiles_removed))

        # new subvolume, check changed files + new files
        file_dict[sv_new].update(set.union(nfiles_changed, nfiles_added))
        
        # rotate
        gid_old, sv_old = gid_new, sv_new
        ofiles = nfiles
    
    # first step
    i = 0           # count files
    gid_old, sv_old = sv_glist[0]
    if not SKIP_FIRST:
        # This first pass scans all files in the first subvolume, which may
        # take forever and use all your memory if you have a large filesystem.
        # Without it, the first subvolume's numbers will be wrong, and files 
        # that don't change through any subvolume are not counted 

        all_files = get_all_files(sv_old)
        db_old = set()
        for file in all_files:
            db_old.update(check(sv_old+'/'+file, exdb))
            i += 1

        dsz = sum( exdb[addr][0] for addr in db_old )
        
        sv_delta[sv_old] = dsz
    
    else:
        # This scans all changed/added/removed files that exist on the first
        # subvolume.  This makes sure the first subvolume's device id is in
        # exdb, which prevents files that are removed later from only 
        # being counted on the subvolume they are removed from (fix uniqueness)
        #
        # i sub the first sv files out so they aren't run twice
        for file in (set.union(*(file_dict.values()))-file_dict[sv_old]):
            check(sv_old+'/'+file, exdb)     
            i += 1

    # fill first sv        
    db_old = set()
    for file in file_dict[sv_old]:
        db_old.update(check(sv_old+'/'+file, exdb))
        i += 1
        
    # loop the rest
    for j in xrange(len(sv_glist)-1):
        gid_new, sv_new = sv_glist[j+1]

        db_new = set()        
        for file in file_dict[sv_new]:
            db_new.update(check(sv_new+'/'+file, exdb))

        extents_removed = db_old - db_new
        extents_added = db_new - db_old
        
        # delta size between the svs
        dsz = sum( exdb[addr][0] 
                    for addr in set.union(extents_removed, extents_added) )

    
        i += len(file_dict[sv_new])
            
        sv_delta[sv_new] = dsz

        # rotate
        db_old = db_new
            
    
    print 'Calculating sizes...'
    uniq = defaultdict(int)
    
    # go through and find extents that are only pointed to by one device, 
    # sum up the sizes for each device
    for ex in exdb:
        if len(exdb[ex]) == 2:
            dev = exdb[ex][1]
            uniq[dev] += exdb[ex][0]

    print 'Checked {0} items over {1} devices in {2}s.'\
                    .format(i, len(sv_dict), time.time()-t)
                    
    # print out in order of generation id, since that's how deltas are computed
    keys = reversed([key for g,sv in sv_glist 
                                for key,val in sv_dict.items() if val == sv ])
                                
    print 'GenID   DeviceID  Delta     Unique    Subvol'
    for dev in keys:
        gen = (g for g,sv in sv_glist if sv == sv_dict[dev]).next()
        sv = sv_dict[dev]
        dsz, usz = _(sv_delta[sv_dict[dev]]), _(uniq[dev])
        print '{0:>6}  {1:>4}      {2:<8}  {3:<8}  {4}'\
                        .format(gen, dev, dsz, usz, sv)   
    #    print 'Device {0} (gen {2}): {1}'.format(dev, sv_dict[dev], gen)
    #    print ' delta size: {0} ({1} unique)'.format(_(sv_delta[sv_dict[dev]]), _(uniq[dev]))
    #    print ' gen: {0}'.format((key for key,val in sv_gdict.items() if val == sv_dict[dev]).next())
    
if __name__ == '__main__':
    # root is the btrfs fs mountpoint
    # path is the full path to the subvolume
    path = root = '/mnt/fub'
    if len(sys.argv) == 2: # <path>
        path = sys.argv[1]
    elif len(sys.argv) == 3: # <root> <path>
        root, path = sys.argv[1:3]
    main(root, path)
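
(For reference: the script takes either a single <path> argument or <root> <path>, where root is the btrfs mountpoint and path is the subvolume to restrict the scan to; with no arguments it falls back to the hard-coded ‘/mnt/fub’ for both.)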

UPDATE:
As was pointed out in the comments, the genID returned by find-new was higher than the highest genID on any of the files, which was causing files to be skipped. I created a new get_genid function that uses ‘find-new 1’ to get all the files and grabs the highest genID from that list. This is working, but I don’t know if it will take longer on a large filesystem.

I also noticed that the way I was calculating the delta was not quite correct: I was only looking at the files that changed in the new snapshot when comparing it to the previous one. This had a few flaws:

  • Files that were simply removed were not taken into account.
  • Files that changed on a snapshot but remained the same on the following snapshot could have their extents counted as ‘removed’.
  • Files that were the same across many snapshots and then changed might have their extents considered unique to that last snapshot (since none of the previous snapshots had been counted).

I have updated the code so that it figures out which files changed across all the snapshots before scanning them. It then also scans all of those files against the first subvolume. This makes sure there are at least two devices pointing at files that don’t change until later or are removed later (which fixes the uniqueness count). I don’t have a full btrfs setup at this time, so I can’t tell what speed impact these changes make.
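
To make the delta calculation concrete, here is a toy sketch of what the updated main() does between two consecutive snapshots: the delta is the total size of the extents in the symmetric difference of the two extent sets, so both removed and added extents are counted (the addresses and sizes here are made up):

# Made-up extent sets gathered from the changed files on two consecutive snapshots.
db_old = {0x1000, 0x2000}            # extents seen on the older snapshot
db_new = {0x2000, 0x5000}            # extents seen on the newer snapshot
sizes  = {0x1000: 4096, 0x2000: 4096, 0x5000: 8192}

extents_removed = db_old - db_new    # extents that went away
extents_added   = db_new - db_old    # extents introduced by the newer snapshot
delta = sum(sizes[addr] for addr in extents_removed | extents_added)
print('delta between snapshots: {0} bytes'.format(delta))   # 4096 + 8192 = 12288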

I also put a flag that conditionally runs a full scan of all files for the first subvolume. This was just for testing on my small test fs, and shouldn’t be used on a full fs.

11 Responses to “determining snapshot size in BTRFS”

  1. Michael Says:

    I just tried the script on Debian Squeeze with 2.6.39-bpo.2-686-pae kernel from debian-backports and current btrfs-tools. Unfortunately, it reports 0 bytes on everything:

    root@file01:/data/rsyncbackup/tools# ./snapshotsize.py
    Building db of extents…
    Calculating sizes…
    Checked 0 items over 3 devices in 0.00271487236023s.
    GenID DeviceID Delta Unique Subvol
    376 21 0bytes 0bytes /data/rsyncbackup/dkXXXX
    375 23 0bytes 0bytes /data/rsyncbackup/dkXXXX/@GMT-2011.12.26-20.20.01
    373 22 0bytes 0bytes /data/rsyncbackup/dkXXXX/@GMT-2011.12.27-10.11.05
    root@file01:/data/rsyncbackup/tools#

    Are you still using your script and do you know what would need to be done to fix it?

  2. Michael Says:

    Ok, from what I can tell, btrfs subvolume find-new isn’t doing what your script is expecting it to do…

  3. bj0z Says:

    Ok, I haven’t used this script in a while, but after testing it out, it seems like there are a couple of things:
    1) fiemap.py doesn’t import array (i guess newer versions of python require it to be imported?), so that was causing the fiemap call to fail and return no extent information.

    2) find-new seems to be behaving a little differently than I remember. The returned transid marker is supposed to (i thought) be the id of the newest file in that snapshot. It seems the transid can be newer than that and, as a result, files between that snapshot and the next aren’t counted.

    Fixing 1 is easy, but the quickest workaround for 2 that I can think of is to pass ‘1’ to the initial find-new on a subvolume and grab the highest genid returned in the file list. I don’t know if this will add any more time to the script, I haven’t tested it. I’ll update the script when I test that it’s working.

  4. Michael Says:

    Hmm, snapshots taken starting around the time I upgraded my Linux kernel from 2.6.39 to 3.2 have their size reported correctly by your script (older snapshots still show 0 bytes). I can’t tell for sure whether this is just a coincidence or actually related. Have you been able to test it on a current Linux kernel?

  5. bj0z Says:

    I’m getting the same behavior you reported on my Mint 12 box. I’ve updated the post/code, hopefully the new version works correctly.

  6. JoelJ Says:

    When I run this (./snapshotsize.py /backups /backups/snapshot2), I get:
    File "/backups/snapshotsize.py", line 201, in main
    for file in (set.union(*(file_dict.values()))-file_dict[sv_old]):
    TypeError: descriptor 'union' of 'set' object needs an argument

    Please help?

    • bj0z Says:

      hmm, looks like it’s just not checking the case where file_dict is empty. this shouldn’t happen unless there are empty snapshots or the paths passed in are wrong. to fix the error, just change that block of code to look like:

              if len(file_dict) > 0:
                  for file in (set.union(*(file_dict.values()))-file_dict[sv_old]):
                      check(sv_old+'/'+file, exdb)     
                      i += 1
      

      So it gets skipped if the dictionary is empty.

  7. Adam Ryczkowski Says:

    From where should I get the “fiemap” module? There is none in “pip install fiemap” nor any hits in “apt-cache search fiemap” (Ubuntu Quantal)?

  8. Donald Duncan Says:

    I’m getting an error with this script:
    Traceback (most recent call last):
    File "/root/bin/btrfs-size", line 271, in <module>
    main(root, path)
    File "/root/bin/btrfs-size", line 135, in main
    sv_dict = dict([(os.stat(x).st_dev, x) for x in sv_list])
    OSError: [Errno 2] No such file or directory: '/5 path .snapshots'

    Output of btrfs subvolume list /

    ID 256 gen 320 top level 5 path boot/grub2/x86_64-efi
    ID 258 gen 9153 top level 5 path home
    ID 259 gen 9062 top level 5 path opt
    ID 260 gen 4311 top level 5 path srv
    ID 261 gen 9153 top level 5 path tmp
    ID 262 gen 5998 top level 5 path usr/local
    ID 263 gen 20 top level 5 path var/crash
    ID 264 gen 9152 top level 5 path var/log
    ID 265 gen 20 top level 5 path var/opt
    ID 266 gen 9152 top level 5 path var/spool
    ID 267 gen 9153 top level 5 path var/tmp
    ID 274 gen 9144 top level 5 path .snapshots
    ID 279 gen 80 top level 5 path .snapshots/5/snapshot
    ID 287 gen 9068 top level 5 path var/cache
    ID 465 gen 1237 top level 5 path .snapshots/177/snapshot

    • duckblaster7090 Says:

      needed to change x.split()[6:] to x.split()[8:] in get_subvolume_list

          # this function gets a list of subvolumes from a btrfs fs
          def get_subvolume_list(path):
              out = sp.Popen('btrfs subvolume list'.split()+[path],
                              stdout=sp.PIPE).communicate()[0]

              return sorted( ' '.join(x.split()[8:])
                              for x in out.split('\n') if x != '' )
