Archive for April, 2011

determining snapshot size in BTRFS

April 27, 2011

After switching my backup hard drive from zfs-fuse to btrfs, one of the features I missed most was the extra info zfs list -t snapshot gives, specifically the size of each snapshot.

It’s helpful if you want to know how much space you’d get back by deleting a snapshot, but it’s also a decent indicator of how much the fs changed between snapshots. This second point has helped me spot large files being backed up when they shouldn’t be (like when I accidentally put a video file in the wrong folder).

There’s currently no built-in way to get this information, so after asking Google and #btrfs on freenode.net, I decided to try to write a Python script to figure it out.

Approach

From my limited understanding, btrfs stores data in extents, and snapshots with identical data just point to the same extents. Therefore, to determine the unique data in a snapshot, you just need to find the extents that are referenced by that snapshot and by no other snapshot (or subvolume).

Sounds easy, but scanning and keeping track of every extent on a btrfs fs takes forever, and then you run out of memory. Someone in #btrfs gave me the idea of using ‘btrfs subvolume find-new’ to find the files that changed between snapshots. Scanning the extents in only these files should be enough to identify unique extents, since the extents in all the other files are obviously shared. This only works for successive snapshots of the same file tree (like in a backup scheme), but that’s what I’m using it for. It also makes it easy to tell how much has changed between snapshots if they are scanned in order of creation (i.e. in order of generation id).
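To make the idea concrete, here’s a minimal sketch of the set logic (extent_addresses() is a hypothetical helper standing in for the FIEMAP scanning the real script below does):

# sketch only: extent_addresses(snap) would return the set of physical
# extent addresses referenced by the files we care about in that snapshot
def unique_extents(snapshots):
    refcount = {}       # extent address -> how many snapshots reference it
    per_snap = {}       # snapshot -> set of extent addresses
    for snap in snapshots:
        exts = extent_addresses(snap)
        per_snap[snap] = exts
        for e in exts:
            refcount[e] = refcount.get(e, 0) + 1
    # an extent is unique to a snapshot if no other snapshot references it
    return dict((snap, set(e for e in exts if refcount[e] == 1))
                for snap, exts in per_snap.items())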

This method doesn’t account for different files in the same snapshot that share extents, so if there are a lot of hardlinks it may not be accurate. In fact, I don’t really have a way to check its accuracy, but it seems to give reasonable numbers.

Anyway, here’s the code:

#!/usr/bin/python
# Brian Parma
#
# script to get snapshot size info from a btrfs mount

import sys
import os
import stat
import subprocess as sp
import fiemap
import time
import shelve
import json
from collections import defaultdict


# if you set this to false, you may run out of memory
SKIP_FIRST = True
#SKIP_FIRST = False

# this function gets a list of subvolumes from a btrfs fs
def get_subvolume_list(path):
    out = sp.Popen('btrfs subvolume list'.split()+[path], 
                    stdout=sp.PIPE).communicate()[0]
                    
    return sorted( ' '.join(x.split()[6:]) 
                            for x in out.split('\n') if x != '')

# this function gets the last genid present in a subvolume (i think)
def get_genid_old(sv):
    out = sp.Popen(('btrfs subvolume find-new {0} 999999999'.format(sv)).split(),
                    stdout=sp.PIPE).communicate()[0]
                    
    return int(out.split()[-1])

# new function pulls the genid from the list of files
def get_genid(sv):
    out = sp.Popen(('btrfs subvolume find-new {0} 1'.format(sv)).split(),
                        stdout=sp.PIPE).communicate()[0]
                        
    return max( [int(row.split()[13]) for row in out.split('\n') 
                                            if row.startswith('inode')] )
    


# get full file list
def get_all_files(sv):
    out = sp.Popen(('find {0} -xdev -type f'.format(sv)).split(), 
                        stdout=sp.PIPE).communicate()[0]
                        
    return set( os.path.relpath(file, sv) 
                    for file in out.split('\n') if file != '' )


# this function gets the files in a subvolume changed since genid (i think)
def get_new_files(sv, genid):
    out = sp.Popen(('btrfs subvolume find-new {0} {1}'.format(sv, genid)).split(), 
                    stdout=sp.PIPE).communicate()[0]
                    
    return set( ' '.join(x.split()[16:]) for x in out.split('\n') 
                                                    if x.startswith('inode'))

# this func tries to determine extent info for a path
#TODO: use array.array for db, only storing extent address?
#TODO: maybe use an on-disk db
def check(path, exdb):
    # db keeps track of the extents in path
    db = set()
    try:
        st = os.lstat(path)
        if stat.S_ISLNK(st.st_mode):
            # don't follow symlinks
            return db
            
        try:
            # get fiemap info
            res = fiemap.fiemap(path)[0]
            for ex in res['extents']:
                # add extent to db
                db.add(ex['physical'])
                
                # check for extent in exdb
                pex = exdb.get(ex['physical'], [])
                if st.st_dev not in pex:
                    # keep track of the different devices that ref this extent 
                    #  (limited to same path on alternate device)
                    # store size as first element
                    if len(pex) == 0:
                        pex.append(int(ex['length']))
                        
                    pex.append(st.st_dev)
                    exdb[ex['physical']] = pex
                
        except Exception, s:
            print 'could not fiemap: {0}'.format(path)
            pass

    except OSError, e:
        pass

    return db
        
# found this on stack overflow
import math
def filesizeformat(bytes, precision=2):
    """Returns a humanized string for a given amount of bytes"""
    bytes = int(bytes)
    if bytes == 0:
        return '0bytes'
    log = math.floor(math.log(bytes, 1024))
    return "%.*f%s" % (
                        precision,
                        bytes / math.pow(1024, log),
                        ['bytes', 'kb', 'mb', 'gb', 'tb','pb', 'eb', 'zb', 'yb']
                        [int(log)]
                        )
_ = filesizeformat


def main(root, path=None):
    # need a trailing /
    if root[-1] != '/':
        root += '/'
    path = path if path is not None else root
    if path[-1] == '/':
        path = path[:-1]
    
    # list of subvols in path
    sv_list = [root+x for x in get_subvolume_list(root) if path in (root+x)]
    if len(sv_list) == 0:
        print 'No subvolumes found with (root,path) of ({0},{1})'.format(root,path)
        return
        
    # device id -> subvol dict
    sv_dict = dict([(os.stat(x).st_dev, x) for x in sv_list])
    
    # subvolume -> genid dict (genids not necessarily unique)
    sv_glist = sorted([ (get_genid(x), x) for x in sv_list])
#    sv_glist = sorted([ (get_genid_old(x), x) for x in sv_list])

    
    # database of {physical address : (extent size, devices...)} for extents
    exdb = defaultdict(list)

    print 'Building db of extents...'
    t = time.time()
    
    # subvolume -> delta size
    sv_delta = {sv_glist[0][1]:0}
    
    # generate list of files that need to be checked
    file_dict = defaultdict(set)
    gid_old, sv_old = sv_glist[0]
    ofiles = get_all_files(sv_old)
    for j in xrange(len(sv_glist)-1):
        gid_new, sv_new = sv_glist[j+1]
        
        nfiles = get_all_files(sv_new)
        nfiles_changed = get_new_files(sv_new, gid_old+1)
        nfiles_removed = ofiles - nfiles
        # files added with cp --reflink don't get a new genid, but don't 
        # take up extra space, should we count them in delta?? TODO
        nfiles_added = nfiles - ofiles
        
        # old subvolume, check changed files + removed files
        file_dict[sv_old].update(set.union(nfiles_changed, nfiles_removed))

        # new subvolume, check changed files + new files
        file_dict[sv_new].update(set.union(nfiles_changed, nfiles_added))
        
        # rotate
        gid_old, sv_old = gid_new, sv_new
        ofiles = nfiles
    
    # first step
    i = 0           # count files
    gid_old, sv_old = sv_glist[0]
    if not SKIP_FIRST:
        # This first pass scans all files in the first subvolume, which may
        # take forever and use all your memory if you have a large filesystem.
        # Without it, the first subvolume's numbers will be wrong, and files 
        # that don't change through any subvolume are not counted 

        all_files = get_all_files(sv_old)
        db_old = set()
        for file in all_files:
            db_old.update(check(sv_old+'/'+file, exdb))
            i += 1

        dsz = sum( exdb[addr][0] for addr in db_old )
        
        sv_delta[sv_old] = dsz
    
    else:
        # This scans all changed/added/removed files that exist on the first
        # subvolume.  This makes sure the first subvolume's device id is in
        # exdb, which prevents files that are removed later from only 
        # being counted on the subvolume they are removed from (fix uniqueness)
        #
        # subtract out the first subvolume's files so they aren't scanned twice
        for file in (set.union(*(file_dict.values()))-file_dict[sv_old]):
            check(sv_old+'/'+file, exdb)     
            i += 1

    # fill first sv        
    db_old = set()
    for file in file_dict[sv_old]:
        db_old.update(check(sv_old+'/'+file, exdb))
        i += 1
        
    # loop the rest
    for j in xrange(len(sv_glist)-1):
        gid_new, sv_new = sv_glist[j+1]

        db_new = set()        
        for file in file_dict[sv_new]:
            db_new.update(check(sv_new+'/'+file, exdb))

        extents_removed = db_old - db_new
        extents_added = db_new - db_old
        
        # delta size between the svs
        dsz = sum( exdb[addr][0] 
                    for addr in set.union(extents_removed, extents_added) )

    
        i += len(file_dict[sv_new])
            
        sv_delta[sv_new] = dsz

        # rotate
        db_old = db_new
            
    
    print 'Calculating sizes...'
    uniq = defaultdict(int)
    
    # go through and find extents that are only pointed to by one device, 
    # sum up the sizes for each device
    for ex in exdb:
        if len(exdb[ex]) == 2:
            dev = exdb[ex][1]
            uniq[dev] += exdb[ex][0]

    print 'Checked {0} items over {1} devices in {2}s.'\
                    .format(i, len(sv_dict), time.time()-t)
                    
    # print out in order of generation id, since thats how deltas are computed
    keys = reversed([key for g,sv in sv_glist 
                                for key,val in sv_dict.items() if val == sv ])
                                
    print 'GenID   DeviceID  Delta     Unique    Subvol'
    for dev in keys:
        gen = (g for g,sv in sv_glist if sv == sv_dict[dev]).next()
        sv = sv_dict[dev]
        dsz, usz = _(sv_delta[sv_dict[dev]]), _(uniq[dev])
        print '{0:>6}  {1:>4}      {2:<8}  {3:<8}  {4}'\
                        .format(gen, dev, dsz, usz, sv)   
    #    print 'Device {0} (gen {2}): {1}'.format(dev, sv_dict[dev], gen)
    #    print ' delta size: {0} ({1} unique)'.format(_(sv_delta[sv_dict[dev]]), _(uniq[dev]))
    #    print ' gen: {0}'.format((key for key,val in sv_gdict.items() if val == sv_dict[dev]).next())
    
if __name__ == '__main__':
    # root is the btrfs fs mountpoint
    # path is the full path to the subvolume
    path = root = '/mnt/fub'
    if len(sys.argv) == 2: # <path>
        path = sys.argv[1]
    elif len(sys.argv) == 3: # <root> <path>
        root, path = sys.argv[1:3]
    main(root, path)
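
The script takes the btrfs mountpoint as <root> and the directory the snapshots live under as <path>. Assuming it is saved as, say, snapshot_size.py (the filename is just an example) next to the fiemap module from the post below, and run as root so the btrfs commands work, an invocation might look like:

./snapshot_size.py                                 # hard-coded default for both root and path
./snapshot_size.py /mnt/fub/backups                # <path> only, root stays the default
./snapshot_size.py /mnt/backup /mnt/backup/snaps   # <root> <path>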

UPDATE:
As was pointed out in the comments, the genid returned from find-new was higher than the highest genid on any of the files, which was causing files to be skipped. I created a new get_genid function that uses ‘find-new 1’ to list all the files and takes the highest genid from that list. This works, but I don’t know if it will take longer on a large filesystem.

I also noticed that the way I was calculating the delta was not quite correct: I was only considering files that changed in the new snapshot when computing the difference between snapshots. This had a few flaws:

  • Files that were simply removed were not taken into account.
  • Files that changed on a snapshot but remained the same on the following snapshot could have their extents counted as ‘removed’.
  • Files that were the same across many snapshots and then changed might have their extents counted as unique to that last snapshot (no previous ones were counted).

I have updated the code so that it figures out which files changed across all snapshots before scanning them. It then scans all the files present on the first subvolume. This makes sure there are at least two devices pointing at files that don’t change until later or are removed later (which fixes the uniqueness count). I don’t have a full btrfs setup at the moment, so I can’t tell what speed impact these changes have.

I also added a flag (SKIP_FIRST) that conditionally runs a full scan of all files in the first subvolume. This was just for testing on my small test fs and shouldn’t be used on a full fs.

FIEMAP ioctl from Python

April 22, 2011

The other day I was trying to figure out how to get extent information about a file from Python (so I could get some info on my btrfs fs). I checked pyfragtools’ source, but it cheats and calls ‘filefrag’ through the subprocess module, and I wanted to get the info directly.

Well, the filefrag utility is also open source, so after skimming through its source I knew I needed an ioctl call.

The following code was thrown together from an example found on a mailing list, filefrag.c, and fiemap.h:

EDIT: So apparently Python’s ioctl has a maximum length of 1024 bytes for its arg parameter if it’s an immutable type (like a string). For a file with more than 17 extents, this isn’t enough. To overcome this, we must use a mutable type (array.array).

#!/usr/bin/python
from contextlib import contextmanager
import struct
import fcntl
import array    # mutable buffer for the ioctl call (see EDIT above)
import sys
import os

# context friendly os.open (normal open() doesn't work on dirs)
@contextmanager
def osopen(file, mode=os.O_RDONLY):
    try:
        fd = os.open(file, mode)
        yield fd
    finally:
        os.close(fd)

def sizeof(x):
    return struct.calcsize(x)

IOCPARM_MASK = 0x7f
IOC_OUT = 0x40000000
IOC_IN = 0x80000000
IOC_INOUT = (IOC_IN|IOC_OUT)

# defines from LINUX_FIEMAP_H
#define FIEMAP_MAX_OFFSET   (~0ULL)
#define FIEMAP_FLAG_SYNC    0x00000001 /* sync file data before map */
#define FIEMAP_FLAG_XATTR   0x00000002 /* map extended attribute tree */
#define FIEMAP_FLAGS_COMPAT (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR)
FIEMAP_EXTENT_LAST          = 0x00000001 # Last extent in file. */
FIEMAP_EXTENT_UNKNOWN       = 0x00000002 # Data location unknown. */
FIEMAP_EXTENT_DELALLOC      = 0x00000004 # Location still pending.
#                            * Sets EXTENT_UNKNOWN. */
FIEMAP_EXTENT_ENCODED       = 0x00000008 # Data can not be read
#                            * while fs is unmounted */
FIEMAP_EXTENT_DATA_ENCRYPTED = 0x00000080 # Data is encrypted by fs.
#                            * Sets EXTENT_NO_BYPASS. */
FIEMAP_EXTENT_NOT_ALIGNED   = 0x00000100 # Extent offsets may not be
#                            * block aligned. */
FIEMAP_EXTENT_DATA_INLINE   = 0x00000200 # Data mixed with metadata.
#                            * Sets EXTENT_NOT_ALIGNED.*/
FIEMAP_EXTENT_DATA_TAIL     = 0x00000400 # Multiple files in block.
#                            * Sets EXTENT_NOT_ALIGNED.*/
FIEMAP_EXTENT_UNWRITTEN     = 0x00000800 # Space allocated, but
#                            * no data (i.e. zero). */
FIEMAP_EXTENT_MERGED        = 0x00001000 # File does not natively
#                            * support extents. Result
#                            * merged for efficiency. */
#define FIEMAP_EXTENT_SHARED        0x00002000 /* Space shared with other
#                            * files. */

_flags = {}
_flags[FIEMAP_EXTENT_UNKNOWN] = "unknown"
_flags[FIEMAP_EXTENT_DELALLOC] = "delalloc"
_flags[FIEMAP_EXTENT_DATA_ENCRYPTED] = "encrypted"
_flags[FIEMAP_EXTENT_NOT_ALIGNED] = "not_aligned"
_flags[FIEMAP_EXTENT_DATA_INLINE] = "inline"
_flags[FIEMAP_EXTENT_DATA_TAIL] = "tail_packed"
_flags[FIEMAP_EXTENT_UNWRITTEN] = "unwritten"
_flags[FIEMAP_EXTENT_MERGED] = "merged"
_flags[FIEMAP_EXTENT_LAST] = "eof"

def _IOWR(x, y, t):
    return (IOC_INOUT|((sizeof(t)&IOCPARM_MASK)<<16)|((x)<<8)|y)

struct_fiemap = '=QQLLLL'
struct_fiemap_extent = '=QQQQQLLLL'

sf = sizeof(struct_fiemap)
sfe = sizeof(struct_fiemap_extent)

FS_IOC_FIEMAP = _IOWR (ord('f'), 11, struct_fiemap)

# shift is for reporting in blocks instead of bytes (e.g. 12 for 4 KiB blocks)
shift = 0

def parse_fiemap_extents(string, num):
    '''return dict of fiemap_extents struct values'''
    ex = []
    for e in range(num):
        i = e*sfe 
        x = [x >> shift for x in struct.unpack(struct_fiemap_extent, string[i:i+sfe])]
        flags = ' '.join(_flags[z] for z in _flags.keys() if (x[5]&z>0))
        ex.append({'logical':x[0],'physical':x[1],'length':x[2],'flags':flags})
    return ex

def parse_fiemap(string):
    '''return dict of fiemap struct values'''
    # split fiemap struct
    res = struct.unpack(struct_fiemap, string[:sf])
    return {'start':res[0], 'length':res[1], 'flags':res[2], 'mapped_extents':res[3],
            'extent_count':res[4], 'extents':parse_fiemap_extents(string[sf:], res[4])}

def fiemap_ioctl(fd, num_ext=0):
    # build fiemap struct
    buf = struct.pack(struct_fiemap , 0, 0xffffffffffffffff, 0, 0, num_ext, 0)
    # add room for fiemap_extent struct array
    buf += '\0'*num_ext*sfe
    # use a mutable buffer to get around ioctl size limit
    buf = array.array('c', buf)
    # ioctl call
    ret = fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)
    return buf.tostring()

def fiemap(file=None, fd=None, get_extents=True):
    if fd is None and file is None:
        raise TypeError('must provide either a filename or file descriptor')

    def _do(fd):
        # first call to get number of extents
        res = fiemap_ioctl(fd)
        # second call to get extent info
        if get_extents:
            res = fiemap_ioctl(fd, parse_fiemap(res)['mapped_extents'])
        return parse_fiemap(res), res

    if fd is None:
        with osopen(file) as fd:
            res = _do(fd)
    else:
        res = _do(fd)

    return res

if __name__ == '__main__':
    import json
    file = len(sys.argv) == 2 and sys.argv[1] or '.'
    print json.dumps(fiemap(file)[0], indent=2)

The fiemap function returns the parsed fiemap struct as a dict (along with the raw buffer), including a list of dicts with each extent’s values.
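
Used as a module (assuming the script above is saved as fiemap.py, which is how the snapshot script above imports it), it looks something like this:

import fiemap

info = fiemap.fiemap('/etc/fstab')[0]   # first element is the parsed dict
print 'mapped extents:', info['mapped_extents']
for ex in info['extents']:
    print ex['logical'], ex['physical'], ex['length'], ex['flags']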

Changing adapter MTU in Python

April 22, 2011

I’m writing a little p2p VPN app in Python. A little while ago I noticed that the connection was stalling over the internet, and it turned out that the packets I was sending were too big. Setting a lower MTU fixed the problem. As a quick fix, I need to make my app set the MTU automatically.

I could do this by calling ifconfig or ip with the subprocess module, but I wanted to do it without using an external program. After a few hours of toiling and googling, I was able to get it to work:

from fcntl import ioctl
import socket
import struct

SIOCGIFMTU = 0x8921
SIOCSIFMTU = 0x8922

s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
ioctl(s, SIOCGIFMTU, struct.pack('<16sH', 'eth0', 0))

mtu = 1280
ioctl(s, SIOCSIFMTU, struct.pack('<16sH', 'eth0', mtu)+'\x00'*14)

EDIT: The code I now use to change the MTU looks more like this (same definitions as above):

def get_mtu(self):
	'''Use socket ioctl call to get MTU size'''
	s = socket.socket(type=socket.SOCK_DGRAM)
	ifr = self.ifname + '\x00'*(32-len(self.ifname))
	try:
		ifs = ioctl(s, SIOCGIFMTU, ifr)
		mtu = struct.unpack('<H',ifs[16:18])[0]
	except Exception, s:
		logger.critical('socket ioctl call failed: {0}'.format(s))
		raise

	logger.debug('get_mtu: mtu of {0} = {1}'.format(self.ifname, mtu))
	self.mtu = mtu
	return mtu

def set_mtu(self, mtu):
	'''Use socket ioctl call to set MTU size'''
	s = socket.socket(type=socket.SOCK_DGRAM)
	ifr = struct.pack('<16sH', self.ifname, mtu) + '\x00'*14
	try:
		ifs = ioctl(s, SIOCSIFMTU, ifr)
		self.mtu = struct.unpack('<H',ifs[16:18])[0]
	except Exception, s:
		logger.critical('socket ioctl call failed: {0}'.format(s))
		raise

	logger.debug('set_mtu: mtu of {0} = {1}'.format(self.ifname, self.mtu))

	return self.mtu
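
These are methods on the class that wraps my tun adapter, so self.ifname holds the interface name. A rough standalone sketch of the same calls (same imports and SIOC* definitions as in the first snippet; setting the MTU needs root):

def get_mtu(ifname):
    '''Query the MTU of an interface via SIOCGIFMTU (standalone sketch)'''
    s = socket.socket(type=socket.SOCK_DGRAM)
    ifr = ifname + '\x00' * (32 - len(ifname))
    ifs = ioctl(s, SIOCGIFMTU, ifr)
    return struct.unpack('<H', ifs[16:18])[0]

def set_mtu(ifname, mtu):
    '''Set the MTU of an interface via SIOCSIFMTU (standalone sketch)'''
    s = socket.socket(type=socket.SOCK_DGRAM)
    ifr = struct.pack('<16sH', ifname, mtu) + '\x00' * 14
    ifs = ioctl(s, SIOCSIFMTU, ifr)
    return struct.unpack('<H', ifs[16:18])[0]

print get_mtu('eth0')       # e.g. 1500
# set_mtu('eth0', 1280)     # requires root / CAP_NET_ADMIN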