IPython Custom Cell Magic for Rendering Jinja2 Templates

April 8, 2013

NOTE: I wrote this post in the IPython notebook, but I haven’t had any luck getting it in a form I can use on wordpress, so I copied the markdown and input blocks to this post. To see the fully rendered notebook, check this nbviewer link.

After watching some of the PyCon 2013 videos on IPython, I felt inspired, as always, to play once more with IPython. Since I had just recently learned to use Jinja2, I thought it would be cool if I could test some jinja2 template rendering in the IPython notebook.

According to a post on the IPython mailing list, unfortunately, jinja2 rendering is not supported in the markdown cells (which would be really neat) and probably will not be, because it is too Python specific. This means I am restricted to rendering input cells and displaying the result in output cells.

It would be simple enough to just import jinja2 and render a string, but I wanted to make it a little nicer, so I looked up the documentation on defining your own magic functions. Turns out that it’s pretty simple. There’s an example in the IPython docs that I used as a starting point to create the following class.

from IPython import display
from IPython.core.magic import Magics, magics_class, cell_magic
import jinja2

@magics_class
class JinjaMagics(Magics):
    '''Magics class containing the jinja2 magic and state'''
    
    def __init__(self, shell):
        super(JinjaMagics, self).__init__(shell)
        
        # create a jinja2 environment to use for rendering
        # this can be modified for desired effects (ie: using different variable syntax)
        self.env = jinja2.Environment(loader=jinja2.FileSystemLoader('.'))
        
        # possible output types
        self.display_functions = dict(html=display.HTML, 
                                      latex=display.Latex,
                                      json=display.JSON,
                                      pretty=display.Pretty,
                                      display=display.display)

    
    @cell_magic
    def jinja(self, line, cell):
        '''
        jinja2 cell magic function.  Contents of cell are rendered by jinja2, and 
        the line can be used to specify output type.

        ie: "%%jinja html" will return the rendered cell wrapped in an HTML object.
        '''
        f = self.display_functions.get(line.lower().strip(), display.display)
        
        tmp = self.env.from_string(cell)
        rend = tmp.render(dict((k,v) for (k,v) in self.shell.user_ns.items() 
                                        if not k.startswith('_') and k not in self.shell.user_ns_hidden))
        
        return f(rend)
        
    
ip = get_ipython()
ip.register_magics(JinjaMagics)

The class creates a simple jinja2 environment with the FileSystemLoader, so template files can be imported/extended, and defines a function to handle a cell tagged with the cell magic “%%jinja <output>”. The output specifier is optional, and will return the rendered text wrapped in one of IPython’s rich display objects.
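
If you don’t want to paste the class into every notebook, IPython’s extension mechanism can load it for you. A minimal sketch, assuming the code above is saved as jinja_magic.py somewhere on the Python path (the module name is my choice, not part of the post):

# at the bottom of jinja_magic.py, instead of calling get_ipython() directly
def load_ipython_extension(ipython):
    '''called when the user runs %load_ext jinja_magic'''
    ipython.register_magics(JinjaMagics)

After that, running %load_ext jinja_magic in any notebook registers the %%jinja magic.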

The local (non-hidden) namespace is used for rendering, so any variables or functions defined in the IPython notebook can be accessed.

names = ['alice','bob']

Here is an example of rendering a simple HTML template and displaying it with the HTML object:

%%jinja html
<html>
<head>
<title>{{ title }}</title>
</head>
<body>
{% for name in names %}

Hello {{ name }} <br/>

{% endfor %} 
</body>
</html>

Output:

Hello alice
Hello bob

A string with no template syntax will simply be passed through as-is to the display object. The following will produce the same result as using the %%latex cellmagic, but there are no built-in cellmagics for the other display objects.

%%jinja latex

\begin{eqnarray}
\nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\
\nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
\nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
\nabla \cdot \vec{\mathbf{B}} & = 0 
\end{eqnarray}

The following is an example of using the %%jinja magic to generate and render some latex:

vars = {'rho':5,'alpha':6,'pi':3.14,'phi':1.618,'hbar':'6.582121 \cdot 10^{-16} eV\cdot s'}
%%jinja latex

\begin{eqnarray}

{% for k,v in vars.iteritems() %}
\{{ k }} & = {{ v }} \\
{% endfor %}

\end{eqnarray}

I honestly thought it would be more difficult to extend IPython with my own magic function, but the IPython devs really know what they’re doing. It has come a long long way since the first time I used it back around version 0.9, when it was simply an enhanced, interactive python terminal. What it has become now is pretty amazing.

Garbage, Weakrefs, and Magic Closures

February 22, 2012

In the previous post, I showed a class that can be used to pass ‘weak references’ to bound methods as callbacks without creating extra refs that prevent garbage collection. For global functions you can simply use a standard weakref, but very simple callbacks are often a lambda or a short function defined in the local scope, like so:

def register_callbacks(self):
    
    def callback(*x):
        ... do stuff ...
    
    some_object.connect('event', callback)
    
    some_object.connect('another-event', lambda *x: ...)
    

Since no external reference to these functions is normally held, a weakref used in their place is invalid immediately (or after the local function returns). This is rarely a concern unless the object storing the callbacks itself is using weakrefs (like a WeakValueDictionary) to store the callbacks.
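
A quick illustration of how fast such a weakref dies:

import weakref

def make_ref():
    def callback():
        pass
    return weakref.ref(callback)    # the only strong ref dies on return

r = make_ref()
print r()   # None -- the local function is already gone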

Exaile does this, which is why, if you write plugins for Exaile that use the event manager, you will either need to store references to these functions or use global functions or class methods.

The Magic Closure Problem
In most cases, you don’t have to worry about such things and can just pass a strong reference (the actual function) to the callback managers. There is one caveat, however, that I recently ran into that wasn’t very obvious at first: objects (like self) that are referenced inside the lambda or local function are kept alive by reference in the closure attached to it. I’ll explain with a few examples:

import gc
import weakref
from pprint import pprint

class C(object):
    def __init__(self, x):
        self.x = x

class A(object):
    def __init__(self):
        def lfun():
            return 'lfun'
            
        self.c = C(lfun) 
    
    def fun(self):
        return 'fun'
        
    def cfun(self):
        return self.c.x()

def rec(x):
    print 'reclaimed', repr(x)

print 'start'
a = A()
b = weakref.ref(a, rec)
print 'post'
print a.fun()
print a.cfun()

print 'del a'
del a

print 'collecting'
n = gc.collect()
print 'garbage:', n
pprint(gc.garbage)

In this example I am using gc debugging (see: gc & weakref pmotw) to check for memory leaks. I am also using a weakref callback to notify me when its referent dies, which is a neat little debugging feature.

Note: You can also use the special function __del__ on the class to notify you when it dies, but beware: Python’s gc is smart enough to automatically recognize and clean up cyclic references (A holds a ref to B which holds a ref to A) that have no external references. If any of the objects in the cyclic chain, however, define the __del__ function, this behavior is not applied. (see gc docs)
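
A quick demonstration of that caveat (Python 2 behavior):

import gc

class D(object):
    def __del__(self):
        pass

x, y = D(), D()
x.other = y
y.other = x     # a cycle where both objects define __del__
del x, y
gc.collect()
print gc.garbage    # the uncollectable cycle lands here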

A’s __init__ function creates an instance of C and passes it a local function, which it saves a reference to.

Looking at the output:

start
post
fun
lfun
del a
reclaimed <weakref at 0xcc2208; dead>
collecting
garbage: 0
[]

From the weakref callback, we see that the object dies when we remove its reference, as expected. The local function lfun has no effect on the lifetime of a. If we modify it so that it references self, though:

        def lfun():
            return 'lfun',self

Then the output becomes:

start
post
fun
('lfun', <__main__.A object at 0x26c1890>)
del a
collecting
gc: collectable <A 0x26c1890>
gc: collectable <cell 0x26c6050>
gc: collectable <tuple 0x26c18d0>
gc: collectable <function 0x26c7230>
gc: collectable <C 0x26c1950>
gc: collectable <dict 0x2699af0>
gc: collectable <dict 0x26edd80>
reclaimed <weakref at 0x26c3208; dead>
garbage: 7
[<__main__.A object at 0x26c1890>,
 <cell at 0x26c6050: A object at 0x26c1890>,
 (<cell at 0x26c6050: A object at 0x26c1890>,),
 <function lfun at 0x26c7230>,
 <__main__.C object at 0x26c1950>,
 {'x': <function lfun at 0x26c7230>},
 {'c': <__main__.C object at 0x26c1950>}]

What this shows is that the object does not actually die when we ‘del a’. Since the function lfun uses self, its closure holds a reference to self, and lfun is subsequently stored in c. Since the only reference to c is in a, this effectively creates a cyclic reference between C and A, which is cleaned up when the gc runs its collection routine. The gc debug output shows that all the inaccessible objects were successfully collected.

The problem is that, unlike this simple contrived example, callbacks are most often passed to external objects that store and call them. In this case, that external object will hold a reference, through the closure, to self and prevent it from being disposed of. This is similar to the situation described in the previous post, but, as mentioned above, we cannot use weakrefs to the functions themselves as they have no external reference to keep them alive.

One simple solution that I recently discovered is to pass a strong reference to the function, but use a weakref to self inside the function itself:

class A(object):
    def __init__(self):
        pself = weakref.proxy(self)
        def lfun():
            return 'lfun',pself
            
        self.c = C(lfun) 

The output once again becomes:

start
post
fun
('lfun', <weakproxy at 0x7fb576c882b8 to A at 0x7fb576c86890>)
del a
reclaimed <weakref at 0x7fb576c88310; dead>
collecting
garbage: 0
[]

And the object dies when we ‘del a’. Now this is a simple example, and any real usage of weakrefs in this way needs to check that they are valid before using them (or handle their exceptions).
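
For example, a dead proxy raises weakref.ReferenceError on any use, which is easy to guard against (a sketch, not from the original code):

import weakref

class A(object):
    def __init__(self):
        self.x = 42

a = A()
p = weakref.proxy(a)
del a

try:
    print p.x   # any attribute access on a dead proxy raises
except weakref.ReferenceError:
    print 'referent is dead, skipping callback'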

Extra Bits
As I was researching this topic I came across two neat recipes using weakrefs that I can’t help but share:

weakattr – a weakly-referenced attribute. When the attribute is no longer referenced, it ‘disappears’ from the instance. Great for cyclic references.

weakmethod – unlike the previously defined WeakMethod, this is a decorator which makes it such that any reference to the decorated method automagically contains a weak reference to the instance object.
Think: fun(weakref.proxy(self), *args, **kwargs).
Decorated methods can then be passed to other objects without creating extra references. Neat.

Python weakmethods

February 22, 2012

Python is my favorite language. It is so simple to code and readable that doing just about anything is quicker and easier than any other language I’ve used. At the same time it is so advanced that I am always discovering new features/techniques/oddities.

One of the oddities I recently ran into is the inability of python’s weakref module to create weak references to bound methods.

Problem
Any application where you pass methods as callbacks could run into this situation. Two recent occurrences involve GUI programming and network programming (with twisted). In both cases there is an event system with which you register callbacks that are called to notify your class. They usually look something like:

    eo = EventObject()

    class MyClass(object):
        def __init__(self):
           eo.register(self.callback)
           
        def callback(self):
            ...

This works great, and it’s usually what you’ll see in just about every tutorial on pygtk or twisted (using their own api of course). What if your program is long running and the object that generates your events is expected to outlive your class instance? Your program will ‘leak memory’ because, unless you can explicitly unregister your callback, the event object will hold a reference to your bound method, which keeps your instance alive.

A real-world application where this is a problem is Exaile, which uses a global event manager that lives as long as the program runs.

WeakRef
One (elegant) solution is provided by Python’s weakref module:

This module lets you pass weak references to other classes, so when you dispose of your object, it is not kept alive (good examples on that site). The problem with weakref, though, is that it doesn’t act as you might expect on bound methods.

Since bound methods are ‘first class objects’, unless you store a separate reference to the method (which requires extra bookkeeping), the weakref created from a bound method is dead on arrival. The following example illustrates this:

    import weakref
    class A(object):
        def fun(self):
            return 'fun!'
            
    def notify_dead(ref):
        '''called by weakref when the referent dies'''
        print '{0} now dead'.format(repr(ref))

    print 'start'
    a = A()
    r = weakref.ref(a.fun, notify_dead) # dead on arrival
    print a.fun()   # fun!
    print r()       # None
    print r()()     # Exception

WeakMethod
I’ve seen a few ways to work around this, but the cleanest is what’s done in Exaile. The general idea is to create a class that holds a weakref to the instance object and regenerates the bound method when called, if the referent is still alive. The following is a simplified version (no exception handling):

import types
import weakref

class WeakMethod(object):
    def __init__(self, meth, notify=None):
        if meth.im_self is None:
            raise ValueError('unbound method')
        self.obj = weakref.ref(meth.im_self, notify)
        self.func = meth.im_func
        self.cls = meth.im_class
        
    def __call__(self):
        obj = self.obj()
        if obj is None:
            return None
        else:
            return types.MethodType(self.func, obj, self.cls)

The previous example then becomes:

    class A(object):
        def fun(self):
            return 'fun!'
            
    def notify_dead(ref):
        '''called by weakref when the referent dies'''
        print '{0} now dead'.format(repr(ref))

    print 'start'
    a = A()
    r = WeakMethod(a.fun, notify_dead)
    print a.fun()   # fun!
    print r()       # <bound method A.fun of <__main__.A object at ...>>
    print r()()     # fun!
    del a           # dies
    print r()       # None

A similar WeakMethodProxy class can be made to behave like a proxy object, if you need to pass something that acts like a method.
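
A minimal sketch of what that could look like, reusing the WeakMethod class from above:

class WeakMethodProxy(WeakMethod):
    '''call the proxy as if it were the bound method itself'''
    def __call__(self, *args, **kwargs):
        obj = self.obj()
        if obj is None:
            raise weakref.ReferenceError('referent is dead')
        return self.func(obj, *args, **kwargs)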

Note:
These classes are only valid for bound methods. Unbound methods, functions, lambdas, and closures/nested functions will not work. You can create weakrefs to these objects as you would any object with weakref.ref, but since these objects are usually created and used in-place (not stored), the weakrefs will be invalid beyond the scope they are defined in.
Because of this, it is not very useful to create weakrefs of lambdas and closures; more on this in the next post.

One final note:
atexit
atexit.register is a convenient way to make sure resources are cleaned up on program termination. The problem is that there is no way to ‘unregister’ a function, and if you pass an instance method, the instance is kept alive for the length of the program. You can pass a weakref proxy, but this will cause an exception when atexit tries to call a dead weakref proxy. Most solutions I’ve seen create a global ‘cleanup’ function or class that they then wrap their cleanup functions in, supplementing atexit with their own register/unregister or exception handling.
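
One possible shape for such a wrapper, building on the WeakMethod class above (a sketch, not production code):

import atexit

_cleanups = []

def register_cleanup(method):
    '''register a bound method without keeping its instance alive'''
    _cleanups.append(WeakMethod(method))

def _run_cleanups():
    for wm in _cleanups:
        bound = wm()            # None if the instance already died
        if bound is not None:
            bound()

atexit.register(_run_cleanups)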

Creating an “enum” in Python

January 26, 2012

In a project I’ve been playing with, I have been using a set of global constants to define types of message that can be sent between clients. In an effort to clean up the code a little bit and streamline all the constants (and avoid collisions, since they are defined in multiple modules) I moved them to their own module (like in this interesting post).

That’s a fine solution, but I was looking for a few more features. The biggest feature I wanted was to be able to throw them in “{0}”.format() and have a human readable name instead of a number, which is a big help when debugging. It would also be nice if I could use isinstance to check types of constants. Essentially, I wanted to mimic the behavior found in pygtk’s constants:

[~]|1> from gi.repository import Gtk
[~]|3> type(Gtk.MessageType)
   <3> <type 'type'>
[~]|4> type(Gtk.MessageType.WARNING)
   <4> <class 'gi.repository.Gtk.GtkMessageType'>
[~]|5> repr(Gtk.MessageType.WARNING)
   <5> '<enum GTK_MESSAGE_WARNING of type GtkMessageType>'
[~]|6> isinstance(Gtk.MessageType.WARNING, Gtk.MessageType)
   <6> True

This behavior sounds a bit like what you get from ‘enum’s in most languages, but, since there are no enums in Python (everything is an object), I started googling for a clever way to fake it. It’s pretty easy to find lots of clever Python tricks if you search around.

I ran across this post at StackOverflow: http://stackoverflow.com/questions/36932/whats-the-best-way-to-implement-an-enum-in-python

The current top solution is simply a class with constants:

class Animal:
    DOG=1
    CAT=2

x = Animal.DOG

This is essentially the same as the previously linked post, replacing module with class. You can even stick your class in a module if you wanted to access it from multiple modules.

The second solution was pretty clever and had a lot of votes:

def enum(**enums):
    return type('Enum', (), enums)

>>> Numbers = enum(ONE=1, TWO=2, THREE='three')
>>> Numbers.ONE
1
>>> Numbers.TWO
2
>>> Numbers.THREE
'three'

This looks like magic, but it’s just a dynamic version of the previous solution. The function uses the built-in type() function to dynamically create a class of constants. It’s neat to be able to create it dynamically at run time instead of hard coding it, but it still doesn’t have any of the behavior of Gtk.MessageType. Because it dynamically creates a class, though, we can add extra behavior to it:

def enum_type(**enums):
    '''simple enums with a type'''
    class Enum(object):
        def __new__(cls, val):
            o = object.__new__(cls)
            o.value = val
            return o
        def __call__(self):
            return self.value

    for key,val in enums.items():
        setattr(Enum, key, Enum(val))

    return Enum

def enum_base(t, **enums):
    '''enums with a base class'''
    T = type('Enum', (t,), {})
    for key,val in enums.items():
        setattr(T, key, T(val))

    return T

The first function creates a generic enum class that can store any type of value. The second, and simpler, function inherits from a base type (like int). The second function will create something very similar to Gtk.MessageType:

[~/code/python]|16> T = enums.enum_base(int, one=1,two=2,three=3)
[~/code/python]|17> x = T.two
[~/code/python]|18> x
               <18> 2
[~/code/python]|19> repr(x)
               <19> '2'
[~/code/python]|20> type(x)
               <20> <class 'enums.Enum'>
[~/code/python]|21> isinstance(x,T)
               <21> True
[~/code/python]|22> x + 5
               <22> 7

Neat! The returned class behaves (almost) exactly like I wanted. I can add extra behavior by overriding magic class methods and adding them to the third argument to type() (or I could just define the class in the function). After adding a new __repr__, the ability to name the “Enum” class, and the ability to add new values to the class, I end up with:

def enum(name, _type, *lst, **enums):
    '''
        Dynamically create enum-like class

        :param name: name of the class

        :param _type: inherited base class (like int)

        :param *lst: list of names to enumerate (ie: ONE, TWO)

        :param **enums: dict enumerations (ie: ONE=1,TWO=2)
    '''
    def _new(cls, k, v):
        obj = super(T, cls).__new__(cls, v)
        obj._name = k
        return obj

    def _repr(self):
        return '<enum {0}={3} of type {1}({2})>'.format(self._name, name, _type.__name__, _type(self))

    @staticmethod
    def add(*lst, **enums):
        vals = list(T._enums.keys())
        for key,val in enums.items():
            if val in vals:
                raise ValueError, "{0}'s value {1} already assigned to {2}"\
                                .format(key, val, T._enums[val])
            T._enums[val] = key
            setattr(T, key, T(key,val))
            vals.append(val)
        mx = max(vals+[0,])
        for key in lst:
            val = mx+1
            T._enums[val] = key
            setattr(T, key, T(key,val))
            vals.append(val)
            mx = val

    T = type(name, (_type,), {'__new__':_new,
                              '__repr__':_repr,
                              'add':add})

    T._enums = {}
    T.add(*lst, **enums)

    return T

Ok, it’s starting to look a little more complicated, but I added some extra ‘fluff’ to make it easy to check what has already been defined and to avoid value collisions. There is one more feature I wanted to add. In my project I pack/unpack these values into data with struct.pack/struct.unpack. I would like the ability to ‘cast’ or convert the unpacked integers back into the Enum type, the way int(‘1’) == 1.

As it turns out, you can’t just add a function to the Enum class to get this behavior, because it’s handled in type()’s __call__ function. Metaclass time! A simple metaclass that only extends __call__ and inherits type will add the desired behavior:

def enum(name, _type, *lst, **enums):
    '''
        Dynamically create enum-like class

        :param name: name of the class

        :param _type: inherited base class (like int)

        :param *lst: list of names to enumerate (ie: ONE, TWO)

        :param **enums: dict enumerations (ie: ONE=1,TWO=2)
    '''

    class Type(type):
        '''
            metaclass for new enum type, to support casting
        '''
        def __call__(cls, *args):
            if len(args) > 1:
                return super(Type, cls).__call__(*args)
            else:
                x = args[0]
                if isinstance(x, str):
                    if x in T._enums.values():
                        return getattr(T, x)
                    else:
                        return _type(x)
                elif isinstance(x, _type):
                    return getattr(T, T._enums[x])
                else:
                    raise TypeError("invalid argument type, must be str or {0}"
                                        .format(_type.__name__))

    def _new(cls, k, v):
        obj = super(T, cls).__new__(cls, v)
        obj._name = k
        return obj

    def _str(self):
        return self._name

    def _repr(self):
        return '<enum {0}={3} of type {1}({2})>'.format(self._name, name,
                                                _type.__name__, _type(self))

    @staticmethod
    def add(*lst, **enums):
        vals = list(T._enums.keys())
        for key,val in enums.items():
            if val in vals:
                raise ValueError, "{0}'s value {1} already assigned to {2}"\
                                .format(key, val, T._enums[val])
            T._enums[val] = key
            setattr(T, key, T(key,val))
            vals.append(val)
        mx = max(vals+[0,])
        for key in lst:
            val = mx+1
            T._enums[val] = key
            setattr(T, key, T(key,val))
            vals.append(val)
            mx = val

    T = Type(name, (_type,), {'__new__':_new,
                              '__str__':_str,
                              '__repr__':_repr,
                              'add':add})

    T._enums = {}
    T.add(*lst, **enums)

    return T

Example usage:

[~/code/python]|24> T = enums.enum('MyType',int,one=1,two=2,three=3)
[~/code/python]|25> T(2)
               <25> <enum two=2 of type MyType(int)>
[~/code/python]|26> T('two')
               <26> <enum two=2 of type MyType(int)>
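
The struct use case that motivated the cast works too. A quick sketch with a hypothetical MsgType (enum here is the final function above):

import struct

MsgType = enum('MsgType', int, HELLO=1, DATA=2, BYE=3)

raw = struct.pack('!B', MsgType.DATA)   # packs like a plain int
val = struct.unpack('!B', raw)[0]       # unpacks as a plain int: 2
print repr(MsgType(val))                # <enum DATA=2 of type MsgType(int)>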

Beautiful!

HOWTO Build a base LXC container in Ubuntu 11.04

August 19, 2011

I occasionally build LXC containers to run self contained servers or for testing/demos. Each time I do, I end up looking up some references to remind me of the details and make sure I don’t forget any steps, so I decided I would put it down in writing here to make it easier to find and follow next time…

The references I generally use are:
http://www.stgraber.org/2011/05/04/state-of-lxc-in-ubuntu-natty/
http://nigel.mcnie.name/blog/a-five-minute-guide-to-linux-containers-for-debian
http://lxc.teegra.net/

Create cgroup
LXC uses cgroups to manage container resources. If cgroups aren’t already mounted, then add them as follows:

  •   Add the following line to /etc/fstab:
none /dev/cgroup cgroup defaults 0 0
  •   Then create the directory and mount:

mkdir /dev/cgroup
mount /dev/cgroup

NOTE: It can be mounted anywhere. I like to put them in /dev/cgroup, other people put them in /cgroup or /var/cgroup.

Install Required Packages
Install some required packages to use LXC:

  • lxc – the LXC tools
  • bridge-utils – for creating linux bridges
  • debootstrap – for creating base systems, required by the template scripts

aptitude install lxc bridge-utils debootstrap # or apt-get install

Networking Setup
There are two types of networking setup I use, depending on how I am going to use the container. The first and most common setup is the bridged configuration. This is required (I believe) if the container is to receive ethernet frames (like DHCPREQUEST packets for a PXE server). It is also required if the container is to use your network’s DHCP server. This requires creating a bridge device and plugging eth0 (or the primary interface) into it, which can affect the system (ie: if NetworkManager is being used, it can no longer manage the connection; if there are services bound to eth0, they will need to be bound to br0 instead).

The second setup is NAT. A bridge interface is created which is only connected to the container’s network interface. The host then routes packets to/from the container.

(a) Bridged Setup
Add a bridge interface to /etc/network/interfaces file, and set eth0 to manual mode:

vi /etc/network/interfaces

...
# find the section talking about your physical interface, it's normally
# eth0 or eth1
auto eth0
iface eth0 inet manual # change from 'dhcp' to 'manual'

# add this section
auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

Then restart networking:

/etc/init.d/networking restart

(b) NAT Setup
Add a bridge interface to /etc/network/interfaces for the container to use. Here I give it a static IP and configure the host to route for it:

vi /etc/network/interfaces

...
# LXC bridge
auto br-lxc
iface br-lxc inet static
    address 192.168.254.1
    netmask 255.255.255.0

    post-up echo 1 > /proc/sys/net/ipv4/ip_forward
    post-up iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
    pre-down echo 0 > /proc/sys/net/ipv4/ip_forward
    pre-down iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE

    bridge_ports none
    bridge_stp off

Then restart networking:

/etc/init.d/networking restart

Setup initial LXC configuration file
This file will be used by the lxc-create command to create the container’s configuration. Replace br-lxc with br0 if using the bridged network setup.

vi network.conf

lxc.network.type = veth 
lxc.network.flags = up 
lxc.network.link = br-lxc # (or br0 for bridged) 
# optional stuff (from lxc HOWTO) 
#lxc.network.hwaddr = {a1:b2:c3:d4:e5:f6} # As appropriate (line only needed if you wish to dhcp later) 
#lxc.network.ipv4 = {192.168.10.2/24} # (Use 0.0.0.0 if you wish to dhcp later) 
#lxc.network.name = eth0 # could likely be whatever you want 
#lxc.mount = {/path/to/fstab/for/CONTAINER_NAME} 
#lxc.rootfs = {/path/to/rootfs/of/CONTAINER_NAME} 

Create the container
Now invoke lxc-create to create the container. I HIGHLY recommend using a template script for this, as it uses debootstrap to download a base system and then fixes the init scripts so that it can be booted in a container (it may also install an ssh server/user, depending on the script). If you don’t use a script, you have to fix these things manually and it can be a pain. The LXC package in Ubuntu 11.04 comes with several template scripts in /usr/lib/lxc/templates; if you need to set up a container on an older Ubuntu (like 10.04), you can copy one of these scripts off an 11.04 install. There are also other scripts on the web, but I have not tried them.

lxc-create -n name -f network.conf -t lucid

Container network configuration
For static NAT configuration, the container’s IP needs to be set before it is booted up (it defaults to dhcp, so it should just work for a bridged network setup).

vi /var/lib/lxc/name/rootfs/etc/network/interfaces

auto lo 
iface lo inet loopback 

auto eth0 
iface eth0 inet static 
    address 192.168.254.2 
    netmask 255.255.255.0 
    gateway 192.168.254.1 
    dns-nameservers 8.8.8.8 

Start it!
lxc-start -n name -d #( -d for daemonize)

SSH to it!
The template script automatically created an ssh server, and root’s password is root:

ssh root@192.168.254.2

That’s it: a base, (hopefully) working LXC container.
NOTE: If the nameserver isn’t put in /etc/resolv.conf automatically (you can’t resolve anything), then restart networking INSIDE the container with:

/etc/init.d/networking restart

Optional
Some optional steps to make the container a little more usable:

  • Create a user:

  • adduser user
    adduser user GROUP
    ...

  • Install some packages:

  • apt-get install aptitude
    aptitude install man bash-completion vim python-software-properties command-not-found sudo

    Where:

    • python-software-properties – add-apt-repository script
    • command-not-found – suggests which package contains typed missing command

Preview Android Boot Animations

June 17, 2011

I was looking for a new android boot animation after flashing a new ROM to my Evo Shift. There are a lot of animations posted in the xda forums, but unfortunately not all of them come with screenshots, and even the ones that do are usually static.

I did a quick search for a way to preview them on my computer and found a program on xda forums for just that. Unfortunately it was written in .NET so it’s not cross platform and can’t be used in linux.

In the end, I decided to refresh my pygtk knowledge and write my own simple script for previewing boot animations, code below. It can be used by running “boot_animation.py <bootanimation.zip>” or by just launching it and dragging a boot animation .zip file onto the window.
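
For reference, the desc.txt inside a boot animation zip generally looks like the following (a reconstruction matching what the parsing code below expects). The first line gives width, height, and fps; each 'p' line gives a repeat count (0 means loop forever), a pause, and a frame directory:

480 800 30
p 1 0 part0
p 0 0 part1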

import zipfile
import gtk
import glib
import sys



class Screen( gtk.DrawingArea ):
    """ This class is a Drawing Area"""
    def __init__(self, fps):
        super(Screen,self).__init__()
        ## Old fashioned way to connect expose. I don't savvy the gobject stuff.
        self.connect ( "expose_event", self.do_expose_event )
        ## This is what gives the animation life!
        dt = int(1./fps * 1000.)
        glib.timeout_add( dt, self.tick ) # call tick every dt milliseconds
        self._running = True

    def tick ( self ):
        ## This invalidates the screen, causing the expose event to fire.
        self.alloc = self.get_allocation ( )
        rect = gtk.gdk.Rectangle ( self.alloc.x, self.alloc.y, self.alloc.width, self.alloc.height )
        self.window.invalidate_rect ( rect, True )        
        return self._running # Causes timeout to tick again.

    ## When expose event fires, this is run
    def do_expose_event( self, widget, event ):
        self.cr = self.window.cairo_create( )
        ## Call our draw function to do stuff.
        self.draw( *self.window.get_size( ) )
        


class Ani(Screen):
    '''Animation widget for displaying a list of sequences of frames (pixbufs)'''
    def __init__(self, anim_list, fps, loop):
            
        self.images = anim_list
        self.loop = loop
        
        Screen.__init__(self, fps)

        # init the generator        
        self._iter = self.iter()
        
    def iter(self):
        '''A generator to dole out the animation frames in the right 'order' '''
        p = 0
        for pixbuf_list in self.images:
            p += 1
            self.parent.set_title('Running: sequence {0}'.format(p))
            for pixbuf in pixbuf_list:
                yield pixbuf
                
        while self.loop:
            # loop final list
            self.parent.set_title('Looping: sequence {0}'.format(p))
            for pixbuf in self.images[-1]:
                yield pixbuf
        
        # no looping
        self.parent.set_title('Stopped.')
        self._running = False
        while True:
            # but we may still get expose events
            yield self.images[-1][-1]
        
    def next(self):
        # return next image
        return self._iter.next()
    
    def draw(self, w, h):
        # draw image
        self.window.draw_pixbuf(self.get_style().fg_gc[gtk.STATE_NORMAL], self.next(), 0, 0, 0, 0)
        

class Main(gtk.Window):
    def __init__(self):
        super(Main, self).__init__()
        
        self.connect( "delete-event", gtk.main_quit )
        self.connect( "destroy", gtk.main_quit )
        self.widget = None
        self.set_size_request ( 400, 400 )


        ###### Drag n Drop Stuff ######
        self.TARGET_TYPE_URI_LIST = 80
        dnd_list = [ ( 'text/uri-list', 0, self.TARGET_TYPE_URI_LIST ) ]
        self.drag_dest_set( gtk.DEST_DEFAULT_MOTION |
                      gtk.DEST_DEFAULT_HIGHLIGHT | gtk.DEST_DEFAULT_DROP,
                      dnd_list, gtk.gdk.ACTION_COPY)    
        self.connect('drag-data-received', self.drag_data_received)


    def load_boot_animation(self, file):
        '''Load a boot animation .zip file, creating an animation widget
        and displaying it in the window.'''
        if not zipfile.is_zipfile(file):
            print 'invalid zip file', file
            raise KeyError('invalid zip file')

        zf = zipfile.ZipFile(file, 'r')
        try:
            zf.getinfo('desc.txt')
        except KeyError:
            print 'invalid zipfile, no desc.txt'
            raise
            
        # read the desc.txt file
        spec = zf.read('desc.txt')
        file_list = zf.namelist()
        w, h, fps = map(int, spec.split()[:3])
        anim_list = []
        loop = False

        # parse the lines containing directories
        for line in spec.split('\n')[1:]:
            line = line.strip()
            if line == '':
                # ignore blank lines
                continue

            # load images in the directory
            dir = line.split()[3]
            names = [x for x in file_list if dir in x]
            pixbufs = load_images(zf, names)
            anim_list.append(pixbufs)
            
            if line.split()[1] == '0': 
                # if this line is set to loop, nothing after will get displayed
                loop = True
                break
        

        # clear old animation
        if self.widget is not None:
            import gc
            kid = self.widget
            self.widget = None
            kid._running = False
            self.remove(kid)
            del kid.images # make sure to remove references to loaded images
            kid.destroy()
            del kid
            #gc.collect() # force freeing of image memory
            glib.idle_add(gc.collect)
            

        # create animation widget
        self.widget = Ani( anim_list, fps, loop )

        self.add(self.widget)
        self.show_all()
        self.set_size_request(w, h)
        

    def drag_data_received(self, wdg, context, x, y, selection, target_type, time):
        '''Event handler for dropping files onto the window.
        Contains a list of filenames, only keep the last one.'''
        if target_type == self.TARGET_TYPE_URI_LIST:
            uri = selection.data.strip().split('\n')[-1]
            file = uri_to_path(uri)

            try:
                # try to load it
                self.load_boot_animation(file)
            except KeyError:
                pass
                

def load_images(zf, names):
    '''Load a list of images from a ZipFile into pixbufs
        zf - ZipFile object containing images
        names - list of files to load 
    '''
    pixbufs = []
    for img in names:
        try:
            pbf = gtk.gdk.PixbufLoader()
            pbf.write(zf.read(img))
            pbf.close()
            pixbufs.append(pbf.get_pixbuf())
        except Exception, s:
            print 'cannot load {0}:'.format(img),s
    return pixbufs



def uri_to_path(uri):
    '''Convert a file URI to a path'''
    try:
        # gio is easier to use
        import gio
        path = gio.File(uri).get_path()
    except ImportError:
        # if gio isn't available
        import urllib
        # get the path to file
        path = ""
        if uri.startswith('file:\\\\\\'): # windows
            path = uri[8:] # 8 is len('file:///')
        elif uri.startswith('file://'): # nautilus, rox
            path = uri[7:] # 7 is len('file://')
        elif uri.startswith('file:'): # xffm
            path = uri[5:] # 5 is len('file:')

        path = urllib.url2pathname(path) # escape special chars
        path = path.strip('\r\n\x00') # remove \r\n and NULL

    return path
	
	
def run():
    main_window = Main()
    
    if len(sys.argv) <= 1:
        print 'no arguments passed'
    else:
        file = sys.argv[1]

        if not zipfile.is_zipfile(file):
            print 'invalid file passed on cli', file
        else:
            main_window.load_boot_animation(file)

    # initial window display
    main_window.show_all()
    main_window.present( )

    # start mainloop
    gtk.main( )
    
if __name__ == '__main__':
    run()

Getting the serial PSC Powerscan to work in Ubuntu

June 3, 2011

PSC Powerscan

I needed to get a PSC Powerscan to work (scan a barcode and enter text) a little while ago, but it wasn’t being recognized as a keyboard device (the expected behavior). I did some searching, but most of the information I found applied only to barcode scanners that send ASCII over serial. Putting serio0 into serio_raw mode and hexdumping the data coming from the scanner, it was obvious that it was sending scancodes.

The solution I eventually found was a kernel parameter. The atkbd driver supports “dumb” keyboards but apparently looks for a smart keyboard by default; putting the following on the kernel boot line made everything work fine: “atkbd.dumbkbd=1”

In Ubuntu, adding it to /etc/default/grub makes it a permanent solution.
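
For example, the edit usually looks something like this (adjust to your existing defaults):

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash atkbd.dumbkbd=1"

Then run update-grub and reboot.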

determining snapshot size in BTRFS

April 27, 2011

After switching my backup hard drive from zfs-fuse to btrfs, one of the features I missed most was the extra info zfs list -t snapshot gives, and specifically the size of the snapshot.

It’s helpful if you want to know how much space you’d get back if you deleted a snapshot, but it’s also a decent indicator of how much the fs changed between snapshots. This second point has helped me identify when large files are accidentally being backed up when they shouldn’t (like when I accidentally put a video file in the wrong folder.)

There’s currently no built-in way to determine this information, and after asking google and #btrfs on freenode.net, I decided to try and write a python script to figure it out.

Approach

From my limited understanding, data for btrfs is stored in extents and snapshots with identical data just point to the same extents. Therefore, to determine unique data on a snapshot, just find the extents on that snapshot that are on no other snapshot (or subvolume).

Sounds easy, but scanning and keeping track of all extents on a btrfs fs takes forever and then you run out of memory. Someone in #btrfs gave me the idea to use ‘btrfs subvolume find-new’ to find all the changed files between snapshots. Scanning the extents in only these files should be enough to identify unique extents, since the extents in all the other files are obviously shared. This will only work for successive snapshots of identical files (like in a backup scheme), but that’s what I’m using it for. This also makes it easy to tell how much has changed between snapshots if they are scanned in order of creation (in order of generation id).

This method doesn’t account for different files in the same snapshot that share extents, so if there are a lot of hardlinks it may not be accurate. In fact, I don’t really have a way to check its accuracy, but it seems to give reasonable numbers.
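
For context, a line of ‘btrfs subvolume find-new’ output looks roughly like this (reconstructed from the field offsets the script parses; the exact format may vary between btrfs-progs versions):

inode 3541 file offset 0 len 4096 disk start 29360128 offset 0 gen 47 flags NONE path/to/changed/file
transid marker was 48

The script pulls the generation id from field 13 of each inode line, the path from field 16 onward, and get_genid_old grabs the trailing transid number.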

Anyway, here’s the code:

#!/usr/bin/python
# Brian Parma
#
# script to get snapshot size info from a btrfs mount

import sys
import os
import stat
import subprocess as sp
import fiemap
import time
import shelve
import json
from collections import defaultdict


# if you set this to false, you may run out of memory
SKIP_FIRST = True
#SKIP_FIRST = False

# this function gets a list of subvolumes from a btrfs fs
def get_subvolume_list(path):
    out = sp.Popen('btrfs subvolume list'.split()+[path], 
                    stdout=sp.PIPE).communicate()[0]
                    
    return sorted( ' '.join(x.split()[6:]) 
                            for x in out.split('\n') if x != '')

# this function gets the last genid present in a subvolume (i think)
def get_genid_old(sv):
    out = sp.Popen(('btrfs subvolume find-new {0} 999999999'.format(sv)).split(),
                    stdout=sp.PIPE).communicate()[0]
                    
    return int(out.split()[-1])

# new function pulls the genid from the list of files
def get_genid(sv):
    out = sp.Popen(('btrfs subvolume find-new {0} 1'.format(sv)).split(),
                        stdout=sp.PIPE).communicate()[0]
                        
    return max( [int(row.split()[13]) for row in out.split('\n') 
                                            if row.startswith('inode')] )
    


# get full file list
def get_all_files(sv):
    out = sp.Popen(('find {0} -xdev -type f'.format(sv)).split(), 
                        stdout=sp.PIPE).communicate()[0]
                        
    return set( os.path.relpath(file, sv) 
                    for file in out.split('\n') if file != '' )


# this function gets the files in a subvolume changed since genid (i think)
def get_new_files(sv, genid):
    out = sp.Popen(('btrfs subvolume find-new {0} {1}'.format(sv, genid)).split(), 
                    stdout=sp.PIPE).communicate()[0]
                    
    return set( ' '.join(x.split()[16:]) for x in out.split('\n') 
                                                    if x.startswith('inode'))

# this func tries to determine extent info for a path
#TODO: use array.array for db, only storing extent address?
#TODO: maybe use an on-disk db
def check(path, exdb):
    # db keeps track of the extents in path
    db = set()
    try:
        st = os.lstat(path)
        if stat.S_ISLNK(st.st_mode):
            # don't follow symlinks
            return db
            
        try:
            # get fiemap info
            res = fiemap.fiemap(path)[0]
            for ex in res['extents']:
                # add extent to db
                db.add(ex['physical'])
                
                # check for extent in exdb
                pex = exdb.get(ex['physical'], [])
                if st.st_dev not in pex:
                    # keep track of the different devices that ref this extent 
                    #  (limited to same path on alternate device)
                    # store size as first element
                    if len(pex) == 0:
                        pex.append(int(ex['length']))
                        
                    pex.append(st.st_dev)
                    exdb[ex['physical']] = pex
                
        except Exception, s:
            print 'could not fiemap: {0}'.format(path)
            pass

    except OSError, e:
        pass

    return db
        
# found this on stack overflow
import math
def filesizeformat(bytes, precision=2):
    """Returns a humanized string for a given amount of bytes"""
    bytes = int(bytes)
    if bytes == 0:
        return '0bytes'
    log = math.floor(math.log(bytes, 1024))
    return "%.*f%s" % (
                        precision,
                        bytes / math.pow(1024, log),
                        ['bytes', 'kb', 'mb', 'gb', 'tb','pb', 'eb', 'zb', 'yb']
                        [int(log)]
                        )
_ = filesizeformat


def main(root, path=None):
    # need a trailing /
    if root[-1] != '/':
        root += '/'
    path = path if path is not None else root
    if path[-1] == '/':
        path = path[:-1]
    
    # list of subvols in path
    sv_list = [root+x for x in get_subvolume_list(root) if path in (root+x)]
    if len(sv_list) == 0:
        print 'No subvolumes found with (root,path) of ({0},{1})'.format(root,path)
        return
        
    # device id -> subvol dict
    sv_dict = dict([(os.stat(x).st_dev, x) for x in sv_list])
    
    # subvolume -> genid dict (genids not necessarily unique)
    sv_glist = sorted([ (get_genid(x), x) for x in sv_list])
#    sv_glist = sorted([ (get_genid_old(x), x) for x in sv_list])

    
    # database of {physical address : (extent size, devices...)} for extents
    exdb = defaultdict(list)

    print 'Building db of extents...'
    t = time.time()
    
    # subvolume -> delta size
    sv_delta = {sv_glist[0][1]:0}
    
    # generate list of files that need to be checked
    file_dict = defaultdict(set)
    gid_old, sv_old = sv_glist[0]
    ofiles = get_all_files(sv_old)
    for j in xrange(len(sv_glist)-1):
        gid_new, sv_new = sv_glist[j+1]
        
        nfiles = get_all_files(sv_new)
        nfiles_changed = get_new_files(sv_new, gid_old+1)
        nfiles_removed = ofiles - nfiles
        # files added with cp --reflink don't get a new genid, but don't 
        # take up extra space, should we count them in delta?? TODO
        nfiles_added = nfiles - ofiles
        
        # old subvolume, check changed files + removed files
        file_dict[sv_old].update(set.union(nfiles_changed, nfiles_removed))

        # new subvolume, check changed files + new files
        file_dict[sv_new].update(set.union(nfiles_changed, nfiles_added))
        
        # rotate
        gid_old, sv_old = gid_new, sv_new
        ofiles = nfiles
    
    # first step
    i = 0           # count files
    gid_old, sv_old = sv_glist[0]
    if not SKIP_FIRST:
        # This first pass scans all files in the first subvolume, which may
        # take forever and use all your memory if you have a large filesystem.
        # Without it, the first subvolume's numbers will be wrong, and files 
        # that don't change through any subvolume are not counted 

        all_files = get_all_files(sv_old)
        db_old = set()
        for file in all_files:
            db_old.update(check(sv_old+'/'+file, exdb))
            i += 1

        dsz = sum( exdb[addr][0] for addr in db_old )
        
        sv_delta[sv_old] = dsz
    
    else:
        # This scans all changed/added/removed files that exist on the first
        # subvolume.  This makes sure the first subvolume's device id is in
        # exdb, which prevents files that are removed later from only 
        # being counted on the subvolume they are removed from (fix uniqueness)
        #
        # i sub the first sv files out so they aren't run twice
        for file in (set.union(*(file_dict.values()))-file_dict[sv_old]):
            check(sv_old+'/'+file, exdb)     
            i += 1

    # fill first sv        
    db_old = set()
    for file in file_dict[sv_old]:
        db_old.update(check(sv_old+'/'+file, exdb))
        i += 1
        
    # loop the rest
    for j in xrange(len(sv_glist)-1):
        gid_new, sv_new = sv_glist[j+1]

        db_new = set()        
        for file in file_dict[sv_new]:
            db_new.update(check(sv_new+'/'+file, exdb))

        extents_removed = db_old - db_new
        extents_added = db_new - db_old
        
        # delta size between the svs
        dsz = sum( exdb[addr][0] 
                    for addr in set.union(extents_removed, extents_added) )

    
        i += len(file_dict[sv_new])
            
        sv_delta[sv_new] = dsz

        # rotate
        db_old = db_new
            
    
    print 'Calculating sizes...'
    uniq = defaultdict(int)
    
    # go through and find extents that are only pointed to by one device, 
    # sum up the sizes for each device
    for ex in exdb:
        if len(exdb[ex]) == 2:
            dev = exdb[ex][1]
            uniq[dev] += exdb[ex][0]

    print 'Checked {0} items over {1} devices in {2}s.'\
                    .format(i, len(sv_dict), time.time()-t)
                    
    # print out in order of generation id, since thats how deltas are computed
    keys = reversed([key for g,sv in sv_glist 
                                for key,val in sv_dict.items() if val == sv ])
                                
    print 'GenID   DeviceID  Delta     Unique    Subvol'
    for dev in keys:
        gen = (g for g,sv in sv_glist if sv == sv_dict[dev]).next()
        sv = sv_dict[dev]
        dsz, usz = _(sv_delta[sv_dict[dev]]), _(uniq[dev])
        print '{0:>6}  {1:>4}      {2:<8}  {3:<8}  {4}'\
                        .format(gen, dev, dsz, usz, sv)   
    #    print 'Device {0} (gen {2}): {1}'.format(dev, sv_dict[dev], gen)
    #    print ' delta size: {0} ({1} unique)'.format(_(sv_delta[sv_dict[dev]]), _(uniq[dev]))
    #    print ' gen: {0}'.format((key for key,val in sv_gdict.items() if val == sv_dict[dev]).next())
    
if __name__ == '__main__':
    # root is the btrfs fs mountpoint
    # path is the full path to the subvolume
    path = root = '/mnt/fub'
    if len(sys.argv) == 2: # <path>
        path = sys.argv[1]
    elif len(sys.argv) == 3: # <root> <path>
        root, path = sys.argv[1:3]
    main(root, path)

UPDATE:
As was pointed out in the comments, the genID returned from find-new was higher than the highest genID on any of the files, which was causing files to be skipped. I created a new get_genid function that uses ‘find-new 1’ to get all the files and grabs the highest genID in that list. This is working, but I don’t know if it will take longer on a large filesystem.

I also noticed the way I was calculating the delta was not quite correct. I was only considering the difference between snapshots in files that changed in a new snapshot. This had a few flaws:

  • Files that were simply removed were not taken into account.
  • Files that changed on a snapshot but remained the same on the following snapshot could have their extents counted as ‘removed’.
  • Files that were the same on many snapshots then changed might have their extents considered unique to that last snapshot (no previous ones were counted)

I have updated the code so that it figures out which files are changed through all snapshots before scanning them. It then scans all the files present on the first subvolume. This makes sure there are at least two devices pointing at files that don’t change until later or are removed later (fixes uniqueness). I don’t have a full btrfs setup at this time, so I can’t tell the speed impact these changes make.

I also put a flag that conditionally runs a full scan of all files for the first subvolume. This was just for testing on my small test fs, and shouldn’t be used on a full fs.

FIEMAP ioctl from Python

April 22, 2011

The other day I was trying to figure out how to get some extent information about a file from python (so I could get some info on my btrfs fs). I checked pyfragtools’ source, but it cheats and calls ‘filefrag’ using the subprocess module, and I wanted to get the info directly.

Well the filefrag utility is also open source, so after skimming through the source I knew I needed to use ioctl calls.

The following code was thrown together from an example found on a mailing list, filefrag.c, and fiemap.h:

EDIT: So apparently python’s ioctl has a maximum length of 1024 bytes for its arg parameter if it’s an immutable type (like a string). For a file with more than 17 extents, this isn’t enough. To overcome this, we must use a mutable type (array.array).

#!/usr/bin/python
from contextlib import contextmanager
import struct
import fcntl
import sys
import os
import array

# context friendly os.open (normal open() doesn't work on dirs)
@contextmanager
def osopen(file, mode=os.O_RDONLY):
    try:
        fd = os.open(file, mode)
        yield fd
    finally:
        os.close(fd)

def sizeof(x):
    return struct.calcsize(x)

IOCPARM_MASK = 0x7f
IOC_OUT = 0x40000000
IOC_IN = 0x80000000
IOC_INOUT = (IOC_IN|IOC_OUT)

# defines from LINUX_FIEMAP_H
#define FIEMAP_MAX_OFFSET   (~0ULL)
#define FIEMAP_FLAG_SYNC    0x00000001 /* sync file data before map */
#define FIEMAP_FLAG_XATTR   0x00000002 /* map extended attribute tree */
#define FIEMAP_FLAGS_COMPAT (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR)
FIEMAP_EXTENT_LAST          = 0x00000001 # Last extent in file. */
FIEMAP_EXTENT_UNKNOWN       = 0x00000002 # Data location unknown. */
FIEMAP_EXTENT_DELALLOC      = 0x00000004 # Location still pending.
#                            * Sets EXTENT_UNKNOWN. */
FIEMAP_EXTENT_ENCODED       = 0x00000008 # Data can not be read
#                            * while fs is unmounted */
FIEMAP_EXTENT_DATA_ENCRYPTED = 0x00000080 # Data is encrypted by fs.
#                            * Sets EXTENT_NO_BYPASS. */
FIEMAP_EXTENT_NOT_ALIGNED   = 0x00000100 # Extent offsets may not be
#                            * block aligned. */
FIEMAP_EXTENT_DATA_INLINE   = 0x00000200 # Data mixed with metadata.
#                            * Sets EXTENT_NOT_ALIGNED.*/
FIEMAP_EXTENT_DATA_TAIL     = 0x00000400 # Multiple files in block.
#                            * Sets EXTENT_NOT_ALIGNED.*/
FIEMAP_EXTENT_UNWRITTEN     = 0x00000800 # Space allocated, but
#                            * no data (i.e. zero). */
FIEMAP_EXTENT_MERGED        = 0x00001000 # File does not natively
#                            * support extents. Result
#                            * merged for efficiency. */
#define FIEMAP_EXTENT_SHARED        0x00002000 /* Space shared with other
#                            * files. */

_flags = {}
_flags[FIEMAP_EXTENT_UNKNOWN] = "unknown"
_flags[FIEMAP_EXTENT_DELALLOC] = "delalloc"
_flags[FIEMAP_EXTENT_DATA_ENCRYPTED] = "encrypted"
_flags[FIEMAP_EXTENT_NOT_ALIGNED] = "not_aligned"
_flags[FIEMAP_EXTENT_DATA_INLINE] = "inline"
_flags[FIEMAP_EXTENT_DATA_TAIL] = "tail_packed"
_flags[FIEMAP_EXTENT_UNWRITTEN] = "unwritten"
_flags[FIEMAP_EXTENT_MERGED] = "merged"
_flags[FIEMAP_EXTENT_LAST] = "eof"

def _IOWR(x, y, t):
    return (IOC_INOUT|((sizeof(t)&IOCPARM_MASK)<<16)|((x)<<8)|y)

struct_fiemap = '=QQLLLL'
struct_fiemap_extent = '=QQQQQLLLL'

sf = sizeof(struct_fiemap)
sfe = sizeof(struct_fiemap_extent)

FS_IOC_FIEMAP = _IOWR (ord('f'), 11, struct_fiemap)

# shift is for reporting in blocks instead of bytes
shift = 0   # 12 would report in 4KiB blocks

def parse_fiemap_extents(string, num):
    '''return dict of fiemap_extents struct values'''
    ex = []
    for e in range(num):
        i = e*sfe 
        x = [x >> shift for x in struct.unpack(struct_fiemap_extent, string[i:i+sfe])]
        flags = ' '.join(_flags[z] for z in _flags.keys() if (x[5]&z>0))
        ex.append({'logical':x[0],'physical':x[1],'length':x[2],'flags':flags})
    return ex

def parse_fiemap(string):
    '''return dict of fiemap struct values'''
    # split fiemap struct
    res = struct.unpack(struct_fiemap, string[:sf])
    return {'start':res[0], 'length':res[1], 'flags':res[2], 'mapped_extents':res[3],
            'extent_count':res[4], 'extents':parse_fiemap_extents(string[sf:], res[4])}

def fiemap_ioctl(fd, num_ext=0):
    # build fiemap struct
    buf = struct.pack(struct_fiemap , 0, 0xffffffffffffffff, 0, 0, num_ext, 0)
    # add room for fiemap_extent struct array
    buf += '\0'*num_ext*sfe
    # use a mutable buffer to get around ioctl size limit
    buf = array.array('c', buf)
    # ioctl call
    ret = fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)
    return buf.tostring()

def fiemap(file=None, fd=None, get_extents=True):
    if fd is None and file is None:
        raise TypeError('must provide either a filename or file descriptor')

    def _do(fd):
        # first call to get number of extents
        res = fiemap_ioctl(fd)
        # second call to get extent info
        if get_extents:
            res = fiemap_ioctl(fd, parse_fiemap(res)['mapped_extents'])
        return parse_fiemap(res), res

    if fd is None:
        with osopen(file) as fd:
            res = _do(fd)
    else:
        res = _do(fd)

    return res

if __name__ == '__main__':
    import json
    file = len(sys.argv) == 2 and sys.argv[1] or '.'
    print json.dumps(fiemap(file)[0], indent=2)

The fiemap function returns a dict of the fiemap struct values, including a list of dicts containing the extents values.
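
As a quick usage sketch (assuming the script above is saved as fiemap.py, the module name the btrfs script above imports):

import fiemap

info = fiemap.fiemap('/etc/hostname')[0]
for ex in info['extents']:
    print 'extent at {0}, {1} bytes ({2})'.format(ex['physical'], ex['length'], ex['flags'])
print '{0} mapped extents'.format(info['mapped_extents'])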

Changing adapter MTU in Python

April 22, 2011

I’m writing a little p2p VPN app in python. A little while ago I noticed that, over the internet, the connection was stalling; it turned out that the packets I was sending were too big. Setting a lower MTU fixed the problem. As a quick fix, I need to make my app set the MTU automatically.

I could do this by calling ifconfig or ip with the subprocess module, but I wanted to do it w/out using an external program. After a few hours of toiling and googling, I was able to get it to work:

from fcntl import ioctl
import socket
import struct

SIOCGIFMTU = 0x8921
SIOCSIFMTU = 0x8922

# note: AF_PACKET raw sockets require root to create
s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
ioctl(s, SIOCGIFMTU, struct.pack('<16sH', 'eth0', 0))

mtu = 1280
ioctl(s, SIOCSIFMTU, struct.pack('<16sH', 'eth0', mtu)+'\x00'*14)

EDIT: The code I now use to change the MTU looks more like this (same definitions as above, but with an ordinary SOCK_DGRAM socket instead of a raw one):

def get_mtu(self):
	'''Use socket ioctl call to get MTU size'''
	s = socket.socket(type=socket.SOCK_DGRAM)
	ifr = self.ifname + '\x00'*(32-len(self.ifname))
	try:
		ifs = ioctl(s, SIOCGIFMTU, ifr)
		mtu = struct.unpack('<H',ifs[16:18])[0]
	except Exception, s:
		logger.critical('socket ioctl call failed: {0}'.format(s))
		raise

	logger.debug('get_mtu: mtu of {0} = {1}'.format(self.ifname, mtu))
	self.mtu = mtu
	return mtu

def set_mtu(self, mtu):
	'''Use socket ioctl call to set MTU size'''
	s = socket.socket(type=socket.SOCK_DGRAM)
	ifr = struct.pack('<16sH', self.ifname, mtu) + '\x00'*14
	try:
		ifs = ioctl(s, SIOCSIFMTU, ifr)
		self.mtu = struct.unpack('<H',ifs[16:18])[0]
	except Exception, s:
		logger.critical('socket ioctl call failed: {0}'.format(s))
		raise

	logger.debug('set_mtu: mtu of {0} = {1}'.format(self.ifname, self.mtu))

	return self.mtu
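
For completeness, a standalone version of the same get ioctl (a hypothetical helper, not from the original app):

import socket
import struct
from fcntl import ioctl

SIOCGIFMTU = 0x8921

def get_mtu(ifname):
    '''read an interface's MTU without needing a raw socket'''
    s = socket.socket(type=socket.SOCK_DGRAM)
    ifr = ifname + '\x00' * (32 - len(ifname))
    ifs = ioctl(s, SIOCGIFMTU, ifr)
    return struct.unpack('<H', ifs[16:18])[0]

print get_mtu('eth0')   # e.g. 1500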