darkness

Saturday, 14 January 2006

I’m dumb (unit testing network code in Python)

darkness @ 02:35:07

Now, before you have a chance to jump in and say you’ve always suspected it, but you’ve been afraid to bring it up, and how glad you are that it was me not you that brought it up, let me tell you why I’m dumb.

Part of a Python application I’m writing needs to be able to sniff packets. Furthermore, I’m doing it directly with Linux’s PF_PACKET instead of libpcap (for reasons irrelevant here; if I remember I’ll cover this at the bottom). To unit test the sniffer component, I attempt to pass some UDP datagrams on the loopback interface and see if the sniffer picks them up.

So in my first stab at doing this, I went ahead and did it with threads. It took me a reasonably long time, even though I had a handle on the way (or at least a way) to do it. The result: testing a sniffer with threads. (P.S.: easy syntax highlighting with enscript: enscript --color -Epython -Whtml threadutils.py -o - | tidy -asxhtml -clean -o threads.xhtml. Supposedly other languages can be done this way too; for example, Perl.) I abstracted out the CooperativeThread class because I’ve created that class several times now, and I don’t think doing so added code; quite the opposite, since I was able to use it for both the TrafficGenerator and SnifferThread classes.

But at around 180 lines (not counting comments and imports) to test a module that’s around 40 lines, it seemed way too big. I had been looking at Twisted earlier, at its reactors and its event-driven, generally single-threaded model. It wasn’t the first time I had been exposed to the “why do multi-threading, just use select(2)” kind of philosophy, but I’d always ignored it because I was sure threads were easier.

Finally, my point: I am dumb. No way in hell is it easier to use threads. Proof: the version that uses select(2) takes roughly a third as many lines of code (around 65 lines versus around 180), was (in my estimation) faster to program, and runs about eight times as fast. No doubt it’ll be easier to debug, too.

The code in the single-threaded version is a little uglier, I think; I haven’t found a nice way to factor that select loop out. Still, I think it’s readable to anyone who’s familiar with sockets and select. I could probably use Twisted, reduce lines of code, and make it easier to understand… to anyone who is familiar with Twisted, which probably isn’t as big a pool of people as those who can read code that just uses select directly. Plus, I’m not terribly fond of the idea of requiring the Twisted framework just for testing. If I end up using it elsewhere, then maybe I’ll go back and use it in this test.
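For anyone who hasn't seen the pattern, here is a minimal sketch of what a single-threaded, select-driven loopback test can look like. It is not the test code discussed above: to keep it runnable without root, an ordinary UDP socket stands in for the PF_PACKET sniffer, and run_loopback_test and its parameters are names made up for this illustration.

```python
import select
import socket
import time


def run_loopback_test(payloads, timeout=2.0):
    """Send each payload over loopback UDP and collect what arrives.

    One thread, one select() loop: the same loop both generates the
    traffic and reads it back. (Hypothetical helper; a plain UDP socket
    stands in for the sniffer here.)
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    receiver.setblocking(False)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.setblocking(False)

    dest = receiver.getsockname()
    pending = list(payloads)          # datagrams still to be sent
    seen = []                         # datagrams picked up so far
    deadline = time.monotonic() + timeout
    try:
        while len(seen) < len(payloads):
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                 # give up; the caller sees a short list
            # Only ask about writability while there is something to send.
            writers = [sender] if pending else []
            readable, writable, _ = select.select(
                [receiver], writers, [], remaining)
            if sender in writable:
                sender.sendto(pending.pop(0), dest)
            if receiver in readable:
                data, _addr = receiver.recvfrom(65535)
                seen.append(data)
    finally:
        sender.close()
        receiver.close()
    return seen
```

A test then only has to compare what went in with what came out, e.g. by calling run_loopback_test([b"one", b"two"]) and checking the returned list. With a real sniffer, the receiver would be the sniffer's socket and the comparison would look at the UDP payload inside each captured packet.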


A note on libpcap: it provides no way to determine whether a packet is being sent or received. There are platform- or interface-specific ways to tell, I suppose. On Ethernet, if the source MAC is your own, then I’d say it’s probably being sent (though I’m sure there is some case, maybe involving… bridging? that I’m not thinking of). On Linux, a few different interface types such as PPP use “cooked” mode, where the packet libpcap gives you is preceded by a faux header that includes information about whether a packet is being sent or received. In cooked mode, you get this faux header instead of the link header. (According to code comments, PPP doesn’t always pass a link header, or a correct link header, to the PF_PACKET socket.)

Now, how does it get this information for PPP? Why, the Linux PF_PACKET interface (the method by which libpcap reads packets on Linux) includes some result information that gives the “direction” of the packet (inbound or outbound). However, libpcap only returns this information when it is using cooked PF_PACKET sockets, so you only get this information for protocols like PPP (and only on Linux; other platforms might have a similar quirk I suppose). So if you want the inbound/outbound information for every interface, you basically need to skip use of libpcap and just use the (apparently) Linux-specific interface directly. The good news is, it’s not hard, and it’s even supported in Python.
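To make that concrete, here is a rough sketch of reading the direction from a PF_PACKET (AF_PACKET in Python) socket. It assumes Linux and root privileges for the sniffing part; the direction and sniff helpers are names invented for this example, and the numeric fallbacks are the packet-type values from linux/if_packet.h, used only if the socket module doesn't export them.

```python
import socket

# Packet types reported by the kernel for each received frame.
# Fallback numbers are from <linux/if_packet.h>.
PACKET_HOST = getattr(socket, "PACKET_HOST", 0)          # addressed to us
PACKET_OUTGOING = getattr(socket, "PACKET_OUTGOING", 4)  # sent by this host
ETH_P_ALL = 0x0003                                       # every protocol


def direction(pkttype):
    """Map a PF_PACKET packet type to 'outbound' or 'inbound'."""
    return "outbound" if pkttype == PACKET_OUTGOING else "inbound"


def sniff(ifname, count):
    """Yield (direction, frame) for `count` frames seen on `ifname`.

    Hypothetical helper. Needs Linux and root (CAP_NET_RAW).
    """
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_ALL))
    try:
        # Protocol 0 in bind() keeps the protocol given at creation.
        sock.bind((ifname, 0))
        for _ in range(count):
            # The address is (ifname, proto, pkttype, hatype, hwaddr);
            # pkttype carries the direction information.
            frame, (_name, _proto, pkttype, _hatype, _hwaddr) = \
                sock.recvfrom(65535)
            yield direction(pkttype), frame
    finally:
        sock.close()
```

As root, something like `for way, frame in sniff("lo", 4): print(way, len(frame))` would print the direction and size of the next four frames on the loopback interface.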
