March 23, 2006

CherryPy, multi-threading, SQLite

Quick tip: I’m using SQLite, at least for development, on a Python web application. I wanted to use a CHECK constraint, which is only supported in SQLite 3.3 or newer. My box is CentOS 4, which has SQLite 3.2.x.

Now, FC5 has an SQLite 3.3.3 (I think) RPM. I rebuilt this and installed it. Now my server won’t serve a page that queries the database; it just locks up. I find that pysqlite (RH/FC package python-sqlite) says you really need 1.1.7 to use SQLite 3.3.x. So I try to rebuild the python-sqlite package with pysqlite 1.1.7 (there’s a newer major version available, but I’m not nearly that daring). When the package build gets to the tests, it locks up too.

Then I read that there are threading problems with SQLite 3.3 that are fixed in SQLite 3.3.4. So I take the FC5 SQLite 3.3.3 SRPM and rebuild it with SQLite 3.3.4 without complications. Now I try the python-sqlite 1.1.7 rebuild again. Same lock up.

Finally I see that, although there is a bug report and a line in the %changelog section saying that --enable-threadsafe is now passed to configure, SQLite is still being built without thread safety (i.e., with -DTHREADSAFE=0); --enable-threadsafe only appears in the RPM change log, not in the actual build. I put --enable-threadsafe after %configure, rebuild and install the SQLite 3.3.4 RPM, then rebuild python-sqlite 1.1.7, and it works swimmingly.

Moral of the story: to do anything multi-threaded with SQLite,

  • Use SQLite 3.3.4 or newer
  • Use pysqlite 1.1.7 or newer (I think there may have been a release of the 2.0.x and 2.1.x branches around the same time, presumably to fix similar problems with SQLite 3.3.x)
  • Make sure SQLite is actually being built with --enable-threadsafe (see the spec-file sketch below)
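
For reference, the fix amounts to adding the flag in the spec file’s %build section, something like this (a sketch; everything other than --enable-threadsafe is assumed to match the original FC5 spec):

# Sketch of the %build section; only --enable-threadsafe is the point,
# the rest is assumed to match the FC5 sqlite spec.
%build
%configure --enable-threadsafe
make %{?_smp_mflags}
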
March 21, 2006

More on CherryPy, mod_python, and Python web development

Over the past three or so days I’ve been doing a lot more reading and experimenting with Python web development. I have to say, I’m not very happy overall. I’m used to developing in Java via Apache, mod_perl via Apache, and I’m semi-familiar with PHP via Apache. Python web development just doesn’t seem to work that well through Apache. CherryPy’s recommended deployment is having Apache proxy to CherryPy’s web server. Off the top of my head, I think only web.py and maybe Django actually seem to advertise and officially support running from mod_python. If anyone else supports running from Apache, it seems to be via FastCGI.

The major issue for me has been reloading of files. Remember the problem I previously mentioned where it seemed like some of my Apache/mod_python children were not reloading changed Python files? That’s far from an isolated incident. In fact, I can reproduce it pretty reliably: a single child each time fails to reload. Reloading seemed to work pretty well in other “web technologies” that I’ve used: I seem to recall that Servlets may end up reloading the entire context, but maybe just changed JARs; PHP reloads files by default AFAIK; I think mod_perl also reloads files. All of these reload properly, and all of them run through Apache!

Turns out that mod_python’s reloading is really a big tease: it only reloads modules that are imported with its special (and supposedly not too functional) importing API. There may be solutions to this. web.py takes a straightforward approach: it just searches sys.modules for files with a newer mtime and uses the reload() built-in to reload them. Of course, this might not work so well if you’ve got references to instances from the old module hanging around, since it has no way of updating those instances. I think this method could be easily broken in CherryPy. Say this is your “top level” CherryPy file:

import appclasses
import cherrypy

cherrypy.root = appclasses.Root()

Now change appclasses.Root. How will the object in cherrypy.root get updated? Every module that uses appclasses probably needs to be reloaded too, and then the parents of those modules, and so on. At the very least, any files that update the CherryPy publishing tree may need to be reloaded. I’m not even sure if calling reload() on the above module would actually re-execute the cherrypy.root assignment. (I do concede that web.py’s method for declaring handlers may moot this point, but I don’t want to use web.py handlers; I want to use CherryPy’s method for dispatching.)
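
For concreteness, here is roughly what the mtime-scanning approach looks like (my own sketch, not web.py’s actual code; Python 2.3/2.4 era, hence the reload() built-in):

import os
import sys
import types

# Last-seen mtime for each module's source file.
_mtimes = {}

def reload_changed_modules():
    # Scan sys.modules for modules whose source files have changed and
    # reload() them in place.  Instances created from the old modules
    # keep pointing at the old classes -- exactly the problem described
    # above.
    for name, module in sys.modules.items():
        if not isinstance(module, types.ModuleType):
            continue
        path = getattr(module, "__file__", None)
        if not path:
            continue
        if path.endswith(".pyc") or path.endswith(".pyo"):
            path = path[:-1]
        try:
            mtime = os.path.getmtime(path)
        except OSError:
            continue
        if name in _mtimes and mtime > _mtimes[name]:
            reload(module)
        _mtimes[name] = mtime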

In Java, I think this problem is somewhat solved by reloading the whole context (i.e., application). This mostly works except for some situations where I seem to recall managing to leave old instances around. Then trying to use those old instances causes some bizarre ClassLoader-related exception, if I recall correctly, followed by a server restart to fix them. Still, I seem to recall that such restarts were few and far between. I suspect PHP and Perl have similar problems (I’m thinking you could store a reference to an object in PHP or mod_perl sessions).

In Common Lisp I think the object in cherrypy.root would get updated when the Root class was updated. CLOS is just awesome like that.

There is some kind of mod_python “framework” called Vampire that supposedly greatly improves upon the reloading mechanisms in mod_python. For example, it may reload parent modules, potentially fixing the above CherryPy example. It’s not clear to me whether you could somehow import Vampire and then ignore the rest of its functionality in favour of using CherryPy. Frankly, I kind of stopped caring. I can see why it might be nice to just restart the whole application, and everyone else seems to be developing this way.

(Actually, if you keep in-memory state information, restarting the whole application could be a pain in the ass. One could say, “well, you shouldn’t do that.” That’s probably reasonable: unless you’re running in a threaded model, I don’t see how different mod_python or FastCGI instances are going to be able to see each other’s data.)

Regardless, the method CherryPy—and many others!—use to do “reloading” is basically exec(myself), and that comes across to me as gross. (It looks like there may be some practical problems with this method as well.)

All these Apache/reloading complications almost made me abandon Python for something else. Except I can’t find a good way to have Emacs indent PHP; with Java I’ve been there and suffered that; and Ruby doesn’t seem any better. (If someone has a good way to indent PHP relative to the surrounding HTML in Emacs, please let me know! I’d probably prefer PHP for the relatively simple app I’m working on.)

I get the impression that Ruby on Rails uses this same method. Ruby on Rails looks approximately as hostile to Apache as Python, too. One seemingly well researched piece on deploying Rails applications said that you should just use lighttpd and its built-in FastCGI support.


Some notes on how I intend to develop: I’ll be running CherryPy with its built-in web server for development, and I’ll probably do Apache/mod_python in production. Note that mod_python reloads just fine when you service httpd reload (RH/FC command; equivalent to apachectl graceful or something like that).

You could do FastCGI, but it sounds kind of problematic. One Rails user reported a lot of zombie processes, and said such reports were not uncommon. I have an RPM made for mod_fastcgi. I also found mod_fcgid, which may well be better. However, I can’t really think of any advantage to using FastCGI over mod_python, since reloading basically sucks equally in both. (Note that you can’t use CherryPy’s built-in reloading under either mod_python or FastCGI.) There’s also something called SCGI out there; no idea about that.

Let me add that this is the newest version of modpython_gateway; you no longer need wsgiref. I think the best documentation for using modpython_gateway is in the comments of the source file itself. Also note that you can no longer use PythonOption import, and the PythonImport directive seems very hard to use (I couldn’t get it to do anything). To get your application loaded into CherryPy, the best way seems to be to import CherryPy’s wsgiApp into one of your own source files, then specify PythonOption wsgi.application yourModule::wsgiApp. That way yourModule gets loaded (and gets an opportunity to mount things in CherryPy). A sketch follows below.
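
Concretely, yourModule might look something like this (a sketch modeled on the cphw.py example from the March 17 entry below; the module and class names are placeholders):

import cherrypy

# Re-export CherryPy's WSGI entry point so modpython_gateway can find
# it as yourModule::wsgiApp.
from cherrypy._cpwsgi import wsgiApp


class Root (object):
    @cherrypy.expose
    def index(self):
        return "Hello from modpython_gateway!"


# Importing this module mounts the tree and initializes CherryPy
# without starting its built-in HTTP server.
cherrypy.root = Root()
cherrypy.config.update({"server.environment": "production"})
cherrypy.server.start(initOnly=True, serverClass=None)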

I am a little concerned about CherryPy’s sessions in mod_python: I’m not sure if they get shared across Apache children (using mpm_prefork, not mpm_worker). I saw one person talking about deploying CherryPy on mod_python, and they said they needed to use file-based sessions. I believe CherryPy also supports sessions in PostgreSQL.

March 17, 2006

CherryPy on CentOS 4

Note: go read a follow-up entry after you’re done reading this. It has some important updates.

Yesterday I was talking with a friend about the relative advantages of PHP versus other web “toolkits” (“frameworks”?), especially other web toolkits in Python. We agreed that one of the advantages of PHP is that it’s easily installed; it “just works.” He asked if the same was true for Zope, and my answer was basically “no fucking way.” Granted, the last Zope I used was 2.5 or 2.6 which is pretty old by now. But it took me a while to get it installed (I think someone had RPMs which helped) and then get it set up proxying through Apache; in fact, getting it going through Apache was the real hard part. I seem to recall having to create a “cookie monster” object, and some sort of “zombie root” or something like that? Then lots of bugs. (Then find out it didn’t work with the latest version of Python, which I really wanted to use.)

What about other web software, though? Such as CherryPy? How easily is it set up? I liked the look of some of the CherryPy examples so I decided to find out. What follows is a rough set of steps that I took to install CherryPy on CentOS 4 (and thus it should hopefully work for RHEL 4 too).

Note that I’ve chosen to set up CherryPy using WSGI and mod_python. I’ll talk more about this decision later.

  1. CentOS 4 only has Python 2.3. Lament, but not for long: ATrpms has Python 2.4 RPMs. Install them.

  2. Update expat. I used the version from Fedora “development,” which I think is probably what’s in FC5: expat-1.95.8-8.2. If you don’t update expat you’ll get httpd crashing with messages like the following (from /var/log/httpd/error_log):

    httpd: Objects/stringobject.c:105: PyString_FromString: Assertion `str != ((void *)0)' failed.
    
  3. Rebuild the CentOS mod_python RPM from source. You’ll need to modify the spec file to supply --with-python=/usr/bin/python2.4 to the configure script.
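
    In the spec’s build section, that amounts to something like the following (a sketch; --with-python is the point here, and the apxs path is my assumption about what the CentOS spec already passes):

    ./configure --with-apxs=/usr/sbin/apxs --with-python=/usr/bin/python2.4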

  4. Download CherryPy (2.2.0rc1) and try python2.4 setup.py bdist_rpm. Lament, because it does not work correctly. Make a MANIFEST.in in the CherryPy top-level source directory that looks like:

    include cherrypy/favicon.ico
    include cherrypy/tutorial/*
    include cherrypy/test/*
    include cherrypy/test/static/*
    

    Then make a setup.cfg:

    [bdist_rpm]
    install_script = bdist_rpm_install
    

    Finally, you need to make bdist_rpm_install:

    python2.4 setup.py install --root=$RPM_BUILD_ROOT --record=INSTALLED_FILES
    sed -i -e '/ /s/\(.*\)/"\1"/' INSTALLED_FILES
    

    Note that the sed command works around a problem RPM apparently has with a filename that contains a space (it thinks you’re trying to supply it with two separate file names) by wrapping any such line in double quotes. I suppose the MANIFEST.in might be the only thing I can reasonably expect CherryPy to include in their distribution, but I (and, I’m sure, many others) would appreciate having the above bdist_rpm magic shipped too. I intend to submit a patch back to the developers.

    (One problem: notice that I had to hard-code python2.4 in the bdist_rpm_install script. That should really be a variable or parameter of some sort, but Python’s distutils has no feature for doing this.)

    Now CherryPy will build an RPM with something like: python2.4 setup.py bdist_rpm --python=/usr/bin/python2.4. I like to do things like --release 1.py24 also, so I remember that it’s a package compiled for Python 2.4 rather than Python 2.3 (which is still installed, mind you). Install the resulting RPM.

  5. Check out svn://svn.eby-sarna.com/svnroot/wsgiref/ (that’s Subversion; try svn co <above url>). Copy/paste the code for modpython_gateway.py into src/wsgiref (found underneath the wsgiref directory you just checked out). bdist_rpm this package and install the RPM.

  6. Set up your site’s directories. I’ve got /srv/www/www.some.site/root set up for the DocumentRoot and /srv/www/www.some.site/lib/python to keep the Python code in.

  7. Time to configure Apache:

    <VirtualHost www.some.site:80>
        ServerName www.some.site
        DocumentRoot /srv/www/www.some.site/root

        <Directory /srv/www/www.some.site/root/subdir>
            Options +Indexes

            SetHandler python-program
            PythonHandler wsgiref.modpython_gateway::handler
            PythonOption application cherrypy._cpwsgi::wsgiApp
            PythonPath "sys.path + ['/srv/www/www.some.site/lib/python']"
            PythonOption import cphw

            # Serve up static files without going through CherryPy.
            <FilesMatch "\.(css|gif|jpe?g|png)$">
                SetHandler None
            </FilesMatch>
        </Directory>
    </VirtualHost>

    That’s from my /etc/httpd/conf/hosts.d/conf.www.some.site file. Make sure the above stuff gets into the Apache configuration and restart Apache.

  8. Now make /srv/www/www.some.site/lib/python/cphw.py (cphw == CherryPy hello world) with contents as follows:

    import cherrypy
    
    
    class HelloWorld (object):
        @cherrypy.expose
        def index(self):
            return "Hello world!"
    
    
    class Empty (object): pass
    
    
    cherrypy.root = Empty()
    cherrypy.root.subdir = HelloWorld()
    cherrypy.config.update({"server.environment": "production",
                            "server.protocolVersion": "HTTP/1.1"})
    cherrypy.server.start(initOnly=True, serverClass=None)
    

    Note this is mostly taken from the documentation on using CherryPy with WSGI and mod_python. In fact, most of these steps are based on the information there.

If you did everything correctly you should now be able to hit http://www.some.site/subdir/ and get “Hello world!” back.

A few things to add. First, you can eliminate the need for wsgiref, supposedly, by using mpcp. I didn’t try this.

Notice I took the ::startapp off of the end of the PythonOption import directive used in my Apache configuration; compare to the Apache configuration in the ModPythonWSGI instructions to see what I’m talking about. If you leave startapp on there, wsgiref will expect your module (cphw in this case) to have a startapp callable that initializes the module. This actually gets called for every request, though the comments indicate you probably only want startapp to be called once, so you’ll need to set some sort of global flag to indicate it’s already been called (something like the sketch below).
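
A run-once startapp might look like this (a sketch only; I haven’t verified what arguments, if any, modpython_gateway passes to it, so the empty signature is an assumption):

_started = False

def startapp():
    # Called on every request when ::startapp is left in the Apache
    # configuration; the module-level flag makes the real work happen
    # only once.
    global _started
    if _started:
        return
    _started = True
    import cphw  # cphw.py (step 8 above) mounts the application and
                 # starts CherryPy with initOnly=True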

Finally, I’m not sure this was really the way to go. I originally thought, “yeah, mod_python, that’ll improve performance since it won’t have to run Python, load all the modules, etc. for every request.” This was an alternative to running CherryPy the way that seems to be recommended: as a standalone web server. Now, sure, you might not want CherryPy’s HTTP server facing the rest of the Internet. However, they also seem to recommend that you run it through Apache using mod_rewrite/mod_proxy. (This way you also get stuff like HTTP/1.1 and SSL.) Now, theoretically there is a performance cost for a setup like this, where Apache forwards the request to CherryPy’s HTTP server. However, the cost sounds pretty small (one person quoted 0.9ms in a simplistic benchmark they did). Further, I’ve already seen weird behavior from mod_python: I modified cphw.py and only three of the four children actually seemed to reload it. I believe it fixed itself after a good night’s rest, but it’s not comforting to run into such a problem after a fairly short period of use.

I think my recommendation for setting up CherryPy is probably the same as the CherryPy developers/masters’, then: just proxy it through Apache. CherryPy is running the whole time, mind you, so you’re still not suffering a Python startup for every request. Still, I don’t know how it might handle lots of simultaneous requests (given threading and the GIL, all that). But, then, you’re not getting multiple simultaneous requests, are you? And you’re probably never going to.

There are some complexities involved in setting up CherryPy behind Apache. Read Running CherryPy behind Apache through mod_rewrite for all the gory details.

Nonetheless I’m going to try running CherryPy in mod_python for a while and see how it goes. This whole process took about 30 minutes, not counting installing Python 2.4 or upgrading expat, since I didn’t think it was fair to blame those on CherryPy. Still, that was 30 minutes of reading documentation, with a third of it spent fixing the bdist_rpm problem. Not too bad, but it’s sure no yum -y install php.

Next, let’s see how long it takes me to get Kid set up and running.

March 8, 2006

Rescuing hardware RAID (again)

I was asked to perform hardware RAID heroics again tonight. In fact, I’ve been on site about eight hours now (it’s 0630) and I’m still working at it, but I’ve basically got the problem licked.

I put a broken SCSI RAID 5 (six disks, one failed) back together. The general process is:

  1. Get twice as much storage as the size of the array you want to restore.
  2. Boot your friendly Knoppix disc, or other Linux distribution as you see fit. Make sure you can see the array you’re restoring (in JBOD, of course) and the additional storage you’re going to use for recovery.
  3. dd each disk from the broken RAID array to a file on the additional storage. Now you’ve got a backup to work from. You’d hate to screw up the original devices.
  4. Disconnect the original devices. You can work with the “images” you made of them. You’d hate to screw up the original devices.
  5. Pick apart the disks with something like BIEW. I couldn’t get mine to work with large files, so I had to, uh, “bind” the data files to loop devices. You’re looking for some data that spans a block boundary so you can tell what parity algorithm the array is using. Choices are: left symmetric, left asymmetric, right symmetric, right asymmetric. You’ll also need to discover the block size. Both arrays I’ve done this with have used a 64KiB block size. Hopefully you can make a good guess at the order the controller put the drives in; for my SCSI RAID this was in order of their IDs, just like I guessed.
  6. Now you’ve likely got a problem. Your logical RAID volume has a partition table and (duh) partitions. We’re using Linux software RAID (md) to stitch your disks back together, and md only semi-recently (kernel 2.6) gained the ability to have partitions on a software RAID device. So, you’re going to either need a version of mdadm that lets you use --build with --level=raid5 (which, to my knowledge, doesn’t exist), or else you’ll apparently need to patch raidtools. I don’t have a patch for you; just go into the source and comment out any section that bitches “not an MD device!” when major(stat_buf.st_rdev) != MD_MAJOR. Miraculously, once you stop mkraid from looking at the device major, it works just fine with partitioned MD devices.
  7. cat /proc/devices, look for the major number assigned to mdp. If you don’t find it, I gather your kernel lacks partitions-on-md-devices support. You might need to load some module; this was already done for me on Knoppix, and the number assigned was 254.
  8. mknod /dev/md0 b 254 0, substituting whatever number you got in the previous step for 254.
  9. Make the individual partition devices by counting up minors from there. I.e., md0p1 is mknod /dev/md0p1 b 254 1. I think you can go up to 63 partitions.
  10. Now you can make an /etc/raidtab (a sketch of mine follows after this list). Mine was pretty standard, and a lot like the one in the original awesome post of the guy that gave me this idea in the first place, except this particular night I had a failed-disk. (Don’t worry about what device to supply for the failed-disk; I gave it /dev/missing. mkraid bitches, but it still starts the array.)
  11. Now you can try something like mkraid --force --dangerous-no-resync /dev/md0. (Since I had a failed disk, I don’t think I actually needed --dangerous-no-resync.)
  12. Now take the other half of your additional storage (wondering why I said you need twice the size of the RAID array you’re restoring, weren’t you?), format it, mount md0p1 or whatever, and copy the data over. In my case it was a Windows server, so I made a FAT32 partition (mkdosfs -F 32 /dev/hdf2) and then copied all the data over to that.
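
As promised in step 10, here is roughly what the raidtab looked like for my six-disk array (a sketch from memory: the devices are the loop-bound disk images, their order and the parity algorithm have to match what you discovered in step 5, and which slot gets the failed-disk depends on which drive died):

raiddev /dev/md0
    raid-level              5
    nr-raid-disks           6
    nr-spare-disks          0
    persistent-superblock   0
    parity-algorithm        left-symmetric
    chunk-size              64

    # The disk images, bound to loop devices, in controller order.
    device /dev/loop0
    raid-disk 0
    device /dev/loop1
    raid-disk 1
    device /dev/loop2
    raid-disk 2
    device /dev/loop3
    raid-disk 3
    device /dev/loop4
    raid-disk 4
    # The dead drive; mkraid complains about /dev/missing but carries on.
    device /dev/missing
    failed-disk 5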

When I was done, I had an IDE drive I could jam in a Windows computer to read the data back on to the newly rebuilt server. Trickiest parts are reading the drives by hand to determine how they need to be reassembled, and then making raidtools work with partitioned md devices.

As far as “reading the drives by hand,” that could be automated if I ever took the time to learn more about NTFS. Reading it tonight, and working with a little Python I wrote, I noticed things about it, like two bytes that seem to hang out around the end of every 4KiB (I think) block, and the fact that the second byte before the start of a long file name seems to indicate that file name’s length. Still, since there are only four parity types, and probably only about 5-10 block sizes I can even remotely call “sane,” someone with a little knowledge of the filesystem could automate the process of finding out the parameters without a tremendous amount of effort.

March 3, 2006

Selectively firewalling OpenVPN users

This whole entry is very nearly a brain dump. You’ve been warned.

First, some notes on the use of learn-address scripts.

Something to understand about the “update” event with an OpenVPN learn-address script: the update event is only called when a previously learned source address is seen on a different connection than the recorded one. Thus if a user changes their MAC, you don’t see an update event with the same common name (CN) but a different MAC; for that you would get a new add event with the new MAC. But if a user connects and has the same MAC as another connection (note that it may or may not be the same user; maybe it’s a reconnection and OpenVPN hasn’t dropped the dead connection yet), you’ll get an update event with the MAC and the new CN. As I alluded to above, I think you’re most likely to get an update event in the case of a user stealing another user’s MAC, or in the case of a user reconnecting and their MAC being redirected to their new connection.

For my purposes, I can treat “update MAC CN” as “delete MAC” then “add MAC CN,” as in the sketch below.
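
In learn-address terms (the script gets the operation, the address, and, for add/update, the CN on its command line), the dispatch looks something like this (a sketch, not the real firewaller; the handler bodies are stubs):

import sys

def handle_add(addr, cn):
    pass  # create the per-client chain and the dispatch rule

def handle_delete(addr):
    pass  # remove the dispatch rule and the per-client chain

op = sys.argv[1]
if op == "add":
    handle_add(sys.argv[2], sys.argv[3])
elif op == "delete":
    handle_delete(sys.argv[2])
elif op == "update":
    # An update is just a delete of the old binding followed by an add
    # of the new one.
    handle_delete(sys.argv[2])
    handle_add(sys.argv[2], sys.argv[3])
sys.exit(0)  # non-zero would tell OpenVPN not to learn the address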


The use of learn-address scripts is important for my project, selectively firewalling OpenVPN users. To clarify what I’m attempting to do: some users need to have access only to certain hosts. I don’t predetermine their MAC address (I’m using bridged mode), though you could, by having the learn-address script exit non-zero if the CN doesn’t match the MAC you have listed in some table. I also don’t predetermine their IP address, though it would be a simple matter to determine their MAC and then drop any packets with a MAC/IP mismatch.

To accomplish this I wrote my own learn-address script in Python (2.3; sigh, no generator expressions). It reads a file with a series of mappings between a CN and another file name. This other file name contains a list of iptables rules that should be used for connections from this CN.

You are expected to give the script its own chain (referred to as the “dispatch chain”), which needs to be called from the FORWARD chain. The script then makes rules that dispatch from the dispatch chain to per-client chains it creates on demand and fills with the rules from the files that were mapped to.

Put another way, and in more detail:

  1. The learn-address script, firewaller, gets called to add a new address.
  2. It reads the mapping file, looking for the CN (passed on the command line; see the OpenVPN manual page). If it doesn’t find the CN, no further action is taken and the address is learned.
  3. Assuming the CN is found, firewaller creates a new chain (the name of this chain is the MAC address or, in routed mode, the learned IP address). This per-client chain is filled with rules from the file pointed to by the mapping in the previous step.
  4. firewaller then makes a rule in the chain you have set aside for the script. This rule matches on the client’s (MAC or IP) address and jumps to the per-client chain when it matches.

On delete, firewaller deletes the per-client chain and deletes the rule from the script’s dispatch chain. On update, firewaller acts like it received a “delete” event followed by an “add.”
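
In iptables terms, the add path for a bridged (MAC-keyed) client boils down to something like this (a sketch: the MAC is a made-up example, and the per-client rules are the test-policy ones shown later in this entry):

iptables -N 00:11:22:33:44:55
iptables -A 00:11:22:33:44:55 -d 10.0.0.5 -j RETURN
iptables -A 00:11:22:33:44:55 -j DROP
iptables -A openvpn-firewaller -m mac --mac-source 00:11:22:33:44:55 -j 00:11:22:33:44:55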

More exact details on usage follow. But first, here’s firewaller.py in case you want to look at it (and its possibly helpful comments) as we go along.

First I make the dispatch chain with something like:

iptables -N openvpn-firewaller
iptables -I FORWARD 1 -m physdev --physdev-in ovpntap0 -j openvpn-firewaller

Note that I only jump to the dispatch chain if the packet actually came in through the OpenVPN TAP device. You have to use the physdev match because ovpntap0 is bridged to the rest of the LAN. In fact, for reference, my usual setup is simple:

LAN---Linux box---Internet

So the Linux box bridges the OpenVPN TAP device with the Ethernet adapter facing the LAN. Also note that I usually make the first rule in my FORWARD chain something like -m state --state ESTABLISHED,RELATED -j ACCEPT. Doing a state lookup first thing might be expensive, but I’ve never had performance problems that I’ve been able to attribute to iptables, and it sure makes the rest of the rules simple to keep track of (allow the initial packet of a connection and don’t worry about the rest of it, including the return packets). However, I put the dispatch chain call before the ESTABLISHED,RELATED rule because I don’t want the user to somehow trick another host on the network into initiating a connection to the VPN user’s machine, and then have that connection allowed by the ESTABLISHED,RELATED rule.

Next, the map file:

George_Clooney__County General_ test-policy

Note the CN has to be modified as described in both firewaller and in the OpenVPN man page. (In this case the CN prior to modification might have been “George Clooney (County General)”.) A possible firewall rules file:

[root@gateway scripts]# cat firewaller-rules/test-policy
-d 10.0.0.5 -j RETURN
-j DROP

So Mr. Clooney only has access to 10.0.0.5. The RETURN is used to bounce back to the dispatch chain, after which it will presumably return from the dispatch chain and go on through the FORWARD chain.

That’s all there is to it. Kind of.

Since I’m bridged, there’s more to worry about. Thankfully, I’m only doing IP. iptables, of course, doesn’t get a whack at non-IP traffic. So we need to block anything that’s not IP with commands such as:

[root@gateway scripts]# ebtables -L --Lx
ebtables -t filter -N openvpn-ip-only
ebtables -t filter -A FORWARD -i ovpntap0 -j openvpn-ip-only
ebtables -t filter -A openvpn-ip-only -p IPv4 -j RETURN
ebtables -t filter -A openvpn-ip-only -p ARP -j RETURN
ebtables -t filter -A openvpn-ip-only -j DROP

(BTW: ebtables for CentOS 4 is at RPMforge.) Note that here I’m blocking everything except ARP and IP coming from the OpenVPN device. I’m doing this for every user, not just some users (i.e., not just the users that I restrict with firewaller). No problem: like I said, I’m only doing IP. It is important to let ARP through, though.

Also, note that I’m blocking Ethernet frames that use 802.3 framing (with or without SNAP) here. There’s an important reason for this: ebtables says it can’t look inside 802.3 frames. So if you somehow got an 802.3 frame, I’m sketchy on what matches would and wouldn’t work in iptables. It certainly sounds like you can’t look at the destination IP (as I do). Can you look at the source MAC (as I also do)? Depending on how you write your rules, using this kind of framing might be a way around them. I have the feeling it’s up to TAP-Win32 to determine the framing, and I suspect it defaults to Ethernet II and not 802.3, but why take a chance? Block it until you have a good reason to allow it. (Note that RFC 1122 says that hosts should support 802.3 framing. Of course, it also says that they must support Ethernet II framing.)

I’ve tested firewaller lightly and it seems to work. There could be some problems. Obviously, something could happen/change with your iptables rules; basically they could somehow get out of sync. firewaller, by default, actually tries to work around this (basically deleting anything that gets in its way). However, it’s a fragile system, since firewaller doesn’t really look at the iptables rules: it depends on the exit status of iptables to figure out whether a command worked or not. It’s possible one iptables command could succeed, but then a following related command could fail, and the first command’s rule hangs around. The good news is that if firewaller exits with an error, the address isn’t learned and I believe OpenVPN drops the packet. (Do watch your log file sizes as it keeps dropping packets, though; OpenVPN is kind of noisy.)

Also, another catch: be careful when using service iptables save while an OpenVPN client is connected. That’s a good way to start out with some stale rules after a reboot. In fact, the script I keep my iptables rules in deletes all rules and re-adds them when I run it, so using that while OpenVPN clients are connected could have a more dangerous impact: it would likely allow all currently connected OpenVPN clients to go anywhere they’d like. Just keep factors like these in mind when working with your iptables setup.

A few problems, or potential problems, remain with this setup. For one thing, the way I’m using firewaller, and the way my rules are constructed, no traffic destined for the OpenVPN server (which is also the firewall; see diagram above) is filtered. In other words, a restricted OpenVPN client can get to everything on the firewall that a user on the LAN could get to. In my case this isn’t a big deal. At worst they could query DNS, maybe make some DHCP requests or connect to SSH. It is something to consider, though.

Traffic coming in to the client isn’t filtered. This means you could still potentially have packets coming in to the client that you didn’t really authorize. You’d either have to lock it down with MACs (which is a management burden—remember to change your OpenVPN rules when you reinstall the TAP-Win32 device, or when you switch the NIC in your server!) or else take more drastic measures in order to be able to use iptables to filter. For example, you’d need to verify that a CN is only using one MAC or something like that, then make sure that a given source/destination MAC (depending on whether traffic is coming from or going to the restricted client) always corresponds to the IP that was allocated to the client. There is an ifconfig_pool_remote_ip environment variable that I think you should be able to read from a learn-address script (at least on the add event, and probably for an update as well) to see what IP was allocated to the “connection.” Assuming you’re letting OpenVPN assign IPs and not your DHCP server. Once you know that a connection is always using a single MAC, and that single MAC is always being referred to with the right IP address, you can lock down with iptables.
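
If you do go down that road, the check in the learn-address script might look something like this (a sketch; whether ifconfig_pool_remote_ip is actually visible to learn-address scripts is the assumption I mention above):

import os
import sys

# Inside the learn-address "add" handler: check the IP OpenVPN just
# allocated to this connection before writing any MAC/IP-locked rules.
# Assumption: ifconfig_pool_remote_ip is exported to learn-address
# scripts (verify this on your OpenVPN version).
allocated_ip = os.environ.get("ifconfig_pool_remote_ip")
if allocated_ip is None:
    sys.exit(1)  # refuse to learn the address rather than guess
# ...then emit iptables/ebtables rules that only allow packets pairing
# this client's MAC with allocated_ip...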

The notion that another host on the network might be passing packets into the restricted client is not such a big deal, probably. After all, they shouldn’t be able to address return traffic to a host they’re not supposed to be talking to (see above notes about ESTABLISHED,RELATED). I can think of one possible exploit, in conjunction with another hole in the filtering. We don’t filter the contents of ARP messages, so perhaps a malicious restricted client could spoof some ARP messages and get traffic redirected to it that the restricted client shouldn’t be seeing. Again, the only way to filter ARP messages would be to ensure the mapping of the CN to the MAC to the IP, and then probably use ebtables to peek inside the ARP messages to make sure they’re kosher. An alternative might be to restrict ARP messages from OpenVPN clients and instead proxy ARP for them. Or do something “like proxy ARP” since proxy ARP is kind of neutered in any recent Linux kernel. Might require a separate daemon.

All of this is kind of pointing to “use routed OpenVPN if you can.” No ARP messages from the client. No worries about Ethernet framing types. No worries about protocols other than IP. Just normal ol’ iptables like you’re used to. I guess if you can get away with routed mode (you don’t need broadcast traffic, you don’t need to pass protocols other than IP, and you can get all the Windows stuff to work—because I know how fickle Windows seems to be when it can’t broadcast), use it.

One final note. The service ebtables save command from the RPMforge ebtables RPM saves into some kind of binary file using some kind of “atomic” save/load commands in ebtables. I actually suspect ebtables distributes the initscript that does this. I’m a little scared of this. What’s wrong with the nice format you get from ebtables -L --Lx, as demonstrated above? What happens if this binary file gets corrupted? I hope the kernel code checks the validity of this binary data before using it. (Surely it does. Right?)