March 21, 2006

More on CherryPy, mod_python, and Python web development

Over the past three or so days I’ve been doing a lot more reading and experimenting with Python web development. I have to say, I’m not very happy overall. I’m used to developing in Java via Apache, mod_perl via Apache, and I’m semi-familiar with PHP via Apache. Python web development just doesn’t seem to work that well through Apache. CherryPy’s recommended deployment is having Apache proxy to CherryPy’s web server. Off the top of my head, I think only and maybe Django actually seem to advertise and officially support running from mod_python. If anyone else supports running from Apache, it seems to be via FastCGI.

The major issue for me has been reloading of files. Remember the problem I previously mentioned where it seemed like some of my Apache/mod_python children were not reloading changed Python files? That’s far from an isolated incident. In fact, I can reproduce it pretty reliably: each time, a single child fails to reload. Reloading seemed to work pretty well in other “web technologies” that I’ve used: I seem to recall that servlet containers may reload the entire context, or maybe just changed JARs; PHP reloads files by default AFAIK; I think mod_perl also reloads files. All of this happens, and all of these run through Apache!

Turns out that mod_python’s reloading is really a big tease: it only reloads modules that are imported with its special (and supposedly not too functional) importing API. There may be solutions to this. takes a straightforward approach: it just searches sys.modules for files with a newer mtime and uses the reload() built-in to reload them. Of course, this might not work so well if you’ve got references to instances from the old module hanging around, since it has no way of updating those instances. I think this method could be easily broken in CherryPy. Say this is your “top level” CherryPy file:

import appclasses
import cherrypy

cherrypy.root = appclasses.Root()

Now change appclasses.Root. How will the object in cherrypy.root get updated? Every module that uses appclasses probably needs to be reloaded too, and then the modules that import those, and so on. At the very least, any files that update the CherryPy publishing tree may need to be reloaded. Calling reload() on the above module would re-execute the cherrypy.root assignment, but it would only pick up the new Root if appclasses had already been reloaded, so the order of the reloads matters. (I do concede that’s method for declaring handlers may moot this point, but I don’t want to use handlers; I want to use CherryPy’s method for dispatching.)
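
A minimal sketch of the mtime-scanning reloader described above, written for modern Python 3 (importlib.reload stands in for the reload() built-in of the time). The modules appclasses and toplevel are generated on the fly as hypothetical stand-ins for the example, and cherrypy.root is replaced by a plain module attribute named root so the sketch runs without CherryPy. It shows the class being replaced while the instance created from it stays stale until the importing module is reloaded too:

```python
import importlib
import os
import sys
import tempfile
import time

def reload_changed_modules(mtimes):
    """Reload each loaded module whose .py file is newer than the mtime
    recorded for it in `mtimes` (module name -> mtime). The first call for
    a module only records its mtime. Returns the names that were reloaded."""
    reloaded = []
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if not path or not path.endswith(".py"):
            continue
        try:
            mtime = os.path.getmtime(path)
        except OSError:
            continue
        previous = mtimes.setdefault(name, mtime)
        if mtime > previous:
            mtimes[name] = mtime
            importlib.reload(module)  # re-executes the module's code in place
            reloaded.append(name)
    return reloaded

def demo():
    sys.dont_write_bytecode = True  # always re-read source, never a stale .pyc
    for name in ("appclasses", "toplevel"):
        sys.modules.pop(name, None)  # start clean if the demo ran before
    workdir = tempfile.mkdtemp()
    sys.path.insert(0, workdir)
    appclasses_path = os.path.join(workdir, "appclasses.py")
    with open(appclasses_path, "w") as f:
        f.write("class Root:\n    version = 1\n")
    with open(os.path.join(workdir, "toplevel.py"), "w") as f:
        # Stand-in for: cherrypy.root = appclasses.Root()
        f.write("import appclasses\nroot = appclasses.Root()\n")
    importlib.invalidate_caches()
    toplevel = importlib.import_module("toplevel")
    appclasses = sys.modules["appclasses"]

    mtimes = {}
    reload_changed_modules(mtimes)  # first pass only records mtimes

    # "Now change appclasses.Root", making sure its mtime moves forward.
    with open(appclasses_path, "w") as f:
        f.write("class Root:\n    version = 2\n")
    later = time.time() + 10
    os.utime(appclasses_path, (later, later))

    result = {
        "reloaded": reload_changed_modules(mtimes),
        "class_version": appclasses.Root.version,      # the class was replaced
        "instance_version": toplevel.root.version,     # the instance was not
        "instance_is_stale": type(toplevel.root) is not appclasses.Root,
    }
    importlib.reload(toplevel)  # re-runs the assignment against the new Root
    result["fresh_instance_version"] = toplevel.root.version
    return result

result = demo()
print(result)
```

The order matters: reloading toplevel before appclasses would simply rebuild root from the old class again.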

In Java, I think this problem is somewhat solved by reloading the whole context (i.e., application). This mostly works, except for some situations where I seem to recall managing to leave old instances around. Trying to use those old instances then caused some bizarre ClassLoader-related exception, if I recall correctly, and fixing it required a server restart. Still, I seem to recall that such restarts were few and far between. I suspect PHP and Perl have similar problems (I’m thinking you could store a reference to an object in PHP or mod_perl sessions).

In Common Lisp I think the object in cherrypy.root would get updated when the Root class was updated. CLOS is just awesome like that.
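
Python has nothing built in that matches this. A crude approximation, shown only for contrast and not something CherryPy or mod_python does, is to find the old instances after a class has been redefined and reassign their __class__. A minimal sketch, assuming the old instances are reachable through gc.get_objects() and that the old and new classes have compatible layouts (no differing __slots__); migrate_instances is a hypothetical helper:

```python
import gc

def migrate_instances(old_cls, new_cls):
    """Point every live instance of old_cls at new_cls; return how many.

    Unlike CLOS's update-instance-for-redefined-class, no slots are added,
    removed or initialized; the instances just start using new_cls's methods.
    """
    migrated = 0
    for obj in gc.get_objects():
        if type(obj) is old_cls:
            obj.__class__ = new_cls
            migrated += 1
    return migrated

class Root:
    def index(self):
        return "old index"

root = Root()   # plays the role of the object stored in cherrypy.root
OldRoot = Root

class Root:     # redefine the class, as reloading its module would
    def index(self):
        return "new index"

print(root.index())  # old index
migrated = migrate_instances(OldRoot, Root)
print(root.index())  # new index
```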

There is some kind of mod_python “framework” called Vampire that supposedly greatly improves upon the reloading mechanisms in mod_python. For example, it may reload parent modules, potentially fixing the above CherryPy example. It’s not clear to me whether you could somehow import Vampire and then ignore the rest of Vampire’s functionality in favour of using CherryPy. Frankly, I kind of stopped caring. I think I can see why it might be nice to just start a fresh copy of the whole application, and everyone else seems to be developing this way.

(Actually, if you keep in-memory state information, restarting the whole application could be a pain in the ass. One could say, “well, you shouldn’t do that.” That’s probably reasonable: unless you’re running in a threaded model, I don’t see how different mod_python or FastCGI instances are going to be able to see each other’s data.)

Regardless, the method CherryPy (and many others!) use to do “reloading” is basically to exec(myself), and that strikes me as gross. (It looks like there may be some practical problems with this method as well.)
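
A minimal sketch of that “exec(myself)” style of reloading, assuming a POSIX system: a background thread polls the mtimes of loaded modules and, when one changes, replaces the whole process with os.execv. This is a generic illustration, not CherryPy’s actual autoreloader, and the function names are hypothetical:

```python
import os
import sys
import threading
import time

def module_mtimes():
    """Map the source file of every loaded module to its current mtime."""
    mtimes = {}
    for module in list(sys.modules.values()):
        path = getattr(module, "__file__", None)
        if path and os.path.isfile(path):
            mtimes[path] = os.path.getmtime(path)
    return mtimes

def changed_files(baseline, current):
    """Files present in the baseline whose mtime has since changed.
    Modules imported after startup are ignored rather than treated as edits."""
    return sorted(p for p, m in current.items() if p in baseline and baseline[p] != m)

def restart_process():
    # Replace the running process with a fresh interpreter running the same
    # script and arguments; no module, object or in-memory state survives.
    os.execv(sys.executable, [sys.executable] + sys.argv)

def start_reloader(interval=1.0):
    """Poll module mtimes in a daemon thread and re-exec on any change."""
    baseline = module_mtimes()

    def watch():
        while True:
            time.sleep(interval)
            if changed_files(baseline, module_mtimes()):
                restart_process()

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread
```

In development you would call start_reloader() once before starting the server. Because the whole process is replaced, stale instances like the cherrypy.root example cannot survive, but neither can any in-memory state.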

All these Apache/reloading complications almost made me abandon Python for something else. Except that I can’t find a good way to have Emacs indent PHP, I’ve already been there and suffered with Java, and Ruby doesn’t seem any better. (If someone has a good way to indent PHP relative to the surrounding HTML in Emacs, please let me know! I’d probably prefer PHP for the relatively simple app I’m working on.)

I get the impression that Ruby on Rails uses this same method. Ruby on Rails looks approximately as hostile to Apache as Python, too. One seemingly well-researched piece on deploying Rails applications said that you should just use lighttpd and its built-in FastCGI support.

Some notes on how I intend to develop: I’ll be running CherryPy with its built-in web server for development, and I’ll probably use Apache/mod_python in production. Note that mod_python reloads just fine when you run service httpd reload (the RH/FC command; equivalent to apachectl graceful or something like that).

You could use FastCGI, but it sounds kind of problematic. One Rails user reported a lot of zombie processes and said such reports were not uncommon. I have an RPM made for mod_fastcgi. I also found mod_fcgid, which may well be better. However, I can’t really think of any advantage to using FastCGI over mod_python, since reloading basically sucks equally in both. (Note that you can’t use CherryPy’s built-in reloading under either mod_python or FastCGI.) There’s also something called SCGI out there; no idea about that.

Let me add that this is the newest version of modpython_gateway. You no longer need wsgiref. I think the best documentation for using modpython_gateway is in the comments of the source file itself. Also note that you can no longer use PythonOption import, and the PythonImport directive seems very hard to use (I couldn’t get it to do anything). To get your application loaded into CherryPy, the best way seems to be to import CherryPy’s wsgiApp into one of your own source files, then specify PythonOption wsgi.application yourModule::wsgiApp. That way yourModule gets loaded (and gets an opportunity to mount things in CherryPy).
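
The wsgi.application value names a module and a callable inside it, separated by ::. A rough sketch of how a gateway can turn such a string into the application object, which is why importing yourModule gives it a chance to mount things in CherryPy; resolve_app is a hypothetical helper, not modpython_gateway’s actual code, and a stdlib module stands in for yourModule:

```python
import importlib

def resolve_app(spec):
    """Resolve a "module::attribute" string to the named object.

    Importing the module runs its top-level code first, which is the
    moment an application module can mount its handlers.
    """
    module_name, sep, attr_name = spec.partition("::")
    if not sep or not module_name or not attr_name:
        raise ValueError("expected 'module::attribute', got %r" % (spec,))
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)

# "json::dumps" plays the role of "yourModule::wsgiApp" here.
app = resolve_app("json::dumps")
print(app([1, 2]))
```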

I am a little concerned about CherryPy’s sessions in mod_python: I’m not sure if they get shared across Apache children (using mpm_prefork, not mpm_worker). I saw one person talking about deploying CherryPy on mod_python, and they said they needed to use file-based sessions. I believe CherryPy also supports sessions in PostgreSQL.
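
Under mpm_prefork each child is a separate process, so sessions held in one child’s memory are invisible to the others, while a store that every child reads from disk does not have that problem. A minimal sketch contrasting the two, with two store objects in one process standing in for two Apache children; MemorySessionStore and FileSessionStore are hypothetical names, not CherryPy’s session classes, and the file store does no locking, which a real multi-process store would need:

```python
import json
import os
import tempfile

class MemorySessionStore:
    """Sessions live only in this object's memory (one copy per process)."""
    def __init__(self):
        self._data = {}

    def save(self, session_id, data):
        self._data[session_id] = dict(data)

    def load(self, session_id):
        return self._data.get(session_id)

class FileSessionStore:
    """Sessions are JSON files in a directory every process can read."""
    def __init__(self, directory):
        self.directory = directory

    def _path(self, session_id):
        # A real store must validate session_id to prevent path traversal.
        return os.path.join(self.directory, "session-%s.json" % session_id)

    def save(self, session_id, data):
        # Write to a temp file, then rename it into place (atomic on POSIX).
        fd, tmp_path = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp_path, self._path(session_id))

    def load(self, session_id):
        try:
            with open(self._path(session_id)) as f:
                return json.load(f)
        except FileNotFoundError:
            return None

session_dir = tempfile.mkdtemp()
child_a, child_b = FileSessionStore(session_dir), FileSessionStore(session_dir)
child_a.save("abc123", {"user": "alice"})
shared = child_b.load("abc123")
print(shared)  # {'user': 'alice'}

mem_a, mem_b = MemorySessionStore(), MemorySessionStore()
mem_a.save("abc123", {"user": "alice"})
unshared = mem_b.load("abc123")
print(unshared)  # None
```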