More on CherryPy, mod_python, and Python web development
Over the past three or so days I’ve been doing a lot more reading and experimenting with Python web development. I have to say, I’m not very happy overall. I’m used to developing in Java via Apache, mod_perl via Apache, and I’m semi-familiar with PHP via Apache. Python web development just doesn’t seem to work that well through Apache. CherryPy’s recommended deployment is having Apache proxy to CherryPy’s web server. Off the top of my head, I think only web.py and maybe Django actually seem to advertise and officially support running from mod_python. If anyone else supports running from Apache, it seems to be via FastCGI.
The major issue for me has been reloading of files. Remember the problem I previously mentioned where it seemed like some of my Apache/mod_python children were not reloading changed Python files? That’s far from an isolated incident. In fact, I can reproduce it pretty reliably. A single child each time fails to reload. Reloading seemed to work pretty well in other “web technologies” that I’ve used: I seem to recall that Servlets may end up reloading the entire context, but maybe just changed JARs; PHP reloads files by default AFAIK; I think mod_perl also reloads files. All this happens, and all of these run through Apache!
Turns out that mod_python’s reloading is really a big tease: it
only reloads modules that are imported with its special (and
supposedly not too functional) importing API. There may be solutions
to this. web.py takes a straightforward approach: it just searches
sys.modules for files with a newer mtime and uses the reload()
built-in to reload them. Of course, this might not work so well if
you’ve got references to instances from the old module hanging around,
since it has no way of updating those instances. I think this method
could be easily broken in CherryPy. Say this is your “top level”
CherryPy file:
import appclasses
import cherrypy
cherrypy.root = appclasses.Root()
Now change appclasses.Root. How will the object in cherrypy.root
get updated? Every module that uses appclasses probably needs to be
reloaded too, and then the parents of those modules, and so on. At
the very least, any of your files that update the CherryPy publishing tree
may need to be reloaded. I’m not even sure if calling reload() on
the above module would actually re-execute the cherrypy.root
assignment. (I do concede that web.py’s method for declaring handlers
may moot this point, but I don’t want to use web.py handlers; I want
to use CherryPy’s method for dispatching.)
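To make the stale-instance worry concrete, here is a minimal sketch of the mtime-plus-reload idea. It is my own approximation, not web.py's actual code, and it uses importlib.reload (the modern spelling of the reload() built-in). The names reload_changed_modules, demo_stale_instance, and appclasses_demo are all made up for illustration; the root variable just plays the role of cherrypy.root.

```python
import importlib
import os
import sys
import tempfile

_seen_mtimes = {}  # module name -> mtime recorded on an earlier pass


def reload_changed_modules():
    """Reload imported modules whose source file is newer than last time."""
    reloaded = []
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)
        if not path or not path.endswith(".py"):
            continue  # built-ins, extension modules, namespace packages
        try:
            mtime = os.stat(path).st_mtime
        except OSError:
            continue  # the source file has gone away
        previous = _seen_mtimes.setdefault(name, mtime)
        if mtime > previous:
            _seen_mtimes[name] = mtime
            importlib.reload(module)
            reloaded.append(name)
    return reloaded


def demo_stale_instance():
    """Show that an object created before a reload keeps its old class."""
    name = "appclasses_demo"  # hypothetical module name
    sys.modules.pop(name, None)
    _seen_mtimes.pop(name, None)

    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, name + ".py")
    with open(path, "w") as f:
        f.write("class Root:\n    def index(self):\n        return 'old'\n")
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()

    module = importlib.import_module(name)
    root = module.Root()        # plays the role of cherrypy.root
    reload_changed_modules()    # first pass only records mtimes

    # "Edit" the file, then push its mtime forward so the change is
    # visible even on filesystems with coarse timestamps.
    with open(path, "w") as f:
        f.write("class Root:\n    def index(self):\n        return 'new'\n")
    st = os.stat(path)
    os.utime(path, (st.st_atime, st.st_mtime + 10))

    reloaded = reload_changed_modules()
    fresh = module.Root()       # built from the reloaded class
    return reloaded, root.index(), fresh.index()
```

Calling demo_stale_instance() reloads the module, yet the instance created before the reload still answers from the old class, while a newly created instance picks up the new code. Nothing in the reloader knows that root needs to be rebuilt.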
In Java, I think this problem is somewhat solved by reloading the
whole context (i.e., application). This mostly works except for some
situations where I seem to recall managing to leave old instances
around. Then trying to use those old instances causes some bizarre
ClassLoader-related exception, if I recall correctly, followed by a
server restart to fix them. Still, I seem to recall that such
restarts were few and far between. I suspect PHP and Perl have
similar problems (I’m thinking you could store a reference to an
object in PHP or mod_perl sessions).
In Common Lisp I think the object in cherrypy.root would get updated
when the Root class was updated. CLOS is just awesome like that.
There is some kind of mod_python “framework” called Vampire that supposedly greatly improves upon the reloading mechanisms in mod_python. For example, it may reload parent modules, potentially fixing the above CherryPy example. It’s not clear to me if you could somehow import Vampire and then ignore the rest of Vampire’s functionality in favour of using CherryPy. Frankly, I kind of stopped caring. I think I can see why it might be nice to start with a whole new application, and everyone else seems to be developing this way.
(Actually, if you keep in-memory state information, restarting the whole application could be a pain in the ass. One could say, “well, you shouldn’t do that.” That’s probably reasonable: unless you’re running in a threaded model, I don’t see how different mod_python or FastCGI instances are going to be able to see each other’s data.)
Regardless, the method that CherryPy (and many others!) uses to do
“reloading” is basically to exec(myself), and that comes across to
me as gross. (It looks like there may be some practical problems
with this method as well.)
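For the record, the exec(myself) trick boils down to something like the sketch below. This is my own minimal version, not CherryPy's actual reloader; restart_command and restart_process are hypothetical names, and a real reloader would also have to decide when to call it (for example, after noticing a changed file).

```python
import os
import sys


def restart_command(argv=None):
    """Build the argument list that re-runs the current program."""
    argv = sys.argv if argv is None else argv
    return [sys.executable] + list(argv)


def restart_process():
    """Replace this process with a fresh copy of itself.

    os.execv does not return on success: the running interpreter is
    replaced in place, so every module is imported from scratch and any
    in-memory state is lost.
    """
    os.execv(sys.executable, restart_command())
```

The appeal is that there are no stale instances to worry about, since nothing survives the exec. The cost is exactly the same thing: nothing survives.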
All these Apache/reloading complications almost made me abandon Python for something else. Except I can’t find a good way to have Emacs indent PHP, with Java I’ve been there and suffered that, and Ruby doesn’t seem any better. (If someone has a good way to indent PHP relative to the surrounding HTML in Emacs, please let me know! I’d probably prefer PHP for the relatively simple app I’m working on.)
I get the impression that Ruby on Rails uses this same exec-style method for reloading. Ruby on Rails looks approximately as hostile to Apache as Python, too. One seemingly well-researched piece on deploying Rails applications said that you should just use lighttpd and its built-in FastCGI support.
Some notes on how I intend to develop. I’ll be running CherryPy with
its built-in web server for development. I’ll probably do
Apache/mod_python in production. Note that mod_python reloads just
fine when you run service apache reload (an RH/FC command, equivalent
to apachectl graceful or something like that).
You could do FastCGI, but it sounds kind of problematic. One Rails user reported a lot of zombie processes, and said such reports were not uncommon. I have an RPM made for mod_fastcgi. I also found mod_fcgid, which may well be better. However, I can’t really think of any advantage to using FastCGI over mod_python, since reloading basically sucks in both equally. (Note that you can’t use CherryPy’s built-in reloading under either mod_python or FastCGI.) There’s also something called SCGI out there; no idea about that.
Let me add that this is the newest version of
modpython_gateway.
You no longer need wsgiref. I think the best documentation for using
modpython_gateway is in the comments of the source file itself. Also
note that you can no longer use PythonOption import, and the
PythonImport directive seems very hard to use (I couldn’t get it to
do anything). To get your application loaded into CherryPy, the best
way seems to be to import CherryPy’s wsgiApp into one of your own
source files, then specify PythonOption wsgi.application
yourModule::wsgiApp. That way yourModule gets loaded (and gets an
opportunity to mount things in CherryPy).
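Here is a rough sketch of the shape yourModule needs to have. To keep it runnable on its own I have not imported CherryPy at all: the wsgiApp below is a plain stand-in WSGI callable, and in the real setup that name would be the one imported from CherryPy instead. MOUNTED, mount, and call_once are hypothetical helpers that exist only to show the import-time side effect and to poke the callable with a fake request.

```python
# yourModule.py (hypothetical name, matching the PythonOption line above)
from wsgiref.util import setup_testing_defaults

MOUNTED = []  # stand-in for CherryPy's publishing tree


def mount(path):
    """Stand-in for whatever call attaches your objects to the tree."""
    MOUNTED.append(path)


# Import-time side effect: this runs when the gateway imports the module
# to resolve yourModule::wsgiApp, which is the whole point of the trick.
mount("/")


def wsgiApp(environ, start_response):
    """Stand-in WSGI callable; the real one would come from CherryPy."""
    body = ("mounted: " + ", ".join(MOUNTED)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_once(app):
    """Drive a WSGI callable with a fake request, as a gateway would."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

The gateway only needs the module to expose a WSGI callable under the name it was told about; everything else in the file is there for its side effects at import time.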
I am a little concerned about CherryPy’s sessions in mod_python: I’m not sure if they get shared across Apache children (using mpm_prefork, not mpm_worker). I saw one person talking about deploying CherryPy on mod_python, and they said they needed to use file-based sessions. I believe CherryPy also supports sessions in PostgreSQL.
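To see why file-based sessions would get around the prefork problem, here is a minimal sketch of a file-backed store. This is not CherryPy's session code; FileSessionStore and its methods are hypothetical names of my own, and it ignores locking and expiry entirely. Two store objects pointed at the same directory stand in for two Apache children.

```python
import json
import os
import secrets
import tempfile


class FileSessionStore:
    """Hypothetical file-backed session store: one JSON file per session."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def new_session_id(self):
        return secrets.token_hex(16)

    def _path(self, session_id):
        # Accept only lowercase hex ids (the kind new_session_id makes),
        # so an id can never point outside the session directory.
        if not session_id or any(c not in "0123456789abcdef" for c in session_id):
            raise ValueError("invalid session id")
        return os.path.join(self.directory, session_id + ".json")

    def save(self, session_id, data):
        path = self._path(session_id)
        # Write to a temporary file, then rename it into place, so another
        # process never reads a half-written session.
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)

    def load(self, session_id):
        try:
            with open(self._path(session_id)) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}  # unknown session: start empty
```

A session saved through one store object can be loaded through another one that shares the directory, which is all that separate prefork children would need. An in-memory dict gives you no such thing, because each child has its own copy.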