March 17, 2006

CherryPy on CentOS 4

Note: go read a follow-up entry after you’re done reading this. It has some important updates.

Yesterday I was talking with a friend about the relative advantages of PHP versus other web “toolkits” (“frameworks”?), especially other web toolkits in Python. We agreed that one of the advantages of PHP is that it’s easily installed; it “just works.” He asked if the same was true for Zope, and my answer was basically “no fucking way.” Granted, the last Zope I used was 2.5 or 2.6 which is pretty old by now. But it took me a while to get it installed (I think someone had RPMs which helped) and then get it set up proxying through Apache; in fact, getting it going through Apache was the real hard part. I seem to recall having to create a “cookie monster” object, and some sort of “zombie root” or something like that? Then lots of bugs. (Then find out it didn’t work with the latest version of Python, which I really wanted to use.)

What about other web software, though? Such as CherryPy? How easily is it set up? I liked the look of some of the CherryPy examples so I decided to find out. What follows is a rough set of steps that I took to install CherryPy on CentOS 4 (and thus it should hopefully work for RHEL 4 too).

Note that I’ve chosen to set up CherryPy using WSGI and mod_python. I’ll talk more about this decision later.

  1. CentOS 4 only has Python 2.3. Lament, but not for long: ATrpms has Python 2.4 RPMs. Install them.

  2. Update expat. I used the version from Fedora “development,” which I think is probably what’s in FC5: expat-1.95.8-8.2. If you don’t update expat, httpd will crash with messages like the following (from /var/log/httpd/error_log):

    httpd: Objects/stringobject.c:105: PyString_FromString: Assertion `str != ((void *)0)' failed.
    
  3. Rebuild the CentOS mod_python RPM from source. You’ll need to modify the spec file to supply --with-python=/usr/bin/python2.4 to the configure script.

  4. Download CherryPy (2.2.0rc1) and try python2.4 setup.py bdist_rpm. Lament, because it does not work correctly. Make a MANIFEST.in in the CherryPy top-level source directory that looks like:

    include cherrypy/favicon.ico
    include cherrypy/tutorial/*
    include cherrypy/test/*
    include cherrypy/test/static/*
    

    Then make a setup.cfg:

    [bdist_rpm]
    install_script = bdist_rpm_install
    

    Finally, you need to make bdist_rpm_install:

    python2.4 setup.py install --root=$RPM_BUILD_ROOT --record=INSTALLED_FILES
    sed -i -e '/ /s/\(.*\)/"\1"/' INSTALLED_FILES
    

    Note that the sed command works around a problem RPM apparently has with file names that contain a space (it thinks you’re trying to supply it with two separate file names). I’d really only expect CherryPy to include the MANIFEST.in in their distribution. The rest of the magic needed to make bdist_rpm work would still be appreciated by me, and I’m sure by many others. I intend to submit a patch back to the developers.

    (One problem: notice that I had to hard code python2.4 in the bdist_rpm_install script. That should really be a variable or parameter of some sort, but Python’s distutils have no feature for doing this.)

    Now CherryPy will build an RPM with something like: python2.4 setup.py bdist_rpm --python=/usr/bin/python2.4. I like to do things like --release 1.py24 also, so I remember that it’s a package compiled for Python 2.4 rather than Python 2.3 (which is still installed, mind you). Install the resulting RPM.

  5. Check out svn://svn.eby-sarna.com/svnroot/wsgiref/ (that’s Subversion; try svn co <above url>). Copy/paste the code for modpython_gateway.py into a file of that name in src/wsgiref (found underneath the wsgiref directory you just checked out). bdist_rpm this package and install the RPM.

  6. Set up your site’s directories. I’ve got /srv/www/www.some.site/root set up for the DocumentRoot and /srv/www/www.some.site/lib/python to keep the Python code in.

  7. Time to configure Apache:

    <VirtualHost www.some.site:80>
        ServerName www.some.site
        DocumentRoot /srv/www/www.some.site/root

        <Directory /srv/www/www.some.site/root/subdir>
            Options +Indexes

            SetHandler python-program
            PythonHandler wsgiref.modpython_gateway::handler
            PythonOption application cherrypy._cpwsgi::wsgiApp
            PythonPath "sys.path + ['/srv/www/www.some.site/lib/python']"
            PythonOption import cphw

            # Serve up static files without going through CherryPy.
            <FilesMatch "\.(css|gif|jpe?g|png)$">
                SetHandler None
            </FilesMatch>
        </Directory>
    </VirtualHost>

    That’s from my /etc/httpd/conf/hosts.d/conf.www.some.site file. Make sure the above stuff gets into the Apache configuration and restart Apache.

  8. Now make /srv/www/www.some.site/lib/python/cphw.py (cphw == CherryPy hello world) with contents as follows:

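    import cherrypy


    class HelloWorld(object):
        @cherrypy.expose
        def index(self):
            return "Hello world!"


    class Empty(object):
        pass


    # Mount the application: an empty root object with HelloWorld at /subdir/.
    cherrypy.root = Empty()
    cherrypy.root.subdir = HelloWorld()
    cherrypy.config.update({"server.environment": "production",
                            "server.protocolVersion": "HTTP/1.1"})
    # Initialize the CherryPy engine for use inside mod_python: serverClass=None
    # means CherryPy starts no HTTP server of its own, and initOnly=True means
    # the call returns instead of blocking.
    cherrypy.server.start(initOnly=True, serverClass=None)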
    

    Note this is mostly taken from the documentation on using CherryPy with WSGI and mod_python. In fact, most of these steps are based on the information there.
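The sed line in step 4’s bdist_rpm_install script does one thing: it wraps every INSTALLED_FILES entry containing a space in double quotes. If you’d rather not decipher sed, here is the same transformation as a small Python sketch. The function names are hypothetical and are not part of CherryPy or distutils. It avoids the with statement so it should also run under Python 2.4.

```python
def quote_paths_with_spaces(lines):
    """Strip trailing newlines and wrap any path containing a space in
    double quotes (the same effect as the sed line in bdist_rpm_install)."""
    result = []
    for line in lines:
        path = line.rstrip("\n")
        if " " in path:
            path = '"%s"' % path
        result.append(path)
    return result


def fix_installed_files(filename):
    """Rewrite an INSTALLED_FILES list (from setup.py install --record) in place."""
    f = open(filename)
    try:
        paths = quote_paths_with_spaces(f.readlines())
    finally:
        f.close()
    f = open(filename, "w")
    try:
        for path in paths:
            f.write(path + "\n")
    finally:
        f.close()
```

In bdist_rpm_install you would call it as fix_installed_files("INSTALLED_FILES"), right after the setup.py install line.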

If you did everything correctly you should now be able to hit http://www.some.site/subdir/ and get “Hello world!” back.
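Here is a look at what happens under the hood. The PythonOption application line names a WSGI application callable, and the gateway’s job is to turn each mod_python request into a call to it. Below is a minimal Python 3 sketch of that contract. hello_app is a hypothetical stand-in for cherrypy._cpwsgi.wsgiApp. call_wsgi builds a hand-made environ rather than the one modpython_gateway actually constructs.

```python
def hello_app(environ, start_response):
    """Hypothetical stand-in for cherrypy._cpwsgi.wsgiApp: only /subdir/ answers."""
    if environ.get("PATH_INFO") == "/subdir/":
        status, body = "200 OK", b"Hello world!"
    else:
        status, body = "404 Not Found", b"Not Found"
    start_response(status, [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call_wsgi(app, path):
    """Drive a WSGI app roughly the way a gateway does: build an environ,
    capture the status given to start_response, and join the body iterable."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return lambda data: None  # the legacy write() callable; unused here

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": path, "SCRIPT_NAME": "",
               "SERVER_NAME": "www.some.site", "SERVER_PORT": "80",
               "wsgi.url_scheme": "http"}
    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

For example, call_wsgi(hello_app, "/subdir/") returns ("200 OK", b"Hello world!").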

A few things to add. First, you can eliminate the need for wsgiref, supposedly, by using mpcp. I didn’t try this.

Notice that I took the ::startapp off the end of the PythonOption import directive in my Apache configuration; compare it to the Apache configuration in the ModPythonWSGI instructions to see what I’m talking about. If you leave startapp on there, wsgiref will expect your module (cphw in this case) to have a startapp callable that initializes the module. That callable actually gets called on every request. The comments indicate you probably want it to run only once, though, so you’ll need to set some sort of global flag to record that it has already been called.
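Here is a minimal sketch of such a run-once guard. It assumes the gateway simply calls startapp on each request. I’m not certain of the exact arguments it passes, so this version accepts and ignores any. The flag is a module global, so each Apache child process still initializes once. The lock only matters if a child serves requests on several threads. initialize() and init_calls are hypothetical placeholders for your real setup, such as the cherrypy.root and cherrypy.server.start(...) lines from cphw.py. Plain acquire()/release() keeps it Python 2.4 compatible.

```python
import threading

_started = False                # module-global flag: has one-time setup run?
_start_lock = threading.Lock()  # protects the flag if requests arrive on threads
init_calls = []                 # hypothetical: records each real initialization


def initialize():
    """Hypothetical one-time setup; in cphw.py this would be the
    cherrypy.root / cherrypy.config.update / cherrypy.server.start lines."""
    init_calls.append(1)


def startapp(*args):
    """Safe to call on every request; initialize() runs at most once per
    process. Any arguments the gateway passes are accepted and ignored."""
    global _started
    if _started:                # fast path once setup has happened
        return
    _start_lock.acquire()
    try:
        if not _started:        # re-check: another thread may have won the race
            initialize()
            _started = True
    finally:
        _start_lock.release()
```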

Finally, I’m not sure this was really the way to go. I originally thought, “yeah, mod_python, that’ll improve performance since it won’t have to run Python, load all the modules, etc. for every request.” This was an alternative to running CherryPy the way that seems to be recommended: as a standalone web server. Now, sure, you might not want CherryPy’s HTTP server facing the rest of the Internet. However, they also seem to recommend that you run it behind Apache using mod_rewrite/mod_proxy. (This way you also get stuff like HTTP/1.1 and SSL.) Theoretically there is a performance cost for a setup like this, where Apache forwards each request to CherryPy’s HTTP server. However, the cost sounds pretty small (one person quoted 0.9 ms in a simplistic benchmark they did). Further, I’ve already seen weird behavior from mod_python: I modified cphw.py and only three of the four Apache children actually seemed to reload it. I believe it fixed itself after a good night’s rest, but it’s not comforting to run into such a problem after a fairly short period of use.

I think my recommendation for setting up CherryPy is probably the same as that of the CherryPy developers/masters, then: just proxy it through Apache. CherryPy is running the whole time, mind you, so you’re still not suffering a Python startup for every request. Still, I don’t know how it might handle lots of simultaneous requests (given threading, the GIL, and all that). But, then, you’re not getting multiple simultaneous requests, are you? And you’re probably never going to.
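Here is the proxy setup seen from the Python side: one long-running process holds the application in memory, and Apache forwards requests to it. This Python 3 sketch uses the standard library’s wsgiref.simple_server as a stand-in for CherryPy’s own HTTP server. That is a substitution for illustration only. The default port 8080 and hello_app are assumptions; you’d point mod_proxy at whatever address you actually pick. Note that this simple server handles one request at a time, so it says nothing about the concurrency question above.

```python
import http.client
import threading
from wsgiref.simple_server import make_server, WSGIRequestHandler


def hello_app(environ, start_response):
    """Hypothetical application kept in memory by the long-running process."""
    body = b"Hello world!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress per-request access logging for the demo


def serve_in_background(app, host="127.0.0.1", port=8080):
    """Start a WSGI server in a daemon thread and return it; Apache's
    mod_proxy would forward requests to host:port."""
    server = make_server(host, port, app, handler_class=QuietHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port, path="/subdir/"):
    """Make one GET request to the local server, as the proxy would."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()


if __name__ == "__main__":
    server = serve_in_background(hello_app, port=0)  # port 0: any free port
    print(fetch(server.server_port))
    server.shutdown()
    server.server_close()
```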

There are some complexities involved in setting up CherryPy behind Apache. Read Running CherryPy behind Apache through mod_rewrite for all the gory details.

Nonetheless I’m going to try running CherryPy in mod_python for a while and see how it goes. This whole process took 30 minutes. That doesn’t really include installing Python 2.4 or upgrading expat, since I didn’t think it was fair to blame those on CherryPy. Still, that’s 30 minutes of reading documentation, with a third of it spent fixing the bdist_rpm problem. Not too bad, but it’s sure no yum -y install php.

Next, let’s see how long it takes me to get Kid set up and running.

Comments (3)

  1. May 17, 2006
    Robert Brewer said...

    Great writeup! I’m glad you found the newer version of modpython_gateway (that doesn’t depend on wsgiref). I’ll try to improve the documentation so the next guy doesn’t have to read the source. ;) Perhaps we should distribute it with CP so you don’t have to make a separate RPM…?

    Also, CP 2.x should work with Python 2.3. [There's a problem in the test suite we just found out about, but the rest should do just fine.] And is there a reason you chose CP 2.2.0rc1 instead of 2.2.1?

    I eagerly await your bdist_rpm patch. :)

  2. May 19, 2010
    Jean said...

    No,

    CherryPy 2.3.x still does not work with python 2.3 (RedHat RHEL 4) 4 years later.

  3. May 19, 2010
    Jean said...

    I get the following error:

    Traceback (most recent call last):
      File "/usr/lib/python2.3/site-packages/cherrypy/_cpwsgiserver.py", line 234, in run
        request.parse_request()
      File "/usr/lib/python2.3/site-packages/cherrypy/_cpwsgi.py", line 169, in parse_request
        _cpwsgiserver.HTTPRequest.parse_request(self)
      File "/usr/lib/python2.3/site-packages/cherrypy/_cpwsgiserver.py", line 150, in parse_request
        for k in headers:
      File "/usr/lib/python2.3/rfc822.py", line 390, in __getitem__
        return self.dict[name.lower()]
    AttributeError: 'int' object has no attribute 'lower'