darkness

Friday, 14 September 2007

Python drive-by

darkness @ 19:17:05

Making simple tree data structures out of things like lists and dicts in Python always pains me. In my head, other languages, maybe ECMAScript or Lua, have better syntax for this. Here’s an example of what I might write to set up a particular data structure:

day = {}
day["date"] = "date object here"
day["title"] = "string data here"
sections = day["sections"] = [dict(title="section title")]
jobs = sections[0]["jobs"] = []
jobs.append(dict(title="Google",
                 url="http://www.google.com"))
jobs.append(dict(title="Yahoo!",
                 url="http://www.yahoo.com"))

Perhaps pprint can make it easier to see what I’ve done here:

{'date': 'date object here',
 'sections': [{'jobs': [{'title': 'Google', 'url': 'http://www.google.com'},
                        {'title': 'Yahoo!', 'url': 'http://www.yahoo.com'}],
               'title': 'section title'}],
 'title': 'string data here'}

You might want to argue that I made bad choices with respect to that syntax, since the pretty-printed version above probably looks better. Here’s an alternative, I guess:

day = {"date": "calculate date object here",
       "sections": [{"title": "section title",
                     "jobs": [{"title": "Google",
                               "url": "http://www.google.com"},
                              {"title": "Yahoo!",
                               "url": "http://www.yahoo.com"},
                              ],
                     },
                    ],
       }
# Can't assign this in-line, unless I calculate "date" into a local
# variable first.
day["title"] = "this is based on " + day["date"]
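As the comment says, the title could go in-line if the date were computed into a local variable first. Here is a sketch of that, assuming a real `datetime.date` stands in for the placeholder string (the specific date is just an example):

```python
import datetime

# Assumption: a real date object replaces the "calculate date object here" placeholder.
date = datetime.date(2007, 9, 14)

# With the date already in a local, the title can be built in the literal itself.
day = {
    "date": date,
    "title": "this is based on " + date.isoformat(),
    "sections": [
        {
            "title": "section title",
            "jobs": [
                {"title": "Google", "url": "http://www.google.com"},
                {"title": "Yahoo!", "url": "http://www.yahoo.com"},
            ],
        },
    ],
}
```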

It’s not really fair to say “just write it the way pprint did.” pprint has a few advantages, such as having everything it needs to put in the data structure up front, and also being a computer. As a human writing this data structure, I think of a section’s title before its jobs, for example. pprint sorts the keys alphabetically, so it happens to put “jobs” before “title”, which means its output doesn’t end in a pile of closing brackets like }]}]}. Also note that I needed one of the values in the structure to compute another; pprint already had that value when it went to render the data structure.
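The bracket pile-up is easy to see directly. Since Python 3.8, `pprint.pformat` accepts `sort_dicts=False`, which keeps keys in insertion order, i.e. the order a human typed them. A small comparison using the first example’s values (this relies on that newer parameter, which the 2007-era pprint did not have):

```python
from pprint import pformat

# The first example's data, with keys inserted in the order a human thinks of them.
day = {
    "date": "date object here",
    "title": "string data here",
    "sections": [
        {
            "title": "section title",
            "jobs": [
                {"title": "Google", "url": "http://www.google.com"},
                {"title": "Yahoo!", "url": "http://www.yahoo.com"},
            ],
        },
    ],
}

# Default: keys are sorted, so "jobs" precedes "title" and the output ends cleanly.
print(pformat(day))

# Insertion order: the nested jobs come last, so the closing brackets stack up.
print(pformat(day, sort_dicts=False))
```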

Also, I may need to think about switching from double quotes to single quotes; to my overly picky mind, they now look a little “cleaner.” My habit of using double quotes goes back to when I was programming in C a lot, and it isn’t helped by the fact that '' and "" behave differently in languages such as Perl and the Bourne shell. (C also makes/made me do slightly weird things in Perl and sh, such as writing a single character as 'x' but a longer string as "xxx", the way C distinguishes character literals from string literals.)
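Unlike Perl and sh, Python gives '' and "" exactly the same meaning, so the switch would be purely cosmetic. The single quotes in the pprint output come from `repr()`, which prefers them unless the text itself contains a single quote. A small illustration:

```python
# Both quote styles produce the same str object in Python.
assert 'abc' == "abc"

# repr() (and therefore pprint) prefers single quotes...
print(repr("section title"))
# ...and switches to double quotes only when the text contains a single quote.
print(repr("it's"))
```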

So I went crazy and made a class which is currently called DataObject. It’s actually a somewhat disgusting set of wrappers over dictionaries, but check out the syntax:

day = DataObject()
day.date = "date object here"
day.title = "string data here"
day.sections[0].title = "section title"
day.sections[0].jobs.new_child(title="Google",
                               url="http://www.google.com")
day.sections[0].jobs.new_child(title="Yahoo!",
                               url="http://www.yahoo.com")

To me this is vastly more readable (and easier to write, too), and it works just like the data structure I built above with Python’s built-in types. You can also get a copy of the data as plain built-in types by calling day.to_native(), which works recursively.

I suspect there may be some odd side effects, such as strange exceptions when you mistype a “key”: since attributes are mapped to keys, a typo makes a new object spring into existence. I’m going to try using it a bit more before I pass judgment on whether or not it’s a useful idea.
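The post doesn’t show DataObject’s source, so here is a hypothetical minimal sketch of one way to get the same syntax. Only the names DataObject, new_child and to_native come from the post. The rest is my assumption, not the author’s implementation: each node commits to being a dict or a list the first time it’s used, and indexing past the end grows the list.

```python
class DataObject:
    """Hypothetical sketch: a node that turns into a dict or a list on first use."""

    def __init__(self, **attrs):
        # _value stays None until the node is used as a mapping or a list.
        object.__setattr__(self, "_value", None)
        for key, val in attrs.items():
            setattr(self, key, val)

    def _as(self, kind):
        # Lazily commit this node to `kind` (dict or list); refuse to switch later.
        value = object.__getattribute__(self, "_value")
        if value is None:
            value = kind()
            object.__setattr__(self, "_value", value)
        elif not isinstance(value, kind):
            raise TypeError("node is already a " + type(value).__name__)
        return value

    def __getattr__(self, name):
        # Only called when normal lookup fails, so methods still win over keys.
        if name.startswith("_"):
            raise AttributeError(name)  # keep copy/pickle probes from creating keys
        children = self._as(dict)
        if name not in children:
            children[name] = DataObject()  # auto-vivify a missing "key"
        return children[name]

    def __setattr__(self, name, value):
        self._as(dict)[name] = value

    def __getitem__(self, index):
        # Indexing past the end grows the list with empty nodes.
        items = self._as(list)
        while len(items) <= index:
            items.append(DataObject())
        return items[index]

    def __iter__(self):
        # Without this, Python would fall back to __getitem__(0), (1), ...,
        # which never raises IndexError here, so iteration would never stop.
        return iter(list(self._as(list)))

    def new_child(self, **attrs):
        child = DataObject(**attrs)
        self._as(list).append(child)
        return child

    def to_native(self):
        # Recursively copy into plain dicts and lists; unused nodes become None.
        return _to_native(object.__getattribute__(self, "_value"))


def _to_native(value):
    if isinstance(value, DataObject):
        return value.to_native()
    if isinstance(value, dict):
        return {key: _to_native(val) for key, val in value.items()}
    if isinstance(value, list):
        return [_to_native(item) for item in value]
    return value


day = DataObject()
day.date = "date object here"
day.title = "string data here"
day.sections[0].title = "section title"
day.sections[0].jobs.new_child(title="Google",
                               url="http://www.google.com")
day.sections[0].jobs.new_child(title="Yahoo!",
                               url="http://www.yahoo.com")
print(day.to_native())
```

In a sketch like this the typo problem is easy to trigger: merely reading `day.sections[0].jbos` raises nothing, it just adds an empty node that `to_native()` later reports as `None`.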
