A Nice Little Python Story
I have been talking quite a bit about Python recently, partly because it’s the only language I’m actively using right now aside from VBA. I was doing a little gedankening (it’s a gerund now!) this morning about folding RSS feeds into the main content stream of Efendi.
Typically, if you are kicking around something that instinctively feels like a classical computer science problem, it pays to start paging through the Python documentation, looking for an official solution.
It seems a little esoteric at first, but my problem is a fairly classical one when broken down:
- We have an ordered list of blog entries, sorted by date
- There are zero or more other lists, not necessarily in date order, involving things like Picasa pictures, Google Code commits, Cluster updates, etc.
- We slice the primary ordered list to show, say, the fourth set of 10 items (this would be page four on the blog if there were ten nodes per page)
- We want to fold in (my word picture on “fold” is like putting the whipped cream into key lime pie filling, umm umm, don’t think about that in the morning when in a Central Asian country!) any entries which fall in between (or immediately above) the entries in the sliced list
Let’s substitute numbers for dates to make things easier to read, and I’ll depict it below:
Main list:
30 - Cookies
42 - Brownies
45 - Key Lime Pie
49 - Lokum
52 - Jello
58 - Fruit Salad
60 - Baklava
75 - Carrot Cake
88 - Chocolate Cake
92 - Ice Cream
Let’s take the second slice of four:
52 - Jello
58 - Fruit Salad
60 - Baklava
75 - Carrot Cake
If we were folding RSS entries into this list, we would only want “dates” between 50 and 75. A date of 49 would be above Lokum on the first page, and a date of 76 would be above Chocolate Cake on the third page.
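To make the window concrete, here is a minimal sketch of how the bounds for a page could be computed from the main list. The function name page_window and its 1-based page argument are made up for illustration; this is not Efendi's actual code.

```python
# "Dates" from the main list above, already sorted.
main_dates = [30, 42, 45, 49, 52, 58, 60, 75, 88, 92]

def page_window(dates, page, per_page):
    """Return (low, high) for a 1-based page number.

    Extra items belong on this page when low < date <= high.
    low is None on the first page, which has no lower bound.
    """
    start = (page - 1) * per_page
    page_dates = dates[start:start + per_page]
    if not page_dates:
        raise ValueError("page out of range")
    # The last date of the previous page is the exclusive lower bound.
    low = dates[start - 1] if start > 0 else None
    high = page_dates[-1]
    return low, high

low, high = page_window(main_dates, page=2, per_page=4)
print(low, high)
```

For the second slice of four this gives a lower bound of 49 (exclusive) and an upper bound of 75 (inclusive), which is the "between 50 and 75" range described above.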
Anyway, I now need some efficient way to do this slicing.
Enter bisect, a nice little module tucked away in the mathy part of the Python library docs. Its function is to calculate where a certain item should be inserted into a sorted list to retain sorting order. Sounds like the solution.
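As a rough illustration with the plain numbers from above, bisect_left from the standard library reports the index where a date would be inserted, and integer division turns that index into a page. The helper page_for and the constant PER_PAGE are hypothetical names for this sketch.

```python
from bisect import bisect_left

main_dates = [30, 42, 45, 49, 52, 58, 60, 75, 88, 92]
PER_PAGE = 4  # assumed page size, matching the slice of four above

def page_for(date, dates=main_dates, per_page=PER_PAGE):
    """Return the 1-based page an extra item with this date would land on."""
    # bisect_left puts ties *before* the equal entry, i.e. "above" it.
    idx = bisect_left(dates, date)
    return idx // per_page + 1

pages = {d: page_for(d) for d in (49, 50, 75, 76)}
print(pages)
```

A date of 49 lands on page one, 50 and 75 on page two, and 76 on page three. A date past the last entry would point one slot beyond the end of the list, which a real version would need to handle.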
But, we’re dealing with RSS feed items, not raw numbers. This module will have to compare things, so what to do?
In C++, you might have a remote shot at using operator overloading.
In Python, just monkey patch the RSS library. What? Monkey patching means overwriting or adding functionality atop an existing library at runtime. So, Python tells you to implement a method called __cmp__() to override or add comparison functionality (that is the Python 2 spelling; Python 3 dropped __cmp__ in favor of __lt__ and the other rich comparison methods). You just have __cmp__ look at the dates of the posts to derive the comparison.
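Here is a small sketch of the idea in Python 3 terms, where __lt__ stands in for __cmp__ because bisect only ever uses the < operator. FeedEntry is a hypothetical stand-in for whatever entry class an RSS library provides, and the titles are invented sample data.

```python
from bisect import insort_left

class FeedEntry:
    """Hypothetical stand-in for an entry class from an RSS library."""
    def __init__(self, date, title):
        self.date = date
        self.title = title

def _entry_lt(self, other):
    # Compare entries by date only.
    return self.date < other.date

# The monkey patch: bolt a comparison onto the existing class.
FeedEntry.__lt__ = _entry_lt

# The second slice of four from the main list.
page = [
    FeedEntry(52, "Jello"),
    FeedEntry(58, "Fruit Salad"),
    FeedEntry(60, "Baklava"),
    FeedEntry(75, "Carrot Cake"),
]
# Extra items in no particular order (invented for the example).
extras = [FeedEntry(61, "Picasa picture"), FeedEntry(50, "Code commit")]

merged = list(page)
for entry in extras:
    insort_left(merged, entry)  # keeps merged sorted by date

titles = [e.title for e in merged]
print(titles)
```

For simplicity the sketch assumes the blog entries and the feed items share one class. With two different classes, both would need to expose a date and agree on how to compare against each other.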
For non-programmers, it is probably hard to see the appeal of such flexibility. In fact, it might be expected that there has been this kind of flexibility in programming for a long time. This is not the case.
In C, you get errors or warnings trying to convert figures from decimals to whole numbers. And there is a compiler switch called pedantic to make it even worse.
Python is freeing, as is Ruby, and also JavaScript (which is much better to write when there’s a toolkit like jQuery to help with browser compatibility). It says “if it can work it will work,” which is much different than getting your knuckles rapped for initiating an implicit forced coercion on type int, or something like that.