ICHC Update

Good news: there will be an ICanHasCheezburger app update sometime after the holidays. This isn’t the fabled Three20-based version 2.0, but a minor update that simply changes the feed URL. Even so, that change should fix most of the crashes.

Our application uses a ‘scraped’ feed generated from the site content rather than the site’s own RSS feed. We do this because the standard RSS feed only has a limited number of items and must be loaded in its entirety. By scraping the site, we can split the content into multiple pages with no cap on the total number of items, and each page loads a lot faster since it only contains 10 or fewer items.
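To make the paging idea concrete, here is a rough sketch in Python. It is not the actual feed code: the function name, the 1-based page numbering, and the sample items are all made up for illustration. The only detail taken from above is the page size of 10.

```python
PAGE_SIZE = 10  # from the post: each page holds 10 or fewer items


def feed_page(items, page, page_size=PAGE_SIZE):
    """Return the slice of items for a 1-based page number.

    Hypothetical helper: a page past the end of the list is simply empty.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    return items[start:start + page_size]


# 25 made-up items stand in for scraped posts.
posts = [f"post-{n}" for n in range(1, 26)]

first = feed_page(posts, 1)   # a full page of 10 items
last = feed_page(posts, 3)    # the remaining 5 items
empty = feed_page(posts, 4)   # nothing left, so an empty list
```

The client only ever asks for one small page at a time, so it never has to download and parse the whole list up front.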

The feed scraper was originally developed by the folks at ICHC for their dashboard widget. I later modified it to support all of their sites and change the output from JSON to simplified XML. Unfortunately that feed scraper has a major flaw. It uses regex pattern matching to parse the HTML, which is a Very Bad Thing. The script can easily get confused by changes to the site and often produces invalid data. It also hasn’t included videos since a format change at the site broke it.

Last night I started hacking at it with PHP’s XML parser and DOM commands and came up with a much more robust script that uses element classes to identify valid items and to avoid outputting bad data that could crash the application. I’m also hosting it on my own server at DreamHost, so I can easily fix it myself instead of having to go back and forth for fixes as we do now.
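For anyone curious what "use element classes to identify valid items" looks like, here is a minimal sketch. It is written in Python with the standard library's event-based HTML parser rather than the PHP DOM calls the real script uses, and it is not that script. The class names (post, title, lol), the sample markup, and the output element names are all assumptions invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ItemScraper(HTMLParser):
    """Collect title/image pairs from <div class="post"> blocks (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._depth = 0        # div nesting inside the current post; 0 means outside
        self._current = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            if self._depth:
                self._depth += 1
            elif "post" in classes:
                self._depth = 1
                self._current = {"title": "", "image": None}
        elif self._depth:
            if tag == "h2" and "title" in classes:
                self._in_title = True
            elif tag == "img" and "lol" in classes:
                self._current["image"] = attrs.get("src")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False
        elif tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                item, self._current = self._current, None
                item["title"] = item["title"].strip()
                # Emit only complete items so the app never sees bad data.
                if item["title"] and item["image"]:
                    self.items.append(item)

    def handle_data(self, data):
        if self._in_title and self._current is not None:
            self._current["title"] += data


def to_xml(items):
    """Serialize items as simplified XML; the element names are assumptions."""
    root = ET.Element("items")
    for item in items:
        node = ET.SubElement(root, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "image").text = item["image"]
    return ET.tostring(root, encoding="unicode")


SAMPLE = """
<div class="post"><h2 class="title">I can has?</h2><img class="lol" src="http://example.com/a.jpg"></div>
<div class="post ad"><h2 class="title">No image here</h2></div>
<div class="sidebar"><img class="lol" src="http://example.com/b.jpg"></div>
"""

scraper = ItemScraper()
scraper.feed(SAMPLE)
scraper.close()
xml_out = to_xml(scraper.items)
```

Only the first block survives: the second has no image and the third is not a post at all. Dropping incomplete items at the source is the point, since the app then has nothing malformed to choke on.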

Unfortunately a few Viddler movies still won’t play, since they must be explicitly enabled for downloading on the site.
