I’ve just given Feed43 a go. It’s very nifty.
Basically, it’s a pattern-based HTML-to-RSS scraper — similar to my own Sitescooper in that respect ;) — but built entirely as a web app.
Until now, I’ve been hacking up scrapers one by one, using either Sitescooper or WWW::Mechanize, run from cron, and putting the output up on taint.org; for example, http://taint.org/scraped/ has the public ones: Threadless, Perry Bible Fellowship, and White Ninja comics.
Today, I came across a case where I wanted a new RSS feed and, since I’d been hearing about Feed43, thought I’d give it a try to save running yet another cron job on our server. It was reasonably simple, although it still required a fair bit of knowledge of how scraping via pattern matching against HTML works; but the UI was fantastic, with clean, AJAX-driven previews of everything, and within 3 minutes I had a new feed.
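For anyone who hasn’t done this kind of scraping before, here’s a rough Python sketch of the general idea: match a repeating pattern against the page’s HTML, capture a link and a title for each item, and write the captures out as RSS. This is not Feed43’s own pattern syntax or implementation; the sample HTML, the regex, and the names `scrape_items` and `build_rss` are all made up for illustration.

```python
import html
import re
from xml.sax.saxutils import escape

# Hypothetical sample page; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<div class="post"><h2><a href="/2006/01/first">First post</a></h2></div>
<div class="post"><h2><a href="/2006/01/second">Second &amp; last</a></h2></div>
"""

# One "item pattern": group 1 captures the link, group 2 the title.
ITEM_PATTERN = re.compile(r'<h2><a href="([^"]+)">(.*?)</a></h2>', re.S)


def scrape_items(page, base_url):
    """Return (absolute_link, plain_text_title) pairs found in the page."""
    return [
        (base_url + link, html.unescape(title))
        for link, title in ITEM_PATTERN.findall(page)
    ]


def build_rss(feed_title, feed_link, items):
    """Render the scraped items as a minimal RSS 2.0 document."""
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<rss version="2.0"><channel>',
        f"<title>{escape(feed_title)}</title>",
        f"<link>{escape(feed_link)}</link>",
        f"<description>Scraped from {escape(feed_link)}</description>",
    ]
    for link, title in items:
        parts.append(
            f"<item><title>{escape(title)}</title>"
            f"<link>{escape(link)}</link></item>"
        )
    parts.append("</channel></rss>")
    return "\n".join(parts)


if __name__ == "__main__":
    items = scrape_items(SAMPLE_HTML, "http://example.com")
    print(build_rss("Example feed", "http://example.com/", items))
```

The fragile part is always the pattern: as soon as the site changes its markup, the regex stops matching and the feed goes quiet, which is why a live preview of the matches is so handy.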
For the curious: the feed was for TCAL’s Ireland category, and the results are here: Feed43 (Feed For Free) : TCAL – Ireland. (Go ahead and sign up if you like ;)
New web pattern, by the way: there’s a trend towards using “secret URLs” instead of username/password authentication for “trivial” auth tasks, such as editing feed-scraper details. Good idea.
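Here’s a minimal sketch of how a “secret URL” scheme might work, assuming the server simply stores one random token per feed and treats possession of the URL as permission to edit. I have no idea how Feed43 actually does it; the URL layout and the names `new_edit_token`, `edit_url` and `is_authorized` are hypothetical.

```python
import hmac
import secrets


def new_edit_token():
    """Generate an unguessable token to embed in the edit URL."""
    # 32 random bytes, encoded with URL-safe characters.
    return secrets.token_urlsafe(32)


def edit_url(base, feed_id, token):
    """Build the secret edit link (hypothetical URL layout)."""
    return f"{base}/feeds/{feed_id}/edit?key={token}"


def is_authorized(supplied, stored):
    """Compare the supplied token to the stored one in constant time."""
    return hmac.compare_digest(supplied.encode(), stored.encode())


if __name__ == "__main__":
    token = new_edit_token()
    print(edit_url("http://example.com", 42, token))
    print(is_authorized(token, token))    # the right link works
    print(is_authorized("guess", token))  # a guessed key does not
```

The trade-off is that the URL is the password: anyone who sees it in a referrer header, a proxy log, or a forwarded mail can edit the feed, so it only suits low-stakes tasks like this one.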