Sometimes It’s The Little Things

(stackoverflow rep: 9138, Project Euler 90/274 complete)

From time to time I trawl through my blog subscriptions: some are defunct while others may have changed their feed details sufficiently that they’re no longer being picked up. I have about 270 subscriptions, which makes the job a chore and hence it doesn’t get done very frequently. The upshot is, for the case where the blog hasn’t just died, I sometimes miss something.

What should we do with tedious manual activities? Automate! I went and did some investigation.

Google Reader will, through the “manage subscriptions” link (it’s at the bottom of the subscriptions list in my browser) let you download your details in an XML (more specifically, Outline Processor Markup Language, or OPML) file. It looks like this (heavily snipped to avoid excess tedium):

<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
    <head>
        <title>mikewoodhouse subscriptions in Google Reader</title>
    </head>
    <body>
        <outline title="misc" text="misc">
            <outline text="Grumpy Old Programmer"
                title="Grumpy Old Programmer" type="rss"
                xmlUrl="https://grumpyop.wordpress.com/feed/" htmlUrl="https://grumpyop.wordpress.com"/>
            <outline text="Google Code Blog" title="Google Code Blog"
                type="rss"
                xmlUrl="http://google-code-updates.blogspot.com/atom.xml" htmlUrl="http://googlecode.blogspot.com/"/>
        </outline>
    </body>
</opml>

Ignoring for the moment the beauties of XML, this is pretty simple: there’s an outer “outline” that matches the folder I’ve created in Reader, within which is an outline for each feed to which I’m subscribed.

What I wanted to do is something like this:

  • parse the OPML, extracting the xmlUrl attribute of each feed;
  • download the feed from that URL;
  • scan the entry listing in the feed to find the latest entry date, as a proxy for the last-known activity on that blog;
  • review the blogs that seemed oldest and deadest for update or removal.

Simples!

Well, with a little Googling and not much Rubying, it actually turned out to be so. John Nunemaker’s HappyMapper gem does a quick enough job of the parsing:

require 'happymapper'
module OPML
  class Outline
    include HappyMapper
    tag 'outline'
    attribute :title, String
    attribute :xmlUrl, String
    has_many :outlines, Outline # outlines nest: folders contain feeds
  end
end

sections = OPML::Outline.parse(File.read("google-reader-subscriptions.xml"))
sections.delete_if { |section| section.outlines.size == 0 } # drop the childless (feed-level) outlines, which also appear inside their folders

The delete_if part is there to cater for my parse creating duplicates of the “child” outlines: once in their own right and once within their parent section. I’m pretty sure I’ve seen how to avoid that somewhere, but for now this will do, since all my subscriptions live in folders. It leaves something there for the next iteration.
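One way to sidestep the duplicates is to walk the tree explicitly and ask only for the outlines that sit directly under body. What follows is a minimal sketch using REXML (shipped with Ruby) rather than HappyMapper. The feeds_by_folder helper is a made-up name for illustration, and it assumes, like the code above, that every feed lives in a folder.

```ruby
require 'rexml/document'

# Hypothetical helper: returns { folder title => [[feed title, feed URL], ...] }.
def feeds_by_folder(opml_text)
  doc = REXML::Document.new(opml_text)
  folders = {}
  # Only outlines directly under <body> are treated as folders,
  # so the nested feed outlines are never picked up a second time.
  doc.elements.each('opml/body/outline') do |folder|
    feeds = []
    folder.elements.each('outline') do |feed|
      url = feed.attributes['xmlUrl']
      feeds << [feed.attributes['title'], url] if url
    end
    folders[folder.attributes['title']] = feeds unless feeds.empty?
  end
  folders
end

# Usage, assuming the export file name used above:
#   feeds_by_folder(File.read("google-reader-subscriptions.xml")).each do |folder, feeds|
#     feeds.each { |title, url| puts "#{folder} #{title} #{url}" }
#   end
```

Feeds that sit outside any folder are silently skipped here, which matches the behaviour of the delete_if version.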

And then there’s the spiffy little Feed Normalizer gem, that will parse RSS or Atom agnostically, which is good: I don’t want to have to care.

require 'feed-normalizer'
require 'open-uri'

sections.each do |section|
  section.outlines.each do |feed|
    # URI.open needs Ruby 2.5+; on older Rubies use plain open(feed.xmlUrl)
    list = FeedNormalizer::FeedNormalizer.parse(URI.open(feed.xmlUrl))
    latest = list.entries.map { |entry| entry.date_published }.max
    puts "#{section.title} #{feed.title} #{latest}"
  end
end

Job done.

OK, this is the everything-works-as-expected version, which assumes files will always exist (they won’t), date strings are present and valid (they aren’t), but nobody wants to see a pile of exception- and error-handling code. Or at least, they shouldn’t. Not in a blog post.
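For the final step of the plan (reviewing the oldest and deadest), the results need sorting rather than just printing. A small sketch of that, which also tolerates feeds with no usable date by treating them as the stalest of all. The stalest_first helper and the sample rows are invented for illustration; in the real script each row would be collected inside the loop in place of the puts.

```ruby
# Hypothetical helper, not part of the script above.
# rows: [folder, title, latest] triples, where latest is a Time or nil.
def stalest_first(rows)
  # Feeds with no date sort ahead of everything else.
  rows.sort_by { |_folder, _title, latest| latest ? latest.to_f : -Float::INFINITY }
end

# Made-up sample data standing in for the collected results.
rows = [
  ['misc', 'Still going', Time.utc(2009, 11, 1)],
  ['misc', 'No dates at all', nil],
  ['misc', 'Gone quiet', Time.utc(2007, 3, 15)]
]

stalest_first(rows).each do |folder, title, latest|
  puts "#{folder} #{title} #{latest || 'unknown'}"
end
```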

Wax On, Wax Off, Wax Lirrical?

(stackoverflow rep: 8502, Project Euler 88/266 complete)

One of the presentations at the Stack Overflow London DevDay (and I believe something similar occurred at several of the other events) was an introduction to Python, conducted via a dissection of a classic piece of code developed by Peter Norvig, currently Chief Scientist at Google.

The code is a “toy” spell-check routine and the talk focused on some of the “interesting” aspects of Python that help to make it powerful despite being very concise. List comprehensions, in particular.

What wasn’t so apparent at the time was how the code actually worked, and I wanted to understand it. Plus, in my competitive way, and given that I kind of left Python behind a few years back*, I wondered whether a Ruby implementation could be as terse.

So, obviously enough, given that I’m writing this, I wrote one. I didn’t quite get to Norvig’s 21 lines: Python not needing “end”s gives it a big advantage. If Ruby had significant white space I’d have made it. Actually, if I was prepared to make one really long line out of the Array creation (lines 17-20 below) I’d have squeaked under, but we’re not playing code golf and I balk at 170-character code lines.

There are several other-language implementations listed at the bottom of the page, including a Ruby one. On my WinXP machine, running straight MRI 1.8.6, my version is a gratifying 2.7 times faster, as well as being about 8 lines shorter.

Having got there, and achieved the original purpose of gaining a reasonable understanding of the algorithm in the process, I’m less than wholly satisfied with the outcome. It does the job, but it’s a long way from being idiomatic Ruby or being useful as anything more than a demonstration. Since code kata** seem to be on the menu this month, I may go back and code this up some more different ways to explore the algorithm, the language, the environment and my own personal inadequacies. For one thing, I’m curious to see how I would test-drive the development of the algorithm.

Anyway, FWIW, this is what I have so far…

Alphabet = ('a'..'z').to_a

NWORDS = begin
  words = Hash.new(0)
  File.read('big.txt').downcase.scan(/[a-z]+/).each { |w| words[w] += 1 }
  words
end

def correct(word)
  (known([word]) || known(edits1(word)) || known_edits2(word) || [word]).max { |a, b| NWORDS[a] <=> NWORDS[b] }
end

def edits1(word)
  n = word.size
  (0..n).map do |i|
    a, b = [word[0, i], word[i-n, n-i]]
    [ (a + b[1, n] unless b.empty?),                          # deletion
      (a + b[1, 1] + b[0, 1] + b[2, n] unless b.size < 2),    # transposition
      (Alphabet.map { |c| a + c + b[1, n] } unless b.empty?), # replacement
      Alphabet.map { |c| a + c + b } ]                        # insertion
  end.flatten.compact.uniq
end

def known_edits2(word)
  known(edits1(word).map { |e1| edits1(e1).select { |e2| NWORDS.has_key?(e2) } }.flatten)
end

def known(words)
  list = words.select { |w| NWORDS.has_key?(w) }
  list.size > 0 ? list : false
end

And yes, it does come up with the right correction for the misspelling in the title.
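To see what edits1 is actually building, it can help to pull the four kinds of edit apart and count them. The sketch below is a deliberately longer-winded restatement of the same idea, not a replacement for the code above; edit_kinds and LETTERS are names invented for the illustration.

```ruby
LETTERS = ('a'..'z').to_a

# Hypothetical helper: every string one edit away from word, grouped by kind of edit.
def edit_kinds(word)
  # Every way of cutting the word into a head and a tail, including empty ones.
  splits = (0..word.size).map { |i| [word[0, i], word[i..-1]] }
  with_tail = splits.reject { |_head, tail| tail.empty? }
  {
    deletes:    with_tail.map { |head, tail| head + tail[1..-1] },
    transposes: splits.select { |_head, tail| tail.size > 1 }
                      .map { |head, tail| head + tail[1, 1] + tail[0, 1] + tail[2..-1] },
    replaces:   with_tail.flat_map { |head, tail| LETTERS.map { |c| head + c + tail[1..-1] } },
    inserts:    splits.flat_map { |head, tail| LETTERS.map { |c| head + c + tail } }
  }
end

kinds = edit_kinds('cat')
kinds.each { |kind, words| puts "#{kind}: #{words.size}" }
# deletes: 3, transposes: 2, replaces: 78, inserts: 104
```

For a word of length n that is n deletions, n-1 transpositions, 26n replacements and 26(n+1) insertions before duplicates are removed, which is why the second round of edits gets expensive so quickly.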


* Not Python’s fault – the project where I was using it ceased to be, and most of my work for the last year or so has been web-based, with a choice of Java- or Ruby/Rails frameworks to work with. I grasped the latter like a British MP grasps an expense claim.

** One of Uncle Bob’s favourite kata is prime factoring. There’s a Ruby distillation of such an algorithm at Ben Rady’s blog. I don’t know when I’ve been more tickled by an algorithm. I’d love to see the refactoring steps that got to it.

harrylillis.com would probably have been cheaper

Putting 2 and 2 together, Jeff Atwood appears to have paid* a fairly large (to me) sum to acquire the superuser.com domain. I wonder how much Microsoft paid for bing.com?

I switched my default search engine to Microsoft’s new beta search engine yesterday. Today I switched back to Google. Not that bing was all that bad – to be honest I couldn’t see much difference between what it gave me and what I see from Google. The background picture, which I guessed was of some Greek island yesterday (it’s somewhere different, but similarly attractive today) was certainly pleasant.

The killer was that after Firefox (3.0.10) reported that the page load was “Done” (and the results certainly seemed to be present) there was a delay – during which time FF froze completely – of about 12 seconds, after which my browser shook itself and woke up.

I don't need to search to find Microsoft being annoying...

Bing could be the best search engine in the world ever and I’d still not use it if that delay were present. I can’t believe it occurs for all users on all browsers, but I’m only me and I prefer Firefox. It could be some interaction with one or more of my – fairly standard – plugins. Maybe I’ll try it again when the beta is done.

Your mileage, of course, may vary.

* Unless of course he was tweeting about another one, which is entirely possible.

[Wi|Ga]dgets – what’s the use?

(stackoverflow rep: 2539, Project Euler 47/227 complete)

Gadgets? We don’t need no stinking gadgets!

Microsoft, Yahoo! and Google, to name but three proponents, each have a desktop widget (or gadget, or whatever) model, either providing stuff spread out all over your Windows (are OSX and other *nix environments similarly afflicted?) desktop or tucked away in a “Sidebar”. These handy-dandy little applets provide such vital, otherwise hard-to-get functions as:

  • The weather (for those not in sight of a window?);
  • RSS feeds (for those who want to read their feeds in a 2″x1″ window);
  • An e-mail notifier (because you still don’t understand the meaning of ‘asynchronous’);
  • The time (for those who can’t read the little digital clock on their taskbar and who don’t own a watch);
  • A calculator (because the world needs another computer desktop calculator);
  • Resource usage monitors (so you can tell your computer isn’t running slow);
  • A mediocre controller for your media player (because media players don’t have minimised controls … oh, yes they do);
  • Teeny-tiny picture viewers (for when flickr is just too much detail);
  • Out-of-date stock tickers (so you can see in delayed real-time how much poorer you are today);

… you get the idea.

Even better, all these are neatly tucked away on a “sidebar” (a sort of non-window window) or even better, spread out all over the shop, which is exactly what you want on a two- or three-screen setup. And they’re all handily concealed beneath the applications you’re running at any time. You know, those stupid time-wasters like Outlook, Excel, Firefox, Word, Visual Studio, a couple of Explorers and a database utility or two.

Marvellous.

What we seem to have here is a collection of (sometimes) graphically pleasing little applications that deliver functionality available elsewhere, each having one or more of the following drawbacks:

  • Always-present, seldom needed;
  • Inadequate functionality;
  • Duplicate of something that’s already fit-for-purpose;
  • Pointless eye-candy;
  • Invisible.

Unloved

Looking at Yahoo!’s programming category, I find that the most popular has been downloaded 80,000 times. It’s a widget that performs geolocation for a given IP address. With a flag. I’m trying to imagine a situation where I (or anyone) would need that often enough to abandon a browser-based function, opting for a desktop-resident applet against a “proper” application because I don’t need it that much. I don’t exactly see the “programming” connection either, come to think.

Yahoo programming widget downloads as at 12-Jan-2009

Data as at 12-Jan-2009

I’m probably not being fair – I thought the most useful stuff would be written for programmers. What does it look like overall? I can’t tell much from Google’s list because they don’t give download stats, although sorting by popularity shows the expected four C’s (clock, climate, calculator, calendar). So back to Yahoo! where the current Number One, with a snappy 4.5 million downloads, is Yahoo! Weather. In fact, as I write this, a whopping six widgets have passed the million mark. Only two are clocks.

Oh look - Microsoft Sidebar does weather too

The highlights of a quick-and-dirty breakdown of the top 100 are 19 fun-and-games, 15 system monitor thingies, 14 clocks, 9 calendars, 8 media players, 7 weather reports, 6 post-it notes, 5 gold rings. Very similar to the Google list. I was too depressed to look at Microsoft’s in any detail, but it’s the same ol’ same ol’, although their weather widget claims over 22 million downloads. I’m guessing it’s downloaded automatically when a Vista PC connects to the Internet…

Number one Google gadget - because you can never have too many clocks

I’m not getting it. Looks like many others aren’t either. The number of people who come back to provide a rating is minuscule: about 7,000 for the weather app, or about 0.16% of its downloads.

Apart from the shocking paucity of imagination in the applets themselves, what’s wrong with the whole idea? (IMHO, of course, YMMV).

Real estate is Precious

There are people out there who have enough monitors to be able to allocate space for widgets. Two 1280×1024 screens isn’t enough for me though. Utility drops to almost zero if the things aren’t always available. Google make things worse by allowing applets to live anywhere on the desktop. Stuff needs to go in a sidebar that manages window maximisation to keep itself visible. So it needs to be on the right- or left-most monitor, unless you’re prepared to give up on dual-screen workbooks. For the average user, the bar needs to be a lot narrower than at present to be tolerable. Somewhere between, say, 25 and 50 pixels? I could live with that.

Useful vs Pretty

Useful doesn’t always win. Useful-but-ugly often doesn’t get past “Go”, whereas Pretty at least gets a chance. Long-term, it’s got to have both: too much of what’s on offer seems to be limited to pretty useless.

To offer a compelling argument against rapid deletion, applets have to either provide something in a better way than is currently available or provide something that isn’t available at all elsewhere. Example: there is a Ruby script that allows, from the command line, simple copying of files to an Amazon S3 bucket. Useful. Maybe we could have a widget (maybe it already exists, but I couldn’t find one) that allows upload via drag-and-drop from Explorer. That would be better than what’s already available. Something I can’t do at all except via cut and paste is store stuff in Google Notebook. A drag-and-drop gadget to simplify that would be providing something I can’t do at all.

Conclusion

I don’t know that I have one. There are several similar implementations of a desktop XML/JavaScript applet technology that has a lot of money invested in it. Well, I don’t know how much exactly, but I bet I’d be a happy old programmer if you’d given it all to me instead. And you might as well have done exactly that, for all the benefit mankind appears to be accruing. It ought to be good for something, oughtn’t it?

Mmmmm, Shiny!

Browsers++, eh? Google have launched their browser, in beta form at least. Of course, “beta” for Google doesn’t always mean what it means for others – is Gmail still in beta, by the way?

Anyway, ever ready to while away half an hour of work time looking at something new, off I went to the download page. A smallish (475KB) bootstrapper pulled down the actual installer, managing to find the necessary information about our somewhat complex Monte-Carlo proxy-server load-balancing script without grief (presumably by digging into the IE or Firefox connection settings) and ran. Pretty pain-free, apart from this:

Getting warm, getting warmer, oops!

Ah well, it is a beta, after all. And it appears that the crash may have occurred at the run-after-install bit, since by the time it happened I had a desktop icon that seems – touch wood – to work.

A little detail that I really appreciated was that the install option page included a setting to make Chrome my default browser but it was unchecked by default. Nice one.

And then it just mostly worked. Some minor issues with font sizes, which seemed to randomly apply changes across tabs when I zoomed in or out using Control-+/- or Control-mousewheel, but otherwise my regular stuff all seemed to render pretty well, internal or external.

It appears that the Chrome rendering engine shares the same standards book as Firefox’s – both render our IE-specific corporate intranet home page with the same set of “errors”. I tried looking to see what in the CSS was causing the problem but my limited skills weren’t up to the task. But while I was searching the source, which on a right-click/”View Source” request opens in a new browser window, which is nice, I discovered something nice. Nothing earth-shattering, but nice. I hit Control-F, which did what I expected, typed a few characters and the page was scrolled to the first found instance. As expected. Then I noticed something.

Score points for attention to detail

See what they did? No? Look at the vertical scroll bar. That’s a really nice touch. I like the way the “what to find” box organically grows from the surround, and the animation is smooth, too. I suppose they could have made it slightly bouncy, in the way that Flash apps seem to like to work these days, although that can make one a little nauseous when over-done.

Oh, and another little plus on the view-source-in-the-browser thing is that links to, for example, stylesheets, are navigable. That removes a tiny piece of Firefox excise that I didn’t previously even know existed.

I’m sure there are all sorts of other little things. The address-bar within each tab may prove to be a boon, and the process-per-tab thing could be useful, although I can’t say crashing ranks very highly on my list of browser annoyances. We’ll have to keep sucking it to see.

I won’t be deleting Chrome. Neither will it be elevated to the status of default browser in the short term – I’m far too fond of my little set of FF add-ins. When Chrome has features that give me the capability provided by, at least, AdBlock, Firebug and Greasemonkey then we may be in business. But I think it’s going to be something of an uphill struggle until something really compelling and unique is offered. The thing is, Windows users who cared have already switched from IE to (mostly) Firefox and I don’t see a reason, other than possibly the bleeding-edge coolness, to change again.

At least, I don’t see the compelling reason to switch yet.