Posts Tagged ‘Ruby’
Round-tripper
(stackoverflow rep: 3856, Project Euler 63/235 complete)
Good grief. I wrote the first draft of this about a month ago, planning on completing and posting it when the code was done. I expected that to take a few more days. A little more work required on the estimating front, then.
I’m starting to go off Oracle.
Let me put that into context a little. I first encountered Oracle some time in 1988, when version 5 was all the rage. I’d actually taught data analysis, third and fifth normal form, stuff like that for a few years previously but actual hands-on table creation had to wait. Strange but true. Anyway, over the next two or three years, some of which I spent as “Technical Architect” for the investment bank where I worked, I got to be something of a whiz with both version 5 and the swanky new version 6. Heck, I know the query optimiser’s rules off by heart. I’m not just blowing my own trumpet, mind: when I was untimately (fortuitous typo retained) laid off, I was offered a job by Oracle, which I rejected because I didn’t want to take a pay cut.
I spent five more years in Oracle-land with another bank before drifting into the realms of Sybase in its early MS SQL Server guise, and then Sybase itself across three jobs and four years (it seems like longer). Now, fourteen years after we parted company, Oracle and I are back together.
But we’ve both changed. I no longer code in COBOL and have acquired a pathological dislike of business logic in the database. Oracle has a cost-based optimiser, loves to grab all your business rules (more processors = more revenue) and has become a fat bloated porcine creation. Even the free “personal” 10g Express Edition for Windows is a 165MB download. (OK, SQL Server Express 2008 is even larger, I checked). When running, the thing takes out a 642MB virtual machine. OK, it’s almost entirely swapped out, but still.

How we did parallel processing in the old days
But Oracle is still a helluva fast platform. Unoptimised I was seeing about 8K inserts a minute on my development PC, three times that on a real server. Unfortunately our db server currently lives abroad for tax reasons (or something) and the network latency is fierce. About 900 inserts a minute fierce. So I needed to batch up my inserts or enter the living hell that is SQL Loader.
In order to get multiple insert processes working within my Ruby On Rails-based code, I split each file into several fragments, then run a separate process on each fragment. This takes a bit of doing, generating lots of CMD files that run ruby scripts with “START [/WAIT] CMD /C whatever_I_want_goes_here”.
My file-splitting code, I thought, was rather spiffy – it needs to copy the headings from the original into each fragment (because they’re used to figure out what’s in the file) then it starts dealing out the records:
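def create_fragment_files(paths)
  # file_path is not defined in this snippet; presumably it belongs to the enclosing class
  File.open(file_path, 'r') do |fin|
    hdgs = fin.readline.chomp
    files = paths.map { |path| File.open(path, 'w+') }
    # every fragment starts with a copy of the heading line
    files.each { |fout| fout.puts hdgs }
    # deal the records out round-robin, one line per fragment in turn
    fin.each_line do |line|
      files.first.puts line
      files.push files.shift # the first shall be last...
    end
    files.each { |fout| fout.close }
  end
end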
There are faster ways, I’m sure – I could calculate the “ideal” file size and dump records into a file until it’s reached, but this is fast enough (well under a minute for an 85MB file) and it pleases me.
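For comparison, here is a minimal sketch of that size-based alternative. It is not the code I actually ran: split_by_size and its parameter names are made up for illustration, and the "ideal" size is simply the source file's byte count divided by the number of fragments. Records are kept whole, so fragments only come out roughly equal.

```ruby
# Sketch only: split_by_size and its parameters are hypothetical names,
# not part of the original code.
def split_by_size(source_path, fragment_paths)
  # Aim for roughly equal byte counts per fragment.
  target = (File.size(source_path) / fragment_paths.size.to_f).ceil
  last = fragment_paths.size - 1
  File.open(source_path, 'r') do |fin|
    hdgs = fin.readline.chomp
    fragment_paths.each_with_index do |path, i|
      File.open(path, 'w') do |fout|
        fout.puts hdgs
        written = 0
        # Fill this fragment until the target is reached; the last
        # fragment takes whatever is left over.
        while (i == last || written < target) && (line = fin.gets)
          fout.write line
          written += line.bytesize
        end
      end
    end
  end
end
```

Unlike the round-robin version, this keeps neighbouring records together in the same fragment, which may or may not matter for the load.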
There’s a handy little library, ar-extensions, that makes batching of inserts possible within ActiveRecord (which is the default data mapping library within Rails). It works nicely with MySQL, but turned out to have the Oracle code stubbed and invalid. It only took me a day or two to find a solution to that problem, although I still haven’t figured out how to push an update through a proxy server to github. Finally a chance to do something open sourceful, and I’m thwarted at every turn.
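To see why batching matters: 900 inserts a minute is roughly 67 ms per row, so if the network round trip dominates, sending many rows per statement divides that cost by the batch size. The sketch below shows the general idea using Oracle's multi-row INSERT ALL form. It is an illustration only, not what ar-extensions or my fix generates: insert_all_sql, the readings table and its columns are invented, and the quoting is deliberately naive.

```ruby
# Sketch only: insert_all_sql, the table and the column names are hypothetical,
# and the quoting is deliberately naive (illustration, not production code).
def insert_all_sql(table, columns, rows)
  cols = columns.join(', ')
  intos = rows.map do |row|
    values = row.map do |v|
      v.is_a?(Numeric) ? v.to_s : "'#{v.to_s.gsub("'", "''")}'"
    end
    "INTO #{table} (#{cols}) VALUES (#{values.join(', ')})"
  end
  "INSERT ALL #{intos.join(' ')} SELECT * FROM dual"
end

rows = [[1, 'alpha'], [2, 'beta'], [3, "o'clock"]]
# Two rows per statement here; a real batch would be much larger.
statements = rows.each_slice(2).map do |batch|
  insert_all_sql('readings', %w[id label], batch)
end
puts statements
```

Each string in statements would then go to the database in a single round trip.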
So all in all, it’s taken a month. OK, a month in which a lot of other stuff got done, but still. On the plus side, I just fired it up and I’m watching about 36,000 inserts a minute go through. It’ll be faster when the lookup tables are fully populated. (Another day on, and I’m looking at it: 46,000 – and I still have a few tricks up my sleeve.)
While the nearly two years’ worth of data is backfilling, I now get to rewrite the front end.
And the point of this post? In no small part, to remind me of what I actually spent the lion’s share of the last month doing. Also, to record my first-ever open-source contribution, even if I still haven’t worked out how to get my source out into the open.
If you have been, thanks for your forbearance.
Route 55 (and Route 19)
(Stackoverflow reputation down to 2232 after they cleaned up some over-voted stuff from the early days, sniff)
While I prevaricate over all kinds of things, including a redesign of the xlUnit interface, I have been enjoying Michael’s series of articles on Project Euler solutions in VBA, posted at Daily Dose of Excel. There are some ingenious solutions to long-standing VB/VBA deficiencies, not least the absence of built-in facilities for handling arbitrarily large numbers beyond double precision variables.
I have an abiding fondness for the “classic” VB family – not least because it was the skill that fed and housed me and my family for a good 15 years or more. But boy, it can look a bit tired these days.
As it happens, I’d been taking a few shots at the Euler problems myself, but in Ruby, since that’s my language of choice these days (not least because much of my working day is currently spent working on intranet applications using Rails). So it was interesting to compare the two.
Let’s take problem 55. I took Michael’s code (with the neat little large number AddAsStrings routine) into Excel on my whizz-bang dual-dual-Xeon machine and it solved the problem in 0.554 seconds, which, considering the amount of string-based arithmetic that’s going on, is a testament to the speed of modern PCs.
Below is my Ruby version, which takes a rather different approach. Firstly, in Ruby we have support for arbitrarily large numbers, via the built-in Bignum class, so the string adding business is taken care of. Secondly, classes in Ruby, even compiled standard ones, are open to modification, via a technique colloquially known as monkey-patching. So I could patch in a method directly to the Integer class, which seems appropriate, since we’re looking for a property of the number.
Here’s the code:
class Integer
  def lychrel?(max = 50)
    temp = self
    max.times do
      temp = temp + temp.to_s.reverse.to_i
      return false if temp.to_s == temp.to_s.reverse
    end
    true
  end
end
puts((1..9999).inject(0) { |t, i| t + (i.lychrel? ? 1 : 0) })
That took 0.410 seconds on the same machine. I can see at least one inefficiency: calling to_s twice on the same number, which is expensive.
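A minimal sketch of how that could be tightened: convert to a string once per iteration and carry the reversed string forward, so each pass does one to_s, one reverse and one to_i. The method name lychrel_candidate? is made up so that it doesn't clash with the version above, and I haven't timed it, so treat any speed-up as a guess.

```ruby
# Sketch only: lychrel_candidate? is a hypothetical name chosen so it does
# not clash with the lychrel? method above.
class Integer
  def lychrel_candidate?(max = 50)
    temp = self
    reversed = to_s.reverse
    max.times do
      temp += reversed.to_i
      digits = temp.to_s          # the only to_s in this iteration
      reversed = digits.reverse   # reused by the next reverse-and-add
      return false if digits == reversed
    end
    true
  end
end

puts((1..9999).count(&:lychrel_candidate?))
```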
On the other hand, VBA has the edge on problem 19. I spotted a little optimisation in Michael’s code, which gave me the VBA below. At 0.0019 seconds it is about 20 times faster than the original:
Dim Start As Date
Dim Answer As Long
Start = DateSerial(1901, 1, 1)
Do While Start < DateSerial(2001, 1, 1)
    If Weekday(Start) = vbSunday Then
        Answer = Answer + 1
    End If
    Start = DateSerial(Year(Start), Month(Start) + 1, 1)
Loop
Debug.Print Answer
Ruby’s Date class doesn’t do anything clever with months outside the 1 to 12 range, so I had to inject a little logic, but otherwise we’re pretty much in synch, algorithmically:
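require 'date'

res = 0 # count of months that start on a Sunday
d = Date.new(1901, 1, 1)
end_date = Date.new(2000, 12, 31)
until d > end_date do
  res += 1 if d.cwday == 7 # cwday runs Monday = 1 to Sunday = 7
  y, m = d.year, d.month + 1
  y, m = y + 1, 1 if m > 12 # roll over into January of the next year
  d = Date.new(y, m, 1)
end
puts res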
Not much in it lines-of-code-wise, as you’d probably expect, but the Ruby code takes about 0.36 seconds, which is a hell of a difference.
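For what it’s worth, the standard library does offer month arithmetic of a different kind: Date#>> shifts a date forward by a number of months and handles the year rollover itself. Here is a minimal sketch of the same loop using it, along with Date#sunday?. The variable names are mine, and I make no claim that it closes the speed gap.

```ruby
require 'date'

# Sketch only: the same first-of-the-month walk, with Date#>> doing the
# month and year rollover.
d = Date.new(1901, 1, 1)
end_date = Date.new(2000, 12, 31)
sundays = 0
until d > end_date
  sundays += 1 if d.sunday?
  d = d >> 1 # first of the following month
end
puts sundays
```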
Call it one-all for now.
Whither (Wither?) VBA?
Considering the origins of the various Office components (and a light bulb may be flickering dimly in the minds of those who can remember back that far) we’ve come a long way in the application automation stakes. Can you remember Access Basic? WordBasic? Excel macros pre-VBA? Did early versions of Outlook have macros?
Office application macro capabilities have come a long way but they’ve been pretty much stuck at the last version of the VB6 runtime. That’s about a decade with no significant change. In that time Microsoft have hammered their way up to version 3.5 of .NET, but with only half-arsed (my opinion) gestures made towards improving/extending/renewing the internal automation aspects of Office.
I say “half-arsed”, which, I dunno, might be a little harsh, but the whole VSTO thing just seems like a thin wrapper on COM Interop, which is itself a wrapper to permit communication between shiny new .NET code and skanky old legacy stuff. Why do I need to use VSTO at all? If I need complex, compiled high-performance extensions then I’m probably better off getting one of my C++ literate colleagues to write a “proper” XLL add-in that won’t have to deal with COM at all. If I don’t need high-performance then any scripting language that can talk to COM will do the job. Heck, I can use Ruby (and do) – David Mullett has a whole blog on the topic of Windows automation with Ruby.
Microsoft want to get away from VBA, I think that’s clear. They’re never, never, never going to get the current host of non-technical VBA users to switch to VSTO. Forget it, it’s not going to happen. Hell, I don’t want to have to use VSTO and I’m one who should benefit from raising the cost of entry to macro programming. Do MS want to get away from COM? Maybe. They wanted to get away from DDE too, but it’s still lurking somewhere not too deep inside Windows.
But here we have the Dynamic Language Runtime, which sits on top of the .NET CLR and allows fun things such as IronPython, IronRuby and others. Snappy performance, ability to use .NET libraries, interoperability between languages, sounds like fun. According to Wikipedia, the plan is to move VB to run on it. Now there’s a thought: why shouldn’t Excel be rebuilt in sexy modern managed code, with VBA ported to the DLR and the old COM interfaces reduced to a shim to keep backwards compatibility? Then we’d have macro programming where it should be, in the application, with the billions of lines of legacy code still runnable, and I’d be able to hit Alt-F11 and work in Ruby.
Seems like a win-win scenario to me.
