Grumpy Old Programmer

He’s old, he’s a programmer and he’s grumpy

Posts Tagged ‘Ruby’

Round-tripper

(stackoverflow rep: 3856, Project Euler 63/235 complete)

Good grief. I wrote the first draft of this about a month ago, planning on completing and posting it when the code was done. I expected that to take a few more days. A little more work required on the estimating front, then.

I’m starting to go off Oracle.

Let me put that into context a little. I first encountered Oracle some time in 1988, when version 5 was all the rage. I’d actually taught data analysis, third and fifth normal form, stuff like that, for a few years previously, but actual hands-on table creation had to wait. Strange but true. Anyway, over the next two or three years, some of which I spent as “Technical Architect” for the investment bank where I worked, I got to be something of a whiz with both version 5 and the swanky new version 6. Heck, I knew the query optimiser’s rules off by heart. I’m not just blowing my own trumpet, mind: when I was untimately (fortuitous typo retained) laid off, I was offered a job by Oracle, which I rejected because I didn’t want to take a pay cut.

I spent five more years in Oracle-land with another bank before drifting into the realms of Sybase in its early MS SQL Server guise, and then Sybase itself across three jobs and four years (it seems like longer). Now, fourteen years after we parted company, Oracle and I are back together.

But we’ve both changed. I no longer code in COBOL and have acquired a pathological dislike of business logic in the database. Oracle has a cost-based optimiser, loves to grab all your business rules (more processors = more revenue) and has become a fat, bloated, porcine creation. Even the free “personal” 10g Express Edition for Windows is a 165MB download (OK, SQL Server Express 2008 is even larger; I checked). When running, the thing takes out 642MB of virtual memory. OK, it’s almost entirely swapped out, but still.

How we did parallel processing in the old days

But Oracle is still a helluva fast platform. Unoptimised, I was seeing about 8K inserts a minute on my development PC, and three times that on a real server. Unfortunately our db server currently lives abroad for tax reasons (or something) and the network latency is fierce. About 900 inserts a minute fierce. So I needed to batch up my inserts or enter the living hell that is SQL Loader.

In order to get multiple insert processes working within my Ruby on Rails-based code, I split each file into several fragments, then run a separate process on each fragment. This takes a bit of doing, generating lots of CMD files that run Ruby scripts with “START [/WAIT] CMD /C whatever_I_want_goes_here”.
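A minimal sketch of how such a launcher file might be generated. The helper names, the script name `load_fragment.rb` and the fragment file names are all illustrative assumptions; none of them come from the original code:

```ruby
# Hypothetical helper: build one START line per fragment so that each
# fragment is processed by its own ruby process in its own CMD window.
def launcher_lines(script, fragment_paths, wait = false)
  flag = wait ? '/WAIT ' : ''
  fragment_paths.map do |path|
    # The path is quoted in case it contains spaces.
    %(START #{flag}CMD /C ruby #{script} "#{path}")
  end
end

# Hypothetical helper: write the generated lines out as a .cmd file.
def write_launcher(cmd_path, script, fragment_paths, wait = false)
  File.open(cmd_path, 'w') do |f|
    f.puts launcher_lines(script, fragment_paths, wait)
  end
end

puts launcher_lines('load_fragment.rb', ['part1.csv', 'part2.csv'])
```

Without /WAIT, START returns immediately, so the fragments run side by side; with it, each line blocks until its process finishes.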

My file-splitting code, I thought, was rather spiffy – it needs to put the headings from the original into each fragment (because they’re used to figure out what’s in the file) then it starts dealing out the records:

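# Deals the records of the source file out, round-robin, to the fragment files.
# file_path (the source file) is assumed to be defined elsewhere, e.g. as an
# attribute of the enclosing class.
def create_fragment_files(paths)
  File.open(file_path, 'r') do |fin|
    hdgs = fin.readline.chomp
    files = paths.map { |path| File.open(path, 'w+') }
    files.each { |fout| fout.puts hdgs } # every fragment gets the headings
    fin.each_line do |line|
      files.first.puts line
      files.push files.shift # the first shall be last...
    end
    files.each { |fout| fout.close }
  end
end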

There are faster ways, I’m sure – I could calculate the “ideal” file size and dump records into a file until it’s reached, but this is fast enough (well under a minute for an 85MB file) and it pleases me.
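For comparison, the size-based alternative mentioned above might look something like the sketch below. It is untimed, so whether it is actually faster is an open question, and the method name `split_by_size` and its parameters are illustrative rather than taken from the original code:

```ruby
# Hypothetical alternative: fill each fragment up to a target byte count
# instead of dealing lines out round-robin.
def split_by_size(source_path, fragment_paths)
  # "Ideal" fragment size: total bytes shared evenly between the fragments.
  target = (File.size(source_path).to_f / fragment_paths.size).ceil
  File.open(source_path, 'r') do |fin|
    headings = fin.readline
    remaining = fragment_paths.dup
    until remaining.empty?
      path = remaining.shift
      File.open(path, 'w') do |fout|
        fout.write headings # every fragment still needs the headings
        written = 0
        # The last fragment takes whatever is left, whatever its size.
        while (remaining.empty? || written < target) && (line = fin.gets)
          fout.write line
          written += line.bytesize
        end
      end
    end
  end
end
```

One difference from the round-robin version: records stay in their original order within each fragment, and the fragments concatenate back into the original file.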

There’s a handy little library, ar-extensions, that makes batching of inserts possible within ActiveRecord (which is the default data mapping library within Rails). It works nicely with MySQL, but turned out to have the Oracle code stubbed and invalid. It only took me a day or two to find a solution to that problem, although I still haven’t figured out how to push an update through a proxy server to GitHub. Finally a chance to do something open sourceful, and I’m thwarted at every turn.
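The general idea behind batching can be sketched without ActiveRecord at all. What follows is an illustration under stated assumptions, not the fix that went into ar-extensions (the post doesn’t say what that was): it uses Oracle’s multi-row INSERT ALL ... SELECT * FROM dual form to send many rows in one round trip. The helper names, the table and the columns are all hypothetical, and the literal quoting is deliberately naive; real code would want bind variables.

```ruby
# Hypothetical helper: render a Ruby value as a SQL literal (naive quoting).
def sql_literal(value)
  case value
  when nil     then 'NULL'
  when Numeric then value.to_s
  else "'#{value.to_s.gsub("'", "''")}'"
  end
end

# Hypothetical helper: one INSERT ALL statement covering a whole batch of rows.
def insert_all_sql(table, columns, rows)
  col_list = columns.join(', ')
  intos = rows.map do |row|
    values = row.map { |v| sql_literal(v) }.join(', ')
    "  INTO #{table} (#{col_list}) VALUES (#{values})"
  end
  "INSERT ALL\n#{intos.join("\n")}\nSELECT * FROM dual"
end

# Slice the rows into batches: one statement (one round trip) per batch.
def batched_statements(table, columns, rows, batch_size = 100)
  rows.each_slice(batch_size).map do |batch|
    insert_all_sql(table, columns, batch)
  end
end

puts batched_statements('trades', %w[id name], [[1, 'a'], [2, "O'Brien"]], 2)
```

When latency dominates, the cost per row falls roughly in proportion to the batch size, which is why batching pays off on a remote server.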

So all in all, it’s taken a month. OK, a month in which a lot of other stuff got done, but still. On the plus side, I just fired it up and I’m watching about 36,000 inserts a minute go through. It’ll be faster when the lookup tables are fully populated. (Another day on, and I’m looking at it: 46,000 – and I still have a few tricks up my sleeve.)

While the nearly two years of data is backfilling I now get to rewrite the front end.

And the point of this post? In no small part, to remind me of what I actually spent the lion’s share of the last month doing. Also, to record my first-ever open-source contribution, even if I still haven’t worked out how to get my source out into the open.

If you have been, thanks for your forbearance.

Written by mikewoodhouse

12 March 2009 at 13:14

Route 55 (and Route 19)

(Stackoverflow reputation down to 2232 after they cleaned up some over-voted stuff from the early days, sniff)

While I procrastinate over all kinds of things, including a redesign of the xlUnit interface, I have been enjoying Michael’s series of articles on Project Euler solutions in VBA, posted at Daily Dose of Excel. There are some ingenious solutions to long-standing VB/VBA deficiencies, not least the absence of built-in facilities for handling arbitrarily large numbers beyond double precision variables.

I have an abiding fondness for the “classic” VB family – not least because it was the skill that fed and housed me and my family for a good 15 years or more. But boy, it can look a bit tired these days.

As it happens, I’d been taking a few shots at the Euler problems myself, but in Ruby, since that’s my language of choice these days (not least because much of my working day is currently spent working on intranet applications using Rails). So it was interesting to compare the two.

Let’s take problem 55. I took Michael’s code (with the neat little large number AddAsStrings routine) into Excel on my whizz-bang dual-dual-Xeon machine and it solved the problem in 0.554 seconds, which, considering the amount of string-based arithmetic that’s going on, is a testament to the speed of modern PCs.

Below is my Ruby version, which takes a rather different approach. Firstly, in Ruby we have support for arbitrarily large numbers, via the built-in Bignum class, so the string adding business is taken care of. Secondly, classes in Ruby, even compiled standard ones, are open to modification, via a technique colloquially known as monkey-patching. So I could patch in a method directly to the Integer class, which seems appropriate, since we’re looking for a property of the number.

Here’s the code:

class Integer
  def lychrel?(max = 50)
    temp = self
    max.times do
      temp = temp + temp.to_s.reverse.to_i
      return false if temp.to_s == temp.to_s.reverse
    end
    true
  end
end
puts (1..9999).inject(0) { |t, i| t + (i.lychrel? ? 1 : 0) }

That took 0.410 seconds on the same machine. I can see at least one inefficiency: calling to_s twice on the same number, which is expensive.
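One way to remove that duplication is to hold on to the string form and reuse it for both the palindrome test and the next reversal. The sketch below uses a different method name (`lychrel_cached?`, which is illustrative and not from the original) so the two versions can be compared. It has not been timed here, so any speed-up is unproven:

```ruby
class Integer
  # Same algorithm as lychrel?, but with a single to_s call per iteration.
  def lychrel_cached?(max = 50)
    temp = self
    digits = temp.to_s
    max.times do
      temp += digits.reverse.to_i
      digits = temp.to_s # converted once, used twice
      return false if digits == digits.reverse
    end
    true
  end
end

puts (1..9999).select { |i| i.lychrel_cached? }.size
```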

On the other hand, VBA has the edge on problem 19. I spotted a little optimisation in Michael’s code, which gave me the VBA below. At 0.0019 seconds, it is about 20 times faster than the original:

Dim Start   As Date
Dim Answer  As Long
Start = DateSerial(1901, 1, 1)
Do While Start < DateSerial(2001, 1, 1)
   If Weekday(Start) = vbSunday Then
      Answer = Answer + 1
   End If
   Start = DateSerial(Year(Start), Month(Start) + 1, 1)
Loop
Debug.Print Answer

Ruby’s Date class doesn’t do anything clever with months outside the 1 to 12 range, so I had to inject a little logic, but otherwise we’re pretty much in sync, algorithmically:

require 'date'

res = 0 # count of months that start on a Sunday
d = Date.new(1901,1,1)
end_date = Date.new(2000,12,31)
until d > end_date do
  res += 1 if d.cwday == 7 # cwday runs from 1 (Monday) to 7 (Sunday)
  y, m = d.year, d.month + 1
  y, m = y + 1, 1 if m > 12
  d = Date.new(y, m, 1)
end
puts res

Not much in it lines-of-code-wise, as you’d probably expect, but the Ruby code takes about 0.36 seconds, which is a hell of a difference.
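As an aside on the month arithmetic: the standard library’s Date#>> operator shifts a date forward by whole months and handles the year rollover itself, so the manual logic in the Ruby version could be dropped. A sketch, with a method name (`count_first_sundays`) that is illustrative rather than from the original, and with no claim that it runs any faster:

```ruby
require 'date'

# Count the months in [from, to] whose first day is a Sunday.
# Assumes from is the first day of a month.
def count_first_sundays(from, to)
  count = 0
  d = from
  until d > to
    count += 1 if d.sunday?
    d >>= 1 # Date#>> moves forward one month, rolling the year over as needed
  end
  count
end

puts count_first_sundays(Date.new(1901, 1, 1), Date.new(2000, 12, 31))
```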

Call it one-all for now.

Written by mikewoodhouse

2 January 2009 at 20:01

Posted in Excel, Ruby
