Subscribe to
Posts
Comments

ActiveRecord Enumerable

Ever wanted to iterate over a huge number of objects and found
that find(:all) doesn’t cut it? Then this plugin
might be for you. Install it with

$ script/plugin install svn://rubyforge.org/var/svn/ar-enumerable/ar_enumerable/trunk

First, a note of caution. This plugin has not seen much practical
use yet, in its current state it is only slightly above a
proof-of-concept. There are several tests included that cover most
of the functionality. If you intend to use this plugin, please
make sure to independently check that it does what you want.

Enumerating ActiveRecord objects

Let’s start with a boring example, to get the general idea

Person.where(['lastname = ?', 'Smith']).map(&:firstname)

Slightly more interesting

Person.where(['lastname = ?', 'Smith']).inject(0) do |income, person|
  income += person.income
end

siblings = Person.where(
    ['lastname = ?', 'Smith'],
    :include => :siblings).inject([0, 0]) do |memo, person|
  memo[0] += person.siblings.count
  memo[1] += 1
  memo
end
siblings[0].to_f / siblings[1]

The point here is that your database may contain millions of Smiths,
for what it’s worth, the entire population of the earth could have
turned into Smiths, still, only a limited number of them would be
loaded into memory at the same time.

Internally, enumeration works in two completely different ways.
By default, a chunk of objects is retrieved from the database using
OFFSET and LIMIT clauses. This way is supported by all databases,
however it can be slow.
Therefore, the preferred alternative is to use a standard SQL cursor
for iterating over the result set of a query. It would simply look like
this

Person.where(['lastname = ?', 'Smith'], :use_cursor => true).inject(0) do |income, person|
  income += person.income
end

A further parameter is how many records (or rows, in the case of cursors)
are fetched from the database in one access. The default is 1000. You can
set it explicitly like this

Person.where(['lastname = ?', 'Smith'], :fetch_count => 100).inject(0) do |income, person|
  income += person.income
end

Performance Considerations

Some informal tests suggest that for large result sets without
included associations cursor-based enumeration can be faster than
OFFSET/LIMIT-based enumeration by a factor of 3 to 5.
When associations are included, things become more blurry. In this
case, cursor-based enumeration imposes an ordering on the result
set that can slow down things considerably.
The fetch count has considerable influence on performance, basically,
you can buy speed by spending memory.

8 Responses to “ActiveRecord Enumerable”

  1. on 12 Feb 2007 at 23:18Daniel Schierbeck

    Just a reminder, you don’t need to write the equal sign in the following:

    Person.where(['lastname = ?', 'Smith']).inject(0) do |income, person|
    income += person.income
    end

    Since the *value* of the block is what’s being stored in ‘income’. So the example should look like this:

    Person.where(['lastname = ?', 'Smith']).inject(0) do |income, person|
    income + person.income
    end

  2. on 13 Feb 2007 at 00:37marcello

    thanks michael!

  3. on 13 Feb 2007 at 20:17Drew

    Michael -

    This seems like it would be very helpful for a data validation suite I’m attempting to write using ActiveRecord. A couple of questions for you:

    1. Can I install this plugin outside of the rails framework? I’m currently only using the ActiveRecord gem in my work.

    2. Is there support for manually controlling iteration across the objects? For example, during data validation I’d like to manually compare records from two tables (both of which I’d like to access using your plugin, as they are quite large) and then in some case move to the next object in either 1 or both of the data sets. I need to manually control when the next object is yielded and need to iterate across two tables at the same time. If this is not supported, is there a simple way for me to create some type of where_two method which would give me this functionality?

    Seems like a great start on a problem with ActiveRecord that has bothered me for a while :)

    -Drew

  4. on 13 Feb 2007 at 23:16michael

    Drew: I’ve never used ActiveRecord on its own, but I’m pretty sure you can use the plugin with it, without the rest of the Rails frameworks. In your setting, plugins are not loaded and initialized automatically, therefore you have to make sure yourself, that the plugin’s lib directory is added to the load path ($:.unshift ...) and that init.rb is called.

    Regarding coordinated iteration over two tables, I haven’t thought of this before, and I don’t have a solution handy. Have a look at the standard Ruby Enumerator and Generator libraries, they may help you get what you need.

    Be sure to wrap a transaction block around the combined iteration! ActiveRecord Enumerable internally uses a transaction block, but you have to ensure that all iteration/cursor are in the same single transaction.

    I gather that Enumerator/Generator use some callcc magic to jump out of and back into a scope. I’m not completely sure that this won’t mess up things in some way.

  5. on 07 Mar 2007 at 18:08Drew

    Michael -

    I’ve been using this plugin on a test basis and I like what I see. Another simple question: can we get a gem (for non-rails work) of this app? I think you’d be surprised the amount of people that are looking for something lik this to use with some type of statistical analysis of data outside rails using activerecord.

    Thanks,
    Drew

  6. on 07 Mar 2007 at 19:52michael

    Drew: A gem on its own wouldn’t do much good. The enumeration functionality must be inserted into ActiveRecord and that’s something a gem can’t do by itself.

    If you want to use the AR Enumeration in your app and the Rails plugin loading mechanism is not available, you need to do two things: First, make sure that the directory containing ar_enumerable.rb is in the load path (the Ruby global variable $:). Then execute the code in init.rb.

  7. on 31 May 2007 at 21:16michael

    Arthur Lyman asked:

    I am interested in trying out ar-enumerable in a test Rails app I am working on.

    When I run the command below, nothing happens. Have I done something not quite right?

    RUBY script/plugin install Svn://rubyforge.org/var/svn/ar-enumerable/ar_enumerable/trunk

    Is the plug-in available at this time?

    Yes, the plugin is available, I just checked it. You have to make sure that you use the correct URL, in particular, it has to start with “svn:” not “Svn:”, i.e.

    svn://rubyforge.org/var/svn/ar-enumerable/ar_enumerable/trunk

  8. [...] with the size of the dataset. A (possibly MySQL specific) solution described by both Jamis Buck and Michael Schuerig relies on splitting the dataset into smaller chunks based on the primary key. Since the primary key [...]

Leave a Reply

Fork me on GitHub