Mongoid: Make sure all changed objects are saved

With Rails’s ActiveRecord and presumably other object-relational mappers, it is easy to use persistent, database-backed objects as if they were just there. Create new objects, navigate along associations, just like that. This picture breaks down, however, when it comes time to ensure that those in-memory changes are saved back to the database.

In general, in a single unit of work, we only want to read an object once from the database and write changes to that object, if any, back to the database at most once.

The “reading once” part is mostly handled by an identity map. Such a map does not prevent multiple reads of the same object through conditional queries, but it does ensure that each persistent object has a single in-memory representation and it does avoid reading the same object twice when the object is accessed by its id (primary key).
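
To make the idea concrete, here is a minimal sketch of an identity map in plain Ruby. It is not how Mongoid or ActiveRecord implement theirs; FakeStore and IdentityMap are hypothetical names, and the "database" is just a hash that counts reads.

```ruby
# A minimal identity map sketch (plain Ruby, no ORM involved).
# FakeStore and IdentityMap are hypothetical names for illustration only.
class FakeStore
  attr_reader :reads

  def initialize(rows)
    @rows = rows
    @reads = 0
  end

  # Simulates one database read by primary key.
  def fetch(id)
    @reads += 1
    @rows.fetch(id).dup
  end
end

class IdentityMap
  def initialize(store)
    @store = store
    @objects = {}
  end

  # Returns the single in-memory object for this id,
  # reading it from the store at most once.
  def find(id)
    @objects[id] ||= @store.fetch(id)
  end
end

store = FakeStore.new({ 1 => { :body => 'Hello' } })
map = IdentityMap.new(store)

first = map.find(1)
second = map.find(1)

puts first.equal?(second) # the very same object, not just an equal copy
puts store.reads          # number of simulated database reads
```

Note that a lookup by anything other than the id would bypass the map in this sketch, which mirrors the limitation for conditional queries mentioned above.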

The writing part is harder, because a changed or new object might not be valid by the constraints imposed by the application or the database. In effect, writing/saving must be done explicitly in order to react to the possible error conditions. In a lot of common cases this fits in nicely with the business logic.

Say your application provides the functionality to send a message. It is part of your business logic to check that the user has indeed entered a valid message, one having some text for the body. So when you write code like this

class MessagesController < ApplicationController
  def create
    @message = Message.create(message_params)
    if @message.errors.empty?
      ...
    else
      ...
    end
  end
end

you are really performing a validity check on the message with the side-effect that a valid message is written to the database.

It is a somewhat different case when objects are changed not directly by the user, but as a consequence of another user action. Say we want to mark a message as read when a user has seen it:

class MessagesController < ApplicationController
  def show
    @message = Message.find(params[:id])
    @message.mark_as_read
  end
end

Now we are coming to the heart of the matter: How do we ensure that the state change (unread -> read) of our message is saved? Maybe like this:

class Message
  def mark_as_read
    self.state = 'read'
    save!
  end
end

That looks fine by itself, but what if we want to make another change to the same object, like this

class MessagesController < ApplicationController
  def show
    @message = Message.find(params[:id])
    @message.mark_as_read
    @message.register_last_reader(current_user)
  end
end

and

class Message
  def register_last_reader(user)
    self.last_reader = user
    save!
  end
end

Note that each method by itself ensures that changes are written to the database. As a result, we can compose these methods easily without any further concern that the changes they effect have to be written explicitly to the database.

Unfortunately, there is a drawback. The price we pay is that the same object is written to the database twice because each change is saved individually.
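
The double write is easy to see with a stand-in object. The following sketch uses a hypothetical CountingMessage class whose save! merely counts how often a database write would happen; it is not a Mongoid document.

```ruby
# CountingMessage is a hypothetical stand-in for a persistent Message.
# save! only counts how often a database write would take place.
class CountingMessage
  attr_reader :writes

  def initialize
    @attributes = {}
    @writes = 0
  end

  def save!
    @writes += 1
    true
  end

  def mark_as_read
    @attributes[:state] = 'read'
    save!
  end

  def register_last_reader(user)
    @attributes[:last_reader] = user
    save!
  end
end

message = CountingMessage.new
message.mark_as_read
message.register_last_reader('alice')
puts message.writes # one write per method call
```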

How about not saving the message in mark_as_read and register_last_reader and instead doing it in the controller action?

class MessagesController < ApplicationController
  def show
    @message = Message.find(params[:id])
    @message.mark_as_read
    @message.register_last_reader(current_user)
    @message.save!
  end
end

This works and needs only a single database write, but it results in less composable, riskier code, because mark_as_read and register_last_reader now implicitly shift the responsibility for saving to their calling context.

Now to the denouement. I don't have a handy solution that satisfies all constraints, but I do have a code snippet that helps make the second approach less risky.

This version is for Mongoid, but the idea is applicable to other ORMs that use an identity map. And importantly, the identity map must be enabled for this to work.

What It Does and Why

The after_filter logs and raises an exception if the identity map contains any objects that should have been saved. Specifically, these are objects whose attributes have changed (not counting non-essential technical attributes) and for which no explicit attempt to save them has failed. We don't care about objects with errors, because we assume that such objects are handled explicitly. We do care about the forgotten objects: changed but never saved.
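
The check itself can be sketched in plain Ruby. This is only an illustration of the rule described above, not the Mongoid snippet: TrackedObject stands in for a document, an array stands in for the contents of the identity map, and IGNORED_ATTRIBUTES is an assumed list of "technical" attributes.

```ruby
# Sketch of the "forgotten objects" check, independent of any ORM.
# All names here are hypothetical.
class UnsavedObjectsError < StandardError; end

class TrackedObject
  attr_reader :name, :changes, :errors

  def initialize(name)
    @name = name
    @changes = {}
    @errors = []
  end

  def write(attribute, value)
    @changes[attribute] = value
  end

  # A successful save clears the pending changes.
  def save
    @changes.clear
    true
  end

  # Simulates a failed save: the changes stay, an error is recorded.
  def fail_to_save(message)
    @errors << message
    false
  end
end

# Assumption: attributes that do not count as real changes.
IGNORED_ATTRIBUTES = [:updated_at]

# Objects with essential changes and no recorded save failure.
def forgotten_objects(objects)
  objects.select do |object|
    essential = object.changes.keys - IGNORED_ATTRIBUTES
    essential.any? && object.errors.empty?
  end
end

def ensure_all_saved!(objects)
  forgotten = forgotten_objects(objects)
  return if forgotten.empty?
  message = "Unsaved changes in: #{forgotten.map(&:name).join(', ')}"
  warn message # stands in for writing to the application log
  raise UnsavedObjectsError, message
end

saved = TrackedObject.new('saved')
saved.write(:state, 'read')
saved.save

invalid = TrackedObject.new('invalid')
invalid.write(:body, '')
invalid.fail_to_save("body can't be blank")

forgotten = TrackedObject.new('forgotten')
forgotten.write(:state, 'read')

begin
  ensure_all_saved!([saved, invalid, forgotten])
rescue UnsavedObjectsError => e
  puts e.message
end
```

In a controller, a method like ensure_all_saved! would run after the action, with the objects taken from the identity map instead of a hand-built array.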

Getting a view at views

The structure of Rails views can become rather complicated with templates, layouts and partials inclusions multiple levels deep. Here are two tools for disentangling views.

The first one adds HTML comments to generated markup. These comments identify the file where the markup originated. Place it in config/initializers/show_view_structure.rb

The second tool works offline. It prints the hierarchical structure of template inclusions.

There’s a limitation in that only explicit partial renderings are recognized. Layouts are not taken into account and partials rendered from helpers are not recognized either.
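
As an illustration of the offline approach, here is a rough sketch that finds explicit partial renderings and prints them as an indented tree. It is not the tool mentioned above: it works on a hash of template sources instead of files, and PARTIAL_PATTERN, partial_key and inclusion_tree are hypothetical names. Like the real thing, it ignores layouts and partials rendered from helpers.

```ruby
# Matches render :partial => 'name' and render 'name'.
PARTIAL_PATTERN = /render\s*\(?\s*(?::partial\s*=>\s*)?['"]([^'"]+)['"]/

# Turns a partial reference into the key used in the templates hash,
# e.g. 'movie' rendered from 'movies/index' becomes 'movies/_movie'.
def partial_key(reference, including_template)
  dir, name = File.split(reference)
  dir = File.dirname(including_template) if dir == '.'
  File.join(dir, "_#{name}")
end

# Returns the inclusion hierarchy as a list of indented lines.
def inclusion_tree(templates, name, depth = 0, seen = [])
  lines = ["#{'  ' * depth}#{name}"]
  return lines if seen.include?(name) # guard against recursive partials
  templates.fetch(name, '').scan(PARTIAL_PATTERN).flatten.each do |reference|
    child = partial_key(reference, name)
    lines.concat(inclusion_tree(templates, child, depth + 1, seen + [name]))
  end
  lines
end

# Made-up templates for demonstration purposes.
TEMPLATES = {
  'movies/index'   => "<%= render :partial => 'movie', :collection => @movies %>\n" \
                      "<%= render 'shared/footer' %>",
  'movies/_movie'  => "<%= render :partial => 'people/person' %>",
  'people/_person' => "<%= person.name %>",
  'shared/_footer' => "<p>bye</p>"
}

puts inclusion_tree(TEMPLATES, 'movies/index')
```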

No database to rule them all. Branch databases for Rails and git

You’re using git to manage your Ruby on Rails projects? Then you’ve probably come to appreciate topic branches as a kind of transaction facility for programmers. If you need to implement a non-trivial feature or just want to try something out, the way to go is onto a new “topic” branch. There you can mangle the code all you like, without fear of causing any lasting harm. In case everything works out fine, you merge the branch into the master branch and discard it. If you come to a dead end, well, you just discard the misbegotten branch.

But what about the database? If moving along your branch involves changes to the database, structural changes in particular, you can’t easily switch to another branch without these changes. In other words, your development database is really only suitable for a single branch.

Up to now, that is. Install the branch_db gem and what you get is a private database for your branch that you can mutilate without interfering with work on the other branches.

For details and installation instructions see the README. Here’s just an appetizer. Say you’re on branch “feature” and want a private copy of the “master” database. Here’s how you do it:

$ rake db:branches:copy ORIG_BRANCH=master

Getting it

  • github
  • $ sudo gem install mschuerig-branch_db

Let your Rails app know about it

In the appropriate place in config/environment.rb add

config.gem "mschuerig-branch_db", :lib => 'branch_db'

Ruby as a language for Rails views

Didn’t you always want to write your Rails views as plain Ruby objects? — “What?”, I hear you say. No, I haven’t lost my mind and the idea is quite sensible (or so I hope), once you add the restriction that it is JSON-formatted data that you want to return.

Say you need to set up some hashes or arrays for rendering to JSON. This is best done in Ruby and it is clearly a view concern. So let’s do it in the views. Like this:

  # app/controllers/movies_controller.rb
  def index
    respond_to do |format|
      format.json do
        @movies = Movie.all
        @count = Movie.count
        render :template => 'movies/index.json.rb'
      end
    end
  end

  # app/views/movies/index.json.rb
  {
    :identifier => Movie.primary_key,
    :totalCount => @count,
    # render @movies does not work as it insists on returning a string
    :items => @movies.map { |m| render(m) }
  }

  # app/views/movies/_movie.json.rb
  {
    :id => movie.to_param,
    :title => movie.title,
    :releaseDate => movie.release_date
  }
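
To show the principle outside of Rails, here is a small sketch of evaluating a Ruby "template" against instance variables and serializing the result as JSON. This is not the gem's implementation; ViewContext and render_ruby are hypothetical names, and the movies are plain hashes.

```ruby
require 'json'

# Hypothetical stand-in for a view context: it copies the assigns into
# instance variables and evaluates Ruby template source against them.
class ViewContext
  def initialize(assigns)
    assigns.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  # The template is ordinary Ruby whose last expression is the result.
  def render_ruby(source)
    instance_eval(source)
  end
end

INDEX_TEMPLATE = <<-'RUBY'
  {
    :totalCount => @count,
    :items => @movies.map { |m| { :id => m[:id], :title => m[:title] } }
  }
RUBY

context = ViewContext.new({
  :movies => [{ :id => 1, :title => 'Alien' }],
  :count => 1
})

# The view yields a plain Ruby structure; JSON conversion is a separate step.
puts JSON.generate(context.render_ruby(INDEX_TEMPLATE))
```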

Getting it

  • github
  • $ sudo gem install mschuerig-ruby_template_handler

Let your Rails app know about it

In the appropriate place in config/environment.rb add

config.gem "mschuerig-ruby_template_handler", :lib => 'ruby_template_handler'

Simplistic Enumerations for ActiveRecord

Invariably, in almost every application there happen to be lists of data items that are immutable, just for reference. It could be colors, the four seasons, continents, the states of your country, kinds or types of this and that. These items are almost like constants. As accessible as they are through the ordinary ActiveRecord API, it seems an utter waste to hit the database again and again, for data that won’t change however often you request it.

Of course, the Rails community recognized early on that some data are more constant than others and over the years several plugins have been published that add cached and easily accessible enumerations to ActiveRecord. Some of these additions are quite complicated or befuddle ActiveRecord by not backing the enumeration values with real database objects. My experience has been that these attempted optimizations result in bizarre behavior when I did interesting things with ActiveRecord such as multiply nested named scopes plus custom SQL.

So, I thought a basic, no, simplistic, version of enumerations is called for. Here’s how it looks:

class Color < ActiveRecord::Base
  enumerates do |e|
    e.value :name => 'red'
    e.value :name => 'green'
    e.value :name => 'blue'
  end
end

Color[:green]
Color.find_by_name('red')
Color.find_by_name!(:red)
Color.all
Color.count
Color.reload

e.value :name => 'red'

ensures that a Color object with name ‘red’ exists; if it does not, one is created.
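
The two ingredients, "make sure the value exists" and "never ask the database again", can be sketched without ActiveRecord. FakeTable and CachedEnumeration below are hypothetical names; the sketch only illustrates the idea and is not the gem's code.

```ruby
# FakeTable stands in for a database table and counts its queries.
class FakeTable
  attr_reader :rows, :queries

  def initialize
    @rows = []
    @queries = 0
  end

  def all
    @queries += 1
    @rows.dup
  end

  def insert(row)
    @rows << row
  end
end

class CachedEnumeration
  def initialize(table, names)
    # Create the rows that are missing, leave existing ones alone.
    existing = table.all.map { |row| row[:name] }
    (names - existing).each { |name| table.insert({ :name => name }) }
    # Load everything once and keep it in memory.
    @values = {}
    table.all.each { |row| @values[row[:name].to_sym] = row }
  end

  # Lookups are served from memory only.
  def [](name)
    @values[name.to_sym]
  end
end

table = FakeTable.new
table.insert({ :name => 'red' }) # already present before the enumeration is set up

colors = CachedEnumeration.new(table, ['red', 'green', 'blue'])
puts colors[:green][:name]
puts table.rows.size # existing row plus the newly created ones
puts table.queries   # does not grow with further lookups
```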

Caveats

Although there is a #reload method defined on enumeration models, i.e. Color.reload, it is very unwise to use it. The point is that this method only affects a single server process and you most likely have many of them.

So, if you need to change enumeration values, the only way to do it is to treat it like an update to your application code.

Getting it

  • github
  • $ sudo gem install mschuerig-easy_enums

Let your Rails app know about it

In the appropriate place in config/environment.rb add

config.gem "mschuerig-easy_enums", :lib => 'easy_enums'

Lifting indexes with ActiveRecord

On a Rails project I’m currently working on, I need to fill the database with test data to have something to play with. Besides large imports, that’s a time when indexes may slow things down severely instead of speeding them up. Consider: during the inserts the indexes are not used, but they have to be updated again and again for each new record. It is much cheaper to lift (well, really drop) the indexes during the mass operation and recreate them afterwards.

Here’s an example:

namespace :db do
  desc "Populate the database with sample data"
  task :populate => :environment do

    retained_indexes = [
      'index_people_on_lastname_and_firstname',
      { :table => :movies, :columns => :title },
      { :table => 'people', :columns => ['lastname', :firstname] }
    ]

    ActiveRecord::Base.transaction do
      IndexLifter.without_indexes(
        # Only consider indexes on these tables;
        # all tables by default.
        :movies,
        :people,
        # Don't lift these indexes
        :except => retained_indexes,
        # Don't lift unique indexes; default: false.
        :except_unique => true
      ) do
        ActiveRecord::Base.silence do

          # import or generate large amounts of data here

        end
      end
    end
  end
end

Please bear in mind that dropping and recreating indexes is a rather intrusive operation on the structure of your database. You should only perform it while no other users (or processes) are accessing the database.

Also, consider that some indexes may be important for the proper function of your database. If you have unique indexes, i.e. indexes that enforce that particular columns or combinations of columns are unique, and if you are handling violations of this constraint in your application code, then you might need to retain these indexes even during data generation.
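
The underlying pattern is simple: remember the indexes, drop them, run the block, and recreate them even if the block fails. Here is a sketch of that pattern against a fake connection. FakeConnection and this without_indexes are hypothetical and much simpler than the real IndexLifter; in particular, indexes are identified by name only.

```ruby
# FakeConnection stands in for a database connection; it only keeps
# a list of index names and a log of what was done.
class FakeConnection
  attr_reader :indexes, :log

  def initialize(indexes)
    @indexes = indexes.dup
    @log = []
  end

  def remove_index(name)
    @indexes.delete(name)
    @log << "drop #{name}"
  end

  def add_index(name)
    @indexes << name
    @log << "create #{name}"
  end
end

# Drops all indexes except the retained ones, yields, and recreates
# the dropped indexes afterwards, even when the block raises.
def without_indexes(connection, options = {})
  retained = options.fetch(:except, [])
  lifted = connection.indexes - retained
  lifted.each { |name| connection.remove_index(name) }
  yield
ensure
  lifted.each { |name| connection.add_index(name) } if lifted
end

connection = FakeConnection.new([
  'index_movies_on_title',
  'index_people_on_lastname_and_firstname'
])

without_indexes(connection, { :except => ['index_people_on_lastname_and_firstname'] }) do
  # import or generate large amounts of data here
  puts "indexes during import: #{connection.indexes.inspect}"
end

puts connection.log
```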

Getting it

  • github
  • $ sudo gem install mschuerig-index_lifter

Let your Rails app know about it

In the appropriate place in config/environment.rb add

config.gem "mschuerig-index_lifter", :lib => 'index_lifter'

Bash completion for script/generate

The Rails generator script/generate knows pretty well what things it can generate. In fact, it knows much better than I do. So, I think it could really give me some help when I’m typing along on the command line.

If you save the snippet below as /etc/bash_completion.d/generate you can enjoy this help, too.

_generate()
{
  local cur

  COMPREPLY=()
  cur=${COMP_WORDS[COMP_CWORD]}

  if [ ! -d "$PWD/script" ]; then
    return 0
  fi

  if [ $COMP_CWORD == 1 ] && [[ "$cur" == -* ]]; then
    COMPREPLY=( $( compgen -W '-h -v --help --version' -- "$cur" ) )
    return 0
  fi

  if [ $COMP_CWORD == 2 ] && [[ "$cur" == -* ]]; then
    COMPREPLY=( $( compgen -W '-p -f -s -q -t -c --pretend --force --skip --quiet --backtrace --svn' -- "$cur" ) )
    return 0
  fi

  COMPREPLY=( $( script/generate --help | \
    awk -F ': ' '/^  (Plugins|Rubygems|Builtin|User):/ { gsub(/, */, "\n", $2); print $2 }' | \
    command grep "^$cur" \
  ) )
}

complete -F _generate $default generate

Testing with foreign key constraints

I’m not yet so enlightened that all of my Rails unit and functional tests run without accessing the database. Indeed, I’m still using YAML fixtures to populate the database for testing.

I also insist on having foreign key constraints in the database, a thing that’s not exactly encouraged by Rails, but which is quite possible nonetheless. The various plugins from RedHill Consulting are a big help.

But then, when you feel all warm and cosy due to the additional safety at the database level, you’re suddenly trapped by a snag: Sooner or later you find that your fixtures contain dependencies among objects that defeat any attempt at clever ordering; whatever order you choose violates one foreign key constraint or another. Fixture files are loaded one after another in their entirety, and when an object in an earlier fixture refers to an object in a later fixture, the database duly flags this as an inconsistency.

Well, you may think, it is an inconsistency, but only a temporary one. After all the fixture files are loaded, everything is consistent again. That’s the key. We need to tell the database that, yes, indeed, things may be inconsistent for a time, but we’ll be cleaning up, promise. The good thing is that there is even an SQL standard-compliant way to express this promise.

  START TRANSACTION;
  SET CONSTRAINTS ALL DEFERRED;
  -- ... load the fixtures ...
  COMMIT;

If you use transactional fixtures, the transaction bracket is already provided by Rails, but there’s no pretty way to sneak in the "SET CONSTRAINTS ..." line. There are two ways of slightly different brutality. First, you can edit activerecord/lib/fixtures.rb and just insert the required line.

  def self.create_fixtures(fixtures_directory, table_names, class_names = {})
    ...
    connection.transaction(Thread.current['open_transactions'].to_i == 0) do
      # insert the following line
      connection.execute("SET CONSTRAINTS ALL DEFERRED")
      ...
    end
    ...
  end

Alternatively, you can overwrite the entire method in, say, <railsapp>/lib/transactional_fixture_loading_hack.rb like this

require 'active_record/fixtures'

Fixtures.class_eval do
  def self.create_fixtures(fixtures_directory, table_names, class_names = {})
    ...
    connection.transaction(Thread.current['open_transactions'].to_i == 0) do
      # inserted line
      connection.execute("SET CONSTRAINTS ALL DEFERRED")
      ...
    end
    ...
  end
end

Whatever you do, you’ll have to inspect your code whenever you update your Rails version.

We’re still not done, unfortunately. The database defers only those constraints that are deferrable. Have a look at the Foreign Key Migrations Plugin for how to achieve this.
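
If you would rather not depend on a plugin, a deferrable constraint can also be declared with raw SQL. The helper below is only a sketch: deferrable_foreign_key_sql is a hypothetical name, the constraint naming scheme and the referenced id column are assumptions, and the statement uses PostgreSQL syntax. In a migration, the resulting string could be passed to execute.

```ruby
# Builds an ALTER TABLE statement that adds a deferrable foreign key.
# Assumptions: PostgreSQL syntax, the referenced column is "id", and
# the constraint is named fk_<table>_<column>.
def deferrable_foreign_key_sql(table, column, referenced_table)
  constraint = "fk_#{table}_#{column}"
  "ALTER TABLE #{table} ADD CONSTRAINT #{constraint} " \
    "FOREIGN KEY (#{column}) REFERENCES #{referenced_table} (id) " \
    "DEFERRABLE INITIALLY IMMEDIATE"
end

puts deferrable_foreign_key_sql('gadgets', 'user_id', 'users')
```

INITIALLY IMMEDIATE keeps the usual behavior during normal operation; the constraint is only checked at commit time once SET CONSTRAINTS ALL DEFERRED has been issued in the transaction.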

As a matter of convenience, I suggest that in your test_helper.rb you add a method that loads all your fixtures

class Test::Unit::TestCase
  self.use_transactional_fixtures = true

  def self.load_all_fixtures
    fixtures :users, :thingamajigs, :gadgets, :widgets
  end
end

Then, in a testcase class you can use it like this

require File.dirname(__FILE__) + '/../test_helper'

class ThingamajigTest < Test::Unit::TestCase
  load_all_fixtures

  ...
end

Note that with transactional fixtures this results in each fixture file being loaded only once for all the tests.

So, there we are at last. Or those of us with a reasonable DBMS are, I might say. For, of course, this technique is of no use if your database does not support deferrable constraints. PostgreSQL, for one, does support them.

XML round-trip testing for Rails resources

I’ve recently started to add XML support to a Rails application, meaning that the application provides data in XML format, if the request asks for it, and it understands XML data on create or update.

To keep the application as well as myself sane, I’ve written a test that ensures the round trip of getting XML and updating an object by sending XML works. This is only a very basic test and there surely is more that can and should be tested.

class XmlRoundtripTest < ActionController::IntegrationTest

  RESOURCES = [
    :people, :things
  ]
  fixtures :stuff, *RESOURCES

  def self.assert_roundtrippability_for(*resources)
    resources.each do |resource|
      define_method("test_xml_roundtrip_for_#{resource}") do
        @user = user
        @user.logs_in

        xml = @user.gets_xml(resource)
        @user.changes_object_name!(xml)

        old_version = @user.extracts_lock_version(xml)
        @user.sends_xml(xml, resource)

        # Make sure that the resource was really updated
        new_version = @user.extracts_lock_version(@user.gets_xml(resource))
        assert_equal(old_version + 1, new_version)
      end
    end
  end

  assert_roundtrippability_for *RESOURCES

  def user
    open_session do |user|
      def user.gets_xml(resources, id = 1)
        get "/#{resources}/#{id}.xml"
        assert_response :success
        @response.body
      end

      def user.sends_xml(xml, resources, id = 1)
        put "/#{resources}/#{id}.xml", xml, :content_type => 'application/xml'
        assert_response :success
        @response.body
      end

      def user.changes_object_name!(xml)
        # arbitrarily change an attribute we know is there
        xml.gsub!(%r{<name>(.*?)</name>}, '<name>X\1Y</name>')
      end

      def user.extracts_lock_version(xml)
        xml =~ %r{<lock-version>(\d+)</lock-version>}
        $1.to_i
      end
    end
  end
end
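
The two string helpers at the heart of the test can be tried in isolation. The sketch below applies the same regular expressions to a made-up XML document; SAMPLE_XML, change_object_name and extract_lock_version are names chosen for this example, and the sample assumes a lock-version element without attributes, as the pattern in the test does.

```ruby
# A made-up document for illustration; real responses will differ.
SAMPLE_XML = <<-XML
<person>
  <name>Ada</name>
  <lock-version>3</lock-version>
</person>
XML

# Returns a copy of the XML with the name wrapped in X...Y.
def change_object_name(xml)
  xml.gsub(%r{<name>(.*?)</name>}, '<name>X\1Y</name>')
end

# Returns the lock version as an integer, or nil if there is none.
def extract_lock_version(xml)
  match = xml.match(%r{<lock-version>(\d+)</lock-version>})
  match && match[1].to_i
end

puts change_object_name(SAMPLE_XML)
puts extract_lock_version(SAMPLE_XML)
```

Unlike $1.to_i in the test, which silently yields 0 when nothing matches, this variant returns nil, so a missing lock version is easier to notice.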