Mongoid: Make sure all changed objects are saved

With Rails’s ActiveRecord and presumably other object-relational mappers, it is easy to use persistent, database-backed objects as if they were just there. Create new objects, navigate along associations, just like that. This picture breaks down, however, when it comes time to ensure that those in-memory changes are saved back to the database.

In general, in a single unit of work, we only want to read an object once from the database and write changes to that object, if any, back to the database at most once.

The “reading once” part is mostly handled by an identity map. Such a map does not prevent multiple reads of the same object through conditional queries, but it does ensure that each persistent object has a single in-memory representation and it does avoid reading the same object twice when the object is accessed by its id (primary key).
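
To make the idea concrete, here is a minimal sketch of an identity map in plain Ruby. The names IdentityMap and fetch are made up for illustration, and the block passed to fetch stands in for the database read. This is not how Mongoid implements it, just the bare principle.

```ruby
# A toy identity map: one in-memory object per (class, id) pair.
# IdentityMap and #fetch are hypothetical names, not part of any ORM.
class IdentityMap
  def initialize
    @objects = {}
  end

  # Returns the cached object for [klass, id]. The block, which stands in
  # for the database read, runs only on the first access.
  def fetch(klass, id)
    @objects.fetch([klass, id]) { @objects[[klass, id]] = yield }
  end

  # Everything that has been loaded so far.
  def objects
    @objects.values
  end
end

Message = Struct.new(:id, :body)

reads = 0
map = IdentityMap.new
first  = map.fetch(Message, 1) { reads += 1; Message.new(1, 'hello') }
second = map.fetch(Message, 1) { reads += 1; Message.new(1, 'hello') }

puts first.equal?(second) # both variables refer to the same in-memory object
puts reads                # number of simulated database reads
```

Note that a query by some other condition would still hit the database; only lookups by id can be answered from the map.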

The writing part is harder, because a changed or new object might not be valid by the constraints imposed by the application or the database. In effect, writing/saving must be done explicitly in order to react to the possible error conditions. In a lot of common cases this fits in nicely with the business logic.

Say your application provides the functionality to send a message. It is part of your business logic to check that the user has indeed entered a valid message, one having some text for the body. So when you write code like this

class MessagesController < ApplicationController
  def create
    @message = Message.create(message_params)
    if @message.errors.empty?
      ...
    else
      ...
    end
  end
end

you are really performing a validity check on the message with the side-effect that a valid message is written to the database.

It is a somewhat different case when objects are changed not directly by the user, but as a consequence of another user action. Say we want to mark a message as read when a user has seen it:

class MessagesController < ApplicationController
  def show
    @message = Message.find(params[:id])
    @message.mark_as_read
  end
end

Now we are coming to the heart of the matter: How do we ensure that the state change (unread -> read) of our message is saved? Maybe like this:

class Message
  def mark_as_read
    self.state = 'read'
    save!
  end
end

That looks fine by itself, but what if we want to make another change to the same object, like this

class MessagesController < ApplicationController
  def show
    @message = Message.find(params[:id])
    @message.mark_as_read
    @message.register_last_reader(current_user)
  end
end

and

class Message
  def register_last_reader(user)
    self.last_reader = user
    save!
  end
end

Note that each method by itself ensures that changes are written to the database. As a result, we can compose these methods easily without any further concern that the changes they effect have to be written explicitly to the database.

Unfortunately, there is a drawback. The price we pay is that the same object is written to the database twice because each change is saved individually.

How about not saving the message in mark_as_read and register_last_reader and instead doing it in the controller action?

class MessagesController < ApplicationController
  def show
    @message = Message.find(params[:id])
    @message.mark_as_read
    @message.register_last_reader(current_user)
    @message.save!
  end
end

This works and needs only a single database write, but it results in less composable, riskier code, because now mark_as_read and register_last_reader implicitly shift the responsibility for saving to their calling context.

Now to the denouement. I don't have a handy solution that satisfies all constraints, but I do have a code snippet that helps make the second approach less risky.

This version is for Mongoid, but the idea is applicable to other ORMs that use an identity map. And importantly, the identity map must be enabled for this to work.

What It Does and Why

The after_filter logs and raises an exception if the identity map contains any objects that should have been saved. In particular, these are the objects that have changes to their attributes, excluding non-essential technical attributes, and for which no failed attempt has been made to save them explicitly. We don't care about objects with errors, because we assume that such an object is handled explicitly. We do care about the forgotten objects, changed but never saved.
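
The core of that check can be sketched in plain Ruby, independent of Mongoid. Everything in it is a stand-in: FakeDocument mimics a model that knows its changed attributes and its errors, IGNORED_ATTRIBUTES is an assumed list of "technical" attributes, and the array passed in plays the role of the identity map's contents. The sketch only raises; logging is left out.

```ruby
# Sketch of the "forgotten objects" check. Nothing here is Mongoid API:
# FakeDocument, IGNORED_ATTRIBUTES and the method names are illustrative.
IGNORED_ATTRIBUTES = %w[updated_at version].freeze # assumed technical attributes

class UnsavedObjectsError < StandardError; end

FakeDocument = Struct.new(:name, :changed_attributes, :errors) do
  # Changes that matter: anything other than the ignored technical attributes.
  def essential_changes
    changed_attributes - IGNORED_ATTRIBUTES
  end
end

# Objects that were changed, but for which no save attempt has failed.
def unsaved_objects(identity_map_objects)
  identity_map_objects.select do |doc|
    doc.essential_changes.any? && doc.errors.empty?
  end
end

# What the filter would do at the end of a request.
def ensure_all_saved!(identity_map_objects)
  forgotten = unsaved_objects(identity_map_objects)
  return if forgotten.empty?
  raise UnsavedObjectsError, "unsaved objects: #{forgotten.map(&:name).join(', ')}"
end

docs = [
  FakeDocument.new('saved',     [],             []),
  FakeDocument.new('touched',   ['updated_at'], []),
  FakeDocument.new('invalid',   ['body'],       ["Body can't be blank"]),
  FakeDocument.new('forgotten', ['state'],      [])
]

puts unsaved_objects(docs).map(&:name).inspect
```

Only the last document is reported: the first has no changes, the second has only a technical change, and the third has errors and is therefore assumed to be handled elsewhere.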

Copying tags between music files

Over the years, I’ve ripped CDs in various formats. In the beginning, it was MP3, but I soon switched to OGG. A few years ago, I made the switch to FLAC. Unfortunately, that leaves me with a large “sediment” of files in lossy formats and I’ve started to rip those again (for the last time, I hope).

The ripping is tedious, but redoing the tags would be far worse. So here’s a script to help avoid the drudgery.

This is how it is used:

$ copytags.rb --help
Usage: copytags [options] from_pattern to_pattern
    -n, --dry-run                    Don't run any commands, just print them
    -t, --tag                        copy tags from matching files (default)
    -T, --no-tag                     do not copy tags from matching files
    -r, --rename                     rename from matching files
    -R, --no-rename                  do not rename from matching files (default)
    -q, --quiet                      Display less information
    -v, --verbose                    Display extra information
    -h, --help                       Show this message

Examples
    # Copy tags from 1.Track.ogg to 1.Track.flac
    copytags '%.1d.*.ogg' '%.1d.*.flac'

    # Copy tags from 01.Track.ogg to 01.Track.flac
    copytags '%.2d.*.ogg' '%.2d.*.flac'

    # Assuming 1.Song.ogg, rename 1.Track.flac to 1.Song.flac
    copytags -rT '%.1d.*.ogg' '%.1d.*.flac'

And this is the script:

Scripted Screenshots

I recently needed to make a screenshot of an entire web page, i.e. not just the part that fits on the screen (or in the viewport), but the whole thing from top to bottom.

Somehow I missed the possibility that there might be a zillion browser extensions for just this purpose. From writing features (acceptance tests) with Cucumber I’m familiar with PhantomJS, a “headless”, scriptable browser based on WebKit.

Here’s how it works:

$ webshot --help
Usage: webshot [OPTIONS]... URL IMAGE_FILE

Options:
          -j|--jquery           Inject jQuery into the page.
         +-e|--exec «value»     Execute script in page context.
         +-f|--file «value»     Execute script file in page context.
          -v|--viewport «value» Viewport size WxH; default 1280x800.

          (* mandatory option)
          (+ repeatable option)

Luckily, I didn’t have to write the command-line parsing myself. But it was fun to figure out how Deferreds might be implemented.

Getting a view at views

The structure of Rails views can become rather complicated, with templates, layouts, and partial inclusions multiple levels deep. Here are two tools for disentangling views.

The first one adds HTML comments to the generated markup. These comments identify the file where the markup originated. Place it in config/initializers/show_view_structure.rb.

The second tool works offline. It prints the hierarchical structure of template inclusions.

There’s a limitation in that only explicit partial renderings are recognized. Layouts are not taken into account and partials rendered from helpers are not recognized either.

Better Rake Bash completion

What’s new?

  1. Rake tasks are cached (thanks to turadb).
  2. If there is no Rakefile in the current directory, ancestor directories are searched in the way Rake itself does.
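
The completion file itself is a shell script, but the ancestor lookup from item 2 is easy to illustrate in Ruby. This is only a sketch of the idea: find_rakefile_dir is a hypothetical helper, and the list of file names is the set Rake looks for by default.

```ruby
require 'pathname'
require 'tmpdir'
require 'fileutils'

# File names Rake looks for by default.
RAKEFILE_NAMES = %w[rakefile Rakefile rakefile.rb Rakefile.rb].freeze

# Walks from start_dir towards the filesystem root and returns the first
# directory containing a Rakefile, or nil if there is none.
# find_rakefile_dir is a hypothetical helper, not part of Rake.
def find_rakefile_dir(start_dir)
  Pathname.new(start_dir).expand_path.ascend do |dir|
    return dir if RAKEFILE_NAMES.any? { |name| (dir + name).file? }
  end
  nil
end

# Try it out in a throwaway directory tree: root/Rakefile and root/lib/tasks.
Dir.mktmpdir do |root|
  root = File.realpath(root)
  nested = File.join(root, 'lib', 'tasks')
  FileUtils.mkdir_p(nested)
  FileUtils.touch(File.join(root, 'Rakefile'))
  puts find_rakefile_dir(nested) == Pathname.new(root)
end
```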

Installation

Copy the file to /etc/bash_completion.d or possibly /usr/share/bash-completion/completions or to another place where it is loaded when your Bash shell is started.

Humble HATEOAS

HATEOAS is an acronym coined in the context of RESTful web services. It stands for Hypermedia as the engine of application state. Imagine the client, human or machine, in a conversation with the server.

They take it in turns: the client asks (requests) something of the server, who promptly responds. Of course, a polite client doesn’t ask just anything of the humble server. Rather, in keeping with protocol, the client only follows the leads (links) suggested by the server. Thus, the conversation is advanced by following link after link.

That’s as good as it is, no diplomatic incidents caused by a client stepping out of line. But does it make for a good conversation? Hardly, with a client who only knows how to be polite, but has no idea what he is talking about. Talking about something presupposes semantics, in this case, an agreement about the meaning of following a particular lead (link).

Talking about something becomes a lot easier when you have an idea what the topic is supposed to be. Enter Media Types. Not any old text/plain, application/xml — that’s like talking about the weather, only for machines. You want something with substance, something fitting your own conversational niche:

  application/x.mytopic+xml

There, application indicates a “multipurpose” file, not specifically geared towards human consumption. x. indicates that it is an experimental media type, not registered with IANA. If you’ve got more clout, maybe you’ll be able to officially register a vnd. (vendor) or prs. (personal) type. Finally, the +xml suffix says that the media type is based on XML.
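
For illustration, here is how a client might take such a media type apart. This is a rough sketch: parse_media_type is a hypothetical helper, it ignores parameters such as "; charset=utf-8", and it only knows the three tree prefixes mentioned above.

```ruby
# Splits a media type such as "application/x.mytopic+xml" into its parts.
# parse_media_type is a hypothetical helper, not a library function.
def parse_media_type(media_type)
  type, subtype = media_type.split('/', 2)

  # The structured syntax suffix is whatever follows the last "+", if any.
  name, plus, suffix = subtype.rpartition('+')
  name, suffix = suffix, nil if plus.empty?

  # Registration tree prefix; nil means no (known) prefix.
  tree = name[/\A(x|vnd|prs)\./, 1]

  { type: type, subtype: subtype, tree: tree, suffix: suffix }
end

p parse_media_type('application/x.mytopic+xml')
p parse_media_type('text/plain')
```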

Now for the dénouement: That’s it.

No magic RESTful pixie dust. No script for successfully chatting up every server. Nothing. You could use XML Schema or a relative to specify the data contents of the media types you are using. You could even write some nice prose explaining in so many words what lies ahead when someone follows a link with, say, rel="payment". If you fancy arcane arts, you might write pre- and postconditions in OCL.

In summary, HATEOAS doesn’t imbue client software with more intelligence. It inserts a (useful) level of indirection by offering the client a choice of transitions, and thereby achieves two things. First, the client is ignorant of the concrete URIs associated with transitions; for all it knows, they can be different each time. Second, the client doesn’t have to figure out which transitions are applicable, because the server only offers those to begin with.

What it means to effect a particular transition must still be explicitly coded. So, if the client-side workflow stipulates that a payment transaction is to be started at some point, someone, somewhere has to know of the agreement that such a transition is indicated by a link with rel="payment" and write code to look for and follow such a link.
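
In code, that agreement might look something like the following sketch. Link, Response and start_payment are illustrative names, the URIs are made up, and the assumption that a payment is started with a POST is mine. A real client would parse the links out of the XML representation and issue an actual HTTP request.

```ruby
# A parsed server response, reduced to the links it offers.
# Link, Response and start_payment are illustrative names only.
Link = Struct.new(:rel, :href)

Response = Struct.new(:links) do
  # Returns the link with the given rel, or nil if the server did not offer it.
  def link(rel)
    links.find { |l| l.rel == rel }
  end
end

# The hard-coded knowledge: "payment" is the rel that starts a payment.
def start_payment(response)
  link = response.link('payment')
  return :payment_not_available unless link

  # The client never builds this URI itself; it uses whatever the server sent.
  "POST #{link.href}"
end

offered  = Response.new([Link.new('self', '/orders/42'),
                         Link.new('payment', '/orders/42/pay-7f3a')])
withheld = Response.new([Link.new('self', '/orders/42')])

puts start_payment(offered)
puts start_payment(withheld)
```

The only thing the client knows in advance is the string 'payment' and what to do once it finds it. The URI and the question of whether a payment is possible at all are left to the server.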

No database to rule them all. Branch databases for Rails and git

You’re using git to manage your Ruby on Rails projects? Then you’ve probably come to appreciate topic branches as a kind of transaction facility for programmers. If you need to implement a non-trivial feature or just try something out, the way to go is onto a new “topic” branch. There you can mangle the code all you like, without fear of causing any lasting harm. In case everything works out fine, you merge the branch into the master branch and discard it. If you come to a dead end, well, you just discard the misbegotten branch.

But what about the database? If moving along your branch involves changes to the database, structural changes in particular, you can’t easily switch to another branch without these changes. In other words, your development database is really only suitable for a single branch.

Up to now, that is. Install the branch_db gem and what you get is a private database for your branch that you can mutilate without interfering with work on the other branches.

For details and installation instructions see the README. Here’s just an appetizer. Say you’re on branch “feature” and want a private copy of the “master” database. Here’s how you do it:

$ rake db:branches:copy ORIG_BRANCH=master

Getting it

  • github
  • $ sudo gem install mschuerig-branch_db

Let your Rails app know about it

In the appropriate place in config/environment.rb add

config.gem "mschuerig-branch_db", :lib => 'branch_db'

Ruby as a language for Rails views

Didn’t you always want to write your Rails views as plain Ruby objects? — “What?”, I hear you say. No, I haven’t lost my mind and the idea is quite sensible (or so I hope), once you add the restriction that it is JSON-formatted data that you want to return.

Say you need to set up some hashes or arrays for rendering to JSON. This is best done in Ruby and it is clearly a view concern. So let’s do it in the views. Like this:

  # app/controllers/movies_controller.rb
  def index
    respond_to do |format|
      format.json do
        @movies = Movie.all
        @count = Movie.count
        render :template => 'movies/index.json.rb'
      end
    end
  end

  # app/views/movies/index.json.rb
  {
    :identifier => Movie.primary_key,
    :totalCount => @count,
    # render @movies does not work as it insists on returning a string
    :items => @movies.map { |m| render(m) }
  }

  # app/views/movies/_movie.json.rb
  {
    :id => movie.to_param,
    :title => movie.title,
    :releaseDate => movie.release_date
  }
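
Stripped of everything Rails, the idea boils down to evaluating a piece of Ruby and serializing the result. The following is a sketch under that assumption only: render_ruby_template is a hypothetical helper, and the gem itself plugs into Rails' template handling rather than calling eval like this.

```ruby
require 'json'
require 'date'

# Evaluates Ruby template source with the given local variables and
# serializes the resulting object as JSON.
# render_ruby_template is a hypothetical helper for illustration.
def render_ruby_template(source, locals = {})
  context = binding
  locals.each { |name, value| context.local_variable_set(name, value) }
  JSON.generate(context.eval(source))
end

Movie = Struct.new(:id, :title, :release_date)

# The "template" is plain Ruby whose last expression is the data to render.
template = <<~'RUBY'
  {
    :id => movie.id,
    :title => movie.title,
    :releaseDate => movie.release_date.to_s
  }
RUBY

movie = Movie.new(1, 'Metropolis', Date.new(1927, 1, 10))
puts render_ruby_template(template, movie: movie)
```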

Getting it

  • github
  • $ sudo gem install mschuerig-ruby_template_handler

Let your Rails app know about it

In the appropriate place in config/environment.rb add

config.gem "mschuerig-ruby_template_handler", :lib => 'ruby_template_handler'

Simplistic Enumerations for ActiveRecord

Invariably, in almost every application there happen to be lists of data items that are immutable, just for reference. It could be colors, the four seasons, continents, the states of your country, kinds or types of this and that. These items are almost like constants. As accessible as they are through the ordinary ActiveRecord API, it seems an utter waste to hit the database again and again, for data that won’t change however often you request it.

Of course, the Rails community recognized early on that some data are more constant than others and over the years several plugins have been published that add cached and easily accessible enumerations to ActiveRecord. Some of these additions are quite complicated or befuddle ActiveRecord by not backing the enumeration values with real database objects. My experience has been that these attempted optimizations result in bizarre behavior when I did interesting things with ActiveRecord such as multiply nested named scopes plus custom SQL.

So, I thought a basic, no, simplistic, version of enumerations is called for. Here’s how it looks:

class Color < ActiveRecord::Base
  enumerates do |e|
    e.value :name => 'red'
    e.value :name => 'green'
    e.value :name => 'blue'
  end
end

Color[:green]
Color.find_by_name('red')
Color.find_by_name!(:red)
Color.all
Color.count
Color.reload

e.value :name => 'red'

ensures that a Color object with name ‘red’ exists; if it does not, one is created.
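
The two ingredients, "create if missing" and "read once, then serve from memory", can be sketched without ActiveRecord. FakeTable stands in for the database table and counts how often it is read; CachedEnum and its methods are illustrative names and not the gem's implementation.

```ruby
# In-memory stand-in for a database table that counts full reads.
class FakeTable
  attr_reader :rows, :reads

  def initialize(rows = [])
    @rows = rows
    @reads = 0
  end

  def all
    @reads += 1
    @rows.dup
  end

  def insert(row)
    @rows << row
  end
end

# Illustrative enumeration with find-or-create and a per-process cache.
class CachedEnum
  def initialize(table)
    @table = table
    @cache = nil
  end

  # Ensures a row with this name exists; creates it if it is missing.
  def value(name)
    @table.insert(name: name) unless @table.all.any? { |row| row[:name] == name }
    @cache = nil
  end

  # Looks up a value by name; the table is read once and then cached.
  def [](name)
    values[name.to_s]
  end

  # Drops the cache, but only in the process where it is called.
  def reload
    @cache = nil
  end

  private

  def values
    @cache ||= @table.all.to_h { |row| [row[:name], row] }
  end
end

table = FakeTable.new([{ name: 'red' }]) # 'red' already exists
colors = CachedEnum.new(table)
%w[red green blue].each { |name| colors.value(name) }

reads_before = table.reads
colors[:green]
colors[:red]
puts table.rows.size            # number of rows after the declarations
puts table.reads - reads_before # table reads caused by the two lookups
```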

Caveats

Although there is a #reload method defined on enumeration models, i.e. Color.reload, it is very unwise to use it. The point is that this method only affects a single server process and you most likely have many of them.

So, if you need to change enumeration values, the only way to do it is to treat it like an update to your application code.

Getting it

  • github
  • $ sudo gem install mschuerig-easy_enums

Let your Rails app know about it

In the appropriate place in config/environment.rb add

config.gem "mschuerig-easy_enums", :lib => 'easy_enums'