Mongoid: Make sure all changed objects are saved

With Rails’s ActiveRecord and presumably other object-relational mappers, it is easy to use persistent, database-backed objects as if they were just there. Create new objects, navigate along associations, just like that. This picture breaks down, however, when it comes time to ensure that those in-memory changes are saved back to the database.

In general, in a single unit of work, we only want to read an object once from the database and write changes to that object, if any, back to the database at most once.

The “reading once” part is mostly handled by an identity map. Such a map does not prevent multiple reads of the same object through conditional queries, but it does ensure that each persistent object has a single in-memory representation and it does avoid reading the same object twice when the object is accessed by its id (primary key).
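To make the "reading once" part concrete, here is a minimal sketch of an identity map in plain Ruby. It is not Mongoid's implementation; the class name IdentityMap, the loader callable, and the loads counter are all assumptions made up for illustration.

```ruby
# Hypothetical, ORM-independent identity map (not Mongoid's actual class).
class IdentityMap
  attr_reader :loads

  def initialize(loader)
    @loader  = loader # callable that "reads" a record by id from the store
    @objects = {}     # id => the single in-memory representation
    @loads   = 0      # how often the backing store was actually hit
  end

  # Return the in-memory object for id, loading it at most once.
  def find(id)
    @objects.fetch(id) do
      @loads += 1
      @objects[id] = @loader.call(id)
    end
  end
end

# A hash stands in for the database in this sketch.
store  = { 1 => "Hello" }
map    = IdentityMap.new(->(id) { { id: id, body: store.fetch(id) } })
first  = map.find(1)
second = map.find(1)
```

Both lookups return the very same object, and the store is only read once. A query by some other condition would bypass find and hit the store again, which is the limitation mentioned above.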

The writing part is harder, because a changed or new object might not be valid by the constraints imposed by the application or the database. In effect, writing/saving must be done explicitly in order to react to the possible error conditions. In a lot of common cases this fits in nicely with the business logic.

Say your application provides the functionality to send a message. It is part of your business logic to check that the user has indeed entered a valid message, one having some text for the body. So when you write code like this

class MessagesController < ApplicationController
  def create
    @message = Message.create(message_params)
    if @message.errors.empty?
      ...
    else
      ...
    end
  end
end

you are really performing a validity check on the message with the side-effect that a valid message is written to the database.

It is a somewhat different case when objects are changed not directly by the user, but as a consequence of another user action. Say we want to mark a message as read when a user has seen it:

class MessagesController < ApplicationController
  def show
    @message = Message.find(params[:id])
    @message.mark_as_read
  end
end

Now we are coming to the heart of the matter: How do we ensure that the state change (unread -> read) of our message is saved? Maybe like this:

class Message
  def mark_as_read
    self.state = 'read'
    save!
  end
end

That looks fine by itself, but what if we want to make another change to the same object, like this

class MessagesController < ApplicationController
  def show
    @message = Message.find(params[:id])
    @message.mark_as_read
    @message.register_last_reader(current_user)
  end
end

and

class Message
  def register_last_reader(user)
    self.last_reader = user
    save!
  end
end

Note that each method by itself ensures that changes are written to the database. As a result, we can compose these methods easily without any further concern that the changes they effect have to be written explicitly to the database.

Unfortunately, there is a drawback. The price we pay is that the same object is written to the database twice because each change is saved individually.
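The double write is easy to see with a stand-in object that only counts its saves. FakeMessage below is a hypothetical substitute for the real model, and 'alice' is a placeholder for a user; nothing here touches a database.

```ruby
# Hypothetical stand-in for Message: save! only counts writes.
class FakeMessage
  attr_accessor :state, :last_reader
  attr_reader :writes

  def initialize
    @writes = 0
  end

  # Pretend to persist the object and record that a write happened.
  def save!
    @writes += 1
    true
  end

  def mark_as_read
    self.state = 'read'
    save!
  end

  def register_last_reader(user)
    self.last_reader = user
    save!
  end
end

message = FakeMessage.new
message.mark_as_read
message.register_last_reader('alice')
puts message.writes # prints 2: one write per method call
```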

How about not saving the message in mark_as_read and register_last_reader and instead doing it in the controller action?

class MessagesController < ApplicationController
  def show
    @message = Message.find(params[:id])
    @message.mark_as_read
    @message.register_last_reader(current_user)
    @message.save!
  end
end

This works and needs only a single database write, but it results in less compositional, riskier code, because now mark_as_read and register_last_reader implicitly shift the responsibility for saving to their calling context.

Now to the denouement. I don't have a handy solution that satisfies all constraints, but I do have a code snippet that helps make the second approach less risky.

This version is for Mongoid, but the idea is applicable to other ORMs that use an identity map. And importantly, the identity map must be enabled for this to work.

What It Does and Why

The after_filter logs and raises an exception if the identity map contains any objects that should have been saved. In particular, these are the objects that have changes to their attributes, excluding non-essential technical attributes, and where no failed attempt has been made to save them explicitly. We don't care about objects with errors, because we assume that such an object is explicitly handled. We do care about the forgotten objects, changed but never saved.
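As a rough, framework-independent sketch of that check (not the original snippet), the selection logic could look like the following. FakeDocument, UnsavedObjectsError, IGNORED_ATTRIBUTES and the helper names are all hypothetical. In a real application the documents would come from the ORM's identity map, and a non-empty errors collection is used here as an approximation of "a failed attempt to save".

```ruby
# Hypothetical stand-in for a persistent document. A real document would
# supply its own change tracking and validation errors.
class FakeDocument
  attr_reader :name, :changes, :errors

  def initialize(name, changes: {}, errors: [])
    @name    = name
    @changes = changes # attribute name => [old value, new value]
    @errors  = errors  # non-empty after a failed save attempt
  end
end

class UnsavedObjectsError < StandardError; end

# Assumed names for the "non-essential technical attributes".
IGNORED_ATTRIBUTES = %w[updated_at version].freeze

# Changes that matter, with the technical attributes filtered out.
def essential_changes(doc)
  doc.changes.reject { |attr, _| IGNORED_ATTRIBUTES.include?(attr) }
end

# Objects that were changed, but never saved and never failed to save.
def forgotten_objects(documents)
  documents.select do |doc|
    essential_changes(doc).any? && doc.errors.empty?
  end
end

# What the filter would do at the end of a request.
def ensure_all_saved!(documents)
  forgotten = forgotten_objects(documents)
  return if forgotten.empty?

  names = forgotten.map(&:name).join(', ')
  raise UnsavedObjectsError, "Changed but never saved: #{names}"
end

docs = [
  FakeDocument.new('saved'),
  FakeDocument.new('touched', changes: { 'updated_at' => [nil, Time.now] }),
  FakeDocument.new('invalid', changes: { 'body' => ['a', ''] },
                              errors: ["body can't be blank"]),
  FakeDocument.new('forgotten', changes: { 'state' => %w[unread read] })
]
```

Of these four, only the last one would trip the check: the first has no changes, the second only a technical change, and the third carries errors and is therefore assumed to be handled explicitly.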

No database to rule them all. Branch databases for Rails and git

You’re using git to manage your Ruby on Rails projects? Then you’ve probably come to appreciate topic branches as a kind of transaction facility for programmers. If you need to implement something non-trivial or just want to try something out, the way to go is onto a new “topic” branch. There you can mangle the code all you like, without fear of causing any lasting harm. In case everything works out fine, you merge the branch into the master branch and discard it. If you come to a dead end, well, you just discard the misbegotten branch.

But what about the database? If moving along your branch involves changes to the database, structural changes in particular, you can’t easily switch to another branch without these changes. In other words, your development database is really only suitable for a single branch.

Up to now, that is. Install the branch_db gem and what you get is a private database for your branch that you can mutilate without interfering with work on the other branches.

For details and installation instructions see the README. Here’s just an appetizer. Say you’re on branch “feature” and want a private copy of the “master” database. Here’s how you do it:

$ rake db:branches:copy ORIG_BRANCH=master

Getting it

  • github
  • $ sudo gem install mschuerig-branch_db

Let your Rails app know about it

In the appropriate place in config/environment.rb add

config.gem "mschuerig-branch_db", :lib => 'branch_db'

Emperor Ming strikes back

May 5, 2008 by Michael Schürig | product

Joe Celko’s Thinking in Sets

★★★☆☆

I have (read) copies of five of Celko’s earlier books on my shelf; still, I am amazed once again by the cultural distance. Most of my programming life I have spent with object-oriented programming languages and associated technologies. Thus, when Celko starts the present book with a discussion of the differences between flat files and relational databases, it could hardly be more distant than if he had extolled the virtues of the gasoline engine over its steam predecessor.

Celko likes to refer to his informers as “Mr. So-and-so, working for company X”. This again moves the cultural differences to the front, and I can’t avoid a slight chuckle when he reverently cites “Dr. E.F. Codd” for the umpteenth time. It all decidedly feels like a tale from an imaginary 1950s. I certainly envision people in lab coats.

The tone moves from enjoyably quaint to annoying when Celko (again and again) ridicules the many failings of database novices and sophomores. He might not realize that those who share in the joke have no need to read his book, and that those who bought the book to learn something from it may feel a wee bit offended. After all, we are already aware that there’s something we don’t know yet and want to learn; there’s really no need to rub it in.

So much for the atmospheric stuff. But, of course, I didn’t buy this book to make me feel good, but to learn something, come rain or shine. And, yes, there is a lot of useful stuff in this book. More in the bits and pieces than in some generalized approach. And by far more in line with the subtitle, “Auxiliary, Temporal and Virtual Tables in SQL”, than with “Thinking in Sets”, the main title. Regarding the latter, I found the most worthwhile part of the book to be the discussion of why boolean flags are bad (ch. 11, Thinking in SQL).

Celko’s effort to distance the relational, set-based approach from earlier practices crops up all over the book. I had expected (and hoped!) that Celko would put considerable energy into comparing, contrasting, and hopefully complementing set-based thinking with current object-oriented approaches. Alas, he’s completely preoccupied with his own tradition and doesn’t wander into OO-land at all.

I would have been very interested in reading a knowledgeable discussion of where to draw the line between procedural and set-based approaches. And, as most practical programs will employ both of these approaches, how to interface the respective parts. On the latter issue, there’s not a single word in this book. The treatment of the former issue is interesting, in a twisted sense. Celko demonstrates some string processing in SQL and concedes that this would be much easier in languages such as ICON or SNOBOL, those stalwarts of 1970s-era data processing (does he even know Perl?). Well, why then try to abuse SQL to do something for which it is ill-suited and which results in bloated code? Why anyone would want to solve Sudoku puzzles in SQL I cannot fathom, either. Celko doesn’t tell, and neither does he present the whole (repetitive) code, nor explain in sufficient detail how the set-based approach works.

The overarching mindset exemplified in this book is to push as much into the database as possible, even if it hurts at times. I don’t mean to denigrate the intention, namely application-independent, consistent data storage. However, the reality in current software engineering is that a shared database is but one solution among others. For instance, SOA (Service Oriented Architecture) is specifically about connecting applications through services they provide, not by tying them to a shared database.

Celko likes to style himself in the image of Ming the Merciless. The resemblance is indeed uncanny and, as I hinted already, he tries to live up to the role in his author persona. Unfortunately, he doesn’t seem to realize that there’s one thing that can’t be tolerated in an arch-villain (as well as in his henchmen and henchwomen): sloppiness. The book has more than its fair share of typos and grammatical accidents. A particularly amusing case in point (due to his belligerent character, a deeper insight, or simply search-and-replace gone awry) is an example that consistently refers to “martial status”.