Lifting indexes with ActiveRecord

On a Rails project I’m currently working on I need to fill the database with test data to have something to play with. Apart from large imports, that’s the time when indexes may slow down operation severely instead of speeding things up. Consider: The indexes are not used, but have to be updated again and again for each new record that is inserted into the database. It is much cheaper, to lift — well, really drop — the indexes during mass operation and recreate them afterwards.

Here’s an example:

namespace :db do
  desc "Populate the database with sample data"
  task :populate => :environment do

    retained_indexes = [
      'index_people_on_lastname_and_firstname',
      { :table => :movies, :columns => :title }
      { :table => 'people', :columns => ['lastname', :firstname] }
    ]

    ActiveRecord::Base.transaction do
      IndexLifter.without_indexes(
        # Only consider indexes on these tables;
        # all tables by default.
        :movies,
        :people,
        # Don't lift these indexes
        :except => retained_indexes,
        # Don't lift unique indexes; default: false.
        :except_unique => true
      ) do
        ActiveRecord::Base.silence do

          # import or generate large amounts of data here

        end
      end
    end
  end
end

Please bear in mind that dropping and creating of indexes is a rather intrusive operation on the structure of your database. You should only perform it while no other users (or processes) are accessing it.

Also, consider that some indexes may be important for the proper function of your database. If you have unique indexes, i.e. indexes that enforce that particular columns or combinations of columns are unique, and if you are handling violations of this constraint in your application code, then you might need to retain these indexes even during data generation.

Getting it

  • github
  • $ sudo gem install mschuerig-index_lifter

Let your Rails app know about it

In the appropriate place in config/environment.rb add

config.gem "mschuerig-index_lifter", :lib => 'index_lifter'

Emacs Tidbit: Console or GUI?

Emacs has two different UIs: one using only a text console, the other using the window system. Even when a window system is present, Emacs can be started in a terminal emulator with the -nw option to show its text UI.

There are some customizations you might want to make conditional on the kind of UI. In my case, I use ECB, and in general I’d like it to start up automatically when I launch Emacs, but only when running in GUI mode.

It’s pretty simple to achieve this, a small addition to ~/.emacs does the trick.

(if (memq window-system '(x w32 mac))
  (ecb-activate))

Line movement for Emacs

Recently, I’ve been mingling with two old acquaintances (if not downright friends): C and Emacs. Through my interaction with other text editors I’ve come to know functions to move the current line, i.e. the one containing the cursor, up or down. That’s something I wanted to have in Emacs too. And so I wrote my first ever Emacs Lisp functions with some help from comp.emacs.

Stick these definitions in ~/.emacs and evaluate them or restart Emacs.

(defun move-line-down ()
  (interactive)
  (let ((col (current-column)))
    (save-excursion
      (next-line)
      (transpose-lines 1))
    (next-line)
    (move-to-column col)))

(defun move-line-up ()
  (interactive)
  (let ((col (current-column)))
    (save-excursion
      (next-line)
      (transpose-lines -1))
    (move-to-column col)))

(global-set-key [\M-down] 'move-line-down)
(global-set-key [\M-up] 'move-line-up)

Now you have Meta-Down and Meta-Up (i.e., probably Alt-Down and Alt-Up) bound to moving the current line around.

Emperor Ming strikes back

Emperor Ming strikes back

May 5, 2008by Michael Schürigproduct

Joe Celko’s Thinking in Sets

★★★☆☆

I have (read) copies of five earlier of Celko’s books on my shelf, still I am again amazed by the cultural distance. Most of my programming life I have spent with object-oriented programming languages and associated technologies. Thus, when Celko starts the present book with a discussion of the differences between flat files and relational databases, it could hardly be more distant than if he had extolled the virtues of the gasoline engine over its steam predecessor.

Celko likes to refer to his informers as “Mr. So-and-so, working for company X” this again moves the cultural differences to the front, and I can’t avoid a slight chuckle when he reverently cites “Dr. E.F. Codd” for the umpteenth time. It all decidedly feels like a tale from an imaginary 1950s. I certainly envision people in lab coats.

The tone moves from enjoyably quaint to annoying, when Celko (again and again) ridicules the many failings of database novices and sophomores. He might not realize that those who share in the joke have no need to read his book — and that those who bought the book to learn something from it may feel a wee bit offended. After all, we are already aware that there’s something we don’t know yet and want to learn, there’s really no need to rub it in.

So much for the atmospheric stuff. But, of course, I didn’t buy this book to make me feel good, but to learn something, come rain or shine. And, yes, there is a lot useful stuff in this book. More in the bits and pieces than in some generalized approach. And by far more in line with the subtitle, “Auxiliary, Temporal and Virtual Tables in SQL” than with “Thinking in Sets”, the main title. Regarding the latter, I found the most worthwhile part of the book to be the discussion of why boolean flags are bad (ch. 11, Thinking in SQL).

Celko’s effort to distance the relational, set-based approach from earlier practices crops up all over the book. I had expected — and hoped! — that Celko would put considerably energy into comparing, contrasting, and hopefully complementing set-based thinking with current object-oriented approaches. Alas, he’s completely preoccupied with his own tradition and doesn’t wander into OO-land at all.

I would have been very interested in reading a knowledgable discussion of where to draw the line between procedural and set-based approaches. And, as most practical programs will employ both of these approaches, how to interface the respective parts. On the latter issue, there’s not a single word in this book. The treatment of the former issue is interesting, in a twisted sense. Celko demonstrates some string processing in SQL and concedes that this would be much easier in languages such as ICON or SNOBOL, those stalwarts of 1970s era dataprocessing (does he even know Perl?). Well, why then try to abuse SQL to do something for which it is ill-suited and results in bloated code? Why anyone would want to solve Sudoku puzzles in SQL I cannot fathom, either. Celko doesn’t tell, and neither does he present the whole (repetitive) code, nor explain how the set-based approach works in any sufficient detail.

The overarching mindset exemplified in this book is to push as much into the database as possible, even if it hurts at times. I don’t mean to denigrade the intention, namely application-independent, consistent data storage. However, the reality in current software engineering is that a shared database is but one solution among others. For instance, SOA (Service Oriented Architecture) is specifically about connecting applications through services they provide, not by tying them to a shared database.

Celko likes to style himself in the image of Ming the Merciless. The semblance is indeed uncanny and as I hinted already, he tries to live up to the role as his author persona. Unfortunately, he doesn’t seem to realize that there’s one thing that can’t be tolerated in an arch-villain (as well as in his henchmen and henchwomen): sloppiness. The book has more than its fair share of typos and grammatical accidents. A particularly amusing case in point — due to his belligerent character, a deeper insight, or simply search-and-replace gone awry — is an example that consistently refers to “martial status”.

Collecting DOM events

The DOM change notification I presented yesterday is one piece in the puzzle to rejuvenate my languishing JavaScript form validator. I’ve not yet made up my mind on a lot of issues, but there’s one thing that’s certain: the next version will have to be able to work with changing forms, that is, forms that show different input elements depending on already chosen values.

Since yesterday, I have a way to be notified when the DOM has changed. As it is, the dom:changed custom event is completely general and not quite what I need. Of course, I do need some kind of notification, but what I get when a function changes the DOM in several places, is a barrage of dom:changed events. I don’t want to do lots of stuff for each of these events, but I’ve got to do something about the events collectively. There is no natural bracketing for events, there’s nothing that says which events belong to the same high-level change. But that’s not really necessary, rather, it is good enough to collect all relevant events that occur in period of time.

Here’s a class that collects all events that it receives over a given interval (of seconds) and passes them on to a handler function.

Element.EventCollector = Class.create({
  initialize: function(handler, interval) {
    this.handler = handler;
    this.interval = (interval || 1) * 1000;
    this.reset();
    this.dischargeEvents = this.discharge.bind(this);
  },
  reset: function() {
    this.events = [];
    this.timer = null;
  },
  observer: function() {
    return this.onEvent.bindAsEventListener(this);
  },
  onEvent: function(event) {
    this.events.push(event);
    this.wait();
  },
  wait: function() {
    if (!this.timer) {
      this.timer = setTimeout(this.dischargeEvents, this.interval);
    }
  },
  discharge: function() {
    this.timer = null;
    var events = this.events;
    this.reset();
    this.handler(events);
  }
});

Add some syntactic sugar, so that we can do things like

document.body.collectEvents('dom:changed', function(events) { ... });

Element.addMethods({
  collectEvents: function(element, eventName, handler, interval) {
    var collector = new Element.EventCollector(handler, interval);
    $(element).observe(eventName, collector.observer());
  }
});

For my motivating case of the validator that’s still not exactly what I need. I can arrange to get a notification that a whole bunch of elements was changed, but I don’t want to deal with these elements individually. The validator I have (and the next version I envision) works it’s way down from a starting element and attaches its magic pixie dust to everything below. Thus, what I need is a common ancestor element for all the changed elements. The idea is easy: Go down from the root of the “family” (document) tree and follow it until the path to the elements branches.

Element.addMethods({
  commonAncestor: function() {
    var ancestorsLists = $A(arguments).map(function(el) {
      el = $(el);
      var ancestors = el.ancestors().reverse()
      ancestors.push(el);
      return ancestors;
    });
    var common = document;
    Enumerable.zip.apply(ancestorsLists.shift(), ancestorsLists).each(
      function(ancestorTuple) {
        var cand = ancestorTuple.shift();
        if (!ancestorTuple.all(function(el) { return el == cand; })) {
          throw $break;
        }
        common = cand;
      });
    return common;
  }
});

[See Element#ancestors, Enumerable#zip, $break.]

Finally, here’s an example that puts the existing pieces together.

document.observe('dom:loaded', function() {
  document.body.collectEvents('dom:changed', function(events) {
    var affectedElements = events.invoke('element');
    var ancestor = Element.commonAncestor.apply(null, affectedElements);
    console.log('dom:changed below: ', ancestor);
  }, 10);
});

The DOMs they are a-changin’

So we’ve put in the effort to make our JavaScript code nicely unobtrusive and all is well — until it isn’t. A pesky Ajax request has messed up the DOM and suddenly there are elements in there where our unobtrusive behavioral goodness has not been attached. Never give up, never surrender. You won’t even have to get out Grabthar’s hammer, it’s only just a small matter of programming.

First, let’s shrink the problem a little. Let’s concentrate on changes effected through the abstraction layer provided by the Prototype library. Prototype implements several DOM-changing functions on the Element singleton and also mixes them into extended DOM elements. These functions are remove, update, replace, insert, wrap, and empty. Prototype also gives us the wrap function that wraps a function around another, somewhat akin to AOP‘s around advice, but without pointcuts.

Now the strategy is clear: Replace the original DOM-changing functions with versions that record a bit of information about these changes. A good mechanism to serve the actual notification is to use Custom Events. Then, anyone intersted in changes to a branch of the DOM can register as an observer for our new dom:changed event.

However, it is a good idea not to fire the event straightaway after the DOM has been changed. We might be in the middle of a batch of changes and the DOM in a correspondingly unorderly state. Besides, it is sometimes necessary to give the browser’s rendering engine a bit of breathing room after changes. For all this, at first the event data is appended to a list and a timer is set to dispatch the corresponding events some time later. This timer is reset for every change, so that for a batch of changes it only executes once and and then fires all events successively.

To understand the resulting behavior, it is important to know that JavaScript is single-threaded. As a result, when a timer times out, it does not interrupt the currently executing code. Rather, the function attached to the timer is scheduled to be executed as soon as there is nothing else to do. Therefore, the usual sequence is like this

  • An event triggers a DOM-changing function
    • Change the DOM
    • Remember the change
    • Change the DOM
    • Remember the change
  • Fire events for all remembered changes

Finally, here’s the code.

(function() {
  var methods = ['remove', 'update', 'replace', 'insert', 'wrap', 'empty'];
  var changes = [];
  var timeout;
  
  function rememberChange(element, method) {
    changes.push({
      element: $(element),
      parent: element.parentNode,
      operation: method
    });
  }
  
  function scheduleEvent() {
    if (timeout) {
      clearTimeout(timeout);
    }
    timeout = setTimeout(fireEvent, 10);
  }
  
  function fireEvent() {
    changes.each(function(change) {
      var affectedNode = change.element;
      var operation = change.operation;
      if (!change.parent || 
          (change.parent && !affectedNode.descendantOf(change.parent))) {
        affectedNode = change.parent || affectedNode.parentNode;
        operation += 'Child';
      }
      affectedNode.fire('dom:changed', {operation: operation});
    });
    changes = [];
  }

  methods.each(function(m) {
    Element.Methods[m] = Element.Methods[m].wrap(
      function(proceed, element) {
        rememberChange(element, m);
        scheduleEvent();
        return proceed.apply(null, $A(arguments).slice(1)); 
      });
  });
})();

When an element has been removed from the DOM, it is pointless to fire an event on it that bubbles upwards. There’s no surface to bubble up to anymore. Therefore, in this case, the dom:changed event is fired on the parent of the removed element and the operation is signified as removeChild.

To peek at the modification events, or do something useful with them, add code like this

document.observe('dom:loaded', function() {
  document.body.observe('dom:changed', function(event) {
    console.log('dom:changed: ', event.element(), event.memo.operation);
  });
});

JavaScript FSM v0.2.2

I’ve released a new version, 0.2.2, of the JavaScript FSM builder. There are some enhancements, such as accessors for successorStates and expectedEvents that I found useful for implementing a first (almost) practical example involving drag & drop.

Speaking of which, this whole thing didn’t come into being because of a pressing need. Currently, it’s more of a solution looking for a problem. I know that state machines can be very helpful for UI construction, however, this particular incarnation still has to prove itself.

All the rest

Finite State Machines in JavaScript

I like finite state machines and I wanted to try metaprogramming in JavaScript ever since I’d seen Adam McCrea presentation on the topic.

The result is an FSM builder in JavaScript. Machine descriptions look like my test machine here:

var Machine = FSM.build(function(fsm) { with (fsm) {
  onUnexpectedEvent(function() { ... });

  state('start', 'initial')
    .event('go')
      .goesTo('middle')
      .doing(function() { ... })
      .doing('phew')
    .event('run')
      .goesTo('finish')
    .onExiting(function() { ... });

  state('middle')
    .onUnexpectedEvent(function() { ... })
    .onEntering(function() { ... })
    .event('back')
      .goesTo('start')
      .onlyIf(function() { return true_or_false })
    .event('go')
      .goesTo('finish');

  state('finish', 'final');
}});

function TestMachine() {}
// amend TestMachine.prototype here all the way you want
TestMachine.prototype.phew = function() { ... };
TestMachine.prototype = new Machine(TestMachine.prototype);

There’s no documentation yet, but thorough unit testsspecs
using Nicolás Sanguinetti’s very nice js-spec framework.

Something to look at:

The script is meant to be usable independently of and play nicely with any other libraries.

The manner in which an FSM is attached to classes/objects is still rather convoluted. The general idea is that it (a) should be possible to insert the FSM as the first link in the prototype chain and (b) should be possible to directly attach it to an existing object. I’d appreciate suggestions how to make this nice and shiny.

Bash completion for script/generate

The Rails generator script/generate knows pretty well what things it can generate. In fact, it knows much better than I do. So, I think it could really give me some help when I’m typing along on the command line.

If you save the snippet below as /etc/bash_completion.d/generate you can enjoy this help, too.

_generate()
{
  local cur

  COMPREPLY=()
  cur=${COMP_WORDS[COMP_CWORD]}

  if [ ! -d "$PWD/script" ]; then
    return 0
  fi

  if [ $COMP_CWORD == 1 ] && [[ "$cur" == -* ]]; then
    COMPREPLY=( $( compgen -W '-h -v\\
      --help --version'\\
      -- $cur ))
    return 0
  fi

  if [ $COMP_CWORD == 2 ] && [[ "$cur" == -* ]]; then
    COMPREPLY=( $( compgen -W '-p -f -s -q -t -c\\
      --pretend --force --skip --quiet --backtrace --svn'\\
      -- $cur ))
    return 0
  fi

  COMPREPLY=( $(script/generate --help | \\
    awk -F ': ' '/^  (Plugins|Rubygems|Builtin|User):/ { gsub(/, */, "\n", $2); print $2 }' | \\
    command grep "^$cur" \\
  ))
}

complete -F _generate $default generate