February 21, 2008

Managing Views with ActiveRecord Migrations

Listen to this articleListen to this article

I just added very simple view support1 to redhillonrails_core trunk. The changes give you the ability to create and drop views using create_view :name, "definition" and drop_view :name respectively as well as preserving the views in schema.rb.

1WARNING: This is currently only supported for PostgreSQL. Creating views in MySQL will cause extra tables to be created in schema.rb! Probably not what you wanted.

February 15, 2008

Ambitious Scoping

Listen to this articleListen to this article

If you haven't checked out Ambition for generating ActiveRecord queries I highly recommend you do so. In a nutshell, it allows you to generate queries using the standard Ruby Enumeration idioms. Take for example the following snippet:

class Message < ActiveRecord::Base
  def self.unread
    select { |m| m.read_at == nil }.entries
  end
end

Message.unread
=> SELECT * FROM messages WHERE messages.read_at IS NULL

Notice anything missing? SQL perhaps?

Now clearly there's a whole lotta magic going on here. That said, ambition does use the standard ActiveRecord finders to query the database. One of the side-effects of this is that when navigating associations, you automatically get all the appropriate scoping.

So, for example, assuming a has_many relationship between User and Message we can do something like this:

user = User.detect { |c| c.name == 'Simon Harris' }
=> "SELECT * FROM users WHERE (users.name = 'Simon Harris') LIMIT 1"
user.messages.unread
=> "SELECT * FROM messages WHERE messages.read_at IS NULL AND messages.user_id = 3"

Here we navigated from user to messages and selected only those that have no read_at date.

As you can see, the restriction by user_id was automatically inserted for us by ActiveRecord as we navigated the association. We didn't have to do a thing. It just worked the way you would expect it to. Very cool!

However (there's always a but) this only worked because we included an execution trigger in Messages.unread. These "kickers" as they're known, include the standard Enueration methods first, entries, detect and size.

Without a kicker, the result of select is actually a query object that you can assign to a variable, call more selects on, or even store for later use! This last feature is kinda neat and leads to some nice stuff with partial caching but it also leads to some hidden complications.

If we remove the kicker from Message.unread so that it instead returns a query and call the kicker explicitly, this happens:

class Message < ActiveRecord::Base
  def self.unread
    select { |m| m.read_at == nil }
  end
end

user = User.detect { |c| c.name == 'Simon Harris' }
=> "SELECT * FROM users WHERE (users.name = 'Simon Harris') LIMIT 1"
user.messages.unread.entries
=> "SELECT * FROM messages WHERE messages.read_at IS NULL"

Where did all the scoping go?!

In the earlier example, the query was executed inside a method on an model class which, because it was called whilst navigating an association, means it was also executed using an appropriate with_scope.

In the second example however, because the query wasn't actually executed until sometime later--ie outside any model classes and associations--all the scoping information has been forgotten, as if it were never there in the first place. Bbbbbut, you want to have all that Ambition goodness and you'd like your scoping as well right? So what's a poor boy (or girl) to do?

I'm glad you asked. Here's a patch1 to ambitious-activerecord that remembers the scoping that was in play at the time the query was constructed. It seems to work for all my current uses.

1 DISCLAIMER: very quick-and-dirty and thoroughly untested

February 21, 2007

Rails, PostgreSQL and Case-Sensitivity

Listen to this articleListen to this article

Possibly the only thing I like about MySQL is when performing a search, the values 'SIMON' and 'sImOn' are considered equal—case-insensitive searching. PostgreSQL on the other hand considers them to be different—case-sensitive searching. Now I don't know about you but for %99.999~ of the applications I've ever written, I'd rather 'Australia' and 'AuStRaLiA' weren't considered different countries.

The "standard" approach to solving this problem is to change a query from this:

SELECT * FROM countries WHERE name = ?;

To something like this:

SELECT * FROM countries WHERE LOWER(name) = LOWER(?);

Thereby forcing the database to perform a pseudo case-insensitive search. The only problem is that all those nice indexes you've created to ensure fast, efficient searching are totally ignored. (Who can spell full-table scan?) Performance issues aside (I mean after all we know that premature optimisation is the root of all evil right?) what's just as annoying is that I can't actually guarantee uniqueness, which is pretty much the whole point! No, even with a unique index on countries.name, the database will still quite happily allow me to:

INSERT INTO countries (name) VALUES ('Australia');
INSERT INTO countries (name) VALUES ('AUSTRALIA');
INSERT INTO countries (name) VALUES ('aUsTrAlIa');

So when I perform a case-insensitive search as previously discussed, I'll end up with three (count 'em 3) records. Thankfully, there is a solution (of sorts): expression indexes.

PostgreSQL allows you to create indexes based on expressions, say for example LOWER(name), allowing us to create a unique, case-insensitive index as simply as:

CREATE UNIQUE INDEX index_countries_on_name ON countries (LOWER(name));

Ok, so perhaps you knew this already and you're wondering what all this has to do with Rails? Well I'm glad you asked.

Rails (as of 1.2) has a new option for validates_uniqueness_of named, oddly enough, case_sensitive. This is assumed to be true by default (meaning all searches are case-sensitive). Set it to false however and you'll magically get validation queries that look like:

SELECT * FROM countries WHERE (LOWER(countries.name) = 'australia') LIMIT 1;

To compliment this feature, I've recently enhanced the RedHill on Rails Plugins in two interesting (and hopefully useful) ways.

The first is in the core and supports the creation of case-insensitive indexes during schema migration:

add_index :countries, [:name], :unique => true, :case_sensitive => false

The second is in schema validations and causes case_sensitive => false to be passed as an option to validates_uniqueness_of whenever a case-insensitive index is detected.

(I also looked at the possibility of automagically surrounding query parameters, etc. with LOWER() inside find methods but given the myriad forms queries can take, it seems altogether too difficult for my feeble mind at this point.)

The upshot of all this is that at the very least, it should now be possible to add case-insensitivity to your queries and be assured that (bugs not withstanding) the performance of your application won't suddenly plummet as a consequence.

September 15, 2006

Automatically Validate Uniqueness of Columns with Scope

Listen to this articleListen to this article

The first cut at Schema Validations only applied validates_uniqueness_of for single-column unique indexes. This removed 80% of the cases in my code base but there were still cases where a scope was specified that lingered. Not any more.

The plugin now automatically generates validates_uniqueness_of with scope for multi-column unique indexes as well.

As always, there are some assumed conventions—which I believe will handle close to 99% of cases—around how to decide which column to validate versus which columns to consider part of the scope. The column to validate is chosen to be either:

  1. The last column in the index definition not ending in ‘_id’; or simply
  2. The last column in the index definition.

With all remaining columns considered part of the scope, following, what I believe to be, a typical typical composite unique index column ordering.

So, for example, given either of the following two statements in your schema migration:

add_index :states, [:country_id, :name], :unique => true
add_index :states, [:name, :country_id], :unique => true

The plugin will generate:

validates_uniqueness_of :name, :scope => [:country_id]

My next stop is to have a look at simple column constraints such as IN('male', 'female') and turn them into validates_inclusion_of :gender, :in => ['male', 'female'].

Perhaps tomorrow :)

September 09, 2006

Not My SQL

Listen to this articleListen to this article

Everyone else's favourite database just gave me the shits, again!

As part of my Schema Validations plugin for rails, I needed to see if a column has a default value. If it does, then there's no point in adding a validates_presence_of as the database will add one in. Ok, sounds sensible. Works just fine under PostgreSQL but my tests were failing when run against MySQL. Specifically, there was no validation being added for integer columns marked as NOT NULL. Huh?!

After a little investigation, I noticed that the meta-data that rails was collecting for mandiatory integer columns included a default of 0. So I looked in the test database and sure enough the columns all had a default of 0. But how? Why? I didn't put a default in my migrations...

A little more investigation and I noticed that the schema dump that is generated out of the development database and then run against the test database did indeed include the very same defaults. I then looked in the development database and to my surprise found no such defaults there. Aha! Mystery solved I presumed. Rails must have a bug for MySQL.

So I go and look at the code but alas, the code is the same for both PostgreSQL and MySQL. Something else must be happening. Time to get down and dirty on the command-line.

mysql> create table foo (col1 int, col2 int not null, col3 int default null) engine=InnoDB;

mysql> show columns from foo;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| col1  | int(11) | YES  |     | NULL    |       | 
| col2  | int(11) | NO   |     |         |       | 
| col3  | int(11) | YES  |     | NULL    |       | 
+-------+---------+------+-----+---------+-------+

If I have a nullable column then the default default (if that makes sense) is NULL. If I mark a column as mandiatory, the default default is...an empty string!? I wonder what would happen if I tried inserting a row and letting MySQL default all the values:

mysql> insert into foo () values ();

mysql> select * from foo;
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| NULL |    0 | NULL | 
+------+------+------+

You have to be shitting me! I attempted to insert a row into a table without specifying a value for a column that is marked as NOT NULL and it inserts 0!? Hold on a second...what if I force the default to be NULL so that it behaves just like every other sensible database on the planet:

mysql> create table bar (col1 int not null default null) engine=InnoDB;
ERROR 1067 (42000): Invalid default value for 'col1'

Egads! OK let me try that in PostgreSQL:

psql=# create table bar (col1 int not null default null);
CREATE TABLE

Thank-you!

Sure, I could make the assumption that 0 was never going to be a valid identifier for a record in another table but why should I have to? As far as I can tell, MySQL is just making shit up! No wonder my brother says it reminds him of using Microsoft Access.

So, now I'm left with the task of working out how to patch rails to get around this. I think I'll just have to presume that empty strings are equivalant to NULL for manditory columns. Sheesh.

July 18, 2006

Schema Validations Plugin

Listen to this articleListen to this article

After listening to Prag Dave's Keynote Speech this afternoon, I was motivated to implement some of the things he'd been asking for. Here's my first cut at it.

As the doco says, the plugin reads some—ok only one at the moment but we'll see how many others I get done before the beer runs out—database column constraints and tries to apply the closest corresponding rails validation. The first one I implemented reads the NOT NULL constraints against columns and generates a corresponding validates_presence_of.

I literally just whipped it up with no tests or what-not and I've only played with it against PostgreSQL, so if it has bugs or behaves oddly for whatever reason, please let me know, send me as much info as possible and I'll make it work. Nothing better than having real people testing it for me ;-)

Update: Ok, so far the beer has lasted long enough to implement validation of numbers (including specific support for integers) and lengths of strings.

Update: Now calls validates_presence_of anytime you declare a belongs_to association for a NOT NULL foreign-key column.

Update: Single-column unique indexes are now converted to validates_uniqueness_of.

June 04, 2006

Schema Defining Plugin

Listen to this articleListen to this article

Just a quick update on my plugins. Two of them—Foreign Key Migrations and Row Version Migrations—needed to know if they were being run as part of ActiveRecord::Schema.define() so I extracted the common code into yet another plugin: Schema Defining. This new plugin provides a class method—ActiveRecord::Schema.defining?—that returns true if ActiveRecord::Schema.define() is currently running; false otherwise. Probably not a hugely useful plugin in the main but certainly a must have if, like me, you have a number of migration plugins that need to alter their behaviour accordingly.

June 03, 2006

Row Version Migrations

Listen to this articleListen to this article

The more I use Rails, the more plugins I seem to create. Each time I start a new project, almost the first thing I find I want to do is to copy a chunk of code from an existing project. The moment that happens, I think: Plugin!

So here it is. A plugin that automatically generates the following row version columns for every table:

:created_at,   :datetime,  :null => false
:updated_at,   :datetime,  :null => false
:lock_version, :integer,   :null => false, :default => 0

Not so difficult. In fact it seems almost trivial (as usual) but it means all my tables now get date/time stamps and optimistic locking for free.

Enjoy. As always, feedback, suggestions, patches, criticism, all welcome.

May 28, 2006

Alias a Static Method in Ruby

Listen to this articleListen to this article

As much as I love Ruby on Rails, one of the things that dissapoints me is the rather large number of static methods. Maybe it's my Java background but class methods just irk me: They're difficult to override, mock, stub-out, and they're inherintly thread-unfriendly. But that aside, the fact remains that they exist and, now and again, I need to mess with them. Case in point: My Foreign Key Migrations plugin.

The plugin was working fantastically—thanks to the efforts of Ted Davis for giving it a work out—until it came time to running functional tests. Unfortunately, but unsurprisingly, the schema dumper used to copy the database structure from development to test creates tables in alphabetical order. Now I can't actually think of a much better alternative but the problem is that this means the foreign key clauses were being generated against tables that possibly didn't exist at the time of execution. (For example, Order sorts before Product however the orders table has a foreign key to the products table.)

So anyway, the longer-term solution is to batch up the foreign-key declarations until the end of the script and then execute them but in the meantime, I just wanted to disable their generation altogether. In either case, I needed to interfere with the execution of ActiveRecord::Schema.define a static (boo, hiss) method. Now, for the fun bit.

In Ruby, the way to safely mix-in methods (rather than use subclassing) is to use some method chaining. For this we use alias_method. So, for example, if we wanted to override to_s to always place quotes around the value (nopt very useful but it will suffice for now) we could write a module like this:

module QuoteToS
  def self.included(base)
    base.class_eval do
      alias_method :to_s_without_quotes, :to_s unless method_defined?(:to_s_without_quotes)
      alias_method :to_s, :to_s_with_quotes
    end
  end

  def to_s_with_quotes
    "'#{to_s_without_quotes}'"
  end
end

This essentially says that when the module is included (ie mixed-in) to a class then: add a method named to_s_with_quotes; create an alias of the existing to_s named to_s_without_quotes; make an alias of the new to_s_with_quotes named to_s; and finally, whenever to_s. is executed, call the old to_s method (now named to_s_without_quotes) and surround the results with, you guess it, quotes.

To use this you would either manually include the module in a class or, more along the lines of aspects, force the inclusion with some code like this:

MyClass.send(:include, QuoteToS)

(As a side note, the use of unless method_defined?(:to_s_without_quotes) is to work-around a bug in Ruby 1.8.4 that causes an infinite recursion when using alias_method. I never detected it under Mac OS X but apparently it affects windows machines with monotonous regularity. D'oh!)

So that's all very well and good but what happens when you need to do the same thing with static methods? The answer is, use class << self and extend. In my case, overriding the behaviour of ActiveRecord::Schema.define looks something like this:

module ForeignKeyMigrations::Schema
  def self.included(base)
    base.extend(ClassMethods)
    base.class_eval do
      class << self
        alias_method :define_without_fk, :define unless method_defined?(:define_without_fk)
        alias_method :define, :define_with_fk
      end
    end
  end
  
  module ClassMethods
    def define_with_fk(info={}, &block)
      ...
      define_without_fk(info, &block)
    end
  end
end

Here, the use of base.extend causes all the methods defined within the module ClassMethods—an aribitrary name used by convention in most if not all the rails code I've ever seen—to be added as static methods on the class. Then, surrounding the alias_method calls within a class << self causes them to be executed in a static context.

Again, to have this code mixed-in to the existing ActiveRecord::Schema class looks like this:

ActiveRecord::Schema.send(:include, ForeignKeyMigrations::Schema)

Phew!

P.S. all the plugins are now available via SVN.

Update: See the comments on how to simplify the code thanks to Ryan Tomayko.

May 16, 2006

Auto-Generate Foreign-Key Constraints in Rails

Listen to this articleListen to this article

I literally just lobbed into my hotel room in Oslo after 30+ hours in transit. (I left home @ 12:30pm on Monday 15th and arrived here @ 21:30 on Tuesday 16th Melbourne time.) It's my first time in Scandinavia and so far I'm loving it. The people are friendly (and *ahem* rather attractive I must say) and I'm hanging out to try some scandinavian beer!

So anyways, after resisting the temptation to turn on my laptop, I finally caved and whipped up a little bit of code to auto-generate foreign-key relationships for migration scripts:

module ActiveRecord
  module ConnectionAdapters
    class ColumnDefinition
      alias nofk_to_sql :to_sql
      def to_sql
        name.to_s =~ /(.*)_id$/ ? "#{nofk_to_sql} REFERENCES #{Migrator.proper_table_name($1)} (id)" : nofk_to_sql
      end
      alias to_s :to_sql
    end
  end
end

The code assumes that if you have a column named customer_id in say an orders table, then you want a foreign key to the id column in the customers table. That doesn't handle situations where you have multiple foreign keys to the same table but...meh...I don't have models that sophisticated yet so bite me :)

Just looking back on it now, it was so trivial to implement that there is more syntactic noise than actual code. Hmmmm.

Update: the plugin is now available in downloadable form and supports a :references option for multiple columns or columns that aren't named for the table they reference.

May 08, 2006

Using a Single Development Database For all Rails Applications

Listen to this articleListen to this article

I used to have a separate database and/or user for each application I was developing. This worked for a while but once I had three or so projects underway, I grew tired of remembering which database and which login to use when running psql, etc. In addition, if anyone else wanted to do development, they too needed to either configure another rails environment (eg. james_development) or they had to create another database/user just as I had.

Then it occurred to me that psql (at least under Mac OSX) defaults to connecting to a database named for the currently logged in user. So, I reasoned, if all the developers had a default database, then if I could configure rails to connect to that I might be rid of my myriad databases. The problem was: how?

At first I tried not specifying a database in config/database.yml at all. D'oh! No luck. Then I wondered if I could add an ERB-ish macro for the current user (I may even have read that it was possible but I can't for the life of me recall where):

development:
  adapter:  postgresql
  database: <%= ENV["USER"] %>
  username: <%= ENV["USER"] %>

I'm not sure at what level this is being handled (rails, yaml, ruby, somewhere in between) but it worked! Now when the application loads in development, the name of the currently logged in user is substituted in for the name of the database.

The only other thing is to always remember to specify a table-name prefix in config/environments/development.rb and all the applications will happily co-exist in the same, default, database:

  config.active_record.table_name_prefix = "cjp_"

I've been playing with having all my applications in the one development database for a while now and it seems to be working out ok. On the down-side, it does mean I need to remember the table-name prefix when using psql, in some ways moving the problem rather than removing it, however it doesn't seem to get in the way especially because psql has table completion for table-names, column-names, indexes, you name it. (That's right: select * from cjp_[tab] where [tab] = '...';). The other downside of course is if I ever want to blow away the entire database I can't but then again, that's what rake migrate VERSION=0 is for.

I'm not sure if I'll continue in the long-term with this but it sure makes adding a new developer to the team dead-simple and makes my local machine configuration a lot simpler in the meantime. Besides, I needed an excuse to try using table-name prefixes so I could test my migration extensions. Guess what? They didn't handle it! They do now :)

May 04, 2006

Transactional DDL

Listen to this articleListen to this article

Have you ever been half way through a rails migration script only to have it bomb out with some error or another? You then correct the error and try to re-run it but because some of the statements have already been executed they fail when executed a second time.

Some databases (PostgreSQL for example) allow you to wrap DDL (Data Definition Language, create table, etc.) as well as DML (Data Manipluation Language, insert, etc.) in a transaction! This means that if any part of the script fails, the whole lot is rolled back. Imagine this: you drop a table, script rolls back and the table magically re-appears as if nothing had happened.

In rails migration scripts it's a simple matter of wrapping the entire up and/or down method with a Model.transaction do ... end. (Yet another time I really think the idea of making transactions a part of the model is particularly ridiculous. Transactions are cross-cutting concerns just like logging. Just ask the Aspect weenies, they'll tell ya.)

Here's a simple example:

  def self.up
    SystemProperty.transaction do
      create_table :system_property do |t|
        t.column :name,         :text,      :null => false
        t.column :value,        :text,      :null => false
        t.column :created_at,   :datetime,  :null => false
        t.column :updated_at,   :datetime,  :null => false
        t.column :lock_version, :integer,   :null => false, :default => 0
      end
    
      add_index :system_property, [:name], :unique => true
    end
  end

Now, I realise that create_table also supports the :force => true option so this is possibly not the best example but at least you can see how to go about making your migration scripts fully transactional, DDL and all.