Higher-order messaging

WARNING: Deep magic ahead! (and a very long article)

I stumbled on a concept called Higher-Order Messaging (HOM) today. It’s a handy way to allow an object-oriented language to call a message on all members of a collection without having to manually iterate over the collection, like so (Ruby):

[0, 1, 2].where.nonzero? => [1, 2]

That means “Give me the subset of [0, 1, 2] where the item is nonzero”. Quite natural syntax, eh? The usual Ruby equivalent is:

[0, 1, 2].select {|x| x.nonzero?}

which is a little more verbose. When I read the paper, I jumped immediately to implementing their select: method in Ruby. Thus launched an afternoon of investigation that led me to some interesting conclusions.

        <span id="more-60"></span>

        <h3>Say what now?</h3>

The name “higher-order messaging” comes by analogy with higher-order functions, as known in functional languages. In a language that supports higher-order functions, a function can be passed as a parameter to another function. The analogy, then, is that a message can take another message as a parameter. In the above example, that presumably means you’re passing the #nonzero? message as a parameter to the #where message; the #where method then calls #nonzero? on each item in the array.

But wait, that’s not quite right. Syntactically, you’re calling #nonzero? on the result of #where. That’s not at all the same as “passing #nonzero? as a parameter to #where“.

Well, “higher-order messaging” is a misnomer. What you’re actually doing is sending a message to a proxy object that’s returned by the “higher-order method”. Then, whatever message you send the proxy object, it sends that message to every member of the Enumerable it’s proxying.

First implementation

Here’s the code:

module Enumerable
  # 'select' is already taken
  def where
    WhereProxy.new(self)
  end
end

class WhereProxy
  def initialize(enum)
    @enum = enum
  end

  def method_missing(meth, *args, &block)
    @enum.select { |x| x.__send__(meth, *args, &block) }
  end
end

This is the natural first implementation. Your enumerable object gets a #where method that returns a proxy object that works as described above.

Once I got to this point, I looked around to see if anyone else had implemented HOM in Ruby already. Someone had, of course; and someone else improved upon it. Let’s take a look at these.

Another implementation

The first aforementioned implementation works about like my first implementation, but a little bit generalized. First, the author creates a superclass for all the proxy classes we’ll write (paraphrased from here):

class HigherOrderMessage
  instance_methods.each do |method|
    undef_method(method) unless method =~ /__(.+)__/
  end

  def initialize(obj)
    @obj = obj
  end
end

This saves us from having to write the same constructor for every proxy class; it also undefines every method it inherits from Object except #__id__ and #__send__ (the absolutely essential ones), so any method we call will be passed to the proxied object (including #id and #send!). To write a proxy class for a particular method, we subclass HigherOrderMessage and implement #method_missing to actually proxy messages as desired (paraphrased again):

class WhereProxy < HigherOrderMessage
  def method_missing(meth, *args, &block)
    @enum.select { |x| x.__send__(meth, *args, &block) }
  end
end

Now that we have our proxy class, we can implement the method that returns an instance of it (again, paraphrased):

module Enumerable
  def where() Where.new(self); end
end

An improvement

Okay, so this leaves us implementing a lot of classes, and a lot of methods that instantiate those classes and return the instance. (How very Java-y.) That’s a definite code smell, especially in Ruby. There must be a more terse way; the second implementor, of course, recognized this. Here’s the second implementation (paraphrased):

class HigherOrderMessage
  instance_methods.each do |method|
    undef_method(method) unless method =~ /__(.+)__/
  end

  def initialize(&block)
    @block = block
  end

  def method_missing(meth, *args)
    @block.call(meth, *args)
  end
end

See what happened there? Now, instead of subclassing HigherOrderMessage and implementing #method_missing, we just pass a block to the constructor, and that block *is* our specific #method_missing implementation. No subclassing needed.

So to actually implement a higher-order method, we return an instance of HigherOrderMessage with the proper code block. This gets us pretty far; we can even “chain higher-order methods” (rather innacurate terms) to create some interesting effects:

module Enumerable
  def every
    HigherOrderMessage.new do |meth, *args|
      self.each { |x| x.__send__(meth, *args) }
    end
  end

  def where
    HigherOrderMessage.new do |meth, *args|
      self.select { |x| x.__send__(meth, *args) }
    end
  end

  def all
    HigherOrderMessage.new do |meth, *args|
      self.collect { |x| x.__send__(meth, *args) }
    end
  end

  def do_inject(start_value)
    HigherOrderMessage.new do |meth, *args|
      self.inject(start_value) { |a, x| a.__send__(meth, x, *args) }
    end
  end

  def having
    HigherOrderMessage.new do |id, *args|
      HigherOrderMessage.new do |secid, *secargs|
        select { |x| x.__send__(id, *args).__send__(secid, *secargs) }
      end
    end
  end
end

[0,1,2,3,4].where.nonzero? => [1,2,3,4]
[0,1,2,3,4].do_inject(0).+ => 10
[0,1,2,3,4].having.succ < 3 => [0,1]

Catch that last one? That means “give me each element of [0, 1, 2, 3, 4] whose succeeding value is less than three”. Neat, huh?

Can we still do better?

There’s still a couple ways we can do better, though, and here’s where my own innovations start. First off, a lot of the implementations of iteration functions look the same. Can we find a way to avoid repeating all that nasty block-construction code? What I want is this:

module Enumerable
  define_curried_method(:every, :each)
  define_curried_method(:where, :select)
  define_curried_method(:all, :collect)
end

…that is, I know that #each, #select, and #collect all have the same signature (and their blocks all do too), even though they do different things. Can’t I write some meta-code to give me higher-order versions of these? Here’s how we do it:

class Module
  private
    def define_curried_method(name, method)
      define_method(name) do
        HigherOrderMessage.new do |sym, *args|
          self.__send__(method) { |x| x.__send__(sym, *args) }
        end
      end
    end
end

Making our new define_curried_method syntax a private instance method of Module puts it on par with attr_accessor and friends, so now our nice, terse higher-order method definitions on Enumerable work as expected.

Another problem is that there’s currently no way we can pass a block to the method we’re calling on every object in the collection. For example, if we have an array of strings and want to do the same substitution on all of them, here’s what we need:

%w[an array of strings, some with double letters].all.gsub(/(.)\1/) do |match|
  (match == 'rr' ? 'ARRRR' : match)
end
=> ["an", "aARRRRay", "of", "strings,", "some", "with", "double", "letters"]

Well, with a few modifications, we can make it work. There’s a snag, though; unlike methods, blocks in Ruby can’t take a block as their last parameter, since there’s no way to pass a block to a block, syntactically speaking. We can push a block onto an array, though, and that means we can pass a block to a block. The trick is that the receiving block needs to know how to call the block it was passed. Going back to our simpler, less automatic implementation:

module Enumerable
  def all
    HigherOrderMessage.new do |meth, *args|
      if args.last.is_a? Proc then block = args.pop; end
      self.collect { |x| x.__send__(meth, *args, &block) }
    end
  end
end

This is a start, but how does the args array get the block we pass as its last item? We need to catch the block in HigherOrderMessage#method_missing and push it onto the argument array for the block:

def method_missing(meth, *args, &block)
  args << block if block_given?
  @block.call(meth, *args)
end

Okay, that’s all well and good, but now we’ve lost the define_curried_method macro, and we have an extra line to write in the block in each of our higher-order method implementations. Can we fix our current macro implementation?

class Module
  private
    def define_curried_method(name, method)
      define_method(name) do
        HigherOrderMessage.new do |sym, *args|
          if args.last.is_a? Proc then block = args.pop; end
          self.__send__(method) do |x|
            x.__send__(sym, *args, &block)
          end
        end
      end
    end
end

There we go. Now we can pass blocks to our curried methods, and it’ll all work! Arrrrr!

%w[an array of strings, some with double letters].all.gsub(/(.)\1/) do |match|
  (match == 'rr' ? 'RRRR' : match)
end
=> ["an", "aRRRRay", "of", "strings,", "some", "with", "double", "letters"]

Limitations

  1. We can’t use the macro to implement our #do_inject, because the method it writes only passes one argument to the block; #inject needs two arguments to its block.

About this entry