MapReduce for Ruby: Ridiculously Easy Distributed Programming

Posted by stoyan Mon, 21 Aug 2006 07:03:00 GMT

Google’s MapReduce is now available for Ruby (via gem install starfish). MapReduce is the technique Google uses for massive distributed processing over 30-terabyte files.

Here is the basic code that will get you up and running with MapReduce in Starfish:
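    # item.rb
    # These requires are a no-op if ActiveRecord is already loaded.
    require 'rubygems'
    require 'active_record'

    # Connect to the database that holds the items table.
    ActiveRecord::Base.establish_connection(
      :adapter  => "mysql",
      :host     => "localhost",
      :username => "root",
      :password => "",
      :database => "some_database"
    )

    class Item < ActiveRecord::Base; end

    # Server side: declare which model supplies the collection.
    server do |map_reduce|
      map_reduce.type = Item
    end

    # Client side: this block runs once for each item handed out.
    client do |item|
      logger.info item.id
    end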
Now just run:
    starfish item.rb
and Starfish takes care of the rest. The code above does the following:
  • The server grabs all the items via: Item.find(:all)
  • Each of the clients grabs an item from the collection
  • When there are no more items to be grabbed, everything shuts down
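The three steps above can be imitated in plain Ruby without Starfish or a database. The following is only a rough sketch of the idea, not how Starfish is implemented: the distribute helper and its worker_count parameter are made up for illustration, threads stand in for the separate client processes, and an array of numbers stands in for Item.find(:all).

```ruby
require 'thread'

# Hypothetical helper (not part of Starfish): hands each item to exactly one
# worker and returns once the collection is exhausted.
def distribute(items, worker_count = 3, &block)
  queue = Queue.new
  items.each { |item| queue << item }   # "server": load the whole collection

  results = Queue.new                   # thread-safe place to collect output
  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        item = begin
          queue.pop(true)               # non-blocking pop, raises ThreadError when empty
        rescue ThreadError
          break                         # no more items: this worker shuts down
        end
        results << block.call(item)     # "client": process the item it grabbed
      end
    end
  end

  workers.each(&:join)                  # wait until every worker has stopped
  Array.new(results.size) { results.pop }
end

ids = distribute([1, 2, 3, 4, 5]) { |id| id * 10 }
puts ids.sort.inspect                   # prints [10, 20, 30, 40, 50]
```

The results are sorted before printing because the order in which workers finish is not guaranteed.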

Just add REST (which comes by default with Edge Rails) and you’ll have your own S3 or GDrive for free ;)
