Mechanical Turk in Ruby

published 11 Aug 2009 by Mark Percival
filed under: Code

This is part of a series of Mechanical Turk blog posts – check out more at Economics of Mechanical Turk, Mechanical Turk Pages in JQuery, and Building Mechanical Turk into Rails (coming soon).

If you’ve used Mechanical Turk, you already understand some of the initial challenges of getting a successful process in place. Even without getting into the development side of it, Mechanical Turk involves a great deal more than a technical problem that can be solved by a good API.

There are the issues of payment, task design, and processing the results. It’s actually an incredibly challenging problem, which we’ve had fun solving. The list of variables is huge, and eventually you start asking yourself questions like “What’s the best time to request a task if the ideal worker is in the Philippines?”

I’m going to first touch on the basics, then move into code and development using our RTurk Ruby gem.

What is Mechanical Turk

Simply, it’s a service where you can ask a human worker to perform a task. It can be anything, quite literally. You can ask them to verify a picture is not vulgar, or to call a phone number and verify it. So as I go through the details, I’m going to use some easy sample problems to keep the complexity down.

How it works – Worker side

When a worker logs in, he sees a list of available tasks, each with its price and a basic description of the work to be performed. He clicks on one that he prefers and looks at the actual task. He can then choose to accept it, or go back to the list and choose a different task. If he accepts the HIT (Human Intelligence Task, Amazon’s name for a single task), he performs it and submits the answer.

Let’s have an example:

The task is to look at a picture and select whether it’s vulgar or inappropriate. On his end all he sees is the Mechanical Turk website with a page framing the question. It’s a simple iframe with the page inside detailing the task, showing the picture and asking for a response and submission. It’s your basic HTML form and it can be hosted on Amazon or offsite. Surrounding the frame is the MTurk site with information like how much he’s made and other metrics.

How it works – Requester side

You can use Amazon’s question builder, which uses a template and a simple web GUI editor to build a question for a worker, but we’re developers and doing this in Ruby, so let’s ignore that since it’s quite trivial.

What you need to know is that there are two ways to submit a HIT. You can submit an XML document with the questions, requirements and layout information, which Amazon will transform into an HTML form and host itself. Or you can submit a URL, which Amazon will place in an iframe inside its own site.

I’m not even going to touch the XML route. In my opinion it’s far more complicated than it needs to be and doesn’t give you nearly the freedom to build the form how you wish. The ‘external HIT’, as Amazon refers to it, gives you an incredible amount of freedom to craft the task page and even use JavaScript UI tweaks.

When you make a request, you’ll send up the external HIT URL, the requirements, and the reward amount in a RESTful request. Your external page should include a form that posts to Amazon’s external HIT submission URL, and one of its fields should carry the assignment ID so Amazon knows who it’s from. (It’ll be passed to you in the URL.)
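
To make that concrete, here is a minimal sketch of an external page rendered with ERB. It is only an illustration: the helper name `render_hit_form` and the `inappropriate` field are made up, and the submission endpoints, the `assignmentId` parameter name and the `ASSIGNMENT_ID_NOT_AVAILABLE` preview placeholder are what I understand Amazon's external HIT documentation to specify, so check them against the current docs.

```ruby
require 'erb'
require 'uri'

# Amazon's external HIT submission endpoints (assumed; verify against the docs).
SUBMIT_URLS = {
  sandbox:    'https://workersandbox.mturk.com/mturk/externalSubmit',
  production: 'https://www.mturk.com/mturk/externalSubmit'
}.freeze

# Placeholder Amazon sends while a worker is only previewing the HIT.
PREVIEW_ID = 'ASSIGNMENT_ID_NOT_AVAILABLE'

FORM_TEMPLATE = ERB.new(<<~HTML)
  <form method="post" action="<%= action %>">
    <input type="hidden" name="assignmentId" value="<%= ERB::Util.html_escape(assignment_id) %>">
    <img src="<%= ERB::Util.html_escape(picture_url) %>">
    <label><input type="radio" name="inappropriate" value="yes"> Vulgar</label>
    <label><input type="radio" name="inappropriate" value="no"> Fine</label>
    <input type="submit" value="Submit"<%= ' disabled' if preview %>>
  </form>
HTML

# Hypothetical helper: takes the URL the Turker landed on and returns the form HTML.
def render_hit_form(landing_url, env = :sandbox)
  params        = URI.decode_www_form(URI(landing_url).query.to_s).to_h
  assignment_id = params.fetch('assignmentId', PREVIEW_ID)
  picture_url   = params.fetch('picture_url', '')
  preview       = assignment_id == PREVIEW_ID # no real assignment yet, so block submission
  action        = SUBMIT_URLS.fetch(env)
  FORM_TEMPLATE.result(binding)
end
```

Disabling the submit button during preview is a design choice, not a requirement: it just stops a Turker from submitting before the HIT has been accepted.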

Let’s get to the code

I’m going to keep this simple, so the example task will be to flag a picture if it’s vulgar. And I’m going to use a static page as the external HIT. In this case we are going to drop the Turker (Amazon’s term for the workers) onto a page hosted on S3 at http://s3.amazonaws.com/squarepush.com/turk/picturerate.html

On the request side

It’s as simple as writing up the properties and creating the HIT.

Some things to note

  1. We are passing the picture_url parameter along with the URL
  2. MaxAssignments lets more than one person perform this task (i.e. let 3 people rate the same picture)
  3. You’ll see we have a reward and a qualification requiring an approval rate above 90%
  4. In this case the Turker will be directed to http://s3.amazonaws.com/squarepush.com/turk/picturerate.html?picture_url=example.jpg&assignmentId=abcd12345 where you’ll be able to get the assignmentId from the URL and place it in the form to be submitted.

require 'rturk'

props = {:Title=>"Tell me if this picture is safe for general audiences", 
         :MaxAssignments=>1, :LifetimeInSeconds=>3600, 
         :Reward=>{:Amount=>0.05, :CurrencyCode=>"USD"}, 
         :Keywords=>"twitter, blogging, writing, english", 
         :Description=>"I need to know if this picture is vulgar or adult only",
         :RequesterAnnotation=>"Example1",
         :AssignmentDurationInSeconds=>3600, :AutoApprovalDelayInSeconds=>3600, 
         :QualificationRequirement=>[{
           # Approval rate of greater than 90%
           :QualificationTypeId=>"000000000000000000L0", 
           :IntegerValue=>90, 
           :Comparator=>"GreaterThan", 
           :RequiredToPreview=>"false"
           }]
        }

@turk = RTurk::Requester.new(AWSAccessKeyId, AWSAccessKey, :sandbox => true)
page = RTurk::ExternalQuestionBuilder.build(
  "http://s3.amazonaws.com/squarepush.com/turk/picturerate.html", :picture_url => 'example.jpg')

# Turkers will be directed to http://s3.amazonaws.com/squarepush.com/turk/picturerate.html?picture_url=example.jpg&assignmentId=abcd12345

@turk.create_hit(props, page)
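
For the curious, an external HIT still reaches Amazon as a small XML wrapper around your URL. The sketch below shows roughly what a builder like this has to produce. It is not RTurk's actual implementation: `external_question_xml` is a made-up helper, the frame height of 400 is an arbitrary choice, and the namespace is the one I believe Amazon publishes for ExternalQuestion.

```ruby
require 'erb'
require 'uri'

# ExternalQuestion schema namespace (assumed; verify against Amazon's docs).
EXTERNAL_QUESTION_XMLNS =
  'http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd'

# Hypothetical helper: appends params to the page URL and wraps it in ExternalQuestion XML.
def external_question_xml(page_url, params = {}, frame_height = 400)
  uri   = URI(page_url)
  query = URI.decode_www_form(uri.query.to_s) + params.map { |k, v| [k.to_s, v.to_s] }
  uri.query = URI.encode_www_form(query) unless query.empty?
  <<~XML
    <ExternalQuestion xmlns="#{EXTERNAL_QUESTION_XMLNS}">
      <ExternalURL>#{ERB::Util.html_escape(uri.to_s)}</ExternalURL>
      <FrameHeight>#{frame_height}</FrameHeight>
    </ExternalQuestion>
  XML
end

puts external_question_xml(
  'http://s3.amazonaws.com/squarepush.com/turk/picturerate.html',
  { picture_url: 'example.jpg' }
)
```

Note that the URL has to be XML-escaped, so a second query parameter shows up as `&amp;` inside the ExternalURL element.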

Want to know more about the landing page for the Turker? I put it in a separate article – “Mechanical Turk External HIT Pages with JQuery”

Getting the results

After you’ve created your HITs it becomes a waiting game. Turkers will look for promising HITs that they believe are worth their time, open them up to preview them, then accept and complete them.

If you’re having multiple Turkers do a single task (getting more than one opinion of the picture, for example), you’ll have multiple assignments attached to that HIT.

Amazon has what I consider to be a pretty goofy format for the answers, which come in the form of escaped XML, inside of a wrapper XML document. Yes, you read correctly – an escaped XML document inside of an XML document. I’ve taken the liberty of having the RTurk library go ahead and unescape it and pull it in as a hash. I hope you don’t mind.
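
To show what that double wrapping looks like, here is a rough sketch that unpacks it by hand. The response fragment is made up to match the shape described above, `parse_answer` is a hypothetical helper rather than RTurk's code, and the regular expressions are a shortcut for illustration; a real XML parser is the safer choice.

```ruby
require 'cgi'

# A made-up fragment shaped like one assignment in Amazon's response:
# the <Answer> element holds a second XML document, entity-escaped.
response = <<~XML
  <Assignment>
    <AssignmentId>12345abcde</AssignmentId>
    <Answer>&lt;QuestionFormAnswers&gt;&lt;Answer&gt;&lt;QuestionIdentifier&gt;inappropriate&lt;/QuestionIdentifier&gt;&lt;FreeText&gt;yes&lt;/FreeText&gt;&lt;/Answer&gt;&lt;/QuestionFormAnswers&gt;</Answer>
  </Assignment>
XML

# Hypothetical helper: pulls out the escaped inner document and turns it into a hash.
def parse_answer(assignment_xml)
  escaped = assignment_xml[%r{<Answer>(.*?)</Answer>}m, 1].to_s
  inner   = CGI.unescapeHTML(escaped) # first unescape reveals the inner XML
  pairs   = inner.scan(%r{<QuestionIdentifier>(.*?)</QuestionIdentifier>\s*<FreeText>(.*?)</FreeText>}m)
  pairs.to_h { |id, text| [id, CGI.unescapeHTML(text)] } # second unescape cleans the values
end

p parse_answer(response)
```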

Again, let’s have an example. (You’ll need to have saved each HIT ID from the creation, which I usually put in a database, but I won’t go into that now.)

@turk = RTurk::Requester.new(aws['AWSAccessKeyId'], aws['AWSAccessKey'], :sandbox => false)

answers = []
@turk.get_assignments_for_hit("ABCDEFG1234356789").each do |assignment|
   answers << assignment['Answer']
end

You should get answers that look something like this:

 [{'inappropriate' => 'yes', 'assignmentId' => '12345abcde'}]
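
When several Turkers rate the same picture, the answers still have to be boiled down to one decision. One simple option, sketched here with a made-up `majority` helper and invented sample data, is a majority vote that refuses to decide on a tie:

```ruby
# Hypothetical helper: returns the most common value for a field,
# or nil when there is a tie or there are no answers at all.
def majority(answers, field)
  counts  = answers.map { |a| a[field] }.compact.tally
  top     = counts.values.max
  winners = counts.select { |_, n| n == top }.keys
  winners.size == 1 ? winners.first : nil
end

answers = [
  { 'inappropriate' => 'yes', 'assignmentId' => '12345abcde' },
  { 'inappropriate' => 'no',  'assignmentId' => '12345abcdf' },
  { 'inappropriate' => 'yes', 'assignmentId' => '12345abcdg' }
]

majority(answers, 'inappropriate') # => "yes"
```

An odd MaxAssignments avoids ties on a yes/no question; on a tie you might send the picture out for another round.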

At this point you can decline to pay the worker if you feel the task wasn’t completed correctly, or let the worker get paid automatically (set via the ‘AutoApprovalDelayInSeconds’).

Tips and Tricks

  1. Keep it external and build your own submission page. This could be a page in a Rails app, or a simple HTML page with JavaScript. But keeping it external gives you the ability to customize as you see fit.
  2. Pass a unique record ID with each HIT. You can’t assume that every incoming Turker completes the task; in fact, many return it. Pass an identifier in to each HIT you create and let Amazon deal with the messy details of returned HITs and previews.
  3. Make sure the page is easy to read and loads quickly. This will encourage Turkers to select and complete the HIT, and at lower prices.
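
As a sketch of the second tip, the record ID can ride along as one more query parameter and come back with the answer. The helper names and the `record_id` parameter are made up for illustration, and this assumes your form copies `record_id` into a hidden field so that it is submitted along with the rest of the answer.

```ruby
require 'uri'

# Hypothetical helper: tags the external page URL with our own record ID.
def hit_url_for(base_url, record_id, extra = {})
  uri = URI(base_url)
  uri.query = URI.encode_www_form({ 'record_id' => record_id }.merge(extra))
  uri.to_s
end

# Hypothetical helper: groups submitted answers by the record ID they carry.
def answers_by_record(answers)
  answers.group_by { |a| a['record_id'] }
end

url = hit_url_for('http://s3.amazonaws.com/squarepush.com/turk/picturerate.html',
                  42, { 'picture_url' => 'example.jpg' })
```

Grouping by your own ID means the results can be matched to a database row without caring how many Turkers previewed or returned the HIT along the way.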

Need Help?

Although we wrote and open sourced the RTurk gem with the intention of giving everyone easier access to Mechanical Turk, it’s still a complex system with a steep learning curve. Give us a call if you need help – we’re the Ruby experts when it comes to Mechanical Turk.

