Tools for working with Task Assembly, a service for managing crowdwork projects.
Start by installing the task-assembly client by running the following command.
pip install task-assembly
Note that the task-assembly client uses some AWS resources. If you haven't done so already, you will need to install the AWS CLI and configure it with your credentials (`aws configure`) before using the client.
Next, run the following command to configure your Task Assembly API key:
task-assembly configure --key <api-key>
At any point, you can run the following command to check that your account has been correctly configured:
task-assembly configure --validate
If you are new to Task Assembly, you can start with a small sample project by running `task-assembly example` to copy a set of files to your working directory. These can be used with the instructions below to set up your first project and help you build familiarity with how Task Assembly works.
First, we'll need to create a task type for this project. This is just a placeholder and will be removed in future updates to Task Assembly. Simply run the following command and capture the id that is generated. When building the example application, use `NumbersExample` as the type name.
task-assembly create_task_type <type_name>
Now, we'll create a new Task Definition for our task using the Task Type created in the previous step. When building the example application, use `Numbers` as the definition name.
task-assembly create_task_definition <definition_name> <task_type_id>
This will generate a new definition for our project and write a `definition.json` file that we will use to capture our task attributes.
Now we can update the `definition.json` file with the appropriate attributes for our task. We've included the core attributes for the example task below (durations are in seconds and the reward is in US cents):
DefinitionId: <your_definition_id>
TaskType: <your_task_type_id>
Title: Convert number to text
Description: Write out a number as text
RewardCents: 10
Lifetime: 3600
AssignmentDuration: 300
DefaultAssignments: 2
MaxAssignments: 5
AutoApprovalDelay: 0
You can now run the following command to update the definition in Task Assembly:
task-assembly update_task_definition
The task interface for our project is in the `template.html` file, and you will note that it uses a templating language very similar to those of Amazon SageMaker Ground Truth and the Amazon MTurk website. In most cases you can simply copy/paste existing task templates from those tools and update the variable names within the `{{ name }}` values.
To apply this new task template to our definition, we can simply add a reference to it in our `definition.json` file:
TemplateFile: template.html
Then run the `update_task_definition` command again.
task-assembly update_task_definition
The `update_task_definition` command can be used to submit any future changes you make to your `definition.json`.
Now that we've completed the setup for our task, we should start by creating a test task in the Sandbox environment. The Sandbox is a mirror of the Production environment but no money changes hands and work isn't generally completed unless you do it yourself. This is a great way to validate that your task is set up correctly before putting it in the hands of real Workers.
Note: The MTurk Sandbox is distinct from the Production environment, so you will want to create a new account at https://requestersandbox.mturk.com with the same email address as your Production account and also link it to the same AWS account.
To create a single MTurk task in the Sandbox, you can use the following command. The `--sandbox` flag lets Task Assembly know to create the task in the appropriate environment, and the `--assignments 1` flag indicates that we only need to ask one Worker to provide an answer to the task (just you). Finally, `number=4` provides the input data to our task interface, which refers to `{{ number }}` from the html template.
task-assembly create_task --sandbox --assignments 1 number=4
Take note of the task ID that is returned. You'll need that to retrieve the results.
This will create a new task for you in the Worker Sandbox, which you can view and work on by visiting https://workersandbox.mturk.com. You'll likely need to create an account there, feel free to use your personal Amazon account or any other account you wish. Once you're logged in, you can search for your username or the title of the task. Then you can accept and complete the task.
After you've completed your task, give MTurk and Task Assembly a few moments to process the result. Then run the following to get the output.
task-assembly get_task <task_id>
Now we can repeat the same process to create a task in the Production environment by simply removing the `--sandbox` and `--assignments` flags.
task-assembly create_task number=4
The same `get_task` command can be used to retrieve results of the created task.
You may have noticed that the task result above includes `Result` and `Responses` values that look a bit like this:
{
  "Result": {
    "default": [
      {
        "WorkerId": "A12NBGVI9QN3DQ",
        "Result": {
          "default": {
            "numberAsText": "Four"
          }
        },
        "AssignmentId": "3ERMJ6L4DYRPC5L5VMTQ0TYQKQG7M2"
      },
      {
        "WorkerId": "ACKG8OU1KHKO2",
        "Result": {
          "default": {
            "numberAsText": "four"
          }
        },
        "AssignmentId": "3FTYUGLFSUK7L719U0FQJJX0K6T5D3"
      }
    ]
  },
  "Responses": [
    {
      "WorkerId": "A12NBGVI9QN3DQ",
      "Result": {
        "default": {
          "numberAsText": "Four"
        }
      },
      "AssignmentId": "3ERMJ6L4DYRPC5L5VMTQ0TYQKQG7M2"
    },
    {
      "WorkerId": "ACKG8OU1KHKO2",
      "Result": {
        "default": {
          "numberAsText": "four"
        }
      },
      "AssignmentId": "3FTYUGLFSUK7L719U0FQJJX0K6T5D3"
    }
  ]
}
This output is the `default` structure that Task Assembly uses and ensures that all the relevant data is included.
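To make that shape concrete, here's how you might dig each Worker's answer out of this structure in Python. This is just a sketch; it assumes you've saved the `get_task` output to a file named task.json (a hypothetical name, not something the CLI produces for you):

```python
import json

# Load a get_task result previously saved to disk (hypothetical file name)
with open("task.json") as f:
    task = json.load(f)

# Each Worker response nests its answer under Result -> default -> <field name>
for response in task["Responses"]:
    answer = response["Result"]["default"]["numberAsText"]
    print(response["WorkerId"], "answered", answer)
```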
Simplifying this output will make downstream data processing much easier. To do this, we'll use handlers that perform data processing on each response and completed task. In most cases, AWS Lambda functions are the best option for these steps, and Task Assembly helps this process by generating the Lambdas on your behalf if you wish. This tutorial will use Task Assembly to manage the Lambdas, but you're welcome to create and deploy them normally using your preferred approach (Console, CloudFormation, CDK, etc.).
If building the example application, you'll find the handler code for this tutorial in the `handlers.py` file. The first function, `process_response`, simply retrieves the `numberAsText` form field and returns it. We can pull it into our project by adding the following to our `definition.json`:
HandlerFile: handlers.py
SubmissionHandlers:
  - Name: value
    FunctionName: process_response
This tells Task Assembly to run `process_response` on the form outputs and return the result in an attribute named `value`.
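If you'd like to write this handler yourself rather than use the example file, a minimal version might look like the sketch below. The exact signature Task Assembly passes to submission handlers isn't documented here, so treat the single `response` argument (the raw form output for one Assignment) as an assumption:

```python
def process_response(response):
    # The raw form output nests fields under "default", as seen in the
    # task results above; return just the field we care about.
    return response["default"]["numberAsText"]
```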
To test how this impacts task results, run the following to update the task definition and then redrive the task to reprocess its results:
task-assembly update_task_definition
task-assembly redrive_task <task-id>
Now the `Responses` list should be the following, simpler representation of the Worker responses.
[
  {
    "WorkerId": "A12NBGVI9QN3DQ",
    "Result": {
      "value": "Four"
    },
    "AssignmentId": "3ERMJ6L4DYRPC5L5VMTQ0TYQKQG7M2"
  },
  {
    "WorkerId": "ACKG8OU1KHKO2",
    "Result": {
      "value": "four"
    },
    "AssignmentId": "3FTYUGLFSUK7L719U0FQJJX0K6T5D3"
  }
]
Next, we want to consolidate these two responses into a single value as a result for this task. To do that, we'll add a consolidation handler. The `consolidate_result` function in the example handles this step by simply looking for agreement between Workers. If two Workers agree, then we'll use that result. If not, we'll return `{'extend': True}`, which will prompt Task Assembly to ask an additional Worker to provide a response, up to the `MaxAssignments`. To add this function to our task definition, we simply include the following in our `definition.json`.
HandlerFile: handlers.py
SubmissionHandlers:
  - Name: value
    FunctionName: process_response
ConsolidationHandlers:
  - Name: value
    FunctionName: consolidate_result
Running `update_task_definition` and `redrive_task` will invoke this new consolidation logic, and we will now have a single value result for our task.
{
  "value": "four"
}
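If you're writing `consolidate_result` yourself, the agreement logic described above might look something like this sketch. The handler signature and the shape of the incoming responses are assumptions based on the processed output shown earlier:

```python
from collections import Counter

def consolidate_result(responses):
    # Tally the processed answers, ignoring case so "Four" and "four" agree
    counts = Counter(r["Result"]["value"].strip().lower() for r in responses)
    value, count = counts.most_common(1)[0]
    if count >= 2:
        # Two or more Workers agree: use their answer as the task result
        return {"value": value}
    # No agreement yet: ask Task Assembly to get another response,
    # up to MaxAssignments
    return {"extend": True}
```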
Now that our task is set up, we can run it on a larger set of data. If using the example, the `batch.csv` file contains a small batch of data. It is simply a single-column CSV with the column header `number`, which corresponds to our task input, but multiple columns can be used for more complex projects. To submit the batch, the following command can be used. The first parameter is the name we want to give the batch, `numbers1`, followed by the source data file for the batch, `batch.csv`. Finally, we provide an S3 location that Task Assembly will use to store the input and output files it creates.
task-assembly submit_batch numbers1 batch.csv s3://taskassembly-test/batches
The result is a batch_id we can use to monitor progress and retrieve the results. To get batch status, the following can be used:
task-assembly get_batch_status <batch_id>
This command can be used to monitor your batch progress. When the batch is complete, run the following to retrieve the results to a CSV file.
task-assembly get_batch_results <batch_id> output.csv
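The output file can then be inspected like any other CSV. For example, a quick check in Python (a sketch; it assumes the consolidated answer lands in a column named `value`, matching the handler name configured above):

```python
import csv

# Print each input number alongside its consolidated answer
with open("output.csv") as f:
    for row in csv.DictReader(f):
        print(row["number"], "->", row["value"])
```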
The last step is to establish some test tasks that we can use to ensure we're getting Workers who will do a good job on our task. For the example, we've created a data file of known or gold answers in `gold.json`. When Workers first start working on this task, they will be prompted to answer at least two of these before they can work on our real data.
To evaluate accuracy, we'll add a new handler, `score_response`, that scores each Worker's response. For this we'll use a very simple comparison and give them a score of 100 if they provide an exact match, 80 if their answer differs only in the inclusion of "and" or "-" (e.g., "twenty-four" or "one hundred and six"), and 0 if they still don't match.
The following statements can be added to our `definition.json` file to enable scoring of new Workers. As you can see, Workers will need to complete at least two tests with an average score of 80 before they can begin working on the real tasks. Note that it generally takes at least 10-15 seconds for scoring to complete, so Workers will often be asked to do a third test if they quickly accept another task after completing their second.
ScoringHandler:
  FunctionName: score_response
  GoldAnswersFile: gold.json
TestPolicy:
  MinTests: 2
  MinScore: 80
We can now create a new batch to begin using the new scoring that we've created. However, because scoring doesn't work well with small batches, expand your `batch.csv` file to at least 20 numbers before submitting a new batch to Workers. This limitation is a result of how MTurk handles HITs with fewer than 10 Assignments and will be addressed in a future release of Task Assembly.
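If you need to grow the batch file quickly, a few lines of Python will do it (a sketch; the `number` column matches the task input used throughout this tutorial):

```python
import csv
import random

# Write a 25-row batch file with the single "number" input column
with open("batch.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["number"])
    for _ in range(25):
        writer.writerow([random.randint(0, 999)])
```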
Even with a larger set of items in this test batch, you may not get responses for all of your items; however, subsequent batches should have higher yield.
Once you've submitted a new batch, you'll note that `get_batch_results` will now show test responses in the results. Because this is the first time we've run a batch with scoring enabled, all Workers will need to complete 2-3 test tasks. This means that we'll see a larger number of test responses and fewer Task responses in our counts. In future batches, the portion of responses associated with tests will steadily decline.
- Response Counts:
    Task: 36
    Test: 109
    Total: 145
To see how Workers did on your tests and their relative contribution to the final output, run `task-assembly list_workers workers.csv` to generate a report on how many responses each Worker has provided for this task definition and their scores on the test tasks.
There are two additional policy options you can use to better control how tests are assigned: `ConsumeAssignments` and `MinTestPercentage`.
The `ConsumeAssignments` flag can be set to modify how Task Assembly decides to assign tasks to Workers. By default, Task Assembly will assign the real task to a Worker who hasn't completed all of their tests yet if there aren't enough available Assignments to meet the requested `DefaultAssignments`. For example, if we have a `DefaultAssignments` of 2 and 7 Workers have already completed tests using this HIT, then Task Assembly will ensure we get a result, regardless of whether they have completed any tests. MTurk will not allow a HIT to have more than 9 Assignments if it starts with fewer than 10.
By setting `ConsumeAssignments` to true, we let Task Assembly know that it should enforce the test policy, regardless of available Assignments.
TestPolicy:
  ConsumeAssignments: true
  MinTests: 2
  MinScore: 80
  MinTestPercentage: 3
The `MinTestPercentage` setting tells Task Assembly to continually test Workers to ensure that their work continues to be of good quality. In this case, setting the value to 3 tells Task Assembly that 3% of a Worker's tasks should be tests.
A good practice for building a set of gold data you can use for testing is to do the work yourself in the MTurk Sandbox and then use your test responses as gold. To start, simply create a test batch using a sample of your input data.
task-assembly submit_batch numbers_sample sample_data.csv s3://taskassembly-test/batches --sandbox --assignments 1
You can now complete this batch in the MTurk Worker Sandbox (https://workersandbox.mturk.com/). When the work is complete, you can run the following command to generate a `gold.json` file containing a formatted gold set based on your responses.
task-assembly build_gold_from_batch <batch-id>