Friday, April 5, 2013

Captricity - the future of transcribing paper data

I heard about Captricity in my Twitter feed (thanks Yaw) - and it sounded pretty great: upload images of your paper forms and Captricity will send you back all of the data in a spreadsheet. I told Jonas, Florens and Rita about it and we decided it would be interesting to investigate. Jonas sent me a form from one of his projects with some sample data in it, and today via a Skype screen share we tried it out together. It was pretty impressive. I uploaded the forms as PNG files. Then I had to draw a box over each field on the form and give it a name. We did just a few fields: name, user ID and a circle-one option. The user interface was simple and easy to use. After defining the fields, we uploaded the same form but with different data on it, and that was it. They said it might take a few days for the results to come back, so we continued our meeting - but a few minutes later I received an email that the results were ready, and indeed they were. Captricity had successfully captured all of the handwritten information into a nice table ready for export to a spreadsheet. The only thing it struggled with was the circle-one response.

But I might have set up the field wrong. And that's okay, because they gave me the option to reprocess the data. Overall, I am super impressed - the user interface is great and it does the job well with minimal fuss.

It costs $0.20 per page (they give you free credit to test it out with). It works by sending the images to Amazon Mechanical Turk, where people who are willing to do little jobs (like transcribing a couple of words) for a few pennies process them. This could be a problem if you have sensitive data (though Captricity points out that no single Amazon Mechanical Turk user will see more than a few limited regions of the form). They also offer an option to let your own transcribers fill in the data. They would just log in to Captricity and work through all of the pieces of data that would normally be sent to the Turk. This seems like a pretty great option when you have sensitive data, or when it might be cheaper to do the transcription yourself (with Tanzanian transcribers, for instance) inside of a very nice interface.
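If you want a rough idea of what a batch of forms would cost at that price, the arithmetic is easy to sketch. This is only a back-of-the-envelope helper of my own, not anything Captricity provides: the function name `estimate_cost` and the `free_credit` parameter are made up for illustration, and since I don't know how much trial credit they give, it defaults to zero.

```python
from decimal import Decimal

# Per-page price quoted in this post; check current pricing before relying on it.
PRICE_PER_PAGE = Decimal("0.20")


def estimate_cost(pages, free_credit=Decimal("0")):
    """Estimate the charge in dollars for transcribing `pages` pages.

    `free_credit` is a hypothetical dollar amount of trial credit;
    the actual amount is not stated in this post.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    total = PRICE_PER_PAGE * pages
    # Credit can reduce the bill to zero but never below it.
    return max(total - Decimal(free_credit), Decimal("0"))


print(estimate_cost(500))                             # 100.00
print(estimate_cost(500, free_credit=Decimal("10")))  # 90.00
```

So a 500-page survey would come to about $100 before any credit, which is the number to compare against paying your own transcribers.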