Fundamental concepts in cultural heritage crowdsourcing
I will introduce some of the fundamental concepts in crowdsourcing by describing tasks commonly found in cultural heritage projects, with examples for each.
When teaching crowdsourcing, a simple, informal categorisation of participant actions can be used in conjunction with categories of task size and role. Tasks are grouped into three types according to their size and role: microtasks, macrotasks, and metatasks, which I will briefly define before describing participant actions.
Microtasks are small, rapid, self-contained tasks. For example, the New York Public Library’s Building Inspector has broken down the task of checking building shapes and text transcribed from historical fire insurance maps into five extremely focused, tiny microtasks embedded in a specialised interface. Microtasks can be addictively satisfying because several can be completed in a short amount of time. Tasks such as tagging images are popular microtasks. In some cases, the simplicity of the task combined with the unpredictability of the items that appear in the queue can ‘hook’ participants.
Macrotasks are longer and/or more complex tasks that often involve higher-order decisions about what to record and how. The text transcription task in Transcribe Bentham is a macrotask because the handwriting is difficult to decipher, whole pages are transcribed at a time (rather than line by line), and because participants can also ‘mark up’ transcribed text to highlight insertions, deletions, etc., adding complexity to the task.
Metatasks are activities that relate to the overall project rather than to individual tasks. These include taking part in project design or analysis, and contributing questions, comments and answers to participant discussion fora. The Old Weather forum is a justly famous example of the benefits of participant discussion, with a wealth of information shared and topics discussed.
Participant actions can be described according to how much creative freedom participants have when completing the task and where the task fits into the overall workflow. An informal categorisation I have found effective in teaching is: ‘type what you see’, ‘describe what you see’, ‘share what you know’, ‘share what you have’ and ‘validate other inputs’.
‘Type what you see’ tasks ask participants to type out or correct transcriptions from the item presented to them, and offer very little creative freedom. These tasks may be micro- or macrotasks. Transcription has been called a ‘mechanical’ task (Dunn & Hedges, 2012) but the difficulty varies according to the source material. Printed text is easier to decipher than unfamiliar older forms of handwritten text with unorthodox orthography that may require the transcriber to make difficult decisions. The National Library of Australia’s Trove platform for newspaper collections includes functions to correct errors in automatically generated text, and has been both hugely influential and productive (Holley, 2009, 2010). Other examples include the New York Public Library’s What’s on the Menu? project. Like Trove, the Menu interface shows the benefits of expert attention during the design process: the front page anticipates and addresses common barriers to participation, provides a range of tasks to suit different preferences, and the transcription task itself is tightly focused, with items pre-processed to minimise distractions.
Transcription tasks may require a single contributor to transcribe an entire passage or page of text or audio, or they may break the task into smaller components (e.g. a line of text or a snippet of a recording). The British Library’s In the Spotlight project first asks participants to mark out the titles of plays on historical playbills; marked titles are then transcribed in a separate task. These tasks may not offer much creative freedom, but they can be immensely engaging, and lead to exploration of the collections and related topics outside the task.
‘Describe what you see’ tasks are designed to annotate items with additional information from formal taxonomies or informal folksonomies (Vander Wal, 2007), and include identification and classification tasks such as tagging items with descriptive keywords. Image tagging on Flickr Commons is perhaps not quite ‘crowdsourcing’, as the tagging activity can be spontaneous rather than a response to direct requests from the relevant GLAMs, but it provides a good example of the benefit of user-contributed keywords in aiding discoverability (Springer et al., 2008). Other early, influential projects include the art tagging projects steve.museum and Brooklyn Museum’s game Tag! You’re It (Bernstein, 2014), and Waisda? for video tagging (Oomen, Gligorov, & Hildebrand, 2014). The BBC’s World Service Archive prototype used a combination of crowdsourcing and automated tagging on audio files (Raimond, Smethurst, & Ferne, 2014). Non-text forms of descriptive annotation include the Klokan Georeferencer implemented by the British Library and the MicroPasts ‘photomasking’ task that helps generate 3D models from photographs (Veldhuizen & Keinan-Schoonbaert, 2015).
‘Share what you know’ tasks may collect factual information or personal stories about collections by drawing on existing knowledge, or by asking volunteers to conduct research. The Lives of the First World War project asked participants to commemorate people who served in the war by ‘sharing their stories, finding their records and adding known facts’, targeting the enthusiasm and research abilities of family and local historians. The Museum of Design in Plastics’ 10 Most Wanted project crowdsourced research into their specialist collection (Lambert, Winter, & Blume, 2014), and comments on Flickr Commons sometimes note personal research or family stories about people, places, artefacts and events.
‘Share what you have’ projects collect items physically or digitally. RunCoCo’s Community Collection Model (Berglund Prytz, 2013) has been adapted by Europeana for their First World War and Migration collecting projects. The British Library’s UK Soundmap project collected audio recordings over 2010–11. The Letters 1916–1923 project digitises and transcribes items held in private and public collections.
Tasks to ‘validate other inputs’ can be designed to crowdsource quality control processes for content created in other tasks. They tend to occur within ‘ecosystems’ of tasks, a design pattern in which task interfaces or applications are combined to process different aspects of the same source materials. Building Inspector is an example of this, as each of the five tasks offered contributes to the larger goal of digitising the maps. Validation tasks may be micro-, macro- or metatasks, and include checking tags or annotations added by others, or moderating forum discussions. Increasingly, participants are verifying the results of tasks performed by software rather than by other people, as ‘human computation’ systems develop (Collings, 2015; Crowley & Zisserman, 2016).