Crowd Sourcing and Community Management Capabilities Available within Symbiota Data Portals
Nico Franz 1, Corinna Gries 2, Thomas Nash III 2 & Edward Gilbert 1
1 School of Life Sciences, Arizona State University
2 Center for Limnology, University of Wisconsin
TDWG 2013 Annual Conference, Florence, Italy
Building and Maintaining Crowd Sourcing Websites and Their Communities
October 29, 2013
Lichen, Bryophytes and Climate Change – scope and goals
Support – NSF ADBC Program – Award EF
Covering ~2.3 million specimens: 900,000 lichens & 1.4 million bryophytes
90% of all specimens housed in this region; > 60 non-governmental herbaria
LBCC has 16 focused digitization centers where voucher labels are imaged
= LBCC imaging centers

LBCC sustaining Symbiota-based data portals
Consortium of North American Lichen Herbaria (CNALH)
URL:
Currently with 51 member collections
Total of 1,084,888 records (October, 2013)
Consortium of North American Bryophyte Herbaria (CNABH)
URL:
Currently with 46 member collections
Total of 1,437,735 records (October, 2013)
Each portal is sustained by Symbiota

LBCC member portals are active virtual environments
Lichen portal – 7,302 visitors / 3 months
Bryophyte portal – 1,530 visitors / 3 months

Overview of the LBCC digitization workflow (label imaging)
The LBCC digitization workflow (completion of records) depends critically on the sustained participation of editors/transcribers and of volunteers.
Detailed workflow diagram
Imaging Stage
Capture Image: barcode in file name
Create Skeleton File: species name, country, state, exsiccati, etc.
Upload to FTP server
Image processing: extract barcode, create web versions, map to portal DBs
Herbarium Database
Automated OCR: Tesseract, ABBYY
Existing Record: simply link image
Upload to FTP server
Image URLs
Manage Specimen Data in Portal
Manage / Review Records in Portal
Symbiota Editor: review, edit, keystroke
Create New Record: barcode, image, skeletal data
Automated NLP: Darwin Core Parsing
Crowdsourcing Central
LBCC Crowdsourcing elements
Workflow "closes" in home collection
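The "extract barcode" and "Existing Record / Create New Record" steps of the workflow can be illustrated with a minimal Python sketch. It is not the LBCC or Symbiota implementation. The file-name convention (letters followed by digits at the start of the name), the function names, the in-memory `records` dictionary, and the use of `catalogNumber` to hold the barcode are all assumptions made for illustration.

```python
import re
from pathlib import PurePath

# Assumed convention: the barcode is a collection prefix plus digits
# at the start of the image file name, e.g. "ASU0012345_1.jpg".
BARCODE_RE = re.compile(r"^([A-Za-z]+\d+)")


def extract_barcode(filename):
    """Return the barcode found at the start of the file name, or None."""
    stem = PurePath(filename).stem
    match = BARCODE_RE.match(stem)
    return match.group(1).upper() if match else None


def route_image(filename, records, skeleton=None):
    """Link the image to an existing record, or create a new skeletal one.

    `records` is a hypothetical stand-in for the portal database,
    keyed by barcode. `skeleton` holds optional skeletal data such as
    species name, country, or state.
    """
    barcode = extract_barcode(filename)
    if barcode is None:
        raise ValueError(f"no barcode found in file name: {filename}")
    if barcode in records:
        # Existing record: simply link the image.
        records[barcode].setdefault("images", []).append(filename)
        return "linked"
    # New record: barcode, image, and whatever skeletal data was supplied.
    record = dict(skeleton or {})
    record.update({"catalogNumber": barcode, "images": [filename]})
    records[barcode] = record
    return "created"
```

For example, `route_image("ASU0012345_1.jpg", records, {"country": "USA"})` would create a skeletal record on first sight of that barcode and link any further image with the same barcode to it.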
How does LBCC engage volunteers?
LBCC Drupal website is primarily a means to socially and intellectually engage and recruit prospective volunteers.
Login to each member portal is simple, requiring no special rights.
"Annotate the Harriman Alaska Expedition"
"Transcribe the ALCAN Expedition"
"Create your own…" *
* Instant feedback on data volume.
Listing of pending "Harriman" records; each Symbiota ID is clickable to edit.
LBCC Crowd Sourcing Central Record
The LBCC digitization workflow pipeline has produced a "skeletal record", including:
Record GUID
Thesaurus-ratified Scientific Name (not editable)
OCR of voucher locality label image
"Parse OCR (LBCC)" [a custom LBCC program] will get the transcription process underway.
1. Initial "Parse OCR" outcome (issues with lat/long transcription)
2. Correction of the parse using Symbiota tools (e.g. GeoLocate)
3. Approaching a clean record¹ transcription, ready for saving
¹ DwC Class: CleanRecord – Utter these two words in front of a TDWG audience, then immediately prepare to… [remainder not yet ratified].
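Step 1 above notes problems with lat/long transcription. As a rough illustration of what that part of a parse involves, the sketch below pulls degrees-minutes-seconds coordinates out of OCR text and converts them to decimal degrees. It is not the "Parse OCR (LBCC)" program or GeoLocate. The regular expression, the function names, and the sample label text are assumptions; `decimalLatitude` and `decimalLongitude` are Darwin Core term names.

```python
import re

# Assumed label style: 57°47'24"N. Seconds are optional.
DMS_RE = re.compile(
    r"""(\d{1,3})\s*[°º]\s*(\d{1,2}(?:\.\d+)?)\s*['′’]\s*(?:(\d{1,2}(?:\.\d+)?)\s*["″”]\s*)?([NSEW])""",
    re.IGNORECASE,
)


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = float(degrees) + float(minutes) / 60 + float(seconds or 0) / 3600
    # South and west are negative by convention.
    return -value if hemisphere.upper() in ("S", "W") else value


def parse_coordinates(ocr_text):
    """Return the first latitude and longitude found in the OCR text.

    Missing values are returned as None so a transcriber can fill them in.
    """
    result = {"decimalLatitude": None, "decimalLongitude": None}
    for degrees, minutes, seconds, hemisphere in DMS_RE.findall(ocr_text):
        is_latitude = hemisphere.upper() in ("N", "S")
        key = "decimalLatitude" if is_latitude else "decimalLongitude"
        if result[key] is None:
            value = dms_to_decimal(degrees, minutes, seconds, hemisphere)
            result[key] = round(value, 6)
    return result
```

A parse like this only covers one coordinate notation and fails on common OCR errors such as a degree sign read as a letter, which is why the workflow leaves review and correction to a human editor.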
Crowd Sourcing Central – Score Board *
Options to review one's submitted records and review points assigned (by the collection's manager).
* See also Appendix I.

Crowd Sourcing Central – User's Review Pages *
My pending records with LBCC.
My 2/4 approved records/points.
* See also Appendix III.

Crowd Sourcing Central – Collection Manager's Control Panel *
4215 newly digitized records are available for addition to the queue.
25 submissions pending.
* See also Appendix II.

Crowd Sourcing Central – Collection Manager's Review Pages *
2 points = default score.
Specific feedback possible.
* See also Appendix III.
Lichen, Bryophytes and Climate Change – CS in review
A key purpose of the LBCC portal CS entry environment is to create a user experience that is personalized.
Special expeditions are a subset of the records queue for CS data entry, and are identified as being part of a "special group/theme" of specimens.
Expeditions are meant to educate those who are performing the data entry about a specific event. They also aid data entry because the user generally deals with a homogeneous type of label format, as opposed to shifting between numerous layout types.
User input and managerial control (review, feedback, scoring) are interactively facilitated in the same Crowd Sourcing module implemented in Symbiota.
Acknowledgments
TDWG 2013 Symposium organizers – Paul Kenneth Flemons
Ben Brandt & John Brinda – LBCC software development
Participating CNALH & CNABH collections
NSF Award EF "Digitization TCN – Collaborative Research: North American Lichens and Bryophytes: Sensitive Indicators of Environmental Quality and Change."
https://sols.asu.edu
Appendix I: Crowd Sourcing Central "rules of engagement"
1. Available at
2. Shows scores and collections participating in crowd sourcing, along with their statistics.
3. Available to all viewers of the site, irrespective of whether they are logged in.
4. If users log in, their scores are displayed in a separate information "box".
5. On most portal sites, the link above will be added to the main left menu, or made available from another crowd sourcing page that is custom generated for a project. For instance, LBCC will likely link to this page from their main Drupal page.
6. Clicking on "review records" within the Current User's Standing box takes the user to the Review page (see Appendix III).
7. Clicking on numbers within the collection table takes the user to a list of specimens queued up for data entry (and open specimens within the CS queue).

Appendix II: Collection Manager's Crowdsourcing Control Panel
1. Available at
2. Available only to collection managers.
3. Shows statistics only for a given collection.
4. Also available from the collections control panel (not yet implemented in the public site) and in Crowd Sourcing Central (via the editing symbol to the right of the collection names).
5. Allows managers to edit crowd sourcing instructions or link to a training URL.
6. The link to the right of "Available to Add" is where a collection manager adds records to the Crowdsourcing queue.
Appendix III: Review Pages for contributors, managers
1. Available from a collection manager's perspective or a user's perspective, yet behaves somewhat differently depending on the perspective.
2. Collection manager perspective:
   1. Main purpose is to enable a quick review of specimen records that are pending (or re-review of closed records).
   2. Available from the Collection Manager's Crowdsourcing Control Panel by clicking on the "Review" link to the right of the numbers.
   3. A collection manager can assign points to an annotated record (2 points is the default value), comment, and change the CS status to closed (approved).
   4. Managers can edit all records, whether they are pending or closed.
3. User perspective:
   1. Available from Crowd Sourcing Central by clicking on "Review Records" within the Current standing box.
   2. Allows user to review and access records with pending status.
   3. Allows user to review points and provide comments for closed records.
   4. Users can edit all pending records.
   5. Users can review yet not edit records that have been closed by a collection manager.
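
The review rules in Appendix III amount to a small set of permission checks. The following Python sketch restates them under stated assumptions and is not Symbiota's code: the `Submission` class, its field names, and the function names are hypothetical, and the sketch assumes that a user's editing rights apply to that user's own submissions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_POINTS = 2  # default score named in Appendix III


@dataclass
class Submission:
    """Hypothetical stand-in for one crowd-sourced record in the queue."""
    record_id: int
    user: str
    status: str = "pending"  # "pending" or "closed" (approved)
    points: Optional[int] = None
    comments: List[str] = field(default_factory=list)


def can_edit(submission, actor, is_manager):
    """Managers can edit any record; users only their pending ones."""
    if is_manager:
        return True
    return actor == submission.user and submission.status == "pending"


def approve(submission, is_manager, points=DEFAULT_POINTS, comment=None):
    """Assign points, optionally comment, and close the record."""
    if not is_manager:
        raise PermissionError("only a collection manager can approve")
    submission.status = "closed"
    submission.points = points
    if comment:
        submission.comments.append(comment)
    return submission
```

With these rules, a volunteer can keep correcting a record until a manager approves it, after which the record and its points remain visible to the volunteer but only a manager can change it.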