Hackathon Challenge: (Semi-) Automating DNA Collection Sara Farmer Noah Hofmann-Smith Jonathan Undy
Outline Need to assess country preparedness at the onset of a disaster QUICKLY. Lots of sources exist, but the data is not machine-accessible.
Motivation Websites (HTML, XLS, CSV, APIs, etc.) -> Template Creator -> Partially filled indicators spreadsheet -> Researchers -> Completed indicators spreadsheet -> DNA Analyst
Outline 2 Process for automation: (1) scrape data from webpages; (2) transform scraped data into CSV files; (3) automatically load data from CSV files into the standard Excel report. Steps 1 and 2: Sara and team (partially completed already). Step 3: Noah and Jonathan.
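The first two steps of the process can be illustrated with a minimal sketch that uses only the Python standard library. It is not the team's actual scraper: the class `TableRowParser`, the function `html_table_to_csv`, and the sample HTML (including the country name and figures) are hypothetical, and a real scraper would first fetch the page over HTTP.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Step 1 (scrape) and step 2 (transform): HTML table text -> CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical page fragment standing in for a scraped website.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Year</th><th>Population</th></tr>
  <tr><td>Exampleland</td><td>2012</td><td>1000</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

Keeping the CSV as an intermediate format means the scrapers and the Excel loader can be developed independently, which matches the split of work between the two groups.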
Scraping data and CSV files (Sara)
Scrapers
CSV Data Files
Loading from CSV files to Excel (Noah & Jonathan) Challenges: key indicators are referred to differently by different sources; the data spans several years; not every country is included in every dataset.
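One possible way to handle the three challenges above is sketched below, as an illustration rather than the loader that was built. It assumes a hypothetical "long" CSV layout with the columns `country`, `indicator`, `year`, and `value`, a hypothetical alias table `ALIASES`, and the policy of keeping the most recent year. The resulting rows could then be written into the report with an Excel library; that step is left out here.

```python
import csv
import io

# Hypothetical alias table: source-specific names -> standard indicator name.
ALIASES = {
    "population": "Population",
    "total population": "Population",
    "pop. (total)": "Population",
    "gdp per capita": "GDP per capita",
    "gdp/capita (usd)": "GDP per capita",
}


def standard_name(raw):
    """Map a source's indicator label to the standard name, or None if unknown."""
    return ALIASES.get(raw.strip().lower())


def latest_values(csv_text):
    """Return {(country, indicator): (year, value)}, keeping the latest year."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        indicator = standard_name(row["indicator"])
        if indicator is None:
            continue  # unknown label: skipped here, could be logged for review
        key = (row["country"], indicator)
        year = int(row["year"])
        if key not in latest or year > latest[key][0]:
            latest[key] = (year, row["value"])
    return latest


def report_rows(latest, countries, indicators):
    """Build a header row plus one row per country; missing data stays blank."""
    rows = [["Country"] + list(indicators)]
    for country in countries:
        cells = [latest.get((country, ind), (None, ""))[1] for ind in indicators]
        rows.append([country] + cells)
    return rows


# Hypothetical data: two sources naming the same indicator differently.
SAMPLE_CSV = (
    "country,indicator,year,value\n"
    "Exampleland,Total population,2010,900\n"
    "Exampleland,Population,2012,1000\n"
    "Exampleland,GDP/capita (USD),2011,2500\n"
    "Otherland,Pop. (total),2012,500\n"
)

if __name__ == "__main__":
    table = report_rows(
        latest_values(SAMPLE_CSV),
        countries=["Exampleland", "Otherland"],
        indicators=["Population", "GDP per capita"],
    )
    for row in table:
        print(row)
```

Leaving a blank cell for a country that a dataset omits keeps the gap visible to the researchers who complete the spreadsheet, instead of hiding it behind a default value.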
Challenges going forward Improve data quality (e.g. unpack compound data items stored in the same field). Continue to develop the standard list of indicators. “Close the loop”. Eliminate manual cleaning of the scraped data.
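As an illustration of unpacking compound data items, the sketch below splits a field that combines a value, a year, and a note. The field format `"<number> (<year> <note>)"`, the pattern `COMPOUND`, and the function `unpack` are assumptions made for this example; the formats in the scraped data may differ.

```python
import re

# Hypothetical compound format: "<number> (<year> <optional note>)",
# for example "1,000 (2012 est.)".
COMPOUND = re.compile(
    r"^\s*(?P<value>\d[\d,]*(?:\.\d+)?)\s*"
    r"\((?P<year>\d{4})(?:\s+(?P<note>[^)]*))?\)\s*$"
)


def unpack(field):
    """Split a compound field into value, year, and note.

    Fields that do not match the assumed format are returned unchanged
    (as stripped text) with year and note set to None, so they can be
    flagged for manual review instead of being silently altered.
    """
    match = COMPOUND.match(field)
    if match is None:
        return {"value": field.strip(), "year": None, "note": None}
    return {
        "value": float(match.group("value").replace(",", "")),
        "year": int(match.group("year")),
        "note": match.group("note"),
    }


if __name__ == "__main__":
    print(unpack("1,000 (2012 est.)"))
    print(unpack("12.5 (2011)"))
    print(unpack("n/a"))
```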