Intro To Data Journalism Marc Ellison


Presentation on theme: "Intro To Data Journalism Marc Ellison"— Presentation transcript:

1 Intro To Data Journalism Marc Ellison

2 Overview
• Who is this guy?
• What is data journalism and why does it matter?
• Who’s doing it?
• OK, but what have you done?
• Data journalism in the Canadian newsroom
• Get mappin’
• Get scrapin’
• Resources
• Questions

3 Who is this guy?
• Freelance data- and photojournalist
• Produced features and multimedia for a variety of publications
• Worked in Canada, Rwanda, South Sudan and Uganda
• BA in History
• MSc in Computer Science + 10 years as web developer
• Pre-midlife crisis

4 So, what is data journalism? “Data journalism is obtaining, reporting on, curating and publishing data in the public interest.” [Jonathan Stray] “Data journalism is [...] the convergence of a number of fields [...] - from investigative research and statistics to design and programming.” [Paul Bradshaw]

5 So, what is data journalism? “Data driven journalism is a workflow that consists of the following elements: digging deep into data by scraping, cleansing and structuring it, filtering by mining for specific information, visualizing it and making a story.” [Mirko Lorenz]

6 Why do it?
• Rapid advancement of technology == greater digitalization of data
• People’s lives are data
• Help tell or prove a complex story
• Reveal “abstract threats” to society
• Combined with traditional reporting techniques, we can tell stories in more compelling + innovative ways

7 Why do it?
• Age of open data (debatable in Canada)
• Add a string to your bow: being a good writer is no longer enough, e.g. job ads for “multimedia journalists”
• Fill a niche: only a handful of recognized data journalists in Canada
• As more and more paywalls go up, outlets are looking for inventive ways to drive traffic to their sites and increase subscriptions

8 Why do it? “Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars… But now it's also going to be about poring over data and equipping yourself with the tools to analyze it and picking out what's interesting.” [Tim Berners-Lee]

9 “Data Journalism Is The New Punk”

10 No excuses: free tools and tutorials
“Arguably punk was most important in its influence, encouraging kids in the suburbs to take up instruments, with little or no musical training. It represented a DIY ethos and a shake-up of the old established order. It was a change. Crucial to it was the idea: anyone can do it.” [Simon Rogers]
datablog/2012/may/24/data-journalism-punk#

11 Who’s doing it? The Guardian: Iraqi War Logs tive/2010/oct/23/wikileaks-iraq-deaths-map

12 Who’s doing it? The Guardian: UK Riots

13 Who’s doing it? NY Times: Basketball Statistics shot-analysis.html?ref=multimedia

14 Who’s doing it? NY Times: Election itics/paths-to-the-white-house.html

15 Who’s doing it? LA Times: Murder Map

16 Who’s doing it? Telegraph (UK): MP expenses home

17 Who’s doing it? OpenFile: CensusFile

18 OK, but what have you done? Mapping B.C.’s bicycle collisions Vancouver Sun

19 OK, but what have you done? Search Public Sector Salaries Vancouver Sun

20 OK, but what have you done? Mapping Grow-Op Busts Vancouver Sun

21 OK, but what have you done? Stanley Cup Riot Charge Database Vancouver Sun

22 OK, but what have you done? Failed Restaurant Inspections Vancouver Sun

23 Data journalism in the Canadian newsroom
• Canada and optimism in the Data Journalism Handbook
• Look at the outlets in the book – UK, Germany, USA, France – none are Canadian
• “They don’t know what they’re looking for”
• “Work off the side of your desk”
• Hacks ‘n’ Hackers as an outlet?

24 The process: chicken and the egg? Diagram: Mirko Lorenz

25 The process
1) Find: Searching for data on the web
2) Clean: Process to filter and transform data, preparation for visualization
3) Visualize: Displaying the pattern, either as a static or animated visual
4) Publish: Integrating the visuals, attaching data to stories
5) Distribute: Enabling access on a variety of devices, such as the web, tablets and mobile
6) Measure: Tracking usage of data stories over time and across the spectrum of uses
[Paul Bradshaw]

26 Get Mappin’ Case Study: Bicycle-Car collisions In B.C.

27 What data?
• What is the story you want to tell?
• Does it need telling – think of your pitch? Cycling is a hot-button issue
• Brainstorm what data may be available and where you can find it, i.e. plan ahead if FOI/ATI is needed. In this case: ICBC
• How will you visualize the data, e.g. map, graph, database?

28 Get your data
• Is the data freely available online?
• Speak to people in the know, e.g. city or govt officials
• Crowdsourcing: BuzzData, GeoCommons etc.
• Webscraping
• Past ATI/FOI requests – someone may have already requested the data
• ATI/FOI – if so, plan – this takes time and you’ll likely deal with privacy issues
• ATI/FOI – make your request specific, and be in constant dialogue with the dept
• Think about the data format you want: .CSV, .XLS, .KML, .KMZ, .SHP…
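The formats listed above mostly hold the same information in different shapes. As a rough illustration, here is a minimal sketch that turns spreadsheet-style rows into KML using only Python's standard library. The field names (`name`, `latitude`, `longitude`) and the sample points are made up for the example and are not taken from any real dataset.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# Made-up sample rows; the field names are assumptions for illustration only.
SAMPLE_ROWS = [
    {"name": "Example collision 1", "latitude": "49.2827", "longitude": "-123.1207"},
    {"name": "Example collision 2", "latitude": "49.1913", "longitude": "-122.8490"},
]


def rows_to_kml(rows):
    """Build a KML document string with one Placemark per row."""
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    for row in rows:
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = row["name"]
        point = ET.SubElement(placemark, "Point")
        # KML expects longitude first, then latitude.
        coords = f'{row["longitude"]},{row["latitude"]}'
        ET.SubElement(point, "coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")


kml_text = rows_to_kml(SAMPLE_ROWS)
```

The one detail that commonly trips people up is the coordinate order: a spreadsheet usually lists latitude then longitude, while KML wants longitude first.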

29 Clean your data
• The data you get will rarely be good enough to use as is…
• Missing data, multiple files, irrelevant columns…
• Tools: Google Refine, Data Wrangler, Excel or Google Spreadsheets
• ICBC data: missing city names, longitudes and latitudes…
• ICBC data: use Excel and data sort to remove bad data
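The slide does this clean-up by sorting in Excel. For anyone who prefers a script, the same idea can be sketched in Python: drop any row that is missing a city or has coordinates that cannot be mapped. The sample rows and the column names (`city`, `latitude`, `longitude`, `year`) are invented for illustration and are not the real ICBC schema.

```python
import csv
import io

# Made-up sample in the spirit of the collision data; the column names
# are assumptions, not the real ICBC schema.
SAMPLE_CSV = """city,latitude,longitude,year
Vancouver,49.2827,-123.1207,2009
,49.2500,-123.1000,2010
Burnaby,,-122.9800,2010
Surrey,49.1913,-122.8490,2011
Victoria,not available,-123.3656,2011
"""

REQUIRED = ("city", "latitude", "longitude")


def is_usable(row):
    """Return True if the row has a city and numeric coordinates."""
    if any(not (row.get(col) or "").strip() for col in REQUIRED):
        return False
    try:
        float(row["latitude"])
        float(row["longitude"])
    except ValueError:
        return False
    return True


def clean_rows(csv_text):
    """Parse CSV text and keep only the rows that can be placed on a map."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if is_usable(row)]


cleaned = clean_rows(SAMPLE_CSV)
print(f"kept {len(cleaned)} usable rows")
```

Keeping a count of how many rows were dropped, and why, is worth the extra line or two: it is the kind of detail an editor or reader may ask about later.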

30 Map your data
1) Create a GeoCommons account
2) Click ‘create a map’
3) Upload your clean data to GeoCommons (they support SHP files, CSV, KML, RSS)
4) While it uploads give it a name, description, citation
5) Wait while it processes your data
6) Choose a theme – e.g. incident by year

31 Add layers of data
• Open Data – enhance map – traffic lights, bike paths, neighbourhood boundaries etc.
• tacatalogue/index.htm
• Simply download ZIP file, unzip and upload 4 files, and then add as new layers on your map
• …or use crowdsourced data in GeoCommons

32 Other cool tricks/features
• Select your main data layer – select ‘analyze’ option
• Select aggregation
• Select neighbourhoods as boundary
• Attribute = year
• Calculation = count
• 3D-Street View feature – particularly relevant for cyclists
• Animate your data – collisions over time!
• People can view your data at bottom or click on map points
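The aggregation settings above (boundary = neighbourhood, attribute = year, calculation = count) amount to a grouped count. The sketch below shows that idea in Python. It assumes each collision has already been tagged with a neighbourhood, which is the part GeoCommons works out from the boundary layer; the records are made up.

```python
from collections import Counter

# Made-up records; assumes each collision is already tagged with the
# neighbourhood it falls in (the mapping tool derives this from boundaries).
COLLISIONS = [
    {"neighbourhood": "Downtown", "year": 2009},
    {"neighbourhood": "Downtown", "year": 2009},
    {"neighbourhood": "Downtown", "year": 2010},
    {"neighbourhood": "Kitsilano", "year": 2010},
]


def count_by_area_and_year(records):
    """Count collisions for each (neighbourhood, year) pair."""
    return Counter((r["neighbourhood"], r["year"]) for r in records)


counts = count_by_area_and_year(COLLISIONS)
for (area, year), total in sorted(counts.items()):
    print(area, year, total)
```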

33 Analyze your data
• Collision map tells multiple stories…
• …Vancouver’s most dangerous intersection
• …Bike paths != safety
• …Need more bike paths?
• …Have collisions reduced over time as result of bike lanes?
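Two of these questions can be roughed out from the cleaned data before any map is drawn: which intersection has the most collisions, and whether yearly totals fell after a given year. The sketch below is illustrative only. The intersections, years and the `change_year` cut-off are invented, and a before/after average says nothing on its own about whether bike lanes caused any change.

```python
from collections import Counter
from statistics import mean

# Made-up records for illustration; not real collision data.
COLLISIONS = [
    {"intersection": "Example St & 1st Ave", "year": 2008},
    {"intersection": "Example St & 1st Ave", "year": 2008},
    {"intersection": "Sample Rd & 2nd Ave", "year": 2008},
    {"intersection": "Example St & 1st Ave", "year": 2009},
    {"intersection": "Sample Rd & 2nd Ave", "year": 2009},
    {"intersection": "Demo Blvd & 3rd Ave", "year": 2009},
    {"intersection": "Example St & 1st Ave", "year": 2010},
    {"intersection": "Demo Blvd & 3rd Ave", "year": 2010},
    {"intersection": "Sample Rd & 2nd Ave", "year": 2011},
]


def most_dangerous(records, n=1):
    """Return the n intersections with the most collisions."""
    return Counter(r["intersection"] for r in records).most_common(n)


def yearly_totals(records):
    """Return {year: number of collisions}, ordered by year."""
    return dict(sorted(Counter(r["year"] for r in records).items()))


def average_before_after(totals, change_year):
    """Average yearly collisions before, and from, change_year onward.

    Raises statistics.StatisticsError if either side has no years.
    """
    before = [count for year, count in totals.items() if year < change_year]
    after = [count for year, count in totals.items() if year >= change_year]
    return mean(before), mean(after)


totals = yearly_totals(COLLISIONS)
print(most_dangerous(COLLISIONS))
print(average_before_after(totals, change_year=2010))
```

Numbers like these are a starting point for reporting, not the story itself: a drop could reflect fewer cyclists, a change in how collisions are recorded, or simply a small sample.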

34 Embed and share your data
• HTML
• Iframe
• Map complements your written story
• Facebook/Twitter
• Comments
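Mapping tools normally hand you the embed snippet ready-made, but it helps to know what it contains. A minimal sketch that builds an iframe tag for a map URL follows; the URL is a placeholder rather than a real GeoCommons address, and the default width and height are arbitrary.

```python
from html import escape


def iframe_embed(map_url, width=600, height=400):
    """Return an HTML iframe snippet that embeds the map at map_url."""
    safe_url = escape(map_url, quote=True)  # escape &, quotes etc. for HTML
    return f'<iframe src="{safe_url}" width="{int(width)}" height="{int(height)}"></iframe>'


# Placeholder URL for illustration only.
snippet = iframe_embed("https://example.com/maps/12345?layer=collisions&year=2011")
print(snippet)
```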

35 Get Scrapin’

36 Definition: an automated way of getting data from a website
• Saves us time (if 1000s of pages) and the data comes to us
• Data isn’t always available to download in a handy PDF or spreadsheet
• Allows us to map or tweet out findings!
• We can even send data automatically to Dropbox!
• Data Journalism Handbook and Visualize This include basic introductions

37 Don’t be scared: Meet Scraperwiki
• Free/open source – you can see other people’s code and adapt it!
• Learning curve – you have to teach yourself Python
• …or pay people to do it for you!
• No need for painstaking setup
• Built-in libraries to do mapping, tweeting, encoding of URLs using bit.ly, graphics and storing of data to database
• Schedule your scrapers
• Great tutorials and online resources
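Storing results and scheduling go together: a scraper that runs every day should not pile up duplicate rows. ScraperWiki's own datastore is not reproduced here. As a stand-in, the sketch below uses Python's standard `sqlite3` module, with a hypothetical `inspections` table keyed on restaurant and inspection date so that re-running the scraper overwrites rather than duplicates.

```python
import sqlite3

# Made-up rows: (restaurant, inspected_on, result).
SAMPLE_ROWS = [
    ("Example Diner", "2012-05-01", "Fail"),
    ("Sample Cafe", "2012-05-02", "Pass"),
]


def save_rows(conn, rows):
    """Insert rows, replacing any existing row with the same key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inspections ("
        "restaurant TEXT, inspected_on TEXT, result TEXT, "
        "PRIMARY KEY (restaurant, inspected_on))"
    )
    conn.executemany("INSERT OR REPLACE INTO inspections VALUES (?, ?, ?)", rows)
    conn.commit()


def count_rows(conn):
    """Return how many inspections are stored."""
    return conn.execute("SELECT COUNT(*) FROM inspections").fetchone()[0]


conn = sqlite3.connect(":memory:")  # use a file path to keep data between runs
save_rows(conn, SAMPLE_ROWS)
save_rows(conn, SAMPLE_ROWS)  # a second scheduled run adds no duplicates
print(count_rows(conn))
```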

38 Basic Steps: RestoCop case study
1) What do you want to do, e.g. store data, tweet etc.?
2) Analyze page hierarchy, e.g. cha.ca/Main
3) Search the page HTML for easily identifiable, repetitive HTML tags. Check out the Firefox plugin OutWit Hub
4) Start coding!
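Step 3 is the heart of it: find a tag pattern that repeats once per record, then pull the text out of each repetition. The sketch below is not the RestoCop scraper. It runs against a made-up HTML fragment, assumes each inspection sits in a `<tr class="inspection">` row, and uses Python's standard `html.parser` module rather than ScraperWiki's libraries.

```python
from html.parser import HTMLParser

# Made-up page fragment; the real inspection pages will look different.
SAMPLE_HTML = """
<table>
  <tr class="header"><td>Name</td><td>Result</td></tr>
  <tr class="inspection"><td>Example Diner</td><td>Fail</td></tr>
  <tr class="inspection"><td>Sample Cafe</td><td>Pass</td></tr>
</table>
"""


class InspectionParser(HTMLParser):
    """Collect the cell text of every <tr class="inspection"> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row being read, or None
        self._in_cell = False  # True while inside a <td> of that row

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and dict(attrs).get("class") == "inspection":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(tuple(cell.strip() for cell in self._row))
            self._row = None


def scrape_inspections(html):
    """Return a list of (name, result) tuples found in the HTML."""
    parser = InspectionParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def failed(rows):
    """Return the names of restaurants whose result is a fail."""
    return [row[0] for row in rows if row[1].lower() == "fail"]


rows = scrape_inspections(SAMPLE_HTML)
print(failed(rows))
```

On a live site the HTML would come from fetching each page in the hierarchy worked out in step 2; everything after that point is the same.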

39 Restaurant Inspections

40 Web page analysis

41

42

43

44 Webscraping “challenges”
• “Nose-bloodying” learning curve: HTML and Python
• Web page changes can ‘break’ your scrapers!
• Dealing with cookies
• Limitations: badly-formatted HTML, CAPTCHA, and session-based sites
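Two of these challenges have cheap partial defences. A scraper can check its own output and fail loudly when a page redesign leaves it with nothing, and Python's standard library can carry cookies between requests. The sketch below shows both under assumed names: `ScraperBrokenError`, `check_rows` and `make_cookie_opener` are made up for illustration. It does nothing for CAPTCHAs.

```python
import http.cookiejar
import urllib.request


class ScraperBrokenError(RuntimeError):
    """Raised when scraped output no longer looks like what we expect."""


def check_rows(rows, expected_columns):
    """Return rows unchanged, or raise if the page layout seems to have changed."""
    if not rows:
        raise ScraperBrokenError("no rows found; has the page layout changed?")
    for row in rows:
        if len(row) != expected_columns:
            raise ScraperBrokenError(
                f"expected {expected_columns} columns, got {len(row)}: {row!r}"
            )
    return rows


def make_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Usage sketch: every opener.open(url) call would share the same cookie jar,
# so a session cookie set by the first page is sent with later requests.
opener, jar = make_cookie_opener()
```

A scheduled scraper that raises an error is far better than one that quietly saves empty results for weeks.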

45 Other cool tools
• Excel/Google spreadsheets
• Google Fusion Tables
• Google Charts
• Tableau Public
• Many Eyes
• OpenHeatMap
• TileMill
• D3.js
• R
• DocumentCloud
• Overview
• Meograph
• Zeega

46 Resources [web]
• center/listservs/subscribe-nicar-l/
• central/index.html
• government/open-data-catalogue.aspx

47 Resources [books+articles]
• Data journalism handbook
• Visualize this – Nathan Yau
• 1413/datajournalism
• ills/how-to-get-to-grips-with-data-journalism/s7/a542402/
• s/datablog/2010/oct/01/data-journalism-how-to-guide
• journalism-faith-in-numbers
• 011/04/16/ijf11-lessons-in-data-journalism-from-the-new-york-times/

48 Resources [people]
• https://twitter.com/smfrogers
• s.com/
• et
• om/author/chadskeltonvansun/
• dpress.com/
• https://twitter.com/marshallk/datajournalists/members

49 Over to you… Questions

