Using data for stories USC Jennifer LaFleur, ProPublica Coulter Jones, The Center for Investigative Reporting.

Get hired after school

Why data? It takes you beyond the anecdote. It’s easier than counting sheets of paper. It lets you find stories that otherwise would be missed.

Why data? Contrasts are in the data

Caution: This slide contains extreme nerdiness

Why Computer-Assisted Reporting? Contrasts are in the data. Your most powerful figures are in the data.

Source: California Health Dept. data, Medicare billing data. Findings: Some hospitals had “alarming rates of a Third World nutritional disorder among its Medicare patients.”

Why data? Contrasts are in the data. Your most powerful figures are in the data. You can make connections you might not be able to make otherwise.

Data: Youth prison workers, criminal convictions and grievance data. Findings: Employees with criminal backgrounds were more likely to be accused of abusing inmates.

Findings: Gov. Brown signed dozens of bills that Schwarzenegger had vetoed. Method: Comparative text analysis of the bills themselves, rather than matching by sponsor or bill name.
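The slide does not say how the text comparison was done. The sketch below is an assumption and not the reporters' actual method. It uses Python's standard-library `difflib.SequenceMatcher` to score how similar two bill texts are, and it ignores bill numbers and sponsors. The bill IDs, texts and the 0.9 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a 0..1 similarity score for two bill texts (case- and whitespace-insensitive)."""
    a = " ".join(text_a.lower().split())
    b = " ".join(text_b.lower().split())
    return SequenceMatcher(None, a, b).ratio()


def match_bills(signed: dict, vetoed: dict, threshold: float = 0.9):
    """Pair signed bills with vetoed bills whose *text* is at least `threshold` similar.

    Bill numbers and sponsors are deliberately ignored; only the text is compared.
    This checks every pair, which is fine for hundreds of bills but slow for huge sets.
    """
    matches = []
    for new_id, new_text in signed.items():
        for old_id, old_text in vetoed.items():
            score = similarity(new_text, old_text)
            if score >= threshold:
                matches.append((new_id, old_id, round(score, 3)))
    return matches


# Hypothetical bills: a reintroduced bill gets a new number but nearly the same text.
signed = {"AB 101": "An act to require hospitals to report infection rates annually."}
vetoed = {
    "SB 999": "An act to require hospitals to report  infection rates annually.",
    "SB 500": "An act relating to fishing licenses.",
}
print(match_bills(signed, vetoed))
```

Any pair the script flags is only a lead. A reporter still has to read both bills.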

Data: Federal bridge inspections and stimulus funding. Findings: Some of the nation’s worst bridges did not get stimulus funds.

Why data? Contrasts are in the data. Your most powerful figures are in the data. You can make connections you might not be able to make otherwise. You can test assumptions.

Source: NHTSA complaint data. Findings: “…unintended acceleration has been a problem across the auto industry.”

Collecting the data

Where’s the data? If something is inspected, licensed, enforced or purchased, there probably is a database.

Where’s the data? If there is a report or a form, there probably is a database.

Where’s the data? Sometimes data is readily available online for download.

Source: Census. Findings: “Fueled by the dismal economy and high unemployment, more Americans…are doubling up”

Source: Medicaid nursing home survey data and finance data, housing data. Findings: “…a shortage of places for the disabled to live outside a nursing home and regulations that critics say make it hard to qualify for home services mean many who want out continue to receive expensive nursing care.”

Where’s the data? Sometimes you have to scrape it. That usually involves programs that automate searching tasks on Web sites.
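Here is a minimal sketch of the "scrape it" step, using only Python's standard-library `html.parser`. It turns an HTML table into rows of cell text. The page content and facility names are hypothetical. Before scraping a real site, check its terms of use.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped into one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page; a real scraper would download it first, for example with
# urllib.request.urlopen(url).read().decode("utf-8").
page = """
<table>
  <tr><th>Facility</th><th>Violations</th></tr>
  <tr><td>Acme Care Center</td><td>3</td></tr>
  <tr><td>Bayview Home</td><td>0</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(page)
print(scraper.rows)
```

The list of rows can then be written to a CSV file and checked like any other data set.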

Where’s the data? More often you need to go to an agency to get the data. This can be tricky if an agency doesn’t want to release it.

Source: School district credit card purchases. Findings: District card holders made questionable purchases with their cards.

Sometimes there is no data. That’s okay, because there are techniques for sampling and building your own database.

ProPublica pulled a random sample of 500 names from a list of around 2,000 individuals who had been granted or denied pardons. We created a database from months of researching individuals: their crime, age, sentence… We found that even after controlling for other factors, whites were more likely to get a pardon.
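A short sketch of drawing a simple random sample like the one above. The names are hypothetical placeholders and the seed is arbitrary. Fixing the seed lets you reproduce the same draw later and document exactly how it was made.

```python
import random

# Hypothetical stand-in for the roughly 2,000 names on the pardon list.
applicants = [f"applicant_{i:04d}" for i in range(2000)]

rng = random.Random(20111)              # arbitrary fixed seed so the draw is reproducible
sample = rng.sample(applicants, k=500)  # simple random sample, without replacement

print(len(sample), len(set(sample)))
```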

Cleaning data

Remember that data are not perfect

Top donors to state campaigns in California since 2001: Who are they and what do they care about?

It doesn’t mean you can’t use it. Do integrity checks to find the flaws. Add caveats where necessary. Do your own analysis rather than relying on an agency’s analysis of bad data.

Integrity checks for every data set: Read the documentation. Understand the contents of every field. Know how many records you should have. Check counts and totals against reports. Are all possibilities included? All states, all counties, correct ranges?
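The count, total and coverage checks above can be scripted so they run every time a new copy of the data arrives. This is a minimal sketch. The expected count, expected total, valid state codes and records are all hypothetical. In practice the expected figures come from the agency's own published report.

```python
import math

# Hypothetical figures: in practice, take these from the agency's published report.
EXPECTED_COUNT = 3
EXPECTED_TOTAL = 1410.0
VALID_STATES = {"CA", "NV", "OR"}  # hypothetical: every code the data set should cover

records = [
    {"id": 1, "state": "CA", "amount": 250.0},
    {"id": 2, "state": "NV", "amount": -40.0},
    {"id": 3, "state": "XX", "amount": 1200.0},
]


def integrity_report(rows, expected_count, expected_total, valid_states):
    """Return a list of readable problems; an empty list means no check failed."""
    problems = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} records, got {len(rows)}")
    total = sum(r["amount"] for r in rows)
    if not math.isclose(total, expected_total, abs_tol=0.01):
        problems.append(f"total {total} does not match published {expected_total}")
    missing = valid_states - {r["state"] for r in rows}
    if missing:
        problems.append(f"no records for: {sorted(missing)}")
    for r in rows:
        if r["state"] not in valid_states:
            problems.append(f"record {r['id']}: unknown state code {r['state']!r}")
        if r["amount"] < 0:  # may be a legitimate refund, but worth asking about
            problems.append(f"record {r['id']}: negative amount {r['amount']}")
    return problems


for problem in integrity_report(records, EXPECTED_COUNT, EXPECTED_TOTAL, VALID_STATES):
    print(problem)
```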

Integrity checks for every data set: Internal data checks. Is there more money going to subcontractors than went to the prime contractor? Are there more teachers than students? Do people have birth dates in the future, or so long ago they would be long gone?
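Each internal check above becomes a small yes/no test that you run on every record. This is a sketch with hypothetical field names and records. The 115-year age cutoff is an assumption, not a rule from the slide.

```python
from datetime import date

MAX_AGE_YEARS = 115  # assumption: anyone recorded as older than this is almost surely an error


def subawards_exceed_prime(contract):
    """More money going to subcontractors than to the prime contractor."""
    return contract["subaward_total"] > contract["prime_award"]


def more_teachers_than_students(school):
    return school["teachers"] > school["students"]


def implausible_birth_date(born, as_of):
    """Birth date in the future, or so long ago the person would be long gone."""
    return born > as_of or (as_of.year - born.year) > MAX_AGE_YEARS


# Hypothetical records for illustration.
contracts = [
    {"id": "C-1", "prime_award": 100_000, "subaward_total": 40_000},
    {"id": "C-2", "prime_award": 50_000, "subaward_total": 75_000},
]
schools = [
    {"id": "S-1", "teachers": 30, "students": 410},
    {"id": "S-2", "teachers": 30, "students": 12},
]
as_of = date(2014, 6, 1)
births = {"P-1": date(1970, 5, 5), "P-2": date(2090, 1, 1), "P-3": date(1880, 2, 2)}

print([c["id"] for c in contracts if subawards_exceed_prime(c)])
print([s["id"] for s in schools if more_teachers_than_students(s)])
print([pid for pid, born in births.items() if implausible_birth_date(born, as_of)])
```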

If your data is in Excel, use the filter function to see what the values are in individual fields.
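Outside Excel, counting the distinct values in a column does the same job as the filter drop-down. This is a sketch with a hypothetical "county" column. Printing each value with `repr` exposes trailing spaces and blanks that are easy to miss on screen.

```python
from collections import Counter

# Hypothetical "county" column from an agency export.
county = ["Dallas", "dallas", "DALLAS ", "Tarrant", "Dallas", "", "Tarant"]

# Like opening a filter drop-down: every distinct value and how often it appears.
for value, n in Counter(county).most_common():
    print(f"{value!r}: {n}")
```

Here "Dallas" shows up under three spellings, and "Tarrant" appears once as a misspelling. Both need cleaning before you count by county.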

Integrity checks for every data set: Check for missing or misplaced data. Use a standard naming convention for files and tables (I wouldn’t recommend “final”). Check for duplicates. Take margins of error into account if necessary.

2010 Census ACS: Median household income by metro area
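Before reporting that one metro area's median income is higher than another's, check whether the gap is larger than the margins of error. This sketch follows the approximate test described in Census Bureau ACS guidance. Each published 90 percent margin of error is converted to a standard error by dividing by 1.645, and the two standard errors are combined. The estimates below are hypothetical, and the test ignores any correlation between the two estimates.

```python
import math

Z_90 = 1.645  # ACS margins of error are published at the 90 percent confidence level


def acs_difference_is_significant(est1, moe1, est2, moe2, z_crit=Z_90):
    """Approximate test of whether two ACS estimates differ.

    SE = MOE / 1.645 for each estimate; then |Z| = |est1 - est2| / sqrt(SE1^2 + SE2^2)
    is compared with the critical value.
    """
    se1 = moe1 / Z_90
    se2 = moe2 / Z_90
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return abs(z) > z_crit


# Hypothetical median household incomes (estimate, margin of error) for two metro areas.
print(acs_difference_is_significant(52_000, 1_500, 50_500, 1_800))  # False: within the noise
print(acs_difference_is_significant(52_000, 500, 50_000, 600))      # True
```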

Be creative when you look for duplicates
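Exact-match duplicate checks miss records that differ only in punctuation, capitalization or a trailing "Inc." This sketch normalizes names before grouping them. The list of noise words is a hypothetical choice, and it has costs: stripping "CO" could also merge names that should stay separate. Treat every group as a lead to verify, not as proof of a duplicate.

```python
import re
from collections import defaultdict

# Hypothetical list of words that don't distinguish one donor from another.
NOISE_WORDS = r"\b(INC|LLC|CORP|CO|THE)\b"


def normalize_name(name):
    """Uppercase, replace punctuation with spaces, drop noise words, collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.upper())
    name = re.sub(NOISE_WORDS, " ", name)
    return " ".join(name.split())


def likely_duplicates(names):
    """Group raw names whose normalized forms are identical."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize_name(raw)].append(raw)
    return [group for group in groups.values() if len(group) > 1]


donors = ["Acme Oil Co.", "ACME OIL CO", "Acme Oil, Inc.", "Beta Farms LLC", "Gamma Bank"]
print(likely_duplicates(donors))
```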

Beyond the basics: Keep a notes file. Don’t work off your original database. Know the source. Check against summary reports. Use the right tool. Check for outliers, especially unusual ups and downs.

Truck accidents by year and agency

Beyond the basics: Check with experts. Are there standards? (For example, a drop of more than 10 percentage points is a red flag.) Find out what others have done. Do a gut check. Go physically see a record, or spot-check against documents.
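Once experts give you a standard, such as a change of more than 10 percentage points, you can apply it to every year-over-year move at once. This is a sketch with hypothetical passing rates. It flags moves in both directions, which goes beyond the slide's example of a drop, because a sudden rise can be just as suspicious.

```python
THRESHOLD_PP = 10.0  # example standard: a change of more than 10 percentage points


def flag_big_changes(series, threshold=THRESHOLD_PP):
    """series: (year, percent) pairs sorted by year.

    Returns (year, change) for each year-over-year move larger than the threshold,
    in either direction.
    """
    flags = []
    for (_, prev), (year, cur) in zip(series, series[1:]):
        change = cur - prev
        if abs(change) > threshold:
            flags.append((year, round(change, 1)))
    return flags


# Hypothetical passing rates for one district.
pass_rates = [(2009, 82.0), (2010, 84.5), (2011, 70.1), (2012, 71.0)]
print(flag_big_changes(pass_rates))
```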

Voter fraud: Dozens of St. Louis voters are being wrongly accused of casting ballots from fraudulent addresses in last year's Nov. 7 election. They are among thousands of registered voters who, based on city property records, appear to live on vacant lots.

Texas test score data: official results versus district-reported results. Duncanville district-reported 4th grade writing scores compared with the official report for Duncanville 4th grade writing. (Courtesy Holly Hacker, The Dallas Morning News)

Three rounds of analysis, after bouncing findings off subjects and experts: demographically based, voir dire, socioeconomics.

Checks when you’re matching data: A name is not enough. Lots of people have the same name. Get dates of birth and other information to make sure you have the correct person.
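This sketch shows why a join needs more than a name. Records match only when the normalized last name, first name and date of birth all agree. The field names and records are hypothetical. Even a name-plus-birth-date match should be confirmed against documents before publication.

```python
from datetime import date


def match_key(rec):
    """Join key: normalized last name, normalized first name, and date of birth."""
    return (rec["last"].strip().upper(), rec["first"].strip().upper(), rec["dob"])


def match(left, right):
    """Return (left_record, right_record) pairs that agree on name AND date of birth."""
    index = {}
    for rec in right:
        index.setdefault(match_key(rec), []).append(rec)
    return [(rec, other) for rec in left for other in index.get(match_key(rec), [])]


# Hypothetical records: two different people share a name.
employees = [
    {"last": "Garcia", "first": "Maria", "dob": date(1975, 3, 2)},
    {"last": "Garcia", "first": "Maria", "dob": date(1981, 9, 14)},
]
convictions = [
    {"last": "GARCIA ", "first": "MARIA", "dob": date(1981, 9, 14)},
]

pairs = match(employees, convictions)
print(len(pairs))
```

Matching on the name alone would have tied both employees to the conviction. Adding the birth date leaves one candidate.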

Even people with seemingly unique names aren’t so unique

Source: Illinois health data, police data. Findings: Dangerous systemic failures left elderly patients unprotected in Illinois nursing homes that also house mentally ill younger residents, including murderers, sex offenders and armed robbers.

Reporting studies by others: Get the questionnaire and methodology. Beware of nonscientific methods such as Web surveys and man-on-the-street interviews. Know the sample size and sampling error. Account for margin of error and non-response when drawing conclusions. Run statistical tests on the data if possible.
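The usual back-of-the-envelope margin of error for a poll percentage uses the standard formula for a proportion from a simple random sample. This is a minimal sketch. It does not account for non-response or complex survey designs, both of which usually make the real uncertainty larger.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a simple random sample of size n.

    z = 1.96 corresponds to a 95 percent confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(0.5, 1_000)
print(f"±{moe * 100:.1f} percentage points")
```

Using p = 0.5 gives the largest, most cautious margin. For 1,000 respondents that comes out to about 3 percentage points either way.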

Reporting data: Consider reporting rates, not raw numbers. Avoid false precision, such as a precise “percent said …” figure in a poll with a 5-percentage-point margin of error. Avoid number overload: “about half” is usually just as useful as “51 percent.” Adjust money for inflation. When analyzing income, use the median rather than the average (the Bill Gates factor).
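Three of the tips above in code: rates instead of raw counts, median versus average, and inflation adjustment. All figures are hypothetical. The price-index values are made up, and real work should use a published series such as the Bureau of Labor Statistics CPI.

```python
from statistics import mean, median


def rate_per(count, population, per=100_000):
    """Events per `per` residents, so places of different sizes can be compared."""
    return count / population * per


def adjust_for_inflation(amount, cpi_then, cpi_now):
    """Express an old dollar amount in today's dollars using a price-index ratio."""
    return amount * cpi_now / cpi_then


# Hypothetical counties: the bigger raw number is not the bigger rate.
print(rate_per(120, 1_200_000))  # county A
print(rate_per(30, 150_000))     # county B

# Hypothetical incomes: one very rich household drags the mean, not the median.
incomes = [32_000, 41_000, 45_000, 52_000, 60_000_000]
print(median(incomes), mean(incomes))

# Made-up index values for illustration only.
print(adjust_for_inflation(50_000, cpi_then=200.0, cpi_now=250.0))
```

County A has four times as many incidents, but county B's rate is twice as high. The median income stays at 45,000, while a single outlier pushes the mean past 12 million.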

When the data is the problem, you might still have a story: erroneous government databases can often be a story themselves.

Manipulating data for stories and apps

Where’s the data? Know which tool to use: reporting individual records, counting/summing, mapping, statistics.

Source: Medicaid outcomes data for dialysis facilities. Findings: A CMS online tool did not tell the whole story about facilities. In some counties, gaps in measures such as survival rates were vast.

Source: Washington Health Department data. Findings: “MRSA has been quietly killing in hospitals for decades.” But no one had tracked it until this story.

Source: Dept. of Ed data and surveys of campus crisis clinics. Findings: Many campuses had lax enforcement, and reporting loopholes meant problems went unchecked.

Source: EPA and state data on hazardous chemical locations. Findings: Dallas County has more than 900 sites that store hazardous chemicals.

Source: Dam inspection data from Texas and the federal government. Findings: Dam records had not been updated to account for population growth.

Source: 311 calls for downed trees. Findings: After a tornado swept across New York City, 311 calls for downed trees helped trace its path.

Source: City budget. Findings: Some neighborhoods suffer more than others as the mayor cuts budgets.

Disparities in water usage: “Water use highest in poor areas of the city.” Method: mapping and statistical analysis.