Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecy ∙ Data-Driven DM Lecture 09 Working with Maps Files and Census Data.

Similar presentations


Presentation on theme: "Lecy ∙ Data-Driven DM Lecture 09 Working with Maps Files and Census Data."— Presentation transcript:

1 Lecy ∙ Data-Driven DM Lecture 09 Working with Maps Files and Census Data

2 census data

3 Census Topics by Table From: http://censusreporter.org/topics/ Each link below brings you to a page describing the concepts and tables covered by the Census and American Community Survey on that particular topic. Getting Started The Census is a big subject and there’s a lot to learn, but you don’t have to learn it all at once. Here’s some help knowing the lay of the land. Table Codes While Census Reporter hopes to save you from the details, you may be interested to understand some of the rationale behind American Community Survey table identifiers. Age and Sex How the Census approaches the topics of age and sex. Children Tables concerning Children. Helpful to consider in relation to Families. Commute Commute data from the American Community Survey. Employment While the ACS is not always the best source for employment data, it provides interesting information for small geographies that other sources don’t cover. Families Families are an important topic in the ACS and a key framework for considering many kinds of data.

4 Geography Geography is fundamental to the Census Bureau’s process of tabulating data. Here are the key concepts you need to understand. Health Insurance The ACS has a number of questions that deal with health insurance and many corresponding tables. Income How the Census approaches the topic of income. Migration How the Census deals with migration data. Poverty Poverty data and how it is used within the ACS. Public Assistance Public assistance data from the ACS. Race and Hispanic Origin Race is a complex issue, and no less so with Census data. A large proportion of Census tables are broken down by race. Same-Sex Couples Although Census does not ask about them directly, there are a number of ways to get at data about same-sex couples using ACS data. Seniors In addition to basic Census data about age, there are a small number of Census tables which focus directly on data about older Americans, and on grandparents as caregivers. Veterans and Military Data collected about past and present members of the U.S. Armed Forces.

5 American Community Survey The American Community Survey (ACS) is an ongoing statistical survey by the U.S. Census Bureau, sent to approximately 250,000 addresses monthly (or 3 million per year). It regularly gathers information previously contained only in the long form of the decennial census. It is the largest survey other than the decennial census that the Census Bureau administers. Note: What is the difference between 1 year, 3 year, 5 year?

6 http://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml

7 STEP 01: GO TO THE “AMERICAN FACTFINDER” HOMEPAGE Click on “Advanced Search”

8 STEP 02: SEARCH BY TABLE NAME Enter your topic of table name. For this lab we will use poverty measures contained in the table: S1701

9 STEP 03: SELECT YOUR UNIT OF ANALYSIS (GEOGRAPHY) Click on “Geographies” and select “Census Tract”

10 STEP 03: SELECT YOUR UNIT OF ANALYSIS (GEOGRAPHY) Follow the prompts to select your state, then select “all counties”

11 STEP 04: MATCH DATA YEAR TO SHAPEFILES YEAR Select the version of the data that matches your shapefiles. Recall that census tract boundaries may change over time, so matching data to shapefiles ensures accuracy.

12 STEP 05: DOWNLOAD IN CSV FORMAT

13 CONTENTS Text File: contains information on the table that you downloaded (ignore for now). Metadata File: Contains variable names and definitions. With Ann File: Your actual data, first line contains variable names (annotated) – is the largest file. Readme File: disclaimers (ignore for now). Read this file into R

14 geographic units

15 In the U.S., Census tracts are "Designed to be relatively homogeneous units with respect to population characteristics, economic status, and living conditions, census tracts average about 4,000 inhabitants." Why are the tracts different sizes?

16 Nested Administrative Units States Counties Census Tracts Census Blocks Census Tract is usually the most granular level of data provided by census County Census Tract Block Group Block

17 Example: Household Income by Census Tract

18 FIPS Code Structure CountyCensus TractState 4200300010242-003-0001.02 11-digit Geo ID Code

19 Which counties are in my MSA? http://www.bea.gov/faq/index.cfm?faq_id=470 Pittsburgh, PA (Metropolitan Statistical Area) Allegheny, PA [42003]42003 Beaver, PA [42007]42007 Butler, PA [42019]42019 Fayette, PA [42051]42051 Washington, PA [42125]42125 Westmoreland, PA [42129]42129 FIPS Code for Allegheny County (42003) 42 = Pennsylvania 003 = Allegheny County

20 visual narrative

21 Your data tells a story

22 WHAT PATTERN DO YOU SEE?

23 WHAT PATTERN DO YOU SEE NOW? Equal Intervals Quantiles Geometric These maps are all made with the same data using different intervals for the break points.

24 BREAK POINTS FOR NORMAL DISTRIBUTIONS http://uxblog.idvsolutions.com/2010/03/crazy-world-of-range-breaks.html

25

26 BREAK POINTS FOR SKEWED DISTRIBUTIONS

27

28 HIGHLY-SKEWED DISTRIBUTIONS: Think about skewed distributions like you think about classes of wine. The first level of quality is a bottle for $3, the next level is a bottle at $6, the next level is at $10-12, the next level at $25, the next level at $50, then $100. To move up one class, you basically double the price. If you want a scale that translates wine prices into class, it would look something like: $0 - $4 Cheap wine $4 - $8 Low quality $8 - $20 Medium quality $20 - $80 High quality $80 + Excellent quality

29 http://www.medpagetoday.com/PublicHealthPolicy/GeneralProfessionalIssues/50497

30 Interval Size Increases

31 CLASSIFYING DATA Process of placing data into groups (classes or bins) that have a similar characteristic or value Break points Breaks the total attribute range up into these intervals Keep the number of intervals as small as possible (5-7) Use a mathematical progression or formula instead of picking arbitrary values Break points

32 CLASSIFICATIONS Quantiles Places the same number of data values in each class Will never have empty classes or classes with too few or too many values Attractive in that this method produces distinct map patterns Analysts use because they provide information about the shape of the distribution. Example: 0–25%, 25%–50%, 50%–75%,75%–100%

33 Equal intervals Divides a set of attribute values into groups that contain an equal range of values Best communicates with continuous set of data Easy to accomplish and read Not good for clustered data Produces map with many features in one or two classes and some classes with no features CLASSIFICATIONS

34 Increasing interval widths Long-tailed distributions Data distributions deviate from a bell-shaped curve and most often are skewed to the right with the right tail elongated Example: Keep doubling the interval of each category, 0–5, 5–15, 15–35, 35–75 have interval widths of 5, 10, 20, and 40. CLASSIFICATIONS

35 Exponential scales Popular method of increasing intervals Use break values that are powers such as 2 n or 3 n Generally start out with zero as an additional class if that value appears in your data Example: 0, 1–2, 3–4, 5–8, 9–16, and so forth CLASSIFICATIONS

36 CHOROPLETH MAP EXAMPLE Percentage of vacant housing units by county

37 RULES OF THUMB: 1.First Rule: use common sense! What do your groups represent, and are they meaningful? Are you misleading your audience with unreasonable breaks? 2.Binning by quantiles is typically a safe way to create breaks to show low, medium, and high values. 3.If a lot of your data is bunched together (for example, half of your values are close to zero), quantiles will not be meaningful because it will imply differences that do not exist. 4.If your distribution is skewed, consider increasing-interval or exponential scales. For example, define the first group as 0-2, second as 2-6, third as 6-14, next as 14-30 (your interval size doubles each break).

38 U.S. population by state, 2000 Original map (natural breaks)

39 Not good because too many values fall into low classes Equal interval scale

40 Shows that an increasing width (geometric) scale is needed Quantile scale

41 Custom geometric scale Experiment with exponential scales with powers of 2 or 3.

42 Normalizing data

43 Divides one numeric attribute by another in order to minimize differences in values based on the size of areas or number of features in each area Examples: Dividing the number of vacant housing units by the total number of housing units yields the percentage of vacant units Dividing the population by area of the feature yields a population density 43 NORMALIZING DATA

44 Number of vacant housing units by state, 2000 NON-NORMALIZED DATA

45 Percentage vacant housing units by state, 2000 NORMALIZED DATA

46 California population by county, 2007 NON-NORMALIZED DATA

47 California population density, 2007 NORMALIZED DATA

48 Selecting colors

49 SEQUENTIAL SCALE http://colorbrewer2.org/

50 DIVERGENT SCALE

51 RULES OF THUMB: 1.If you are highlighting a class of individuals, like those in poverty, a single color might be sufficient (FE red for in poverty, gray for not). 2.When you are highlighting performance along a single dimension, use a sequential scale with white or gray at the bottom of your color scale and a dark color at the top. 3.If your comparison is relative to the average put white or gray in the middle to represent “average.” Consider red for low, blue or green for high (depending on the context if high is good, low is bad). 4.Five to seven categories is typical.


Download ppt "Lecy ∙ Data-Driven DM Lecture 09 Working with Maps Files and Census Data."

Similar presentations


Ads by Google