Presentation is loading. Please wait.

Presentation is loading. Please wait.

Structuring Data to Facilitate Analysis Jerry J. Vaske Jay Beaman Colorado State University Warner College of Natural Resources Human Dimensions of Natural.

Similar presentations


Presentation on theme: "Structuring Data to Facilitate Analysis Jerry J. Vaske Jay Beaman Colorado State University Warner College of Natural Resources Human Dimensions of Natural."— Presentation transcript:

1 Structuring Data to Facilitate Analysis Jerry J. Vaske Jay Beaman Colorado State University Warner College of Natural Resources Human Dimensions of Natural Resources Fort Collins, CO Workshop at the 2008 Pathways to Success Conference: Integrating Human Dimensions into Fish & Wildlife Mgmt.

2 Workshop Foundation

3 Workshop Objectives Illustrate strategy for: –Facilitating analysis of 2006 National Survey of Fishing, Hunting, and Wildlife-Associated Recreation (FHWAR) –Increasing the usability of FHWAR data for management, planning & policy Compare two types of data structures: –Flat files –Relational Entities

4 Traditional Flat File Rows = Respondents Columns = Variables

5 Flat Files – Journal Article Example Every journal article has: One or more authors Title Journal name Specifics about date of publication: Year Volume number Issue number Page numbers Potentially keywords

6 Flat File Data Structure for Journal Articles

7 Potential Issues with Flat Files Problem –Diefenbach et al (2005) article had 7 co-authors –7 columns (variables) necessary to accommodate all authors last names –19 of 26 articles in flat file had only 1 or 2 authors –67% of author fields empty –If first names included – more empty fields Solution – Relational database

8 Relational Databases Definition –Set of tables containing data for predefined categories –Data stored in separate files (tables) that are linked Terminology –Table = Entity (E) –Rows (tuples) in table = information about an object (e.g., journal article or respondent) –Columns (attributes) = variables –Two types of relations (R) 1.Set of tuples – a table with attributes (these Rs store data) 2.Algebraic (Person ID in Table A = Person ID in Table B) (these relations use data stored in entities)

9 Relational Data Structure for Journal Articles Article Entity Article ID Journal ID Article title Year, Issue, Pages Journal Entity Journal ID Journal name Publisher info Author Entity Author ID Last, First name Contact info Keyword Entity Keyword ID (attitudes, norms) (R2) Relation Journal ID (R3) Relational Table Article ID Keyword ID (R1) Relational Table Article ID Author ID

10 Comparison Flat File vs. Relational Database Flat file Relational Database R1 = table multiple authors (AuthorID) linked to given article (ArticleID) ArticleIDAuthorID 2059314 2059 59 2059233 Author entity and R1 (author–article relation) can have any number of rows so all authors of an article can be identified

11 FHWAR Flat File Example Fishing, Hunting & Wildlife-Associated Recreation (FHWAR) National Survey – Conducted about every 5 years –1955 – first survey –2006 – most recent survey Data on hunters, anglers, wildlife watchers: –Sportsperson expenditures –Species sought in different states Data collection costs (1991–2006) $55 million (in 2008 $) 1991-2006 data comparable within limits but not integrated

12 2006 – FHWAR Flat File Data Data distributed on CD containing 3 ASCII text files: 1.Screening data 2.Sportsperson (hunting & fishing) data 3.Wildlife Watcher data Data file# of Records# of Variables Screening144,509 56 Sportsperson 21,9423,765 Wildlife Watcher 11,285 772

13 FHWAR Flat File – Analytical Issues Important issues –4,500+ vars with obtuse variable names (e.g., NCU_STD1) –200 pages of documentation –Census conversion programs do not create variable labels or value labels Major issues –Data compression –Conceptual complexity

14 Analytical Issues Affecting Use Data compression –No hunters hunt in all 50 states (at most 8 in 2006 data) –To avoid numerous empty cells data are compressed (e.g., the values for 3 vars are combined into a single var) –For example:days of participation is combined with an activity (e.g., big game or small game hunting) in a given state (in the order states are mentioned) –Compressed vars cannot be directly analyzed by SAS or SPSS Conceptual complexity –When uncompressed to blocks of 50 states 20,000 variables –Difficult to visualize analysis strategy –Flat FHWAR files hide data structure

15 Relational File Structure Illustration Entity 1.PERSON 2.SPORTSPERSON 3.HUNTING_ACTIVITY 4.TRIP_EXPENDITURES Based on flat file: Screening Sportsperson Four entities half of the 2006 FHWAR flat file data

16 PERSON Entity 6 control variables (e.g., Person_Weight) 10 demographic variables (e.g., Age, Sex) 8 hunting variables (e.g., Hunted_2005) 8 fishing variable (e.g., Fished_2005) 6 residential wildlife watching variables (e.g., Home_Observe_2005) 5 non-residential wildlife watching variables (e.g., Trip_Watch_2005)

17 SPORTSPERSON Entity 6 control variables (e.g., Person_ID, Sportsperson_Weight) 11 demographic variables (e.g., Age, Sex) 15 national summary variables (e.g., Hunted_2006) PERSON variables in SPORTSPERSON could be obtained from PERSON but also included SPORTSPERSON to simplify analyses

18 TRIP_EXPENDITURES Entity TRIP_EXPENDITURES entity reduces 844 compressed vars to 10 vars Person_IDUnique person ID Sportsperson_WeightSportsperson weight SpenderPerson in TRIP_EXPENDITURES entity State_of_ResidenceState of residence Location_Trip_SpendingState of spending Spend_State_of_ResidenceExpenditure in state of residence Fish_HuntFishing or hunting expenditure Fish_Hunt_TypeFishing or hunting type Trip_Expend_CategorySportsperson expenditure categories DollarsAmount spent in dollars

19 HUNTING_ACTIVITY Entity Person_IDUnique person ID Sportsperson_WeightSportsperson weight HunterPerson in HUNTING_ACTIVITY entity Table_Cell_DescriptionDescription of tables (e.g., relation to FH3 variables) Sub_Table_IDSportsperson sub-table identifier State_of_ResidenceState of residence (start Wave 3) In_State_of_ResidenceParticipation in state of residence Activity_LocationGeographic location for activity (USA or a State) Private_PublicActivity of private or any public land Fish_Hunt_TypeFishing or hunting type Response_UnitParticipation = 1, Days = 2, Trips = 3 ResponseParticipation (1 = Yes), # of days, or # of trips HUNTING_ACTIVITY entity reduces 840 compressed vars to 12 vars

20 Variable: Sub_Table_ID HUNTING_ACTIVITY entity A collection of state-level sub-tables to facilitate analysis

21 Visualizing the 4 Entities Person (Screening data) Control Variables Person ID Person Weight Demographics (11) Hunting (8) Hunted Ever Hunt Intentions Fishing (8) Wildlife Watching Residential (6) Trips (5) Sportsperson (Sportsperson data) Control Variables Person ID Sportsperson Weight Demographics (11) National summary species variables (15) Hunted 2006 Big game hunted Days big game hunted Trips hunting big game Hunting Activity (Sportsperson data) Person ID Sportsperson Weight Sub Table ID Response Unit Response Trip Expenditure (Sportsperson data) Person ID Sportsperson Weight Spending categories Dollars

22 Summary About 1,750 flat file variables reduced to < 60 Obtuse variable names replaced with intuitive names Compressed flat file variables cannot be directly used in SPSS or SAS Variables in relational entities can be used in analysis Details in Beaman & Vaske (2008)

23 Entity Data Files EntitySAS filenameSPSS filename PERSONPerson.sas7bdatPerson.sav SPORTSPERSONSportsperson.sas7bdatSportsperson.sav HUNTING_ACTIVITYHunting_Activity.sas7bdatHunting_Activity.sav TRIP_EXPENDITURESTrip_Expenditures.sas7bdatTrip_Expenditures.sav (http://welcome.warnercnr.colostate.edu/~jerryv/) To simplify analyses 2 additional entities: Hunting_Activity_and_Demographics Trip_Expenditures_and_Demographics

24 SAS Code & SPSS Syntax Figure number:SAS codeSPSS syntax Figure 3Figure_3_Syntax.sps Figure 4Figure_4.sasFigure_4_Syntax.sps Figure 5Figure_5_Syntax.sps Figure 6Figure_6.sas Figure 7Figure_7_Syntax.sps Figure numbers based on Beaman & Vaske (2008) http://welcome.warnercnr.colostate.edu/~jerryv/

25 Example – Hypothesis Average days of elk hunting varies between Colorado vs. Wyoming and by hunters sex

26 Flat File to Entity for Hypothesis Data FHWAR6.Hunt_BGspecies_States ; Length Person_ID 5 Sportsperson_Weight 4 Sex 3 State_of_Residence Activity_Location Fish_Hunt_Type Response_Unit Response 4 ; Set FHWAR6.fh3 (rename = (sex = xsex)) ; Keep Person_ID Sportsperson_Weight Sex State_of_Residence In_State Response Activity_Location Fish_Hunt_Type Response_Unit ; Person_ID = PersonID ; Sportsperson_Weight = spwgt ; Sex = Xsex ; State_of_Residence = put (resstate, $st2num2.) ; * Array stores info to identify state when decompressing ; Array a1( 2, 8 ) HUNTSTD1-HUNTSTD8 STDAYSHD1-STDAYSHD8 ;

27 * Array stores info to associate species with variables ; Array gam1( 9) g1-g9 ; Retain g1 1 g2 2 g3 3 g4 4 g5 5 g6 6 g7 7 g8 40 g9 41 ; Array a7( 2, 9, 8 ) bgame1d1--bgdifday9d8 ; Do m = 1 To 2 ; Do j=1 To 9 ; Do k=1 To 8 ; If a1( 1, k) = ' ' Then Goto End7 ; Fish_Hunt_Type = gam1(j) ; If m = 1 Then Do ; Response_Unit = 1 ; End ; Else Do ; Response_Unit = 2 ; End ; Response = a7(m, j, k) ; Activity_Location = put(a1( 1, k), $st2num2. ) ; If Activity_Location = State_of_Residence Then In_State = 1 ; Else In_State = 0 ; * Outputs data for hypothesis; If Response > 0 Then Output ; End7: End ; End ; End ; run ;

28 SAS Entity to SPSS Entity Get SAS Data = C:\Hunt_BGspecies_States.sas7bdat. Add Value labels Save Outfile = C\ Hunt_BGspecies_States.sav.

29 Testing Hypothesis with Relational Entity GET File = 'C:\Hunt_BGspecies_States.sav'. WEIGHT BY Sportsperson_Weight. Select if (Activity_Location = 8 or Activity_Location = 56). Select if (Fish_Hunt_Type = 2). Select if (Response_Unit = 2). UNIANOVA Response BY Sex Activity_Location. Opens data Weights data CO hunters WY hunters Elk hunters Days of participation ANOVA GET File = 'C:\FHWAR\Hunting_Activity.sav'. Select if (Sub_Table_ID = 10). WEIGHT BY Sportsperson_Weight. Select if (Activity_Location = 8 or Activity_Location = 56). Select if (Fish_Hunt_Type = 2). Select if (Response_Unit = 2). UNIANOVA Response BY Sex Activity_Location.

30 Results

31 Conclusions Analyses that are difficult to perform with flat file data are possible with relational structure Restructuring all of 2006 FHWAR data as well as data from 1991, 1996, & 2001 would: –Yield similar analysis capabilities –Allow for trend analysis –New practical opportunities for state agencies

32 Practical Opportunity State agencies have accurate records of license sales (e.g., hunting only, fishing only, combos) With potentially 100s of licenses, permits, & stamps sold, not practical to ask about specific licenses in a flat file Moving to relational structure for obtaining license data has advantages …

33 Advantages of Relational License Data 1.Can ask about actual state license sales All state license info can be pre-stored in one entity Size of entity would not impact other data entities 2.Questions about specific license cost not necessary; correct information pre-stored 3.Establishing relationship between state specific license sales & FHWAR data provides foundation for benchmarking / calibrating meaningful estimates based on FHWAR

34 From Analysis to Data Collection Entity based models: –facilitate analyses –can also enhance data collection Currently working with software company Techneos (www.techenos.com) to implement pilot models that yield:www.techenos.com –more consistent and –accurate data collection

35 Questions?


Download ppt "Structuring Data to Facilitate Analysis Jerry J. Vaske Jay Beaman Colorado State University Warner College of Natural Resources Human Dimensions of Natural."

Similar presentations


Ads by Google