
2 DATA MANAGEMENT Using EpiData and SPSS

3 References
- Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and Trials: A Practical Primer Using EpiData. The EpiData Documentation Project.
- EpiData Association website
- Importing raw data into SPSS

4 Data Management
- Planning data needs
- Data collection
- Data entry and control
- Validation and checking
- Data cleaning and variable transformation
- Data backup and storage
- System documentation
- Other

5 Types of Data Base Management Systems (DBMSs)
- Spreadsheets (e.g., Excel, SPSS Data Editor)
  - Prone to error, data corruption, & mismanagement
  - Lack data controls, limited programmability
  - Suitable only for small and didactic projects
  - Also good for last step data cleaning
- Commercial DBMS programs (e.g., Oracle, Access)
  - Limited data control, good programmability
  - Slow & expensive
  - Powerful and widely available
- Public domain programs (e.g., EpiData, Epi Info)
  - Controlled data entry, good programmability
  - Suitable for research and field use

6 We will use two platforms:
- EpiData
  - controlled data entry
  - data documentation
  - export (“write”) data
- SPSS
  - import (“read”) data
  - analysis
  - reporting

7 What is EpiData?
- EpiData is a small computer program (1.2 MB) for simple or programmed data entry and data documentation
- It is highly reliable
- It runs on Windows computers
- Runs on Macs and Linux only with emulator software
- Interface
  - pull down menus
  - work bar

8 History of EpiInfo & EpiData
- 1976–1995: EpiInfo (DOS program) created by CDC (in wake of swine flu epidemic)
  - Small, fast, reliable, 100,000+ users worldwide
- 1995–2000: DOS dies slow painful death
- 2000: CDC releases EpiInfo2000
  - Based on Microsoft Jet (Access) data engine
  - Large, slow, unreliable (resembled EpiInfo in name only)
- 2001: Loyal EpiInfo user group decides it needs real “EpiInfo for Windows”
  - Creates open source public domain program
  - Calls program “EpiData”

9 Goal: Create & Maintain Error-Free Datasets
- Two types of data errors
  - Measurement error (i.e., information bias) – discussed last couple of weeks
  - Processing errors = errors that occur during data handling – discussed this week
- Examples of data processing errors
  - Transpositions (91 instead of 19)
  - Copying errors (O instead of 0)
  - Additional processing errors described on p. 18.2

10 Avoiding Data Processing Errors
- Manual checks (e.g., handwriting legibility)
- Range and consistency checks* (e.g., do not allow hysterectomy dates for men)
- Double entry and validation*
  - Operator 1 enters data
  - Operator 2 enters data in separate file
  - Check files for inconsistencies
- Screening during analysis (e.g., look for outliers)
* covered in lab
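The double entry procedure above can be sketched in a few lines of Python. This is only an illustration of the idea, not how EpiData's own validation works: the function name `compare_double_entry`, the `ID` key column, and the CSV layout are all assumptions made for the example.

```python
import csv
import io


def compare_double_entry(file1, file2, key="ID"):
    """List (record id, field, value 1, value 2) wherever two entries disagree."""
    rows1 = {row[key]: row for row in csv.DictReader(file1)}
    rows2 = {row[key]: row for row in csv.DictReader(file2)}
    mismatches = []
    for record_id in sorted(rows1.keys() | rows2.keys()):
        r1, r2 = rows1.get(record_id), rows2.get(record_id)
        if r1 is None or r2 is None:
            # The record was keyed by only one of the two operators.
            mismatches.append((record_id, "<record>",
                               "missing" if r1 is None else "present",
                               "missing" if r2 is None else "present"))
            continue
        for field, value in r1.items():
            if value != r2.get(field):
                mismatches.append((record_id, field, value, r2.get(field)))
    return mismatches


# Hypothetical data: operator 2 transposed the age of record 1 (91 instead of 19).
entry1 = io.StringIO("ID,AGE,SEX\n1,19,1\n2,40,2\n")
entry2 = io.StringIO("ID,AGE,SEX\n1,91,1\n2,40,2\n")
result = compare_double_entry(entry1, entry2)
print(result)
```

Each reported mismatch would then be resolved against the original paper form.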

11 Controlled Data Entry
- Criteria for accepting & rejecting data
- Types of data controls
  - Range checks (e.g., restrict AGE to reasonable range)
  - Value labels (e.g., SEX: 1 = male, 2 = female)
  - Jumps (e.g., if “male,” jump to Q8)
  - Consistency checks (e.g., if “sex = male,” do not allow “hysterectomy = yes”)
  - Must enters
  - etc.
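As a rough illustration of these data controls, here is a minimal Python sketch. It is not EpiData's CHK syntax. The age range of 0 to 120, the field names, and the functions `check_record` and `next_question` are assumptions chosen for the example.

```python
SEX_LABELS = {1: "male", 2: "female"}  # value labels
AGE_RANGE = (0, 120)                   # assumed "reasonable range"
MUST_ENTER = ("AGE", "SEX")            # fields that may not be left blank


def check_record(record):
    """Return a list of reasons to reject the record (empty list = accept)."""
    errors = []
    for field in MUST_ENTER:
        if record.get(field) is None:
            errors.append(f"{field} must be entered")
    age = record.get("AGE")
    if age is not None and not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
        errors.append(f"AGE {age} is outside {AGE_RANGE[0]}-{AGE_RANGE[1]}")
    sex = record.get("SEX")
    if sex is not None and sex not in SEX_LABELS:
        errors.append(f"SEX {sex} has no value label")
    # Consistency check: do not allow hysterectomy = yes for men.
    if SEX_LABELS.get(sex) == "male" and record.get("HYSTERECTOMY") == "yes":
        errors.append("HYSTERECTOMY = yes is inconsistent with SEX = male")
    return errors


def next_question(sex, default_next):
    """Jump: males skip ahead to Q8, everyone else continues in order."""
    return "Q8" if SEX_LABELS.get(sex) == "male" else default_next


accepted = check_record({"AGE": 34, "SEX": 2, "HYSTERECTOMY": "yes"})
rejected = check_record({"AGE": 150, "SEX": 1, "HYSTERECTOMY": "yes"})
print(accepted, rejected)
```

In a real data entry program these rules fire while the operator types, so a bad value never reaches the data file.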

12 Data Processing Steps
1. File naming conventions
2. Variable types and names
3. QES (questionnaire) development
4. Convert .QES file to .REC (record) file
5. Add .CHK file
6. Enter data in REC file
7. Validate data (double entry procedure)
8. Document data (codebook)
9. Export data to SPSS
10. Import data into SPSS

13 Filenaming and File Management
- c:\path\filename.ext
- A web address is a good example of a filename
- Some systems are case sensitive (Unix)
- Others are not (Windows)
- Always be aware of
  - Physical location (local, removable, network)
  - Path (folders and subfolders)
  - Filename (proper)
  - Extension
- Demo Windows Network Explorer: right-click Start Bar > Explore
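The parts of a full filename, and the difference in case sensitivity, can be checked with Python's standard `pathlib` module. This is a small sketch using the example path from the slide. The `DEMO.REC` / `demo.rec` comparison is an illustrative assumption.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Split the slide's example into location, path, filename, and extension.
p = PureWindowsPath(r"c:\path\filename.ext")
parts = {
    "drive": p.drive,          # physical location
    "folder": str(p.parent),   # path (folders and subfolders)
    "name": p.stem,            # filename proper
    "extension": p.suffix,     # extension
}
print(parts)

# Windows-style paths compare without regard to case; Unix-style paths do not.
same_on_windows = PureWindowsPath("DEMO.REC") == PureWindowsPath("demo.rec")
same_on_unix = PurePosixPath("DEMO.REC") == PurePosixPath("demo.rec")
print(same_on_windows, same_on_unix)
```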

14 File extensions you should know

Extension   Software program
.qes        EpiInfo/EpiData questionnaire
.rec        EpiInfo/EpiData records (data)
.chk        EpiInfo/EpiData check (controls & labels)
.not        EpiData notes (data documentation)
.sav        SPSS permanent data file
.sps        SPSS syntax file (program)
.txt        Generic (flat) text data
.htm        Web browser
.doc        Microsoft Word
.xls        Microsoft Excel

15 Selected EpiData Variable Types

Variable type         Examples
Text                  _ _
Numeric               #   ##.#
Date
Auto ID
Soundex (sanitized)

16 EpiData Variable Names
- Variable name is based on the text that occurs before the variable type indicator code
- EpiData variable naming defaults vary depending on installation
- Create variable names exactly as specified
- To be safe, denote variable names in {curly brackets}
- For example, to create a two byte numeric variable called age, use the question:
  What is your {age}? ##
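To make the naming rule concrete, the toy Python parser below pulls the name in curly brackets and the width of the numeric field out of a questionnaire line. It is a teaching sketch only and does not reproduce EpiData's actual parser. The function `parse_question`, and the choice to count the decimal point as part of the width, are assumptions.

```python
import re


def parse_question(line):
    """Extract the {name} and the trailing numeric field code from one line."""
    name_match = re.search(r"\{(\w+)\}", line)
    field_match = re.search(r"(#+(?:\.#+)?)\s*$", line)
    if name_match is None or field_match is None:
        return None  # no curly-bracket name or no numeric field on this line
    field = field_match.group(1)
    decimals = len(field.split(".")[1]) if "." in field else 0
    return {"name": name_match.group(1), "width": len(field), "decimals": decimals}


parsed = parse_question("What is your {age}? ##")
print(parsed)
```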

17 Demo / Work Along
- Create QES file [demo.qes]
- Convert QES to REC [demo.rec]
- Create CHK file [demo.chk]
- Create double entry file [demo2.rec]
- Enter data
- Validate data

Fname    Lname    DOB     SEX   DEATH   AGE
John     Snow     3/15/
George   Orwell   6/25/

18 We will stop here and pick up the second part of the lecture next week “Stay tuned”

19 Codebooks
- Contain info that helps users decipher data file content and structure
- Includes:
  - Filename(s)
  - File location(s)
  - Variable names
  - Coding schemes
  - Units
  - Anything else you think might be useful

20 EpiData codebook generators

21 File Structure Codebook Full codebook contains descriptive statistics (demo)

22 Full Codebook Notice descriptive statistics
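For readers without the screenshots, the following Python sketch shows roughly what a full codebook holds: the filename, each variable's coding scheme and units, and descriptive statistics. It does not imitate EpiData's codebook output. The function `build_codebook`, the record layout, and the sample values are assumptions.

```python
import statistics


def build_codebook(filename, records, value_labels=None, units=None):
    """Summarize each variable: counts, labels, units, and numeric statistics."""
    value_labels = value_labels or {}
    units = units or {}
    variables = {}
    for name in records[0]:
        values = [r[name] for r in records if r[name] is not None]
        entry = {"n": len(values), "missing": len(records) - len(values)}
        if name in value_labels:
            entry["labels"] = value_labels[name]  # coded variable: no mean
        elif values and all(isinstance(v, (int, float)) for v in values):
            entry.update(min=min(values), max=max(values),
                         mean=statistics.mean(values))
        if name in units:
            entry["units"] = units[name]
        variables[name] = entry
    return {"filename": filename, "variables": variables}


# Hypothetical records; None marks a missing value.
records = [{"AGE": 19, "SEX": 1}, {"AGE": 40, "SEX": 2}, {"AGE": None, "SEX": 2}]
codebook = build_codebook("demo.rec", records,
                          value_labels={"SEX": {1: "male", 2: "female"}},
                          units={"AGE": "years"})
print(codebook)
```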

23 Conversion of Data File
- Requires common intermediate file format
- Examples of common intermediate files
  - .TXT = plain text
  - .DBF = dBase program
  - .XLS = Excel
- Steps
  - Export .REC file to .TXT file
  - Import .TXT file into SPSS
  - Save permanent SAV file

24 Current Export Formats Supported by EpiData

25 Plain (“raw”) TXT data
- plain ASCII data format
- no column demarcations
- no variable names
- no labels
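Because a raw TXT file carries no column demarcations or variable names, the reader has to supply the column positions from the codebook or syntax file. The Python sketch below illustrates this with a made-up layout. The names `LAYOUT` and `parse_fixed_width`, the column positions, and the sample lines are all assumptions.

```python
# Hypothetical layout: (variable name, start column, end column), zero-based,
# end exclusive. In practice these positions come from the codebook.
LAYOUT = [("ID", 0, 3), ("AGE", 3, 5), ("SEX", 5, 6)]


def parse_fixed_width(line, layout=LAYOUT):
    """Cut one raw line into named fields using fixed column positions."""
    return {name: line[start:end].strip() for name, start, end in layout}


raw = "001191\n002402\n"  # two records, no delimiters, no header
rows = [parse_fixed_width(line) for line in raw.splitlines()]
print(rows)
```

Without the layout, the string `001191` is ambiguous, which is why the raw file is always paired with its documentation.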

26 TXT file with codebook: tox-samp.txt, tox-samp.not

27 SPSS Data Export / Import
[Diagram: REC, TXT (raw data), SPS (syntax), SAV]

28 Top of tox-samp.sps
- Lines beginning with * are comments (ignored by command interpreter)
- Next set of commands shows file location and structure via SPSS command syntax

29 Bottom part of tox-samp.sps file
- Labels being imported into SPSS
- Delete * if you want this command to run

30 Opening the SPS (command) file

31 Running the SPS file

32 Ethics of Data Keeping
- Confidentiality (sanitized files – free of identifiers)
- Beneficence
- Equipoise
- Informed consent (To what extent?)
- Oversight (IRB)

