
Computing at Stanford and Introduction to SAS HRP223 – Topic 0 Sept 24th, 2012 Copyright © 1999-2012 Leland Stanford Junior University. All rights reserved.


1 Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

2 Objectives
– Administrivia
– Software tools at Stanford
   – Security at Stanford
– Software tools not endorsed by Stanford
– Data
– SAS

3 General The course website has critical details: www.stanford.edu/class/hrp223/ If you can, please print the slides just before the start of class. Administrivia

4 Goals This course will provide practical solutions to problems that arise before doing analyses, as well as in the final push toward getting the results. I will talk about issues like finding unruly data, massaging data into a useful format, building datasets of valid data, and choosing statistics.

5 Getting Help Mike Hurley (mphurley@stanford.edu) is the TA for the course. His office hours will be announced weekly. I will be available for online Q&A at balise@stanford.edu or, preferably, on the class newsgroup. I will answer questions every morning around dawn. If you post to the newsgroup and do not hear back quickly, please email me. Things labeled “Assignment”, but not “Homework”, can be done with the help of classmates. You are strongly encouraged to discuss your problems up until you start writing your answers to the homework problems.

6 Preliminaries I assume you know how to use Windows or Mac OS. For this class you need access to a machine with:
– Windows XP Pro or Vista Business/Ultimate
– Windows 7 Professional/Business/Ultimate
XP Home Edition and Vista Home Edition will not work; Windows 7 Home Premium may work with the software in this class. I use XP Pro, 7 Pro, and XP Pro running in Parallels on the Mac.

7 Getting a Computer If you want to get a new computer, you can get one at a very good price through Stanford. You can get ideas about what makes an acceptable computer here: itservices.stanford.edu/service/helpdesk/recommended You want XP Pro, the Business or Ultimate version of Vista, or, ideally, Windows 7 Professional.

8 Use Lane This is useful!

9 http://lane.stanford.edu/help/cool-tools-proxyBookmarklet.html This too! Bookmark at Lane.

10 Not All Books are Indexed Stanford has many online technical books that are not indexed at Lane or in the main campus index, SearchWorks: searchworks.stanford.edu

11 The outside book is now bookmarked on all my machines.

12 Free Stanford Tools You can get access to free software from Stanford by going here: https://itservices.stanford.edu/service/ess You must use antivirus software. You will fail the course if you send me a document that contains a virus or other malicious code. There is no forgiveness for this offense and this is not open to debate. Stanford Software

13 Get the Sophos Scanner

14 Virus and Worm Issues Virus-scan before you email me anything! Right-click on the file you want to scan and then pick Scan with Sophos Anti-Virus. Sophos keeps itself updated constantly.

15 Sophos Anti-Virus (for both Windows and Mac OS) watches for suspicious activity and stops it until you authorize the software. If your quarantine has a file in it, get help. You can submit suspicious files.

16 Stanford Desktop Tools This allows you to install and update BigFix, Security Self-Help, OpenAFS and other tools.
– BigFix automatically checks for important software updates.
– Security Self-Help checks for, and lets you fix, security weaknesses on your machine.
– OpenAFS gives you access to your UNIX account as if it were just another Windows hard drive.

17 Stanford Desktop Tools

18 Your UNIX Account You already have a website made for you:
– www.stanford.edu/~YOUR_SUNET_ID
UNIX stuff: https://itservices.stanford.edu/service/afs
– If you do not want AFS, you can also use SecureFX, which you can get from ESS, or just go to afs.stanford.edu
– You can use Stanford Desktop Tools to mount your UNIX drive just like another hard drive. I get stuff on the web quickly with OpenAFS.
– Do NOT put confidential/HIPAA-sensitive stuff out there.

19 afs.stanford.edu is the easy way to move files to your UNIX space.

20 Use Your Website

21 Use AFS and your Website Mount your drive and then you can put stuff in the WWW folder! Install OpenAFS

22 My UNIX Space

23 After AFS is Installed

24 SecureFX

25 Secure AFS You can make a space that can hold PHI and be shared by anybody with a SUNet ID.
1. Set up the workgroup that will serve as your access control list: http://workgroup.stanford.edu
2. Request the Secure AFS space: https://tools.stanford.edu/cgi-bin/secure-group-request
To access it you need the OpenAFS client installed: http://www.stanford.edu/service/openafs as well as Kerberos installed (Windows): https://itservices.stanford.edu/service/ess/pc/kfw or configured (Mac): https://itservices.stanford.edu/service/ess/mac/kfm

26 stanford.box.edu You have a drop box that is NOT safe for protected health information.

27

28 Passwords The Leland system places restrictions on passwords. You should set your passwords on other machines to be just as hard to crack. https://itservices.stanford.edu/service/unixcomputing/unix/passwords You can use Stanford’s Security Self-Help Tool which comes with Stanford Desktop Tools to check your passwords. Security

29 General Security The biggest weaknesses in computer security are the legal users of the system.
– Walking away from a terminal
– Using passwords that are easy to crack
– Taking data off of restricted machines
– Viruses and Trojan horses will kill you if you let them!

30 Email Email provides all the confidentiality of a postcard. If you are sending HIPAA-sensitive information, you can secure your email: https://itservices.stanford.edu/service/secureemail

31 Unsolicited Email Spam™, Spam™, Spam™, wonderful Spam™, yes wonderful Spam™ You may get unsolicited commercial solicitations, advertisements, chain letters, or pornography through your Stanford email account.
– NEVER respond to these messages, and never use the REMOVE link provided in the email.
– NEVER put your email address on a web page.

32 At webmail.stanford.edu you can choose the Preferences tab and then Filters on the left to automatically sack repeat offenders.

33 Back up your work! Each year, on average, one student in five loses all their work. Plan on your computer being destroyed at the worst possible time this year.
– Coffee, computer worm or virus, small child with refrigerator magnet, physical hard drive failure, theft, bicycle crash, etc.
Every day, back up your work to more than one location.

34 Where to Back Up PLEASE use removable media if you have no network access:
– Floppy disk, CD, DVD, flash media
NEVER back up or share confidential data (HIPAA-sensitive protected health information) on mobile media without talking to security experts first. At home I use www.crashplan.com. Ask your tech support person for recommendations.

35 Encrypted USB drives USB drives (also called thumb drives) are a very convenient way to keep backups and move your data around. However, they are very easy to lose! NEVER store unencrypted, restricted data on a USB drive.
– You can encrypt at the file level (Excel, WinZip): OK.
– You can encrypt the whole drive (PGP disk, TrueCrypt): better.
– You can have a hardware-encrypted USB drive: BEST!
   – There are many manufacturers; however, most are Windows only.
   – IronKey supports both Windows and Mac and is highly recommended: 1 Gig for $50 up to 32 Gig for $250 on Amazon.

36 Get an Encrypted Flash Drive
– IronKey 8 GB: > $100
– Corsair Padlock 2 8 GB: ~ $30

37 Keep Your Phone Safe Android phones cannot be used to access Protected Health Information… itservices.stanford.edu/service/mobiledevice/cellular/android Securing an iPhone is easy with Mobile Device Management (MDM): itservices.stanford.edu/service/mobiledevice/management

38 Protect Your Data

39 Two-Step Authentication!

40 https://accounts.stanford.edu/ Click Manage then turn on Two-Step Auth

41 Data Management and Analysis Tools of the Trade
Containers to hold data
– Microsoft Excel
– REDCap
Analysis tools
– SAS with Enterprise Guide
– R with Rcmdr
Other Software

42 Excel
– is not a good place for HIPAA-sensitive (PHI) material
– makes it easy to enter bad data
– can be a huge headache to import

43 REDCap
– is a good place for HIPAA-sensitive (PHI) material
– makes it hard to enter bad data
– is mostly painless to import for analysis

44 SAS 9.3 SAS is an old programming language where you type commands and run a bunch of things at once.

45 Enterprise Guide 5.1 EG is a newish programming environment where you type commands or point and click.

46 R 2.15.1 http://cran.cnr.Berkeley.edu R is a modern programming language with user-hostile help files…

47 RStudio http://www.rstudio.org/ RStudio is an integrated development environment (IDE) for R.

48 R Commander Rcmdr is a friendly, but incomplete, graphical user interface (GUI) for R.

49 Getting SAS If you have a machine with XP, Vista or Windows 7 (Pro, Business or Ultimate) and more than 30 Gig of extra hard drive space, you can get SAS for $65 per year. Place the order here: https://itservices.stanford.edu/service/softwarelic/sas
– There is a digital download that is HUGE (15+ Gig, not Meg). If you have a wired connection on campus, use it.
The instructions for installing it can be found here: http://www.stanford.edu/class/hrp223/2012/InstallingSAS93_20120702.pptx

50 SAS for Free on Campus If you don’t mind working in a public place, SAS is available in the Lane Library and the M202 lounge. med.stanford.edu/irt/classrooms/features/computer_labs.html

51 Other Tools I Regularly Use
File manipulation
– NotePad++
– UltraEdit
– UltraCompare
Info management
– FileLocator Pro
– Google Sites

52 NotePad++ Excellent free text editor

53 UltraEdit If you work with huge text files, get UltraEdit and buy the perpetual license. www.ultraedit.com

54 UltraCompare A tool to track changes in code or other text files www.ultraedit.com/products/ultracompare.html

55 FileLocator Pro If you can’t find files on your machine, consider FileLocator Pro. www.mythicsoft.com/default.aspx

56 Google Sites If you need to keep track of tons of random facts (like code snippets) consider using Google Sites. https://sites.google.com/

57

58 What is Data? Stuff that … – will make you famous or cry – you want to pull from the electronic medical record – the information you will need to store if it is not in the medical record Data

59 Structured vs. Unstructured
Unstructured data
– Text like dictations, operation notes, data entry comments
– Difficult to process
Structured data
– Afford the ability to build ontologies
– Dates
– Pick lists (multiple choice)
– Relatively easy to process

60 Structuring Biomedical Data
– RxNorm for drug ingredients / brand names
– ICD-9 for billing diagnostic and procedure codes
   – fairly coarse but nicely hierarchical
– ICD-O for detailed cancer pathology
– CPT for procedures
   – no hierarchical structure, difficult to search
– SNOMED-CT for general purpose clinical terms
   – hierarchical, detailed and vast but with some gaps
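Because ICD-9 is hierarchical, the first three characters of a diagnosis code name a broad category, so one comparison can flag a whole family of codes. The sketch below assumes a hypothetical dataset (work.dx) with codes stored as character values without the decimal point; 250 is the ICD-9 category for diabetes mellitus. It is an illustration of using the hierarchy, not a validated case definition.

```sas
/* Hypothetical diagnosis codes, stored as character data (no decimal point) */
data work.dx;
   length patientId 8 dxCode $5;
   input patientId dxCode $;
   datalines;
1 25000
2 4019
3 25061
4 V7284
;
run;

/* Use the ICD-9 hierarchy: every code starting with 250 is diabetes mellitus */
data work.dxFlagged;
   set work.dx;
   diabetes = (substr(dxCode, 1, 3) = '250');   /* 1 if true, 0 if false */
run;

proc print data=work.dxFlagged;
run;
```

The same prefix test would not work for CPT codes, which is one practical consequence of CPT having no hierarchical structure.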

61 What is structured data? All pieces of information that you collect and calculate as part of a study are data. Every person’s response to a questionnaire is called a data point. There are two fundamentally different types of data: numeric and character.
– Numeric data is always … numeric. Information that you might want to do math on is numeric data.
– Character data is alphanumeric. It includes the obvious things like names and addresses, but it also includes numbers that you should not do math on.
Some systems, like R, make finer distinctions and let you force data to be stored as factors.
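To see the numeric/character distinction in SAS, here is a minimal sketch with hypothetical subjects. A $ after a name in the LENGTH and INPUT statements makes that variable character. ZIP codes and medical record numbers are numbers you should not do math on, so they are read as character; had zip been read as a number, 02134 would have become 2134.

```sas
/* Hypothetical subjects: zip and mrn look like numbers but are identifiers,
   so the $ makes them character variables; ageYears is numeric */
data work.subjects;
   length name $20 zip $5 mrn $8 ageYears 8;
   input name $ zip $ mrn $ ageYears;
   datalines;
Smith 02134 00012345 34
Jones 94305 00067890 51
;
run;

/* PROC CONTENTS lists each variable with its type (Num or Char) */
proc contents data=work.subjects;
run;
```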

62 What is data coding? A question such as “What is your current age in years?” is going to generate numeric data. A question such as “At what age did you first contract a sexually transmitted disease?” is also going to generate numeric data, but you are going to need to allow for the possibility that somebody has never contracted a sexually transmitted disease … and you always need to allow for people who never knew or do not remember the information, or who may be dishonest in their answers.

63 What is data coding? (2) When you have a question that generates numeric data and your subject’s response is not a “real number,” you can code a bogus value.
– “Not applicable” can be coded as age –1000000.
– “Do not know” can be coded as –2000000.
The better way to deal with this problem is to use the value “NULL.”
– SAS allows you to code 27 different types of NULL (the special missing values ._ and .A through .Z) in addition to the ordinary missing value.
– Null values make your job easier when you try to do math on the values, because SAS does not treat them as real numbers the way it would treat –1000000.
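A minimal sketch of replacing the bogus codes with special missing values, using hypothetical data (work.raw) and the codes from the slide. Choosing .N for “not applicable” and .D for “do not know” is an assumption for illustration; any of the special missing values would do.

```sas
/* Hypothetical answers coded with the bogus values:
   -1000000 = not applicable, -2000000 = do not know */
data work.raw;
   input id ageFirstSTD;
   datalines;
1 19
2 -1000000
3 23
4 -2000000
;
run;

/* Replace the bogus codes with special missing values */
data work.recoded;
   set work.raw;
   if ageFirstSTD = -1000000 then ageFirstSTD = .N;       /* not applicable */
   else if ageFirstSTD = -2000000 then ageFirstSTD = .D;  /* do not know */
run;

/* PROC MEANS leaves missing values out, so the mean uses only 19 and 23 */
proc means data=work.recoded n nmiss mean;
   var ageFirstSTD;
run;
```

Run against work.raw instead, the same PROC MEANS step would average in the bogus codes and report a mean of about -750,000.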

64 Missing Data SAS represents missing character data as a blank, which you write in code as a pair of quotes with nothing between them (''), and displays missing numbers as a single period (.). You can also use .A, .B, etc. to code for missing numbers, but you can’t enter them directly.
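When raw data arrive with letters in place of numbers, the MISSING statement tells SAS to read those letters as special missing values instead of flagging them as invalid data. A sketch with hypothetical data, assuming N means “not applicable” and D means “do not know”:

```sas
/* MISSING N D: in the raw data, the letters N and D are read as .N and .D */
data work.survey;
   missing N D;
   input id ageFirstSTD;
   datalines;
1 19
2 N
3 D
4 .
;
run;

data work.checked;
   set work.survey;
   isMissing = missing(ageFirstSTD);   /* 1 for ., .N and .D alike */
   notApplic = (ageFirstSTD = .N);     /* 1 only for .N */
   plusOne   = ageFirstSTD + 1;        /* arithmetic on a missing value gives missing */
run;

proc print data=work.checked;
run;
```

One caution: SAS treats every missing value as smaller than any number, so a test such as if ageFirstSTD < 18 is true for ., .N and .D. Checking missing(ageFirstSTD) first avoids that trap.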

65 What is data coding? (3) Questions that generate alphanumeric data are always more complex than those that generate numeric data. “Where were you born?” can be coded as a string of letters from a fill-in-the-blank question or as letters or numbers from a multiple-choice question.
– Do not use null in fill-in-the-blanks.
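For the multiple-choice version, one common approach is to store a short code and attach a readable label with PROC FORMAT. The codes and the format name $birthPlace below are hypothetical.

```sas
/* Hypothetical pick-list codes for "Where were you born?" */
proc format;
   value $birthPlace
      'CA' = 'California'
      'US' = 'Another U.S. state'
      'NU' = 'Outside the U.S.';
run;

data work.birth;
   length id 8 birthCode $2;
   input id birthCode $;
   format birthCode $birthPlace.;   /* store the short code, display the label */
   datalines;
1 CA
2 NU
3 US
;
run;

/* Frequency table shows the labels rather than the raw codes */
proc freq data=work.birth;
   tables birthCode;
run;
```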

66 Typical Tasks
– Importing data
– Cleaning
– Making a subset
– Numeric and graphical summaries
– Analyses with graphics
– Summary reports
or
– Doing simple math
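Several of these tasks map onto a handful of SAS steps. This sketch uses SASHELP.CLASS, a small sample data set that ships with SAS, in place of real study data; the PROC IMPORT step is commented out because its file path is hypothetical.

```sas
/* Importing: sketch only, the path below is hypothetical.
proc import datafile="C:\hrp223\mydata.csv" out=work.mydata dbms=csv replace;
   getnames=yes;
run;
*/

/* Making a subset: keep students aged 13 or older */
data work.teens;
   set sashelp.class;
   where age >= 13;
run;

/* Numeric summary */
proc means data=work.teens n mean std min max;
   var height weight;
run;

/* Graphical summary */
proc sgplot data=work.teens;
   scatter x=height y=weight;
run;
```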

67 Basics While most people use SAS for processing complex collections of data, it can be used for simple math. The techniques that you use for simple math are also used to make complex changes to any size data sets. I hope this stuff will make your lives easier in statistics classes… SAS
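To illustrate that the technique used for simple math scales to whole data sets, here is a sketch, with hypothetical ages, that applies one assignment statement to every row:

```sas
/* Hypothetical ages in years; the third subject's age is missing */
data work.ages;
   input id ageYears;
   datalines;
1 34
2 51
3 .
;
run;

/* The same kind of assignment as 1 + 1, applied to every row of the data set */
data work.agesMonths;
   set work.ages;
   ageMonths = ageYears * 12;
run;

proc print data=work.agesMonths;
run;
```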

68 Using EG for Math

69

70 A data set is shown in the flowchart. Its contents are displayed in the programming windowpane. You can see it stored in the temporary “work library” by browsing the Server List. Make a temporary dataset to hold the answer.
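A sketch of storing the answer in a temporary dataset; the dataset name work.answer is hypothetical. WORK is the temporary library, so SAS deletes the dataset when the session ends.

```sas
/* One-row temporary data set holding the result of the calculation */
data work.answer;
   theAnswer = 1 + 1;
run;

/* Display the data set's contents */
proc print data=work.answer;
run;
```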

71 The Log tab gives you feedback on what SAS did.

72 No Need for a Data Set For a simple calculation you do not need to make a dataset to hold a single number. You can have the number show up in the Log window instead.
1. Give SAS a formula: 1+1
2. Tell it what to call the result: theAnswer
3. Print the result out: putlog theAnswer =
4. Tell it you are done giving it instructions: run;
Use short, meaningful names that do not include spaces, punctuation characters, or leading numbers.

73 Basic Math You put the instructions together by typing a program into the code window, like this:
   data _null_;
   theAnswer = 1 + 1;
   putlog theAnswer =;
   run;
Run it. The _null_ name tells SAS not to bother storing the results in a dataset.
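The same DATA _NULL_ pattern handles several calculations at once. This sketch, with hypothetical values, computes a z-score and its upper-tail probability using PROBNORM, the SAS standard normal cumulative distribution function; only the Log receives the results.

```sas
data _null_;
   x     = 130;   /* hypothetical observed score */
   mu    = 100;   /* hypothetical population mean */
   sigma = 15;    /* hypothetical population standard deviation */
   z = (x - mu) / sigma;          /* how many SDs above the mean */
   upperTail = 1 - probnorm(z);   /* P(Z > z) for a standard normal variable */
   putlog z= upperTail=;
run;
```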

74 In the Log: the count of how many lines have been submitted, and the answer.

75 Don’t panic… The help that ships with SAS is good. It is its own program, hidden in the documentation subfolder inside the SAS folder off the Windows Start button.

76 Search for functions and call routines by category

77 Click the Favorites tab.

78 Final Administrivia Please save a table for the people who are officially enrolled (or are taking the class for deferred credit). Bring a laptop with SAS if possible.
Grades (pass/fail only)
– Pass 4 of 4 homework assignments for 3 units
– Pass 3 of 4 homework assignments for 2 units

