HRP223 2008 Administrivia - General
- The course website has critical details: www.stanford.edu/class/hrp223/
- You will fail the course if you send me a document that contains a virus or other malicious code. There is no forgiveness for this offense and this is not open to debate.
Administrivia – Goals
- This course will provide practical solutions to problems that arise before doing analyses as well as the final push toward getting the results.
- I will talk about issues like security, finding unruly data, massaging data into a useful format, building datasets of valid data and choosing statistics.
Administrivia – Getting Help
- Lamiya Sheikh email@example.com is the TA for the firstname.lastname@example.org
- Our office hours will be announced weekly. I will be available for online Q&A at email@example.com or preferably, on the class newsgroup. I will answer questions every morning around firstname.lastname@example.org
- Things labeled “Assignment”, but not “Homework”, can be done with the help of classmates.
- You are strongly encouraged to discuss your problems up until you start writing your answers to the homework problems.
Administrivia – Real Data!
- There will be almost no toy data sets in this class.
  - Your solutions will work on huge datasets.
- You will use generic (ungrouped) data. The data will be very close to reality.
  - I will not invent any problems.
- Because most of the data is “live”, I will introduce small changes to the data to prevent you from beating the authors to press.
Administrivia – Which SAS?
- The homework problems will require you to work with SAS. I will be showing you SAS/Enterprise Guide which only runs on Windows.
Getting a Computer
- If you want to get a new computer, you can get one at a very good price through Stanford. You can get ideas on what is an acceptable computer here: www.stanford.edu/dept/itss/ess/adminapps/recommended.html
Free Stanford Tools
- You can get access to free software from Stanford by going here: www.stanford.edu/dept/itss/ess/
- You must use antivirus software to use a computer these days.
  - The Symantec/Norton Antivirus which has been used for years is going away at the end of the month. Upgrade now.
Virus and Worm Issues (3)
- Virus scan before you email me anything!
- Right click on the file you want to scan and then pick Scan with Sophos Anti-Virus….
Stanford Desktop Tools
- This allows you to install and update BigFix, Security Self-Help and Open AFS and other tools.
  - BigFix automatically checks for important software updates.
  - Security Self-Help checks and allows you to fix security weaknesses on your machine.
  - Open AFS lets you have access to your UNIX account like it is just another Windows hard drive.
AFS
- You have a website made for you already:
  - www.stanford.edu/~YOUR_SUNET_ID
- UNIX stuff
  - You can use Stanford Desktop Tools to mount your UNIX drive and get stuff on the web quickly with Open AFS: www.stanford.edu/services/afs/intro/index.html and www.stanford.edu/services/web/howto.leland.html
  - Do NOT put confidential/HIPAA sensitive stuff out there.
Passwords
- The Leland system places restrictions on passwords. You should set your passwords on other machines to be just as hard to crack. www.stanford.edu/services/unix/passwords.html
- You can use Stanford’s Security Self-Help Tool which comes with Stanford Desktop Tools to check your passwords.
- If you do not know how to set or change your password look here: www.stanford.edu/group/security/securecomputing/setpass.html
Security
- Every year I get viruses and worms sent to me unwittingly.
- Four years ago the department had half a dozen machines “hacked-into” by an unknown assailant, giving the person total control over the machines.
- Every day I get dozens or hundreds of hacker/cracker “probes” looking for weaknesses in my Windows XP machine’s security.
- Assume that somebody is always looking over your shoulder on the web and people are reading your email.
Security (2)
- The biggest weaknesses in computer security are the legal users of the system.
  - Walking away from a terminal
  - Using passwords that are easy to crack
  - Taking data off of restricted machines
  - Viruses and Trojan horses will kill you if you let them!
Microsoft’s Critical Mistakes
- Microsoft is notorious for producing programs with security problems. The latest operating systems have built-in tools to fix problems when Microsoft fixes them.
- With XP with SP 2 or SP 3, you can easily set your machine to update itself. You should run Windows Update and download and apply all critical security updates often.
Security - Email
- Email provides all the confidentiality of a postcard.
- Secure your email:
  - There are programs which will scramble your email while it is en route, effectively making it impossible for people to read it without your permission.
  - Ask your security professional for help.
Security – Unsolicited Email
- Spam™, Spam™, Spam™, wonderful Spam™, yes wonderful Spam™
- You may get unsolicited commercial solicitations, advertisements, chain letters, or pornography through your Stanford email account.
  - NEVER respond to these messages, never use the REMOVE provided in the email.
  - NEVER put your email address on a web page.
SPAM Filter and Malicious Emails
- You can tell the Stanford mail system to filter your mail and automatically remove things that are probably junk mail. Go to tools.stanford.edu to set your mail to be filtered. Definitely have it remove spam marked with SPAM: #####
- A fairly new attack is to embed database access code in the body of an email. When your virus scanner notices this it will treat your entire inbox as if it has a virus in it. This can be very bad if your virus scanner is set to delete all files with viruses.
Back up your work!
- Each year, on average, one student in five loses all their work. Plan on your computer being destroyed at the worst possible time this year.
  - Coffee, computer worm or virus, small child with refrigerator magnet, physical hard drive failure, theft, bicycle crash, etc.
- Every day back up your work to more than one location.
Where to Backup
- PLEASE use removable media if you have no network access:
  - Floppy disk, CD, DVD, flash media
- NEVER back up confidential data (HIPAA sensitive data) to mobile media without talking to security experts first.
How to Backup
- You will forget to back up your work. If you can, use a program to do the backup automatically.
- I use an inexpensive program called Second Copy 2000 by Centered Software. www.centered.com
- It copies all my work to the department’s server and even keeps the old version of my work.
- Talk to your security expert.
Other Tools I Use
- I keep a list of useful links here: www.stanford.edu/class/hrp223/2008/usefulLinks.html
UltraEdit
- If you work with text files, get UltraEdit and buy the perpetual license. www.ultraedit.com
UltraCompare
- To track changes in code or other text files: www.ultraedit.com/products/ultracompare.html
FileLocator Pro
- If you can’t find files on your machine, get FileLocator Pro. www.mythicsoft.com/default.aspx
MyInfo
- If you need to keep track of tons of random facts (like code snippets) get MyInfo. www.milenix.com
Data Management and Analysis
- Use the software which has handy support. SAS, Stata, SPSS and S-Plus (but not R) are fairly user-friendly. The strengths of each:
  - R is free.
    - Install the Rcmdr package, then type library(Rcmdr)
  - S-Plus is wonderful if you are going to invent statistics.
  - SAS is strong for major data manipulation and database processing.
  - Use SPSS if you want a clean graphical user interface (GUI) or if you are statistics-phobic.
SAS vs. S-Plus
- I believe that SAS is the de facto standard for biological, clinical, and medical research in the USA (as well as the rest of the world). R and S-Plus are very popular with the statisticians on campus.
- Virtually all pharmaceutical companies use SAS for analysis of clinical trial data for assessment of safety and efficacy of drugs.
- S-Plus’ strengths are in graphics and developing new statistics (and perhaps its object-oriented model). Its weakness is poor usability for non-programmers. However, it is making huge gains in usability.
- I find S-Plus relatively difficult to use for data management.
R/S-Plus
- If you would like to learn R or S-Plus, I strongly recommend that you go with S-Plus for Windows.
- Come talk to me for reference books.
Where can I get SAS?
- If you have $60 for the year: www.stanford.edu/services/softwarelic/sas/
- If you want to use the computer lab: med.stanford.edu/irt/classrooms/features/computer_labs.html
Which Parts to Install
- During the install it will ask you what components to install. DO NOT USE THE DEFAULT ACADEMIC INSTALL. It is bugged and will not give you stuff you need. Check on everything listed on the next slide:
Install these
- SAS/ACCESS Interface to DB2
- SAS/ACCESS Interface to MySQL
- SAS/ACCESS Interface to Netezza
- SAS/ACCESS Interface to ODBC
- SAS/ACCESS Interface to OLE DB
- SAS/ACCESS Interface to ORACLE
- SAS/ACCESS Interface to PC Files
- SAS/ACCESS Interface to SYBASE
- SAS/AF Software
- SAS/ASSIST Software
- SAS/CONNECT Software
- SAS/EIS Software
- SAS Bridge for ESRI
- SAS/ETS Software
- SAS/FSP Software
- SAS/GRAPH Software
- SAS/IML Software
- SAS/INSIGHT Software
- SAS/LAB Software
- SAS/OR Software
- SAS/QC Software
- SAS/SECURE
- SAS/SHARE
- SAS/STAT
- SAS/ACCESS
- Enterprise Miner Client Solution
- SAS/Genetics
- SAS Text Miner Client
- SAS Text Miner for Spanish
Updating SAS
- Make sure to patch your version of SAS with service patch 4 for SAS version 9.1.3 and all Alert status patches. ftp.sas.com/techsup/download/hotfix/hotfix.html
- Also patch Enterprise Guide 4.1. ftp.sas.com/techsup/download/hotfix/ent_guide41.html#018313
- Sign up for email notification of new patches (called TSNEWS-L): support.sas.com/documentation/periodicals/index.html#ts
Modern SAS
- Enterprise Guide organizes work into projects and uses a flowchart analogy to show what is done.
- Enterprise Guide builds code for you and it is very good for building analyses, but data management is still best done with some code.
Things You Do With SAS
- Use it as an overpriced calculator … or
- Get data into the system.
- Get to know your data.
- Find subsets of your data.
- Perform analyses.
- Visualize your results.
- Share the information.
All these things can be done by typing code or using point-and-click tools. Some things are best done with code.
Basics
- While most people use SAS for processing complex collections of data, it can be used for simple math. The techniques that you use for simple math are also used to make complex changes to any size data sets.
Basic Math
To do a simple calculation you do the following:
1. Give SAS a formula: 1+1
2. Tell it what to call the results: theAnswer
3. Print the results out: putlog _____
4. Tell it you are done giving it instructions.
5. Tell it to carry out the instructions.
- Tell it to create a code object in the flowchart.
Basic Math
- You put the instructions together by typing a program into the code window, like this:
    data _null_;
      theAnswer = 1 + 1;
      putlog theAnswer;
    run;
- Run it. Don’t bother to store the results.
Basic Math
- The log window shows you SAS’s thoughts about your code.
- When you make a mistake in your code, the line numbers can point you toward the answer.
[Screenshot of the log window. Callouts: “The count of how many lines have been submitted” and “The Answer”.]
Basic Math
- If you want to save the results into a table that looks like a spreadsheet, provide the name of the dataset on the line that has the key word data, like this:
    data someData;
      theAnswer = 1 + 1;
    run;
- This saves the results in a dataset called someData.
Viewing the Table
- You see the content of a table displayed automatically when it is created or you can double click on it.
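If you would rather list the table with code than by double clicking, one option is PROC PRINT, which is not covered on these slides. A minimal sketch that rebuilds the someData table and then prints it:

```sas
/* Rebuild the one-row table so this example runs on its own. */
data someData;
  theAnswer = 1 + 1;
run;

/* List every row of someData in the results. */
proc print data=someData;
run;
```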
Datasets
- Since the introduction of SAS 7, dataset names can be from 1 to 32 characters. Prior to SAS 7, they had to be 8 characters or fewer.
- The names must begin with a letter or an underscore ( _ ). After the first character they can contain letters, numbers or underscores.
- Capitalization does not matter to SAS but mixed case can make your names easier to read.
- Make your dataset names meaningful.
  - “Demographics” or “demo” are much better names than “d”.
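The naming rules can be tried out directly. The sketch below uses made-up dataset and variable names (patientDemographics, _visit2_labs, subjectCount, labCount) purely for illustration:

```sas
/* Valid: starts with a letter, mixed case for readability. */
data patientDemographics;
  subjectCount = 0;
run;

/* Valid: starts with an underscore and contains digits. */
data _visit2_labs;
  labCount = 0;
run;

/* Capitalization is ignored, so this replaces patientDemographics. */
data PATIENTDEMOGRAPHICS;
  subjectCount = 1;
run;

/* Not valid (left inside a comment so the program still runs):
   data 2008visits;    a name cannot start with a digit
*/
```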
Basic Math
- The note in the log which appears after you push the run button tells you that SAS successfully created a new dataset. While I specified the name of the data set as “someData”, SAS uses “work.someData”. This work ‘library’ is just shorthand notation referring to a folder on your hard drive that is emptied and deleted every time you quit SAS. So, the dataset “someData” is stored in the work folder and it will be destroyed when you quit.
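A short sketch of what the work library means in practice. The first two steps write to the same temporary table. The last part goes beyond the slide: it uses a LIBNAME statement to point a library at a permanent folder, and both the library name mylib and the folder C:\hrp223\data are hypothetical, so substitute a folder that exists on your machine.

```sas
/* These two steps create the same temporary dataset. */
data someData;
  theAnswer = 1 + 1;
run;

data work.someData;
  theAnswer = 1 + 1;
run;

/* ASSUMPTION: C:\hrp223\data is a made-up folder that already exists,
   and mylib is an arbitrary library name (8 characters or fewer). */
libname mylib "C:\hrp223\data";

/* This copy is stored in that folder and survives after SAS closes. */
data mylib.someData;
  theAnswer = 1 + 1;
run;
```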
Notes on Notes
- The note says that the dataset has 1 observation. That means that the table has just one row. The 1 variable statement means that the table has only one column. SAS datasets can contain millions of observations and can contain tens of thousands of variables.
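To see the counts in the note change, you can build a table with more than one row and column. This sketch uses a DO loop and an OUTPUT statement, neither of which has been introduced yet, and the names squares, x and square are made up. After it runs, the log note should report 5 observations and 2 variables.

```sas
data squares;
  do x = 1 to 5;
    square = x ** 2;
    output;   /* write the current values of x and square as a new row */
  end;
run;
```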
Not So Basic Math
- You can use the same trickery to do more complex math:
    data mathStuff;
      x = 24;
      square = x ** 2;
      /* multiplication always needs an explicit *, so 6x is written 6*x */
      poly = (x**3/3) - (x**2/2) - 6*x - 4;
      putlog square= poly=;
    run;
A Calculator with Functions
- If you remember your calculator from when you learned trigonometry, you will recall that it had function buttons to do things like calculate a cosine or sine. SAS has those functions and hundreds more. You tell SAS to do a function by typing the code word for the function you want done, followed by some details in parentheses.
Function Example
    data _null_;
      someTrigThing = sin(1);
      putlog someTrigThing;
    run;
Function Example (2)
- You can use variables with functions like this:
    data _null_;
      numberOne = 1;
      someTrigThing = sin(numberOne);
      putlog someTrigThing;
    run;
Functions
- I will introduce you to dozens of functions later. The important thing to remember is that they all work the same way. You type the function name with “arguments” in parentheses. You probably will never need trig functions, but other functions are extremely useful when you are taking statistics classes. Rather than looking up “density functions” in tables, you can get SAS to give you the values.
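As an illustration of replacing a printed table with a function call, the sketch below asks SAS for a few standard normal values. The choice of PROBNORM, PROBIT and PDF is mine rather than the slides', and the variable names are made up.

```sas
data _null_;
  /* Area under the standard normal curve to the left of 1.96 (about 0.975). */
  areaBelow = probnorm(1.96);
  /* The z-value that has 97.5% of the curve below it (about 1.96). */
  zCutoff = probit(0.975);
  /* Height of the standard normal density at 0. */
  height = pdf('NORMAL', 0);
  putlog areaBelow= zCutoff= height=;
run;
```

Each call follows the same pattern as sin(1): the function name, then its arguments in parentheses.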
How can you find a function?
- Say you need a function to compute some crazy thing like factorial.
  - 5! = 5*4*3*2*1
- You can write the math yourself.
    data _null_;
      fiveFactorial = 5*4*3*2*1;
      putlog fiveFactorial=;
    run;
How can you find a function? (2)
- Or you can look up the function in the SAS online documentation.
- OnLineDoc is: support.sas.com/onlinedoc/913/docMainpage.jsp
How can you find a function? (3)
- Once you are at the documentation, you can search using keywords.
  - Pick search for words then select all documentation.
- In this case I looked for “factorial function” and one of the results was this:
- Functions and CALL Routines : FACT Function
- That link gave me the syntax for factorial: Fact(n)
- So my code can be simplified to
    data _null_;
      fiveFactorial = fact(5);
      putlog fiveFactorial=;
    run;
- The link also gave me several other related functions that math people seem to obsess over like comb and perm…. You can read what those are at your leisure.
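For the curious, here is a small sketch of the two related functions mentioned above. COMB(n, r) counts the ways to choose r items from n when order does not matter, and PERM(n, r) counts them when order does matter. The variable names are made up.

```sas
data _null_;
  fiveChooseTwo = comb(5, 2);   /* unordered pairs from 5 items: 10 */
  fivePermTwo = perm(5, 2);     /* ordered pairs from 5 items: 20 */
  putlog fiveChooseTwo= fivePermTwo=;
run;
```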
SAS Programming
- As you will discover, SAS programming involves mastering only five things:
  - Descriptive comments
    - notes to you and other programmers
  - SAS options
    - where your data is and how pages are formatted
  - Data steps
    - manipulate data
  - Procedures
    - summarize data
  - Macro commands
    - automate repetitive tasks
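To make the five pieces concrete, here is a rough sketch of one tiny program that touches each of them. The dataset name squares, the macro name printIt and the particular options and procedures are my own choices for illustration, not something prescribed by the course.

```sas
* 1. Descriptive comment: build a small table, summarize it, then list it;

/* 2. SAS options: leave the date and page number off printed pages. */
options nodate nonumber;

/* 3. Data step: create five rows holding x and its square. */
data work.squares;
  do x = 1 to 5;
    square = x ** 2;
    output;
  end;
run;

/* 4. Procedure: summarize the square column (mean, min, max and so on). */
proc means data=work.squares;
  var square;
run;

/* 5. Macro: wrap a repetitive task (listing a dataset) so it can be reused. */
%macro printIt(dsn);
  proc print data=&dsn;
  run;
%mend printIt;

%printIt(work.squares)
```

The macro is overkill for a single table, but the same call could be repeated for any other dataset name.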