Jon K. Peck Technical Advisor SPSS Inc. May, 2006


1 Programmability in SPSS 14: A Radical Increase in Power
A Platform for Statistical Applications
Jon K. Peck, Technical Advisor, SPSS Inc. peck@spss.com May, 2006
The plan is as follows: We will first look at the new set of five power features, focusing mainly on programmability. We will look at some code in the demos, but the focus today is on what you can do, with a bit of the how-to. Now let's look at the five power features.
Copyright (c) SPSS Inc, 2006

2 The Five Big Things
1. External Programming Language (BEGIN PROGRAM)
2. Multiple Datasets
3. XML Workspace and OMS Enhancements
4. Dataset and Variable Attributes
5. Drive SPSS Processor Externally
GPL provides new programming power for graphics.
Working together, they dramatically increase the power of SPSS. SPSS becomes a platform that enables you to build statistical/data manipulation applications. The external programming language is the focus of this talk, but it interacts closely with the other features, not to mention GPL for graphics. First a brief discussion of #2 through #5, then we will go into #1.

3 Multiple Datasets
Many datasets open at once
One is active at a time (set by syntax or UI)
  DATASET ACTIVATE command
Each dataset has a Data Editor window
Copy, paste, and merge between windows
Write tabular results to a dataset using Output Management System
  Retrieve via Programmability
No longer necessary to organize jobs linearly
There is no need to open a dataset, then save and close it, in order to open another. MATCH FILES and ADD FILES can work with the active data and other datasets. This is especially useful when merging non-SPSS datasets.

4 XML Workspace
Store dictionary and selected results in workspace
Write results to workspace as XML with Output Management System (OMS)
Retrieve selected contents from workspace via external programming language
Persists for entire session
Communication point
OMS can also create a new dataset that can be retrieved in the program as case data. Use XPath to select parts of objects, or retrieve the entire XML tree and parse it in the external programming language. We expect to do more with the workspace in the future.
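The XPath idea in the notes above can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration under loud assumptions: the XML fragment, its element and attribute names, and the numbers in it are all invented for the example, and real OMS output follows its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment with made-up values; not the real OMS schema.
doc = """
<table name="Coefficients">
  <row label="Weight"><cell col="B" number="-0.002"/></row>
  <row label="Horsepower"><cell col="B" number="-0.083"/></row>
</table>
"""

root = ET.fromstring(doc)

# An XPath-style expression picks one cell out of the tree
# instead of walking every row by hand.
cell = root.find("./row[@label='Horsepower']/cell[@col='B']")
horse_coef = float(cell.get("number"))
```

The same selection could be done by retrieving the whole tree and looping over rows; the path expression simply states the row and column wanted in one place.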

5 OMS Output: XML or Dataset
Write tabular results to Datasets with OMS
  Main dataset remains active
  Prior to SPSS 14, write to SAV file, close active, and open to use results
Tables can be accessed via workspace or as datasets
XML workspace and XPath accessors are very general
  Accessed via programmability functions
Dataset output more familiar to SPSS users
  Accessed via programmability functions or traditional SPSS syntax
  Use with DATASET ACTIVATE command

6 Attributes
Extended metadata for files and variables
VARIABLE ATTRIBUTE, DATAFILE ATTRIBUTE
Keep facts and notes about data permanently with the data
  E.g., validation rules, source, usage, question text, formula
Two kinds: user defined and SPSS defined
Saved with the data in the SAV file
Can be used in program logic
SPSS has metadata already: variable labels, value labels, missing values, etc. Now users can create their own properties/attributes.

7 Programmability
Integrates external programming language into SPSS syntax
  BEGIN PROGRAM … END PROGRAM
  Set of functions to communicate with SPSS
SPSS has integrated the Python language
SDK enabling other languages available
  New: VB.NET available soon
External processes can drive SPSS Processor
  VB.NET works only in this mode
SPSS Developer Central has SDK, Python Integration Plug-In, and many extension modules
Available for all SPSS 14 platforms
This is something radical. These programs are different from input programs or transformation programs. SPSS has integrated Python and, using the SDK, you can integrate other languages such as .NET if you want to use that. Now let's talk about the Python language a little.

8 The Python Language
Free, portable, elegant, object oriented, versatile, widely supported, easy to learn,…
Download from Python.org
  Version or later required
Python tutorial
Python user discussion list
The Cheeseshop: Third-party modules
Python is not on the SPSS 14 CD or on Developer Central. Get it from the Python web site. SPSS does not control the Python language or its future development.

9 Legal Notice
SPSS is not the owner or licensor of the Python software. Any user of Python must agree to the terms of the Python license agreement located on the Python web site. SPSS is not making any statement about the quality of the Python program. SPSS fully disclaims all liability associated with your use of the Python program.
By opening SPSS to third-party languages, we can take advantage of progress on those and do more, faster. We are not limited by our own resources. Now what can you do with Programmability?

10 Programmability Enables…
Generalized jobs by controlling logic based on
  Variable Dictionary
  Procedure output (XML or datasets)
  Case data (requires SPSS )
  Environment
Enhanced data management
Manipulation of output
Computations not built in to SPSS
Use of intelligent Python IDE driving SPSS (14.0.1)
  Statement completion, syntax checking, and debugging
External control of SPSS Processor

11 Programmability Makes Obsolete…
SPSS Macro
  except as a shorthand for lists or constants
  Learning Python is much easier than learning Macro
SaxBasic
  except for autoscripts
  but autoscripts become less important
These have not gone away. The SPSS transformation language continues to be important.

12 Demonstration
Code and supporting modules can be downloaded from SPSS Developer Central
Examples are on the CD

13 Initialization for Examples
* SPSS Directions, May
* In preparation for the examples, specify where SPSS standard data files reside.
BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
END PROGRAM.

This program creates a file handle pointing to the SPSS installation directory, where the sample files are installed.

14 Example 0: Hello, world
* EXAMPLE 0: My first program.
BEGIN PROGRAM.
import spss
print "Hello, world!"
END PROGRAM.

Inside BEGIN PROGRAM, you write Python code.
import spss connects the program to SPSS.
The import is needed once per session.
Output goes to Viewer log items.
The program is executed when END PROGRAM is reached.

15 Example 1: Run SPSS Command
* Run an SPSS command from a program; create file handle.
BEGIN PROGRAM.
import spss, spssaux
spss.Submit("SHOW ALL.")
spssaux.GetSPSSInstallDir("SPSSDIR")
END PROGRAM.

Submit, in module spss, is called to run one or more SPSS commands within BEGIN PROGRAM.
It is one of many functions (APIs) that interact with SPSS.
GetSPSSInstallDir, in the spssaux module, creates a FILE HANDLE to that directory.

16 Example 2: Some APIs
* Print useful information in the Viewer and then get help on an API.
BEGIN PROGRAM.
import spss
spss.Submit("GET FILE='SPSSDIR/employee data.sav'.")
varcount = spss.GetVariableCount()
casecount = spss.GetCaseCount()
print "The number of variables is " + str(varcount) + " and the number of cases is " + str(casecount)
print help(spss.GetVariableCount)
END PROGRAM.

There are APIs in the spss module to get variable dictionary information.
The help function prints short API documentation in the Viewer.

17 Example 3a: Data-Directed Analysis
* Summarize variables according to measurement level.
BEGIN PROGRAM.
import spss, spssaux
spssaux.OpenDataFile("SPSSDIR/employee data.sav")
# make variable dictionaries by measurement level
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
scaleVars = spssaux.VariableDict(variableLevel=['scale'])
print "Categorical Variables\n"
for var in catVars:
    # print the label itself, not the literal text "var.VariableLabel"
    print var, var.VariableName, "\t", var.VariableLabel

(Continued)

18 Example 3a (continued)
# summarize variables based on measurement level
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
if scaleVars:
    spss.Submit("DESC " + " ".join(scaleVars.variables))
# create a macro listing scale variables
spss.SetMacroValue("!scaleVars", " ".join(scaleVars.variables))
END PROGRAM.
DESC !scaleVars.

Note: " ".join(['x', 'y', 'z']) produces 'x y z'
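The pattern in Example 3a, choosing a procedure from each variable's measurement level and joining the names into a command, can be tried without SPSS. The following is a minimal sketch, not the spssaux implementation: the `levels` dictionary and the `build_commands` helper are hypothetical stand-ins for what spssaux.VariableDict supplies, and in a real job the resulting strings would be passed to spss.Submit.

```python
# Hypothetical stand-in for the variable dictionary: name -> measurement level.
levels = {
    "gender": "nominal",
    "educ": "ordinal",
    "salary": "scale",
    "jobtime": "scale",
}


def build_commands(levels):
    """Return SPSS command strings chosen by measurement level."""
    cat_vars = sorted(v for v, lvl in levels.items()
                      if lvl in ("nominal", "ordinal"))
    scale_vars = sorted(v for v, lvl in levels.items() if lvl == "scale")
    commands = []
    if cat_vars:      # an empty list is false, so no empty FREQ is generated
        commands.append("FREQ " + " ".join(cat_vars) + ".")
    if scale_vars:
        commands.append("DESC " + " ".join(scale_vars) + ".")
    return commands


commands = build_commands(levels)
```

Because the commands are derived from the dictionary, the same job works unchanged on a file with different variables.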

19 Example 5: Handling Errors
* Handle an error. Use another standard Python module.
BEGIN PROGRAM.
import spss, sys
try:
    spss.Submit("foo.")
except:
    print "That command did not work! ", sys.exc_info()[0]
END PROGRAM.

Errors generate exceptions.
This makes it easy to check whether a long syntax job worked.
Hundreds of standard modules, and many others, are available from SPSS and third parties.
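To show how exceptions make it easy to check whether a long job worked, here is a small sketch that runs without SPSS. The `submit` function is a hypothetical stand-in for spss.Submit that rejects anything other than a SHOW command, and `run_job` is an illustrative helper, not part of any SPSS module.

```python
import sys


def submit(command):
    """Hypothetical stand-in for spss.Submit: accept only SHOW commands."""
    if not command.upper().startswith("SHOW"):
        raise ValueError("unrecognized command: " + command)


def run_job(commands):
    """Run commands in order; return (number completed, error text or None)."""
    done = 0
    for command in commands:
        try:
            submit(command)
        except Exception:
            # sys.exc_info() returns (type, value, traceback)
            # for the exception currently being handled.
            exc_type, exc_value = sys.exc_info()[:2]
            return done, "%s: %s" % (exc_type.__name__, exc_value)
        done += 1
    return done, None


result = run_job(["SHOW ALL.", "foo.", "SHOW N."])
```

The job stops at the first failing command and reports how far it got, which is the check that is awkward to make in plain syntax.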

20 Example 8: Create Basis Variables
* Create set of dummy variables for a categorical variable and a macro name for them.
BEGIN PROGRAM.
import spss, spssaux, spssaux2
mydict = spssaux.VariableDict()
# index the dictionary by variable name
spssaux2.CreateBasisVariables(mydict["educ"], "EducDummy",
    macroname = "!EducBasis")
spss.Submit("REGRESSION /STATISTICS=COEF /DEP=salary" +
    "/ENTER=jobtime prevexp !EducBasis.")
END PROGRAM.

Uses a dictionary object indexed by variable name.
The first category is omitted.
Discovers educ values from the data and generates appropriate transformation commands.
Creates the macro !EducBasis.
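The idea behind the basis variables, one indicator per category with the first category omitted, can be sketched as follows. This is an assumption-laden illustration: `basis_commands` and its naming scheme (prefix plus category value) are hypothetical, and the real spssaux2.CreateBasisVariables may name variables and generate transformations differently.

```python
def basis_commands(varname, values, prefix):
    """Return (COMPUTE commands, new variable names) for dummy coding.

    The lowest value is treated as the omitted reference category.
    """
    cats = sorted(set(values))[1:]          # drop the first category
    names = ["%s_%s" % (prefix, c) for c in cats]
    commands = ["COMPUTE %s = (%s = %s)." % (n, varname, c)
                for n, c in zip(names, cats)]
    return commands, names


# Illustrative education values, as they might be discovered from the data.
commands, names = basis_commands("educ", [12, 8, 16, 12, 8], "EducDummy")

# What a macro such as !EducBasis would expand to in later syntax.
macro_value = " ".join(names)
```

Omitting one category avoids perfect collinearity with the regression constant, which is why the first category is dropped.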

21 Example 9: Merge Directory Contents
* Automatically add cases from all SAV files in a directory.
BEGIN PROGRAM.
import spss, glob
savlist = glob.glob("c:/temp/parts/*.sav")
if savlist:
    cmd = ["ADD FILES "] + ["/FILE='" + fn + "'" for fn in savlist] + [".", "EXECUTE."]
    spss.Submit(cmd)
    print "Files merged:\n", "\n".join(savlist)
else:
    print "No files found to merge"
END PROGRAM.

The glob module resolves file-system wildcards.
"if savlist" tests whether there are any matching files.
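The command-building part of Example 9 can be exercised on its own. The sketch below assumes nothing about SPSS: it creates a throwaway directory with empty placeholder files (the file names are invented), resolves the wildcard with glob, and builds the same list of command lines that the example passes to spss.Submit. The `add_files_command` helper is hypothetical.

```python
import glob
import os
import tempfile


def add_files_command(paths):
    """Return the ADD FILES command lines for the given SAV files."""
    return (["ADD FILES "]
            + ["/FILE='" + fn + "'" for fn in paths]
            + [".", "EXECUTE."])


# Throwaway directory with two placeholder SAV files and one unrelated file.
workdir = tempfile.mkdtemp()
for name in ("part1.sav", "part2.sav", "notes.txt"):
    open(os.path.join(workdir, name), "w").close()

# glob returns matches in arbitrary order, so sort for a repeatable command.
savlist = sorted(glob.glob(os.path.join(workdir, "*.sav")))
cmd = add_files_command(savlist) if savlist else []
```

Only the files matching the wildcard end up in the command, and an empty match list produces no command at all.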

22 Example 10: Use Parts of Output - XML
* Run regression; get selected statistics, but do not display the regular Regression output. Use OMS and XPath wrapper functions.
BEGIN PROGRAM.
import spss, spssaux
spssaux.OpenDataFile("SPSSDIR/CARS.SAV")
try:
    handle, failcode = spssaux.CreateXMLOutput(\
        "REGRESSION /DEPENDENT accel /METHOD=ENTER weight horse year.",
        visible=False)
    horseCoef = spssaux.GetValuesFromXMLWorkspace(\
        handle, "Coefficients", rowCategory="Horsepower",
        colCategory="B", cellAttrib="number")
    print "The effect of horsepower on acceleration is: ", horseCoef
    Rsq = spssaux.GetValuesFromXMLWorkspace(\
        handle, "Model Summary", colCategory="R Square", cellAttrib="text")
    print "The R square is: ", Rsq
    spss.DeleteXPathHandle(handle)
except:
    print "*** Regression command failed. No results available."
    raise
END PROGRAM.

The example assumes that SPSS Output Labels are set to "Labels", not "Names".
A later example will show this using datasets instead of XML.

23 Example 11: Transformations in Python Syntax
BEGIN PROGRAM.
import spss, spssaux, Transform
spssaux.OpenDataFile('SPSSDIR/employee data.sav')
newvar = Transform.Compute(varname="average_increase",
    varlabel="Salary increase per month of experience if at least a year",\
    varmeaslvl="Scale",\
    varmissval=[999,998,997],\
    varformat="F8.4")
newvar.expression = "(salary-salbegin)/jobtime"
newvar.condition = "jobtime > 12"
newvar.retransformable=True
newvar.generate() # Get exception if compute fails
Transform.timestamp("average_increase")
spss.Submit("DISPLAY DICT /VAR=average_increase.")
spss.Submit("DESC average_increase.")
END PROGRAM.

24 Example 11A: Repeat Transform
BEGIN PROGRAM.
import spss, Transform
try:
    Transform.retransform("average_increase")
    Transform.timestamp("average_increase")
except:
    print "Could not update average_increase."
else:
    spss.Submit("display dictionary"+\
        "/variable=average_increase.")
END PROGRAM.

The transformation is saved using attributes.
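A variable that "remembers its formula" can be pictured as metadata from which the transformation syntax is rebuilt on demand. The sketch below illustrates only that idea: the `stored` dictionary, its attribute names, and the `compute_syntax` helper are hypothetical, and the actual Transform module may store and replay transformations differently.

```python
def compute_syntax(varname, attributes):
    """Rebuild transformation syntax from attributes stored with a variable."""
    expression = attributes["expression"]
    condition = attributes.get("condition")
    if condition:
        # Conditional assignment: only cases meeting the condition are computed.
        return "IF (%s) %s = %s." % (condition, varname, expression)
    return "COMPUTE %s = %s." % (varname, expression)


# Hypothetical attribute store: variable name -> attributes saved with the data.
stored = {
    "average_increase": {
        "expression": "(salary-salbegin)/jobtime",
        "condition": "jobtime > 12",
    }
}

syntax = compute_syntax("average_increase", stored["average_increase"])
```

Because the formula travels with the data file, the variable can be recomputed after the underlying values change without anyone having to find the original job.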

25 Example 12: Controlling the Viewer Using Automation
BEGIN PROGRAM.
import spss, viewer
spss.Submit("DESCRIPTIVES ALL")
spssapp = viewer.spssapp()
outfile = "c:/temp/myoutput.spo"
try:
    actualName = spssapp.SaveDesignatedOutput(outfile)
except:
    # actualName is not assigned if the save fails, so report the requested name
    print "Save failed. Name:", outfile
else:
    spssapp.ExportDesignatedOutput(\
        "c:/temp/myoutput.doc", format="Word")
    spssapp.CloseDesignatedOutput()
END PROGRAM.

Uses OLE automation methods.
Client only, local mode.

26 Example 13: A New Procedure Poisson Regression
BEGIN PROGRAM.
import spss, spssaux
from poisson_regression import *
spssaux.OpenDataFile(\
    'SPSSDIR/Tutorial/Sample_Files/autoaccidents.sav')
poisson_regression("accident", covariates=["age"], factors=["gender"])
END PROGRAM.

The Poisson regression module is built from the SPSS CNLR and transformation commands.
Programs can get case data and use other Python modules or code on it.

27 Example 14: Using Case Data
* Mean salary by education level.
BEGIN PROGRAM.
import spssdata
data = spssdata.Spssdata(indexes=('salary', 'educ'))
Counts = {}; Salaries = {}
for case in data:
    cat = int(case.educ)
    Counts[cat] = Counts.get(cat, 0) + 1
    Salaries[cat] = Salaries.get(cat, 0) + case.salary
print "educ mean salary\n"
for cat in sorted(Counts):
    print " %2d $%6.0f" % (cat, Salaries[cat]/Counts[cat])
del data
END PROGRAM.
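The accumulation in Example 14 is a one-pass group mean built from two dictionaries. A standalone sketch of the same technique follows; the `cases` list holds made-up (salary, educ) pairs standing in for rows that spssdata would supply, and `mean_by_group` is a hypothetical helper.

```python
def mean_by_group(cases):
    """Return {group: mean value} from (value, group) pairs in one pass."""
    counts = {}
    totals = {}
    for value, group in cases:
        # dict.get supplies the starting value the first time a group is seen
        counts[group] = counts.get(group, 0) + 1
        totals[group] = totals.get(group, 0.0) + value
    return dict((g, totals[g] / counts[g]) for g in counts)


# Made-up (salary, educ) pairs standing in for case data.
cases = [(30000.0, 12), (40000.0, 12), (57000.0, 16), (21000.0, 8)]
means = mean_by_group(cases)
```

Only running counts and totals are kept, so the data are read once and never held in memory as a whole.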

28 Example 14a: Output As a Pivot Table
BEGIN PROGRAM.
import viewer
# <accumulate Counts and Salaries as in Example 14>
desViewer = viewer.spssapp().GetDesignatedOutput()
rowcats = []; cells = []
for cat in sorted(Counts):
    rowcats.append(int(cat))
    cells.append(Salaries[cat]/Counts[cat])
ptable = viewer.PivotTable("a Python table",
    tabletitle="Effect of Education on Salary",
    caption="Data from employee data.sav",
    rowdim="Years of Education",
    rowlabels=rowcats,
    collabels=["Mean Salary"],
    cells = cells,
    tablelook="c:/data/goodlook.tlo")
ptable.insert(desViewer)
END PROGRAM.

29 Exploring OMS Dataset Output
get file='c:/spss14/cars.sav'.
DATASET NAME maindata.
DATASET DECLARE regcoef.
DATASET DECLARE regfit.
OMS /IF SUBTYPE=["coefficients"]
  /DESTINATION FORMAT = sav OUTFILE=regcoef.
OMS /IF SUBTYPE=["Model Summary"]
  /DESTINATION FORMAT = sav OUTFILE=regfit.
REGRESSION /DEPENDENT accel
  /METHOD=ENTER weight horse year.
OMSEND.

Use OMS directly to figure out what to retrieve programmatically.

30 Example 10a: Use Bits of Output - Datasets
BEGIN PROGRAM.
import spss, spssaux, spssdata
try:
    coefhandle, rsqhandle, failcode = spssaux.CreateDatasetOutput(\
        "REGRESSION /DEPENDENT accel /METHOD=ENTER weight horse year.",
        subtype=["coefficients", "Model Summary"])
    cursor = spssdata.Spssdata(indexes=["Var2", "B"], dataset=coefhandle)
    for case in cursor:
        if case.Var2.startswith("Horsepower"):
            print "The effect of horsepower on acceleration is: ", case.B
    cursor.close()

The example assumes that SPSS Output Labels are set to "Labels", not "Names".

31 Example 10a: Use Bits of Output – Datasets (continued)
    cursor = spssdata.Spssdata(indexes=["RSquare"], dataset=rsqhandle)
    row = cursor.fetchone()
    print "The R Squared is: ", row.RSquare
    cursor.close()
except:
    print "*** Regression command failed. No results available."
    raise
spssdata.Dataset("maindata").activate()
spssdata.Dataset(coefhandle).close()
spssdata.Dataset(rsqhandle).close()
END PROGRAM.

32 What We Saw
Variable Dictionary access
Procedures selected based on variable properties
Actions based on environment
Automatic construction of transformations
Error handling
Variables that remember their formulas
Management of the SPSS Viewer
New statistical procedure
Access to case data
We used programmability to easily solve problems that were difficult to handle in earlier versions. I asked at the beginning about challenges that have been hard to solve with SPSS. I hope that you have seen the glimmer of some solutions with SPSS 14. Now that you are excited, how do you get started?

33 Externally Controlling SPSS
SPSS Processor (backend) can be embedded and controlled by Python or other processes
Build applications using SPSS functionality invisibly
  Application supplies user interface
  No SPSS Viewer
Allows use of Python IDE to build programs
  PythonWin or many others

34 PythonWin IDE Controlling SPSS
The PythonWin IDE is available from
There are many other choices for a Python IDE.

35 What Are the Programmability Benefits?
Extend SPSS functionality
Write more general and flexible jobs
  Handle errors
  React to results and metadata
Implement new features
Write simpler, clearer, more efficient code
Greater productivity
Automate repetitive tasks
Build SPSS functionality into other applications

36 Getting Started
SPSS 14 ( for data access and IDE)
Python (visit Python.org)
  Installation
  Tutorial
  Many other resources
SPSS® Programming and Data Management, 3rd Edition: A Guide for SPSS® and SAS® Users (new)
SPSS Developer Central
  Python Plug-In ( version covers )
  Example modules
Dive Into Python (diveintopython.org), book or PDF
Practical Python by Magnus Lie Hetland
Python Cookbook, 2nd ed., by Martelli, Ravenscroft, & Ascher
Python and Plug-In on the CD in SPSS 15

37 Recap
Five power features of SPSS 14
Examples of programmability using Python
How to get started: materials and resources

38 Questions?

39 In Closing
Working together, these new features give you a dramatically more powerful SPSS. SPSS becomes a platform that enables you to build your own statistical applications.
Programmability
Multiple datasets
XML Workspace and OMS enhancements
Attributes
External driver application

40 Contact Jon Peck can now be reached at: peck@us.ibm.com

