1 Programmability in SPSS 14: A Radical Increase in Power
A Platform for Statistical Applications
Jon K. Peck, Technical Advisor, SPSS Inc.
May, 2006
Copyright (c) SPSS Inc, 2006

2 The Five Big Things
1. External Programming Language (BEGIN PROGRAM)
2. Multiple Datasets
3. XML Workspace and OMS Enhancements
4. Dataset and Variable Attributes
5. Drive SPSS Processor Externally
Working together, they dramatically increase the power of SPSS. SPSS becomes a platform that enables you to build statistical/data manipulation applications.
GPL provides new programming power for graphics.

3 Multiple Datasets
- Many datasets open at once
- One is active at a time (set by syntax or UI)
- DATASET ACTIVATE command
- Each dataset has a Data Editor window
- Copy, paste, and merge between windows
- Write tabular results to a dataset using Output Management System
- Retrieve via Programmability
- No longer necessary to organize jobs linearly

4 XML Workspace
- Store dictionary and selected results in workspace
- Write results to workspace as XML with Output Management System (OMS)
- Retrieve selected contents from workspace via external programming language
- Persists for entire session

5 OMS Output: XML or Dataset
- Write tabular results to datasets with OMS
  - Main dataset remains active
  - Prior to SPSS 14: write to SAV file, close active, and open to use results
- Tables can be accessed via workspace or as datasets
- XML workspace and XPath accessors are very general
  - Accessed via programmability functions
- Dataset output more familiar to SPSS users
  - Accessed via programmability functions or traditional SPSS syntax
  - Use with DATASET ACTIVATE command

6 Attributes
- Extended metadata for files and variables
- VARIABLE ATTRIBUTE, DATAFILE ATTRIBUTE
- Keep facts and notes about data permanently with the data, e.g., validation rules, source, usage, question text, formula
- Two kinds: user defined and SPSS defined
- Saved with the data in the SAV file
- Can be used in program logic

7 Programmability
- Integrates external programming language into SPSS syntax
  - BEGIN PROGRAM … END PROGRAM
  - Set of functions to communicate with SPSS
- SPSS has integrated the Python language
- SDK enabling other languages available
  - New: VB.NET available soon
- External processes can drive SPSS Processor
  - VB.NET works only in this mode
- SPSS Developer Central has SDK, Python Integration Plug-In, and many extension modules
- Available for all SPSS 14 platforms

8 The Python Language
- Free, portable, elegant, object oriented, versatile, widely supported, easy to learn, …
- Download from …
- Version … or later required
- Python tutorial
- Python user discussion list
- The Cheeseshop: third-party modules

9 Legal Notice
SPSS is not the owner or licensor of the Python software. Any user of Python must agree to the terms of the Python license agreement located on the Python web site. SPSS is not making any statement about the quality of the Python program. SPSS fully disclaims all liability associated with your use of the Python program.

10 Programmability Enables…
- Generalized jobs by controlling logic based on
  - Variable Dictionary
  - Procedure output (XML or datasets)
  - Case data (requires SPSS …)
  - Environment
- Enhanced data management
- Manipulation of output
- Computations not built in to SPSS
- Use of intelligent Python IDE driving SPSS (14.0.1)
  - Statement completion, syntax checking, and debugging
- External control of SPSS Processor

11 Programmability Makes Obsolete…
- SPSS Macro, except as a shorthand for lists or constants
  - Learning Python is much easier than learning Macro
- SaxBasic, except for autoscripts
  - But autoscripts become less important
- These have not gone away.
- The SPSS transformation language continues to be important.

12 Demonstration
- Code and supporting modules can be downloaded from SPSS Developer Central
- Examples are on the CD

13 Initialization for Examples
- This program creates a file handle pointing to the SPSS installation directory, where the sample files are installed.
* SPSS Directions, May
* In preparation for the examples, specify where SPSS standard data files reside.
BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
END PROGRAM.

14 Example 0: Hello, world
- Inside BEGIN PROGRAM, you write Python code.
- import spss connects the program to SPSS.
- Import needed once per session.
- Output goes to Viewer log items.
- Executed when END PROGRAM is reached.
* EXAMPLE 0: My first program.
BEGIN PROGRAM.
import spss
print "Hello, world!"
END PROGRAM.

15 Example 1: Run SPSS Command
- Submit, in module spss, is called to run one or more SPSS commands within BEGIN PROGRAM.
- It is one of many functions (APIs) that interact with SPSS.
- GetSPSSInstallDir, in the spssaux module, creates a FILE HANDLE to that directory.
* Run an SPSS command from a program; create file handle.
BEGIN PROGRAM.
import spss, spssaux
spss.Submit("SHOW ALL.")
spssaux.GetSPSSInstallDir("SPSSDIR")
END PROGRAM.

16 Example 2: Some APIs
- There are APIs in the spss module to get variable dictionary information.
- The help function prints short API documentation in the Viewer.
* Print useful information in the Viewer and then get help on an API.
BEGIN PROGRAM.
import spss
spss.Submit("GET FILE='SPSSDIR/employee data.sav'.")
varcount = spss.GetVariableCount()
casecount = spss.GetCaseCount()
print "The number of variables is " + str(varcount) + " and the number of cases is " + str(casecount)
# help() prints the documentation itself; wrapping it in print would also print None
help(spss.GetVariableCount)
END PROGRAM.

17 Example 3a: Data-Directed Analysis
* Summarize variables according to measurement level.
BEGIN PROGRAM.
import spss, spssaux
spssaux.OpenDataFile("SPSSDIR/employee data.sav")
# make variable dictionaries by measurement level
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
scaleVars = spssaux.VariableDict(variableLevel=['scale'])
print "Categorical Variables\n"
for var in catVars:
    # print the label itself, not the literal string "var.VariableLabel"
    print var, var.VariableName, "\t", var.VariableLabel
# continued on the next slide

18 Example 3a (continued)
- " ".join(['x', 'y', 'z']) produces 'x y z'
# summarize variables based on measurement level
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
if scaleVars:
    spss.Submit("DESC " + " ".join(scaleVars.variables))
# create a macro listing scale variables
spss.SetMacroValue("!scaleVars", " ".join(scaleVars.variables))
END PROGRAM.
DESC !scaleVars.
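The pattern in Example 3a, joining variable names into a command string and submitting only when a list is non-empty, can be tried outside SPSS. The following is a minimal sketch in current Python 3 rather than the Python 2 of the slides. The function build_summary_commands and the sample level mapping are hypothetical stand-ins for spssaux.VariableDict, and nothing here calls SPSS.

```python
def build_summary_commands(levels):
    """Return SPSS command strings chosen by measurement level.

    levels maps variable name -> 'nominal', 'ordinal', or 'scale'
    (a hypothetical stand-in for the SPSS variable dictionary).
    """
    cat_vars = [name for name, lvl in levels.items() if lvl in ("nominal", "ordinal")]
    scale_vars = [name for name, lvl in levels.items() if lvl == "scale"]
    commands = []
    if cat_vars:      # an empty list is false, so no FREQ command is built
        commands.append("FREQ " + " ".join(cat_vars) + ".")
    if scale_vars:
        commands.append("DESC " + " ".join(scale_vars) + ".")
    return commands


# Assumed sample data for illustration only.
sample_levels = {"gender": "nominal", "educ": "ordinal", "salary": "scale"}
commands = build_summary_commands(sample_levels)
for command in commands:
    print(command)
```

Keeping the string building in a function that returns the commands, instead of submitting them immediately, makes the logic easy to check before any command reaches the SPSS processor.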

19 Example 5: Handling Errors
- Errors generate exceptions
- Makes it easy to check whether a long syntax job worked
- Hundreds of standard modules and many others available from SPSS and third parties
* Handle an error. Use another standard Python module.
BEGIN PROGRAM.
import spss, sys
try:
    spss.Submit("foo.")
except:
    print "That command did not work! ", sys.exc_info()[0]
END PROGRAM.
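The point that errors surface as exceptions, which makes it easy to tell whether a long job finished, can be illustrated without SPSS. This is a sketch in current Python 3 under stated assumptions: submit is a hypothetical stand-in for spss.Submit that rejects any command outside a small made-up list, and run_job is an illustrative helper, not part of any SPSS module.

```python
import sys


def submit(command):
    """Hypothetical stand-in for spss.Submit: rejects unknown commands."""
    known = ("SHOW", "DESC", "FREQ")
    if not command.upper().startswith(known):
        raise ValueError("unrecognized command: " + command)
    return command


def run_job(commands):
    """Run commands in order; return (completed commands, error class or None)."""
    completed = []
    try:
        for command in commands:
            completed.append(submit(command))
    except Exception:
        # sys.exc_info()[0] is the exception class, the value the slide prints
        return completed, sys.exc_info()[0]
    return completed, None


done, error = run_job(["SHOW ALL.", "foo.", "DESC salary."])
print(done, error)
```

The job stops at the first failing command, so the caller sees both how far it got and what kind of error ended it.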

20 Example 8: Create Basis Variables
- Discovers educ values from the data and generates appropriate transformation commands.
- Creates macro !EducBasis
* Create set of dummy variables for a categorical variable and a macro name for them.
BEGIN PROGRAM.
import spss, spssaux, spssaux2
mydict = spssaux.VariableDict()
spssaux2.CreateBasisVariables(mydict["educ"], "EducDummy", macroname = "!EducBasis")
spss.Submit("REGRESSION /STATISTICS=COEF /DEP=salary " +
    " /ENTER=jobtime prevexp !EducBasis.")
END PROGRAM.

21 Example 9: Merge Directory Contents
- The glob module resolves file-system wildcards.
- "if savlist" tests whether there are any matching files.
* Automatically add cases from all SAV files in a directory.
BEGIN PROGRAM.
import spss, glob
savlist = glob.glob("c:/temp/parts/*.sav")
if savlist:
    cmd = ["ADD FILES "] + ["/FILE='" + fn + "'" for fn in savlist] + [".", "EXECUTE."]
    spss.Submit(cmd)
    print "Files merged:\n", "\n".join(savlist)
else:
    print "No files found to merge"
END PROGRAM.
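The list that Example 9 passes to Submit is easier to see when it is built on its own. The sketch below, in current Python 3, repeats the slide's list expression inside a hypothetical helper named build_add_files; the two file names are made up for illustration and no files are read.

```python
def build_add_files(paths):
    """Return the syntax lines that merge the given SAV files, or [] if there are none."""
    if not paths:
        return []
    # One /FILE subcommand per path, then the command terminator and EXECUTE.
    return ["ADD FILES "] + ["/FILE='" + fn + "'" for fn in paths] + [".", "EXECUTE."]


# Assumed file names; on the slide they come from glob.glob().
lines = build_add_files(["c:/temp/parts/a.sav", "c:/temp/parts/b.sav"])
print("\n".join(lines))
```

Each list element becomes one line of syntax, so the number of /FILE subcommands follows the number of files found.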

22 Example 10: Use Parts of Output - XML
* Run regression; get selected statistics, but do not display the regular Regression output. Use OMS and XPath wrapper functions.
BEGIN PROGRAM.
import spss, spssaux
spssaux.OpenDataFile("SPSSDIR/CARS.SAV")
try:
    handle, failcode = spssaux.CreateXMLOutput(
        "REGRESSION /DEPENDENT accel /METHOD=ENTER weight horse year.",
        visible=False)
    horseCoef = spssaux.GetValuesFromXMLWorkspace(
        handle, "Coefficients", rowCategory="Horsepower",
        colCategory="B", cellAttrib="number")
    print "The effect of horsepower on acceleration is: ", horseCoef
    Rsq = spssaux.GetValuesFromXMLWorkspace(
        handle, "Model Summary", colCategory="R Square", cellAttrib="text")
    print "The R square is: ", Rsq
    spss.DeleteXPathHandle(handle)
except:
    print "*** Regression command failed. No results available."
    raise
END PROGRAM.
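The idea behind the XPath wrapper, selecting one cell of a table by its row and column category, can be sketched with the standard library alone. This is an illustration in current Python 3, not the spssaux implementation: the XML fragment is a hypothetical, heavily simplified table rather than the real OMS schema, the numbers are placeholders rather than regression results, and get_cell is an assumed helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified table; not the real OMS output schema.
# The numbers are placeholders, not regression results.
SAMPLE_XML = """
<pivotTable subType="Coefficients">
  <category text="Horsepower">
    <category text="B"><cell number="1.5" text="1.500"/></category>
  </category>
  <category text="Weight">
    <category text="B"><cell number="2.5" text="2.500"/></category>
  </category>
</pivotTable>
"""


def get_cell(root, row, col, attrib="number"):
    """Return one attribute of the cell at (row, col), or None if absent."""
    path = ".//category[@text='%s']/category[@text='%s']/cell" % (row, col)
    cell = root.find(path)
    return None if cell is None else cell.get(attrib)


root = ET.fromstring(SAMPLE_XML)
print(get_cell(root, "Horsepower", "B"))
print(get_cell(root, "Weight", "B", attrib="text"))
```

Choosing between the number and text attributes mirrors the cellAttrib argument on the slide: one gives the full-precision value as a string, the other the formatted display text.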

23 Example 11: Transformations in Python Syntax
BEGIN PROGRAM.
import spss, spssaux, Transform
spssaux.OpenDataFile('SPSSDIR/employee data.sav')
newvar = Transform.Compute(varname="average_increase",
    varlabel="Salary increase per month of experience if at least a year",
    varmeaslvl="Scale",
    varmissval=[999,998,997],
    varformat="F8.4")
newvar.expression = "(salary-salbegin)/jobtime"
newvar.condition = "jobtime > 12"
newvar.retransformable=True
newvar.generate()  # Get exception if compute fails
Transform.timestamp("average_increase")
spss.Submit("DISPLAY DICT /VAR=average_increase.")
spss.Submit("DESC average_increase.")
END PROGRAM.

24 Example 11A: Repeat Transform
- Transformation saved using Attributes
BEGIN PROGRAM.
import spss, Transform
try:
    Transform.retransform("average_increase")
    Transform.timestamp("average_increase")
except:
    print " Could not update average_increase. "
else:
    spss.Submit(" display dictionary " +
        " /variable=average_increase.")
END PROGRAM.

25 Example 12: Controlling the Viewer Using Automation
BEGIN PROGRAM.
import spss, viewer
spss.Submit("DESCRIPTIVES ALL")
spssapp = viewer.spssapp()
outfile = "c:/temp/myoutput.spo"
try:
    actualName = spssapp.SaveDesignatedOutput(outfile)
except:
    # actualName is never assigned if the save raises, so report the requested name
    print "Save failed. Name:", outfile
else:
    spssapp.ExportDesignatedOutput(
        "c:/temp/myoutput.doc", format="Word")
    spssapp.CloseDesignatedOutput()
END PROGRAM.

26 Example 13: A New Procedure - Poisson Regression
- Poisson regression module built from SPSS CNLR and transformation commands.
- Programs can get case data and use other Python modules or code on it.
BEGIN PROGRAM.
import spss, spssaux
from poisson_regression import *
spssaux.OpenDataFile(
    'SPSSDIR/Tutorial/Sample_Files/autoaccidents.sav')
poisson_regression("accident", covariates=["age"], factors=["gender"])
END PROGRAM.

27 Example 14: Using Case Data
* Mean salary by education level.
BEGIN PROGRAM.
import spssdata
data = spssdata.Spssdata(indexes=('salary', 'educ'))
Counts = {}; Salaries = {}
for case in data:
    cat = int(case.educ)
    Counts[cat] = Counts.get(cat, 0) + 1
    Salaries[cat] = Salaries.get(cat, 0) + case.salary
print "educ mean salary\n"
for cat in sorted(Counts):
    print " %2d $%6.0f" % (cat, Salaries[cat]/Counts[cat])
del data
END PROGRAM.
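The accumulation in Example 14 uses two dictionaries and dict.get with a default of 0, so a category needs no special handling the first time it appears. A minimal sketch of the same technique in current Python 3 follows. The function mean_by_category and the three sample rows are assumptions for illustration; on the slide the rows come from spssdata.Spssdata.

```python
def mean_by_category(cases):
    """cases: iterable of (salary, educ) pairs. Returns {educ: mean salary}."""
    counts = {}
    totals = {}
    for salary, educ in cases:
        cat = int(educ)
        # get(cat, 0) supplies the starting value for a category not seen before
        counts[cat] = counts.get(cat, 0) + 1
        totals[cat] = totals.get(cat, 0) + salary
    return {cat: totals[cat] / counts[cat] for cat in sorted(counts)}


# Made-up rows for illustration only.
sample_cases = [(30000, 12), (40000, 12), (60000, 16)]
means = mean_by_category(sample_cases)
for cat, mean in means.items():
    print(" %2d $%6.0f" % (cat, mean))
```

A single pass over the cases is enough because only the running count and total per category are kept.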

28 Example 14a: Output As a Pivot Table
BEGIN PROGRAM.
# Counts and Salaries come from Example 14, run earlier in the same session
import viewer
desViewer = viewer.spssapp().GetDesignatedOutput()
rowcats = []; cells = []
for cat in sorted(Counts):
    rowcats.append(int(cat))
    cells.append(Salaries[cat]/Counts[cat])
ptable = viewer.PivotTable("a Python table",
    tabletitle="Effect of Education on Salary",
    caption="Data from employee data.sav",
    rowdim="Years of Education",
    rowlabels=rowcats,
    collabels=["Mean Salary"],
    cells = cells,
    tablelook="c:/data/goodlook.tlo")
ptable.insert(desViewer)
END PROGRAM.

29 Exploring OMS Dataset Output
- Use OMS directly to figure out what to retrieve programmatically
get file='c:/spss14/cars.sav'.
DATASET NAME maindata.
DATASET DECLARE regcoef.
DATASET DECLARE regfit.
OMS /IF SUBTYPE=["coefficients"]
  /DESTINATION FORMAT = sav OUTFILE=regcoef.
OMS /IF SUBTYPE=["Model Summary"]
  /DESTINATION FORMAT = sav OUTFILE=regfit.
REGRESSION /DEPENDENT accel
  /METHOD=ENTER weight horse year.
OMSEND.

30 Example 10a: Use Bits of Output - Datasets
BEGIN PROGRAM.
import spss, spssaux, spssdata
try:
    coefhandle, rsqhandle, failcode = spssaux.CreateDatasetOutput(
        "REGRESSION /DEPENDENT accel /METHOD=ENTER weight horse year.",
        subtype=["coefficients", "Model Summary"])
    cursor = spssdata.Spssdata(indexes=["Var2", "B"], dataset=coefhandle)
    for case in cursor:
        if case.Var2.startswith("Horsepower"):
            print "The effect of horsepower on acceleration is: ", case.B
    cursor.close()
    # the try block continues on the next slide

31 Example 10a: Use Bits of Output - Datasets (continued)
    cursor = spssdata.Spssdata(indexes=["RSquare"], dataset=rsqhandle)
    row = cursor.fetchone()
    print "The R Squared is: ", row.RSquare
    cursor.close()
except:
    print "*** Regression command failed. No results available."
    raise
spssdata.Dataset("maindata").activate()
spssdata.Dataset(coefhandle).close()
spssdata.Dataset(rsqhandle).close()
END PROGRAM.
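Example 10a finds the coefficient row by testing the label with startswith rather than equality. A plausible reason, stated here as an assumption, is that string values read from a dataset can carry trailing blanks up to the variable width. The sketch below, in current Python 3, mimics the attribute-style access of the slide with a namedtuple. Row, coefficient_for, the row labels, and the values are all hypothetical placeholders, not actual regression output.

```python
from collections import namedtuple

# Mirrors the two indexes requested on the slide (Var2 and B).
Row = namedtuple("Row", ["Var2", "B"])


def coefficient_for(rows, label_prefix):
    """Return B from the first row whose Var2 starts with label_prefix, else None."""
    for row in rows:
        if row.Var2.startswith(label_prefix):
            return row.B
    return None


# Placeholder rows; the padded label illustrates why == would fail to match.
rows = [
    Row("(Constant)          ", 1.0),
    Row("Vehicle Weight      ", 2.0),
    Row("Horsepower          ", 3.0),
]
print(coefficient_for(rows, "Horsepower"))
```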

32 What We Saw
- Variable Dictionary access
- Procedures selected based on variable properties
- Actions based on environment
- Automatic construction of transformations
- Error handling
- Variables that remember their formulas
- Management of the SPSS Viewer
- New statistical procedure
- Access to case data

33 Externally Controlling SPSS
- SPSS Processor (backend) can be embedded and controlled by Python or other processes
- Build applications using SPSS functionality invisibly
  - Application supplies user interface
  - No SPSS Viewer
- Allows use of Python IDE to build programs
  - Pythonwin or many others

34 PythonWin IDE Controlling SPSS

35 What Are the Programmability Benefits?
- Extend SPSS functionality
- Write more general and flexible jobs
- Handle errors
- React to results and metadata
- Implement new features
- Write simpler, clearer, more efficient code
- Greater productivity
- Automate repetitive tasks
- Build SPSS functionality into other applications

36 Getting Started
- SPSS 14 (… for data access and IDE)
- Python (visit …)
  - Installation
  - Tutorial
  - Many other resources
- SPSS® Programming and Data Management, 3rd Edition: A Guide for SPSS® and SAS® Users (new)
- SPSS Developer Central
  - Python Plug-In (… version covers …)
  - Example modules
- Dive Into Python (… book or PDF)
- Practical Python by Magnus Lie Hetland
- Python Cookbook, 2nd ed., by Martelli, Ravenscroft, & Ascher
- Python and Plug-In on the CD in SPSS 15

37 Recap
- Five power features of SPSS 14
- Examples of programmability using Python
- How to get started: materials and resources

38 Questions

39 In Closing
Working together, these new features give you a dramatically more powerful SPSS. SPSS becomes a platform that enables you to build your own statistical applications.
1. Programmability
2. Multiple datasets
3. XML Workspace and OMS enhancements
4. Attributes
5. External driver application

40 Contact
Jon Peck can now be reached at:
