Programmability in SPSS 14, SPSS 15 and SPSS 16

Presentation on theme: "Programmability in SPSS 14, SPSS 15 and SPSS 16"— Presentation transcript:

1 Programmability in SPSS 14, SPSS 15 and SPSS 16
The Revolution Continues
Jon Peck, Technical Advisor, SPSS
Copyright (c) SPSS Inc, 2007

2 Agenda
Recap of SPSS 14 Python programmability
Developer Central
New features in SPSS 15 programmability
Writing first-class procedures
Updating the data
New features in SPSS 16 programmability
Interacting with the user
Q & A
Conclusion

3 Quotations from SPSS Users
"Because of programmability, SPSS 14 is the most important release since I started using SPSS fifteen years ago."
"I think I am going to like using Python."
"Python and SPSS 14 and later are, IMHO, GREAT!"
"By the way, Python is a great addition to SPSS."
From InfoWorld (April 19, 2007): "Of all the tools fueling the dynamic-language trend in the enterprise, general-purpose dynamic languages such as Python and Ruby present the greatest upside for enhancing developer productivity."

4 The Combination of SPSS and Python
SPSS provides a powerful engine for statistical and graphical methods and for data management. Python® provides a powerful, elegant, and easy-to-learn language for controlling and responding to this engine. Together they provide a comprehensive system for serious applications of analytical methods to data.

5 Programmability Features in SPSS 14, 15, and 16
SPSS 14.0 provided
  Programmability
  Multiple datasets
  Variable and File Attributes
  Programmability read-access to case data
  Ability to control SPSS from a Python program
SPSS 15 adds
  Read and write case data
  Create new variables directly rather than generating syntax
  Create pivot tables and text blocks via backend APIs
  Easier setup
SPSS 16 will add
  EXTENSION command for user procedures with SPSS syntax
  Dataset features for complex data management
  Ability to use R procedures within SPSS through R Plug-In

6 Programmability Advantages
Makes it easy to write jobs that respond to datasets, output, and environment
Allows greater generality, more automation
Makes jobs more robust
Allows extending the capabilities of SPSS
Enables better organized and more maintainable code
Facilitates staff specialization
Increases productivity
More fun

7 Programmability Overview
Python extends SPSS via
  General programming language
  Access to variable dictionary, case data, and output
  Access to standard and third-party modules
  SPSS Developer Central modules
  Module structure for building libraries of code
Runs in "back-end" syntax context (like macro)
  SaxBasic scripting runs in "front-end" context
Two modes
  Traditional SPSS syntax window
  Drive SPSS from Python (external mode)
Optional install (licensed with SPSS Base)
Other new SPSS 14 features enhance programmability:
  multiple concurrent datasets
  variable and file attributes
  XML workspace and OMS enhancements

8 Legal Notice
SPSS is not the owner or licensor of the Python software. Any user of Python must agree to the terms of the Python license agreement located on the Python web site. SPSS is not making any statement about the quality of the Python program. SPSS fully disclaims all liability associated with your use of the Python program.

9 The SPSS Programmability Software Development Kit
Supports implementing various programming languages
  Requires a programmer to implement a new language
VB.NET Plug-In available on Developer Central
  Works only in external mode

10 How Programmability Works
Python interpreter embedded within SPSS
SPSS runs in traditional way until BEGIN PROGRAM command is found
Python collects commands until END PROGRAM command is found; then runs the program
Python can communicate with SPSS through APIs (calls to functions)
  Includes running SPSS syntax inside Python program
  Includes creating macro values for later use in syntax
Python can access SPSS output and data
  OMS is a key tool
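The collect-then-run flow described on this slide can be pictured with a toy sketch in plain Python. It is only an illustration of the idea, not how SPSS is implemented: split_job is a hypothetical helper and 'demo.sav' is a made-up file name.

```python
def split_job(job_text):
    """Split a job into ("syntax", text) and ("program", text) chunks.

    Toy model only: lines between BEGIN PROGRAM and END PROGRAM are
    collected as one program chunk; everything else is ordinary syntax.
    """
    chunks = []
    mode = "syntax"
    buffer = []
    for line in job_text.splitlines():
        marker = line.strip().rstrip(".").upper()
        if mode == "syntax" and marker == "BEGIN PROGRAM":
            if buffer:
                chunks.append(("syntax", "\n".join(buffer)))
            mode, buffer = "program", []
        elif mode == "program" and marker == "END PROGRAM":
            # The whole block is handed over only once END PROGRAM is seen.
            chunks.append(("program", "\n".join(buffer)))
            mode, buffer = "syntax", []
        else:
            buffer.append(line)
    if buffer:
        chunks.append((mode, "\n".join(buffer)))
    return chunks


# Hypothetical job mixing ordinary syntax with one program block.
job = """GET FILE='demo.sav'.
BEGIN PROGRAM.
x = 1 + 1
END PROGRAM.
DESCRIPTIVES ALL."""
chunks = split_job(job)
```

Here chunks holds three entries in order: the syntax before the block, the program text, and the syntax after it.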

11 Example: Summarize Categorical Variables
BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
spssaux.OpenDataFile("SPSSDIR/employee data.sav")
# find categorical variables
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
# create a macro listing categorical variables
spss.SetMacroValue("!catVars", " ".join(catVars.variables))
END PROGRAM.
DESC !catVars.
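Most of this example is ordinary Python list and string handling. A minimal sketch of that step, using a plain dict as a hypothetical stand-in for the variable dictionary so that it runs without SPSS (the variable names and measurement levels are made up):

```python
# Hypothetical stand-in for a variable dictionary: name -> measurement level.
levels = {
    "region": "nominal",
    "grade": "ordinal",
    "income": "scale",
    "sector": "nominal",
}


def categorical_vars(levels):
    """Return the names of nominal and ordinal variables, in dictionary order."""
    return [name for name, level in levels.items()
            if level in ("nominal", "ordinal")]


def build_freq_command(names):
    """Build a FREQ command string, or None when there is nothing to list."""
    if not names:
        return None
    return "FREQ " + " ".join(names)


cat_names = categorical_vars(levels)
command = build_freq_command(cat_names)
```

In the real job the resulting string would be passed to spss.Submit, and the same joined list would become the macro value.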

12 Programmability Inside or Outside SPSS
Two modes of operation
SPSS Drives mode (inside): traditional syntax context
  BEGIN PROGRAM …program… END PROGRAM
  Program in 14, 15, or 16 is in Python or, new in 16, in R
X Drives mode (outside): eXternal program drives SPSS
  Python interpreter (or VB.NET)
  No SPSS Viewer, Data Editor, or SPSS user interface
  Output sent as text to the application; can be suppressed
  Has performance advantages
Build programs with an IDE
  Even if to be run in traditional mode

13 PythonWin IDE Controlling SPSS (eXternal Mode)
The PythonWin IDE is available from … There are many other choices for a Python IDE.

14 Python Resources
Be productive quickly
Get more return as you learn more
Python Tutorial
Cheeseshop
  over 2200 packages as of April 11, 2007
SPSS Developer Central
SPSS Programming and Data Management, 4th ed, 2006.

15 Python Books
Dive Into Python
  book or PDF
Practical Python by Magnus Lie Hetland
  Extensive examples and discussion of Python
Python Cookbook, 2nd ed by Martelli, Ravenscroft, & Ascher
Python in a Nutshell, 2nd ed by Martelli, O'Reilly
  Very clear, comprehensive reference material
wxPython in Action by Rappin and Dunn
  Explains user interface building with wxPython

16 Cheeseshop: scipy
scipy 0.5.2
Scientific Algorithms Library for Python
scipy is an open source library of scientific tools for Python. scipy gathers a variety of high level science and engineering modules together as a single package. scipy provides modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, genetic algorithms, ODE solvers, special functions, and more. scipy requires and supplements NumPy, which provides a multidimensional array object and other basic functionality.
Python is becoming a major language for scientific computing

17 SPSS Developer Central
SPSS Developer Central is the web home for developing SPSS applications
  Python, .NET, R Integration Plug-Ins
  Supplementary modules by SPSS and others
  Articles on programmability and graphics
  Forums for asking questions and exchanging information
  Programmability Extension SDK
Get Python itself from … or CD
  SPSS 14, 15 use 2.4 (2.4.3)
  SPSS 16 will use 2.5
Not limited to programmability
  GPL graphics
  User-contributed code
Key Supplementary Modules
  spssaux
  spssdata
  New for SPSS 15
    trans
    extendedTransforms
    rake
    pls
    enhanced

18 Example: Manipulating Output: Merging Tables
The tables module on Developer Central can merge two tables into one.
  E.g., Ctables significance tests into main tables
  Merge or replace cells with cells from a different table
  Flexibly define the join
It can also censor cells, e.g., blank statistics based on small counts.
Merge example: data on importance of education qualifications for immigration by region of Europe
CTABLES /TABLE qfimeduBin BY Region
 /TITLES TITLE='Qualifications for Immigration'
 /COMPARETEST TYPE=PROP

19 Ctables Output
Graphic shows the requested custom table along with the associated table of comparisons of column proportions. Each table has the same set of row and column labels, so the tables can be easily merged.

20 Program to Merge
BEGIN PROGRAM.
import spss, tables
cmd=r"""CTABLES /TABLE qfimeduBin BY Region
 /TITLES TITLE='Qualifications for Immigration'
 /COMPARETEST TYPE=PROP"""
tables.mergeLatest(cmd, autofit=False)
END PROGRAM.
Runs Ctables and merges test table into main table
Using default merge behavior
"If it really is this simple this will generate a lot of excitement for us."
"This is really fantastic."

21 Merged Output
The graphic shows that the two tables from the original Ctables output have been merged into one table. Each cell of the merged table contains the cell contents from the original custom table as well as the cell contents from the table of comparison of column proportions.
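The merge idea, joining cells that share the same row and column labels, can be sketched in plain Python. This is not the tables module's implementation; merge_tables, censor_small_counts, and the sample cells are hypothetical and only illustrate the technique.

```python
def merge_tables(main, tests, sep="\n"):
    """Append each test cell to the main cell with the same (row, column) label."""
    merged = {}
    for key, value in main.items():
        extra = tests.get(key, "")
        merged[key] = f"{value}{sep}{extra}" if extra else str(value)
    return merged


def censor_small_counts(table, counts, minimum):
    """Blank any cell whose underlying count is below the minimum."""
    return {key: ("" if counts.get(key, 0) < minimum else value)
            for key, value in table.items()}


# Hypothetical cells keyed by (row label, column label).
main = {("Important", "North"): "45.0%", ("Important", "South"): "30.0%"}
tests = {("Important", "North"): "B"}  # significance marker for one cell
counts = {("Important", "North"): 90, ("Important", "South"): 3}

merged = merge_tables(main, tests)
censored = censor_small_counts(merged, counts, minimum=5)
```

Keying both tables by label pair is what makes the join simple: cells with no counterpart in the test table pass through unchanged.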

22 Approaches to Creating New Procedures
You can extend SPSS capabilities by building new procedures
  Or use ones that others have built
Combine SPSS procedures and transformations with Python logic
  Poisson regression (SPSS 14) example using iterated CNLR
  New raking procedure built over GENLOG
  GENLIN in SPSS 15
Calculate data aggregates in SPSS and pass to algorithm coded in Python
  Raking procedure starts with AGGREGATE; uses GENLOG
Acquire case data and compute in Python
  Use Python standard modules and third-party additions
  Partial Least Squares Regression (pls module)

23 Adapt Existing Code Libraries
Common to adapt existing libraries or code for use as Python extension modules
  C, C++, VB, Fortran,...
Python tools and APIs to assist
  Chap 25 in Python in a Nutshell
  Tutorial on extending and embedding the Python interpreter
Call R programs with SPSS 16

24 Partial Least Squares Regression
Regression with large number of predictors (even k > N)
Similar to Principal Components but considers dependent variable simultaneously
Calculates principal components of (y, X), then uses regression on the scores instead of original data
Equivalent to ordinary regression when number of factors equals number of predictors and one y variable
For more information see An Optimization Perspective on Kernel Partial Least Squares Regression.pdf.
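A one-factor fit with a single y variable shows the mechanics in a few lines of plain Python. This is a minimal sketch of the textbook PLS1 first component, not the pls module's code; pls1_one_factor and the data are hypothetical. With one predictor and one factor the result coincides with the ordinary regression slope, which matches the equivalence noted on the slide.

```python
import math


def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pls1_one_factor(x_columns, y):
    """Coefficients (for centered data) from a one-factor PLS fit with one y.

    x_columns is a list of predictor columns. Assumes at least one predictor
    has nonzero covariance with y; otherwise the weight norm is zero.
    """
    xc = [center(col) for col in x_columns]
    yc = center(y)
    # Weights: proportional to each predictor's covariance with y.
    w = [sum(a * b for a, b in zip(col, yc)) for col in xc]
    norm = math.sqrt(sum(wj * wj for wj in w))
    w = [wj / norm for wj in w]
    # Scores: one latent value per case, a weighted sum of the predictors.
    t = [sum(xc[j][i] * w[j] for j in range(len(xc))) for i in range(len(yc))]
    # Regress y on the scores, then express the fit in terms of the predictors.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return [wj * q for wj in w]


# Hypothetical data: one predictor, so one factor reproduces the OLS slope.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.2]
slope = pls1_one_factor([x], y)[0]

# Two predictors, one factor: a reduced fit rather than full regression.
coefs = pls1_one_factor([x, [2.0, 1.0, 4.0, 3.0]], y)
```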

25 The pls Module for SPSS 15
Strategy
Fetches data from SPSS
Uses scipy matrix operations to compute results
  Third-party module from Cheeseshop
Writes pivot tables to SPSS Viewer
  Subject to OMS
  SPSS 14 viewer module created pivot table using OLE automation
  SPSS 15 has direct pivot table APIs
Saves predicted values to active dataset

26 pls Example: REGRESSION vs PLS
GET FILE="c:/spss15/tutorial/sample_files/car_sales.sav".
REGRESSION
 /STATISTICS COEFF R
 /DEPENDENT sales
 /METHOD=ENTER curb_wgt engine_s fuel_cap horsepow length mpg price resale type wheelbas width .
begin program.
import spss, pls
pls.plsproc("sales", """curb_wgt engine_s fuel_cap horsepow length mpg price resale type wheelbas width""", yhat="predsales")
end program.
plsproc defaults to five factors

27 Results
PLS with 5 factors almost equals regression with 11 variables

28 SPSS 16 User Procedures
User procedures can be written in Python but specified using SPSS traditional syntax
  User never writes or sees Python code
  Used as if a built-in SPSS command
EXTENSION command defines command to SPSS via simple XML file
Python module called with syntax already checked and processed by SPSS
More general PLS module
  PLS y1 y2 y3 BY fac1 fac2 WITH z1 z2 z3 /CRITERIA LATENTFACTORS=2.
Dialog box interface tools in SPSS 17
  In the meantime, use wxPython or something similar

29 Raking Sample Weights
"Raking" adjusts sample weights to control totals in n dimensions
  Example: data classified by age and sex with known population totals or proportions
Calculated by fitting a main effects loglinear model
  Various adjustments required
Not a complete solution to reweighting
Not directly available in SPSS
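For intuition, raking in two dimensions can be sketched with iterative proportional fitting, the classical algorithm that alternately rescales rows and columns until both sets of margins match the control totals. This swaps in a different technique from the loglinear-model fit the slide describes, and rake_counts and all numbers are hypothetical.

```python
def rake_counts(cells, row_totals, col_totals, iterations=100):
    """Iterative proportional fitting on a two-way table.

    cells: {(row, col): observed count}, all counts positive.
    row_totals and col_totals: control totals sharing one grand total.
    """
    fitted = dict(cells)
    for _ in range(iterations):
        # Scale each row so that it sums to its control total.
        for row, target in row_totals.items():
            current = sum(v for (r, c), v in fitted.items() if r == row)
            for key in fitted:
                if key[0] == row:
                    fitted[key] *= target / current
        # Scale each column the same way.
        for col, target in col_totals.items():
            current = sum(v for (r, c), v in fitted.items() if c == col)
            for key in fitted:
                if key[1] == col:
                    fitted[key] *= target / current
    return fitted


# Hypothetical sample counts classified by age group (rows) and sex (columns).
sample = {(0, 0): 10.0, (0, 1): 20.0, (1, 0): 30.0, (1, 1): 40.0}
fitted = rake_counts(sample,
                     row_totals={0: 40.0, 1: 60.0},
                     col_totals={0: 50.0, 1: 50.0})
# The adjustment factor for each cell is the ratio of fitted to observed count.
weights = {key: fitted[key] / sample[key] for key in sample}
```

Row and column rescaling leaves the association between the two classifications (the odds ratio of the sample table) unchanged; only the margins move.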

30 Raking Module
Strategy: combine SPSS procedures with Python logic (from SPSS Developer Central)
Aggregates data via AGGREGATE to new dataset
Creates new variable with control totals
Applies GENLOG, saving predicted counts
Adjusts predicted counts
Matches back into original dataset
  Does not use MATCH FILES or require a SORT command
Written in one (long) day
rake.rake("age sex", [{0: 1140, 1:1140}, {0: 104.6, 1:2175.4}], finalweight="finalwt")

31 Extending SPSS Transformations
SPSS 14 programmability can wrap SPSS syntax in Python logic, e.g., generate COMPUTE commands on the fly
  Useful when definitions can be expressed in SPSS syntax
SPSS 15 programmability can
  Generate new variables directly
  Add new cases directly
  Create new datasets from scratch
SPSS 16 has additional dataset capabilities

32 trans and extendedTransforms Modules
trans module facilitates plugging in Python code to iterate over cases
  Runs as an SPSS procedure
  Passes the data
  Adds variables to the SPSS variable dictionary
  Can apply any calculation casewise
Use with
  Standard Python functions (e.g., math module)
  Any user-written functions or appropriate classes
  Functions in extendedTransforms module

33 trans and extendedTransforms Modules
trans strategy
  Pass case data through Python code writing result back to SPSS in new variables
extendedTransforms
  collection of 12 functions to apply to SPSS variables, including
    Regular expression search/replace
    soundex and nysiis functions for phonetic equivalence
    Date/time conversions based on patterns
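The casewise strategy can be sketched without SPSS by treating each case as a dict and writing the function result into a new variable. This is not the trans module's API; transform_cases, the variable names, and the data are hypothetical.

```python
import math


def transform_cases(cases, new_var, func, *source_vars):
    """Return copies of the cases with new_var = func(source values).

    A missing input (None) yields a missing result instead of an error.
    """
    result = []
    for case in cases:
        args = [case[name] for name in source_vars]
        value = None if any(a is None for a in args) else func(*args)
        result.append({**case, new_var: value})
    return result


# Hypothetical cases; the second one has a missing value.
cases = [{"income": 100.0}, {"income": None}, {"income": 1000.0}]
out = transform_cases(cases, "log_income", math.log10, "income")
```

Any standard library or user-written function can be plugged in as func, which is the point the slides make about the math module and extendedTransforms.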

34 Python Regular Expressions
Pattern matching in text strings
  If you use SPSS index or replace, you need these
Standardize string data (Mr, Mr., Herr, Senor,...)
Extract data from loosely structured text
  "simvastatin-- PO 80mg TAB" -> "simvastatin", "80"
Patterns can be simple strings (as with SPSS index) or complex patterns
Pick out variable names with common parts
Can greatly simplify code
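Both examples on this slide can be tried with Python's standard re module. The patterns here are one possible choice rather than the ones used in the talk, and the function names are hypothetical.

```python
import re

# Drug name at the start, then the first number that is followed by "mg".
DOSE = re.compile(r"^\s*([A-Za-z]+)\W.*?(\d+)\s*mg", re.IGNORECASE)

# Common spellings of the same title at the start of a name.
TITLE = re.compile(r"^(mr\.?|herr|se[nñ]or)\s+", re.IGNORECASE)


def extract_drug_and_dose(text):
    """Return (drug, dose) as strings, or None when the pattern does not match."""
    match = DOSE.search(text)
    return (match.group(1), match.group(2)) if match else None


def standardize_title(name):
    """Rewrite Mr, Mr., Herr, or Senor at the start of a name as 'Mr'."""
    return TITLE.sub("Mr ", name.strip())


drug = extract_drug_and_dose("simvastatin-- PO 80mg TAB")
names = [standardize_title(n) for n in ("Mr. Smith", "Herr Schmidt", "Senor Lopez")]
```

Requiring whitespace after the title keeps longer words that merely begin with the same letters, such as "Mrs.", from being rewritten.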

35 Write to Me! Jon Peck can now be reached at
