PROC UNIVARIATE vs. PROC SUMMARY: A Comparison of Performance

Background
For many of the common things I do, PROCs UNIVARIATE and SUMMARY can accomplish similar results
Many years ago, someone suggested I use PROC UNIVARIATE because it had more functions
They claimed that both procedures performed about the same
– I didn’t bother to check that out
Unless I needed something that could be done only with PROC SUMMARY, I got in the habit of using PROC UNIVARIATE

More Background
Several months ago, I was becoming frustrated with how long it was taking to run some large PROC UNIVARIATE steps for simple functions (like SUM, MEAN, MIN, MAX, etc.)
– It also was using a lot of CPU
There had to be a better way

My First Experiment
Wrote DATA steps to do simple functions
Benchmarked the DATA steps against PROC UNIVARIATE steps
Compared output results to ensure integrity
Ran tests using SAS on both Mainframe and PC
The results were surprising
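The slides do not include the test code, so the following is only a minimal sketch of what such an experiment might look like. The dataset WORK.BIG, the variable AMOUNT, the row count, and all output names are hypothetical, not taken from the original tests. It turns on FULLSTIMER so the log reports elapsed and CPU time for each step, computes N, SUM, MEAN, MIN and MAX once with PROC UNIVARIATE and once with a DATA step, and then uses PROC COMPARE to check that both produce the same values.

```sas
/* Write detailed elapsed and CPU timings for each step to the log */
options fullstimer;

/* Hypothetical test data: one million rows with one numeric variable */
data work.big;
   call streaminit(12345);
   do id = 1 to 1000000;
      amount = rand('uniform') * 100;
      output;
   end;
run;

/* Baseline: PROC UNIVARIATE, printed output suppressed */
proc univariate data=work.big noprint;
   var amount;
   output out=work.univ_stats
          n=n_amount sum=sum_amount mean=mean_amount
          min=min_amount max=max_amount;
run;

/* Same statistics accumulated in a single DATA step pass */
data work.ds_stats (keep=n_amount sum_amount mean_amount min_amount max_amount);
   set work.big end=last;
   retain min_amount max_amount;
   if not missing(amount) then do;
      n_amount + 1;                           /* sum statement: retained, starts at 0 */
      sum_amount + amount;
      min_amount = min(min_amount, amount);   /* MIN and MAX ignore missing arguments */
      max_amount = max(max_amount, amount);
   end;
   if last then do;
      if n_amount > 0 then mean_amount = sum_amount / n_amount;
      output;
   end;
run;

/* Integrity check: report any differences between the two result sets */
proc compare base=work.univ_stats compare=work.ds_stats;
run;
```

Comparing the "real time" and "cpu time" lines that the log shows for the PROC UNIVARIATE step and the DATA step gives the kind of elapsed-time and CPU figures discussed in the slides; actual numbers will depend on the data and the platform.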

Results of First Test
Data step showed:
– 95% reduction in elapsed time
– 99% reduction in CPU time
Decided to also run tests comparing PROC SUMMARY

Results of First Test
Compared to PROC UNIVARIATE, PROC SUMMARY showed:
– 94% reduction in elapsed time
– 96% reduction in CPU time

Overall Test Results
Ran many tests on several types of data
Data Step vs. PROC UNIVARIATE
– Elapsed time was 71% to 95% lower
– CPU was 74% to 99% lower
PROC SUMMARY vs. PROC UNIVARIATE
– Elapsed time was 72% to 94% lower
– CPU was 76% to 96% lower
In tests where PROC MEANS was also run, results were similar to PROC SUMMARY
– Sometimes a little less CPU and elapsed time, sometimes a little more
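As an illustration of the procedure comparison, here is a hedged sketch that requests the same statistics from PROC UNIVARIATE, PROC SUMMARY and PROC MEANS. The dataset WORK.BIG, the variable AMOUNT and the output dataset names are assumptions made for this example; the original test programs are not shown in the slides.

```sas
/* Write detailed elapsed and CPU timings for each step to the log */
options fullstimer;

/* Hypothetical test data */
data work.big;
   call streaminit(2024);
   do id = 1 to 1000000;
      amount = rand('uniform') * 100;
      output;
   end;
run;

/* PROC UNIVARIATE: baseline for the comparison */
proc univariate data=work.big noprint;
   var amount;
   output out=work.univ_stats
          n=n_amount sum=sum_amount mean=mean_amount
          min=min_amount max=max_amount;
run;

/* PROC SUMMARY: produces no printed output by default */
proc summary data=work.big;
   var amount;
   output out=work.summ_stats (drop=_type_ _freq_)
          n=n_amount sum=sum_amount mean=mean_amount
          min=min_amount max=max_amount;
run;

/* PROC MEANS: NOPRINT makes it behave like PROC SUMMARY here */
proc means data=work.big noprint;
   var amount;
   output out=work.means_stats (drop=_type_ _freq_)
          n=n_amount sum=sum_amount mean=mean_amount
          min=min_amount max=max_amount;
run;

/* Confirm the procedures agree before comparing their timings */
proc compare base=work.univ_stats compare=work.summ_stats;
run;
```

The log entries for the three procedure steps can then be read side by side. Whether the differences match the ranges reported above will depend on the data volume, the statistics requested and the platform.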

Other Observations
Data steps performed slightly better than PROCs SUMMARY and MEANS for simple functions but not as well on more complex functions
Most tests were run on both mainframe and PC
– Elapsed time and CPU improvement percentages (vs. PROC UNIVARIATE) were usually similar on both platforms
The tests were run on an older, slower mainframe and a new Windows 7 PC
– For each test, the same data and parameters were run on both the mainframe and PC
The PC generally ran faster than the same tests on the mainframe (for tested functions) and used less CPU