PROC UNIVARIATE vs. PROC SUMMARY: A Comparison of Performance
Background
For many of the common things I do, PROC UNIVARIATE and PROC SUMMARY can accomplish similar results
Many years ago, someone suggested I use PROC UNIVARIATE because it had more functions
They claimed that both procedures performed about the same
  – I didn’t bother to check that out
Unless I needed something that could be done only with PROC SUMMARY, I got in the habit of using PROC UNIVARIATE
More Background
Several months ago, I was becoming frustrated with how long it was taking to run some large PROC UNIVARIATE steps for simple functions (like SUM, MEAN, MIN, MAX, etc.)
  – They were also using a lot of CPU
There had to be a better way
My First Experiment
Wrote DATA steps to do simple functions
Benchmarked the DATA steps against PROC UNIVARIATE steps
Compared output results to ensure integrity
Ran tests using SAS on both mainframe and PC
The results were surprising
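A comparison of this kind might look like the sketch below. It is only an illustration, not the code used in these tests: the data set WORK.SALES, the variable AMOUNT, and all output names are hypothetical. FULLSTIMER asks SAS to write elapsed and CPU time for each step to the log, and PROC COMPARE checks that both approaches produce the same statistics.

```sas
/* Hypothetical input: WORK.SALES with one numeric variable, AMOUNT */
options fullstimer;   /* log elapsed and CPU time for every step */

/* Version 1: PROC UNIVARIATE, output data set only */
proc univariate data=work.sales noprint;
   var amount;
   output out=work.univ_stats
          sum=amt_sum mean=amt_mean min=amt_min max=amt_max;
run;

/* Version 2: DATA step computing the same four statistics */
data work.ds_stats (keep=amt_sum amt_mean amt_min amt_max);
   set work.sales end=last;
   retain amt_sum 0 amt_n 0 amt_min amt_max;
   if not missing(amount) then do;
      amt_sum = amt_sum + amount;
      amt_n   = amt_n + 1;
      amt_min = min(amt_min, amount);   /* MIN and MAX ignore missing values */
      amt_max = max(amt_max, amount);
   end;
   if last then do;
      if amt_n > 0 then amt_mean = amt_sum / amt_n;
      else amt_sum = .;                 /* no nonmissing values were read */
      output;
   end;
run;

/* Integrity check: variables are matched by name, with a small tolerance */
proc compare base=work.univ_stats compare=work.ds_stats
             method=absolute criterion=1e-8;
run;
```

The timings to compare are the "real time" and "cpu time" notes that follow each step in the SAS log.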
Results of First Test
Compared to PROC UNIVARIATE, the DATA step showed:
  – 95% reduction in elapsed time
  – 99% reduction in CPU time
Decided to also run tests comparing PROC SUMMARY
Results of First Test
Compared to PROC UNIVARIATE, PROC SUMMARY showed:
  – 94% reduction in elapsed time
  – 96% reduction in CPU time
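For reference, a PROC SUMMARY step that requests the same simple statistics could be written as in the sketch below. The data set WORK.SALES, the variable AMOUNT, and the output names are hypothetical and are not taken from the tests described here.

```sas
/* Hypothetical input: WORK.SALES with one numeric variable, AMOUNT */
options fullstimer;   /* log elapsed and CPU time for the step */

proc summary data=work.sales;
   var amount;
   /* No CLASS statement, so one overall row is written; */
   /* _TYPE_ and _FREQ_ are dropped to leave only the statistics */
   output out=work.summ_stats (drop=_type_ _freq_)
          sum=amt_sum mean=amt_mean min=amt_min max=amt_max;
run;
```

PROC MEANS accepts the same VAR and OUTPUT statements; adding NOPRINT to the PROC MEANS statement suppresses its printed report so that only the output data set is produced.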
Overall Test Results
Ran many tests on several types of data
DATA step vs. PROC UNIVARIATE
  – Elapsed time was 71% to 95% lower
  – CPU time was 74% to 99% lower
PROC SUMMARY vs. PROC UNIVARIATE
  – Elapsed time was 72% to 94% lower
  – CPU time was 76% to 96% lower
In tests where PROC MEANS was also run, results were similar to PROC SUMMARY
  – Sometimes a little less CPU and elapsed time, sometimes a little more
Other Observations
DATA steps performed slightly better than PROC SUMMARY and PROC MEANS for simple functions, but not as well on more complex functions
Most tests were run on both mainframe and PC
  – Elapsed time and CPU improvement percentages (vs. PROC UNIVARIATE) were usually similar on both platforms
The tests were run on an older, slower mainframe and a new Windows 7 PC
  – For each test, the same data and parameters were run on both the mainframe and PC
The PC generally ran ... percent faster than the same tests on the mainframe (for tested functions) and used ... percent less CPU