Efficient SAS programming with Large Data Aidan McDermott Computing Group, March 2007.


Axes of Efficiency
processing speed:
– CPU
– real (elapsed) time
storage:
– disk
– memory
– …
user:
– functionality
– interface to other systems
– ease of use
– learning
user development:
– methodologies
– reusable code
– facilitate extension, rewriting
– maintenance

Dataset / Table

Datasets consist of three parts

General (and obvious) principles
– Avoid doing the job if possible.
– Keep only the data you need to perform a particular task (use drop, keep, where and if’s).
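As a rough illustration of keeping only what a task needs, the sketch below subsets a dataset while it is being read. The dataset name indata is borrowed from a later slide; the variable names and the region value 'NE' are assumptions made for the example.

```sas
/* Hypothetical names: indata, region, siteid, date, ozone. */
data northeast;
  /* keep= limits the variables read; where= filters observations
     before they are loaded into the data step. */
  set indata(keep=region siteid date ozone
             where=(region = 'NE'));
run;

/* A subsetting IF gives the same rows, but every observation
   is read into the data step before it is tested. */
data northeast2(drop=region);
  set indata(keep=region siteid date ozone);
  if region = 'NE';
run;
```

Filtering on the set statement is usually the cheaper of the two, since rejected observations never reach the rest of the step.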

Combining datasets -- concatenation

General (and obvious) principles
– Often efficient methods were written to perform the required task: use them.
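Concatenation is one place where a purpose-built method may help. The sketch below contrasts a data step with PROC APPEND; the dataset names all_years and year2007 are assumptions, and only one of the two steps should be run.

```sas
/* Hypothetical datasets: all_years (large), year2007 (new records). */

/* Option 1: the data step rereads and rewrites every observation
   of all_years just to add year2007 at the end. */
data all_years;
  set all_years year2007;
run;

/* Option 2 (use instead of option 1): PROC APPEND adds year2007 to
   the end of all_years without reading the observations already
   in the base dataset. */
proc append base=all_years data=year2007;
run;
```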

General (and obvious) principles
– Often efficient methods were written to perform other tasks: use them with caution.
– Write data driven code.
  – It’s easier to maintain data than to update code.
– Use length statements to limit the size of variables in a dataset to no more than is needed.
  – You don’t always know what size this should be, and you don’t always produce your own data.
– Use formatted data rather than the data itself.
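A minimal sketch of the last two points, assuming made-up site data: a length statement sizes the character variables, and a user-defined format groups ozone values at analysis time instead of storing a derived variable. All names, values, and the cut-point of 60 are hypothetical.

```sas
/* Hypothetical variables and cut-point, for illustration only. */
data sites;
  /* Declare lengths before the variables are first used. */
  length siteid $9 region $2;
  input siteid $ region $ ozone;
  datalines;
090010017 NE 41
090010017 NE 72
;
run;

/* Group ozone with a format instead of storing a new variable. */
proc format;
  value o3grp low -< 60 = 'below 60'
              60 - high = '60 or above';
run;

proc freq data=sites;
  tables ozone;
  format ozone o3grp.;
run;
```

The grouping lives in the format, so changing the cut-point does not require rewriting the dataset.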

Memory resident datasets
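One way to hold a dataset in memory across several steps is the SASFILE statement; whether this is the approach the slide showed is an assumption. The dataset name work.lookup is hypothetical, and the file must fit in available memory.

```sas
/* Hypothetical dataset work.lookup, read several times below. */
sasfile work.lookup load;    /* read the whole file into memory */

proc means data=work.lookup;
run;

proc freq data=work.lookup;
run;

sasfile work.lookup close;   /* release the memory */
```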

Compressing Datasets
Compress datasets with a compression utility such as compress, gzip, winzip, or pkzip and decompress before running each SAS job.
– This delays execution, and you need to keep track of data and program dependencies.
Use a general purpose compression utility and decompress within SAS for sequential access.
– This is system dependent (needs a named pipe) and requires sequential dataset storage.
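A related and simpler case is a gzip-compressed text file read through an unnamed pipe. This is not the named-pipe setup for SAS datasets mentioned above, only a sketch of decompressing inside SAS on a Unix-like system. The path and variable layout are assumptions.

```sas
/* Hypothetical path and layout; requires gzip on the host system. */
filename gzin pipe 'gzip -dc /data/ozone.txt.gz';

data ozone;
  /* Records arrive sequentially from the gzip process, so the
     uncompressed file never has to exist on disk. */
  infile gzin;
  input region $ siteid $ date :yymmdd10. ozone;
  format date date9.;
run;
```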

Compressing Datasets

SAS internal compression allows random access to data and is very effective under the right circumstances. In some cases it doesn’t reduce the size of the data by much. “There is a trade-off between data size and CPU time”.
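Internal compression can be requested per dataset or for the whole session. A minimal sketch, with hypothetical dataset names:

```sas
/* Hypothetical datasets big and big_c. */

/* Compress one output dataset; the log reports how much the
   size changed, which shows whether compression was worthwhile. */
data big_c(compress=yes);
  set big;
run;

/* Or make compression the default for datasets created later. */
options compress=yes;
```

compress=binary is an alternative setting that may do better on wide, mostly numeric observations.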

Suppose indata is a large dataset and you want to produce a version of indata without any observations.
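Two common ways to do this are sketched below; the output names empty1 and empty2 are assumptions. Both copy the variable definitions of indata without reading its observations.

```sas
/* obs=0 stops reading before the first observation. */
data empty1;
  set indata(obs=0);
run;

/* The set statement defines the variables at compile time;
   stop ends the step before any observation is read or written. */
data empty2;
  stop;
  set indata;
run;
```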

The data step is a two stage process:
– compile phase
– execute phase

Data step logic

data step

PDV: compile phase

data admits;
  set admits;
  discharge = admit + length;
  format discharge date8.;
run;

Name       type  size  drop  retain  format  value
patientID  C     6     n     y
gender     C     1     n     y
admit      N     8     n     y       date8.
length     N     8     n     y
discharge  N     8     n     n       date8.
_N_
_ERROR_                                      0

PDV: execute phase

data admits;
  set admits;
  discharge = admit + length;
  format discharge date8.;
run;

Name       type  size  drop  retain  format  value
patientID  C     6     n     y               321C-4
gender     C     1     n     y               M
admit      N     8     n     y       date8.  …
length     N     8     n     y               21
discharge  N     8     n     n       date8.
_N_                                          1
_ERROR_                                      0

PDV: execute phase

data admits;
  set admits;
  discharge = admit + length;
  format discharge date8.;
run;

Name       type  size  drop  retain  format  value
patientID  C     6     n     y               321C-4
gender     C     1     n     y               M
admit      N     8     n     y       date8.  …
length     N     8     n     y               21
discharge  N     8     n     n       date8.  …
_N_                                          1
_ERROR_                                      0

PDV: execute phase

data admits;
  set admits;
  discharge = admit + length;
  format discharge date8.;
run; /* implicit output */

Name       type  size  drop  retain  format  value
patientID  C     6     n     y               321C-4
gender     C     1     n     y               M
admit      N     8     n     y       date8.  …
length     N     8     n     y               21
discharge  N     8     n     n       date8.  …
_N_                                          1
_ERROR_                                      0

PDV: execute phase

data admits;
  set admits;
  discharge = admit + length;
  format discharge date8.;
run;

Name       type  size  drop  retain  format  value
patientID  C     6     n     y               321C-4
gender     C     1     n     y               M
admit      N     8     n     y       date8.  …
length     N     8     n     y               21
discharge  N     8     n     n       date8.
_N_                                          2
_ERROR_                                      0
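The PDV can be watched directly by writing it to the log with put _all_. The sketch below builds a small admits dataset first; the dates, the second patient, and the output name admits_out are made up for illustration.

```sas
/* Made-up admissions data; only the patientID 321C-4 echoes the slides. */
data admits;
  input patientID :$6. gender :$1. admit :date9. length;
  format admit date8.;
  datalines;
321C-4 M 15JAN2007 21
417A-2 F 02FEB2007 5
;
run;

data admits_out;
  /* At the top of each iteration, variables read by set still hold
     the previous observation; discharge has been reset to missing. */
  put 'top of iteration: ' _all_;
  set admits;
  discharge = admit + length;
  format discharge date8.;
  put 'after assignment: ' _all_;
run;
```

Comparing the two log lines for _N_=2 shows which variables are retained between iterations and which are reset.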

Efficiency: suspend the PDV activities

General principles
Use by processing whenever you can.
Given the data below, for each region, siteid, and date, calculate the mean and maximum ozone value.

General principles Easy:
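A minimal sketch of one by-processing solution, assuming a dataset named ozone with variables region, siteid, date, and ozone (the dataset and output names are hypothetical):

```sas
/* By processing needs the data sorted by the by variables. */
proc sort data=ozone;
  by region siteid date;
run;

/* One output row per region, siteid, and date. */
proc means data=ozone noprint;
  by region siteid date;
  var ozone;
  output out=daily_stats mean=mean_ozone max=max_ozone;
run;
```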

General principles
Suppose there are multiple monitors at each site and you still need to calculate the daily mean?
– Combine multiple observations onto one line and then compute the statistics?
Suppose you want the 10% trimmed mean?
Suppose you want the second maximum?
– Use arrays to sort the data?
– Write your own function?
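By processing can answer the last two questions without arrays or a custom function. The sketch below is one possible approach, assuming a dataset named ozone with one row per monitor reading and variables region, siteid, date, and ozone; all output names are hypothetical.

```sas
/* Sort so the largest reading comes first within each day. */
proc sort data=ozone;
  by region siteid date descending ozone;
run;

/* Second maximum: keep the second row of each by group.
   With ties, this can equal the maximum. */
data second_max(drop=rank);
  set ozone;
  by region siteid date;
  if first.date then rank = 0;
  rank + 1;                /* sum statement: rank is retained */
  if rank = 2 then output;
run;

/* 10% trimmed mean: trimmed=0.1 trims that proportion from each
   tail; the TrimmedMeans ODS table holds the result per by group. */
proc univariate data=ozone trimmed=0.1;
  by region siteid date;
  var ozone;
  ods output TrimmedMeans=trimmed10;
run;
```

By groups with very few readings may not yield a trimmed mean, so the output dataset is worth checking before use.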