Download presentation
Presentation is loading. Please wait.
Published byJames McBride Modified over 9 years ago
1
SAS Programming SAS Data Mart
2
Outline Access different format of data for SAS SAS data mart SAS data manipulation 2
3
Making Use of SAS Enterprise Guide Code Import a text file Example: Orders.txt Import an Excel file Example: SupplyInfo.xls 3
4
Learn from Examples SAS Help Contents -> Learning to use SAS -> Sample SAS Programs -> Base SAS “Base Usage Guide Examples” –Chapter 3, 4 4
5
Import an Excel Sheet proc import out=work.commrex datafile ="C:\Lin\Shared\ISQS6339\Commrex_3358.xls" dbms=excel replace; sheet="Company"; getnames=yes; mixed=no; scantext=yes; usedate=yes; scantime=yes; run; proc print data=work.commrex; run; 5
6
Excel SAS/ACCESS LIBNAME Engine libname xlsdata 'C:\Lin\Shared\ISQS6339\Commrex_6339.xls'; proc print data=xlsdata.New1; run; 6
7
Exercise 10: SAS Data Step Programming Import and print text file Userlog00.txt Import and print Excel sheet customers in Afs_customers.xls Send the Word file containing the Screenshots of two outputs to zhangxi.lin@hotmail.com 7
8
8 SAS Data Mart Definitions with minor differences A repository for data to be used as a source for business intelligence Object-oriented data repository, held for decision support and BI needs. A data mart is a specialized version of a data warehouse. Like data warehouses, data marts contain a snapshot of operational data that helps business people to strategize based on analyses of past trends and experiences. The key difference is that the creation of a data mart is predicated on a specific, predefined need for a certain grouping and configuration of select data. A data mart configuration emphasizes easy access to relevant information.
9
9 SAS Data Mart Structures for Different Purposes Longitudinal analysis Data mart has observations over time The generalized form of the data structure: time category Example: Beer sales Transactional data, and aggregated levels Example: web log The order of rows matters Cross-sectional dimension data – ranked by categories Interleaved time series – ranked by time
10
10 Auction Description: This is data from 19 livestock auction markets. The columns include: the number of head of different livestock sold (in thousands) including CATTLE, CALVES, HOGS, and SHEEP, the cost of operation of the auction market (in thousands of dollars) (COST), and the market identifier (MARKETID). The object is to use multiple linear regression to describe the relationship between the cost of operations to the number of livestock sold in the various classes. COST will be the dependent variable and CATTLE, CALVES, HOGS, and SHEEP the independent variables. An additional variable, VOLUME, is the total of all major livestock sold in each market. It is the sum of the variables CATTLE, CALVES, HOGS, and SHEEP, and can be used to demonstrate an exact linear dependency between independent variables. rows, columns. Regression, Graphic Analysis
11
11 Beer Sales Description: Beer sales records monthly sales of beer in hectoliters, along with the average high and low temperatures in the region, over a period of five years. The object is to see how beer sales change over time. You can also consider the relationships between beer sales and temperatures. 4 rows, 60 columns. Regression, Time Series
12
12 SAS Data Mart Topics Data accessing Data manipulation – Proc SQL Data transposing Data transformations Interval-scaled observations processing –Derived variables –Time intervals Define start and end of the interval Define how intervals are measured –Binning Observations into groups –Transformations of distributions –Replacing missing values Categorical observations processing Coding for predictive modeling Data preparation for different formats of data marts Data sampling Model scoring and automation
13
13 Data Transpose Basic transposing tasks (Ch14) From a multi-rows-per-subject to a one-row-per-subject From a one-row-per-subject to a multi-rows-per-subject Creating key-value tables Transposing longitudinal data (Ch15) From STANDARD to INTERLEAVED and back From CROSS SECTIONAL to STANDARD and back Complex scenarios Macro
14
14 Data Transformations Transformations of Interval-Scaled Variables (Ch16) Simple derived variables Derived relative variables Time intervals Binning observations into groups Transformations of distributions Replacing missing values Transformations of Categorical Variables (Ch17) Derived variables Combining categories Dummy coding of categorical variables Multidimensional categorical variables
15
15 Exercise 11 Data transpose. Write a SAS code to transpose SAS data set Auction in the shared directory. Transpose without other parameters Drop some variables Remove the generated variables Input data. Userlog0.txt, Userlog1.txt
16
16 SAS Programming Illustrative Examples Data sets merge SAS Procedures (Proc) Proc Freq Proc Means Demonstration: QUERYLOG.TXT
17
17 PROC FREQ data SummerSchool; input Gender $ Internship $ Enrollment $ Count @@; datalines; boys yes yes 35 boys yes no 29 boys no yes 14 boys no no 27 girls yes yes 32 girls yes no 10 girls no yes 53 girls no no 23 ; proc freq data=SummerSchool order=data; weight count; tables Internship*Enrollment / chisq; run;
18
18 PROC FREQ PROC SQL; CREATE VIEW WORK.SORT AS SELECT AnonID, QueryDate FROM SASUSER.IMPW_0012; QUIT; PROC freq DATA=WORK.SORT ORDER=INTERNAL; TABLES AnonID / NOPERCENT NOCUM SCORES=TABLE; TABLES QueryDate / NOPERCENT NOCUM SCORES=TABLE; RUN;
19
19 PROC MEANS PROC SQL; CREATE VIEW WORK.SORTTempTableSorted AS SELECT QueryTime, AnonID, QueryDate FROM SASUSER.IMPW_0012; QUIT; PROC MEANS DATA=WORK.SORTTempTableSorted (drop= _way_ _type_) FW=12 PRINTALLTYPES CHARTYPE NWAYVARDEF=DF MIN MAX N; VAR QueryTime; CLASS AnonID /ORDER=UNFORMATTED ASCENDING; CLASS QueryDate /ORDER=UNFORMATTED ASCENDING; OUTPUT OUT=QUERYLOG(LABEL="Summary Statistics for SASUSER.IMPW_0012") MIN()= MAX()= N()= / AUTONAME AUTOLABEL WAYS INHERIT; RUN;
20
20 Case Study – Building a Customer Data Mart Business Context To create an analytical model that predicts customers that have a high probability to leave the company. To create a data mart that reflects the properties and behaviors of customers. The data mart will also contain a variable that presents the fact that a customer has left the company at a certain point in time. Five SAS datasets Customer – demographic and customer baseline data Accounts – information customer accounts Leasing – data on leasing information Callcenter – transactional data on call center contacts Scores – transactional data of value segment scores The datasets have the unique CUSTID variable.
21
21 Data Mart Programs Create aggregation on a subject level of ACCOUNTS, LEASING, CALLCENTERS and SCORES tables ACCOUNT –Derive sums per customer –Calculate an account type split based on the balance LEASING –Summarize the value and the annual rate per customer CALLCENTER –Aggregate the call center data over all customers using the FREQ proc –Calculate the frequencies for the category COMPLAINT SCORES –Contains the VALUESEGMENT for each customer and month. If a customer cancelled his contract, he got an “8.lost”, or he got “1.Gold”, etc. –Create three output data sets: value segments in actual month, one month ago, and in a future month Join the resulting tables together and create a CUSTOMER DATA MART table.
22
22 Exercise 12 Use PROC FREQ to generate a summary statistics table from 6339EX12.xls dataset Hints Define path: –libname xlsdata ‘ \6339EX12.xls'; Define the worksheet as ‘weblog$’n Column: Page, Row: UserID Explore the data mart created by the code of Ch25.sas Use SAS Enterprise Guide to create joint tables from the dataset Try One-way Frequency and Summary Statistics functions with the ACCOUNTS data set
23
23 More Topics Multiple Interval-Scaled Observation per Subject Multiple Categorical per Subject Coding for Predictive Modeling Data Preparation for Multi-Rows-per-Subject and Longitudinal Data Marts Sampling Scoring and Automation Do’s and Don’ts
24
24 Base Procedures Guide Examples Elementary Statistics Procedures, Chapter 1 Reporting Procedures, Chapter 2 APPEND Procedure, Chapter 5 CALENDAR Procedure, Chapter 7 CHART Procedure, Chapter 9 COMPARE Procedure, Chapter 11 CORR Procedure, Chapter 15 FORMAT Procedure, Chapter 18 FORMS Procedure, Chapter 19 FREQ Procedure, Chapter 20 MEANS Procedure, Chapter 21 PLOT Procedure, Chapter 25 PRINT Procedure, Chapter 27 RANK Procedure, Chapter 29 SQL Procedure, Chapter 34 STANDARD Procedure, Chapter 35 TABULATE Procedure, Chapter 37 TIMEPLOT Procedure, Chapter 40 TRANSPOSE Procedure, Chapter 41 UNIVARIATE Procedure, Chapter 42
25
25 Base Usage Guide Examples What is the SAS System?, Chapter 1 Introduction to DATA Step Processing, Chapter 2 Starting with Raw Data, Chapter 3 Starting with SAS Data Sets, Chapter 4 Understanding DATA Step Processing, Chapter 5 Working with Numeric Variables, Chapter 6 Working with Character Variables, Chapter 7 Acting on Selected Observations, Chapter 8 Creating Subsets of Observations, Chapter 9 Grouped or Sorted Observations, Chapter 10 More than One Obs. in a Calculation, Chapter 11 Finding Shortcuts in Programming, Chapter 12 Working with Dates in the SAS System, Chapter 13 Concatenating SAS Data Sets, Chapter 15 Interleaving SAS Data Sets, Chapter 16 Merging SAS Data Sets, Chapter 17 Updating SAS Data Sets, Chapter 18 Manipulating SAS Data Sets, Chapter 19 Understanding and Enhancing Your Output, Chapter 20 Printing Detailed Reports, Chapter 24 Creating Summary Tables, Chapter 25 Mailing Labels and Other Forms, Chapter 26 Plotting Relationships Between Variables, Chap. 27 Charting the Values of a Variable, Chapter 28 Writing Output, Chapter 29 Customizing Output, Chapter 30
26
26 SQL Examples Demonstrates Basic SQL SELECT Statements Demonstrates Basic SQL CREATE Statements Demonstrates Basic SQL DELETE Statements Demonstrates Basic SQL Dictionary Statements Demonstrates Basic SQL Insert Statements Demonstrates SQL Join Performance with Macros Demonstrates Basic SQL JOIN Features Demonstrates SQL Set Operators Introduction to the SQL Procedure, Chapter 1 Retrieving Data with the SELECT Statement, Chapter 2 Datasets Used in SQL Usage Guide Demonstrates Basic SQL UPDATE Statements Demonstrates SQL View Capabilities Demonstrates SQL View Update Capabilities Demonstrates Recursive Joins with PROC SQL Demonstrates Nested Statistics with SQL Demonstrates CASE Statement with PROC SQL Demonstrates the Product of Columns with PROC SQL Demonstrates Merging Tables with Proc SQL Demonstrates Use of Macros with PROC SQL Demonstrates Use of Cartesian Product with PROC SQL More Uses of the Cartesian Product with PROC SQL Demonstrates Use of Subqueries with PROC SQL Demonstrates Relationship Between Data Step and SQL
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.