Simplifying Effective Data Transformation Via PROC TRANSPOSE
Arthur Li
INTRODUCTION

Why transform data? Different statistical procedures require different data shapes.

Dat1 contains 2 observations and 3 English test scores, E1-E3:

    Obs  Name  E1  E2  E3
      1  John  89  90  92
      2  Mary  92   .  81

Transposing the 2 x 3 matrix of scores gives a 3 x 2 matrix, Dat1_Transpose1. This data structure fits a paired t-test:

    Obs  Test  John  Mary
      1  E1      89    92
      2  E2      90     .
      3  E3      92    81

Alternatively, using Name as a BY variable gives Dat1_Transpose2, with one observation per student and test (Mary's missing E2 score is not kept). This data structure fits a t-test for independent samples:

    Obs  Name  Test  Score
      1  John  E1       89
      2  John  E2       90
      3  John  E3       92
      4  Mary  E1       92
      5  Mary  E3       81
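To make the two target shapes concrete, here is a minimal sketch in plain Python rather than SAS. It assumes each observation is a dict and uses None for a SAS missing value; the function names to_paired_layout and to_long_layout are hypothetical. As in Dat1_Transpose2, the long layout drops Mary's missing E2 score.

```python
# Plain-Python sketch (not SAS) of the two reshapes of Dat1.
# None stands in for a SAS missing value (".").

dat1 = [
    {"name": "John", "E1": 89, "E2": 90, "E3": 92},
    {"name": "Mary", "E1": 92, "E2": None, "E3": 81},
]
tests = ["E1", "E2", "E3"]


def to_paired_layout(rows, tests):
    """One row per test, one column per student (like Dat1_Transpose1)."""
    return [{"Test": t, **{r["name"]: r[t] for r in rows}} for t in tests]


def to_long_layout(rows, tests):
    """One row per student and test (like Dat1_Transpose2); missing scores are dropped."""
    return [
        {"Name": r["name"], "Test": t, "Score": r[t]}
        for r in rows
        for t in tests
        if r[t] is not None
    ]


transpose1 = to_paired_layout(dat1, tests)
transpose2 = to_long_layout(dat1, tests)
```

The paired layout keeps one column per subject so each test's scores sit side by side; the long layout keeps one score per row, which is what a grouping variable needs.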
There are two common methods for transforming data: ARRAY processing in a DATA step, and PROC TRANSPOSE.
The syntax of PROC TRANSPOSE:

    PROC TRANSPOSE <DATA=input-data-set>
                   <DELIMITER=delimiter>
                   <LABEL=label>
                   <LET>
                   <NAME=name>
                   <OUT=output-data-set>
                   <PREFIX=prefix>
                   <SUFFIX=suffix>;
        BY <DESCENDING> variable-1 <...<DESCENDING> variable-n>;
        COPY variable(s);
        ID variable;
        IDLABEL variable;
        VAR variable(s);
TRANSPOSING AN ENTIRE DATA SET

THE DEFAULT FORMAT OF TRANSPOSED DATA SETS

Dat1 now also contains an ID variable:

    Obs  Name  Id   E1  E2  E3
      1  John  A01  89  90  92
      2  Mary  A02  92   .  81

Name and Id are character variables; E1-E3 are numeric variables, and each has a variable label. Read in the data:

    data dat1;
        input name $ id $ e1-e3;
        label e1=English1 e2=English2 e3=English3;
        datalines;
    John A01 89 90 92
    Mary A02 92 . 81
    ;
    run;
Only the PROC TRANSPOSE statement is used; OUT= specifies the name of the transposed data set:

    proc transpose data=dat1 out=dat1_out1;
    run;

    proc print data=dat1_out1;
    run;

Output (dat1_out1):

    Obs  _NAME_  _LABEL_   COL1  COL2
      1  e1      English1    89    92
      2  e2      English2    90     .
      3  e3      English3    92    81

By default, when you do not name the variables to transpose, all numeric variables in the input data set are transposed; the character variables Name and Id do not appear in the output.

_NAME_, _LABEL_, COL1, and COL2 are default variable names. _NAME_ contains the names of the transposed variables, _LABEL_ contains their labels (E1-E3 have labels), and COL1 and COL2 contain the values from the first and second input observations.

How can you control the names of these variables?
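The default behavior above can be modeled in a few lines of plain Python. This is a hypothetical sketch of the output shape, not how SAS implements the procedure. It assumes a column counts as numeric when all of its non-missing values are ints or floats, and it creates _LABEL_ only when at least one transposed variable has a label; default_transpose is an illustrative name.

```python
# Hypothetical plain-Python model of the default PROC TRANSPOSE output
# (no VAR, BY, or ID statements). Not SAS code; None stands in for ".".

def default_transpose(rows, labels):
    """Transpose every numeric variable; character variables are left out."""
    # Assumption: numeric = every non-missing value is an int or float.
    numeric = [
        col for col in rows[0]
        if all(isinstance(r[col], (int, float)) for r in rows if r[col] is not None)
    ]
    has_labels = any(col in labels for col in numeric)
    out = []
    for col in numeric:
        new_row = {"_NAME_": col}
        if has_labels:
            new_row["_LABEL_"] = labels.get(col)
        for i, r in enumerate(rows, start=1):
            new_row[f"COL{i}"] = r[col]  # one COLn per input observation
        out.append(new_row)
    return out


dat1 = [
    {"name": "John", "id": "A01", "e1": 89, "e2": 90, "e3": 92},
    {"name": "Mary", "id": "A02", "e1": 92, "e2": None, "e3": 81},
]
labels = {"e1": "English1", "e2": "English2", "e3": "English3"}
dat1_out1 = default_transpose(dat1, labels)
```

Each input observation becomes a column and each numeric variable becomes a row, which is why two students produce COL1 and COL2.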
CONTROLLING THE NAMES OF THE VARIABLES IN THE TRANSPOSED DATA SET

    proc transpose data=dat1 out=dat1_out2
                   name=varname
                   label=labelname
                   prefix=score_;
        var e1-e3;
    run;

Output (dat1_out2):

    Obs  varname  labelname  score_1  score_2
      1  e1       English1        89       92
      2  e2       English2        90        .
      3  e3       English3        92       81

The VAR statement names the variables to transpose. NAME= renames _NAME_ to varname, LABEL= renames _LABEL_ to labelname, and PREFIX= replaces the default prefix COL, so the transposed variables become score_1 and score_2. The SUFFIX= option attaches a suffix to the names of the transposed variables.

score_1 contains John's scores and score_2 contains Mary's scores. How can you change the variable names to score_John and score_Mary?
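The renaming options can be sketched the same way. In this plain-Python model, the keyword arguments name, label, and prefix play the roles of NAME=, LABEL=, and PREFIX=, and var plays the role of the VAR statement; transpose_vars is a hypothetical helper, and SUFFIX= is deliberately not modeled.

```python
# Plain-Python sketch (not SAS) of NAME=, LABEL=, PREFIX= and the VAR statement.

dat1 = [
    {"name": "John", "id": "A01", "e1": 89, "e2": 90, "e3": 92},
    {"name": "Mary", "id": "A02", "e1": 92, "e2": None, "e3": 81},
]
labels = {"e1": "English1", "e2": "English2", "e3": "English3"}


def transpose_vars(rows, var, labels, name="_NAME_", label="_LABEL_", prefix="COL"):
    """Transpose only the variables in `var`, using caller-chosen output names."""
    out = []
    for col in var:
        new_row = {name: col, label: labels.get(col)}
        for i, r in enumerate(rows, start=1):
            new_row[f"{prefix}{i}"] = r[col]  # prefix + observation number
        out.append(new_row)
    return out


dat1_out2 = transpose_vars(dat1, ["e1", "e2", "e3"], labels,
                           name="varname", label="labelname", prefix="score_")
```

The prefix only changes the stem; the numeric part still comes from the observation's position, which is the limitation the ID statement addresses.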
USING THE ID STATEMENT TO LABEL THE NAMES OF THE TRANSPOSED VARIABLES

The ID statement names a variable in the input data set whose values are used to name the transposed variables:

    proc transpose data=dat1 out=dat1_out3
                   label=labelname
                   name=varname
                   prefix=score_;
        var e1-e3;
        id name;
    run;

Output (dat1_out3):

    Obs  varname  labelname  score_John  score_Mary
      1  e1       English1           89          92
      2  e2       English2           90           .
      3  e3       English3           92          81

When the ID statement lists more than one variable, their values are combined to form each name; the DELIMITER= option (written delim=_ here) places an underscore between them:

    proc transpose data=dat1 out=dat1_out4
                   label=labelname
                   name=varname
                   delim=_;
        var e1-e3;
        id name id;
    run;

Output (dat1_out4):

    Obs  varname  labelname  John_A01  Mary_A02
      1  e1       English1         89        92
      2  e2       English2         90         .
      3  e3       English3         92        81

The IDLABEL statement names a variable whose values become the labels of the transposed variables:

    proc transpose data=dat1 out=dat1_out5
                   label=labelname
                   name=varname
                   prefix=score_;
        var e1-e3;
        id name;
        idlabel id;
    run;

PROC PRINT shows the same values for dat1_out5 as for dat1_out3. PROC CONTENTS shows that the values of Id are now the labels of score_John and score_Mary:

    proc contents data=dat1_out5;
    run;

    Alphabetic List of Variables and Attributes
    #  Variable    Type  Label
    2  labelname   Char  LABEL OF FORMER VARIABLE
    3  score_John  Num   A01
    4  score_Mary  Num   A02
    1  varname     Char  NAME OF FORMER VARIABLE
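The naming rules of the ID statement can be modeled as simple string building. This plain-Python sketch assumes the name is the prefix followed by the ID values joined by the delimiter, and it ignores the rules SAS applies to turn arbitrary ID values into valid variable names; id_names, transpose_with_id, and id_labels are hypothetical helpers, and the fixed key "varname" stands for NAME=varname.

```python
# Plain-Python sketch (not SAS) of the ID and IDLABEL statements.

dat1 = [
    {"name": "John", "id": "A01", "e1": 89, "e2": 90, "e3": 92},
    {"name": "Mary", "id": "A02", "e1": 92, "e2": None, "e3": 81},
]


def id_names(rows, id_vars, prefix="", delimiter=""):
    """One transposed-variable name per input observation, built from ID values."""
    return [prefix + delimiter.join(str(r[v]) for v in id_vars) for r in rows]


def transpose_with_id(rows, var, id_vars, prefix="", delimiter=""):
    names = id_names(rows, id_vars, prefix, delimiter)
    out = []
    for col in var:
        new_row = {"varname": col}  # plays the role of NAME=varname
        for new_name, r in zip(names, rows):
            new_row[new_name] = r[col]
        out.append(new_row)
    return out


def id_labels(rows, id_vars, idlabel, prefix="", delimiter=""):
    """IDLABEL: map each transposed-variable name to its label value."""
    names = id_names(rows, id_vars, prefix, delimiter)
    return dict(zip(names, (r[idlabel] for r in rows)))


dat1_out3 = transpose_with_id(dat1, ["e1", "e2", "e3"], ["name"], prefix="score_")
```

For example, `id_names(dat1, ["name", "id"], delimiter="_")` yields the dat1_out4 column names.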
TRANSPOSING BY-GROUPS

THE DEFAULT FORMAT FOR TRANSPOSING BY-GROUPS

The input data set must first be sorted by the BY variable, Name:

    proc sort data=dat1 out=dat1_sort;
        by name;
    run;

    proc transpose data=dat1_sort out=dat1_out6;
        by name;
    run;

Output (dat1_out6):

    Obs  name  _NAME_  _LABEL_   COL1
      1  John  e1      English1    89
      2  John  e2      English2    90
      3  John  e3      English3    92
      4  Mary  e1      English1    92
      5  Mary  e2      English2     .
      6  Mary  e3      English3    81

The 3 transposed variables (E1-E3) in each of the 2 BY groups produce 3 x 2 = 6 observations. Each BY group contains only 1 observation, so only one column, COL1, is created.
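A BY-group transpose is the same operation applied separately to each group. In this hypothetical plain-Python model, itertools.groupby plays the role of the BY statement, so, like BY processing, it needs input sorted by the BY variable. The model assumes the number of COL variables equals the size of the largest group, with missing values padding the smaller groups.

```python
# Plain-Python sketch (not SAS) of PROC TRANSPOSE with a BY statement.
from itertools import groupby

dat1 = [
    {"name": "John", "id": "A01", "e1": 89, "e2": 90, "e3": 92},
    {"name": "Mary", "id": "A02", "e1": 92, "e2": None, "e3": 81},
]
labels = {"e1": "English1", "e2": "English2", "e3": "English3"}


def transpose_by(rows, by, var, labels):
    """Transpose `var` within each BY group; rows must be sorted by `by`."""
    groups = [(key, list(grp)) for key, grp in groupby(rows, key=lambda r: r[by])]
    width = max(len(grp) for _, grp in groups)  # COL1..COLn, n = largest group
    out = []
    for key, grp in groups:
        for col in var:
            new_row = {by: key, "_NAME_": col, "_LABEL_": labels.get(col)}
            for i in range(width):
                new_row[f"COL{i + 1}"] = grp[i][col] if i < len(grp) else None
            out.append(new_row)
    return out


dat1_sort = sorted(dat1, key=lambda r: r["name"])  # like PROC SORT BY name
dat1_out6 = transpose_by(dat1_sort, "name", ["e1", "e2", "e3"], labels)
```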
USE THE COPY STATEMENT TO COPY VARIABLES FROM THE INPUT DATA SET

    proc transpose data=dat1_sort
                   out=dat1_out7 (rename=(col1=SCORE _label_=TEST)
                                  drop=_name_
                                  where=(score ne .));
        by name;
        copy id;
    run;

Output (dat1_out7):

    Obs  name  id   TEST      SCORE
      1  John  A01  English1     89
      2  John       English2     90
      3  John       English3     92
      4  Mary  A02  English1     92
      5  Mary       English3     81

The COPY statement copies variables from the input data set directly to the transposed data set without transposing them. The input data set has N = 2 observations, so only 2 values of Id are copied, one into the first observation of each BY group; the other observations have missing values of Id.

Three data set options are applied to the output data set: RENAME= renames COL1 to SCORE and _LABEL_ to TEST, DROP= removes _NAME_, and WHERE= removes the observation whose score is missing (Mary's E2). Renaming _LABEL_ to TEST is equivalent to specifying LABEL=TEST in the PROC TRANSPOSE statement.
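The COPY behavior seen in dat1_out7 can be reproduced with a small plain-Python model. It assumes that the i-th output row of a BY group receives the COPY value from the i-th input row of that group, and is missing when there is no such row; this is a model consistent with the output above, not a description of SAS internals. The list comprehension at the end emulates the RENAME=, DROP=, and WHERE= data set options; transpose_copy is a hypothetical name.

```python
# Plain-Python sketch (not SAS) of BY + COPY, followed by the OUT= data set options.
from itertools import groupby

dat1_sort = [  # Dat1 sorted by name
    {"name": "John", "id": "A01", "e1": 89, "e2": 90, "e3": 92},
    {"name": "Mary", "id": "A02", "e1": 92, "e2": None, "e3": 81},
]
labels = {"e1": "English1", "e2": "English2", "e3": "English3"}


def transpose_copy(rows, by, var, labels, copy):
    out = []
    for key, grp in groupby(rows, key=lambda r: r[by]):
        grp = list(grp)
        for i, col in enumerate(var):
            new_row = {
                by: key,
                # COPY: i-th output row takes the i-th input row's value, if any
                copy: grp[i][copy] if i < len(grp) else None,
                "_NAME_": col,
                "_LABEL_": labels.get(col),
            }
            for j, r in enumerate(grp, start=1):
                new_row[f"COL{j}"] = r[col]
            out.append(new_row)
    return out


raw = transpose_copy(dat1_sort, "name", ["e1", "e2", "e3"], labels, "id")
# RENAME=(col1=SCORE _label_=TEST), DROP=_name_, WHERE=(score ne .)
dat1_out7 = [
    {"name": r["name"], "id": r["id"], "TEST": r["_LABEL_"], "SCORE": r["COL1"]}
    for r in raw
    if r["COL1"] is not None
]
```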
SITUATIONS FOR USING THE ID STATEMENT FOR TRANSPOSING BY-GROUPS

The ID statement specifies the variable from the input data set whose values rename the transposed variables. In dat1_out7, COPY carried Id along correctly. Using Id as the ID variable instead is a misuse:

    proc transpose data=dat1_sort out=dat1_out8(drop=_name_) label=TEST;
        by name;
        id id;
    run;

Output (dat1_out8):

    Obs  name  TEST      A01  A02
      1  John  English1   89    .
      2  John  English2   90    .
      3  John  English3   92    .
      4  Mary  English1    .   92
      5  Mary  English2    .    .
      6  Mary  English3    .   81

Here the ID variable, which contains two values, names the transposed variable that was supposed to occupy only one column, so the scores are split across two half-empty columns.
Dat2 stores the scores in long form, one observation per exam. Mary did not take exam 2:

    Obs  Name  Id   Exam  Score
      1  John  A01     1     89
      2  John  A01     2     90
      3  John  A01     3     92
      4  Mary  A02     1     92
      5  Mary  A02     3     81

    data dat2;
        input name $ id $ exam score;
        datalines;
    John A01 1 89
    John A01 2 90
    John A01 3 92
    Mary A02 1 92
    Mary A02 3 81
    ;
    run;

Transposing Score by Name without an ID statement:

    proc sort data=dat2 out=dat2_sort;
        by name;
    run;

    proc transpose data=dat2_sort out=dat2_out1;
        var score;
        by name;
    run;

Output (dat2_out1):

    Obs  name  _NAME_  COL1  COL2  COL3
      1  John  score     89    90    92
      2  Mary  score     92    81     .

John's BY group has 3 observations and Mary's has 2, so three COL variables are created and filled in observation order. As a result, Mary's exam 3 score (81) lands in COL2, the column that holds John's exam 2 score. Using Exam as the ID variable places each score in the column for its exam:

    proc transpose data=dat2_sort out=dat2_out2 (drop=_name_) prefix=test_;
        var score;
        by name;
        id exam;
    run;

Output (dat2_out2):

    Obs  name  test_1  test_2  test_3
      1  John      89      90      92
      2  Mary      92       .      81
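The difference between positional filling and ID-based placement is easy to see in a plain-Python model. This hypothetical sketch assumes rows are already sorted by the BY variable and orders ID columns by first appearance of each ID value; transpose_positional and transpose_with_id are illustrative names, not SAS features.

```python
# Plain-Python sketch (not SAS): BY-group transpose without and with an ID variable.
from itertools import groupby

dat2 = [
    {"name": "John", "exam": 1, "score": 89},
    {"name": "John", "exam": 2, "score": 90},
    {"name": "John", "exam": 3, "score": 92},
    {"name": "Mary", "exam": 1, "score": 92},
    {"name": "Mary", "exam": 3, "score": 81},
]


def by_groups(rows, by):
    return [(k, list(g)) for k, g in groupby(rows, key=lambda r: r[by])]


def transpose_positional(rows, by, var):
    """No ID statement: values fill COL1, COL2, ... in observation order."""
    groups = by_groups(rows, by)
    width = max(len(g) for _, g in groups)
    return [
        {by: k, **{f"COL{i + 1}": (g[i][var] if i < len(g) else None)
                   for i in range(width)}}
        for k, g in groups
    ]


def transpose_with_id(rows, by, var, id_var, prefix):
    """ID statement: each value goes to the column named after its ID value."""
    names = list(dict.fromkeys(f"{prefix}{r[id_var]}" for r in rows))
    out = []
    for k, g in by_groups(rows, by):
        new_row = {by: k, **{n: None for n in names}}
        for r in g:
            new_row[f"{prefix}{r[id_var]}"] = r[var]
        out.append(new_row)
    return out


dat2_out1 = transpose_positional(dat2, "name", "score")
dat2_out2 = transpose_with_id(dat2, "name", "score", "exam", "test_")
```

The positional version silently misaligns Mary's exam 3 score; the ID version cannot, because the column is chosen by the exam number rather than by the row's position in its group.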
HANDLING DUPLICATES BY USING THE LET OPTION

Dat3 contains duplicate ID values: John and Mary each have two scores for exam 3.

    Obs  Name  Id   Exam  Score
      1  John  A01     1     89
      2  John  A01     2     90
      3  John  A01     3     92
      4  John  A01     3     95
      5  Mary  A02     1     92
      6  Mary  A02     3     81
      7  Mary  A02     3     85

    data dat3;
        input name $ id $ exam score;
        datalines;
    John A01 1 89
    John A01 2 90
    John A01 3 92
    John A01 3 95
    Mary A02 1 92
    Mary A02 3 81
    Mary A02 3 85
    ;
    run;

    proc transpose data=dat3 out=dat3_out1(drop=_name_) prefix=test_;
        var score;
        by name;
        id exam;
    run;

The step fails because the ID value test_3 occurs twice within each BY group. The log:

    198  proc transpose data=dat3 out=dat3_out1(drop=_name_) prefix=test_;
    201  var score; by name; id exam;
    205  run;

    ERROR: The ID value "test_3" occurs twice in the same BY group.
    NOTE: The above message was for the following BY group:
          name=John
    ERROR: The ID value "test_3" occurs twice in the same BY group.
    NOTE: The above message was for the following BY group:
          name=Mary
    ERROR: All BY groups were bad.
    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: There were 7 observations read from the data set WORK.DAT3.
    WARNING: The data set WORK.DAT3_OUT1 may be incomplete.  When this step was stopped there were 0 observations and 0 variables.
    WARNING: Data set WORK.DAT3_OUT1 was not replaced because this step was stopped.
    NOTE: PROCEDURE TRANSPOSE used (Total process time):
          real time           ... seconds
          cpu time            ... seconds
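Before running a transpose with an ID statement, you can check for the condition behind this error. The following plain-Python sketch is a hypothetical pre-check, not a SAS feature: it reports, for each BY group, any transposed-variable name that would be produced more than once. It assumes rows are sorted by the BY variable and that names are formed as prefix plus ID value.

```python
# Plain-Python sketch (not SAS): find duplicate ID values within each BY group.
from itertools import groupby

dat3 = [
    {"name": "John", "exam": 1, "score": 89},
    {"name": "John", "exam": 2, "score": 90},
    {"name": "John", "exam": 3, "score": 92},
    {"name": "John", "exam": 3, "score": 95},
    {"name": "Mary", "exam": 1, "score": 92},
    {"name": "Mary", "exam": 3, "score": 81},
    {"name": "Mary", "exam": 3, "score": 85},
]


def check_unique_ids(rows, by, id_var, prefix=""):
    """Return {BY value: [duplicated names]}; an empty dict means no duplicates."""
    problems = {}
    for key, grp in groupby(rows, key=lambda r: r[by]):
        seen, dups = set(), []
        for r in grp:
            new_name = f"{prefix}{r[id_var]}"
            if new_name in seen and new_name not in dups:
                dups.append(new_name)
            seen.add(new_name)
        if dups:
            problems[key] = dups
    return problems


problems = check_unique_ids(dat3, "name", "exam", prefix="test_")
```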
The LET option keeps the last occurrence of a particular ID value within either the entire data set or a BY group. The sort order of the input data set therefore determines which duplicate is kept.

To keep the maximum score within each exam, sort Score in ascending order so that the largest score comes last:

    proc sort data=dat3 out=dat3_sort1;
        by name exam score;
    run;

    proc transpose data=dat3_sort1 out=dat3_out1(drop=_name_) prefix=test_ let;
        var score;
        by name;
        id exam;
    run;

Output (dat3_out1):

    Obs  name  test_1  test_2  test_3
      1  John      89      90      95
      2  Mary      92       .      85

To keep the minimum score within each exam, sort Score in descending order so that the smallest score comes last:

    proc sort data=dat3 out=dat3_sort2;
        by name exam descending score;
    run;

    proc transpose data=dat3_sort2 out=dat3_out2(drop=_name_) prefix=test_ let;
        var score;
        by name;
        id exam;
    run;

Output (dat3_out2):

    Obs  name  test_1  test_2  test_3
      1  John      89      90      92
      2  Mary      92       .      81
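The "last occurrence wins" rule combined with a sort can be modeled in plain Python. In this hypothetical sketch, transpose_let overwrites earlier values when an ID value repeats, and Python's sorted() stands in for PROC SORT; negating the score emulates DESCENDING score, which assumes no missing scores.

```python
# Plain-Python sketch (not SAS) of the LET option: a later duplicate overwrites an earlier one.
from itertools import groupby

dat3 = [
    {"name": "John", "exam": 1, "score": 89},
    {"name": "John", "exam": 2, "score": 90},
    {"name": "John", "exam": 3, "score": 92},
    {"name": "John", "exam": 3, "score": 95},
    {"name": "Mary", "exam": 1, "score": 92},
    {"name": "Mary", "exam": 3, "score": 81},
    {"name": "Mary", "exam": 3, "score": 85},
]


def transpose_let(rows, by, var, id_var, prefix):
    names = list(dict.fromkeys(f"{prefix}{r[id_var]}" for r in rows))
    out = []
    for key, grp in groupby(rows, key=lambda r: r[by]):
        new_row = {by: key, **{n: None for n in names}}
        for r in grp:  # the last occurrence of each ID value wins
            new_row[f"{prefix}{r[id_var]}"] = r[var]
        out.append(new_row)
    return out


# BY name exam score;  -> largest score last, so LET keeps the maximum
sort_max = sorted(dat3, key=lambda r: (r["name"], r["exam"], r["score"]))
# BY name exam DESCENDING score;  -> smallest score last, so LET keeps the minimum
sort_min = sorted(dat3, key=lambda r: (r["name"], r["exam"], -r["score"]))

dat3_out1 = transpose_let(sort_max, "name", "score", "exam", "test_")
dat3_out2 = transpose_let(sort_min, "name", "score", "exam", "test_")
```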
SITUATIONS FOR TRANSPOSING DATA MORE THAN ONCE

Dat4 contains three English scores (E1-E3) and three math scores (M1-M3) for each student:

    Obs  Name  E1  E2  E3  M1  M2  M3
      1  John  89  90  92  78   …   …
      2  Mary  92   .  81  76  91   …

The goal is Dat4_Transpose, with one observation per test number and one column per student and subject:

    Obs  Test_num  John_E  John_M  Mary_E  Mary_M
      1  1             89      78      92      76
      2  2             90       …       .      91
      3  3             92       …      81       …

A single PROC TRANSPOSE cannot do this. To transpose Dat4 into Dat4_Transpose, we need a "transitional" data set with one observation per student, subject, and test number.

Step 1: transpose Dat4 by Name, so that each score becomes an observation:

    proc sort data=dat4 out=dat4_sort1;
        by name;
    run;

    proc transpose data=dat4_sort1 out=dat4_out1;
        by name;
    run;

Output (dat4_out1):

    Obs  Name  _NAME_  COL1
      1  John  E1        89
      2  John  E2        90
      3  John  E3        92
      4  John  M1        78
      5  John  M2         …
      6  John  M3         …
      7  Mary  E1        92
      8  Mary  E2         .
      9  Mary  E3        81
     10  Mary  M1        76
     11  Mary  M2        91
     12  Mary  M3         …

Step 2: split _NAME_ into the test number and the class (subject). Dat4_out1a has the same 12 observations as dat4_out1, plus Test_num and Class:

    data dat4_out1a;
        set dat4_out1;
        test_num = substr(_name_, 2);     /* "E1" -> "1" */
        class    = substr(_name_, 1, 1);  /* "E1" -> "E" */
    run;

Step 3: sort by Test_num and Name. The result is the transitional data set:

    proc sort data=dat4_out1a out=dat4_sort2;
        by test_num name;
    run;

Output (dat4_sort2):

    Obs  Name  _NAME_  COL1  Test_num  Class
      1  John  E1        89  1         E
      2  John  M1        78  1         M
      3  Mary  E1        92  1         E
      4  Mary  M1        76  1         M
      5  John  E2        90  2         E
      6  John  M2         …  2         M
      7  Mary  E2         .  2         E
      8  Mary  M2        91  2         M
      9  John  E3        92  3         E
     10  John  M3         …  3         M
     11  Mary  E3        81  3         E
     12  Mary  M3         …  3         M

Step 4: transpose by Test_num, using Name and Class as ID variables joined by an underscore. The result is Dat4_Transpose:

    proc transpose data=dat4_sort2 out=dat4_out2(drop=_name_) delimiter=_;
        by test_num;
        var col1;
        id name class;
    run;
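The four steps above can be followed end to end in a plain-Python model. This hypothetical sketch, transpose_twice, is not SAS; it mirrors each PROC step in a comment. The test data uses only the Dat4 values that survive in the tables above (E1-E3 and M1), so M2 and M3 are left out. Because test_num is a character value here, as in the SAS DATA step, ten or more tests would sort as "1", "10", "2", and so on.

```python
# Plain-Python sketch (not SAS) of transposing twice: wide -> long -> wide.
from itertools import groupby


def transpose_twice(rows, id_col="Name"):
    # Step 1: like PROC TRANSPOSE BY name: one row per (student, score column).
    long_rows = [
        {"Name": r[id_col], "_NAME_": col, "COL1": value}
        for r in sorted(rows, key=lambda row: row[id_col])
        for col, value in r.items()
        if col != id_col
    ]
    # Step 2: split _NAME_ ("E1") into class ("E") and test number ("1").
    for r in long_rows:
        r["test_num"] = r["_NAME_"][1:]
        r["class"] = r["_NAME_"][:1]
    # Step 3: sort by test number, then name (list.sort is stable, like PROC SORT).
    long_rows.sort(key=lambda r: (r["test_num"], r["Name"]))
    # Step 4: like PROC TRANSPOSE BY test_num; ID name class; DELIMITER=_.
    names = list(dict.fromkeys(f'{r["Name"]}_{r["class"]}' for r in long_rows))
    out = []
    for num, grp in groupby(long_rows, key=lambda r: r["test_num"]):
        new_row = {"test_num": num, **dict.fromkeys(names)}  # unmatched cells stay None
        for r in grp:
            new_row[f'{r["Name"]}_{r["class"]}'] = r["COL1"]
        out.append(new_row)
    return out


dat4_subset = [
    {"Name": "John", "E1": 89, "E2": 90, "E3": 92, "M1": 78},
    {"Name": "Mary", "E1": 92, "E2": None, "E3": 81, "M1": 76},
]
dat4_transpose = transpose_twice(dat4_subset)
```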
CONCLUSION

PROC TRANSPOSE is a powerful procedure for transposing data. The key to a successful transposition is knowing when to use its different options and statements.

ACKNOWLEDGEMENT

I would like to thank Jerry Leonard, Technical Support Analyst at SAS Technical Support, for his valuable programming suggestions and insight. I would also like to thank SANDS for inviting me to present this paper.

CONTACT INFORMATION

Arthur Li
City of Hope
Division of Information Science
1500 East Duarte Road
Duarte, CA