Advanced Programming Techniques Using the DS2 Procedure


1 Advanced Programming Techniques Using the DS2 Procedure
Viraj Kumbhakarna

2 Disclaimer MUFG Union Bank N.A.
“The contents of the paper herein are solely the author’s thoughts and opinions, which do not represent those of MUFG Union Bank N.A. The bank does not endorse, recommend, or promote any of the computing architectures, platforms, software, programming techniques or styles referenced in this paper.”

3 Agenda
Table of Contents
Introduction to DS2
Comparison of SAS DATA step and PROC DS2
DS2 run environment and running a DS2 program
Construct of a DS2 program
Program definition and syntax
DS2 program semantics
Methods
Packages

4 Agenda
Table of Contents
Hash package
FCMP package
SQLSTMT package
SAS Federated Query Language (FedSQL)
Matrix package
Threaded processing
Conclusion
Questions

5 Introduction
What is DS2? DS2 is a SAS proprietary programming language appropriate for advanced data manipulation and data modeling applications.
Important features of PROC DS2:
Additional data types
ANSI SQL types
Programming structure elements
User-defined methods and packages
Embedded FedSQL
Note: The DS2 procedure enables you to submit DS2 language statements from a Base SAS session.

6 Running DS2 Programs
DS2 environments: SAS® Federation Server, SAS® LASR Analytic Server, SAS® High-Performance Analytics, SAS® Embedded Process, SAS® Enterprise Miner, SAS® Decision Services
Supported data sources: Aster, DB2 for UNIX and PC OS, Greenplum, Hadoop (Hive and HDMD), Memory Data Store (MDS), MySQL, Netezza, ODBC-compliant databases, Oracle, PostgreSQL, Teradata
Ways to run a DS2 program:
Base SAS interface, using the DS2 procedure.
Directly to a data source, using the SAS In-Database Code Accelerator.
Directly to the SAS Federation Server, using the SAS LIBNAME engine for SAS Federation Server.
The HPDS2 procedure, which submits DS2 language statements from a SAS client.
Note: The above methods require Base SAS. Some might require an additional software license (e.g. SAS/ACCESS).

7 When to use DS2?
Precision: you require the precision that results from using the newly supported data types.
FedSQL: you want to execute SAS FedSQL from within the DS2 program.
Expressions: you benefit from using the new expressions, or want to write methods or packages.
HPA: you need to execute outside a SAS session, e.g. on the High-Performance Analytics Server or the SAS Federation Server.
Threads: you want to take advantage of threaded processing in products that support it.

8 Similarities between DS2 and DATA STEP
Language elements:
SAS formats
SAS functions
SAS statements: DATA, SET, KEEP, DROP, RUN, BY, RETAIN, PUT, OUTPUT, DO, IF-THEN/ELSE, Sum, etc.
DATA step keywords are included in the list of DS2 keywords.
DATA step tasks that can be performed using DS2:
Process variable arrays, multi-dimensional arrays, and hash tables
Convert between data types
Work with expressions
Calculate date and time values
Process missing values

9 Differences between DS2 and DATA STEP
Topic | DATA step | DS2
Programming paradigm | Executable code resides in the DATA step and PROC step. | Executable code resides in methods.
Scope | No concept of scope; all variables are global. | Variables declared in a method have local scope; all other identifiers have global scope.
Declaring variables | Variables are created by assignment; the data type is determined by the context of first use. | Variables are declared using the DECLARE statement, which also determines the data type and scope attributes of the variable.
Key and reserved words | No reserved keywords. | Keywords are reserved words.
Quotation marks | Single or double quotation marks can delimit a character constant. | ANSI SQL quoting standards: single quotation marks delimit a character constant; double quotation marks delimit an identifier.

10 Differences between DS2 and DATA STEP
Topic | DATA step | DS2
PUT statement | Supports column and line parameters. | Column and line parameters are not supported.
Variable attributes | Attributes of variables are defined using LENGTH, FORMAT, INFORMAT, LABEL, and ATTRIB. | Variable attributes are established using the DECLARE statement and its HAVING clause.
Data types | Two data types are supported: numeric and character. | Most ANSI SQL data types are supported.
Missing and null values | Supports only missing values; there is no concept of a null value. | Supports both missing and null values.
SQL language statements | Available in PROC SQL, not in the DATA step. | SQL SELECT statements can be written directly in, and used as input for, a DS2 SET statement.

11 Construct of DS2 program
A DS2 program implicitly or explicitly contains three methods:
    data _null_;
      method init();   /* 1 */
      end;
      method run();    /* 2 */
      end;
      method term();   /* 3 */
      end;
    enddata;
    run;
1. All initializations take place in the INIT method.
2. Execution control depends on the status of the input statement in the RUN method.
3. The TERM method executes any final statements.
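The order in which the three system methods run can be seen with a minimal sketch such as the one below. It is an illustration only: the variable name n_runs and the message text are made up for this example, and because RUN contains no SET statement here, it is expected to execute a single time.

```sas
proc ds2;
data _null_;
  dcl int n_runs;               /* declared outside any method: global scope */

  method init();
    n_runs = 0;                 /* initialization belongs in INIT */
    put 'INIT: executes first, one time';
  end;

  method run();
    /* No SET statement, so there is no implicit loop over input rows. */
    n_runs = n_runs + 1;
    put 'RUN: ' n_runs=;
  end;

  method term();
    put 'TERM: executes last, one time; ' n_runs=;
  end;
enddata;
run;
quit;
```

If RUN instead contained a SET statement, it would execute once per input row, while INIT and TERM would still each execute once.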

12 Datatypes in DS2
Character: CHAR(n) and VARCHAR(n) use 1 byte per character. NCHAR(n) and NVARCHAR(n) use 2 or 4 bytes per multi-byte character.
Fractional numeric: DECIMAL(p,s) has exact precision. DOUBLE, FLOAT(p), and REAL are considered approximate.
Integer numeric: TINYINT (-128 to 127), 1 byte. SMALLINT (-32,768 to 32,767), 2 bytes. INTEGER, 4 bytes. BIGINT, 8 bytes.
Binary: BINARY(n) is fixed length. VARBINARY(n) is variable length.
Date and time: TO_DATE casts a SAS numeric date to a DS2 DATE. TO_TIME casts a SAS numeric time to a DS2 TIME. Related conversion functions: TO_TIMESTAMP, TO_DOUBLE.

13 Automatic Conversions
A type conversion can lead to the loss of data or precision, or both.
Data type conversions are especially critical if you save DS2 data types in SAS data sets. SAS data sets support only two data types, so DS2 variables might be automatically converted to either fixed-length character or numeric double.
Automatic conversion occurs when:
A character type is used in a numeric expression.
A numeric type is used in a character expression.
A call to a method supplies an argument that does not exactly match the signature of the method.
The types of the operands differ in a logical, arithmetic, relational, or concatenation expression.
A DS2 data type is saved to a data source that doesn't support the type.

14 Methods
Methods are the structural building blocks of DS2 programs, where executable code resides.
    data _null_;
      method init();
      end;
    enddata;
System methods: predefined; they provide the structural and functional framework for a program to execute.
User-defined methods: created by the user for reuse; similar to functions, procedures, and subroutines in other languages.

15 Methods
A method is a module that contains a sequence of instructions to perform a specific task.
Modular programming enhances readability:
Break up a complex problem into smaller modules
Easier to design, implement, and test
Code reuse can shorten development time
Standardize repeated or business-specific tasks
Improved understandability by testers and other programmers

16 System Methods
INIT(): automatically executes one time, as the first method of a program.
RUN(): functional equivalent of the DATA step; runs as an implicit loop if the method contains a SET statement.
SETPARMS(): executes one time, when called from a data program, to initialize the values of a parameterized thread.
TERM(): automatically executes one time, as the last method of a program.

17 User Defined Methods
Topic | Description
Scope | The method name has global scope within the programming block in which it is defined. Each method creates a method scope in which local variables can be defined.
Method parameters | A user-defined method can accept arguments by value (the argument value is copied to the method) or by reference (IN_OUT parameter; the method modifies the value of the argument variable).
Return values | A user-defined method can return a value only if its signature contains no IN_OUT parameters.
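The difference between passing by value and passing by reference can be sketched as follows. The method names addPct and applyPct and all variable names are hypothetical, chosen only for this illustration. The first method returns a new value and leaves its argument untouched; the second uses an IN_OUT parameter, so it has no RETURNS clause and modifies the caller's variable directly.

```sas
proc ds2;
data _null_;
  /* By value: amt is a copy, and the result comes back via RETURN. */
  method addPct(double amt, double pct) returns double;
    return amt * (1 + pct);
  end;

  /* By reference: IN_OUT lets the method change the caller's variable. */
  method applyPct(in_out double amt, double pct);
    amt = amt * (1 + pct);
  end;

  method init();
    dcl double price gross;
    price = 100;
    gross = addPct(price, 0.25);   /* price itself is unchanged here */
    put 'After addPct: ' price= gross=;
    applyPct(price, 0.25);         /* price is updated in place */
    put 'After applyPct: ' price=;
  end;
enddata;
run;
quit;
```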

18 User Defined Methods
Topic | Description
Method overload | Methods that have the same name can exist in the same scope if their argument lists are unique.
    method squareIt(int value) returns int;
      return value**2;
    end;
    method squareIt(decimal(6,2) value) returns decimal(8,4);
      return value**2;
    end;
Calling | System methods run automatically; user-defined methods need to be called, and may be called as many times as needed.
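A complete program that exercises overloading might look like the sketch below. It is a simplified variant rather than the slide's own code: it uses INT and DOUBLE signatures and the parameter name x, on the assumption that DS2 selects the overload whose parameter type matches the argument's declared type.

```sas
proc ds2;
data _null_;
  /* Same method name, different argument type. */
  method squareIt(int x) returns int;
    return x * x;
  end;

  method squareIt(double x) returns double;
    return x * x;
  end;

  method init();
    dcl int i;
    dcl double d;
    i = 4;
    d = 1.5;
    i = squareIt(i);    /* argument is INT: the INT version is expected to run */
    d = squareIt(d);    /* argument is DOUBLE: the DOUBLE version is expected to run */
    put i= d=;
  end;
enddata;
run;
quit;
```

The methods are defined before INIT so that they are already known at the point where they are called.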

19 Packages
Packages are constructs that bundle variables and methods into named objects that are stored and reused by other programs.
    package greeting;
      ...DS2 variables, statements, methods...
    endpackage;
    run;
Predefined packages: encapsulate common functionality that is useful to many customer solutions.
User-defined packages: packages that you or someone else creates for reuse.

20 Predefined DS2 packages
FCMP: supports calls to FCMP functions and subroutines from within the DS2 language.
Hash: enables you to quickly and efficiently store, search, and retrieve data based on unique lookup keys.
HTTP: constructs an HTTP client to access HTTP web services.
JSON: enables you to create and parse JSON text.
Matrix: provides a powerful and flexible matrix programming capability.

21 Packages
Topic | Description
Scope | Names of package methods and variables have global scope within the package.
Package methods | To execute a package method, a program must call it. You instantiate the package, then use dot notation to access a method of the package instance.
    method run();
      dcl package matrix m(2, 3);
      dcl double mr mc;
      mr = m.rows();
      mc = m.cols();
      put mr=;
      put mc=;
    end;

22 Packages
Topic | Description
Preparing packages for use | A package must be compiled and stored before it can be used in a program.
Overwriting packages | DS2 protects existing tables, including stored packages, from being overwritten. To overwrite a package, use the OVERWRITE=YES table option.
    package greeting / overwrite=yes;
      ...DS2 variables, statements, and methods...
    endpackage;
    run;
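As a sketch of the full cycle (compile and store a package, then instantiate it and call one of its methods with dot notation), consider the program below. The method hello, its parameter who, and the variable msg are invented for this illustration; only the package name greeting comes from the slides. The package is stored in the default library because no libref is given.

```sas
proc ds2;
/* Step 1: compile and store the package. */
package greeting / overwrite=yes;
  method hello(varchar(32) who) returns varchar(64);
    return 'Hello, ' || who;
  end;
endpackage;
run;

/* Step 2: a separate program instantiates the stored package and calls it. */
data _null_;
  dcl package greeting g();     /* package variable plus instance */
  method init();
    dcl varchar(64) msg;
    msg = g.hello('DS2');
    put msg=;
  end;
enddata;
run;
quit;
```

OVERWRITE=YES lets the first step be rerun without an error about the package already existing.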

23 Packages
Topic | Description
Instantiating packages | Create an instance of the package (package instance) and a variable that references the instance (package variable).
The DECLARE PACKAGE statement can simultaneously declare a package variable and construct a package instance:
    dcl package matrix m0() m1(3,3) m2(5, 4);
Alternatively, first declare the package variables and then assign package instances using the _NEW_ operator:
    data _null_;
      dcl package matrix m1 m2;   /* global */
      method init();
        dcl package matrix m3;    /* local */
        m1 = _NEW_ [THIS] matrix(3, 2);
        m3 = _NEW_ matrix(5, 4);
        m2 = _NEW_ [m1] matrix();
      end;
      ...DS2 methods...
    enddata;

24 THE HASH PACKAGE
Hash and Hash Iterator: Hash package in DS2
The hash and hash iterator packages in DS2 enable users to quickly and efficiently, based on unique lookup keys:
store and retrieve data
replace and remove data
search and retrieve data
generate a table that contains the data in the hash package
The hash package keys and data are variables. Key and data values can be directly assigned constant values or values from a table, or they can be computed in an expression.
The hash package stores and retrieves data based on unique lookup keys. Depending on the number of unique lookup keys and the size of the table, the hash package can be significantly faster than a standard format lookup or an array.

25 Defining and creating a hash package instance
THE HASH PACKAGE
Instantiate the hash package: before using the DS2 hash package, you must define and construct an instance of it.
Creating a hash package instance: you must provide keys, data, and optional initialization data.
A hash package instance can be defined either fully at construction, or at construction and through a subsequent series of method calls.

26 Defining and Creating a Hash Package Instance
THE HASH PACKAGE
In the following example, the hash instances h1 and h2 have the same instance definition. The hash instance h1 is fully defined at construction, while h2 is defined at construction and through a series of method calls.
    declare package hash h1([key], [data1 data2 data3], 0, 'testdata', '', '', '', 'multidata');
    declare package hash h2();
    method init();
      h2.keys([key]);
      h2.data([data1 data2 data3]);
      h2.dataset('testdata');
      h2.multidata();
      h2.defineDone();
    end;

27 Defining a Hash Instance by Using Constructors
THE HASH PACKAGE
A constructor method is used to instantiate a hash package and initialize the hash package data. There are three forms:
Create a partially defined hash instance
Create a completely defined hash instance with specified key and data variables
Create a completely defined hash instance with specified key variables
#1: Create a partially defined hash instance
    DECLARE PACKAGE HASH instance(hashexp, {'datasource' | '{sql-text}'}, 'ordered', 'duplicate', 'suminc', 'multidata');
Key and data variables are defined by method calls. Optional parameters can be specified in the DECLARE PACKAGE statement, in the _NEW_ operator, by method calls, or by a combination of any of these. A single DEFINEDONE method call completes the definition.

28 Defining a Hash Instance by Using Constructors
THE HASH PACKAGE
#2: Create a completely defined hash instance with specified key and data variables
    DECLARE PACKAGE HASH instance([keys], [data] [, hashexp, {'datasource' | '{sql-text}'}, 'ordered', 'duplicate', 'suminc', 'multidata']);
Key and data variables are defined in the DECLARE PACKAGE statement, which indicates that the instance should be created as completely defined. No additional initialization data can be specified with subsequent method calls.
#3: Create a completely defined hash instance with specified key variables
    DECLARE PACKAGE HASH instance([keys] [, hashexp, {'datasource' | '{sql-text}'}, 'ordered', 'duplicate', 'suminc', 'multidata']);
Keys are defined in the DECLARE PACKAGE statement, which indicates that the instance should be created as completely defined. No additional initialization data can be specified with subsequent method calls. There are no data variables.

29 Defining a Hash Instance By Using Method Calls
THE HASH PACKAGE
KEYS: defines key variables for a hash package using a variable list.
DEFINEKEY: defines key variables for a hash package using implicit variables.
DATA: specifies the data variables to be stored in the hash package using a variable list.
DEFINEDATA: defines data variables for the hash package using implicit variables.
DUPLICATE: determines whether to ignore duplicates when loading into the hash package.

30 Defining a Hash Instance By Using Method Calls
THE HASH PACKAGE
HASHEXP: defines the hash package's internal table size. The size of the hash table is 2^n, where n is the HASHEXP value.
ORDERED: specifies whether or how the data is returned in key-value order when a hash iterator package or the OUTPUT method is used.
MULTIDATA: specifies whether multiple data items are allowed for each key.
SUMINC: specifies a variable that maintains a summary count of hash package keys.
DEFINEDONE: indicates that all key and data definitions are complete.

31 An example of hash instance, h, defined using the method calls
THE HASH PACKAGE
    data _null_;
      declare package hash h(0, 'testdata');
      method init();
        h.keys([key]);
        h.data([data1 data2 data3]);
        h.ordered('descending');
        h.duplicate('error');
        h.defineDone();
      end;
    enddata;

32 Defining Key and Data Variables in Hash
THE HASH PACKAGE
The hash package uses unique lookup keys to store and retrieve data. Key and data variables are used to initialize the hash package through dot notation method calls or constructor arguments. There are three options:
Use the implicit variable methods, DEFINEKEY and DEFINEDATA
Use the variable list methods, KEYS and DATA
Use key and data variable lists specified in the DECLARE PACKAGE statement
    /* Keys and data defined using implicit variable methods */
    declare package hash h();
    h.definekey('account_id');
    h.definedata('total_sales');
    h.definedone();

    /* Keys and data defined using the variable list methods */
    declare package hash h();
    h.keys([account_id]);
    h.data([total_sales]);
    h.definedone();

    /* Keys and data defined using variable list constructors */
    declare package hash h([k],[d]);
Key variables must be a DS2 built-in type (character, numeric, or date-time). Data variables can be either a DS2 built-in type or a built-in or user-defined package type.

33 Hash Initialization Data Provisioning
THE HASH PACKAGE
A hash object needs three inputs: keys, data, and initialization parameters. The following optional initialization parameters can be provided to the hash package:
Internal table size (hashexp), where the size of the hash table is 2^n
Name of the table to load (datasource), or a FedSQL query that selects the data to load
Whether the data is returned in key-variable order (ordered)
Whether duplicate key values are ignored when loading a table (duplicate)
Name of a variable that maintains a summary count of hash package keys (suminc)
Whether multiple data items are allowed per key value (multidata)

34 Performing a table lookup using Hash in DS2
THE HASH PACKAGE
Perform a data lookup that reads a smaller table and merges its data with a larger table on a single key. The required variables from the smaller table are loaded into the resulting table only for matching key values.
Create two tables, Employee and Salary. The Employee table contains an ID variable that corresponds to an employee ID, and the employee's initials.
    proc ds2;
    data emp(overwrite=yes);
      dcl char(8) ID;
      dcl char(3) INITIALS;
      method init();
        dcl integer i;
        dcl integer j;
        do i = 1 to 20;
          ID = put(i, BEST8.);
          /* build three random uppercase letters */
          do j = 1 to 3;
            substr(INITIALS,j) = byte(int(65 + 26*ranuni(0)));
          end;
          output;
        end;
      end;
    enddata;
    run;
    quit;

35 Performing a table lookup using Hash in DS2
THE HASH PACKAGE
The Salary table contains two variables: the employee ID and the employee salary. A DO loop creates the ID variable, and the RANUNI function generates random numbers that serve as sample salary information.
    proc ds2;
    data salary(overwrite=yes);
      dcl char(8) ID;
      dcl double SALARY;
      method init();
        dcl integer i;
        do i = -20 to 20;
          ID = put(i, BEST8.);
          SALARY = int(ranuni(0) * 10000);
          output;
        end;
      end;
    enddata;
    run;
    quit;

36 Performing a table lookup using Hash in DS2
THE HASH PACKAGE
Load the EMP table into the hash package.
Define the EMP table's ID as the key and INITIALS as the data.
Use the SET statement to iterate over the SALARY table, and use each SALARY row's ID to look up the matching key in the hash package. Only rows with a match are written out.
    proc ds2;
    data emp_salary(overwrite=yes);
      declare char(8) ID;
      declare char(8) INITIALS;
      declare int rc;
      declare package hash h(8, 'emp');
      method init();
        rc = h.defineKey('ID');
        rc = h.defineData('INITIALS');
        rc = h.defineDone();
      end;
      method run();
        set salary;
        /* find() returns 0 when the current ID exists in the hash */
        if (h.find() = 0) then output;
      end;
    enddata;
    run;
    quit;

37 THE FCMP PACKAGE
FCMP Package in DS2: What is FCMP?
FCMP stands for SAS Function Compiler. The SAS function compiler enables users to create, test, and store SAS functions, CALL routines, and subroutines before using them in other SAS procedures or DATA steps.
The FCMP procedure gives programmers the flexibility to develop complex code and promotes reusability. It uses the SAS language compiler to execute SAS programs.
Programmers can use functions created with PROC FCMP in:
the DATA step
the WHERE statement
the Output Delivery System (ODS)
several SAS procedures

38 THE FCMP PACKAGE FCMP Package in DS2

39 THE FCMP PACKAGE
FCMP package in DS2
An FCMP package is a subset of the SAS functions, CALL routines, and subroutines stored in a single SAS data set of FCMP functions, which can be called in other DATA or PROC steps.
The DS2 language supports calls to functions and subroutines that are created with the FCMP procedure, through the FCMP package.
Programmers can create an FCMP package by using the LANGUAGE=FCMP and TABLE= options in a PACKAGE statement.
The FCMP package is not supported on the CAS server.

40 Construct instance of FCMP Package
THE FCMP PACKAGE
#1: Declare the package variable, then use the _NEW_ operator
    DECLARE PACKAGE fcmp banking;
    banking = _new_ fcmp();
The DECLARE PACKAGE statement creates a package variable and gives you the option to create an instance of the FCMP package. When a package is declared, a variable is created that can reference an instance of the package.
#2: Declare the package with constructor syntax
    DECLARE PACKAGE fcmp pharma();
If constructor arguments are provided with the package variable declaration, a package instance is constructed and the package variable is set to reference it.
Package variables are subject to all variable scoping rules.

41 Creating an FCMP Package and instantiating in DS2
THE FCMP PACKAGE
We use the FCMP procedure to create an FCMP library, 'fcmparea', which contains several FCMP functions. The library is created in the current directory and can be referenced through the base libname.
    libname base '.';
    proc fcmp outlib = base.fcmparea.package1;
      function square(side);
        return (side*side);
      endsub;
      function rectangle(length, breadth);
        return (length*breadth);
      endsub;
      function triangle(base, height);
        return (0.5*(base*height));
      endsub;
    run;

42 Creating an FCMP Package and instantiating in DS2
THE FCMP PACKAGE
To use the FCMP functions in DS2, we define a DS2 package through which they will be called.
    proc ds2;
    package pkg / overwrite=yes language='fcmp' table='base.fcmparea';
    run;
    quit;

43 Creating an FCMP Package and instantiating in DS2
THE FCMP PACKAGE
The following DS2 program instantiates the package defined in the previous PACKAGE statement and calls its methods. For illustration, we create test data for two sides: side1 takes the values 1 to 10, and side2 is twice side1. We then call the FCMP functions to calculate the area of a square, a rectangle, and a triangle for each pair of values.

44 Creating an FCMP Package and instantiating in DS2
THE FCMP PACKAGE
    proc ds2;
    data _null_;
      dcl package pkg geometry();
      dcl double side1 side2 area_square area_rectangle area_triangle;
      method init();
        do side1 = 1 to 10;
          side2 = 2*side1;
          area_square = geometry.square(side1);
          area_rectangle = geometry.rectangle(side1, side2);
          area_triangle = geometry.triangle(side1, side2);
          put side1= area_square=;
          put side1= side2= area_rectangle=;
          put side1= side2= area_triangle=;
        end;
      end;
    enddata;
    run;
    quit;
Log output:
    side1=1 area_square=1
    side1=1 side2=2 area_rectangle=2
    side1=1 side2=2 area_triangle=1
    side1=2 area_square=4
    side1=2 side2=4 area_rectangle=8
    side1=2 side2=4 area_triangle=4
    side1=3 area_square=9
    side1=3 side2=6 area_rectangle=18
    side1=3 side2=6 area_triangle=9
    …
    side1=10 area_square=100
    side1=10 side2=20 area_rectangle=200
    side1=10 side2=20 area_triangle=100
    NOTE: PROCEDURE DS2 used (Total time): real time seconds cpu time seconds

45 Creating an FCMP Package and instantiating in DS2
THE FCMP PACKAGE
Advantages
Functions are centralized and need to be changed only once when an update or enhancement is required. All programmers who refer to the appropriate FCMP library benefit from the change without having to update each individual program, which supports version control.
A function or CALL routine makes a program much easier to read, write, and modify.
Functions and CALL routines are independent and are not affected by downstream code changes; that is, a change to the functionality of a function does not necessarily impact any process other than the FCMP package.
FCMP functions are reusable and give programmers greater flexibility to define repetitive functions in a centralized location. Any program that has access to the data set where the function or routine is stored can call it.
Considerations and Limitations When Using the FCMP Package
The FCMP package does not support VARARGS function calls and therefore cannot use the FCMP procedure's VARARGS interface.
Errors caused when information is passed between the FCMP procedure and the FCMP package are not always reported correctly. For example, if you supply an incorrect table name in the PACKAGE statement, an error is written in the log file. However, there is no indication given that the operation fails.
The FCMP package assumes the session encoding and currently has no mechanism that allows different encodings for different parameters within the same function call or for the same parameter across multiple function calls.
You can access any FCMP library as long as the connection string defines the catalog in which the FCMP library is located.

46 SAS® SQLSTMT PACKAGE
SQLSTMT Package in DS2
The SAS® SQLSTMT package provides a way to pass FedSQL statements to a DBMS for execution and to access the result set returned by the DBMS. The FedSQL statements can create, modify, or delete tables.
When an SQLSTMT instance is created, the FedSQL statement is sent to the FedSQL language processor, which in turn sends the statement to the DBMS to be prepared and stored in the instance. The instance can then be used to efficiently execute the FedSQL statement multiple times.
Because the statement prepare is delayed until run time, the FedSQL statement can be built and customized dynamically during execution of the DS2 program.

47 SAS® SQLSTMT PACKAGE
SAS® FEDSQL
SAS® FedSQL is the SAS proprietary implementation of the ANSI SQL:1999 core standard. FedSQL provides a scalable, threaded, high-performance way to access, manage, and share relational data in multiple data sources.
When possible, SAS optimizes FedSQL queries in the background with multithreaded algorithms in order to resolve large-scale operations.
FedSQL provides a vendor-neutral SQL dialect, so programmers can submit queries without having to worry about syntax specific to the various DBMS sources.

48 FedSQL programs can be executed in multiple ways
SAS® SQLSTMT PACKAGE
From a JDBC, ODBC, or OLE DB client, by using SAS Federation Server
From SPD Server, by using the SQL pass-through facility or an ODBC or JDBC client
From Base SAS, by using the FedSQL procedure
From a SAS DS2 wrapper program
From the Base SAS interface, by using the PROC SQL pass-through facility
From the Base SAS interface, by using the SAS Federation Server LIBNAME engine

49 Advantages of using FedSQL
SAS® SQLSTMT PACKAGE
FedSQL offers several advantages, especially when interacting with DBMS sources outside of the SAS® SQL procedure:
FedSQL conforms to the ANSI SQL:1999 core standard, so it processes queries using the same standard syntax as other DBMS sources that conform to that standard.
FedSQL supports many more data types than previous SAS SQL implementations, which were limited to two data types, SAS character and SAS numeric. This allows greater precision when data is transferred between external databases and traditional data sources accessed through SAS®/ACCESS. FedSQL connects to data sources and translates the target data source definitions to the appropriate FedSQL data types, which allows much greater precision during calculations.
FedSQL handles federated queries. A federated query is one that accesses data from multiple data sources and returns a single result. By comparison, in a traditional DATA step or SQL procedure, a SAS®/ACCESS LIBNAME engine can access only the data for its intended data source.
The FedSQL language can create data in any of the supported data sources, even if the target data source is not represented in any query. This enables users to store data in the data source that most closely meets the needs of their application.

50 SAS® SQLSTMT PACKAGE
Federated Queries
A federated query accesses data from multiple data sources, possibly from various DBMS sources or a combination of SAS data sets and DBMS sources, and returns a single result set. The data remains stored in its data source.
For example, the query below requests data from an Oracle data source and a SAS data set:
    libname mydata base 'U:\Personal\SAS\SASGF\Data\';
    libname myoracle oracle path=ora11g user=xxx pwd=xxx schema=xxx;
    proc fedsql;
      create table myoracle.customer_sales as
        select * from mydata.customer
        where exists (select * from myoracle.sales
                      where product.prodid = sales.prodid);
    quit;

51 SAS® SQLSTMT PACKAGE
SQLSTMT package in DS2
The SQLSTMT package passes FedSQL statements to the DBMS for execution and accesses the results returned by the DBMS.
If a FedSQL statement selects rows from a data source, the SQLSTMT package provides methods for interrogating the rows returned in the result set.
When an SQLSTMT instance is created, the FedSQL statement is sent to the FedSQL language processor, which in turn sends the statement to the DBMS to be prepared and stored in the instance. The instance can then be used to efficiently execute the FedSQL statement multiple times.
The FedSQL statement can be built and customized during execution of the DS2 program.

52 SAS® SQLSTMT PACKAGE SQLSTMT package in DS2
For illustration, consider a sample SAS data set 'Customer' that contains an identifier (ID) and quarterly sales information: q1, q2, q3, and q4.
    libname mydata base "U:\SASGF2018\Data\";
    data mydata.customer;
      format ID 8. q1 q2 q3 q4 8.;
      do i = 1 to 10;
        ID = put(i, BEST8.);
        q1 = int(ranuni(0)*10000);
        q2 = int(ranuni(0)*10000);
        q3 = int(ranuni(0)*10000);
        q4 = int(ranuni(0)*10000);
        output;
      end;
      drop i;
    run;

53 SAS® SQLSTMT PACKAGE SQLSTMT package in DS2
The example below uses the SQLSTMT package to insert a few additional customers, along with their sales information, into the original data set.
    proc ds2;
    data _null_;
      dcl double x;
      dcl double y;
      dcl double z;
      dcl double w;
      dcl double u;
      /* each ? is bound, in order, to the variables x y z w u */
      dcl package sqlstmt s('insert into mydata.customer (ID,q1,q2,q3,q4) values (?,?,?,?,?)', [x y z w u]);
      method init();
        do i = 11 to 15;
          x = put(i, BEST8.);
          y = int(ranuni(0)*10000);
          z = int(ranuni(0)*10000);
          w = int(ranuni(0)*10000);
          u = int(ranuni(0)*10000);
          s.execute();
        end;
      end;
    enddata;
    run;
    quit;

54 Declaring and Instantiating an SQLSTMT Package
SAS® SQLSTMT PACKAGE Declaring and Instantiating an SQLSTMT Package
Construct an instance of the SQLSTMT package:
Programmers can use the DECLARE PACKAGE statement to declare the SQLSTMT package.
Programmers can create a variable to reference the instance of the package during package declaration.
In the example below, ‘s’ is the variable used to reference the constructed package instance.
dcl package sqlstmt s('insert into mydata.customer (ID, q1, q2, q3, q4) values (?, ?, ?, ?, ?)', [x y z w u]);

55 Declaring and Instantiating an SQLSTMT Package
SAS® SQLSTMT PACKAGE Declaring and Instantiating an SQLSTMT Package
#1: Using constructor syntax
DECLARE PACKAGE SQLSTMT variable [('sql-txt' [, \[parameter-variable-list\]])];
DECLARE PACKAGE SQLSTMT variable [('sql-txt' [, connection-string])];
Shown above are the two syntax forms for instantiating a package using the DECLARE PACKAGE statement along with its constructor syntax.
#2: Using the _NEW_ operator
DECLARE PACKAGE SQLSTMT variable;
variable = _NEW_ SQLSTMT ('sql-txt' [, \[parameter-variable-list\]]);
variable = _NEW_ SQLSTMT ('sql-txt' [, connection-string]);
In this form, the DECLARE PACKAGE statement does not construct the SQLSTMT package instance, and the SQL statement is not prepared, until the _NEW_ operator is executed.

56 Declaring and Instantiating an SQLSTMT Package
SAS® SQLSTMT PACKAGE Declaring and Instantiating an SQLSTMT Package
#3: Without using SQL text
DECLARE PACKAGE SQLSTMT variable();
variable = _NEW_ SQLSTMT ();
With the _NEW_ operator, the sql-text can be a string value that is generated from an expression or a string value that is stored in a variable.
The DECLARE statement includes the arguments for construction within its parentheses; omitting the arguments is valid for the SQLSTMT package.
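To illustrate the _NEW_ form with SQL text held in a variable, here is a hedged sketch in which the statement text is assembled at run time before the instance is constructed. The variables quarter, sql_text, id and sales are hypothetical names chosen for this example, and the sketch assumes the mydata.customer table from the earlier slides exists.

```sas
proc ds2;
data _null_;
   dcl double id sales;
   /* Declaration only: no instance exists and nothing is prepared yet */
   dcl package sqlstmt s;

   method init();
      dcl varchar(32) quarter;
      dcl varchar(200) sql_text;

      quarter = 'q1';   /* hypothetical run-time choice of column */
      sql_text = 'select id, ' || quarter || ' from mydata.customer';

      /* The instance is constructed, and the statement prepared, here */
      s = _new_ sqlstmt(sql_text);
      s.execute();
      s.bindResults([id sales]);
      do while (s.fetch() = 0);
         put id= sales=;
      end;
   end;
enddata;
run;
quit;
```

Deferring construction to _NEW_ is what allows the query text to depend on values that are only known while the program runs.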

57 Invoke DS2 package method using FEDSQL
SAS® SQLSTMT PACKAGE Invoke DS2 package method using FedSQL
The FedSQL language supports the ability to invoke user-defined DS2 package methods as functions in the SELECT statement. This allows programmers to invoke SAS user-defined functions while reading from DBMS data sources other than SAS, e.g. Oracle, DB2 etc.
proc ds2;
package adder / overwrite=yes;
   method add(double x, double y) returns double;
      return x + y;
   end;
endpackage;
data numbers / overwrite=yes;
   dcl double x y;
   method init();
      dcl int i;
      do i = 1 to 10;
         x = i;
         y = i * i;
         output;
      end;
   end;
enddata;
run;
quit;
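The slide defines the package and a test table but does not show the call itself. A minimal sketch of the invocation, assuming the adder package and the numbers table were both saved to the default library in the current session, might look like this; the column alias z is an illustrative name.

```sas
proc fedsql;
   /* adder.add is the DS2 package method defined above;   */
   /* numbers is the table written by the DS2 data program */
   select x, y, adder.add(x, y) as z
   from numbers;
quit;
```

Each row of numbers is passed through the DS2 method, so z should hold x + y for that row.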

58 Threaded processing in DS2
Parallel Processing in DS2
By default, DS2 code runs sequentially, i.e. one process completes before the next process begins. Threaded processing allows multiple processes to run concurrently.
Runs as a program: Input data can include rows from DB tables and DS2 threads. Output data can be DB tables or rows returned to the client application.
Runs as a thread: Input data can include only rows from DB tables, not other threads. Output data includes rows returned to the DS2 program that started the thread.

60 Threaded processing in DS2
Steps to run DS2 code in threads:
1. Create a thread by enclosing DS2 code between THREAD ... ENDTHREAD statements.
2. Create one or more instances of the thread in a DS2 program by using a DECLARE THREAD statement.
3. Execute the thread or threads by using a SET FROM statement.
proc ds2;
thread t / overwrite=yes;
   dcl int x;
   method init();
      dcl int i;
      do i = 1 to 3;
         x = i*i;
         output;
      end;
   end;
endthread;
run;
quit;
proc ds2;
data _null_;
   dcl thread t t_instance;
   method run();
      set from t_instance threads=2;
      put 'x= ' x;
   end;
enddata;
run;
quit;

61 Threaded processing A thread ‘T’ is created by using the THREAD statement. An instance of ‘T’ is declared, and two threads are executed using the SET FROM statement in the RUN method. Each of the two threads generates three rows for x, for a total of six rows returned to the data program.

62 Conclusion In conclusion, we have observed that DS2 is a very powerful language. The DS2 language shares core features with the DATA step. However, the capabilities of DS2 extend far beyond those of the DATA step.
[Fig. DS2 Overview. Labels: PROC DS2; METHODS; PACKAGES; THREADED PROCESSING; FedSQL; Building blocks; Store and reuse; Parallel processing; Common standard]

63 Conclusion DS2 offers:
Support for different data types, allowing for greater precision in data processing.
Threaded application processing, resulting in faster processing on a machine with multiple cores as well as within MPP databases.
Support for in-database processing, at the disposal of the application developer.
Embedded FedSQL, which allows users to connect to multiple tables in disparate databases within a single query and extract data for processing.

64 Your feedback counts! Don't forget to complete the session survey in your conference mobile app. Go to the Agenda icon in the conference app. Find this session title and select it. On the sessions page, scroll down to Surveys and select the name of the survey. Complete the survey and click Finish.

65 APPENDIX

66 Defining a Hash Instance By Using Method Calls
THE HASH PACKAGE Defining a Hash Instance By Using Method Calls
KEYS: Defines key variables for a hash package using a variable list.
DEFINEKEY: Defines key variables for a hash package using implicit variables.
DATA: Specifies the data variables to be stored in the hash package using a variable list.
DEFINEDATA: Defines data variables for the hash package using implicit variables.
DUPLICATE: Determines whether to ignore duplicates when loading into the hash package.
HASHEXP: Defines the hash package's internal table size. The size of the hash table is 2^n.
ORDERED: Specifies whether or how the data is returned ordered by key value when using a hash iterator package or the OUTPUT method.
MULTIDATA: Specifies whether multiple data items are allowed for each key.
SUMINC: Specifies a variable that maintains a summary count of hash package keys.
DEFINEDONE: Indicates that all key and data definitions are complete.
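The following is a small sketch of how several of these method calls might be combined to define and use a hash instance. The variables k, d and rc are hypothetical names for this example, and the hash is filled in memory rather than loaded from a table.

```sas
proc ds2;
data _null_;
   dcl int k;          /* key variable */
   dcl double d;       /* data variable */
   dcl int rc;         /* return code from hash methods */
   /* Empty parentheses: the instance is defined by the method calls below */
   dcl package hash h();

   method init();
      h.keys([k]);       /* key definition using a variable list */
      h.data([d]);       /* data definition using a variable list */
      h.defineDone();    /* key and data definitions are complete */

      /* Load three key/data pairs */
      do k = 1 to 3;
         d = k * 10;
         rc = h.add();
      end;

      /* Look up one key; on success d receives the stored value */
      k = 2;
      rc = h.find();
      put rc= d=;
   end;
enddata;
run;
quit;
```

Calling DEFINEDONE last mirrors the list above: the key and data definitions must be complete before any rows are added or looked up.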

67 Contact Information Name: Viraj Kumbhakarna Company: MUFG Union Bank City/State: San Francisco, CA Phone: com

