© 2009 Fiserv. All Rights Reserved. Optimizing APL Matrix Indexing for Application Programmers. Eugene Ying, Software Development. Aug 8, 2011.


Slide 1.
Optimizing APL Matrix Indexing for Application Programmers
Eugene Ying, Software Development
Aug 8, 2011

Slide 2. Introduction

In the business world, where APL is used to perform large-scale data processing, data matrices tend to be very large, with millions of rows and thousands of columns of data. When a very large matrix is manipulated thousands of times inside a loop, optimizing matrix indexing is important in reducing the function's run time. There are many cases where an application program's run time can be significantly reduced once its data indexing has been optimized. This paper describes several techniques that can be used to optimize APL matrix indexing. The CPU times shown in the examples were measured with Dyalog Classic APL for AIX running on an IBM pSeries server.

Slide 3. Indexing Optimization Techniques

The major topics to be discussed are:
- Matrix column data fragmentation
- Progressive indexing
- Index manipulation
- Grade-up and grade-down indices

Slide 4. Multidimensional Array Storage

According to Wikipedia, "For storing multidimensional arrays in linear memory, row-major order is used in C; column-major order is used in Fortran and MATLAB." APL, too, stores array items in row-major order.
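As a cross-check in another row-major language, a small NumPy sketch (Python and NumPy are my additions, not part of the original deck) makes the storage order visible:

```python
import numpy as np

# A 3x4 matrix; NumPy, like C and APL, stores it in row-major order.
m = np.arange(12).reshape(3, 4)

# ravel(order="C") walks memory in storage order: one row after another.
row_major = m.ravel(order="C").tolist()

# order="F" walks in Fortran/MATLAB column-major order instead.
col_major = m.ravel(order="F").tolist()

print(row_major)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(col_major)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```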

Slide 5. Matrix Column Data Fragmentation

A 4-by-4 matrix:

    1;1 1;2 1;3 1;4
    2;1 2;2 2;3 2;4
    3;1 3;2 3;3 3;4
    4;1 4;2 4;3 4;4

and its row-major layout in linear memory:

    1;1 1;2 1;3 1;4  2;1 2;2 2;3 2;4  3;1 3;2 3;3 3;4  4;1 4;2 4;3 4;4

Row Matrix[3;] occupies four consecutive memory positions; column Matrix[;3] is scattered, one item in every fourth position.

Slide 6. A Wider Matrix

A 4-by-8 matrix:

    1;1 1;2 1;3 1;4 1;5 1;6 1;7 1;8
    2;1 2;2 2;3 2;4 2;5 2;6 2;7 2;8
    3;1 3;2 3;3 3;4 3;5 3;6 3;7 3;8
    4;1 4;2 4;3 4;4 4;5 4;6 4;7 4;8

In linear memory, row Matrix[3;] is still consecutive, but the items of column Matrix[;3] are now eight positions apart. The wider the matrix, the more fragmented the data columns.

Slide 7. Virtual Memory

    CACHE (L1, L2, L3)
    RAM
    Disk swap: seek time (tracks), rotational latency (sectors), transfer time

For a very large matrix:
- An APL data row is concentrated in the cache, in a few pages of RAM, or perhaps a few sectors of disk.
- An APL data column is scattered among a few locations in the cache, many pages of RAM, and perhaps many sectors or tracks of disk.

Slide 8. AIX APL Run Time

    N←1000
    A←(N,N)⍴⎕AV                ⍝ 1000 by 1000 matrix
    :For I :In ⍳10000
        J←N?N
        K←?N
        L←⎕AV[?256]
        A[J;K]←L               ⍝ Random items J of column K: 9492 CPU ms
        A[K;J]←L               ⍝ Random items J of row K: 725 CPU ms
    :EndFor

For a 1,000 by 1,000 character matrix, random row access is on average more than 10 times faster than random column access.
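The same row/column asymmetry can be inspected in NumPy (a Python analogue of the character-matrix test, not the presenter's code; the array size follows the slide):

```python
import numpy as np

# A 1000 x 1000 byte matrix, row-major like an APL character matrix.
a = np.zeros((1000, 1000), dtype=np.uint8)

row = a[3, :]   # one row: 1000 bytes in consecutive memory
col = a[:, 3]   # one column: 1000 bytes, each 1000 bytes apart

print(row.strides)                # (1,)    one byte between successive items
print(col.strides)                # (1000,) a full row between successive items
print(row.flags["C_CONTIGUOUS"])  # True
print(col.flags["C_CONTIGUOUS"])  # False
```

The stride difference is exactly the fragmentation the slide measures: every column access strides across cache lines and pages, while a row access streams through one block.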

Slide 9. Another Column Data Fragmentation Example

    Y←1000 1000⍴0.1+⍳1000
    :For I :In ⍳1000
        {}+/+/Y                ⍝ Sum matrix columns first
        {}+/+⌿Y                ⍝ Sum matrix rows first: 8178 CPU ms
        {}+/,Y                 ⍝ Sum vector: 6445 CPU ms
    :EndFor
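A NumPy sketch of the three summation orders (an analogue under the assumption of a 1000-by-1000 float matrix; only the traversal order differs, not the result):

```python
import numpy as np

# Each row is a copy of 0.1 + 0..999 (shape assumed from the slide).
y = 0.1 + np.arange(1000, dtype=float)
Y = np.tile(y, (1000, 1))

s_rows_first = Y.sum(axis=1).sum()  # sum each contiguous row, then total
s_cols_first = Y.sum(axis=0).sum()  # sum each strided column, then total
s_vector     = Y.ravel().sum()      # flatten to a vector, one pass

# All three orders give the same total; they differ only in memory traversal.
print(s_vector)
```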

Slide 10. Platform & Version Dependency

Although these observations on CPU times are generally true on all platforms, the exact performance varies with the hardware configuration, especially the quantity and quality of cache relative to the size of the arrays being manipulated. The performance may also change in future versions of APL.

Slide 11. Matrix Organization Suggestion

When you design a large data matrix for an APL application, always ask this question: "Will I be accessing the data columns more frequently than the data rows?" If the answer is yes, consider redesigning your data matrix in transposed format, so that the data you access most often lie in consecutive memory. If some of your virtual matrix data reside on disk, reading consecutive disk sectors can greatly speed up your program.

Slide 12. Marketing Data Example

We construct the following large random data matrix for testing. Assume we have the data of a million customers or prospects, and each customer has 100 data attributes, the first 5 of which are customer ID, country code, # of employees, industry code, and revenue.

    ID←⍳1000000
    CTY←((990000⍴⍳80),10000⍴85)[1000000?1000000]
    EMP←(1000000⍴(5⍴1000),(10⍴500),(20⍴200),50⍴100)[1000000?1000000]
    IND←(1000000⍴⍳9999)[1000000?1000000]
    REV←( ⍳ )[1000000?1000000]
    DAT←100↑[2]ID,CTY,EMP,IND,[1.1]REV

Slide 13. A Marketing Data Example

    Company ID | Country Code | # of Employees | Industry Code | Revenue | ...
    (1 million rows of customers)

Slide 14. Marketing Data Selection Example

To select companies with more than 1000 employees, in the manufacturing industry, in country 85, most APL programmers would use a one-liner of Boolean logic to get the data indices:

    MFG←                       ⍝ manufacturing industry codes
    :For I :In ⍳100
        J←((DAT[;3]>1000)∧(DAT[;4]∊MFG)∧DAT[;2]=85)/⍳1↑⍴DAT   ⍝ 28,733 ms CPU
    :EndFor
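In NumPy terms, such a one-liner evaluates every condition over all rows before combining them. The sketch below uses made-up miniature data; the column meanings follow the slide, but the sizes and codes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # miniature stand-in for the million-row matrix

# Columns: [id, country, employees, industry, revenue] (per the slide).
dat = np.column_stack([
    np.arange(n),
    rng.integers(80, 90, n),    # country codes (hypothetical range)
    rng.integers(1, 5000, n),   # number of employees
    rng.integers(1, 20, n),     # industry codes (hypothetical range)
    rng.integers(1, 10**6, n),  # revenue
])
MFG = [3, 7]                    # assumed manufacturing industry codes

# One-liner Boolean selection: three full-length masks, then AND them all.
j = np.flatnonzero(
    (dat[:, 2] > 1000) & np.isin(dat[:, 3], MFG) & (dat[:, 1] == 85)
)
print(len(j))  # indices of the qualifying companies
```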

Slide 15. Marketing Transposed Data Example

Let us transpose the data matrix so that instead of 1 million rows of data for the 1 million customers, we have 1 million columns of data for the 1 million customers.

    Company ID     | ...
    Country Code   | ...
    # of Employees | ...
    Industry Code  | ...
    Revenue        | ...
    (1 million columns of customers)

Slide 16. Marketing Data Selection Example

Data organized by columns (one customer per row):

    MFG←
    :For I :In ⍳100
        J←((DAT[;3]>1000)∧(DAT[;4]∊MFG)∧DAT[;2]=85)/⍳1↑⍴DAT    ⍝ 28,733 ms CPU
    :EndFor

Data organized by rows (one customer per column):

    :For I :In ⍳100
        J←((DAT[3;]>1000)∧(DAT[4;]∊MFG)∧DAT[2;]=85)/⍳¯1↑⍴DAT   ⍝ 12,227 ms CPU
    :EndFor

Slide 17. Progressive Indexing

Data organized by rows:

    :For I :In ⍳100
        J←(DAT[3;]>1000)/⍳¯1↑⍴DAT   ⍝ 2,393 ms CPU
        J←(DAT[4;J]∊MFG)/J          ⍝   612 ms CPU
        J←(DAT[2;J]=85)/J           ⍝     5 ms CPU
    :EndFor                         ⍝ 3,000 ms CPU total

In multi-line progressive indexing, expression 2 (DAT[4;J]∊MFG) works with fewer items than expression 1 (DAT[3;]>1000), and expression 3 (DAT[2;J]=85) works with fewer items than expression 2.
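The progressive scheme translates directly to NumPy: keep an index vector and shrink it step by step, so each later test touches only the survivors (illustrative data and ranges, not the presenter's):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
country  = rng.integers(80, 90, n)   # hypothetical attribute rows
emps     = rng.integers(1, 5000, n)
industry = rng.integers(1, 20, n)
MFG = [3, 7]                         # assumed manufacturing codes

# Progressive indexing: expression 1 is a full scan, done only once.
j = np.flatnonzero(emps > 1000)
j = j[np.isin(industry[j], MFG)]     # expression 2: survivors only
j = j[country[j] == 85]              # expression 3: even fewer items
```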

Slide 18. Progressive Indexing Optimization

Suppose we know that there are not many country-85 records in the data matrix, and hardly any big companies in that country. A better filtering sequence would be:

    :For I :In ⍳100
        J←(DAT[2;]=85)/⍳¯1↑⍴DAT     ⍝ 1,782 ms CPU
        J←(DAT[3;J]>1000)/J         ⍝     9 ms CPU
        J←(DAT[4;J]∊MFG)/J          ⍝     4 ms CPU
    :EndFor                         ⍝ 1,795 ms CPU total
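The payoff of putting the most selective filter first can be made concrete by counting how many items each ordering examines (a Python sketch with assumed selectivities: about 10% of rows match the country test, about 80% match the employee test):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
country = rng.integers(80, 90, n)   # ~10% of rows are country 85
emps    = rng.integers(1, 5000, n)  # ~80% of rows exceed 1000 employees

def progressive(filters):
    """Apply boolean filters progressively; return (items examined, survivors)."""
    j = np.arange(n)
    examined = 0
    for mask in filters:
        examined += len(j)          # this step scans the current survivors
        j = j[mask[j]]
    return examined, j

emp_mask = emps > 1000
cty_mask = country == 85

cost_emp_first, j1 = progressive([emp_mask, cty_mask])
cost_cty_first, j2 = progressive([cty_mask, emp_mask])

# Same survivors either way, but country-first examines far fewer items.
print(cost_emp_first, cost_cty_first)
```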

Slide 19. Marketing Data Selection Speed Comparisons

Each variant timed over 100 iterations:

    ⍝ Column data, one-liner: 28,733 ms
    J←((DAT[;3]>1000)∧(DAT[;4]∊MFG)∧DAT[;2]=85)/⍳1↑⍴DAT

    ⍝ Row data, one-liner: 12,227 ms
    J←((DAT[3;]>1000)∧(DAT[4;]∊MFG)∧DAT[2;]=85)/⍳¯1↑⍴DAT

    ⍝ Row data, progressive indexing: 3,000 ms total
    J←(DAT[3;]>1000)/⍳¯1↑⍴DAT       ⍝ 2,393 ms
    J←(DAT[4;J]∊MFG)/J              ⍝   612 ms
    J←(DAT[2;J]=85)/J               ⍝     5 ms

    ⍝ Row data, optimized progressive indexing: 1,795 ms total
    J←(DAT[2;]=85)/⍳¯1↑⍴DAT         ⍝ 1,782 ms
    J←(DAT[3;J]>1000)/J             ⍝     9 ms
    J←(DAT[4;J]∊MFG)/J              ⍝     4 ms

Slide 20. Row-Major Progressive Indexing

With proper arrangement of the Boolean statements, so that most of the unwanted data are filtered out by the first and second statements, progressive indexing on very large matrices can be many times faster than a single Boolean statement for data selection. In the previous example, the function is 16 times faster when progressive indexing is performed on row-major data.

Slide 21. Index Manipulation

It is usually more efficient to manipulate a matrix inside the square brackets [ ] than outside the [ ].

    Y←1000 1000⍴⎕A
    T←1000⍴1 0 0
    :For I :In ⍳10000
        {}4⌽Y[;68+⍳8]          ⍝ 6006 ms
        {}Y[;4⌽68+⍳8]          ⍝ 2110 ms
        {}0 1↓Y[;⍳7]           ⍝ 4448 ms
        {}Y[;1↓⍳7]             ⍝ 2049 ms
        {}T⌿Y[;⍳6]             ⍝ 2330 ms
        {}Y[T/⍳1↑⍴Y;⍳6]        ⍝  982 ms
        {}Y[;1 3],Y[;2 4]      ⍝ 6359 ms
        {}Y[;1 3 2 4]          ⍝ 1561 ms
    :EndFor
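The same principle holds in NumPy: apply the cheap manipulation to the short index vector rather than to the extracted data (a sketch; the array sizes are assumptions):

```python
import numpy as np

y = np.arange(80 * 100).reshape(80, 100)  # illustrative matrix
cols = 68 + np.arange(8)

# Rotate after extracting the columns, or rotate the 8-item index first:
a = np.roll(y[:, cols], -4, axis=1)  # manipulates an 80 x 8 block of data
b = y[:, np.roll(cols, -4)]          # manipulates just 8 integers
assert np.array_equal(a, b)

# Likewise, one combined index beats concatenating two extractions:
c = np.concatenate([y[:, [0, 2]], y[:, [1, 3]]], axis=1)
d = y[:, [0, 2, 1, 3]]
assert np.array_equal(c, d)
```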

Slide 22. Grade Up & Grade Down Index

It is usually more efficient to manipulate the grade-up index or the grade-down index than to manipulate the sorted matrix.

        R←X ∆SUM Y;J;P
    [1] X←X[⍋X[;Y];]                     ⍝ Sort X on columns Y
    [2] P←{1=⍴⍴⍵:1,(1↓⍵)≠¯1↓⍵           ⍝ Unique vector items
    [3]    1,∨/(1↓[1]⍵)≠¯1↓[1]⍵}X[;Y]   ⍝ Unique matrix rows
    [4] R←P⌿X                            ⍝ Rows with unique columns Y
    [5] ...

can be rewritten as

        R←X ∆SUM2 Y;I;J;P
    [1] I←⍋X[;Y]                         ⍝ Sort X on column Y
    [2] P←{1=⍴⍴⍵:1,(1↓⍵)≠¯1↓⍵           ⍝ Unique vector items
    [3]    1,∨/(1↓[1]⍵)≠¯1↓[1]⍵}X[I;Y]  ⍝ Unique matrix rows
    [4] R←X[P/I;]                        ⍝ Rows with unique columns Y
    [5] ...
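An argsort-based Python rendering of the two versions (a hedged analogue of the slide's SUM and SUM2 functions, assuming both keep the first row of each duplicate key):

```python
import numpy as np

def unique_by_key_sorted(x, col):
    # Version 1: materialize the fully sorted matrix, then filter it.
    xs = x[np.argsort(x[:, col], kind="stable")]  # a full sorted copy of x
    keep = np.ones(len(xs), dtype=bool)
    keep[1:] = xs[1:, col] != xs[:-1, col]        # first row of each key run
    return xs[keep]

def unique_by_key_index(x, col):
    # Version 2: manipulate only the grade (argsort) index and the key column.
    i = np.argsort(x[:, col], kind="stable")      # the grade-up index alone
    key = x[i, col]                               # the sorted key column only
    keep = np.ones(len(key), dtype=bool)
    keep[1:] = key[1:] != key[:-1]
    return x[i[keep]]                             # index the original matrix once

x = np.array([[3, 10], [1, 11], [3, 12], [2, 13], [1, 14]])
print(unique_by_key_index(x, 0).tolist())  # [[1, 11], [2, 13], [3, 10]]
```

Version 2 never builds the sorted copy of the whole matrix, which is where the workspace saving on the next slide comes from.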

Slide 23. Speed and Space Comparisons

    M←(10000⍴⍳100), ⍴⍳
    :For I :In ⍳10000
        {}M ∆SUM 1
        {}M ∆SUM2 1            ⍝ 4542 ms
    :EndFor

    ⎕WA
    {}DATA ∆SUM 1
    WS FULL
    ∆SUM[1] X←X[⍋X[;Y];]
            ^

    ⎕WA                        ⍝ Much less storage requirement
    {}DATA ∆SUM2 1             ⍝ No WS FULL

Slide 24. Array Dimensions

How many columns are there in matrix M?

    ⍴M[1;]       should be coded as   ¯1↑⍴M

How many rows are there in matrix M?

    ⍴M[;1]       should be coded as   1↑⍴M

How many planes and columns are there in the 3-dimensional array A?

    ⍴A[;1;]      should be coded as   (⍴A)[1 3]
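The NumPy counterpart of this advice is to read the shape metadata rather than extract a whole row, column, or plane just to measure it (illustrative sizes):

```python
import numpy as np

m = np.zeros((500, 300))
a = np.zeros((4, 500, 300))

# Shape lookups read metadata and build nothing.
ncols = m.shape[1]
nrows = m.shape[0]
planes, _, cols = a.shape

# The wasteful forms first extract the data being measured
# (in APL, indexing always produces a new array).
assert ncols == len(m[0, :]) == 300
assert nrows == len(m[:, 0]) == 500
assert (planes, cols) == (4, 300)
```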

Slide 25. Conclusion

When you perform data selection on a large matrix using carefully constructed progressive indexing instead of a simple APL one-liner of Boolean logic, the run time can be reduced significantly. If the data to be progressively indexed are arranged in rows instead of columns, the run time can be reduced even more. When optimizing matrix indexing, pay special attention to the indexing in the innermost loop of a function; optimizing the matrix indexing deep inside an intensive loop gives the maximum benefit.