1 Data Management, Storage and Access Optimization in High Performance Distributed Environment. Xiaohui Shen, Department of Electrical and Computer Engineering, Northwestern University. January 17, 2001

2 Outline: Problem Definition; Solutions; Meta-data Management System; Remote Storage Access Optimizations; Multi-Storage I/O System; Distributed Parallel File System; I/O performance prediction and evaluation; Integrated working environment

3 Motivation

4 Current Solutions. Parallel file systems and runtime libraries: smart I/O optimizations, caching, prefetching, parallel I/O; user interfaces are low-level; not portable; hard-coded I/O selection is difficult for runtime systems. Database systems: high-level, easy to use, portable; lack powerful I/O optimizations.

5 System Architecture

6 Tasks: Meta-data Management System; Remote Storage Access Optimizations; Efficient Storage Organization; Multi-Storage I/O System; Distributed Parallel File System; I/O performance prediction and evaluation; Integrated working environment

7 Part 1: Meta-data Management System (MDMS). Abstract Storage Devices (ASDs); storage patterns and access patterns; access history and trail of navigation

8 MDMS Tables

9 MDMS Internal Representation

10 MDMS I/O Flow (API)

11 Optimizations inside MDMS

12 Part 2: Remote Storage Access Optimization for HSS. Secondary storage access techniques: collective I/O, data sieving, caching, prefetching, etc. Tertiary storage systems interact directly with applications. Remote environment.

13 Optimizations: Remote Collective I/O; Remote Data Sieving; Asynchronous I/O; Subfile; Superfile; Migration, Stage and Purge; SRB Container

14 Optimization: Subfile

15 Optimization: Superfile. Create: one large file. Access: the first access brings the whole large file into memory; subsequent accesses can be serviced directly from memory.
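The superfile behaviour described on this slide can be illustrated with a small in-memory sketch. It is not the presenter's implementation: the class name `Superfile`, its methods, and the `remote_reads` counter are hypothetical, and a `bytearray` stands in for the one large file on remote storage.

```python
from typing import Dict, Optional, Tuple


class Superfile:
    """Hypothetical sketch: many small files packed into one large blob."""

    def __init__(self) -> None:
        self._index: Dict[str, Tuple[int, int]] = {}  # name -> (offset, length)
        self._blob = bytearray()       # stands in for the single large remote file
        self._cache: Optional[bytes] = None
        self.remote_reads = 0          # counts simulated remote transfers

    def add(self, name: str, data: bytes) -> None:
        """Create step: append a small file to the one large file."""
        self._index[name] = (len(self._blob), len(data))
        self._blob += data
        self._cache = None             # cached copy is now stale

    def _fetch_all(self) -> bytes:
        """Simulate one expensive remote transfer of the whole superfile."""
        self.remote_reads += 1
        return bytes(self._blob)

    def read(self, name: str) -> bytes:
        """Access step: first read loads everything, later reads hit memory."""
        if self._cache is None:
            self._cache = self._fetch_all()
        offset, length = self._index[name]
        return self._cache[offset:offset + length]


sf = Superfile()
sf.add("a.dat", b"xx")
sf.add("b.dat", b"yyy")
first = sf.read("b.dat")
second = sf.read("a.dat")
```

After both reads, `sf.remote_reads` is 1: only the first access paid for a transfer, which is the point of grouping many small files into one.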

16 Other Optimizations: Migration; Stage; Purge; SRB Container

17 Part 3: MS-I/O: A Multi-storage I/O System. Further performance improvement is limited by the nature of storage media. The problem is rooted in the traditional single-storage resource architecture.

18 Solution: Multi-storage Resource Architecture. Increases logical storage capacity; provides a more flexible and reliable computing environment; provides new opportunities for further performance improvement

19 Multi-storage Resource Architecture

20 Experimental Environment: Local Postgres database; local disks; remote disks; remote tapes; compute resource: Argonne SP2

21 Multi-storage I/O System

22 Database Tables and I/O Routines: Run table; Dataset table; Access pattern table; Storage pattern table; Execution table

23 User Access Pattern (write)

24 User Access Pattern (read)

25 Optimization Decision Flow

26 Applications and Tools

27 Experimental Environment. Applications: IBM SP2 at Argonne. Multiple storage resources: local disks: Argonne SP2; remote disks: SDSC; remote tapes: SDSC HPSS; local database: Postgres at NWU

28 MS-I/O Experiments: Data Analysis on Astrophysics Data. No access pattern, then Remote Tape; DataPartition='BBB', then Remote Tape + Collective I/O; WhenUse='soon' & Size='medium', then Remote Disk; plus DataPartition='BBB', then Remote Disk + Collective I/O; plus UseFrequency='frequent', then Local Disk; plus DataPartition='BBB', then Local Disk + Collective I/O

29 MS-I/O Experiments: Volume Rendering. No access pattern, then Remote Tape; ComputeTime='large', then Remote Tape + Async I/O; WhenUse='soon' & Size='medium', then Remote Disk; plus ComputeTime='large', then Remote Disk + Async I/O; plus UseFrequency='frequent', then Local Disk; plus ComputeTime='large', then Local Disk + Async I/O
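The rules on the two experiment slides above share one shape: access-pattern hints pick a storage tier, and a further hint adds an optimization on top. A minimal sketch of that decision follows. The hint names and values come from the slides; the function name `choose_storage` and the reading of "plus" as "in addition to the previous condition" are assumptions.

```python
from typing import Dict, List, Tuple


def choose_storage(hints: Dict[str, str]) -> Tuple[str, List[str]]:
    """Hypothetical helper: map user hints to (storage tier, optimizations)."""
    # With no access pattern at all, data goes to remote tape.
    storage = "remote tape"
    if hints.get("WhenUse") == "soon" and hints.get("Size") == "medium":
        storage = "remote disk"
        # Assumed: the frequency hint only applies on top of the rule above.
        if hints.get("UseFrequency") == "frequent":
            storage = "local disk"

    optimizations: List[str] = []
    if hints.get("DataPartition") == "BBB":
        optimizations.append("collective I/O")
    if hints.get("ComputeTime") == "large":
        optimizations.append("asynchronous I/O")
    return storage, optimizations


no_hints = choose_storage({})
astro = choose_storage({"WhenUse": "soon", "Size": "medium", "DataPartition": "BBB"})
render = choose_storage({
    "WhenUse": "soon", "Size": "medium",
    "UseFrequency": "frequent", "ComputeTime": "large",
})
```

Here `astro` resolves to remote disk with collective I/O and `render` to local disk with asynchronous I/O, matching the corresponding lines on the slides.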

30 MS-I/O Experiments: Subfile and Superfile. WriteSize='huge' & FutureReadSize='partial'; WriteSize='small' & WriteSequence='y' & FutureReadSequence='y'

31 MS-I/O Experiments: Replication and Access History. Dataset was first placed at the remote site; Read.UseFrequency='frequent'; a dataset being frequently used is detected.

32 Part 4: DPFS: A Distributed Parallel File System. Collects idle distributed storage as a supplement to the native storage of parallel computing systems. Characteristics: Distributed, Parallel, File System, Database

33 System Architecture of DPFS

34 Software Architecture of DPFS: Parallelism; Concurrency

35 DPFS BSU and File View. The Basic Striping Unit (BSU) in DPFS is called a brick. Its size is 64K.
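With a fixed brick size, a byte range in a file maps to a contiguous run of brick indices. The sketch below shows that arithmetic. It assumes "64K" means 64 KiB (65,536 bytes) and that bricks are numbered from 0; the function name is hypothetical.

```python
from typing import List

BRICK_SIZE = 64 * 1024  # assumed: 64K read as 64 KiB


def bricks_for_range(offset: int, length: int) -> List[int]:
    """Return the indices of every brick touched by [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // BRICK_SIZE
    last = (offset + length - 1) // BRICK_SIZE
    return list(range(first, last + 1))


one_byte = bricks_for_range(0, 1)
straddle = bricks_for_range(BRICK_SIZE - 1, 2)   # crosses a brick boundary
third = bricks_for_range(2 * BRICK_SIZE, BRICK_SIZE)
```

A two-byte read that straddles a boundary touches two bricks, which is why small unaligned requests can cost more than their size suggests.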

36 Striping Methods: Linear Striping; Multi-dimensional Striping; Array Striping

37 Linear Striping

38 Problems of Linear Striping

39 Multi-dimensional Striping

40 Array Striping

41 Striping Algorithms: Round-Robin; Greedy Algorithm
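The slide only names the two algorithms, so the following is a sketch under stated assumptions rather than the DPFS code. Round-robin is the standard cyclic assignment of bricks to storage servers. For the greedy variant, the assumption is that each server has a known per-brick cost and that every brick goes to the server that would finish it earliest; the function names and the cost model are hypothetical.

```python
from typing import List


def round_robin(num_bricks: int, num_servers: int) -> List[int]:
    """Assign brick b to server b mod num_servers."""
    return [b % num_servers for b in range(num_bricks)]


def greedy(num_bricks: int, cost_per_brick: List[float]) -> List[int]:
    """Assumed greedy rule: give each brick to the server that finishes it first."""
    finish = [0.0] * len(cost_per_brick)  # projected busy time per server
    placement: List[int] = []
    for _ in range(num_bricks):
        # Ties go to the lowest-numbered server.
        server = min(range(len(finish)),
                     key=lambda i: finish[i] + cost_per_brick[i])
        finish[server] += cost_per_brick[server]
        placement.append(server)
    return placement


rr = round_robin(6, 4)
gr = greedy(6, [1.0, 2.0])  # server 0 is twice as fast as server 1
```

Under this cost model the faster server ends up holding twice as many bricks as the slower one, whereas round-robin would split them evenly regardless of speed.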

42 Request Combination. P0: 0-7; P1: 8-15; P2: 16-23; P3: 24-31. P0(0,4) P1(9,13) P2(18,22) P3(27,31); P0(1,5) P1(10,14) P2(19,23) P3(24,28); ...
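One reading of the numbers on this slide, offered as an interpretation and not confirmed by the transcript: 32 bricks are striped round-robin over 4 storage servers, each processor owns 8 consecutive bricks, and in each round a processor combines its bricks that live on the same server into a single request, starting from a different server than its peers. The sketch below reproduces the slide's pairs under those assumptions; the constants and function name are hypothetical.

```python
from typing import Dict, List

NUM_SERVERS = 4        # assumed number of storage servers
BRICKS_PER_PROC = 8    # from the slide: P0 owns bricks 0-7, P1 owns 8-15, ...


def combined_requests(proc: int) -> List[List[int]]:
    """Return one combined brick list per round for the given processor."""
    start = proc * BRICKS_PER_PROC
    by_server: Dict[int, List[int]] = {s: [] for s in range(NUM_SERVERS)}
    for brick in range(start, start + BRICKS_PER_PROC):
        by_server[brick % NUM_SERVERS].append(brick)  # round-robin placement
    # Processor p starts at server p and advances by one server each round,
    # so no two processors target the same server in the same round.
    return [by_server[(proc + r) % NUM_SERVERS] for r in range(NUM_SERVERS)]


schedule = {p: combined_requests(p) for p in range(4)}
```

In round 0 this yields (0,4), (9,13), (18,22), (27,31) for P0 to P3, and in round 1 it yields (1,5), (10,14), (19,23), (24,28), as listed on the slide.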

43 Meta-data and Database

44 Tree Structure

45 Application Programming Interface: DPFS-Open (); DPFS-Write (); DPFS-Read (); DPFS-Close ()

46 User Interface. File system commands: cp, mkdir, rm, ls, etc. File transfer between DPFS and a general sequential file system. Example: cp local:my.data DPFS:/home/xhshen:4:greedy

47 Experimental Environment. Compute resource: Argonne IBM SP2. Storage resources: Class 1: Argonne Linux machines (Fast Ethernet and ATM); Class 2: NWU workstations (155M ATM); Class 3: NWU workstations (10M Ethernet)

48 DPFS Performance Numbers: File Level Comparison

49 DPFS Performance Numbers: Striping Algorithm Comparison

50 Part 5: I/O Performance Prediction and Evaluation. Performance model; performance prediction algorithm

51 Performance Model: $T(s) = T_{conn} + T_{open} + T_{seek} + T_{read/write}(s) + T_{fileclose} + T_{connclose}$
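The model is a sum of fixed per-access costs plus one size-dependent transfer term. A minimal sketch of evaluating it follows. The slide does not say how the read/write term depends on the size s, so dividing by a constant bandwidth is an assumption made only for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FixedCosts:
    """Size-independent terms of the model, in seconds."""
    conn: float        # T_conn
    file_open: float   # T_open
    seek: float        # T_seek
    file_close: float  # T_fileclose
    conn_close: float  # T_connclose


def access_time(size_bytes: int, fixed: FixedCosts,
                bandwidth_bytes_per_s: float) -> float:
    """Evaluate T(s); the transfer term s / bandwidth is an assumed form."""
    read_write = size_bytes / bandwidth_bytes_per_s
    return (fixed.conn + fixed.file_open + fixed.seek + read_write
            + fixed.file_close + fixed.conn_close)


costs = FixedCosts(conn=0.5, file_open=0.25, seek=0.125,
                   file_close=0.25, conn_close=0.5)
total = access_time(1024, costs, bandwidth_bytes_per_s=512.0)
```

With these made-up numbers the fixed terms add up to 1.625 s and the transfer term to 2.0 s, so for small requests the fixed overhead is a large share of the total.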

52 Performance Prediction Algorithm. M: number of datasets; N: total number of iterations; freq(j): I/O frequency; n(j): number of I/O calls; $t_j(s)$: data transfer time (stored in database)
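The transcript lists the symbols but not the formula that combines them. One plausible combination, stated here as an assumption, is to sum over the M datasets the number of I/O phases (N divided by freq(j), reading freq(j) as "I/O happens every freq(j) iterations") times the calls per phase n(j) times the per-call time t_j(s). The names below are hypothetical.

```python
from typing import List, NamedTuple


class Dataset(NamedTuple):
    freq: int        # freq(j): assumed to mean I/O every `freq` iterations
    n_calls: int     # n(j): I/O calls per I/O phase
    t_call: float    # t_j(s): per-call transfer time looked up in the database


def predict_total_io_time(datasets: List[Dataset], iterations: int) -> float:
    """Assumed formula: sum over j of (N // freq(j)) * n(j) * t_j(s)."""
    return sum((iterations // d.freq) * d.n_calls * d.t_call for d in datasets)


datasets = [
    Dataset(freq=10, n_calls=2, t_call=0.5),   # 10 phases * 2 calls * 0.5 s
    Dataset(freq=5, n_calls=1, t_call=0.25),   # 20 phases * 1 call  * 0.25 s
]
predicted = predict_total_io_time(datasets, iterations=100)
```

For these illustrative inputs the two datasets contribute 10 s and 5 s. Because the per-call times come from stored measurements, a prediction like this can be made without running the application.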

53 Part 6: Integrated Java Graphical User Interface

54 Functions of IJ-GUI: registering new applications; running applications remotely

55 Functions of IJ-GUI: data analysis and visualization; table browsing and searching

56 Functions of IJ-GUI: automatic code generator

57 Functions of IJ-GUI: I/O performance prediction

58 I/O Latency Reducing for Interactive Visualization

59 Summary of Contributions: Meta-data Management System; Remote Storage Access Optimizations; Multi-Storage I/O System; Distributed Parallel File System; I/O performance prediction and evaluation; Integrated working environment

60 Publications
A Multi-Storage Resource Architecture and I/O Performance Prediction for Scientific Computing, by X. Shen and A. Choudhary. Cluster Computing Journal.
A Novel Application Development Environment for Large-Scale Scientific Computations, by X. Shen, W. Liao, A. Choudhary, et al. ACM ICS2000.
Remote I/O Optimization and Evaluation for Tertiary Storage Systems through Storage Resource Broker, by X. Shen, W. Liao and A. Choudhary. IASTED Applied Informatics, Innsbruck, Austria, 2001.
A Java Graphical User Interface for Large-Scale Scientific Computations in Heterogeneous Systems, by X. Shen, G. Thiruvathukal, W. Liao, A. Choudhary, and A. Singh. HPC-ASIA, May 2000.
Meta-Data Management System for High-Performance Large-Scale Scientific Data Access, by W. Liao, X. Shen, A. Choudhary. HiPC 2000.
Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems, by A. Choudhary, M. Kandemir, H. Nagesh, J. No, X. Shen, V. Taylor, S. More, and R. Thakur. In Proc. HPDC-99.
A Multi-Storage Resource Architecture and I/O Performance Prediction for Scientific Computing, by Xiaohui Shen and Alok Choudhary. HPDC-00.
A Distributed Multi-Storage I/O System for High Performance Data Intensive Computing, by Xiaohui Shen and Alok Choudhary.
DPFS: A Distributed Parallel File System, by Xiaohui Shen and Alok Choudhary.
An Integrated Graphical User Interface for High Performance Distributed Computing, by Xiaohui Shen, Wei-keng Liao and Alok Choudhary.
A Multimedia Integrated Parallel File System, by J. Carretero, W. Zhu, X. Shen, A. Choudhary. JCIS98.

61 Future Directions-1

62 Future Directions-2

