Access Patterns, Metadata, and Performance
Alok Choudhary and Wei-Keng Liao, Department of ECE, Northwestern University


Slide 1: Access Patterns, Metadata, and Performance
Alok Choudhary (choudhar@ece.nwu.edu) and Wei-Keng Liao
Department of ECE, Northwestern University
Collaboration with ANL
SDM kickoff meeting, July 10-11, 2001

Slide 2: Virtuous Cycle
Problem setup (mesh, domain decomposition)
Simulation (execute app, generate data)
Manage, visualize, analyze
Measure results, learn, archive

Slide 3: I/O Flow for Scientific Simulation
Many scientific applications perform simulation/analysis
3 types of output: checkpoint, visualization, data for analysis
Checkpoint keeps re-writing to the same files
Visualization output
2 types of input: new run or restart

Slide 4: Data Access Sequence Dependency
Temporal dependency: access the same data set at different time stamps
Spatial dependency: access different data sets at the same time stamp
Resolution dependency: access the same data set at different resolutions
The sequence is useful for I/O performance improvement, e.g., prefetch, pre-stage, storage continuity
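The temporal dependency above can be turned into a simple prefetch rule. The following is a minimal sketch, not the system described in the talk: the class name `SequencePrefetcher` and the `load(name, step)` callable are hypothetical, and the prefetch runs synchronously, whereas a real system would presumably issue it in the background.

```python
from collections import OrderedDict


class SequencePrefetcher:
    """Small cache that also fetches the next time step of a data set on each read."""

    def __init__(self, load, capacity=4):
        self.load = load            # hypothetical callable: (name, step) -> data
        self.capacity = capacity    # maximum number of cached (name, step) entries
        self.cache = OrderedDict()  # insertion order doubles as eviction order
        self.hits = 0
        self.misses = 0

    def _store(self, key, data):
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the oldest entry

    def _prefetch(self, key):
        if key not in self.cache:
            self._store(key, self.load(*key))

    def read(self, name, step):
        key = (name, step)
        if key in self.cache:
            self.hits += 1
            data = self.cache[key]
        else:
            self.misses += 1
            data = self.load(name, step)
            self._store(key, data)
        # Temporal dependency: the same data set is likely read next at step + 1.
        self._prefetch((name, step + 1))
        return data
```

Reading time steps 0, 1, 2 of one data set in order costs one miss followed by hits, because each read pulls in its successor. Spatial or resolution dependencies would need a different successor rule.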

Slide 5: Spatial Data Access Patterns
Parallel partition patterns:
  – Regular, irregular
  – Static, dynamic during simulation
Access sequence
  – Spatial, temporal, resolution
Access frequency
  – Once only, multiple times (overwrite for restart)
Access amount
  – Large, medium, small chunks

Slide 6: Access Patterns for Visualization/Analysis
Generated from real data during simulation or in a post-simulation process
Smaller size than real data
  – Type conversion, e.g., float to unsigned char
  – Reduce/increase resolution
  – Projection 3D to 2D
3 types of data generation and display sequence
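The three size reductions listed above can be illustrated in a few lines. This is a sketch under assumptions the slide does not state: the float data is rescaled linearly to 0..255, resolution is reduced by plain subsampling, and the 3D-to-2D projection takes the maximum along z. All function names are hypothetical.

```python
def to_uint8(values):
    """Linearly rescale floats to 0..255 and pack them as unsigned chars."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant data
    return bytes(round(255 * (v - lo) / span) for v in values)


def downsample(row, factor):
    """Reduce resolution by keeping every `factor`-th sample."""
    return row[::factor]


def project_max(volume):
    """Project a 3D volume indexed [z][y][x] to 2D by taking the maximum along z."""
    ny, nx = len(volume[0]), len(volume[0][0])
    return [[max(plane[y][x] for plane in volume) for x in range(nx)]
            for y in range(ny)]
```

Converting 4-byte floats to single bytes alone cuts the output to a quarter of its size, which is one reason visualization output is cheaper to write than the full simulation state.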

Slide 7: Architecture
[Diagram: User Applications (Simulation, Data Analysis, Visualization), the MDMS, and Storage Systems (I/O interface: MPI-IO, other interfaces). Diagram labels: Metadata (access pattern, history); Query; Input metadata; Hints, directives; Associations; OIDs; parameters for I/O; Schedule, prefetch, cache; Hints (coll I/O); Performance input; System metadata; I/O func (best_I/O for these param); Hint; Data.]

Slide 8: Approach
Manage meta data using OR-DBMS
  – Collect and organize meta data in relation tables
  – Design meta data query interface using SQL
Access to HSS
  – Obtain current storage layout, configuration
  – Native I/O interfaces or MPI-IO
I/O optimization
  – Determine optimal I/O calls
  – Overlap I/O with computation and communication
  – Pre-fetch, pre-stage, migrate, purge in HSS
  – Sub-filing for large files, file container for small files
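Overlapping I/O with computation can be sketched with a background writer thread and a bounded queue. This is an illustration only, not the mechanism used in the project: the function name `run_with_overlapped_writes`, the `compute(step)` callable, and the choice of two in-flight buffers are all assumptions, and `path` is assumed to be writable.

```python
import queue
import threading


def run_with_overlapped_writes(steps, compute, path):
    """Compute `steps` buffers while a background thread writes them to `path`."""
    pending = queue.Queue(maxsize=2)  # bounded: at most two buffers wait to be written

    def writer():
        with open(path, "wb") as out:
            while True:
                buf = pending.get()
                if buf is None:  # sentinel: no more data
                    break
                out.write(buf)

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        for step in range(steps):
            # The main thread blocks here only if the writer falls behind.
            pending.put(compute(step))
    finally:
        pending.put(None)
        thread.join()
```

The bounded queue is the design choice that matters: it lets the computation of step n+1 proceed while step n is being written, but caps memory use when storage is slower than the simulation.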

Slide 9: Objective and Goal
Meta data management
  – Collect historical meta data, process user-provided meta data, update meta data w.r.t. environment changes
  – Efficient query for meta data
High performance I/O
  – Automatically determine optimal I/O calls from data access patterns
  – Improve performance by prefetching, caching, layout, inter-object association, striping, etc.
  – Support for Hierarchical Storage Systems (HSS)
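One way to picture "determine optimal I/O calls from data access patterns" is a rule that maps a recorded pattern to an MPI-IO routine. The sketch below is a hypothetical heuristic, not the selection logic of the system in the talk: the `AccessPattern` fields, the 64 KiB threshold, and the rule itself are assumptions. Only the returned routine names (`MPI_File_write_at`, `MPI_File_write_at_all`) are real MPI-IO calls.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    partition: str    # "regular" or "irregular"
    chunk_bytes: int  # typical size of one request
    processes: int    # number of processes writing the data set


def choose_write_call(pattern, small_chunk=64 * 1024):
    """Return the name of an MPI-IO write routine for the given pattern (illustrative rule)."""
    if pattern.processes == 1:
        return "MPI_File_write_at"      # nothing to coordinate
    if pattern.partition == "irregular" or pattern.chunk_bytes < small_chunk:
        # Many small or scattered requests tend to benefit from collective I/O,
        # which lets the library merge them into larger accesses.
        return "MPI_File_write_at_all"
    return "MPI_File_write_at"          # large contiguous chunks: independent I/O
```

In a metadata-driven design, the `AccessPattern` values would come from the stored access history rather than from the application code.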

Slide 10: Metadata
Application Level
  – Algorithms, compiling, execution environments
  – Time stamps, parameters, result summary
Programming Level
  – Data types, structures, association of datasets, partition patterns
Storage System Level
  – File locations, file structure, I/O modes, host names, device types, path names, storage hierarchy
Performance Level
  – I/O bandwidth of HSS for local and remote access
  – Data access sequence, frequency, other access hints
  – Collective or non-collective I/O
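The four metadata levels map naturally onto relational tables queried with SQL. The sketch below uses Python's built-in `sqlite3` as a stand-in for the OR-DBMS mentioned in the talk. The table and column names are hypothetical and cover only a few of the attributes listed above.

```python
import sqlite3

# One table per metadata level; all names are illustrative.
SCHEMA = """
CREATE TABLE run (                -- application level
    run_id INTEGER PRIMARY KEY, application TEXT, started TEXT);
CREATE TABLE dataset (            -- programming level
    dataset_id INTEGER PRIMARY KEY, run_id INTEGER REFERENCES run(run_id),
    name TEXT, data_type TEXT, partition_pattern TEXT);
CREATE TABLE storage (            -- storage system level
    dataset_id INTEGER REFERENCES dataset(dataset_id), host TEXT, path TEXT);
CREATE TABLE access_hint (        -- performance level
    dataset_id INTEGER REFERENCES dataset(dataset_id),
    frequency TEXT, collective INTEGER);
"""


def open_catalog():
    """Create an empty in-memory metadata catalog."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def io_hints(db, dataset_name):
    """Return (path, partition_pattern, collective) for a data set, or None if unknown."""
    return db.execute(
        """SELECT s.path, d.partition_pattern, a.collective
           FROM dataset d
           JOIN storage s ON s.dataset_id = d.dataset_id
           JOIN access_hint a ON a.dataset_id = d.dataset_id
           WHERE d.name = ?""",
        (dataset_name,),
    ).fetchone()
```

An application would call something like `io_hints(db, "temperature")` before opening the file and pass the result on to the I/O layer as hints.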

Slide 11: Applications
Astro3D: study the highly turbulent convective layers of late-type stars
  – Write only
  – Regular partition on all data sets
ENZO: simulate the formation of a cluster of galaxies consisting of gas and stars
  – Both read and write
  – Both regular and irregular partition
  – Adaptive Mesh Refinement, dynamic load balancing
Common features
  – Checkpoint / restart
  – Post-simulation data analysis
  – Visualizing the process of the computation in the form of a movie

Slide 12: Interface

Slide 13: Run Application

Slide 14: Dataset and Access Pattern Table

Slide 15: Data Analysis

Slide 16: Integrating Analysis
Problem setup (mesh, domain decomposition)
Simulation (execute app, generate data)
Manage, visualize, analyze
Measure results, learn, archive
On-line analysis and mining

Slide 17: Visualize

Slide 18: Meta Data Representation in Database

Slide 19: Future Directions and Challenges
H/W and S/W mainly driven by commercial applications (e.g., web, DB, content delivery)
  – H/W architectures (e.g., InfiniBand)
  – S/W architectures (e.g., ODB, ORDB, DW, mining tools)
  – Challenge: Can we adapt and enhance these to satisfy scientific computing file systems, storage, and data management requirements?
  – E.g., parallel file systems on InfiniBand architectures, so that uniform UI and access may be provided from different systems?
  – Can we incorporate DM and analysis capabilities as well as efficient I/O techniques within DM systems?
File Systems and DM Challenges
  – Can it be customized with high performance?
  – Can it learn from user access patterns?
  – Can it optimize accesses automatically?
  – Can it provide a high-level interface which is uniform?

