
1 Scientific Computing Division Trends and Directions of Mass Storage in the Scientific Computing Arena CAS 2001 Gene Harano National Center for Atmospheric Research

2

3 CAS 2001 – October 30, 2001 Copyright © 2001 University Corporation for Atmospheric Research Scientific Computing Division 2
Vision
How do we accomplish that vision?
- Handling large datasets: analysis and visualization
- Shared file systems and cache pools
- Middleware and layering
- Management tools
- Emerging technologies
(To name a few)

4 Large Datasets
- The NCAR MSS was originally a tape-based archive.
- The average NCAR MSS file size is 35 MB (11 million files). It is small because of historical restrictions (single-volume datasets, model history files) and because a large share of files (25%) are under 1 MB (user backups).
- Single terabyte-sized files are common for visualization and analysis.
- Currently these large files are sliced up before landing in the archive.
- Access is generally sequential, with some random access.
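
The slide says large files are sliced up before landing in the archive but does not describe how. The following is a minimal sketch of fixed-size slicing and reassembly, not the NCAR MSS procedure; the function names and the slice size are assumptions made for illustration.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def slice_stream(stream: BinaryIO, slice_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of at most slice_size bytes from stream."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    while True:
        chunk = stream.read(slice_size)
        if not chunk:
            break
        yield chunk


def reassemble(slices: Iterable[bytes]) -> bytes:
    """Concatenate slices back into the original byte string."""
    return b"".join(slices)


# Illustrative data only: 2560 bytes cut into slices of up to 1000 bytes.
data = bytes(range(256)) * 10
slices = list(slice_stream(io.BytesIO(data), 1000))
```

Sequential access maps naturally onto reading the slices in order; random access would additionally need an index from byte offset to slice, which this sketch leaves out.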

5 Large Datasets
- Are tape-based archives obsolete? No, but there is a need to reevaluate the entire storage structure at NCAR:
  - Cache pools
  - Data warehouses, data subsetting
- The NCAR MSS is being treated as a shared file system rather than an archive.

6 Shared File System
- Heterogeneous, high-performance, high-capacity: doesn’t yet exist.
- Diagram labels: Shared Data; Web/GRID/servers; Programmatic; Command Line

7 Cache Pools
External to the archive:
- Minimize archive activity
- Temporary data stays out of the archive
- Customized for a smaller set of associated data
Internal to the archive:
- Minimize tape activity
- Improve response time
- Federate and distribute
- Repackage small files for tape storage under system control
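
Repackaging small files for tape storage can be pictured as bundling them into one larger container before they are written. The sketch below uses Python's standard `tarfile` module purely as an illustration; the 1 MB threshold, the function names, and the choice of tar as the container are assumptions, not a description of how the MSS does it.

```python
import io
import tarfile
from typing import Dict, List, Tuple

SMALL_FILE_LIMIT = 1_000_000  # assumption: "small" means under 1 MB


def bundle_small_files(
    files: Dict[str, bytes], limit: int = SMALL_FILE_LIMIT
) -> Tuple[bytes, List[str]]:
    """Pack files smaller than limit into one tar container.

    Returns the container bytes and the names of files left unbundled.
    """
    buf = io.BytesIO()
    large: List[str] = []
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in sorted(files.items()):
            if len(payload) >= limit:
                large.append(name)  # large files go to tape on their own
                continue
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue(), large


def list_bundle(tar_bytes: bytes) -> List[str]:
    """Return the member names stored in a bundle."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        return tar.getnames()
```

Writing one bundle instead of many tiny files reduces the number of separate tape objects, which is the motivation the slide gives for doing this under system control.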

8 Diagram labels: MSS Proxy; Data analysis; GPFS Shared File System; Advanced Research Computing System (IBM SP); Terascale Modeling & Analysis

9 Diagram labels: Vislab; MSS Proxy; Data analysis; Storage Area Network; Shared File System; Terascale Analysis & Visualization

10 Diagram labels: CDP/ESG; Data Processor; DSS server; Storage Area Network; Shared File System; Unidata, DODS; MSS Proxy; Data Provisioning & Access

11 Internal Cache Pools
NCAR MSS event log modeling (April 2000 to April 2001), looking at tape activity:
- 20 TB cache pool, which can be federated and distributed
- 30-day average cache residency
- 70% reduction in tape read-backs
- Greatly enhanced response time
- Reduce the amount of tape resources or redefine their use
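
The slide reports the outcome of event log modeling but not the method. One plausible way to estimate the effect of a disk cache is to replay the log against a simulated cache and count how many reads still reach tape. The sketch below assumes an LRU eviction policy and a hypothetical `(operation, file_id, size_bytes)` log schema; neither comes from the source.

```python
from collections import OrderedDict
from typing import Iterable, Tuple

# Hypothetical log record: (operation, file_id, size_bytes)
Event = Tuple[str, str, int]


def replay(events: Iterable[Event], cache_bytes: int) -> Tuple[int, int]:
    """Replay events against an LRU cache; return (total_reads, tape_reads)."""
    cache: "OrderedDict[str, int]" = OrderedDict()
    used = 0
    total_reads = 0
    tape_reads = 0
    for op, file_id, size in events:
        if op == "read":
            total_reads += 1
            if file_id in cache:
                cache.move_to_end(file_id)  # hit: refresh recency, no tape
                continue
            tape_reads += 1  # miss: data must be read back from tape
        elif op != "write":
            raise ValueError(f"unknown operation: {op!r}")
        # A write or a read miss places the file in the cache.
        if file_id in cache:
            used -= cache.pop(file_id)
        if size > cache_bytes:
            continue  # file can never fit, so it bypasses the cache
        while used + size > cache_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU entry
            used -= evicted_size
        cache[file_id] = size
        used += size
    return total_reads, tape_reads
```

Comparing `tape_reads` for a cache size of zero against a candidate size gives the estimated reduction in read-backs for that log. A fuller model would also track timestamps to report average cache residency.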

12 Middleware and Layering
Role of an archive
An archive performs two basic functions:
- Reliably storing data
- Returning data on demand
Data analysis, data mining, data assimilation, distributed data servers, etc. are functions of middleware that sits on top of an archive. They should be implemented independently of the underlying archive.
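
The layering idea can be sketched as a narrow archive interface with only the two functions named on the slide, plus a middleware service written against that interface. All class and method names here are hypothetical, and the in-memory archive merely stands in for a real one.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Archive(ABC):
    """The two basic archive functions: store data and return it on demand."""

    @abstractmethod
    def store(self, name: str, data: bytes) -> None:
        ...

    @abstractmethod
    def retrieve(self, name: str) -> bytes:
        ...


class InMemoryArchive(Archive):
    """Stand-in archive for illustration; a real one would sit on tape or disk."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def store(self, name: str, data: bytes) -> None:
        self._files[name] = bytes(data)

    def retrieve(self, name: str) -> bytes:
        try:
            return self._files[name]
        except KeyError:
            raise FileNotFoundError(name) from None


class SubsetService:
    """Middleware example: byte-range subsetting over any Archive."""

    def __init__(self, archive: Archive) -> None:
        self._archive = archive

    def subset(self, name: str, start: int, length: int) -> bytes:
        return self._archive.retrieve(name)[start:start + length]
```

Because `SubsetService` depends only on `store` and `retrieve`, the archive underneath could be swapped without changing the middleware.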

13 Middleware and Layering
Separate archive functionality from:
- Visualization
- Data servers
- Data warehousing, data mining, data subsetting
- Web and Grid access
- Etc.
Benefits:
- Maximally enables the use of COTS (commercial off-the-shelf) products
- Allows (transparent) replacement of components as needed
- Fill the gaps with custom software

14 Future Data Services File Cache Services Pools NCAR MSS Archive Data Analysis/Mining/Assimilation Data Cataloging/Searching Data Storage Digital Libraries, Data Servers Visualization WEB

15 Management Tools
There is a need for better user and system management tools as MSS capacity scales.
- How does a single user manage 1 million files?
- How does an MSS administrator dynamically tune a system, predict workloads, and find and correct bottlenecks?

16 Management Tools
Defining new roles:
- Single ordinary user
- MSS superuser
- As users come and go, there is a need for:
  - Project superuser (new)
  - Division data administrator (new)
Web-based metadata user tools:
- List, search, and catalog holdings (metadata mining)
- Remove unwanted files
NCAR MSS tools

17 Management Tools
From the system perspective, utilize data warehousing and data mining techniques:
- System modeling using event logs
- Capacity planning
- Identify bottlenecks
- Operational monitoring
- Track errors, identify trends (media problems)
- Intrusion detection
- Dynamic system tuning
NCAR MSS tools
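
As a small illustration of tracking errors to spot media problems, the sketch below counts error events per tape volume and flags volumes at or above a threshold. The `(volume_id, status)` record shape, the `"error"` status value, and the threshold are all assumptions; the source does not describe the MSS log format.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical log record: (volume_id, status), where status is "ok" or "error"
LogRecord = Tuple[str, str]


def flag_suspect_media(records: Iterable[LogRecord], threshold: int) -> List[str]:
    """Return volume IDs whose error count is at least threshold, sorted by ID."""
    errors = Counter(volume for volume, status in records if status == "error")
    return sorted(volume for volume, count in errors.items() if count >= threshold)
```

The same counting approach extends to other keys, such as drive or time window, to separate media problems from drive problems.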

18 Emerging Technologies
- Data Path
- Tape
- Holographic Storage
- Probe-Based MEMS
- High-Density Rosetta (analog)

19 Data Path
- HIPPI is in use today in the NCAR archive
- Fibre Channel will replace our HIPPI in the near term:
  - FC SAN for RAID cache pools
  - FC SAN for tape sharing
- Others:
  - iSCSI
  - FC over IP
  - InfiniBand

20 Tape
[Chart: native cartridge capacity (GB) for linear and helical tape formats. Formats labeled: 3480/90, 3490 E, 3570, 3570C, 3590, 3590E, 9490 EE, 9840, 9840B, 9940, DLT-4000, DLT-7000, SDLT, AIT, AIT-2, Mammoth, Mammoth 2, DTF, SD-3, Ultrium, Accelis. Roadmap annotations run from 2001 through 2004 with capacity points of 200 GB, 500 GB, and 1 TB.]

21 Tape
- To be competitive with magnetic disk, magnetic tape capacity must grow 10x every 5 years.
- This is achieved by a combination of increased areal density and longer (and possibly wider) tape.
(From a storage vendor)
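
To put the 10x-per-5-years figure in annual terms: a factor of 10 over 5 years is 10^(1/5), about 1.58x per year, which doubles roughly every 1.5 years. The short sketch below works that out; the function names and the 100 GB starting point are illustrative only.

```python
import math


def annual_factor(total_factor: float, years: float) -> float:
    """Per-year growth factor that compounds to total_factor over years."""
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor: float, years: float) -> float:
    """Years needed to double at the same compound growth rate."""
    return years * math.log(2) / math.log(total_factor)


def projected_capacity(
    start_gb: float,
    years_ahead: float,
    total_factor: float = 10.0,
    period_years: float = 5.0,
) -> float:
    """Capacity after years_ahead, growing total_factor every period_years."""
    return start_gb * total_factor ** (years_ahead / period_years)


rate = annual_factor(10, 5)              # about 1.585x per year
doubling = doubling_time_years(10, 5)    # about 1.5 years
in_five = projected_capacity(100, 5)     # a 100 GB cartridge becomes 1000 GB
```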

22 Tape
RAIT (Redundant Array of Independent Tapes):
- Increased performance
- Higher reliability with the use of parity
- Higher single “volume” capacity
- Large datasets on a single “volume”
RAIL (Redundant Array of Independent Libraries):
- Greater total system capacity
- Improved response time
These are resource-intensive solutions that require dedicated libraries and drives.
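
The reliability gain from parity can be shown with a simple XOR scheme: data is split into equal stripes, one per data tape, and an extra parity stripe is the XOR of all of them, so any single lost stripe can be rebuilt from the rest. This is a simplified sketch, not the layout of any particular RAIT product; the function names and the zero-padding choice are assumptions.

```python
from functools import reduce
from typing import List, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("blocks must have equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def stripe_with_parity(data: bytes, data_tapes: int) -> List[bytes]:
    """Split data into data_tapes zero-padded stripes and append a parity stripe."""
    if data_tapes < 1:
        raise ValueError("data_tapes must be at least 1")
    stripe_len = -(-len(data) // data_tapes)  # ceiling division
    padded = data.ljust(stripe_len * data_tapes, b"\0")
    stripes = [
        padded[i * stripe_len:(i + 1) * stripe_len] for i in range(data_tapes)
    ]
    return stripes + [xor_blocks(stripes)]


def rebuild(stripes: Sequence[bytes], missing: int) -> bytes:
    """Reconstruct the stripe at index missing from all the other stripes."""
    survivors = [s for i, s in enumerate(stripes) if i != missing]
    return xor_blocks(survivors)
```

The cost the slide points to is visible here: three data stripes need a fourth tape and drive for parity, and all of them must be mounted together.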

23 Holographic
- Large capacity: 10 GB in a single cubic centimeter (10 Gbits/in² for magnetic disk)
- High speed: 2 gigabits/sec
- Low power
- Billions of write cycles

24 Probe-Based MEMS
- MEMS: Micro-Electro-Mechanical Systems
- Probe-based storage arrays
- Dense
- Highly parallel to achieve high bandwidth
- Rectilinear 2D positioning
- Commercial devices in the next several years

25 HD Rosetta
- Product marketed by Norsam Technologies
- Developed at Los Alamos National Lab
- Analog
- Lifetime of 1000s of years
- Can be read back with only a microscope
- Stores text and images

