CLASS Information Management Presented at NOAATECH Conference 2006 Presented by Pat Schafer (CLASS-WV Development Lead)


1 CLASS Information Management Presented at NOAATECH Conference 2006 Presented by Pat Schafer (CLASS-WV Development Lead) Email: pat.schafer@tmctechnologies.com November 3, 2005

2 Agenda
– What is CLASS?
– Overview of the CLASS System
– Distributed Redundant Archive
– CLASS Cache Management
– Current and Future CLASS Data Volumes
– Dealing with Larger Data Volumes
– Information Management Research
– Case of CLASS Scalability
– Questions?

3 What is CLASS?
CLASS stands for Comprehensive Large Array Stewardship System. CLASS is a web-based data archive and distribution system for NOAA’s environmental data. Mission Statement: NOAA's National Data Centers and their world-wide clientele of customers look to CLASS as the sole NOAA IT infrastructure project in which all NOAA’s current and future environmental data sets will reside. CLASS provides permanent, secure storage, and safe, efficient data discovery and access between the Data Centers and the customers.

4 Overview of the CLASS System

5 Overview of CLASS Activity Controller
Controls all activities for the back-end systems. Configures processing paths. A processing path is a group of activities that are executed in a specified sequence. There is a trigger for each activity of a processing path. Each activity comprises a process and its parameters. Each process has configurable environment specifications.
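The processing-path idea above can be illustrated with a minimal sketch. Everything here is hypothetical: the class names, the `extract_metadata` and `archive` stand-in processes, and the path itself are assumptions for illustration, not the CLASS configuration format. Triggers are not modeled; the sketch only shows activities (a process plus its parameters) executed in a specified sequence.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Activity:
    """One step of a processing path: a process plus its parameters."""
    name: str
    process: Callable[..., str]  # hypothetical: any callable stands in for a process
    params: Dict[str, str] = field(default_factory=dict)
    env: Dict[str, str] = field(default_factory=dict)  # per-process environment settings


@dataclass
class ProcessingPath:
    """A group of activities executed in the order they are listed."""
    name: str
    activities: List[Activity]

    def run(self) -> List[str]:
        results = []
        for activity in self.activities:
            # Each process is called with its configured parameters.
            results.append(activity.process(**activity.params))
        return results


def extract_metadata(source: str) -> str:
    # Hypothetical stand-in for a real back-end process.
    return f"metadata extracted from {source}"


def archive(target: str) -> str:
    # Hypothetical stand-in for a real back-end process.
    return f"archived to {target}"


path = ProcessingPath(
    name="example-ingest",
    activities=[
        Activity("extract", extract_metadata, {"source": "incoming"}),
        Activity("archive", archive, {"target": "tape"}),
    ],
)
```

Calling `path.run()` executes the two activities in order and returns their results as a list.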

6 Overview of CLASS Ingest
Checks periodically for files to ingest. These files are either pushed to CLASS or pulled by CLASS from the data provider. CLASS can also work with manifest files that contain a list of files to ingest. Extracts metadata. Creates inventory records in the database. Archives the data into a robotic tape archive. Puts the data in the local cache. Generates browse data files for AVHRR and GOES data types. Starts the subscription process.
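As a rough illustration of manifest-driven ingest, the sketch below reads a manifest and builds placeholder inventory records. The manifest layout (one file name per line, `#` for comments), the record fields, and the rule that the data type is the file-name prefix are all assumptions made for this example; the slides do not describe the real formats.

```python
from typing import Dict, List


def parse_manifest(text: str) -> List[str]:
    """Return the file names listed in a manifest (assumed: one name per line)."""
    names = []
    for line in text.splitlines():
        entry = line.strip()
        # Skip blank lines and comment lines (assumed convention).
        if entry and not entry.startswith("#"):
            names.append(entry)
    return names


def build_inventory(names: List[str]) -> List[Dict[str, str]]:
    """Create one placeholder inventory record per file to ingest."""
    records = []
    for name in names:
        # Assumption: the data type is the part of the name before the first dot.
        data_type = name.split(".", 1)[0]
        records.append({"file": name, "data_type": data_type, "status": "pending"})
    return records
```

For example, `build_inventory(parse_manifest(manifest_text))` yields a list of records that a later step could write to a database.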

7 Overview of CLASS Delivery
The delivery system processes orders. Retrieves order information from the database. Locates files in the temporary or permanent caches, or retrieves the files for the order from the robotic tape. Performs data extraction, sub-setting, conversion, etc. upon user request through the order. Encrypts data that is restricted. Generates digital signatures on all files for users who request digital signatures. Copies the order data to the CLASS FTP area. Pushes the data to subscriber users that have requested it. Notifies the user that their data is ready.
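The file-location step of delivery can be sketched as a simple lookup. This is an assumption-laden illustration: the caches are modeled as in-memory sets, the permanent cache is checked before the temporary one (the slides do not state an order), and a tape retrieval is simulated by staging the file into the temporary cache.

```python
from typing import Set


def locate_file(name: str, permanent: Set[str], temporary: Set[str]) -> str:
    """Report where a file was found, staging it from tape if it is not cached."""
    if name in permanent:
        return "permanent"
    if name in temporary:
        return "temporary"
    # Not on disk: simulate retrieval from the robotic tape archive
    # by placing the file in the temporary cache.
    temporary.add(name)
    return "tape"
```

A second request for the same file then finds it in the temporary cache instead of going back to tape.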

8 Overview of the CLASS Web Interface
Users can register and order data for free. Tomcat, Cocoon, XSL, Java, and JavaScript are used to display information to the user. The web interface uses the VisServer to generate browse images to display to the user. The web interface uses the InvServer to search the inventory and retrieve search results. Users can place data into the shopping cart, where it can be ordered. Users can update user preferences and profiles. Approved users can manage user subscriptions. URL: www.class.noaa.gov

9 CLASS Cache Management
Manages files in three types of caches: permanent, temporary, and delivery. To save disk space, files are stored in the temporary cache for a limited time and are removed once the demand for them is gone. Files not in the on-line caches can be retrieved from the robotic tape archive and stored in the temporary cache for a limited time. Tracks activities on files and file location in the cache. Parameterization of file cleanup and file storage. Operator interface access to manage caches.
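A parameterized cleanup of the temporary cache might look like the sketch below. The `CacheEntry` fields, the retention parameter, and the rule "remove only when no orders are pending and the retention period has passed" are assumptions chosen to mirror the description above, not the actual CLASS policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheEntry:
    name: str
    last_access: float   # seconds since some reference time
    pending_orders: int  # hypothetical measure of remaining demand


def select_for_removal(
    entries: List[CacheEntry], now: float, retention_seconds: float
) -> List[str]:
    """Return names of temporary-cache files that are safe to remove."""
    removable = []
    for entry in entries:
        expired = now - entry.last_access > retention_seconds
        # Keep any file that still has demand, even if it is old.
        if expired and entry.pending_orders == 0:
            removable.append(entry.name)
    return removable
```

Because the retention period is an argument, an operator could tune cleanup without code changes, which is the point of parameterizing it.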

10 Distributed Redundant Archive
[Diagram: a Provider and two sites, Suitland and Asheville. Each site is labeled with an ingest process, operational inventory, archiver, archive interchange, and robotic storage. The diagram also shows an operational datastore.]

11 Current and Future Data Volumes
Current:
– Ingest: 71 GB/Day (average)
– Distribute: 120 GB/Day (average)
Future (2010):
– Ingest: 8+ TB/Day
– Distribute: 48+ TB/Day (estimate: 6 times the ingest volume)
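The arithmetic behind these figures can be checked with a short sketch. The function names are hypothetical, and the conversion assumes decimal units (1 TB = 1000 GB), which the slide does not specify.

```python
def distribute_estimate_tb(ingest_tb_per_day: float, factor: float = 6.0) -> float:
    """Estimate daily distribution volume as a multiple of daily ingest."""
    return ingest_tb_per_day * factor


def growth_factor(current_gb_per_day: float, future_gb_per_day: float) -> float:
    """Ratio of future to current daily volume."""
    return future_gb_per_day / current_gb_per_day


# Figures from the slide; 8 TB and 48 TB are lower bounds ("8+", "48+").
future_distribute_tb = distribute_estimate_tb(8.0)   # 6 x 8 TB/day
ingest_growth = growth_factor(71.0, 8.0 * 1000.0)     # assumes 1 TB = 1000 GB
distribute_growth = growth_factor(120.0, 48.0 * 1000.0)
```

Under these assumptions, ingest grows by a factor of roughly 113 and distribution by a factor of 400 between the current and 2010 figures.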

12 Dealing with Larger Data Volumes
Hardware and communication studies recommended upgrades:
– CPUs: from 2 × 750 MHz to 4 × 1.65 GHz.
– Increase RAM from 4 to 8 GB.
– Increase processor-to-memory bandwidth: 25.5 GB/sec.
– Increase remote I/O bandwidth: 8.8 GB/sec.
– SAN for all fibre channel transfers.
– Shared File System (SFS) to eliminate unnecessary file copies.
– System scalability for easy addition of new hardware.
This upgrade will handle the immediate increase of data volume for EOS and NPP data:
– Ingest: 4 TB/Day
– Delivery: 24 TB/Day
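To put the daily targets in perspective, the sketch below converts them to an average sustained rate. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfectly even load over 24 hours; real traffic is bursty, so peak requirements would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_mb_per_s(tb_per_day: float) -> float:
    """Average rate in MB/s needed to move tb_per_day in one day."""
    bytes_per_day = tb_per_day * 1e12  # assumes 1 TB = 10**12 bytes
    return bytes_per_day / SECONDS_PER_DAY / 1e6


ingest_rate = sustained_mb_per_s(4.0)     # 4 TB/day ingest target
delivery_rate = sustained_mb_per_s(24.0)  # 24 TB/day delivery target
```

With these assumptions, the ingest target averages about 46 MB/s and the delivery target about 278 MB/s.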

13 Information Management Research
Long Term Architecture (LTA)
– CLASS Node Study
– Reprocessing
– APIs for access to CLASS
– Data Models
– External repositories and systems
CLASS Near Term Upgrade
– Upgrade the CLASS Ingest system for optimization and easier integration of new data streams.
– Upgrade the CLASS Delivery system for optimization and easier integration of new data streams.
– Implement an Order Generator to centralize order generation. Needed for API access to CLASS ordering.
Integrating with other systems for CLASS metadata, like the NMMR (NOAA Metadata Manager's Repository).
Delivery of data on physical media.

14 Case of CLASS Scalability
Historical ingest of GOES data. Goal was 400 GB/Day. Started at 100 GB/Day. Now as high as 900+ GB/Day.
Changes to increase the ingest rate:
– Parallelized the ingest process across multiple servers.
– Switched from shared disks to local disks for temporary directories and some caches.
– Increased the RAM from 2 GB to 4 GB.
– Re-configured, in the database, the number of Ingest processes that could run at a time to maximize ingest throughput.
– Turned off operational ingest of GOES data at NCDC and turned it on at Suitland.
– Added GigE network cards to the servers to increase data transfers.
– No software changes were made to increase the ingest rate, only software and hardware configuration changes.
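The idea of raising throughput by configuring how many ingest processes run at once can be sketched with a bounded worker pool. This is only an analogy: a thread pool on one machine stands in for processes spread across servers, and `ingest_file` and `run_ingest` are hypothetical names, not CLASS components.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def ingest_file(name: str) -> str:
    # Hypothetical stand-in for the real per-file ingest work.
    return f"ingested:{name}"


def run_ingest(
    files: List[str],
    max_processes: int,
    worker: Callable[[str], str] = ingest_file,
) -> List[str]:
    """Ingest files with at most max_processes running at the same time."""
    # max_processes plays the role of the configured limit on
    # concurrent ingest processes.
    with ThreadPoolExecutor(max_workers=max_processes) as pool:
        # pool.map returns results in the same order as the input files.
        return list(pool.map(worker, files))
```

Changing `max_processes` alters concurrency without touching the worker code, which parallels tuning throughput through configuration alone.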

15 Open Discussion

