GA 1 CASC Discovery of Access Patterns to Scientific Simulation Data Ghaleb Abdulla LLNL Center for Applied Scientific Computing.

GA 2 CASC Team
- Ghaleb Abdulla (0.4)
- Tina Eliassi-Rad (0.4)
- Terence Critchlow (0.15)

GA 3 CASC Task Objective
- Identify data storage formats that minimize access times, using historical access patterns to the same or similar data sets
- Use the spatial and temporal locality that results from data accesses to organize the data on disk
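One way to turn historical accesses into an on-disk ordering is to place blocks that are accessed close together in time next to each other, so that temporal locality in the access history becomes spatial locality in the layout. The sketch below is a minimal illustration under assumptions of our own: the trace is a list of hypothetical block IDs, "close together" means within a small window of consecutive accesses, and the greedy ordering is a simple heuristic, not the layout method used in this project.

```python
from collections import Counter


def co_access_counts(trace, window=2):
    """Count how often two distinct blocks occur within `window` consecutive accesses."""
    counts = Counter()
    for i, a in enumerate(trace):
        for b in trace[i + 1:i + window]:
            if a != b:
                counts[frozenset((a, b))] += 1
    return counts


def greedy_layout(trace, window=2):
    """Order blocks so that frequently co-accessed blocks end up adjacent on disk."""
    freq = Counter(trace)                      # how often each block is accessed
    pairs = co_access_counts(trace, window)    # which blocks are accessed close together
    # Start with the most frequently accessed block; ties are broken by block ID.
    layout = [min(freq, key=lambda b: (-freq[b], b))]
    remaining = set(freq) - {layout[0]}
    while remaining:
        last = layout[-1]
        # Next: strongest co-access with the last placed block, then frequency, then ID.
        nxt = min(remaining, key=lambda b: (-pairs[frozenset((last, b))], -freq[b], b))
        layout.append(nxt)
        remaining.remove(nxt)
    return layout


trace = ["A", "C", "A", "C", "B", "D", "B", "D", "A", "C"]
print(greedy_layout(trace))  # ['A', 'C', 'B', 'D']
```

A greedy chain is cheap to compute; a real layout would also have to respect block sizes and the order in which the simulation writes its data.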

GA 4 CASC Challenges
- Data can be accessed with different tools and by different users.

GA 5 CASC Enabling Access Pattern Discovery
- Application area: astrophysics
- Visualization tool: VisIt
- Analyze the history of access patterns on two levels:
  - System level: disk references, network overhead, memory usage
  - Application level: higher-level commands, user-level information
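Analyzing both levels together requires relating system-level events to the application-level commands that caused them. A minimal sketch, assuming both logs carry comparable timestamps and that a disk reference belongs to the most recent preceding command (a simplifying assumption that ignores concurrency and caching); the event values are invented for illustration:

```python
from bisect import bisect_right
from collections import defaultdict


def attribute_disk_refs(app_events, disk_events):
    """Group disk references under the latest application command issued at or before them.

    Both inputs are lists of (timestamp, value) tuples; disk references that occur
    before the first command are dropped.
    """
    app_events = sorted(app_events)
    times = [t for t, _ in app_events]
    grouped = defaultdict(list)
    for t, block in sorted(disk_events):
        i = bisect_right(times, t) - 1      # index of the latest command with time <= t
        if i >= 0:
            grouped[app_events[i]].append(block)
    return dict(grouped)


app = [(0.0, "open"), (5.0, "zoom"), (9.0, "slice")]
disk = [(9.3, 42), (0.5, 17), (1.0, 18), (5.2, 40), (9.1, 41), (-1.0, 3)]
print(attribute_disk_refs(app, disk))
# {(0.0, 'open'): [17, 18], (5.0, 'zoom'): [40], (9.0, 'slice'): [41, 42]}
```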

GA 6 CASC Enabling Access Pattern Discovery
[Figure: architecture linking users (User 1 to User n) of VisIt and the astrophysics code Djehuty, application logging and disk logging, log files, an unsupervised learner (e.g., k-NN, k-means) that yields patterns, and a supervised learner (e.g., neural net, decision tree) trained on [pattern, hints] data that yields hints.]
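The two learning stages in the diagram can be illustrated on toy data. In this sketch each session is summarized by two hypothetical features (fraction of sequential reads, mean request size in MB), plain k-means plays the unsupervised learner, a made-up rule pairs each discovered pattern with a hint, and a nearest-centroid classifier stands in for the supervised learner. The slide names neural nets and decision trees; nearest-centroid is a deliberately simpler substitute.

```python
import math


def kmeans(points, k, iters=20):
    """Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def nearest(centroids, p):
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


# Hypothetical per-session features: (sequential fraction, mean request size in MB).
sessions = [(0.90, 1.00), (0.10, 0.10), (0.95, 0.90),
            (0.05, 0.12), (0.85, 1.10), (0.15, 0.08)]
centroids = kmeans(sessions, k=2)

# [pattern, hint] pairs: a made-up rule attaches a hint to each discovered pattern.
hints = ["prefetch-sequential" if c[0] > 0.5 else "cache-small-blocks" for c in centroids]


def hint_for(session):
    """Nearest-centroid stand-in for the supervised learner."""
    return hints[nearest(centroids, session)]


print(hint_for((0.80, 0.95)))  # prefetch-sequential
```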

GA 7 CASC Log file collection
- Collect logs at the application and disk levels
- Manage the log collection process:
  - Start and stop collection sensors or agents based on demand
  - Keep log data in one central place
  - Detect any failure in the monitoring agents and restart them
  - Preferably work in a distributed environment
- JAMM from LBL meets our requirements
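The collection requirements listed above amount to a small supervisor loop. The sketch below simulates it in a single process; the `Agent` and `Supervisor` classes are hypothetical and do not reflect JAMM's actual interfaces, and a real deployment would run agents on separate hosts and detect failures over the network.

```python
class Agent:
    """Hypothetical monitoring agent (a sensor on one host); not JAMM's API."""

    def __init__(self, name):
        self.name = name
        self.running = False
        self.restarts = 0

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class Supervisor:
    """Starts agents on demand, restarts failed ones, and centralizes their records."""

    def __init__(self):
        self.agents = {}
        self.wanted = set()       # agents that have been requested
        self.central_log = []     # single central store for collected records

    def demand(self, name):
        self.wanted.add(name)
        if name not in self.agents:
            self.agents[name] = Agent(name)
        self.agents[name].start()

    def release(self, name):
        self.wanted.discard(name)
        if name in self.agents:
            self.agents[name].stop()

    def collect(self, name, record):
        self.central_log.append((name, record))

    def check(self):
        """Restart every wanted agent that is no longer running; return their names."""
        restarted = []
        for name in sorted(self.wanted):
            agent = self.agents[name]
            if not agent.running:
                agent.start()
                agent.restarts += 1
                restarted.append(name)
        return restarted


sup = Supervisor()
sup.demand("disk")
sup.demand("visit")
sup.agents["disk"].running = False   # simulate a crashed agent
print(sup.check())                   # ['disk']
```

Tracking which agents are *wanted* separately from which are *running* is what lets the supervisor tell a deliberate stop from a failure.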

GA 8 CASC JAMM Architecture

GA 9 CASC What to Collect
- Application and user level: open, zoom, slice, etc.
- System level: network overhead, disk block size, buffer size, disk location, etc.
- We need to add our own sensors to collect data
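Application-level sensors can be as simple as a wrapper around each user command. A minimal sketch, assuming the application's commands are reachable as Python functions; `open_dataset`, `zoom`, and the in-memory `LOG` list are hypothetical placeholders, not VisIt's API. System-level quantities such as block size or disk location would need OS or file-system instrumentation and are not shown.

```python
import functools
import time

LOG = []  # in-memory sink; a real sensor would forward records to the log collector


def sensor(command):
    """Decorator that logs the command name, its arguments, start time and duration."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.time()          # wall-clock timestamp for correlation
            t0 = time.perf_counter()     # monotonic clock for the duration
            try:
                return fn(*args, **kwargs)
            finally:  # log even if the command raises
                LOG.append({"command": command, "args": args, "kwargs": kwargs,
                            "start": start, "duration": time.perf_counter() - t0})
        return inner
    return wrap


@sensor("open")
def open_dataset(path):
    return {"path": path}


@sensor("zoom")
def zoom(view, factor):
    return dict(view, zoom=factor)


view = zoom(open_dataset("run0001.dat"), 2.0)
print([r["command"] for r in LOG])  # ['open', 'zoom']
```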

GA 10 CASC Data format
The DTD for our XML files is as follows:
<!ATTLIST metadata name ID #REQUIRED time NMTOKENS #IMPLIED>
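Only the ATTLIST for `metadata` survives on this slide. Because Python's `xml.etree.ElementTree` parses XML but does not validate it against a DTD, the sketch below checks those two attribute declarations by hand: `name` is a required ID (a unique XML Name), and `time` is an optional NMTOKENS list. The regular expressions cover only the ASCII subset of the XML name grammar, and the sample document is invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")   # ASCII subset of an XML Name
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")          # ASCII subset of an XML Nmtoken


def check_metadata(xml_text):
    """Return a list of violations of the metadata ATTLIST (empty if none)."""
    seen = set()
    errors = []
    for el in ET.fromstring(xml_text).iter("metadata"):
        name = el.get("name")
        if name is None:                       # #REQUIRED
            errors.append("missing required attribute 'name'")
        elif not NAME.fullmatch(name):         # ID values must be XML Names
            errors.append(f"invalid ID {name!r}")
        elif name in seen:                     # ID values must be unique
            errors.append(f"duplicate ID {name!r}")
        else:
            seen.add(name)
        time_attr = el.get("time")             # #IMPLIED: may be absent
        if time_attr is not None:
            tokens = time_attr.split()         # NMTOKENS: whitespace-separated tokens
            if not tokens or not all(NMTOKEN.fullmatch(t) for t in tokens):
                errors.append(f"invalid NMTOKENS {time_attr!r}")
    return errors


SAMPLE = """<log>
  <metadata name="m1" time="2002-03-26 10:15:02"/>
  <metadata name="m2"/>
</log>"""
print(check_metadata(SAMPLE))  # []
```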

GA 11 CASC Log File Example
100K K write random 210M 128K

GA 12 CASC Data Analysis
- Researched publicly available clustering tools
- Narrowed our choice to two:
  - CLUTO (University of Minnesota)
  - R (GNU)
- Testing data processing algorithms on randomly generated log files
- Hoping to get real log files in the near future:
  - Logging applications
  - We are currently looking at the “Flash” log files
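Randomly generated log files of the kind mentioned above can be produced and reduced to per-session feature vectors for a clustering tool. In this sketch the record fields (operation, access pattern, request size) and the three summary features are assumptions chosen for illustration; the output is plain CSV, which R can load with `read.csv`. Conversion to CLUTO's input format is not shown.

```python
import csv
import io
import random


def random_session(rng, n=50):
    """One synthetic session of (op, pattern, size_kb) records; fields are hypothetical."""
    return [(rng.choice(["read", "write"]),
             rng.choice(["sequential", "random"]),
             rng.choice([4, 64, 128, 1024]))
            for _ in range(n)]


def features(session):
    """Summarize a session as (write fraction, random-access fraction, mean size in KB)."""
    n = len(session)
    return (sum(op == "write" for op, _, _ in session) / n,
            sum(p == "random" for _, p, _ in session) / n,
            sum(size for _, _, size in session) / n)


def to_csv(sessions):
    """Write one feature row per session, with a header, as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["write_frac", "random_frac", "mean_kb"])
    for s in sessions:
        writer.writerow([f"{x:.3f}" for x in features(s)])
    return buf.getvalue()


rng = random.Random(42)          # fixed seed so the synthetic logs are reproducible
sessions = [random_session(rng) for _ in range(10)]
table = to_csv(sessions)
print(table.splitlines()[0])     # write_frac,random_frac,mean_kb
```

Purely random logs have no real structure, so they are useful for exercising the pipeline, not for judging whether the clusters mean anything.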

GA 13 CASC Questions

GA 14 CASC This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under contract No. W Eng-48. UCRL-MI-xxxxxx