JSOC Pipeline Processing Environment. Rasmus Munk Larsen, Stanford University. HMI Science Team Meeting – January, 2005.


1 JSOC Pipeline Processing Environment
Rasmus Munk Larsen, Stanford University
rmunk@quake.stanford.edu, 650-725-5485
Rasmus Munk Larsen / Pipeline Processing, HMI Science Team Meeting – January, 2005

2 Overview
  – JSOC data series organization
  – Pipeline execution environment
  – Pipeline software architecture
  – Co-I analysis module contribution
  – Pipeline data products

3 JSOC logical data organization
Evolved from the MDI dataset concept to
  – Fix known limitations/problems
  – Accommodate more complex data models required by higher-level processing
Main design features
  – Separation of meta-data (keywords) and image data
      – No need to re-write large image files when only keywords change (lev1.8 problem)
      – No (fewer) out-of-date keyword values in FITS headers
      – Can bind to most recent values on export
  – Easier data access
      – All access is in terms of (collections of) data records, which are the "atomic units" of a data series
      – A dataset name is a query specifying a set of data records (possibly from multiple data series):
          jsoc:hmi_lev0_cam1_fg?recordnum=12345 (a specific filtergram with unique record number 12345)
          jsoc:hmi_lev0_cam1_fg[12300-12330] (a minute's worth of filtergrams from camera 1)
          jsoc:hmi_lev1_fd_V?"T_OBS>='2008-11-01' AND T_OBS<'2008-12-01' AND N_MISSING<100"
  – Storage and tape management must be transparent to the user
      – Chunking of data records into storage units for efficient tape/disk usage is done internally
      – Completely separate storage and catalog (i.e. series & record) databases: more modular design
      – Legacy MDI modules should run on top of the new storage service
  – Storing keywords in a relational database system (Oracle)
      – Can use the power of a relational database to rapidly find data records
      – Easy and fast to create time series of any keyword value (for trending etc.)
      – Consequence: data records for a given series must be well defined (e.g. have a fixed set of keywords)
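The three dataset-name forms on this slide can be told apart mechanically. The Python sketch below is only an illustration and not the JSOC parser: the function `parse_dataset_name`, the `DatasetQuery` type, and the grammar encoded in the regular expression are assumptions inferred from the three examples. The bracketed range is returned as given, without deciding which index it refers to.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DatasetQuery:
    """Hypothetical parsed form of a dataset name."""
    series: str
    index_range: Optional[Tuple[int, int]] = None  # from series[lo-hi]
    where: Optional[str] = None                    # from series?condition


# Assumed grammar: jsoc:<series>, optionally followed by [lo-hi] or ?<condition>
_NAME = re.compile(
    r'^jsoc:(?P<series>\w+)'
    r'(?:\[(?P<lo>\d+)-(?P<hi>\d+)\]|\?(?P<where>.+))?$'
)


def parse_dataset_name(name: str) -> DatasetQuery:
    """Split a dataset name into its series and its record selector."""
    m = _NAME.match(name.strip())
    if m is None:
        raise ValueError(f"not a recognised dataset name: {name!r}")
    if m.group("lo") is not None:
        bounds = (int(m.group("lo")), int(m.group("hi")))
        return DatasetQuery(m.group("series"), index_range=bounds)
    where = m.group("where")
    if where is not None:
        where = where.strip('"')  # drop the optional surrounding double quotes
    return DatasetQuery(m.group("series"), where=where)


single = parse_dataset_name("jsoc:hmi_lev0_cam1_fg?recordnum=12345")
minute = parse_dataset_name("jsoc:hmi_lev0_cam1_fg[12300-12330]")
month = parse_dataset_name(
    "jsoc:hmi_lev1_fd_V?\"T_OBS>='2008-11-01' AND T_OBS<'2008-12-01' AND N_MISSING<100\""
)
```

Under these assumptions the condition after `?` could be passed to the keyword database as a SQL WHERE clause, which is one way to read the slide's point about using the relational database to find records.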

4 Logical Data Organization
JSOC Data Series: hmi_lev0_cam1_fg, aia_lev0_cont1700, hmi_lev1_fd_M, hmi_lev1_fd_V, aia_lev0_FE171, …
Data records for series hmi_lev1_fd_V: hmi_lev1_fd_V#12345, hmi_lev1_fd_V#12346, hmi_lev1_fd_V#12347, hmi_lev1_fd_V#12348, hmi_lev1_fd_V#12349, hmi_lev1_fd_V#12350, hmi_lev1_fd_V#12351, hmi_lev1_fd_V#12352, hmi_lev1_fd_V#12353, …
Single hmi_lev1_fd_V data record:
  Keywords:
    RECORDNUM = 12345 # Unique serial number
    SERIESNUM = 5531704 # Slots since epoch.
    T_OBS = '2009.01.05_23:22:40_TAI'
    DATAMIN = -2.537730543544E+03
    DATAMAX = 1.935749511719E+03
    ...
    P_ANGLE = LINK:ORBIT,KEYWORD:SOLAR_P
    …
  Links:
    ORBIT = hmi_lev0_orbit, SERIESNUM = 221268160
    CALTABLE = hmi_lev0_dopcal, RECORDNUM = 7
    L1 = hmi_lev0_cam1_fg, RECORDNUM = 42345232
    R1 = hmi_lev0_cam1_fg, RECORDNUM = 42345233
    …
  Data Segments:
    V_DOPPLER = (image in the original slide)
  Storage Unit = Directory
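The record layout on this slide, where P_ANGLE is not stored directly but points through the ORBIT link to the SOLAR_P keyword of another series, can be sketched in Python as follows. This is a hypothetical in-memory model, not JSOC code: the `Link` and `Record` classes, the `get_keyword` function, and the SOLAR_P value of 1.5 are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Pointer to one record of another series (hypothetical model)."""
    target_series: str
    index_name: str    # keyword used to find the target, e.g. "SERIESNUM"
    index_value: int


@dataclass
class Record:
    keywords: dict = field(default_factory=dict)
    links: dict = field(default_factory=dict)


def get_keyword(catalog, series, recnum, name):
    """Return a keyword value, following a LINK:<link>,KEYWORD:<kw> reference."""
    rec = catalog[series][recnum]
    value = rec.keywords[name]
    if isinstance(value, str) and value.startswith("LINK:"):
        link_part, kw_part = value.split(",", 1)
        link = rec.links[link_part[len("LINK:"):]]
        target_kw = kw_part[len("KEYWORD:"):]
        # Find the linked record by its index keyword and read the value there.
        for target in catalog[link.target_series].values():
            if target.keywords.get(link.index_name) == link.index_value:
                return target.keywords[target_kw]
        raise KeyError(f"link target not found for {name}")
    return value


catalog = {
    "hmi_lev1_fd_V": {
        12345: Record(
            keywords={
                "RECORDNUM": 12345,
                "SERIESNUM": 5531704,
                "P_ANGLE": "LINK:ORBIT,KEYWORD:SOLAR_P",
            },
            links={"ORBIT": Link("hmi_lev0_orbit", "SERIESNUM", 221268160)},
        )
    },
    "hmi_lev0_orbit": {
        # SOLAR_P = 1.5 is an illustrative value, not real data.
        1: Record(keywords={"SERIESNUM": 221268160, "SOLAR_P": 1.5})
    },
}

p_angle = get_keyword(catalog, "hmi_lev1_fd_V", 12345, "P_ANGLE")
```

Resolving the link at read time is what allows a record to pick up the most recent value of a linked keyword without rewriting the record itself.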

5 JSOC Series Definition (JSD)
testclass1.jsd:
#======================= Global series information ===========================
Seriesname: "testclass1"
Description: "This is a small example of a JSOC series definition."
Author: "Rasmus Munk Larsen"
Owners: "rmunk"
Unitsize: 10
Archive: 1
Retention: permanent
Tapegroup: 127
Primary Index:
#============================ Keywords =================================
# Format:
# Keyword:, link,,
# or
# Keyword:,,,,,
#
Keyword: "keywd0", float, 0.0f, "%f", "unit3", "Comment3"
Keyword: "keywd1", double, 0.0, "%lf", "unit4", "Comment4"
Keyword: "keywd2", datetime, "1970-01-01 00:00:00", "%-s", "unit5", "Comment5"
Keyword: "keywd3", timestamp, "19700101000000", "%-s", "unit6", "Comment6"
Keyword: "keywd4", string, "", "%-s", "unit7", "Comment7"
Keyword: "keywd5", link, "link1", "keywd0"
Keyword: "keywd6", char, '\0', "%d", "unit1", "Comment1"
Keyword: "keywd7", int, 0, "%d", "unit2", "Comment2"
#============================ Links =====================================
# Format:
# Link:,, { static | dynamic }
#
Link: "link0", "testclass0", static
Link: "link1", "testclass0", dynamic
#============================ Data segments ===============================
# Data:,,,,,
#
Data: "x-axis", float, 1, 100, "m", fits
Data: "y-axis", float, 1, 200, "m", fits
Data: "z-axis", float, 1, 50, "m", fits
Data: "pressure", float, 3, 100, 200, 50, "kg/(s^2*m)", fitz
Data: "velocity", float, 4, 100, 200, 50, 3, "m/s", fitz

Creating a new Data Series: testclass1.jsd -> JSD parser -> Oracle database
SQL: INSERT INTO series_catalog VALUES('testclass1','rmunk', …
SQL: CREATE TABLE testclass1 ( recnum integer not null unique, keywd0 binary_float, …
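The step from a JSD file to a CREATE TABLE statement can be sketched in Python as below. This is an assumption-laden illustration, not the actual JSD parser: the functions `parse_jsd_keywords` and `create_table_sql` are hypothetical, only the `float` to `binary_float` mapping and the `recnum` column come from the slide, the other type mappings are guesses, and skipping link keywords (because their values live in the linked series) is also an assumption.

```python
import csv

# Only "float" -> "binary_float" appears on the slide; the rest are assumptions.
ORACLE_TYPES = {
    "float": "binary_float",
    "double": "binary_double",
    "int": "integer",
    "string": "varchar2(4000)",
}


def parse_jsd_keywords(jsd_text):
    """Return (name, type) pairs for every 'Keyword:' line in a JSD text."""
    keywords = []
    for line in jsd_text.splitlines():
        line = line.strip()
        if not line.startswith("Keyword:"):
            continue  # comments, links, data segments and global fields
        rest = line[len("Keyword:"):].strip()
        fields = next(csv.reader([rest], skipinitialspace=True))
        keywords.append((fields[0].strip(), fields[1].strip()))
    return keywords


def create_table_sql(series, keywords):
    """Build a CREATE TABLE statement with one column per stored keyword."""
    columns = ["recnum integer not null unique"]
    for name, ktype in keywords:
        if ktype == "link":
            continue  # assumed: linked keywords are not stored in this table
        columns.append(f"{name} {ORACLE_TYPES[ktype]}")
    return f"CREATE TABLE {series} ({', '.join(columns)})"


# A subset of the keyword lines from testclass1.jsd
jsd_subset = '''
# Keyword lines copied from the slide
Keyword: "keywd0", float, 0.0f, "%f", "unit3", "Comment3"
Keyword: "keywd1", double, 0.0, "%lf", "unit4", "Comment4"
Keyword: "keywd5", link, "link1", "keywd0"
Keyword: "keywd7", int, 0, "%d", "unit2", "Comment2"
'''

keywords = parse_jsd_keywords(jsd_subset)
sql = create_table_sql("testclass1", keywords)
```

Because every record of a series shares the same columns, a fixed keyword set per series follows directly from this table-per-series layout.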

6 Pipeline batch processing (a.k.a. MDI mapfile)
Pipeline processing is scheduled in batches by PUI+, a data-driven pipeline scheduler inherited from MDI
A pipeline batch is a single atomic transaction:
  – If no module fails, all data records are committed and become visible to other clients of the archive
  – If a failure occurs, all data records are deleted and the database is rolled back
[Diagram: Pipeline batch = atomic transaction. Labels: JSOC ARCHIVE (disk), input data records, output data records; each step goes through the JSOC API: Register session, Module 1, Module 2, …, Module N, Commit Data & Deregister.]
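The all-or-nothing behaviour of a pipeline batch can be sketched with Python's standard `sqlite3` module standing in for the Oracle catalog. This is a minimal sketch under stated assumptions: the `records` table, the `run_batch` function, and the two example modules are hypothetical, and only the database side is shown, not the deletion of data files on disk.

```python
import sqlite3


def run_batch(conn, modules):
    """Run all modules in one transaction; return True if it was committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            for module in modules:
                module(conn)
    except Exception:
        return False
    return True


def module_ok(conn):
    # Hypothetical module that writes one output data record.
    conn.execute("INSERT INTO records (series) VALUES ('hmi_lev1_fd_V')")


def module_fail(conn):
    # Hypothetical module that fails part-way through the batch.
    raise RuntimeError("module failed")


def count_records(conn):
    return conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (recnum INTEGER PRIMARY KEY, series TEXT)")
conn.commit()

failed_ok = run_batch(conn, [module_ok, module_fail])
count_after_failure = count_records(conn)   # record from module_ok was rolled back

success_ok = run_batch(conn, [module_ok, module_ok])
count_after_success = count_records(conn)   # both records are now visible
```

The record written by `module_ok` in the failed batch never becomes visible, which mirrors the slide's statement that other clients only see records from batches in which no module failed.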

7 Pipeline Client-Server Architecture
[Diagram. Components and labels:]
Pipeline client process
  – Analysis code (C/Fortran/IDL/Matlab)
  – JSOC Library
      – Record Cache (Keywords + Links + Data paths)
      – API calls: OpenRecords, CloseRecords, GetKeyword, SetKeyword, GetLink, SetLink, OpenDataSegment, CloseDataSegment
Data Record Management Service (DRMS)
Storage Unit Management Service (SUMS)
  – Calls: AllocUnit, GetUnit, PutUnit
Tape Archive Service
JSOC Disks
Oracle Database Server
  – Series Catalog
  – Record Catalogs
  – Storage Database
Connections: SQL query, Storage unit transfer, Data Segment I/O, File I/O
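One way to read the client side of this diagram is that analysis code works on records held in the library's cache and the server is contacted only when records are opened or closed. The Python sketch below illustrates that reading only. The method names echo the API labels on the slide, but their signatures and semantics, the `FakeDRMS` stand-in for the server, and the write-back-on-close behaviour are all assumptions.

```python
class FakeDRMS:
    """In-memory stand-in for the record catalog server (hypothetical)."""

    def __init__(self, records):
        self.records = records  # {(series, recnum): {keyword: value}}
        self.fetches = 0

    def fetch(self, series, recnums):
        self.fetches += 1
        return {n: dict(self.records[(series, n)]) for n in recnums}

    def store(self, series, recnum, keywords):
        self.records[(series, recnum)] = dict(keywords)


class RecordCache:
    """Client-side cache of keyword values for open records (hypothetical)."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def open_records(self, series, recnums):
        missing = [n for n in recnums if (series, n) not in self.cache]
        if missing:  # one round trip for everything not already cached
            for n, kw in self.server.fetch(series, missing).items():
                self.cache[(series, n)] = kw
        return [(series, n) for n in recnums]

    def get_keyword(self, handle, name):
        return self.cache[handle][name]

    def set_keyword(self, handle, name, value):
        self.cache[handle][name] = value  # local change only
        self.dirty.add(handle)

    def close_records(self, handles, commit=True):
        for handle in handles:
            keywords = self.cache.pop(handle)
            if commit and handle in self.dirty:
                self.server.store(handle[0], handle[1], keywords)
            self.dirty.discard(handle)


server = FakeDRMS({("hmi_lev1_fd_V", 12345): {"DATAMIN": -2.537730543544e03}})
client = RecordCache(server)
handles = client.open_records("hmi_lev1_fd_V", [12345])
client.set_keyword(handles[0], "DATAMAX", 1.935749511719e03)
seen_before_close = "DATAMAX" in server.records[("hmi_lev1_fd_V", 12345)]
client.close_records(handles)
seen_after_close = "DATAMAX" in server.records[("hmi_lev1_fd_V", 12345)]
```

Keeping keyword edits local until close keeps the number of server round trips small, at the cost of changes being invisible to other clients until the records are closed.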

8 Co-I contributions and collaboration
Contributions from co-I teams:
  – Software for intermediate and high level analysis modules
  – Output data series definition
      – Keywords, links, data segments, size of storage units etc.
  – Documentation (detailed enough to understand the contributed code)
  – Test data and intended results for verification
  – Time
      – Explain algorithms and implementation
      – Help with verification
      – Collaborate on improvements if required (e.g. performance or maintainability)
Contributions from HMI team:
  – Pipeline execution environment
  – Software & hardware resources (development environment, libraries, tools)
  – Time
      – Help with defining data series
      – Help with porting code to JSOC API
      – If needed, collaborate on algorithmic improvements, tuning for JSOC hardware, parallelization
      – Verification

9 HMI module status and MDI heritage
[Diagram of pipeline data products. Box labels, in transcript order:]
Primary observables and intermediate and high level data products: Doppler Velocity; Heliographic Doppler velocity maps; Tracked tiles of Dopplergrams; Stokes I,V; Continuum Brightness; Tracked full-disk 1-hour averaged continuum maps; Brightness feature maps; Solar limb parameters; Stokes I,Q,U,V; Full-disk 10-min averaged maps; Tracked tiles; Line-of-sight Magnetograms; Vector Magnetograms (fast algorithm); Vector Magnetograms (inversion algorithm); Egression and ingression maps; Time-distance cross-covariance function; Ring diagrams; Wave phase shift maps; Wave travel times; Local wave frequency shifts; Spherical harmonic time series; Mode frequencies and splitting; Brightness Images; Line-of-Sight Magnetic Field Maps; Coronal magnetic field extrapolations; Coronal and solar wind models; Far-side activity index; Deep-focus v and c_s maps (0-200 Mm); High-resolution v and c_s maps (0-30 Mm); Carrington synoptic v and c_s maps (0-30 Mm); Full-disk velocity, sound speed maps (0-30 Mm); Internal sound speed; Internal rotation; Vector Magnetic Field Maps
Legend (code status):
  – MDI pipeline modules exist
  – Standalone "production" code routinely used
  – Research code currently used
  – New codes under development (HAO)
  – Research code exists in the community
  – Instrument specific code, Stanford is primary developer

10 Questions this meeting should address
List of all science data products
  – Which data products, including intermediate ones, should be produced by JSOC?
  – What cadence, resolution, coverage etc. will/should each data product have?
      – Eventually a JSOC series description must be written for each one.
  – Which data products should be computed on the fly and which should be archived?
  – Have we got the basic pipeline right? Are there maturing new techniques that have been overlooked?
Detailing each branch of the processing pipeline
  – What are the detailed steps in each branch?
  – Can some of the computational steps be encapsulated in general tools that can be shared among different branches (example: tracking)?
  – What are the computer resource requirements of the computational steps?
Contributed analysis modules
  – Who will contribute code?
  – Which codes are mature enough for inclusion?
      – Should be at least working research code now, since integration has to begin by about mid-2006.

11 Example: Global Seismology Pipeline

