Published by Dulcie Aubrey Clark. Modified over 6 years ago.
ATLAS metadata analysis in Hadoop Data Lake
DKB Meeting
Issue 1: Input/output dataset differentiation
Example based on the DEFT (ProdSys2) database [Oracle]:

1) Find all AOD (Analysis Object Data) datasets for project “mc15_13TeV” and physics group SUSY:

    select d.name as name
      from t_production_task t, t_production_dataset d
     where t.taskid = d.taskid
       and d.name LIKE '%AOD%'
       and t.project = 'mc15_13TeV'
       and t.phys_group = 'SUSY';

2) Find only the input AOD (Analysis Object Data) datasets for project “mc15_13TeV” and physics group SUSY:

    select inputdataset as name
      from t_production_task
     where inputdataset LIKE '%AOD%'
       and project = 'mc15_13TeV'
       and phys_group = 'SUSY';

Maria Grigorieva
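Queries 1) and 2) can be tried without access to the production Oracle instance by running them against a toy in-memory SQLite copy of the two DEFT tables. This is a minimal sketch under stated assumptions. Only the columns the queries reference are modelled, every row is invented, and the helpers `build_toy_deft` and `names` are hypothetical. Bind parameters replace the inlined literals. SQLite's `LIKE` is case-insensitive for ASCII, unlike Oracle's, but this makes no difference for the names used here.

```python
import sqlite3

# Hypothetical miniature of the two DEFT tables: only the columns used by
# queries 1) and 2) are modelled, and every row below is invented toy data.
SCHEMA = """
create table t_production_task (
    taskid integer primary key, project text, phys_group text, inputdataset text
);
create table t_production_dataset (taskid integer, name text);
"""
TASKS = [
    (1, "mc15_13TeV", "SUSY", "mc15_13TeV.simul.HITS.a"),
    (2, "mc15_13TeV", "SUSY", "mc15_13TeV.merge.AOD.b"),  # reads task 1's AOD
    (3, "mc15_13TeV", "HIGG", "mc15_13TeV.merge.AOD.c"),
]
DATASETS = [
    (1, "mc15_13TeV.merge.AOD.b"),
    (2, "mc15_13TeV.merge.AOD.d"),
    (2, "mc15_13TeV.log.e"),
    (3, "mc15_13TeV.merge.AOD.f"),
]

# Queries 1) and 2), with bind parameters instead of inlined literals.
ALL_AOD = """
select d.name as name
  from t_production_task t, t_production_dataset d
 where t.taskid = d.taskid
   and d.name like '%AOD%'
   and t.project = ? and t.phys_group = ?
"""
INPUT_AOD = """
select inputdataset as name
  from t_production_task
 where inputdataset like '%AOD%'
   and project = ? and phys_group = ?
"""


def build_toy_deft() -> sqlite3.Connection:
    """Load the toy tables into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into t_production_task values (?, ?, ?, ?)", TASKS)
    conn.executemany("insert into t_production_dataset values (?, ?)", DATASETS)
    return conn


def names(conn, sql, project="mc15_13TeV", phys_group="SUSY"):
    """Run one of the queries and return the sorted dataset names."""
    return sorted(row[0] for row in conn.execute(sql, (project, phys_group)))


conn = build_toy_deft()
print(names(conn, ALL_AOD))    # ['mc15_13TeV.merge.AOD.b', 'mc15_13TeV.merge.AOD.d']
print(names(conn, INPUT_AOD))  # ['mc15_13TeV.merge.AOD.b']
```

The pattern `'%AOD%'` matches any name that merely contains `AOD`, including derived `DAOD` dataset names.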
3) Find the output AOD (Analysis Object Data) datasets for project “mc15_13TeV” and physics group SUSY, i.e. all AOD datasets minus the input ones:

    select d.name as name
      from t_production_task t, t_production_dataset d
     where t.taskid = d.taskid
       and d.name LIKE '%AOD%'
       and t.project = 'mc15_13TeV'
       and t.phys_group = 'SUSY'
    MINUS
    select inputdataset as name
      from t_production_task
     where inputdataset LIKE '%AOD%'
       and project = 'mc15_13TeV'
       and phys_group = 'SUSY';

[Diagram: All datasets minus Input datasets = Output datasets]

Response:

    ……
    mc15_13TeV MGPy8EG_A14N23LO_SS_RPVDV_700_50_lam12k_100.merge.AOD.e4634_s2726_r6869_r6282_tid _00
    mc15_13TeV MGPy8EG_A14N23LO_SS_RPVDV_700_50_lam12k_1000.merge.AOD.e4634_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1000_10.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1000_325.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1000_650.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_10.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_10.merge.AOD.e4807_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_450.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_450.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_450.merge.AOD.e4807_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_450.merge.AOD.e4807_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_450.merge.AOD.e4807_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_900.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_900.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_900.merge.AOD.e4807_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1500_10.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1500_575.merge.AOD.e4807_a766_a777_r6282_tid _00
    …..
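Query 3) relies on Oracle's `MINUS`, which returns the distinct rows of the first query that do not appear in the second. The minimal sketch below reproduces those semantics in two ways: once as a Python set difference, and once in SQLite, where the standard spelling of the operator is `EXCEPT`. The dataset names are invented, and the helpers `output_aod` and `output_aod_sql` are hypothetical.

```python
import sqlite3

# Invented stand-ins for the results of queries 1) and 2).
all_aod = ["AOD.b", "AOD.d", "AOD.d"]  # query 1 may return duplicates
input_aod = ["AOD.b", "AOD.x"]         # inputs need not appear in query 1


def output_aod(all_names, input_names):
    """Mimic Oracle MINUS: distinct names of the first list absent from the second."""
    return sorted(set(all_names) - set(input_names))


def output_aod_sql(all_names, input_names):
    """The same difference in SQLite, where Oracle's MINUS is spelled EXCEPT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table all_aod (name text)")
    conn.execute("create table input_aod (name text)")
    conn.executemany("insert into all_aod values (?)", [(n,) for n in all_names])
    conn.executemany("insert into input_aod values (?)", [(n,) for n in input_names])
    rows = conn.execute(
        "select name from all_aod except select name from input_aod order by 1"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(output_aod(all_aod, input_aod))      # ['AOD.d']
print(output_aod_sql(all_aod, input_aod))  # ['AOD.d']
```

The difference is taken on dataset names alone. As a result, an AOD that one task produces and another task in the same project and physics group reads as input is excluded from the result.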
Reproduce this task within the Hadoop DKB Storage (DEFT_1 & DEFT_2).
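The slide does not say how the DEFT data is laid out in the Hadoop DKB Storage (DEFT_1 & DEFT_2). Assume, hypothetically, that both DEFT tables are available as plain records. One way to reproduce query 3) in MapReduce style is then two chained reduce-side joins:

- The first joins tasks to datasets on `taskid` and tags AOD names as `all` or `input`.
- The second groups by name and keeps the names never tagged `input`.

The sketch simulates the map, shuffle and reduce phases in plain Python. The record shapes, the function names and the toy rows are assumptions, not the DKB implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shapes standing in for the DEFT tables held in DKB:
#   task record:    (taskid, project, phys_group, inputdataset)
#   dataset record: (taskid, name)
Task = Tuple[int, str, str, str]
Dataset = Tuple[int, str]


def shuffle(pairs: Iterable[Tuple]) -> Dict:
    """Group (key, value) pairs by key, as the Hadoop shuffle phase would."""
    groups: Dict = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def join_mapper(tasks: Iterable[Task], datasets: Iterable[Dataset]) -> Iterator[Tuple]:
    """Phase 1 map: key both tables by taskid, tagging each record's origin."""
    for taskid, project, phys_group, inputdataset in tasks:
        yield taskid, ("task", project, phys_group, inputdataset)
    for taskid, name in datasets:
        yield taskid, ("dataset", name)


def join_reducer(groups: Dict, project: str, phys_group: str) -> Iterator[Tuple[str, str]]:
    """Phase 1 reduce: for each matching task, tag its AOD datasets 'all'
    (query 1) and its AOD input dataset 'input' (query 2)."""
    for values in groups.values():
        task = next((v for v in values if v[0] == "task"), None)
        if task is None or task[1] != project or task[2] != phys_group:
            continue
        for v in values:
            if v[0] == "dataset" and "AOD" in v[1]:
                yield v[1], "all"
        if task[3] and "AOD" in task[3]:
            yield task[3], "input"


def minus_reducer(groups: Dict) -> Iterator[str]:
    """Phase 2 reduce: keep names tagged 'all' but never 'input' (the MINUS)."""
    for name, tags in groups.items():
        if "all" in tags and "input" not in tags:
            yield name


def output_aod(tasks, datasets, project="mc15_13TeV", phys_group="SUSY") -> List[str]:
    """Chain both MapReduce phases and return the sorted output AOD names."""
    tagged = join_reducer(shuffle(join_mapper(tasks, datasets)), project, phys_group)
    return sorted(minus_reducer(shuffle(tagged)))


# Invented toy rows.
tasks = [
    (1, "mc15_13TeV", "SUSY", "mc15_13TeV.simul.HITS.a"),
    (2, "mc15_13TeV", "SUSY", "mc15_13TeV.merge.AOD.b"),
    (3, "mc15_13TeV", "HIGG", "mc15_13TeV.merge.AOD.c"),
]
datasets = [
    (1, "mc15_13TeV.merge.AOD.b"),
    (2, "mc15_13TeV.merge.AOD.d"),
    (3, "mc15_13TeV.merge.AOD.f"),
]
print(output_aod(tasks, datasets))  # ['mc15_13TeV.merge.AOD.d']
```

Each phase only needs records sharing one key to meet in the same reducer. This is what lets the join and the difference scale out across a cluster rather than relying on a single database.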
Issue 2: Group data by parameters
- Phys Group
- Campaign / Subcampaign
- Project: NOT in ('user', 'valid1', 'valid2', 'valid3', 'mc_evind')
- RequestID
- Production Step
- Task status
- Task timestamp
- Dataset name, timestamp, status
- Task ID
- Container name, timestamp, status
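One possible reading of these grouping parameters is a hierarchy of counters. The minimal sketch below makes three assumptions:

- Tasks are available as flat records with the hypothetical field names shown.
- Tasks are counted per combination of phys group, campaign, subcampaign, project, production step and task status, after dropping the excluded projects.
- The counts can be rolled up to coarser levels as needed.

Dataset and container attributes are left out to keep it short, and the toy records are invented.

```python
from collections import Counter
from typing import Dict, Iterable

# Project values excluded on the slide.
EXCLUDED_PROJECTS = {"user", "valid1", "valid2", "valid3", "mc_evind"}

# A subset of the slide's grouping parameters, outermost first.
# The record field names are hypothetical.
GROUP_KEYS = ("phys_group", "campaign", "subcampaign", "project", "step", "status")


def group_tasks(tasks: Iterable[Dict[str, str]]) -> Counter:
    """Count tasks per combination of GROUP_KEYS, skipping excluded projects."""
    counts: Counter = Counter()
    for task in tasks:
        if task["project"] in EXCLUDED_PROJECTS:
            continue
        counts[tuple(task[k] for k in GROUP_KEYS)] += 1
    return counts


def rollup(counts: Counter, depth: int) -> Counter:
    """Re-aggregate to the first `depth` keys (depth=1 groups by phys_group only)."""
    out: Counter = Counter()
    for key, n in counts.items():
        out[key[:depth]] += n
    return out


# Invented toy records, one per task.
base = {"phys_group": "SUSY", "campaign": "MC15", "subcampaign": "MC15c",
        "project": "mc15_13TeV", "step": "Reco", "status": "done"}
tasks = [base, dict(base), dict(base, status="running"), dict(base, project="valid1")]

counts = group_tasks(tasks)
print(counts[("SUSY", "MC15", "MC15c", "mc15_13TeV", "Reco", "done")])  # 2
print(rollup(counts, 1))  # Counter({('SUSY',): 3})
```

Keeping the finest-grained counts and rolling them up on demand avoids re-scanning the task records for every coarser grouping level.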