ATLAS metadata analysis in Hadoop Data Lake


1 ATLAS metadata analysis in Hadoop Data Lake
DKB Meeting

2 Issue 1: Input/Output datasets differentiation
Example: based on the DEFT (ProdSys2) database [Oracle]

1) Find all AOD (Analysis Object Data) datasets for project “mc15_13TeV” and physics group SUSY:

    select d.name as name
    from t_production_task t, t_production_dataset d
    where t.taskid = d.taskid
      and d.name LIKE '%AOD%'
      and t.project = 'mc15_13TeV'
      and t.phys_group = 'SUSY';

2) Find only input AOD (Analysis Object Data) datasets for project “mc15_13TeV”, physics group SUSY:

    select inputdataset as name
    from t_production_task
    where inputdataset LIKE '%AOD%'
      and project = 'mc15_13TeV'
      and phys_group = 'SUSY';

Maria Grigorieva

3 Issue 1: Input/Output datasets differentiation
Example: based on the DEFT (ProdSys2) database [Oracle]

3) Find output AOD (Analysis Object Data) datasets for project “mc15_13TeV”, physics group SUSY:

    select d.name as name
    from t_production_task t, t_production_dataset d
    where t.taskid = d.taskid
      and d.name LIKE '%AOD%'
      and t.project = 'mc15_13TeV'
      and t.phys_group = 'SUSY'
    MINUS
    select inputdataset as name
    from t_production_task
    where inputdataset LIKE '%AOD%'
      and project = 'mc15_13TeV'
      and phys_group = 'SUSY';

(Diagram: All datasets minus Input datasets = Output datasets)

RESPONSE:

    ……
    mc15_13TeV MGPy8EG_A14N23LO_SS_RPVDV_700_50_lam12k_100.merge.AOD.e4634_s2726_r6869_r6282_tid _00
    mc15_13TeV MGPy8EG_A14N23LO_SS_RPVDV_700_50_lam12k_1000.merge.AOD.e4634_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1000_10.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1000_325.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1000_650.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_10.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_10.merge.AOD.e4807_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_450.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_450.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_450.merge.AOD.e4807_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_450.merge.AOD.e4807_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_450.merge.AOD.e4807_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_900.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_900.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1250_900.merge.AOD.e4807_s2726_r6869_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1500_10.merge.AOD.e4807_a766_a777_r6282_tid _00
    mc15_13TeV HppEG_UE5C6L1_GG_ttN1_UDD_1500_575.merge.AOD.e4807_a766_a777_r6282_tid _00
    …..
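The "all datasets minus input datasets" logic on this slide can be tried out on toy data. The following is a minimal sketch, not the DEFT setup itself: it assumes an in-memory SQLite database in place of Oracle, keeps only the columns named in the slide queries, and fills them with invented rows (the `toy.*` dataset names are hypothetical). SQLite spells the set-difference operator `EXCEPT` where Oracle uses `MINUS`.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with hypothetical rows (not real DEFT data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_production_task (
            taskid INTEGER, project TEXT, phys_group TEXT, inputdataset TEXT);
        CREATE TABLE t_production_dataset (taskid INTEGER, name TEXT);
    """)
    conn.executemany("INSERT INTO t_production_task VALUES (?, ?, ?, ?)", [
        (1, "mc15_13TeV", "SUSY", "toy.simul.HITS.x0"),
        (2, "mc15_13TeV", "SUSY", "toy.merge.AOD.x1"),
        (3, "mc15_13TeV", "HIGG", "toy.evgen.EVNT.x8"),
    ])
    conn.executemany("INSERT INTO t_production_dataset VALUES (?, ?)", [
        (1, "toy.merge.AOD.x1"),  # written by task 1
        (2, "toy.merge.AOD.x1"),  # the same dataset, listed again as task 2's input
        (2, "toy.skim.AOD.x2"),   # written by task 2, never used as an input
        (3, "toy.merge.AOD.x9"),
    ])
    return conn


def output_aod_datasets(conn, project, phys_group):
    """All AOD datasets of the matching tasks, minus those used as a task input."""
    # EXCEPT is the SQLite equivalent of Oracle's MINUS.
    query = """
        SELECT d.name AS name
          FROM t_production_task t
          JOIN t_production_dataset d ON t.taskid = d.taskid
         WHERE d.name LIKE '%AOD%'
           AND t.project = :project
           AND t.phys_group = :phys_group
        EXCEPT
        SELECT inputdataset AS name
          FROM t_production_task
         WHERE inputdataset LIKE '%AOD%'
           AND project = :project
           AND phys_group = :phys_group
         ORDER BY name
    """
    params = {"project": project, "phys_group": phys_group}
    return [row[0] for row in conn.execute(query, params)]


if __name__ == "__main__":
    print(output_aod_datasets(build_toy_db(), "mc15_13TeV", "SUSY"))
```

Note that a dataset written by one task and read by another (here `toy.merge.AOD.x1`) is removed by the difference, so the result is the set of datasets that are never used as an input within the selected project and group.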

4 Issue 1: Input/Output datasets differentiation
Reproduce this task within Hadoop DKB Storage (DEFT_1 & DEFT_2)
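Without an SQL engine, the same differentiation reduces to two passes over exported records: collect the input names of the selected tasks, then subtract them from the dataset names of those tasks. The sketch below is only an illustration in plain Python. The slides do not describe how DEFT_1 and DEFT_2 are laid out, so the record layout (dictionaries carrying the column names from the Oracle queries) and the sample rows are assumptions.

```python
# Hypothetical exported records; field names follow the columns in the Oracle queries.
TOY_TASKS = [
    {"taskid": 1, "project": "mc15_13TeV", "phys_group": "SUSY",
     "inputdataset": "toy.simul.HITS.x0"},
    {"taskid": 2, "project": "mc15_13TeV", "phys_group": "SUSY",
     "inputdataset": "toy.merge.AOD.x1"},
]
TOY_DATASETS = [
    {"taskid": 1, "name": "toy.merge.AOD.x1"},
    {"taskid": 2, "name": "toy.merge.AOD.x1"},
    {"taskid": 2, "name": "toy.skim.AOD.x2"},
]


def output_datasets(tasks, datasets, project, phys_group, data_type="AOD"):
    """Return dataset names of the selected tasks that never appear as a task input."""
    selected_ids = set()
    input_names = set()
    # Pass 1: pick the tasks of interest and remember their input dataset names.
    for task in tasks:
        if task["project"] == project and task["phys_group"] == phys_group:
            selected_ids.add(task["taskid"])
            name = task.get("inputdataset") or ""
            if data_type in name:
                input_names.add(name)
    # Pass 2: gather all matching dataset names that belong to the selected tasks.
    all_names = {
        d["name"] for d in datasets
        if d["taskid"] in selected_ids and data_type in d["name"]
    }
    # Set difference plays the role of MINUS in the Oracle query.
    return sorted(all_names - input_names)


if __name__ == "__main__":
    print(output_datasets(TOY_TASKS, TOY_DATASETS, "mc15_13TeV", "SUSY"))
```

The substring test `data_type in name` is case-sensitive, unlike `LIKE` in some SQL engines, which is adequate here because the dataset names on the slides use upper-case format tags.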

5 Issue 2: Group data by parameters
Phys Group
Campaign / Subcampaign
Project
RequestID
NOT in ('user','valid1','valid2','valid3','mc_evind')
Production Step
Task status
Task Timestamp
Dataset name
Timestamp
Dataset status
Task ID
Container name
Timestamp
Container status
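Grouping by these parameters can be illustrated with a small counting routine. This is a sketch under assumptions: the field names (`phys_group`, `project`, `status`) and the sample records are hypothetical, and the `NOT in (...)` list is read here as a filter on the project, which the slide does not state explicitly.

```python
from collections import Counter

# Assumption: the NOT in (...) list from the slide filters on the project field.
EXCLUDED_PROJECTS = {"user", "valid1", "valid2", "valid3", "mc_evind"}

# Hypothetical task records for illustration only.
TOY_TASK_RECORDS = [
    {"phys_group": "SUSY", "project": "mc15_13TeV", "status": "done"},
    {"phys_group": "SUSY", "project": "mc15_13TeV", "status": "done"},
    {"phys_group": "SUSY", "project": "valid1", "status": "done"},
    {"phys_group": "HIGG", "project": "mc15_13TeV", "status": "failed"},
]


def count_tasks(tasks, keys=("phys_group", "project", "status")):
    """Count tasks per combination of the given keys, skipping excluded projects."""
    counts = Counter()
    for task in tasks:
        if task["project"] in EXCLUDED_PROJECTS:
            continue
        counts[tuple(task[key] for key in keys)] += 1
    return counts


if __name__ == "__main__":
    for group, n in sorted(count_tasks(TOY_TASK_RECORDS).items()):
        print(group, n)
```

Passing a different `keys` tuple, for example `("phys_group",)`, changes the grouping level without touching the filter.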

