DDS, A Seismic Processing Architecture Reproducible research workshop UBC, Vancouver, 2006 Randall L. Selzler Data-Warp.com Jerry Ehlers Jerry.Ehlers.


1 DDS, A Seismic Processing Architecture
Reproducible research workshop, UBC, Vancouver, 2006
Randall L. Selzler (RSelzler @ Data-Warp.com)
Jerry Ehlers (Jerry.Ehlers @ BP.com)
Joseph A. Dellinger* (Joseph.Dellinger @ BP.com)

2 DDS ORIGINS: Amoco TRC, early ’90s
DDS began at the Amoco Tulsa Research Center (TRC) at a time of great organizational strain. The job of the TRC was to do research and crunch data, not to write software, and creating software is expensive! Amoco’s solution was an edict that “everyone will use DISCO, or else”.

3 Else!
But DISCO just wasn’t good enough! And so chaos ensued... We were “mired in seismic processing diversity”. DDS grew up surrounded by:
- USP (Amoco internal, trace-header based)
- SEPlib (ASCII header pointing to data cubes)
- SU (SEGY trace-header based)
- DISCO (proprietary monitor-based system)
...and it needed to be compatible with all of these!

4 Although formally cast as a research group, the TRC in fact also functioned as an “internal contractor” processing shop.
1) So to catch on, any software would have to be usable for quick-turnaround research, but
2) the ability to process large datasets efficiently and in parallel was also of vital importance. [Terabytes of data, Connection Machines, MPI, OpenMP]
3) The group had accumulated a considerable number and variety of computers. [All “Unix”, but CM5, Cray, Sun, SGI, Linux, Linux clusters, 32 and 64 bit...]
4) Finally, there was an urgent need for software that could accommodate all the various mutant SEGY formats coming into the shop, as well as DISCO, SEPlib, SU, and USP!

5 ...and out of the chaos came...
John Etgen was using SEPlib for migration algorithm research on the CM200, a machine that required massively parallel data I/O. He showed SEPlib to Randy Selzler: “I want something that looks like THIS, but can handle the large industrial-strength jobs I need to do!” And thus DDS was born...

6 How SEPlib did it
“Header” file:
    ...processing history...
    esize=4   (bytes)
    data_format=xdr_float
    in=data_location
    n1=trace_length
    n2=number_traces_per_record
    n3=number_records
    d1=sample_interval
    o1=starting sample
    etc...
Data file: a regularly sampled cube of IEEE 4-byte floats of dimension n1 x n2 x n3.
SEPlib was the system favored by the folks writing programs that worked on large data volumes instead of individual traces.
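The SEPlib split between a small text header and a raw data cube is easy to check programmatically. Below is a minimal C sketch, not SEPlib's own library, that scans a header for `key=value` tokens and computes how many bytes the data file should hold. The function names `sep_get_long` and `sep_data_bytes` are hypothetical, and the defaults (absent `n2`/`n3` treated as 1, absent `esize` as 4) are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the value of the LAST "key=value" token in a
 * whitespace-separated SEPlib-style header, or `fallback` if the key is
 * absent. Taking the last occurrence is assumed to mimic history files,
 * where later processing steps append updated parameters. */
static long sep_get_long(const char *hdr, const char *key, long fallback)
{
    size_t klen = strlen(key);
    long value = fallback;
    const char *p = hdr;

    assert(hdr != NULL && key != NULL);
    while (*p) {
        while (*p && isspace((unsigned char)*p))    /* skip separators */
            p++;
        const char *tok = p;                        /* token is [tok, p) */
        while (*p && !isspace((unsigned char)*p))
            p++;
        if ((size_t)(p - tok) > klen && strncmp(tok, key, klen) == 0
            && tok[klen] == '=')
            value = strtol(tok + klen + 1, NULL, 10);
    }
    return value;
}

/* Bytes the data file should hold: n1 * n2 * n3 * esize.
 * ASSUMPTION: absent n2/n3 default to 1, absent esize defaults to 4. */
static long sep_data_bytes(const char *hdr)
{
    long n1 = sep_get_long(hdr, "n1", 0);
    long n2 = sep_get_long(hdr, "n2", 1);
    long n3 = sep_get_long(hdr, "n3", 1);
    long esize = sep_get_long(hdr, "esize", 4);
    return n1 * n2 * n3 * esize;
}
```

Comparing `sep_data_bytes(header)` with the size of the file named by `in=` is a cheap sanity check before processing starts.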

7 DDS can look a lot like SEPlib
SEPlib header file:
    ...processing history...
    esize=4   (bytes)
    data_format=xdr_float
    in=data_location
    n1=trace_length
    n2=number_traces_per_record
    n3=number_records
    d1=sample_interval
    o1=starting sample
    label1=seconds
    etc...
DDS “dictionary” file:
    ...processing history...
    type= float4
    format= fcube
    data= data location
    axis= t offset cdp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...
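The two listings line up parameter by parameter: SEPlib's numbered `n1`, `d1`, `o1`, `label1` become DDS attributes qualified by an axis name (`size.t`, `delta.t`, `origin.t`, `units.t`). The C sketch below performs that renaming for a list of axes. It is a hypothetical illustration based only on the key spellings shown on the slide, not a DDS utility, and it assumes the caller supplies the axis names that SEPlib's numbering lacks.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* SEPlib parameters for one axis: n#, d#, o#, label#. */
struct sep_axis { long n; double d, o; const char *label; };

/* Append formatted text to the NUL-terminated string in out[0..outsz). */
static void append(char *out, size_t outsz, const char *fmt, ...)
{
    size_t len = strlen(out);
    va_list ap;
    if (len + 1 >= outsz)
        return;                                  /* buffer already full */
    va_start(ap, fmt);
    vsnprintf(out + len, outsz - len, fmt, ap);  /* truncates safely */
    va_end(ap);
}

/* Hypothetical translator: rewrite SEPlib axes 1..naxes as DDS-style
 * dictionary entries keyed by meaningful axis names. */
static void sep_to_dds(char *out, size_t outsz, const char *const names[],
                       const struct sep_axis axes[], int naxes)
{
    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    append(out, outsz, "axis=");
    for (int i = 0; i < naxes; i++)
        append(out, outsz, " %s", names[i]);
    append(out, outsz, "\n");
    for (int i = 0; i < naxes; i++) {
        const char *a = names[i];
        append(out, outsz, "size.%s= %ld\n", a, axes[i].n);   /* from n# */
        append(out, outsz, "delta.%s= %g\n", a, axes[i].d);   /* from d# */
        append(out, outsz, "origin.%s= %g\n", a, axes[i].o);  /* from o# */
        if (axes[i].label != NULL)                            /* from label# */
            append(out, outsz, "units.%s= %s\n", a, axes[i].label);
    }
}
```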

8 DDS can look a lot like SEPlib
“Dictionary” file:
    type= float4
    format= fcube
    data= data location
    axis= t offset cdp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...
Data file: a regularly sampled cube of IEEE 4-byte floats of dimension size.t x size.offset x size.cdp.
(Command-line arguments look a LOT like SEPlib too.)
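Because the data file is a dense cube with `t` varying fastest, the byte position of any sample follows directly from the dictionary's sizes, which is what makes random access cheap. A C sketch of that arithmetic, assuming 0-based indices, `type= float4` (4 bytes per sample), and no headers or padding; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { FLOAT4_BYTES = 4 };   /* type= float4 */

/* Sizes of a 3-D fcube, fastest-varying axis first (axis= t offset cdp). */
struct fcube {
    int64_t nt;        /* size.t      */
    int64_t noffset;   /* size.offset */
    int64_t ncdp;      /* size.cdp    */
};

/* Byte offset of sample (it, ioffset, icdp), all 0-based, in a dense cube
 * stored with t varying fastest and no headers or padding. */
static int64_t fcube_byte_offset(const struct fcube *c,
                                 int64_t it, int64_t ioffset, int64_t icdp)
{
    assert(0 <= it && it < c->nt);
    assert(0 <= ioffset && ioffset < c->noffset);
    assert(0 <= icdp && icdp < c->ncdp);
    return ((icdp * c->noffset + ioffset) * c->nt + it) * FLOAT4_BYTES;
}
```

A reader could hand the result to a seek or positioned read to fetch one sample, or the start of one trace, without scanning the file.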

9 DDS’s Generalizations
Dictionary (excerpt):
    ...
    axis= t y cmp
    ...
    size.t= 1000  size.y= 96  size.cmp= 24
    ...
    delta.t= 0.008  units.t= s
    ...
    origin.y= 5000  units.y= m
    ...
    format= segy  data= oak39_@
Binary data: card header, line header, traces...
- N-Dimensional Array of I/O Records
  - Densely populated for random access
  - Sequential access if sparse
- Meaningful Axis Names: t, x, y, z, w, kx, ky, kz, cmp, shot, offset, …
- Extensible Axis Attributes
  - Regular grid (size, origin, delta, units, …)
  - Variable grid (grid.z= 1 3 5 7 11, …)
  - Non-numeric (label.attr= Vp Vs rho)
Great for research! Exotic algorithms and unforeseen domains can be accurately represented and processed as easily as traditional ones.
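Regular and variable grids differ only in how an index maps to a coordinate. The sketch below uses a hypothetical in-memory form of one axis (not DDS's own data structures): a regular axis computes `origin + i * delta`, while a variable axis such as `grid.z= 1 3 5 7 11` stores its coordinates explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* One axis of a DDS-style dictionary (hypothetical in-memory form). */
struct axis {
    const char *name;      /* e.g. "t", "y", "z" */
    int size;              /* size.<name> */
    double origin, delta;  /* origin.<name>, delta.<name>: regular grid */
    const double *grid;    /* grid.<name> values, or NULL if regular */
};

/* Physical coordinate of 0-based sample index i along axis a. */
static double axis_coord(const struct axis *a, int i)
{
    assert(a != NULL && i >= 0 && i < a->size);
    if (a->grid != NULL)
        return a->grid[i];             /* variable grid: explicit list */
    return a->origin + i * a->delta;   /* regular grid */
}
```

Keeping both forms behind one lookup is what lets a program written for regular axes also accept irregular ones without special cases.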

10 How USP did it
USP-format data file:
    historical line header (processing history and 3 data dimensions)
    element count | trace header | trace samples
    element count | trace header | trace samples
    element count | trace header | trace samples
    ... (traces)
USP (Unix Seismic Processing) was Amoco’s internally home-grown trace-based processing system, beloved of Amoco’s signal processors. USP is similar to SU in concept. USP uses longer trace headers than SU, but even those turned out not to be long enough! USP is still used as much as ever today.
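Because each trace is preceded by its own element count, a reader can step from trace to trace using only that count. The C sketch below walks such a record stream in memory. Its layout is an assumption for illustration (a native-endian 32-bit count of the 4-byte elements that follow, with the line header already skipped), not the USP file specification, and `count_trace_records` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Count the trace records in an in-memory buffer laid out as
 *   [count][count 4-byte elements][count][count 4-byte elements]...
 * ASSUMPTION (illustrative only, not the USP specification): each count is
 * a native-endian int32 giving the number of 4-byte elements (trace header
 * plus samples) that follow it. Returns -1 if a record is truncated. */
static long count_trace_records(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    long ntraces = 0;

    assert(buf != NULL || len == 0);
    while (pos < len) {
        int32_t count;
        if (len - pos < sizeof count)
            return -1;                       /* partial count word */
        memcpy(&count, buf + pos, sizeof count);
        pos += sizeof count;
        if (count < 0 || (size_t)count > (len - pos) / 4)
            return -1;                       /* record runs past the end */
        pos += (size_t)count * 4;            /* skip trace header + samples */
        ntraces++;
    }
    return ntraces;
}
```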

11 SU and USP use fixed-format trace headers defined by include files
    /*
     * hdr.h - SU include file for segy offset array
     */
    static struct {
        char *key; char *type; int offs;
    } hdr[] = {
        {"tracl",  "i",  0},
        {"tracr",  "i",  4},
        {"fldr",   "i",  8},
        {"tracf",  "i", 12},
        {"ep",     "i", 16},
        {"cdp",    "i", 20},
        {"cdpt",   "i", 24},
        {"trid",   "h", 28},
        {"nvs",    "h", 30},
        {"nhs",    "h", 32},
        {"duse",   "h", 34},
        {"offset", "i", 36},
        {"gelev",  "i", 40},
        {"selev",  "i", 44},
        {"sdepth", "i", 48},
        {"gdel",   "i", 52},
        {...
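With fixed-format headers, the key names and byte offsets are compiled into every program, so adding or moving a header word means editing the include file and rebuilding. The C sketch below uses a few entries copied from the `hdr.h` table above to decode a header word by name. It assumes the 240-byte trace header is already in host byte order; `su_keys` and `su_get` are hypothetical names, not SU's own API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A few entries copied from the hdr.h table: key, type, byte offset. */
static const struct { const char *key; const char *type; int offs; } su_keys[] = {
    {"tracl", "i", 0}, {"cdp", "i", 20}, {"trid", "h", 28}, {"offset", "i", 36},
};

/* Decode header word `key` from a raw 240-byte trace header.
 * ASSUMPTION: the header is already in host byte order.
 * Returns 1 on success, 0 for an unknown key or type. */
static int su_get(const unsigned char *th, const char *key, long *value)
{
    assert(th != NULL && key != NULL && value != NULL);
    for (size_t k = 0; k < sizeof su_keys / sizeof su_keys[0]; k++) {
        if (strcmp(su_keys[k].key, key) != 0)
            continue;
        if (su_keys[k].type[0] == 'i') {          /* 4-byte int */
            int32_t v;
            memcpy(&v, th + su_keys[k].offs, sizeof v);
            *value = v;
        } else if (su_keys[k].type[0] == 'h') {   /* 2-byte short */
            int16_t v;
            memcpy(&v, th + su_keys[k].offs, sizeof v);
            *value = v;
        } else {
            return 0;
        }
        return 1;
    }
    return 0;
}
```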

12 DDS also plays well with USP
USP-format data file:
    line header (three dimensions)
    element count | trace header | trace samples
    element count | trace header | trace samples
    element count | trace header | trace samples
    ... (traces)
DDS dictionary file:
    type= float4
    format= usp
    data= data location
    axis= t offset cdp comp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    size.comp= number components
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...
DDS knows what USP headers look like!

13 ...and SEGY
SEGY-format data file:
    EBCDIC cards
    binary header
    trace header | IBM-format samples
    trace header | IBM-format samples
    trace header | IBM-format samples
    ... (traces)
DDS dictionary file:
    type= float4ibm
    format= segy
    data= data location
    axis= t offset cdp comp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    size.comp= number components
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...
Note: DDS only bothers to convert back to SEGY’s archaic IBM floats when writing to disk!
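SEGY stores samples as big-endian IBM System/360 single-precision floats: a sign bit, a 7-bit base-16 exponent biased by 64, and a 24-bit fraction. A minimal C sketch of the standard decoding to IEEE follows; it is a textbook conversion, not DDS's implementation, and it returns a `double` because the IBM exponent range is wider than that of IEEE single precision. The reverse conversion, needed when writing SEGY, is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a big-endian 32-bit word (SEGY's on-disk byte order). */
static uint32_t be32(const unsigned char b[4])
{
    return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
           (uint32_t)b[2] << 8  | (uint32_t)b[3];
}

/* Convert an IBM single-precision float to an IEEE double:
 * value = (-1)^sign * (fraction / 2^24) * 16^(exponent - 64). */
static double ibm_to_double(uint32_t ibm)
{
    int negative = (ibm >> 31) & 1;
    int exponent = (int)((ibm >> 24) & 0x7f) - 64;
    double value = (double)(ibm & 0x00ffffffu) / 16777216.0;  /* f / 2^24 */

    assert(exponent >= -64 && exponent <= 63);
    for (; exponent > 0; exponent--)   /* scaling by 16 is exact in binary */
        value *= 16.0;
    for (; exponent < 0; exponent++)
        value /= 16.0;
    return negative ? -value : value;
}
```

For example, the four bytes `C2 76 A0 00` decode to -118.625.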

14 DDS can speak SU
    editd in=minute2.usp \
        3s=16 3e=16 2s=2 2e=32 2i=2 \
        out_format= su \
        out_data= stdout: | \
    supswigp clip=.2 > wiggle.ps
Note that the input format is auto-detected.
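Read as start (`s`), end (`e`), and increment (`i`) along numbered axes, the `editd` arguments above would select record 16 on axis 3 and every second trace from 2 through 32 on axis 2. That reading is our interpretation, not documented `editd` behavior (in particular, that ranges are inclusive and an omitted increment means 1). The C sketch counts the traces such a selection yields; the function names are hypothetical.

```c
#include <assert.h>

/* How many indices start, start+inc, ... <= end does a selection visit? */
static long range_count(long start, long end, long inc)
{
    assert(inc > 0);
    return end < start ? 0 : (end - start) / inc + 1;
}

/* Traces selected by axis-3 and axis-2 ranges (hypothetical reading of
 * editd's 3s/3e/3i and 2s/2e/2i arguments). */
static long traces_selected(long s3, long e3, long i3,
                            long s2, long e2, long i2)
{
    return range_count(s3, e3, i3) * range_count(s2, e2, i2);
}
```

Under that reading, `traces_selected(16, 16, 1, 2, 32, 2)` gives 16 traces for `supswigp` to plot.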

15 DDS dictionaries can point at dictionaries!
Top-level dictionary:
    type= float4ibm
    format= segy
    slice.comp
    data= dict.comp1 dict.comp2 dict.comp3
    axis= t offset cdp comp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    size.comp= number components
    ...
Dictionary dict.comp1 (points to the SEGY binary data file data.c1.segy):
    type= float4ibm
    format= segy
    data= data.c1.segy
    axis= t offset cdp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    ...
Dictionary dict.comp2 (points to the SEGY binary data file data.c2.segy):
    type= float4ibm
    format= segy
    data= data.c2.segy
    axis= t offset cdp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    ...
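Resolving a sliced dataset amounts to following `data=` entries until they name something that is not a dictionary. The C sketch below models dictionary files as an in-memory table (a hypothetical stand-in for reading real files) and picks one component of a sliced `data=` list. The `slice` parameter and the depth limit that guards against pointer cycles are assumptions of the sketch, not DDS options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dictionary file: its name and the values of
 * its data= entry (several values when the data are sliced, e.g. by comp). */
struct dict {
    const char *name;
    const char *data[4];
    int ndata;
};

static const struct dict *find_dict(const struct dict *d, int nd,
                                    const char *name)
{
    for (int i = 0; i < nd; i++)
        if (strcmp(d[i].name, name) == 0)
            return &d[i];
    return NULL;
}

/* Follow data= pointers from `name` until reaching a name that is not a
 * dictionary, i.e. a binary data file. `slice` picks among several data=
 * values. Returns NULL on a bad slice index or a likely pointer cycle. */
static const char *resolve_data(const struct dict *d, int nd,
                                const char *name, int slice)
{
    assert(d != NULL && name != NULL);
    for (int depth = 0; depth < 16; depth++) {
        const struct dict *cur = find_dict(d, nd, name);
        if (cur == NULL)
            return name;                  /* not a dictionary: binary file */
        if (cur->ndata == 1)
            name = cur->data[0];
        else if (slice >= 0 && slice < cur->ndata)
            name = cur->data[slice];
        else
            return NULL;
    }
    return NULL;                          /* too deep: probably a cycle */
}
```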

16 DDS plays well with mutant SEGY
    bridge in= Atlantis_EQ.segy \
        in_format=segy \
        out_format=usp \
        comment="Component Type" \
        map:segy:usp.RcComp= "TotalStatic" \
        \
        comment="Src and rec locations" \
        map:segy:usp.SrPtXC= "SrcX / 10" \
        map:segy:usp.SrPtYC= "SrcY / 10" \
        map:segy:usp.SrPtEl= "15" \
        map:segy:usp.ShtDep= "SrcDepth / 10" \
        \
        map:segy:usp.RcPtXC= "GrpX / 10" \
        map:segy:usp.RcPtYC= "GrpY / 10" \
        map:segy:usp.GrpElv= "Spare.I4[10] / 10" \
        map:segy:usp.CabDep= "Spare.I4[10]" \
        map:segy:usp.DstSgn= "DstSgn / 10" \
        \
        comment="Rec point and line numbers" \
        map:segy:usp.DpPtLn= "Spare.I4[8]" \
        map:segy:usp.DpPtLt= "Spare.I4[9]" \
        \
        comment="Dead or Live" \
        map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000' \
    | \
    editd in= stdin: 3e=106 out_data= raw.usp
The slide annotates three kinds of mapping: a straight map from one header field to another (RcComp= "TotalStatic"), a fixed number (SrPtEl= "15"), and an arithmetic calculation (e.g. SrPtXC= "SrcX / 10").

17 Data formats and mappings
This is how DDS differs from SEPlib: the properties of the binary data, and of all the elements within it, are looked up in the “dictionary”. Even the array of trace samples is just another trace field as far as DDS is concerned. DDS knows a few default formats, but can use any format that you can define. It can also map to and from any format for which you define the necessary mappings. This has the important side effect of documenting the data format, making future reproducibility possible.
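If every element of a record, including the sample array, is described by a name, type, offset, and count, then reading a trace sample and reading a header word become the same operation. The C sketch below illustrates that idea with a hypothetical field table; it is not DDS's dictionary syntax or API, and it assumes host byte order and 4-byte `int32_t`/`float` fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum ftype { INT4, FLOAT4 };

/* One field of a record, as a dictionary might describe it. The trace
 * samples are simply a field whose count is the number of samples. */
struct field { const char *name; enum ftype type; size_t offset; size_t count; };

static const struct field *field_find(const struct field *f, size_t nf,
                                      const char *name)
{
    for (size_t i = 0; i < nf; i++)
        if (strcmp(f[i].name, name) == 0)
            return &f[i];
    return NULL;
}

/* Read element i of field `name` from a host-byte-order record as a double.
 * Returns 0 and leaves *out untouched if the field or element is missing. */
static int field_get(const struct field *f, size_t nf, const unsigned char *rec,
                     const char *name, size_t i, double *out)
{
    const struct field *fd = field_find(f, nf, name);
    assert(rec != NULL && out != NULL);
    if (fd == NULL || i >= fd->count)
        return 0;
    const unsigned char *p = rec + fd->offset + 4 * i;  /* 4-byte elements */
    if (fd->type == INT4) { int32_t v; memcpy(&v, p, 4); *out = v; }
    else                  { float v;   memcpy(&v, p, 4); *out = v; }
    return 1;
}
```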

18 DDS supports generic formats
In fact, besides having a few built-in default formats such as USP, SU, and SEGY that are convenient for geophysicists, there is nothing in the core of DDS that limits it to being a seismic processing system!

19 Internal data formats
Programs can also define their own internal data formats, simply by writing definitions into their own internal dictionary:
    fdds_printf('MOD_FIELD', ' *+ float MyHeader1, MyHeader2;\n\0')
DDS then converts from the format of the data, as documented by its dictionary, to the internal format specified by the program. On output, the internal format is converted back into whatever output format was requested on the command line; by default, the output format is the same as the input format.
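The conversion described above can be pictured as copying fields by name from the layout documented in the data's dictionary to the layout the program declared, converting types along the way. The C sketch below shows that idea for scalar fields only; the `layout`/`field` structs, the external field types, and zero-filling of fields missing from the input are assumptions for illustration, not DDS internals. `MyHeader1` and `MyHeader2` are the names from the `fdds_printf` example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum ftype { INT2, INT4, FLOAT4 };
struct field  { const char *name; enum ftype type; size_t offset; };
struct layout { const struct field *fields; size_t nfields; size_t size; };

static double read_as_double(const unsigned char *p, enum ftype t)
{
    switch (t) {
    case INT2: { int16_t v; memcpy(&v, p, sizeof v); return v; }
    case INT4: { int32_t v; memcpy(&v, p, sizeof v); return v; }
    default:   { float v;   memcpy(&v, p, sizeof v); return v; }
    }
}

/* Values are assumed to fit the destination type. */
static void write_from_double(unsigned char *p, enum ftype t, double x)
{
    switch (t) {
    case INT2: { int16_t v = (int16_t)x; memcpy(p, &v, sizeof v); break; }
    case INT4: { int32_t v = (int32_t)x; memcpy(p, &v, sizeof v); break; }
    default:   { float v = (float)x;     memcpy(p, &v, sizeof v); break; }
    }
}

/* Convert one record from an external layout to a program's internal
 * layout by matching field names; fields absent from the input are zeroed. */
static void convert_record(const struct layout *in, const unsigned char *src,
                           const struct layout *out, unsigned char *dst)
{
    assert(in && src && out && dst);
    memset(dst, 0, out->size);
    for (size_t i = 0; i < out->nfields; i++) {
        const struct field *of = &out->fields[i];
        for (size_t j = 0; j < in->nfields; j++) {
            const struct field *inf = &in->fields[j];
            if (strcmp(inf->name, of->name) == 0) {
                write_from_double(dst + of->offset, of->type,
                                  read_as_double(src + inf->offset, inf->type));
                break;
            }
        }
    }
}
```

On output the same routine runs with the layouts swapped, which matches the round trip the slide describes.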

20 Leverage Diversity? Interoperate!
Data handling is fundamental…
[Diagram: DDS Application with Generic Read, Generic Write, and Generic I/O; Disk File, Pipe/Socket, Tape; Any DDS Supported Format; Non-DDS Applications connected through API Emulation; Foreign Format and Foreign Library. Milestones: DISCO Support 1997-2003; USP Re-link 1998; Proof of Concept: Format and API Emulation With Random Access I/O.]

21 Are you scared yet?
You can probably imagine that all this translating between formats can get very complicated...
    fmt:SAMPLE_TYPE= typedef float4 SAMPLE_TYPE;
    fmt:USP_ADJUST= typedef enum4 {USP_LINE_PAD \= 0, USP_TRACE_PAD \= 0, USP_HLH_SIZE \= 2236} USP_ADJUST;
    fmt:SEQUENCE= typedef USP_TRACE SEQUENCE;
    alias:fmt:USP_TRACE_PAD= fmt:USP_ADJUST
    alias:fmt:USP_HLH_SIZE= fmt:USP_ADJUST
    alias:fmt:USP_LINE_PAD= fmt:USP_ADJUST
    usp_NumRec= 2056
    ...
But it is still better than having to change or relink your code for every different mutant data format! It also makes it possible to interoperate with historical data formats without too much pain.

22 DDS scripting as a Rosetta stone
    /apps/global/bin/bridge \
        in= /hpc/dat13/zdsr01/Node/EQ/all.segy \
        in_format=segy out_format=usp \
        comment="Component Type" \
        map:segy:usp.RcComp= "TotalStatic" \
        comment="Src and rec locations" \
        map:segy:usp.SrPtXC= "SrcX / 10" \
        map:segy:usp.SrPtYC= "SrcY / 10" \
        map:segy:usp.SrPtEl= "15" \
        map:segy:usp.ShtDep= "SrcDepth / 10" \
        comment="Azimuth, Roll Tilt" \
        map:segy:usp.TVPT01= "100 * Spare.F4[11]" \
        map:segy:usp.TVPT02= "100 * Spare.F4[12]" \
        map:segy:usp.TVPT03= "100 * Spare.F4[13]" \
        comment="Dead or Live" \
        map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000' \
        comment="Shot Time" \
        map:segy:usp.TVPT15=Date.DateYear \
        map:segy:usp.TVPT16=Date.DateDay \
        map:segy:usp.TVPT17=Date.DateHour \
        map:segy:usp.TVPT18=Date.DateMin \
        map:segy:usp.TVPT19=Date.DateSec \
        ....

23 In Conclusion: caveats
- Things aren’t so complicated if you use DDS as if it were SEPlib, but then what’s the point?
- Because so much functionality already exists in USP, there has been little motivation to flesh out DDS.
- The external distribution is a subset of the same software we use internally.
- Little effort has been put into improving the “packaging”. While there is some documentation, it is somewhat lacking!

24 In Conclusion: upsides
- The software infrastructure inside BP today is based almost entirely on DDS and USP. It is BP’s infrastructure for both research and processing. BP’s advanced imaging team in Houston is “BP’s largest contractor”.
- The DDS I/O library was released publicly in 2003 on “freeusp.org”. The core of the USP system was released a year or so earlier on the same web site, along with some ARCO-heritage processing systems.
- By releasing USP and DDS, BP hoped to make it easier to share algorithms with academia and contractors.
- Randy Selzler now wants to create a successor to DDS, but that talk is his to give, as the “prophet”...

