Published by Abel Cox. Modified over 7 years ago.
1
Storage and Dissemination of SEGY Data in JPEG2000 Format

Bob Courtney, Geological Survey of Canada (Atlantic)

The Geological Survey of Canada (GSC) collects over 1 TByte of high-resolution seismic and sidescan data in SEGY (Society of Exploration Geophysicists) format each year, and wishes to implement an efficient mechanism for storing, discovering and accessing these data holdings. Currently, scientists often use field-generated paper records in preference to digital data, as accessing and processing existing digital holdings involves excessive effort. High-resolution chirp subbottom data are routinely collected during multibeam survey expeditions and, since hardcopy is not routinely generated during the survey, these data remain largely unutilized. Many of our digital holdings have never been examined, and their content has not been verified.

The GSC is exploring JPEG2000 technology to address these challenges. The JPEG2000 framework, although commonly associated with images, is a wavelet-based compression standard that can accommodate multiplane, signed data arrays with up to 38 bits of resolution. It can generate a multiresolution representation with quality layers and random file access, allowing a quick and quality-progressive peek into file contents. It provides both reversible and lossy compression options. JPEG2000 is a flexible file format optimized for transfer over low-bandwidth internet connections, and it allows the inclusion of user-designed content and more standardized XML boxes for metadata.

A Windows-based application was developed to transform single-channel SEGY data into JPEG2000 format. An XML schema (XSD) was developed to encode SEGY tape and trace header data into JPEG2000 XML boxes. These XML data will also be used as discovery metadata in GSC's relational databases to describe seismic data holdings. Trace data were encoded in a data array at a bit resolution sufficient to preserve data fidelity, since JPEG2000 can encode data at fractional word lengths.

A series of trials on existing seismic data sets shows that a lossless conversion of SEGY to JPEG2000 reduces file size by a factor of more than 2:1, and that partial, lossy versions of the same data show no visible artifacts at compression ratios in excess of 20:1. In addition, very large files can be easily and quickly viewed over local networks and the internet using standard JPEG2000 viewers without decoding the entire data stream. Software to convert SEGY to and from JPEG2000 is freely available from the author.
2
The Context & Problems
Marine program collecting digital SEGY data since the early 90s
Scientists use printed field records in preference to digital products
New digital systems (e.g., 3.5 kHz chirp on multibeam vessels) are strictly digital
Digital processing issues: data size (> 1 TB/yr), comparison to gold-standard printed records, time vs. return
Discovery and dissemination problems: cost of copying analog records, record degradation, size of digital SEGY archives
Database population and update issues: no validation of digital data

The GSC has been collecting digital SEGY seismic data since the early 90s and, until this time, much of the data has been left unexamined on digital tape. The scientists prefer to use the field records recorded at the time, as it is much less work to examine the data in that fashion. This problem has been exacerbated over the last decade as 3.5 kHz chirp data have been routinely collected on all of our multibeam cruises: these data are purely digital and field records do not exist. The same is true of a new generation of digital sidescan systems, for which no analog record is made.

Problems have arisen with the field records themselves. These include storage issues, degradation of the records (for example, thermal paper, which became more prevalent in the 90s, has progressively faded over time), and dissemination issues (photocopying field records is prohibitively expensive and requires the intervention of archive staff). External users can currently find out that we have records, but obtaining the records is another matter: we are looking at delays of weeks if not months.

Various researchers over the years have used industry-standard seismic mapping systems to interpret our digital data streams, but generally these efforts have not been rewarding in terms of time and results. Scientists always use the printed field records as the gold standard, and rarely does a digital display of SEGY data approach the fidelity and quality present in the field records. So the previous task of quickly scanning through analog records, laying them out on the table, and interpreting them has been replaced by clumsy processing through digital means.

Dissemination of seismic data, whether digital or analog, has until this point been a huge issue. Even the discovery and examination of these data by in-house staff consumes both time and resources. For analog sections, we generally have only one copy, the one recorded on the expedition, so if one scientist has checked out the roll, no other scientist generally has access until the copy is returned to the archives, signed in, and then signed out again. Since we routinely collect 1 to 2 GB of data per seismic channel per 24-hour period, the dissemination of these data over standard Internet channels remains a challenge. For an outside user to gauge the detail and quality of a seismic section, at present they would have to download the entire data set, load it into a mapping package, and visualize it. This is a lot of work.
3
GSC Implementation of JPEG2000
GSC has implemented the JPEG2000 framework to consolidate, encode, archive, interpret and disseminate digital SEGY data
Experience suggests compression of between 10:1 and 40:1 is effective
Approach applied to seismic, sidescan, and sounder data; may be extended to image trace data of multibeam sounders (water column imaging) and other gridded data sets
All ancillary data encoded via XML schemas
Metadata harvesting for the database during normal processing (carrot vs. stick approach)

The GSC is now using the JPEG 2000 framework for encoding, storing, interpreting and disseminating our seismic data. In what follows, it is important to recognize that this approach applies to seismic data, sidescan data and multibeam image data (with the advent of water column imaging multibeams, it is obvious that new strategies for data storage and display are necessary). In this approach, we have been careful to store all ancillary data in XML format; the schemas for these formats are available on request.

In constructing these XML data sets as we convert the data from SEGY to JPEG 2000, we generate data objects that are easily inserted into database systems. In this way, we automate the harvesting of metadata for these data sets, so that the arduous duty of documenting the details of our seismic coverage as a separate task is avoided. For example, in generating composite SEGY files from field files, we use database keys (e.g., expedition name, data type and instrument type) both in the nomenclature of the composite files and in the XML packages inserted into the JPEG 2000 files.
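The idea of carrying database keys inside an XML package can be sketched as follows. This is only an illustration, not the GSC schema: the element names (`SegyMetadata`, `Expedition`, `DataType`, `Instrument`) and the expedition value are hypothetical, and only the three keys named in the notes are taken from the presentation.

```python
import xml.etree.ElementTree as ET


def build_metadata_xml(expedition, data_type, instrument):
    """Return an XML string carrying the database keys for one composite file.

    Element names are hypothetical; the real GSC schema is not shown in the slides.
    """
    root = ET.Element("SegyMetadata")
    ET.SubElement(root, "Expedition").text = expedition
    ET.SubElement(root, "DataType").text = data_type
    ET.SubElement(root, "Instrument").text = instrument
    return ET.tostring(root, encoding="unicode")


def harvest_keys(xml_text):
    """Read the keys back as a dict, ready to be used as one database record."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# "EXPD_PLACEHOLDER" is a made-up expedition name used only for this example.
xml_text = build_metadata_xml("EXPD_PLACEHOLDER", "SEISMIC", "KNUDSEN")
record = harvest_keys(xml_text)
```

Because the same package is written during conversion and read during harvesting, no separate documentation step is needed to populate the database record.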
4
What is JPEG2000? JPEG2000 is definitely not JPEG

JPEG 2000 (framework):
Open file standard ISO/IEC :2000
Wavelet based, multiresolution representation
Up to 38-bit signed data, not just images
Up to 16,000 planes/channels of data
Entropy-based (MQ) binary bit-plane encoding (i.e., save 20 bits instead of 32; white space costs almost nothing)
Lossless/lossy encoding; harmonic distortion for lossy compression
Flexible file format: XML-aware, can include anything in user-defined (UUID) boxes
Random access to regions of interest (ROI), transcoding, quality layers, etc.
Internet ready: JPIP => optimized for low bandwidth
Industry support: e.g., Lizardtech, Adobe Photoshop
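The "save 20 bits instead of 32" point can be made concrete with a small calculation of the word length a set of integer samples actually needs. This is a minimal sketch under the assumption that trace samples are signed integers stored in two's complement; the function name and the sample values are made up for illustration, and the further gain from the entropy coder is not modelled.

```python
def signed_bits_needed(samples):
    """Smallest two's-complement word length that holds every integer sample."""
    bits = 1
    for s in samples:
        # A non-negative value v needs v.bit_length() magnitude bits plus a sign bit;
        # a negative value v needs (-v - 1).bit_length() magnitude bits plus a sign bit.
        magnitude = s if s >= 0 else -s - 1
        bits = max(bits, magnitude.bit_length() + 1)
    return bits


# Hypothetical trace whose samples span the full 20-bit signed range.
trace = [-524288, -17, 0, 42, 524287]
depth = signed_bits_needed(trace)

# Fraction of a 32-bit word that never needs to be stored for this trace.
saving = 1 - depth / 32
```

For this made-up trace the samples fit in 20 bits, so more than a third of each 32-bit word carries no information before any compression is applied.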
5
SEGY JPEG2000 Processing Framework
[Workflow diagram labels: Viewers, GIS, Tape, Encode, Interpret, Harvest, Archive, DVD, Register, Convert, QC, Internet, Scan]

Application software has been developed for harvesting, converting and interpreting SEGY data in the JPEG 2000 format. Each of these steps is handled by a separate application, which can be used to condense and interpret data from cruises. These applications run on modest Windows platforms, although RAM in excess of 2 GB is recommended.

Combine_segy: combine and demultiplex SEGY data from cruises.
SgyJp2: convert SEGY data to and from JPEG 2000 format.
SgyJp2_Viewer: view and interpret JPEG 2000 encoded SEGY data.
6
Harvest Demultiplex and Combine
[Diagram labels: File 1, File 2, ..., File n-1, File n; Demultiplex; Combine Channels; Concatenate; Big SEGY, > 200,000 pings, 2 GB]

This application has been designed to load, demultiplex and combine digital SEGY files that are collected during marine field operations. During these expeditions, digitizers are often set up to record in one-hour chunks or to record files of a predetermined size, often around 100 MB. Consequently, each field day will generate over 20 files per seismic recorder, and typically 1 to 2 GB of data. This program will combine the multiple files from one day (or several days) into one large SEGY file with a self-describing name.

Most marine geologists prefer to use an electrostatic hardcopy of the seismic data and find it quite convenient to roll out a single record that typically contains data from multiple field days. They find it a challenge to process and display digital seismic data, as the number and size of the files often present operational difficulties. Consequently, most of the digital seismic data collected over the last decade by the GSC have not been interpreted or even verified from digital tape.

Current seismic recorders (e.g., GSCDIG) used in GSC field operations generally record a single seismic channel in each data file, and multiple data files are recorded simultaneously per instrument. In previous seismic recorders (e.g., AGCDIG), multiple channels were encoded in a single SEGY file, including an NMEA navigation data stream.
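The concatenation step alone can be sketched as below. This is not the Combine_segy program: it assumes single-channel files that share the same sample format and trace length, have no extended textual headers, and therefore start with the standard 3600-byte SEG-Y file header (3200-byte textual header plus 400-byte binary header). It does not demultiplex channels or update any header fields, and the function name is hypothetical.

```python
import os

SEGY_FILE_HEADER_BYTES = 3600  # 3200-byte textual header + 400-byte binary header


def concatenate_segy(input_paths, output_path, chunk_size=1 << 20):
    """Append the trace blocks of several SEG-Y files behind the first file's headers.

    Returns the size of the combined file in bytes.
    """
    with open(output_path, "wb") as out:
        for index, path in enumerate(input_paths):
            with open(path, "rb") as src:
                header = src.read(SEGY_FILE_HEADER_BYTES)
                if len(header) < SEGY_FILE_HEADER_BYTES:
                    raise ValueError(f"{path} is too short to be a SEG-Y file")
                if index == 0:
                    # Only the first file contributes its file-level headers.
                    out.write(header)
                # Copy the remaining bytes (trace headers and samples) in chunks
                # so that multi-gigabyte inputs never have to fit in memory.
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
    return os.path.getsize(output_path)
```

Copying in fixed-size chunks keeps memory use flat, which matters when a day of recording adds up to 1 to 2 GB.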
7
Harvest Demultiplex and Combine
Reduce number of files => 1 file/day rather than 50
Database-linked nomenclature
Composite channel files: sidescan, 2-channel high-res
Self-descriptive file names:
Expedition_datatype_instrument_xdcr_starttime_endtime
_SEISMIC_KNUDSON_3.5khz_132_0007_to_132_1217.sgy

Other systems create native data streams that are not SEGY. The Knudsen chirp recorder generates files in a proprietary KEB format, which can be converted into SEGY format using a program that can be obtained from the manufacturer. Our new Klein sidescan records in the Klein SDF format; at the moment, we are working on a converter to SEGY.

In subsequent programs, these combined SEGY files will be converted into JPEG 2000 files, which offer substantial compression (more than 90%) and a means of easily viewing and interpreting these data with both off-the-shelf and custom image viewing software. This program reduces the number of seismic data files by producing very large, combined SEGY files; the subsequent steps make the handling of these very large data files not only possible but very easy.
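The naming pattern above can be generated mechanically from the database keys. The sketch below assumes that the time stamps in the example name are day-of-year followed by HHMM, which is how `132_0007` and `132_1217` read; that interpretation, the function name, the expedition placeholder and the year are all assumptions made for illustration.

```python
from datetime import datetime


def composite_name(expedition, data_type, instrument, transducer, start, end):
    """Build a self-descriptive SEGY file name from database keys and times.

    Times are rendered as <day of year>_<HHMM>, an assumed reading of the
    example file name on the slide.
    """
    def stamp(t):
        return t.strftime("%j_%H%M")

    return (f"{expedition}_{data_type}_{instrument}_{transducer}_"
            f"{stamp(start)}_to_{stamp(end)}.sgy")


# "EXPEDITION" and the year 2007 are placeholders; 12 May 2007 is day 132.
name = composite_name(
    "EXPEDITION", "SEISMIC", "KNUDSON", "3.5khz",
    datetime(2007, 5, 12, 0, 7), datetime(2007, 5, 12, 12, 17),
)
```

Because every field in the name is also a database key, a file can be matched to its database record from the name alone.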
8
Encode SEGY SGYJP2

[Diagram labels: SEGY; SEGY.xsd; XML; XML to Database; GZIP XML (1:1, 10:1); SEGY headers; Summary data; Filter; Signal Cond.; Outliers; Bipolar; Envelope; Half-wave; Waveform; JPEG2000 Compression Engine (10:1 - 40:1); SGYJP2]

This application is used to convert SEGY data to JPEG 2000 format, and JPEG 2000 formatted seismic data back into SEGY. The trace data are encoded in the image part of the JPEG 2000 file as bipolar data, and the SEGY file and trace header information is encoded in a compressed XML format located in a user-defined box attached to the file. The schema-based XML structure allows other programs, written in the language of choice, to interpret the data payload, perhaps to prune out metadata of interest for insertion into institutional databases. With care, the schema representation also allows the data payload to be extended, adding new data objects to the definition without the need to rewrite payloads written with a previous schema definition. The schema definition will be made available to interested parties.

We have taken some care to reduce the redundancy that is often seen in binary trace headers, where fields are often repeated and often zero, and that persists even in an ASCII representation. The size of the XML package is generally comparable to the size of the binary trace headers. When compressed, the XML package shrinks to less than 10% of its original size.

The application is set up for batch processing, so that the operator can queue up a series of files and walk away. The program allows the user to interrogate these SEGY files (for example, by displaying traces from the file), to filter the data, to add metadata, and to convert. The application will generate three versions of the input data set:

Trace waveform data
Envelope of the trace data
Half-wave rectification of the trace data

For data display and interpretation, it is recommended that the user generate either the envelope or the half-wave version. These versions have nonzero average values in all multiresolution representations, so that when a user zooms in and out, a scaled version of the data is displayed, as if the user had walked away from or towards a section on the wall. In contrast, bipolar trace waveform data generally tend to a zero mean when averaged in the progressive multiresolution representation, so the zoomed-out version would look featureless. The waveform format would be used primarily for storage and dissemination of waveform data.

Given the high degree of compression that is possible with this technique, it is not unreasonable to produce all three versions of the file and keep them online. Future plans also include generating a multiplane JPEG 2000 file that holds all three versions in the same file. A user could then interpret any of the three versions with minimal computational overhead, and all three would be available in the interpretation packages.

Waveform data: zero padding; trace delays; lossless or lossy; keep only significant bit-depth; choose reduced bit-depth scaled to highest amplitude
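The reason a bipolar waveform looks featureless when zoomed out can be checked with a toy trace. The sketch below uses plain block averaging as a stand-in for the low-pass branch of the wavelet transform, and the absolute value as a crude stand-in for a true envelope; both substitutions are simplifications made for illustration, and the sinusoidal trace is synthetic.

```python
import math


def half_wave(trace):
    """Half-wave rectification: keep positive samples, zero the negative ones."""
    return [max(s, 0.0) for s in trace]


def full_wave(trace):
    """Absolute value, used here only as a crude stand-in for an envelope."""
    return [abs(s) for s in trace]


def block_mean(trace, factor):
    """Average consecutive blocks of `factor` samples (one coarse zoom-out step)."""
    return [sum(trace[i:i + factor]) / factor
            for i in range(0, len(trace) - factor + 1, factor)]


# Synthetic bipolar trace: 8 cycles of a sinusoid, 16 samples per cycle.
trace = [math.sin(2 * math.pi * n / 16) for n in range(128)]

zoomed_waveform = block_mean(trace, 16)           # averages out to about zero
zoomed_half = block_mean(half_wave(trace), 16)    # stays clearly positive
zoomed_abs = block_mean(full_wave(trace), 16)     # stays clearly positive
```

Averaged over whole cycles, the bipolar trace cancels to essentially zero, while the rectified versions keep a nonzero level that still reflects signal strength at the coarser scale.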
9
Encode SEGY SGYJP2
10
Encode SEGY SGYJP2
11
Encode SEGY SGYJP2
12
Encode SEGY SGYJP2
Sample from 3.5 kHz Knudsen, Creed, St. Lawrence Estuary: 69333 traces; samples/tr ; 12 hr data; 10:1 compression
Lizardtech IE plugin
13
Encode SEGY SGYJP2 Signal amplitudes (in this case; envelope) encoded in file; Anti-aliasing at all zoom levels
14
Encode SEGY SGYJP2
50:1 / 0.3 bpp
10:1 / 1.6 bpp
Comparison => 10:1 to 50:1 compression
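The two rate figures on this slide are mutually consistent if the source samples occupy 16 bits each, since bits per sample equals source bit depth divided by compression ratio. The 16-bit depth is an inference from 1.6 bpp at 10:1, not something the slide states, so treat it as an assumption in this small check.

```python
def bits_per_sample(source_bit_depth, compression_ratio):
    """Average coded bits per sample for a given source depth and ratio."""
    return source_bit_depth / compression_ratio


# Assumed 16-bit source samples (inferred, not stated on the slide).
rate_10 = bits_per_sample(16, 10)
rate_50 = bits_per_sample(16, 50)

for ratio, rate in ((10, rate_10), (50, rate_50)):
    print(f"{ratio}:1 -> {rate:.2f} bpp")
```

Under that assumption the 50:1 case works out to 0.32 bpp, which rounds to the 0.3 bpp quoted on the slide.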
15
Encode SEGY SGYJP2 Sidescan trace encoding; equally effective for MBES
16
Interpret SGYJP2

[Diagram labels: SGYJP2; horizons.xsd; XML to Database; XML Horizons; XML Markers; XML Sections; GZIP XML; View & Interpret; Shapefiles; GIS Automation]

It is our intention to distribute both the data collected on cruises and the value-added interpretations that scientists have made of these data. The files coming from SgyJp2 can be viewed and served on the Internet as images, but to fully exploit the geospatial and temporal information encoded in the SEGY trace headers, we have developed dedicated software. Again, all interpretations are encoded in a schema-based XML data payload.

This interpretation package allows the user to view, zoom in on, and zoom out of the data contained in the JPEG 2000 file. It should be recognized that the data in the JPEG 2000 file are not stored as image grey levels, but as signal amplitudes as defined in the conversion, whether trace waveform data, envelope data, or half-wave rectified data. The data are mapped to a greyscale only when they are extracted from the file for viewing. This grey-level mapping is easily changed with the palette box in the application. The interpretation package allows the user to digitize horizons, markers, and sections.
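Mapping stored amplitudes to grey levels only at display time can be sketched as a simple windowing function. The linear mapping, the 0..255 output range, the function name and the sample values are all assumptions for illustration; the viewer's actual palette handling is not described in the notes.

```python
def to_grey_levels(amplitudes, low, high):
    """Linearly map amplitudes in [low, high] to 0..255, clipping values outside.

    The input list is left untouched; only the display copy is rescaled.
    """
    if high <= low:
        raise ValueError("high must exceed low")
    levels = []
    for a in amplitudes:
        clipped = min(max(a, low), high)
        levels.append(int(round((clipped - low) * 255 / (high - low))))
    return levels


# Hypothetical envelope amplitudes as they might be read back from a file.
envelope = [0, 500, 1000, 2000, 5000]

# Two different palette windows applied to the same stored amplitudes.
display_narrow = to_grey_levels(envelope, 0, 2000)
display_wide = to_grey_levels(envelope, 0, 5000)
```

Changing the window changes only the displayed copy, so weak reflectors can be brought out without re-encoding or degrading the stored amplitudes.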
17
Interpret SGYJP2
18
Interpret SGYJP2
19
Ongoing Efforts SGYJP2

Shapefiles and ESRI automation
Google Earth KMZ
Drivers for Klein digital formats
GSF, XTF encoding (XML schemas)

Presently, we have implemented these strategies for seismic and sidescan data that have been encoded in SEGY format. We are implementing extensive linkages to ESRI products from our mapping tools through the generation of coded shapefiles and through direct process-to-process communication that links the mapping tools with the desktop GIS. Google Earth will also be supported through the generation of KML. We have written drivers for digital Klein data, and that data stream will be addressed in the near future. Hooks have been inserted in the schema for the potential presence of both XTF and GSF files. Other file types are possible; for example, we are considering a means of encoding SIMRAD image datagram data in this format.

At present, the JPEG 2000 formulation includes two wavelet transformations: the 5 x 3 reversible transform and the 9 x 7 irreversible transform. In the latest JPX standard, there is provision for supplying custom wavelet transforms, including KLT transforms. This is an area of research that is of significant interest to me, and I hope to spend some time on it in the future. Internet-based Web service strategies will likely be developed for the transmission of SEGY and user-defined content.

Multiscale methods of characterization, outlier detection, etc.
Bathymetry gridding at 1 m, 2 m, 4 m, 8 m, 16 m => multiresolution representations in one data set
Distortion versus rate => how accurate do you want? 1%, 0.1% => quality layers, transcoding
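The reversible 5 x 3 transform mentioned in the notes can be written as two integer lifting steps, which is what makes lossless reconstruction possible. The sketch below is one level of the 1-D transform for even-length integer signals with symmetric extension at the ends; it illustrates the standard lifting form and is not code from the SgyJp2 tools, and the function names and test signal are made up.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting transform on an integer signal.

    Returns (s, d): the low-pass (approximation) and high-pass (detail) halves.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch handles even lengths of at least 2")
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: each odd sample minus the floored mean of its even neighbours.
    # The right edge mirrors the last even sample (symmetric extension).
    d = [odd[n] - (even[n] + even[min(n + 1, half - 1)]) // 2 for n in range(half)]
    # Update step: each even sample plus a rounded quarter of the adjacent details.
    # The left edge mirrors the first detail coefficient.
    s = [even[n] + (d[max(n - 1, 0)] + d[n] + 2) // 4 for n in range(half)]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 exactly by running the lifting steps in reverse order."""
    half = len(s)
    even = [s[n] - (d[max(n - 1, 0)] + d[n] + 2) // 4 for n in range(half)]
    odd = [d[n] + (even[n] + even[min(n + 1, half - 1)]) // 2 for n in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


# Made-up signed integer samples standing in for a short bipolar trace.
signal = [3, -7, 12, 12, 0, 5, -100, 41]
low, high = forward_53(signal)
restored = inverse_53(low, high)
```

Because every step uses integer arithmetic and the inverse subtracts exactly what the forward pass added, the round trip reproduces the input bit for bit, including negative samples.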
20
Research Efforts SGYJP2
Multiplane data => MBES; multichannel seismic
Wavelet transforms: custom based, KLT?
Web services => extend/adapt JPIP
Multiscale methods of data cleaning, characterization
Bathymetry gridding => 1 m, 2 m, 4 m => multiscale
Rate versus distortion => how much accuracy do you need? 1%, 0.1%, etc.
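The bathymetry bullet (1 m, 2 m, 4 m grids in one data set) amounts to a resolution pyramid. The sketch below builds one by averaging 2 x 2 cells, which is a simple stand-in for the wavelet low-pass step rather than what JPEG 2000 actually computes; unlike a JPEG 2000 codestream, it also keeps a separate copy per level. The function names and the toy depth grid are hypothetical.

```python
def halve_resolution(grid):
    """Average non-overlapping 2 x 2 cells, e.g. a 1 m grid becomes a 2 m grid."""
    rows, cols = len(grid), len(grid[0])
    if rows % 2 or cols % 2:
        raise ValueError("this sketch needs even grid dimensions")
    return [[(grid[r][c] + grid[r][c + 1]
              + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


def build_pyramid(grid, levels):
    """Return [finest grid, next coarser grid, ...] with `levels` coarser copies."""
    pyramid = [grid]
    for _ in range(levels):
        pyramid.append(halve_resolution(pyramid[-1]))
    return pyramid


# Toy 4 x 4 depth grid at 1 m spacing: a plane sloping in both directions.
depths_1m = [[float(r + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(depths_1m, 2)  # 1 m, 2 m and 4 m grids
```

A viewer that needs only the 4 m overview can read the coarsest level and ignore the rest, which is the behaviour the multiresolution representation is meant to provide.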
21
Software SGYJP2

Tools and schemas developed to disseminate GSC data
Tools and schemas are free
Single user, no redistribution (need to measure impact); request to
No support: we have limited capacity
Welcome research partnerships to extend and continue efforts

The three programs for harvesting, converting and interpreting SEGY data have been developed under the Geoscience for Ocean Mapping Program to aid the interpretation and dissemination of program results and data. These tools are freely available to anyone who asks. However, we have a limited capacity to provide support. In this light, updates to these programs will be made available on a periodic basis as the opportunity arises.

Software is distributed under a single-user license with no rights to redistribute the installation. The latest distribution package will be made freely available upon request via . This is simply a means of gauging the penetration of these techniques into the community. Absolutely no support or assistance is available for external use of these free programs. For those who desire some support, access to the newest innovations as they come along, and input to development, we hope to develop joint research agreements to further this effort.