
1 Aerospace Data Storage and Processing Systems
Implementation of High-Rate JPEG2000 Coding on a Virtex-2 Pro Reconfigurable Computing Board
Presented by Damon Van Buren, SEAKR Engineering
MAPLD 2004, Submission 133

2 The Sensor Bandwidth Problem
o Commercial satellite imaging systems are experiencing growth in imaging capability...
   - Higher resolution: < 1 m
   - Larger images: > 10k image width and height
   - More spectral components
      – Panchromatic
      – Red/Green/Blue
      – Multi-spectral
o Improved capabilities are leading to high sensor data rates
   - Data output rates > 2 Gbps for some systems
o Providing storage and downlink bandwidth for the data is becoming a significant challenge for system designers
   - The largest data recorders can store less than 20 minutes of data at 2 Gbps
   - Downlinks must be several hundred Mbps to downlink 15 minutes of data in under an hour
   - Data storage and high-bandwidth downlinks require lots of power
o By reducing the amount of image data, compression provides a solution to the bandwidth problem!
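The storage and downlink figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are illustrative and not from the presentation, and it ignores overheads such as error-correction coding or framing.

```python
def stored_bits(rate_bps, minutes):
    """Bits produced by a sensor running at rate_bps for the given minutes."""
    return rate_bps * minutes * 60


def downlink_rate_bps(rate_bps, capture_minutes, downlink_minutes):
    """Downlink rate needed to empty capture_minutes of data in downlink_minutes."""
    return stored_bits(rate_bps, capture_minutes) / (downlink_minutes * 60)


# 20 minutes at 2 Gbps: 2.4e12 bits, i.e. 300 GB of recorder capacity.
recorder_bits = stored_bits(2e9, 20)
recorder_gigabytes = recorder_bits / 8 / 1e9

# 15 minutes of 2 Gbps data sent in exactly one hour needs 500 Mbps;
# finishing in under an hour needs more, hence "several hundred Mbps".
needed_bps = downlink_rate_bps(2e9, 15, 60)
```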

3 Desired Compressor Features
o Real Time
   - Compression must be performed in real time, prior to storage.
   - High throughput (> 2 Gbps)
o Excellent Performance in Lossy and Lossless Modes
   - Purchasers of satellite imagery are sensitive to reductions in image quality caused by lossy compression.
   - Scientific users prefer undistorted data (bit true).
o Space-Qualified
   - Must survive hazards of launch and space operation, including radiation.
o Low Risk
   - Satellite imaging companies seek high reliability solutions.
o Low Cost
   - Commercial customers require cost effective solutions.
o Flexible
   - The ability to support varying compression ratios and contents would allow more effective use of available storage and bandwidth.

4 JPEG2000 Algorithm
o JPEG2000 is an excellent choice for satellite image compression.
   - Latest still image compression standard from the JPEG committee
o Meets two key requirements for satellite image compression:
   - Excellent performance in both lossy and lossless modes.
      – ~1.7 to 1 lossless compression for typical satellite imagery: 70% improvement!
      – Visually lossless compression > 2 to 1: 100% improvement in storage and downlink performance.
   - Very flexible:
      – Many options for compressed images.
o Other advantages:
   - International Standard
   - Wavelet based
      – High quality lossy images with compression ratios > 100:1
   - Packet oriented
      – Allows random access to the compressed code stream.
      – Makes compressed data more robust in the presence of bit errors.
      – Allows selection of image quality, spatial region, resolution, and color component after compression.
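The 70% and 100% improvement figures follow directly from the compression ratios: a ratio of R to 1 lets the same storage or downlink carry R times as much imagery. A small sketch of that relationship, with an illustrative function name that is not from the presentation:

```python
def extra_capacity_percent(compression_ratio):
    """Percent more image data that fits in the same storage or downlink."""
    return (compression_ratio - 1.0) * 100.0


lossless_gain = round(extra_capacity_percent(1.7))  # 70
visually_lossless_gain = round(extra_capacity_percent(2.0))  # 100
```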

5 JPEG2000 Implementation Challenges
o JPEG2000 is a very complex algorithm.
   - More Features = More Complexity.
o Operation intensive
   - Several hundred operations per pixel, because each bit must be processed many times, for the wavelet transform, entropy coding, MQ coding, packet generation, etc.
o Complex
   - Many different stages to produce compressed output.
      – Wavelet transform.
      – Quantization.
      – Context generation.
      – Arithmetic coding.
      – Packet generation.
   - Many parameters must be tracked individually for each code block (64x64).
o Memory intensive
   - Each pixel must be accessed many times, so many small buffers are needed to get good throughput.
o Few processors are capable of implementing JPEG2000 at high rates!

6 High-Performance Processing Using Xilinx FPGAs
o Xilinx FPGAs have many advantages for fast parallel processing:
   - Millions of gates.
   - System clocks of several hundred MHz.
   - High speed I/O
      – 622 Mbps LVDS
      – Multi-Gigabit serial I/O
   - Hundreds of internal block RAMs.
   - Hundreds of internal 18-bit multipliers.
o Xilinx FPGAs are available in space-qualified versions:
   - Radiation testing is complete on the Virtex and Virtex-II devices.
      – ~200 kRad total dose, latchup immune.
   - Radiation testing to begin on the Virtex-II Pro devices soon.
o Xilinx FPGAs are very flexible, reducing risk:
   - May be reprogrammed an unlimited number of times.
   - Configurations may be uploaded at any time during the mission to fix errors or add new capability.
o Xilinx FPGAs are the best solution for fast compression in space!

7 Challenges for Xilinx Use in Space
o The effects of radiation in spacecraft electronics are well known.
   - Caused primarily by charged particles.
   - May cause permanent damage over time by ionizing SiO2 (total dose).
   - May also cause errors in digital logic by upsetting registers (single event effects).
   - Mitigation techniques are used to reduce or eliminate the effect of radiation upsets.
      – Triple Modular Redundancy (TMR) uses voting to select the correct output from 3 separate instances of the design.
o Mitigation of radiation effects in SRAM-based FPGAs presents an additional challenge:
   - As with other digital electronics, the functional logic of the device is susceptible to upset, however...
   - Another layer of logic (configuration logic) controls the routing of the part, giving the device its capability to be reprogrammed to perform different functions.
   - Configuration logic is also susceptible to radiation upsets.
o Xilinx FPGAs require system level mitigation strategies in addition to the device level mitigation techniques (such as TMR) that are commonly used for space electronics.
   - Configuration data must be continuously re-written, or scrubbed using a read-and-correct approach.
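The TMR voting mentioned above can be illustrated with a bitwise majority function. This is a software sketch of the idea only; in an FPGA the voter is a small piece of logic on each protected signal, and the function name here is hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three redundant outputs.

    Each output bit takes the value that at least two of the three
    inputs agree on, so an upset in any single copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


# Copy c has two upset bits; the voter still returns the value held by a and b.
voted = tmr_vote(0b1010, 0b1010, 0b0110)
```

Voting masks an upset in one copy but does not repair it, which is why the slide pairs TMR with scrubbing of the configuration data.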

8 SEAKR's RCC Board Processing Solutions
o SEAKR has developed a line of Reconfigurable Computing (RCC) products based on Xilinx FPGAs.
   - RCC 1: 4x Virtex 1000s
   - RCC 2: 4x Virtex II 6000s
   - RCC 3 (NTRCC): 4x Virtex II Pro 70/100s
o Boards include system-level upset mitigation (scrub) for the Xilinx devices.
   - Configuration data is continuously read and checked for errors.
   - Errors are corrected by overwriting the corrupted frames, without interrupting the operation of the device.
o Other devices on board employ radiation mitigation strategies as well:
   - Radiation hardened EDAC
o Boards also have dedicated resources to support high-performance processing:
   - High speed I/O.
   - External memories.
o Industry standard form-factor:
   - 6U Compact PCI.
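The read-and-correct scrub described above can be sketched as a loop over configuration frames. This is a simplified model under stated assumptions: frames are modelled as byte strings in a Python list, and the comparison is made against a known-good reference copy. The actual readback interface, frame size, and error-detection method used on the RCC boards are not given in the presentation.

```python
def scrub_pass(live_frames, golden_frames):
    """Run one scrub pass over the configuration memory model.

    live_frames: mutable list of frames as currently held in the device.
    golden_frames: known-good reference copy (hypothetical).
    Only corrupted frames are overwritten, so untouched frames keep
    operating. Returns the indices of the frames that were corrected.
    """
    corrected = []
    for index, (live, golden) in enumerate(zip(live_frames, golden_frames)):
        if live != golden:
            live_frames[index] = golden
            corrected.append(index)
    return corrected


golden = [bytes([n] * 4) for n in range(8)]
live = list(golden)
live[5] = b"\x05\x05\x04\x05"  # simulate a single-bit upset in frame 5
fixed = scrub_pass(live, golden)
```

In flight the pass would repeat continuously, so an upset persists for at most about one scrub period.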

9 Network RCC (NTRCC)
o Four Xilinx XC2VP70-6FF1704 FPGA co-processors (COPs)
   - Design compatible with XC2VP100-6FF1706 and V2P-X
o (4) banks of 1Mx36 Quad Data Rate (QDR) SRAMs for each COP
o 512 MB of DDRII shared SDRAM memory for prototype
   - 1 GB of 128M x 64 EDAC (R-S) protected DDRII SDRAM shared memory (19.2 Gbps @ 150 MHz) using 1 Gbit memory
o Network IF
   - (2) parallel 16-bit RapidIO ports to front panel (8 Gbps)
   - (1) 4x3.125 Gbps serial port to front panel (>10 Gbps)
   - 4x3.125 Gbps ports from NIC to each COP (>10 Gbps)
   - 4x3.125 Gbps ports from each COP to each neighbor COP (>10 Gbps)
o Shared Data Buses
   - COP Interconnect Bus (~4.224 Gbps)
   - cPCI 32-bit 33 MHz
o Read and write COP configurations via cPCI
o Extended 6U form factor
o Configuration RAM SEU detection and correction
   - DDRII SDRAM on configuration controller for shadow config program storage
o Non-volatile memory for 16 different configurations (1 Gbit Flash)
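Several of the bandwidth figures on this slide can be reproduced from bus width and clock rate. The sketch below is a rough check rather than a description of the board. The 8b/10b line coding on the serial links and the 64-bit, 66 MHz reading of the COP interconnect bus are assumptions that happen to match the quoted numbers; the presentation does not state them.

```python
def parallel_gbps(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak throughput of a parallel bus in Gbps."""
    return width_bits * clock_mhz * transfers_per_clock / 1000.0


# 64-bit DDRII SDRAM at 150 MHz, two transfers per clock: 19.2 Gbps.
sdram_gbps = parallel_gbps(64, 150, transfers_per_clock=2)

# 32-bit cPCI at 33 MHz: about 1.06 Gbps peak.
cpci_gbps = parallel_gbps(32, 33)

# Four serial lanes at 3.125 Gbps: 12.5 Gbps raw on the wire.
serial_raw_gbps = 4 * 3.125
# ASSUMPTION: with 8b/10b coding, 8 of every 10 line bits carry payload.
serial_payload_gbps = serial_raw_gbps * 8 / 10

# ASSUMPTION: a 64-bit bus clocked at 66 MHz would give the quoted ~4.224 Gbps.
cop_bus_gbps = parallel_gbps(64, 66)
```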

10 Network RCC Block Diagram

11 NTRCC Layout
o 24-layer board
o MicroVias, blind vias, via-in-pad
o High speed 3.125 Gbps serial links
o 82 pages of schematic capture
o 10 weeks of PCB layout time

12 Implementation of the JPEG2000 Algorithm
o The JPEG2000 core has been in development for over a year.
   - Eventual target data rate 600 Mbps/device.
   - Written in VHDL.
   - Simulations performed in ModelSim.
   - Synthesis in Synplify Pro.
o Targeted to the NTRCC-R in summer '04.
   - The NTRCC-R is a reduced version of the NTRCC with a single coprocessor.
   - Take advantage of improved external memory throughput.
   - Ultimately use the high-speed serial I/O to move image information on the board.
o Designed for high throughput.
   - Cycle efficient coding style.
   - Highly parallel design.
   - Pipelined architecture.
   - Rolling wavelet transform.
o Designed for flexible output file format.
   - Output is divided into quality layers for easy selection of compression ratio.

13 JPEG2000 Block Diagram

14 JPEG2000 Coding Steps
o Image is broken into tiles
o Tiles are wavelet transformed
   - 5/3 reversible or 9/7 irreversible, also user defined.
   - Selectable number of transform levels.
o Each subband from the transform is further broken up into code blocks (typically 32x32 or 64x64) for entropy coding.
o Each code block is entropy coded, starting from the top bit plane and working down.
   - The current bit of each pixel is passed to an arithmetic coder, along with context information.
   - The MQ encoder takes advantage of any skewing of the probability for each context, and adapts contexts as the coding progresses.
o Packets are formed by combining the entropy coder outputs from a single resolution.
o Tile parts are formed from all the packets in a given bit plane.
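The reversible 5/3 transform named above is usually computed with integer lifting steps, which is what makes it exactly invertible for lossless coding. The following is a minimal one-dimensional, single-level sketch for even-length inputs with symmetric boundary extension. It illustrates the technique and is not the presenter's VHDL design; a 2D transform applies it along rows and then columns, and function names are illustrative.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting transform (1D, even length).

    Returns (low, high): the low-pass and high-pass integer coefficients.
    """
    assert len(x) >= 2 and len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = len(odd)
    # Predict: high[i] = odd[i] - floor((even[i] + even[i+1]) / 2),
    # mirroring the last even sample at the right edge.
    high = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
            for i in range(half)]
    # Update: low[i] = even[i] + floor((high[i-1] + high[i] + 2) / 4),
    # mirroring the first high sample at the left edge.
    low = [even[i] + ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
           for i in range(half)]
    return low, high


def inverse_53(low, high):
    """Undo forward_53 exactly by reversing the lifting steps."""
    half = len(high)
    even = [low[i] - ((high[max(i - 1, 0)] + high[i] + 2) >> 2)
            for i in range(half)]
    odd = [high[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


samples = [12, 15, 9, -4, 0, 7, 7, 30]
low, high = forward_53(samples)
restored = inverse_53(low, high)  # identical to samples
```

Python's `>>` floors toward negative infinity for negative integers, which matches the floor in the lifting equations. The steps use only adds and shifts, one reason the 5/3 filter maps well onto FPGA logic.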

15 JPEG2000 Architecture Drivers
o To achieve high data rates, the processing must be parallelized as much as possible.
o The "tall pole in the tent" is the arithmetic coding, because the coding of a single data bit with its context can take several clock cycles.
o Significance propagation coding is also a challenge, because each coefficient must be accessed many times, as each bit plane is processed.
o Other operations, such as wavelet transform, code block loading, and packet generation are much more efficient, and require fewer parallel paths.
o A pipelined architecture with many entropy coders in parallel was used to achieve the required throughput.
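To see why each coefficient is touched many times, it helps to look at how a code block is sliced into bit planes. The sketch below only extracts the magnitude bit planes from the most significant plane down. It omits the sign handling, coding passes, context formation, and MQ coding that the real entropy coder performs, and the function name is illustrative.

```python
def magnitude_bit_planes(coefficients):
    """Return the magnitude bit planes of a block, top plane first.

    Every plane contains one bit from every coefficient, so a block
    whose largest magnitude needs P bits is traversed P times.
    """
    magnitudes = [abs(c) for c in coefficients]
    top = max(magnitudes, default=0).bit_length()
    return [[(m >> plane) & 1 for m in magnitudes]
            for plane in range(top - 1, -1, -1)]


planes = magnitude_bit_planes([5, -3, 0])
# planes == [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
```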

16 Architecture Description
o Processes 256x256 tiles.
o Pipelined architecture, using separate external memories for image, tile, and compressed data storage.
o 19 entropy coders working in parallel to improve throughput, one for each code block.
   - 64x64 code blocks.
o FIFO buffering between the stages improves data flow efficiency.
o A rolling wavelet transform is used to reduce memory accesses and improve efficiency.
o Entropy coder outputs are formed into layers, giving each tile a progressive output format.
o Tile parts are interleaved as the image tiles are processed.
o Performs lossy or lossless compression.
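The figure of 19 entropy coders is consistent with counting the code blocks in one tile. The sketch below counts blocks for a square tile after a dyadic wavelet decomposition. The choice of three decomposition levels is an assumption: the presentation does not state the level count, but three levels on a 256x256 tile with 64x64 code blocks yields exactly 19 blocks.

```python
def code_blocks_per_tile(tile_size=256, block_size=64, levels=3):
    """Count code blocks in a square tile after a dyadic wavelet transform.

    Each level halves the subband size and adds three detail subbands
    (HL, LH, HH); the final LL subband is added once at the end.
    A subband smaller than a code block still occupies one block.
    """
    total = 0
    size = tile_size
    for _ in range(levels):
        size //= 2
        blocks_per_side = max(1, -(-size // block_size))  # ceiling division
        total += 3 * blocks_per_side ** 2
    ll_blocks_per_side = max(1, -(-size // block_size))
    return total + ll_blocks_per_side ** 2


# Level 1: 3 subbands of 128x128, 4 blocks each = 12
# Level 2: 3 subbands of 64x64, 1 block each   = 3
# Level 3: 3 subbands of 32x32, 1 block each   = 3
# LL:      1 subband of 32x32                  = 1
blocks = code_blocks_per_tile()  # 19
```

Under this reading, one coder per code block lets a whole tile be entropy coded at once, although the 32x32 blocks would finish sooner than the 64x64 ones.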

17 NTRCC-R Implementation Results
o The JPEG2000 encoder was targeted to the V2Pro 70 FPGA on the NTRCC-R.
   - Lossless or lossy compression.
   - Data precision up to 13 bits.
o Simulation and Routing Results:
   - Slices: 30043 out of 33088, 90%
   - Block RAMs: 148 out of 328, 45%
   - Max system clock ~43 MHz without optimization.
o Hardware Throughput:
   - ~140 Mbps w/ 33 MHz clock (depending on image).
   - ~180 Mbps w/ 43 MHz clock.
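The reported numbers can be cross-checked and extrapolated with simple arithmetic. The sketch below computes device utilization and throughput per clock cycle, then scales the measured rate linearly to a 66 MHz clock. Linear scaling with clock rate is an assumption made here for illustration, not a result from the presentation.

```python
def utilization_percent(used, available):
    """Share of a device resource that the design occupies."""
    return 100.0 * used / available


def bits_per_clock(throughput_mbps, clock_mhz):
    """Image bits processed per clock cycle."""
    return throughput_mbps / clock_mhz


def scaled_throughput_mbps(throughput_mbps, clock_mhz, new_clock_mhz):
    """ASSUMPTION: throughput scales linearly with clock frequency."""
    return throughput_mbps * new_clock_mhz / clock_mhz


slice_use = round(utilization_percent(30043, 33088), 1)  # 90.8
bram_use = round(utilization_percent(148, 328), 1)       # 45.1
at_33_mhz = round(bits_per_clock(140, 33), 2)            # 4.24
at_43_mhz = round(bits_per_clock(180, 43), 2)            # 4.19
at_66_mhz = round(scaled_throughput_mbps(180, 43, 66))   # 276
```

Both measurements work out to a little over 4 bits per clock. A faster clock alone would therefore reach roughly 276 Mbps at 66 MHz, so approaching 600 Mbps per device would also need more bits processed per clock.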

18 JPEG2000 Floorplan
o The Pro 70 device is quite full!

19 Planned Improvements
o Optimize design to hit 66 MHz.
   - Un-optimized design will operate at up to 43 MHz.
   - Use of asynchronous FIFOs will allow optimal clocking of various parts of the design.
o Improve pipelining of code block loader and wavelet transform.
   - Allow "autonomous" operation of each stage, so that operations take place as soon as input data and output buffers are ready.
o Make use of additional QDR SRAMs available to each coprocessor by creating separate buffers for wavelet transform and packetizer output.
   - NTRCC has 4 QDR memories for each coprocessor.
o Arithmetic coder bypass.
   - Arithmetic coder requires > 2 cycles per bit coded, on average.
o 9/7 wavelet transform with quantization.
   - Use of the 9/7 wavelet results in better SNR and max error performance for lossy compression.
o Add RapidIO serial interface to Network Interface Chip (NIC).

20 Conclusions
o The JPEG2000 core is expected to provide a valuable option for satellite imagery systems.
   - Compression will result in a dramatic improvement in system performance.
   - Lossless compression will allow ~70% more image data to be stored and downlinked by a system.
   - Lossy compression will allow even greater improvements.
o NTRCC hardware is an excellent platform for the compressor.
   - High bandwidth interconnect and I/O (several Gbps).
   - High bandwidth external memories.
   - Excellent processing capability with the Virtex-II Pro devices.
o The sky's the limit!
   - Target rate of 600 Mbps per device appears to be a realistic goal.
   - Some improvements are left to be made to the clock rate and pipelining of the design.

