DAC50, Designer Track, 156-VB543 Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform Kazuya YOKOHARI, Koyo.

DAC50, Designer Track, 156-VB543
Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform
Kazuya YOKOHARI, Koyo NITTA, Mitsuo IKEDA, and Atsushi SHIMIZU
NTT Media Intelligence Laboratories
6/5/2013

Thank you for your introduction. I will be talking about “Parallel Design Methodology for Video Codec LSI with High-level Synthesis and FPGA-based Platform”.

Outline
Introduction
Proposed Design Methodology
Case Study: 4K HEVC Intra Codec
Evaluation
Conclusion

This is my presentation outline.

Video Codec LSI
MPEG-2 and H.264/AVC are major video coding standards. We have developed an MPEG-2 video codec LSI (VASA) and an H.264/AVC codec LSI (SARA). Developing a video codec LSI requires many simulations. Coded images should be evaluated both subjectively and objectively. Some degradations in coded images are not detected by objective evaluation, so real-time subjective evaluation is important for finding them.

[Figure labels: test data; codec LSI, VASA (MPEG-2) and SARA (H.264/AVC); bit stream (coded image). Objective evaluation examples: BD-Bitrate, SSIM, PSNR.]

First, I will introduce video codec LSIs. MPEG-2 and H.264/AVC are major international video coding standards, and we have developed an MPEG-2 video codec LSI, VASA, and an H.264/AVC codec LSI, SARA. Many kinds of test data are used in developing a video codec LSI, for example landscapes, night scenes, and sports. The development therefore requires many simulations with these various test data. After the simulations, the designer has to evaluate the results. In video codec LSI design there are two kinds of evaluation: objective and subjective. The major objective metrics are BD-Bitrate, SSIM, and PSNR. However, some degradations in coded images are not detected by these objective metrics, so it is important to evaluate the coded image in real time by subjective evaluation. If simulation becomes faster, the video codec design period becomes shorter.
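
As an illustration of the objective evaluation mentioned on this slide, the following is a minimal sketch of PSNR for 8-bit samples, using the standard definition 10 * log10(MAX^2 / MSE). The function name `psnr` and the sample values are assumptions made for this example; they do not come from the talk.

```python
import math


def psnr(original, coded, max_value=255):
    """Return the PSNR in dB between two equally sized lists of pixel values."""
    if len(original) != len(coded) or not original:
        raise ValueError("inputs must be non-empty and of the same length")
    # Mean squared error between the original and the coded samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, coded)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 8-sample row of an original image and its coded version.
original = [52, 55, 61, 66, 70, 61, 64, 73]
coded = [54, 55, 60, 66, 69, 62, 64, 71]
print(f"PSNR = {psnr(original, coded):.2f} dB")
```

A single number like this averages the error over all samples, which is one reason a locally visible degradation can go unnoticed and subjective viewing is still needed.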

Existing LSI Design Flow
A fast simulation environment is important because video codec LSI design requires many simulations. In the existing design flow, even behavioral design, the fastest simulation environment, takes 100 times real time.

[Figure: existing design flow with the simulation speed at each design level. Behavioral design (SystemC source code): x100 on CPU. Behavioral synthesis. RTL design (Verilog-RTL code): x1,000 on CPU, x100 on emulator. Logic synthesis, with a technology library. Gate-level design: x10,000 on CPU, x1,000 on emulator. P&R, targeting ASIC or FPGA. Each verification step uses a stimulus; Pass proceeds to the next step and Fail returns through the existing architecture exploration loop. Other labels: IP core; Verilog-RTL codes (already verified).]

Here is an example of the existing LSI design flow. In this flow, hardware designers use SystemC. As you know, simulation speed differs at each design level. Even behavioral design, which is the fastest simulation environment, takes 100 times real time in this flow. A fast simulation environment is very important because developing a video codec LSI requires many simulations.
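
To make the slowdown factors on this slide concrete, here is a small worked calculation. The factors are the ones shown in the figure; the 10-second clip length and the helper names are assumptions chosen only for illustration.

```python
# Slowdown factors relative to real time, as listed on the slide.
SLOWDOWN = {
    "behavioral design (CPU)": 100,
    "RTL design (emulator)": 100,
    "RTL design (CPU)": 1_000,
    "gate-level design (emulator)": 1_000,
    "gate-level design (CPU)": 10_000,
}


def simulation_seconds(clip_seconds, slowdown):
    """Wall-clock seconds needed to simulate a clip of the given length."""
    return clip_seconds * slowdown


def format_duration(seconds):
    """Format a number of seconds as hours, minutes, and seconds."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


clip_seconds = 10  # hypothetical test clip length
for level, factor in SLOWDOWN.items():
    wall = simulation_seconds(clip_seconds, factor)
    print(f"{level:30s} x{factor:<6d} {format_duration(wall)}")
```

Under this assumption, one 10-second clip already takes more than a day at the gate level on a CPU, and a codec evaluation needs many such clips.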

The Problems of The Video Codec LSI Development
The development of a video codec LSI requires many simulations. In the existing LSI design flow, simulation takes 100 times real time. To resolve these problems, we need simulation and circuit design environments that let us check and improve codec LSI performance smoothly.
Simulation environment: FPGA-based platform. Real-time simulation becomes possible using FPGAs.
Circuit design environment: high-level synthesis. Rapid prototyping becomes possible using high-level synthesis.

This slide summarizes the problems of video codec LSI development. The development requires many simulations, but in the existing LSI design flow simulation takes 100 times real time. To resolve these problems, two environments are important: one is the simulation environment, and the other is the circuit design environment. In the proposed design methodology, an FPGA-based platform is adopted as the simulation environment, which makes real-time simulation possible. The circuit scale of a video codec LSI is very large, so we need to use several FPGAs rather than a single one. High-level synthesis is adopted as the circuit design environment, which makes rapid prototyping possible. By using the FPGA-based platform together with high-level synthesis, a real-time simulation can be run immediately after a circuit is designed, and the coded image can be checked.

Video Codec Design Platform
The video codec design platform can run large-scale circuit simulation in real time using many FPGAs. It can input and output image data in real time through several SDI interfaces. The platform has many FPGAs because a product-level video codec LSI is very large; with them, it can simulate a product-level circuit.

[Figure: the platform, with FPGA1, FPGA2, FPGA3, FPGA4, a center FPGA, and SDI interfaces.]

This is the proposed video codec design platform. As you can see, it consists of several FPGAs and several I/O ports. Using this platform, we can run simulation in real time, so the algorithms of a video codec can be examined without waiting through a long simulation. The circuit scale of a video codec LSI is very large, so a single FPGA cannot simulate a large, product-level circuit. The proposed platform makes such simulation possible by using many FPGAs.

Proposed Video Codec Design Flow (1/2)
GOOD: The proposed design flow enables rapid prototyping using high-level synthesis and real-time simulation using the proposed platform.
NOT GOOD: When a single architecture exploration loop is used, feedback takes a long time because every design step has to be repeated.

[Figure: the design flow with the simulation speed at each design level (behavioral design: x100 on CPU; RTL design: x1,000 on CPU, x100 on emulator; gate-level design: x10,000 on CPU, x1,000 on emulator), showing the existing architecture exploration loop and the proposed architecture exploration loop. After P&R, simulation runs at x1 on the video codec design platform.]

Next, we introduce the proposed video codec design flow. This flow enables a consistent design from behavioral design to gate-level design using high-level synthesis. With the proposed platform, the flow enables real-time simulation using the result of gate-level design. However, the design steps are performed in order from behavioral design to gate-level design, so feedback time accumulates each time the steps are repeated. As a result, the platform's ability to run simulation in real time is not fully utilized.

Proposed Video Codec Design Flow (2/2)
To reduce the feedback time caused by repeating each design step, the circuit design is subdivided and designed in parallel. With parallel design, architecture exploration proceeds at high speed.

[Figure: the same design flow as on the previous slide, with the existing and proposed architecture exploration loops and x1 simulation on the video codec design platform.]

To reduce the feedback time caused by repeating each design step, the circuit design is subdivided into smaller functions. Functions are added gradually, so different design loops run in parallel without waiting for the synthesis result of another loop. With parallel design, architecture exploration proceeds at high speed.
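
The benefit of overlapping design loops can be illustrated with a very simple timing model. This is only a sketch: the loop durations, the start offsets, and both function names are hypothetical and are not figures from the talk.

```python
def sequential_months(durations):
    """Total time when each design loop starts only after the previous one ends."""
    return sum(durations)


def parallel_months(durations, start_offsets):
    """Time until the last loop ends when loops start at the given offsets."""
    if len(durations) != len(start_offsets):
        raise ValueError("one start offset is required per loop")
    return max(start + length for start, length in zip(start_offsets, durations))


# Hypothetical durations (in months) of three design loops.
durations = [5, 4, 6]
# Hypothetical start times: each loop starts shortly after the previous one
# instead of waiting for its synthesis result.
start_offsets = [0, 1, 2]

print("one loop at a time:", sequential_months(durations), "months")
print("overlapping loops: ", parallel_months(durations, start_offsets), "months")
```

In this model the saving comes entirely from not waiting for another loop to finish, which is the point made on the slide.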

Summary of The Proposed Design Methodology
The proposed parallel design methodology has three features.
High-level synthesis: the target circuit architecture can be changed and tuned more easily than with an RTL design methodology.
Video codec design platform: subjective image evaluation can be performed because the platform runs simulation in real time.
Parallel design: together with high-level synthesis, functions can be added in smaller units, which reduces feedback time.
Combining these three features, the effect of each function on subjective image quality can be evaluated and used for architecture exploration.

This slide summarizes the proposed parallel design methodology, which has three features. The first is high-level synthesis. Using high-level synthesis, we can change and tune the target circuit architecture more easily than with an RTL design methodology. The second is the video codec design platform. Using it, we can perform subjective evaluation, because the platform runs simulation in real time. The third is parallel design. Using parallel design, functions can be added in smaller units, which reduces feedback time. Finally, by combining these three features, the effect of each function on subjective image quality can be evaluated and used for architecture exploration.

Case Study: 4K HEVC Intra Codec
HEVC (High Efficiency Video Coding) is a next-generation video coding standard. The HEVC intra codec consists of three blocks: intra prediction, transform and quantization, and entropy coding.
Intra Prediction generates a prediction difference image from the input data and the predicted image data.
Transform and Quantization generates quantized values from the transformed difference image, and a reconstruction image from the quantized values.
Entropy Coding generates a bit stream from the quantized values.

[Figure: video coding block diagram. Input Data, Intra Prediction, Transform and Quantization, Entropy Coding, Output Stream.]

Next, we will introduce a case study using the proposed design methodology. We designed an HEVC intra codec. HEVC is a next-generation video coding standard. An intra codec has only intra prediction as its prediction function, so its compression performance is lower than that of a full-spec codec. However, its coding structure is simple, so the codec is easy to design. The HEVC intra codec consists of three main blocks: intra prediction, transform and quantization, and entropy coding.
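
The data flow between the blocks on this slide can be sketched on a one-dimensional toy block. This is not the HEVC algorithm or the design from the talk: the DC-only prediction, the floating-point DCT, the uniform quantizer, and every name below are simplifying assumptions, and the entropy coding stage is left out (it would take `levels` as its input).

```python
import math


def dc_predict(reference, size):
    """Predict every sample as the rounded mean of the reference samples."""
    dc = round(sum(reference) / len(reference))
    return [dc] * size


def dct(values):
    """Orthonormal 1-D DCT-II."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, v in enumerate(values)))
    return out


def idct(coeffs):
    """Inverse of dct() above (orthonormal 1-D DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def encode_block(block, reference, step):
    """Return (quantized levels, reconstructed block) for one toy block."""
    predicted = dc_predict(reference, len(block))
    # Intra prediction stage: prediction difference image.
    residual = [b - p for b, p in zip(block, predicted)]
    # Transform and quantization stage: quantized values.
    levels = [round(c / step) for c in dct(residual)]
    # Reconstruction image from the quantized values.
    restored = idct([level * step for level in levels])
    reconstruction = [p + round(r) for p, r in zip(predicted, restored)]
    return levels, reconstruction


# Hypothetical input samples and neighbouring reference samples.
levels, reconstruction = encode_block([104, 106, 109, 115], [100, 102, 101, 101], step=4)
print("levels:", levels)
print("reconstruction:", reconstruction)
```

The reconstruction is computed inside the encoder because later blocks are predicted from reconstructed samples rather than from the original input.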

The Specifications of the HEVC Intra Codec
[Table, values listed in slide order. Columns: STEP1; STEP2 (LOOP#1), (LOOP#2), (LOOP#3). STEP2 is the scope of this presentation.]
Intra Prediction: PU: 32x32, Prediction Mode: 4; Prediction Mode: 7; PU: 64x64, 16x16
Transform and Quantization: TU: 32x32; TU: 16x16
Entropy Coding: CU: 32x32; CU: 64x64
Base Algorithm: HM3.0; HM7.0
*CU stands for Coding Unit. *PU stands for Prediction Unit. *TU stands for Transform Unit. *HM is a reference software of HEVC.

[Figure: prediction modes. Labels: 0: Planar, 1: DC, 2, 10, 18, 26, 34.]

This slide shows the specifications of the HEVC intra codec. To simplify the structure of the intra codec, we defined the target specifications as STEP1 and STEP2. In this presentation, our main target is STEP2. CU, PU, and TU are the units of each process: CU stands for coding unit, PU stands for prediction unit, and TU stands for transform unit. HM is a reference software of HEVC. In this case study, each unit supports two or three sizes. A full-spec intra codec has 35 prediction modes. In this case study, seven modes are used in LOOP#2 of STEP2.
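
The mode numbers in the figure can be related to the full set of 35 intra prediction modes with a short sketch. In HEVC, mode 0 is Planar, mode 1 is DC, and modes 2 to 34 are angular. Treating the seven labels in the figure as the seven-mode set of LOOP#2 is an assumption made here, and the names `mode_name` and `FIGURE_MODES` are hypothetical.

```python
FULL_MODE_COUNT = 35  # number of intra prediction modes in a full-spec codec


def mode_name(mode):
    """Return a readable name for an HEVC intra prediction mode number."""
    if mode == 0:
        return "Planar"
    if mode == 1:
        return "DC"
    if 2 <= mode < FULL_MODE_COUNT:
        return f"Angular {mode}"
    raise ValueError(f"not an intra prediction mode: {mode}")


# Mode numbers labelled in the figure: Planar, DC, and every eighth
# angular mode from 2 to 34 (assumed here to be the seven-mode subset).
FIGURE_MODES = [0, 1] + list(range(2, FULL_MODE_COUNT, 8))

for mode in FIGURE_MODES:
    print(mode, mode_name(mode))
print(f"{len(FIGURE_MODES)} of {FULL_MODE_COUNT} modes")
```

Evaluating fewer candidate modes per block reduces the work in the intra prediction stage, at some cost in prediction accuracy.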

Evaluation (1/2): Circuit Performance and Design Period
The circuit performance of each expanded function is evaluated at STEP2. At STEP2, feedback data is available from the other design loops.
Main changes in each loop:
LOOP#1: version upgrade of the base algorithm of each block
LOOP#2: functional expansion of IPD
LOOP#3: functional expansion of each block

[Figure: circuit performance and design period for STEP1 and for STEP2 LOOP#1, LOOP#2, and LOOP#3. Labels: Subjective Evaluation Period; Feedback data is available.]

This slide shows the circuit performance and design period. At each design step, the hardware area or the number of cycles is large in the preliminary design. Three design loops are tried simultaneously at STEP2, and the circuit performance of each expanded function is evaluated there. The feedback data from LOOP#1 is available to LOOP#2 and LOOP#3, because subjective evaluation is available in LOOP#1 from 14 months.

Evaluation (2/2)
Using the proposed parallel design methodology, three design loops could be tried in only seven months. The number of cycles * area was reduced to 1/5 in four months after the preliminary design of LOOP#1, and to 1/4 in three months after the preliminary design of LOOP#2.

[Figure: hardware design efficiency for STEP1 and STEP2. Labels: 90% down; LOOP#1: 80% down (four months); LOOP#2: 75% down (three months).]

This slide shows hardware design efficiency. Using the proposed parallel design methodology, three design loops can be tried in only seven months. The number of cycles * area was reduced to 1/5 in four months after the preliminary design of LOOP#1, and to 1/4 in three months after the preliminary design of LOOP#2.
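
The fractions in the text and the percentages in the figure describe the same reductions, as this short check shows. The function names are introduced only for this example.

```python
from fractions import Fraction


def percent_reduction(remaining):
    """Percentage reduction when a value shrinks to `remaining` of its start."""
    return float((1 - remaining) * 100)


def remaining_fraction(percent_down):
    """Fraction of the starting value left after a given percentage reduction."""
    return 1 - Fraction(percent_down, 100)


print(percent_reduction(Fraction(1, 5)))  # LOOP#1: reduced to 1/5
print(percent_reduction(Fraction(1, 4)))  # LOOP#2: reduced to 1/4
print(remaining_fraction(90))             # the "90% down" label
```

So "reduced to 1/5" matches "80% down", "reduced to 1/4" matches "75% down", and the "90% down" label corresponds to a reduction to 1/10.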

Conclusion
We proposed a new design methodology for video codec LSI.
Using the proposed design methodology, we can reduce feedback time, run simulation in real time, and evaluate the coded image in real time.
Using the proposed design methodology, three design loops could be tried in only seven months.
Using the proposed design methodology, the number of cycles * area was reduced to 1/5 in four months after the preliminary design of LOOP#1, and to 1/4 in three months after the preliminary design of LOOP#2.
To realize an HEVC codec, we need to add or expand some functional tools while checking the subjective evaluation of these tools.

I will now conclude my presentation with the points on this slide. Thank you for your kind attention. If you are interested in this presentation, please give me some comments.
