Instructor: Dr. Phillip Jones

CPRE 583 Reconfigurable Computing
Lecture 10: Wed 9/24/2010 (High-level Acceleration Approaches)
Instructor: Dr. Phillip Jones
Reconfigurable Computing Laboratory
Iowa State University
Ames, Iowa, USA

Announcements/Reminders
HW2: Due Wed 10/6
  Problem 2 will have a separate deadline (to be announced)
MP2: Due Fri 10/1 (you can work in pairs)
  Make sure to read the README file in the MP2 distribution
  Contains info on how to fix a Gigabit core licensing issue ISE has
Start thinking of class projects and forming teams
  Submit teams and project ideas: Mon 10/11, midnight
  Project proposal presentations: Wed 10/20

Project Expectations
Working system
Write-up that can potentially be submitted to a conference
  Will use DAC format as the write-up guideline
15-20 minute PowerPoint presentation
DAC (Design Automation Conference)
  Conference papers due: 5pm (MT) Thur 11/18/2010
  Student Design Contest due: 5pm (MT) Wed 11/24/2010 (cash prizes!)

Project Ideas: Relevant Conferences
FPL, FPT, FCCM, FPGA, DAC, ICCAD, ReConFig, RTSS, RTAS, ISCA, MICRO, Supercomputing, HPCA, IPDPS

Initial Project Proposal Slides (5-10 slides)
Project team list: Name, Responsibility (who is project leader)
Project idea
Motivation (why is this interesting, useful)
What will be the end result
High-level picture of final product
High-level plan
  Break project into milestones
  Provide initial schedule: I would initially schedule aggressively to have the project complete by Thanksgiving. Issues will pop up to cause the schedule to slip.
System block diagrams
High-level algorithms (if any)
Concerns
  Implementation
  Conceptual
Research papers related to your project idea

Weekly Project Updates
The current state of your project write-up
  Even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation sections
The current state of your final presentation
  Your initial project proposal presentation (due Wed 10/20) should make a good starting point for your final presentation
What things are working and not working
What roadblocks you are running into

Projects: Target Timeline
Teams formed and idea: Mon 10/11
  Project idea in PowerPoint, 3-5 slides
    Motivation (why is this interesting, useful)
    What will be the end result
    High-level picture of final product
  Project team list: Name, Responsibility
High-level plan/proposal: Wed 10/20
  PowerPoint, 5-10 slides
    System block diagrams
    High-level algorithms (if any)
    Concerns
      Implementation
      Conceptual
    Related research papers (if any)

Projects: Target Timeline
Work on projects: 10/ /8
  Weekly update reports
  More information on updates will be given
Presentations: Last Wed/Fri of class
  Present / demo what is done at this point
  15-20 minutes (depends on number of projects)
Final write-up and software/hardware turned in: Day of final (TBD)

Common Questions

Overview
First 15 minutes of Google FPGA lecture
How to run gprof
Discuss some high-level approaches for accelerating applications

What You Should Learn
Start to get a feel for approaches for accelerating applications.

Why Use Customized Hardware?
Great talk about the benefits of Heterogeneous Computing

Profiling Applications
Finding bottlenecks
Profiling tools
  gprof
  Valgrind

Pipelining
How many ns to process 100 input vectors? Assume each LUT has a 1 ns delay.
[Figure: circuit taking input vector <A,B,C,D> through four 4-LUTs and four DFFs to a single output]
How many ns to process 100 input vectors? Assume a 1 ns clock.
[Figure: 4-LUT with inputs A, B, C, D followed by a DFF; annotation: 1 DFF delay per output]
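The two questions above can be worked through with a small timing model. This is only a sketch under stated assumptions: the four 4-LUTs are taken to form a chain of four stages, the unpipelined circuit must finish one vector before accepting the next, and the pipelined circuit has a register after every stage. The function names are hypothetical and not from the lecture.

```python
def unpipelined_ns(num_vectors, stages, stage_delay_ns):
    """Each vector must pass through every stage before the next one starts."""
    return num_vectors * stages * stage_delay_ns


def pipelined_ns(num_vectors, stages, clock_ns):
    """The first result appears after `stages` clocks (pipeline fill),
    then one new result appears on every following clock."""
    return (stages + num_vectors - 1) * clock_ns


# Assumed setup: 4 LUT stages, 1 ns per LUT, 1 ns clock, 100 input vectors.
print("unpipelined:", unpipelined_ns(100, 4, 1), "ns")
print("pipelined:  ", pipelined_ns(100, 4, 1), "ns")
```

Under these assumptions the unpipelined chain needs 400 ns and the pipelined one 103 ns, so the gain approaches the number of stages as the input stream gets longer.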

Pipelining (Systolic Arrays)
Dynamic Programming
1. Start with base case: lower left corner
2. Formula for computing numbering cells
3. Final result in upper right corner

1 3 6
1 2 3
1 1 1

How many ns to process if CPU can process one cell per clock (1 ns clock)?
How many ns to process if FPGA can obtain maximum parallelism each clock (1 ns clock)?
What speedup would an FPGA obtain (assuming maximum parallelism) for a 100x100 matrix? (Hint: find a formula for an NxN matrix.)
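One way to approach these questions is sketched below. The recurrence is an assumption inferred from the grid values (each cell is the sum of the cell below it and the cell to its left, with 1 in the lower left corner); the slides do not state it explicitly. Cells on the same anti-diagonal do not depend on each other, so hardware with enough processing elements could update a whole anti-diagonal in one clock. The function names are hypothetical.

```python
def fill_wavefront(n):
    """Fill an n x n grid one anti-diagonal at a time.

    Row 0 is the bottom row and column 0 is the left column, so the base
    case sits at grid[0][0] and the final result at grid[n-1][n-1].
    Returns the grid and the number of wavefront steps (parallel clocks).
    """
    grid = [[0] * n for _ in range(n)]
    steps = 0
    for k in range(2 * n - 1):                # k = row + col of the diagonal
        for r in range(max(0, k - n + 1), min(k, n - 1) + 1):
            c = k - r
            if k == 0:
                grid[r][c] = 1                # base case, lower left corner
            else:
                below = grid[r - 1][c] if r > 0 else 0
                left = grid[r][c - 1] if c > 0 else 0
                grid[r][c] = below + left     # assumed recurrence
        steps += 1                            # one clock per anti-diagonal
    return grid, steps


def cpu_cycles(n):
    return n * n            # one cell per clock


def fpga_cycles(n):
    return 2 * n - 1        # one anti-diagonal per clock


def speedup(n):
    return cpu_cycles(n) / fpga_cycles(n)


grid, steps = fill_wavefront(3)
for row in reversed(grid):                    # print with the top row first
    print(row)
print("3x3: CPU", cpu_cycles(3), "ns, FPGA", steps, "ns")
print("100x100 speedup:", round(speedup(100), 2))
```

Counting every cell (including the base case) as one update, an NxN grid takes N^2 clocks sequentially and 2N-1 clocks as a wavefront, which gives a speedup of 10000/199, roughly 50x, for N = 100.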

Dr. James Moscola (Example)
[Figure: example RNA model for the residues c g a at positions 1, 2, 3. Nodes: ROOT0, MATP1, MATL2, END3. States: S0, IL1, IR2, MP3, ML4, MR5, D6, IL7, IR8, ML9, D10, IL11, E12.]

Example RNA Model
[Figure: the same model shown as a table indexed by positions 1, 2, 3 for the nodes ROOT0, MATP1, MATL2, END3, next to the state diagram with states S0 through E12 and residues c g a.]

Baseline Architecture Pipeline
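[Figure: residue pipeline. The input residues (u g g c g a c a c c c) stream past pipeline stages for the nodes END3, MATL2, MATP1, ROOT0, with states ordered E12, IL11, D10, ML9, IR8, IL7, D6, MR5, ML4, MP3, IR2, IL1, S0.]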

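Processing Elements
[Figure: processing element for state ML4. The inputs IL7,3,2, IR8,3,2, ML9,3,2, and D10,3,2 are added to the transition scores ML4_t(7), ML4_t(8), ML4_t(9), and ML4_t(10) respectively. The emission score ML4_e(A), ML4_e(C), ML4_e(G), or ML4_e(U), selected by the input residue xi, is then added to produce ML4,3,3 = .22. A score table indexed 1 to 3 (one axis labeled j) holds the values .40, -INF, .22, .72, .30, .44.]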
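The datapath in the figure can be mimicked in a few lines. This is a rough sketch, not the lecture's implementation: it assumes the processing element keeps the maximum of the four child-plus-transition sums (the usual choice for this style of alignment scoring, but not stated on the slide) before adding the emission score. All numeric values and the function name are hypothetical.

```python
NEG_INF = float("-inf")


def pe_score(child_scores, transition_scores, emission_scores, residue):
    """Score for one state at one table cell.

    child_scores:      scores already computed for the child states
    transition_scores: transition score into each child, same order
    emission_scores:   emission score for each residue letter
    residue:           the input residue at this position
    """
    # One adder per child state, then a maximum over the sums (assumed).
    best = max(c + t for c, t in zip(child_scores, transition_scores))
    # Final adder: emission score selected by the input residue.
    return best + emission_scores[residue]


# Hypothetical values for the four children of ML4 (IL7, IR8, ML9, D10).
children = [0.1, NEG_INF, 0.3, 0.2]
transitions = [-0.5, -0.1, -0.2, -0.4]
emissions = {"A": 0.5, "C": -0.3, "G": -0.2, "U": -0.6}

print(pe_score(children, transitions, emissions, "A"))
```

A child score of -INF simply drops out of the maximum, which matches the -INF entry that appears in the score table.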

Baseline Results for Example Model
Comparison to Infernal software
  Infernal run on Intel Xeon 2.8 GHz
  Baseline architecture run on Xilinx Virtex-II 4000
    Occupied 88% of logic resources
    Run at 100 MHz
Input database of 100 million residues
  Bulk of time spent on I/O (41.434 s)

Expected Speedup on Larger Models
Columns: Name, Num PEs, Pipeline Width, Pipeline Depth, Latency (ns), HW Processing Time (seconds), Total Time with measured I/O (seconds), Infernal Time (seconds), Infernal Time (QDB) (seconds), Expected Speedup over Infernal, Expected Speedup over Infernal (w/QDB)
RF00001 39492 195 19500 349492 128443 8236 3027
RF00016 43256 282 28200 336000 188521 7918 4443
RF00034 38772 187 18700 314836 87520 7419 2062
RF00041 44509 206 20600 388156 118692 9147 2797
Example 81 26 6 600 1039 868 25 20
Speedup estimated ... using 100 MHz clock for processing database of 100 million residues
Speedups range from 500x to over 13,000x
Larger models with more parallelism exhibit greater speedups
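The listed speedups can be roughly reproduced with a simple timing model. The sketch below rests on several assumptions: the last six values of each row are read as pipeline depth, latency (ns), Infernal time, Infernal time (QDB), and the two expected speedups (the rows carry fewer values than there are columns); the hardware consumes one residue per clock; and the 41.434 s I/O time from the baseline results applies to every model. The function names are hypothetical.

```python
IO_SECONDS = 41.434     # measured I/O time quoted in the baseline results
CLOCK_HZ = 100e6        # 100 MHz clock
NUM_RESIDUES = 100e6    # 100 million residues in the input database


def total_seconds(latency_ns):
    """I/O time + one residue per clock + time to fill the pipeline."""
    streaming = NUM_RESIDUES / CLOCK_HZ
    return IO_SECONDS + streaming + latency_ns * 1e-9


def expected_speedup(software_seconds, latency_ns):
    return software_seconds / total_seconds(latency_ns)


# name: (latency_ns, infernal_s, infernal_qdb_s, listed, listed_qdb)
ROWS = {
    "RF00001": (19500, 349492, 128443, 8236, 3027),
    "RF00016": (28200, 336000, 188521, 7918, 4443),
    "RF00034": (18700, 314836, 87520, 7419, 2062),
    "RF00041": (20600, 388156, 118692, 9147, 2797),
    "Example": (600, 1039, 868, 25, 20),
}

for name, (lat, inf_s, qdb_s, listed, listed_qdb) in ROWS.items():
    print(name,
          round(expected_speedup(inf_s, lat), 1), "vs listed", listed, "|",
          round(expected_speedup(qdb_s, lat), 1), "vs listed", listed_qdb)
```

Under this model the total hardware time is about 42.4 s for every row, almost all of it I/O, so the speedup tracks the software run time far more closely than it tracks pipeline depth.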

Distributed Memory
[Figure: an ALU with a cache; a PE with four BRAMs]

Next Class
Models of Computation (Design Patterns)

Questions/Comments/Concerns
Write down:
  Main point of lecture
  One thing that's still not quite clear
  OR, if everything is clear, an example of how to apply something from lecture
