Using Network Processors in Genomics Herbert Bos * † Kaiming Huang * * Leiden Universiteit, Netherlands † Vrije Universiteit,


1 Using Network Processors in Genomics
Herbert Bos * †, Kaiming Huang *
{herbertb,khuang}@liacs.nl
* Leiden Universiteit, Netherlands
† Vrije Universiteit, Netherlands
http://www.liacs.nl/~herbertb/projects/biocomp/
H. Bos – Leiden University, 13/02/2004

2 Case study: BLAST
● search nucleotide/protein database for query
● BLAST discovers similarity rather than exact match
● two main phases:
  1. scoring (registering where query and DNA DB match)
  2. alignment (dynamic programming)
● only the first phase on NPUs

3-6 Window matching
[Figure-only slides; no text recovered.]

7 Window matching
● naïve approach: roughly W*N*M comparisons
● does not scale
● string search algorithms: Aho-Corasick
  – all windows matched at the same time
  – shifting genome one nucleotide at a time
  – matching algorithm transformed into a DFA
● DFA may be quite large
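To make the cost of the naïve approach concrete, here is a minimal sketch that scans every genome position against every window and counts character comparisons. The slide does not define W, N and M; the sketch assumes W is the window size, N the genome length and M the number of windows. The function name `naive_window_search` is hypothetical.

```python
def naive_window_search(genome: str, windows: list[str]) -> tuple[list[tuple[int, str]], int]:
    """Brute-force scan. Returns (matches, comparisons).

    Assumed reading of the slide's W*N*M bound:
    W = window size, N = len(genome), M = len(windows).
    """
    matches = []
    comparisons = 0
    for pos in range(len(genome)):
        for w in windows:
            if pos + len(w) > len(genome):
                continue  # window would run past the end of the genome
            matched = True
            for k, ch in enumerate(w):
                comparisons += 1
                if genome[pos + k] != ch:
                    matched = False
                    break
            if matched:
                matches.append((pos, w))
    return matches, comparisons


# Toy input taken from the Aho-Corasick example slides.
matches, comparisons = naive_window_search(
    "tacgcga", ["acg", "cgc", "gcc", "ccg", "cga"]
)
print(matches, comparisons)
```

The comparison count never exceeds W*N*M, but it grows with both the genome and the number of windows, which is why the slides move on to an automaton that matches all windows in a single pass.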

8 Aho-Corasick
● Alphabet: acgt
● Window size: 3
● Query: acgccga
● Windows: {acg,cgc,gcc,ccg,cga}
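The window set on this slide can be reproduced by sliding a fixed-size window over the query one nucleotide at a time. A minimal sketch (the helper name `windows_of` is hypothetical):

```python
def windows_of(query: str, size: int) -> list[str]:
    """Return every substring of length `size`, in order of occurrence."""
    return [query[i:i + size] for i in range(len(query) - size + 1)]


windows = windows_of("acgccga", 3)
print(windows)
```

A query of length L yields L - size + 1 windows; duplicates, if any, would need to be removed before building the automaton.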

9-11 Aho-Corasick
● Alphabet: acgt
● Window size: 3
● Query: acgccga
● Windows: {acg,cgc,gcc,ccg,cga}

[Figure, built up over three slides: state diagram with states 0 to 12.
Paths: 0 -a-> 1 -c-> 2 -g-> 3; 0 -c-> 4 -g-> 5 -c-> 6; 5 -a-> 12;
4 -c-> 10 -g-> 11; 0 -g-> 7 -c-> 8 -c-> 9.
Edge labels also include t (presumably the self-loop on state 0).]

Failure function:
s     1  2  3  4  5  6  7  8  9   10  11  12
f(s)  0  4  5  0  7  8  0  4  10  4   5   1

Output (slides 10-11):
state   3    6    9    11   12
output  acg  cgc  gcc  ccg  cga

Example input (slide 11): tacgcga
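The automaton on these slides can be rebuilt with the textbook Aho-Corasick construction. The sketch below is not the IXPBlast code; it is a small Python illustration. Inserting the windows in the order listed happens to reproduce the slides' state numbering, so the computed failure function can be compared with the f(s) row. The function names `build_automaton` and `search` are hypothetical.

```python
from collections import deque


def build_automaton(windows):
    """Build goto, failure and output tables for a set of windows."""
    goto = [{}]    # goto[state][char] -> next state
    output = [[]]  # output[state] -> windows that end in this state
    for w in windows:
        state = 0
        for ch in w:
            if ch not in goto[state]:
                goto.append({})
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(w)

    # Breadth-first pass: depth-1 states fail to the root (state 0).
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit outputs of the failure state (suffix matches).
            output[t] = output[t] + output[fail[t]]
    return goto, fail, output


def search(text, goto, fail, output):
    """Return (start_position, window) for every match in text."""
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]  # follow failure links on a mismatch
        state = goto[state].get(ch, 0)
        for w in output[state]:
            hits.append((i - len(w) + 1, w))
    return hits


WINDOWS = ["acg", "cgc", "gcc", "ccg", "cga"]
goto, fail, output = build_automaton(WINDOWS)
hits = search("tacgcga", goto, fail, output)
print(fail[1:])  # failure function for states 1..12
print(hits)
```

On the example input `tacgcga` the automaton passes through states 0, 1, 2, 3, 6, 5 and 12, reporting acg, cgc and cga along the way.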

12-17 IXPBlast Architecture
[Figure, repeated across six slides. Labels: Control Processor, Pentium, PCI, PCI Bus, NPU (IXP1200), StrongARM, Microengines, ME, DRAM, SRAM, scratch, Gbps ports. Slides 15-17 also show the Aho-Corasick state diagram.]

18 IXPBlast: packet handling
● packets read and processed in batches of 100,000
● “spilling” must be taken into account
● currently no feedback
[Figure: positions numbered 0 to 31]
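The slide does not say how spilling is handled. One plausible reading is that a window can straddle the boundary between two batches, so some data has to be carried over. The sketch below illustrates that idea at toy scale by keeping the last W-1 characters of each batch; the function name `scan_in_batches` and the batch contents are assumptions, not the IXPBlast implementation.

```python
def scan_in_batches(batches, windows, size):
    """Match fixed-size windows over a stream delivered in batches.

    The last size-1 characters of each batch (the "spill") are prepended
    to the next batch so that boundary-straddling matches are not lost.
    """
    wanted = set(windows)
    spill = ""
    consumed = 0  # characters seen in earlier batches
    hits = []
    for batch in batches:
        data = spill + batch
        base = consumed - len(spill)  # global offset of data[0]
        for i in range(len(data) - size + 1):
            if data[i:i + size] in wanted:
                hits.append((base + i, data[i:i + size]))
        consumed += len(batch)
        # A spill of size-1 characters is too short to hold a whole
        # window, so no match is ever reported twice.
        spill = data[-(size - 1):] if size > 1 else ""
    return hits


WINDOWS = ["acg", "cgc", "gcc", "ccg", "cga"]
batched = scan_in_batches(["tac", "gcg", "a"], WINDOWS, 3)
unbatched = scan_in_batches(["tacgcga"], WINDOWS, 3)
print(batched)
```

With an automaton-based matcher, the same effect can be had by carrying the current state across batches instead of re-scanning the spilled characters.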

19 Results
● 232 MHz IXP1200 ~ 1.8 GHz Pentium-4
● 1611-nucleotide query (MyD88)
● 1.4 GB genome (Zebrafish)
  – IXP1200: 90 sec with DFA
  – IXP1200: 129 sec with “trie”
  – P4: 132 sec with “trie”
● number of matches: 524856
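The slides do not spell out how the DFA and “trie” variants differ. A common distinction, assumed here, is that the trie variant follows failure links at run time, while the DFA variant precomputes the next state for every (state, symbol) pair so that each nucleotide costs exactly one table lookup. The sketch below builds such a table for the toy example; the function names `build_dfa` and `run_dfa` are hypothetical.

```python
from collections import deque

ALPHABET = "acgt"


def build_dfa(windows):
    """Return (delta, accept): a dense transition table and per-state outputs."""
    goto, accept = [{}], [[]]
    for w in windows:  # build the trie first
        s = 0
        for ch in w:
            if ch not in goto[s]:
                goto.append({})
                accept.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        accept[s].append(w)

    n = len(goto)
    fail = [0] * n
    delta = [[0] * len(ALPHABET) for _ in range(n)]
    for k, ch in enumerate(ALPHABET):
        delta[0][k] = goto[0].get(ch, 0)
    queue = deque(goto[0].values())
    while queue:  # breadth-first, so shallower rows are already complete
        s = queue.popleft()
        for k, ch in enumerate(ALPHABET):
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = delta[fail[s]][k]
                accept[t] = accept[t] + accept[fail[t]]
                delta[s][k] = t
                queue.append(t)
            else:
                # Missing edge: reuse the failure state's transition.
                delta[s][k] = delta[fail[s]][k]
    return delta, accept


def run_dfa(text, delta, accept):
    """One table lookup per character; unknown symbols reset to state 0."""
    index = {ch: k for k, ch in enumerate(ALPHABET)}
    state, hits = 0, []
    for i, ch in enumerate(text):
        k = index.get(ch)
        state = delta[state][k] if k is not None else 0
        for w in accept[state]:
            hits.append((i - len(w) + 1, w))
    return hits


delta, accept = build_dfa(["acg", "cgc", "gcc", "ccg", "cga"])
dfa_hits = run_dfa("tacgcga", delta, accept)
print(len(delta) * len(ALPHABET), dfa_hits)
```

The table holds one entry per state per alphabet symbol (13 × 4 = 52 here), which illustrates why a DFA for a long query may be quite large.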

20 Results

Query size   DNA DB size   Impl.         Performance
1611         1.4 GB        P4            132 sec
1611         1.4 GB        IXP1200       129 sec
1611         1.4 GB        IXP1200 DFA   90 sec
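The timings in the table can be turned into rough throughput and speedup figures. This is simple arithmetic on the reported numbers, assuming “1.4 GB” means 1.4 × 10^9 bytes; the names below are hypothetical.

```python
GENOME_BYTES = 1.4e9  # "1.4 GB", assumed to be decimal gigabytes
TIMES_SEC = {"P4 trie": 132, "IXP1200 trie": 129, "IXP1200 DFA": 90}


def throughput_mb_per_s(seconds: float) -> float:
    """Genome bytes scanned per second, in MB/s."""
    return GENOME_BYTES / seconds / 1e6


def speedup_over_p4(seconds: float) -> float:
    """Speedup relative to the Pentium-4 run."""
    return TIMES_SEC["P4 trie"] / seconds


for impl, secs in TIMES_SEC.items():
    print(f"{impl:13s} {throughput_mb_per_s(secs):5.1f} MB/s  "
          f"speedup vs P4: {speedup_over_p4(secs):.2f}x")
```

Under that assumption the DFA variant scans roughly 15.6 MB/s, about 1.47 times faster than the Pentium-4 run.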

21 Conclusions
● NPUs are useful in other application domains
● Newer hardware is expected to perform much better
● “Throughput processors”
● Adapting our current approach to use BLAST tricks/heuristics

22 Network processors
● geared for high throughput
● used exclusively in network systems
● example: intrusion detection
● similar to looking for genes in genomes
● differences
[Image: Radisys IXP1200 board]

23 Application domain: “Genomics”
● example: search genome for occurrence of “patterns”
● similar problems to IDS, poor performance on GPP → cannot exploit parallelism
  – throughput-driven
  – how about FPGAs?
  – how about clusters?
● NPU
  – easier to program than FPGAs
  – cheaper than cluster computing
  – “on the desktop” → IP never leaves the room

