
An Introduction to CAMERA and Underlying Technologies Philip Papadopoulos University of California, San Diego San Diego Supercomputer Center California.


1 An Introduction to CAMERA and Underlying Technologies Philip Papadopoulos University of California, San Diego San Diego Supercomputer Center California Institute of Telecommunications and Information Technology (Calit2)

2 PI: Larry Smarr. Announced 17 Jan. Public Release 13 March 2007. $24.5M Over Seven Years

3 DNA Basics for Non-Biologists Nucleotide bases of DNA –A, C, G, T (Adenine, Cytosine, Guanine, Thymine) –A Sequence of Bases Forms One Side of a DNA Strand –Complementary Bases Form the Other Side of DNA –A matches T (pair) –C matches G (pair) During cell replication, DNA is unzipped. Each side then serves as a template from which its complementary side is rebuilt. The human genome is about 3 billion base pairs on 23 chromosomes (most cells carry them as 23 pairs)
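The pairing rule on this slide (A with T, C with G) can be sketched in a few lines of Python. This is only an illustration; the function names `complement` and `reverse_complement` are chosen here and do not come from the presentation.

```python
# Base-pairing rule from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the complementary bases, position by position."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand):
    """Return the opposite strand as it is conventionally written,
    that is, the complement read in the reverse direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("GGGAAACCC"))
    print(reverse_complement("GGGAAACCC"))
```

Applying `complement` twice returns the original strand, which is why either half of an unzipped molecule is enough to rebuild the whole.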

4 Bases to Amino Acids Triplets of nucleotide bases are called codons and define amino acids. –Amino acids are the basic building blocks of proteins –There are 20 amino acids, but 4^3 = 64 nucleotide combinations –Many amino acids have multiple codons –Special codons (called start and stop codons) mark where translation into protein begins and ends. Reading Frames of: GGGAAACCC –This raw sequence could be read as –GGGAAACCC (GGG AAA CCC) (Glycine, Lysine, Proline) –GGAAACCC (GGA AAC) (Glycine, Asparagine) –GAAACCC (GAA ACC) (Glutamic Acid, Threonine)
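The three reading frames listed on this slide can be reproduced with a small Python sketch. It builds the standard genetic code table from the usual compact 64-letter string (bases ordered T, C, A, G; `*` marks a stop codon) and uses one-letter amino acid codes. The names `CODON_TABLE` and `translate` are illustrative and not part of the presentation.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence, frame=0):
    """Translate a DNA sequence starting at offset `frame` (0, 1 or 2).

    Trailing bases that do not fill a whole codon are ignored.
    """
    sequence = sequence.upper()
    codons = (sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    for frame in range(3):
        print(frame, translate("GGGAAACCC", frame))
```

In one-letter codes, G is Glycine, K is Lysine, P is Proline, N is Asparagine, E is Glutamic Acid and T is Threonine, so the three frames correspond to the three readings on the slide.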

5 Sequencing Tidbits The Institute for Genomic Research (TIGR) sequenced the genome of the bacterium Haemophilus influenzae in 1995 using shotgun sequencing –1.8 Million Base Pairs (Human: 3 Billion) Sequencing does NOT tell you what function a particular gene plays. It is believed that only ~1.5% of the human genome codes for expressed characteristics –The non-coding portions contain our genetic history –It is unknown what function the rest of our DNA plays

6 Most of Evolutionary Time Was in the Microbial World You Are Here Source: Carl Woese, et al Tree of Life Derived from 16S rRNA Sequences

7 Marine Genome Sequencing Project – Measuring the Genetic Diversity of Ocean Microbes Sorcerer II Data Will Double Number of Proteins in GenBank! Need Ocean Data

8 Some CAMERA Goals Provide an infrastructure where scientists from around the world can perform analysis on genetic communities –Global Ocean Sampling (GOS) is the initial large data set –~8.5 Billion base pairs of raw reads –Metadata is available for samples –Salinity, Temperature, Geographic Location, Water Depth, Time of Day … –Other metadata will be correlated with samples (e.g. MODIS Satellite) Allow others to search and compare input sequences against CAMERA data. Overall, provide a resource dedicated to metagenomics –Support new datasets –Support new analysis tools and web services

9 Global Ocean Sampling (GOS) Sequences are Largely Bacterial Source: Shibu Yooseph, et al. (PLOS Biology in press 2006) ~3 Million Previously Known Sequences ~5.6 Million GOS Sequences

10 Reason for CAMERA The Global Ocean Sampling (GOS) data set is a huge influx of sequence data. Factors that interrelate microbes and microbial communities are not well known. Significant analysis requires large resources –All-to-all comparisons –Integration of other environmental (meta) data (weather, temperature, salinity, …) is essential Raw sequence data sets are mid-sized –Current set of GOS Raw Reads is about 100GB (FASTA Files)
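To see why all-to-all comparison needs large resources, a rough count of sequence pairs helps. The sketch below assumes each unordered pair is compared once and takes the ~5.6 million GOS sequences quoted on slide 9 as the input size. Both choices are illustrative assumptions, not figures for any actual CAMERA job.

```python
def pair_count(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    gos_sequences = 5_600_000  # "~5.6 Million GOS Sequences" (slide 9)
    pairs = pair_count(gos_sequences)
    print(f"{pairs:,} pairwise comparisons")
    # Doubling the number of sequences roughly quadruples the work.
    print(pair_count(2 * gos_sequences) / pairs)
```

The data itself is mid-sized (about 100GB), but the number of comparisons grows with the square of the sequence count, which is what pushes this kind of analysis onto hundreds or thousands of CPUs.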

11 Calit2 CAMERA Production Compute and Storage Complex is On-Line 512 Processors ~5 Teraflops ~ 200 Terabytes Storage

12 User Map – 03 May 2007 Site in production on 13 March 2007 More than 500 Registered users from around the globe (~10 new users/day)

13 Calit2's Direct Access Core Architecture: CAMERA's Metagenomics Server Complex [Diagram] Data sources: Sargasso Sea Data; Sorcerer II Expedition (GOS); JGI Community Sequencing Project; Moore Marine Microbial Project; NASA and NOAA Satellite Data; Community Microbial Metagenomics Data. Server complex: Flat File Server Farm, Database Farm and Dedicated Compute Farm (100s of CPUs) on a 10 GigE Fabric. Access paths: Traditional User (Request/Response through the Web Portal + Web Services); Web (other service); Direct Access Lambda Connections to a Local Cluster in the user's Local Environment; TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10000s of CPUs). Source: Phil Papadopoulos, SDSC, Calit2

14 Calit2 CAMERA Production Compute and Storage Complex is On-Line Compute Nodes 1 and 10 Gbit/s Switching 200 TB File Storage 10 Gbit/s Network Web, Application, DB Servers

15 Global Elements Data location –Storage Resource Broker metadata catalog Data-type aggregation, cross-correlation, integration –BIRN Data Mediator Identity Management –Use Grid Security Infrastructure (GSI) Public Key System –Integrated Grid Account Management Architecture (GAMA) from SDSC for ease-of-use and Single Sign On Portal Services –Based on GridSphere –Small Dedicated Compute Cluster (32 nodes)

16 Cluster Nodes and File Servers Logical Layout of Servers Web Server Portal Server (Tomcat) Single Sign-on Server Postgres Database GAMA Server Blast Master (Jboss) Cluster Frontend Single Sign On Layer Public Net Private Net

17 An Incomplete List of Software Components –Postgres Database –Apache Tomcat –JBoss Servlet Container –Google Web Toolkit –Sun Grid Engine –GAMA (Grid Account Management Architecture)/GSI from Globus –OPAL (Grid/Web Services Wrapper) –GridSphere Portlet Container –CAMERA Registration Portal –Venter Application Portal –NCBI Blast, MPIBlast, ClustalW, MrBayes, CDHit, and a host of other Bio Software –Ergatis Workflow Engine –JForum –Drupal All Integrated with Rocks … Single Person Deployment

18 OptIPortal: Another Rocks Cluster. Termination Device for the OptIPuter Global Backplane. 20 Dual CPU Nodes, Monitors, ~$50,000. 1/4 Teraflop, 5 Terabyte Storage, 45 Mega Pixels -- Nice PC! Scalable Adaptive Graphics Environment (SAGE), Jason Leigh, EVL-UIC. Source: Phil Papadopoulos, SDSC, Calit2

19 Use of OptIPortal to Interactively View Microbial Genome Source: Raj Singh, UCSD Acidobacteria bacterium Ellin345 (NCBI) Soil Bacterium 5.6 Mb 15,000 x 15,000 Pixels

20 Use of OptIPortal to Interactively View Microbial Genome Source: Raj Singh, UCSD Acidobacteria bacterium Ellin345 (NCBI) Soil Bacterium 5.6 Mb 15,000 x 15,000 Pixels

21 A Look at Networking Introduction to Quartzite An Experimental Network

22 Sunlight (10 Gigabit) Campus/WAN

23 Using a Lambda Network for CAMERA Many community databases (Protein Data Bank (PDB), GenBank, SwissProt) support only web or web services interfaces –New analysis/programs need access to raw databases/files –Usually, groups make a point-in-time copy of the database –We call this a data fork –Updates are not processed –Papers are published with point-in-time data that is out of date by months or years CAMERA Direct Connect will allow us to provide a high-speed connection to the backend servers –Try to eliminate data forking –Copies of CAMERA data are inevitable –Need mechanisms that allow others to keep their copies in sync with CAMERA
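The slide does not say which synchronization mechanism CAMERA intended. As one hypothetical illustration of how a data fork can be detected, the Python sketch below compares checksum manifests: the upstream site publishes a digest for each file, and a mirror lists the files whose local digest is missing or different. The function names and file names are invented for the example.

```python
import hashlib


def manifest(files):
    """Map each file name to the SHA-256 digest of its contents.

    `files` is a dict of name -> bytes, standing in for files on disk.
    """
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in files.items()}


def out_of_date(local, upstream):
    """Names in the upstream manifest that are missing or stale locally."""
    return sorted(name for name, digest in upstream.items()
                  if local.get(name) != digest)


if __name__ == "__main__":
    upstream = manifest({"reads_a.fasta": b">r1\nACGT\n",
                         "reads_b.fasta": b">r2\nGGGA\n",
                         "reads_c.fasta": b">r3\nTTTT\n"})
    mirror = manifest({"reads_a.fasta": b">r1\nACGT\n",
                       "reads_b.fasta": b">r2\nGGG\n"})
    print(out_of_date(mirror, upstream))
```

Only the files reported as out of date need to be fetched again, which keeps a mirror current without recopying the whole data set.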

24 UCSD Quartzite Core at Completion (Year 5 of OptIPuter) Funded 15 Sep 2004 Physical HW to Enable OptIPuter and Other Campus Networking Research Hybrid Network Instrument Reconfigurable Network and Endpoints

25 4x4 Wavelength Cross-Connect (AT&T Labs, October 2007) All integrated optics (except optical amplifiers) –4 1x4 WSS modules –4 4x1 passive optical combiners 4 x 40 x 40Gbps = 6.4Tbps switching capacity –currently using central 8 [Photo: 4x4 WXC rack showing WSSs, combiners and optical amps]
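The capacity figure on this slide is a straight product. The sketch below reads the three factors as 4 ports, 40 wavelengths per port and 40 Gbps per wavelength. That reading is an assumption, since the slide gives only the numbers.

```python
def switching_capacity_tbps(ports, wavelengths_per_port, gbps_per_wavelength):
    """Aggregate switching capacity in Tbps (1 Tbps = 1000 Gbps)."""
    return ports * wavelengths_per_port * gbps_per_wavelength / 1000


if __name__ == "__main__":
    print(switching_capacity_tbps(4, 40, 40))  # the slide's 4 x 40 x 40Gbps
```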

26 WXC performance demonstration: 8 lasers at centre of C-Band at 100GHz spacing; use ASE source to illustrate wide bandwidth 1. Use external 4x1 switch to scan WXC ports 2. Alter switch states of WSS1 and WSS3 (shown in movie on next page) [Diagram: ASE source, 1x4 WSS modules, 4x1 switch, OSA]

27 WXC performance demonstration (movie)

28 What Does it Cost to Drive the Network? Dominant cost is DWDM optics. Construction of multiplexers is simple and not expensive: ~$250/Channel/End

29 Layer 1 – Four Channel DWDM (Channels 31, 32, 33, 34)
–10Gbps Switch, X 4 Per Side (optional)
–XFP Switch Module, X 4 Per Side (optional)
–XFP DWDM Optics, X 4 Per Side, Used in Host or Switch
–SC to LC Fiber 2M, X 5 Per Side
–DWDM Mux Transmit, X 1 Per Side
–DWDM DeMux Receive, X 1 Per Side
–Corning 1U Rack Containing DWDM Mux / DeMux + SC to SC couplers, 1 Per Side
–1 Fiber Pair between the two sides

30 1) Optics: SFP/XFP Optics Costs. DWDM Optics from AACTelecom
–10Gbps Luminent XFP DWDM per unit (ZR 80Km), OC-192 and 10GE compatible: 3500 US
–10Gbps Luminent (assembled in US) XFP DWDM per unit (ER 40Km), OC-192 and 10GE compatible: 2900 US
–1 Gbps SFP DWDM per unit (80Km model), OC-48 compliant and 1 GE compatible: 1220 US
–10Gbps non-DWDM 1310nm (LR 10Km model): 700 US

31 2) Optional - Layer 2 Switch (10Gbps capable). 10Gbps capable switch SMC8748L2 (A )+ EXP MOD-10G (A ) from Dell
–Switch, 2 x 10Gbps XFP ports, 48 x 1Gbps Copper: 1700 US
–10 Gbps module (holds XFP): 300 US

32 3) DWDM Mux DeMux (SC connector type), 4, 8, 16 channel = DWDM-100, from oemarket.com
–4 Channel (31, 32, 33, 34): 560 US
–8 Channel: 880 US
–16 Channel: 1600 US (approx)

33 4) Corning Rack Mount, Couplers, Fiber
Corning Mux DeMux container, 1U rack mount: Corning PCH-01U, from Ed Carlin, Graybar
–1U (sufficient for 4, 8 or 16 channel): 200 US
–2 sets of SC to SC adaptors: 100 US (approx)
Fiber Patch Cables, Single Mode, from Ed Carlin, Graybar
–2M, SC to LC connector type: 30 US (approx) each
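The "~$250/Channel/End" figure on slide 28 can be checked against the component prices on slides 32 and 33 and the per-side quantities on slide 29. The Python sketch below is a back-of-the-envelope calculation. It assumes that the 560 US four-channel "Mux DeMux" item covers both the transmit mux and the receive demux for one side, and it uses the 2900 US ER XFP as the example optic. Neither assumption is stated on the slides.

```python
# Prices in US dollars, taken from slides 30, 32 and 33.
MUX_DEMUX_4CH = 560      # assumed to cover mux + demux for one side
RACK_1U = 200
SC_ADAPTOR_SETS = 100    # approximate
PATCH_CABLE = 30         # approximate, each
CABLES_PER_SIDE = 5      # slide 29: "X 5 Per Side"
CHANNELS = 4
XFP_DWDM_ER = 2900       # one optic per channel per side (example choice)


def passive_cost_per_side():
    """Mux/demux, rack, couplers and patch cables for one end of the link."""
    return (MUX_DEMUX_4CH + RACK_1U + SC_ADAPTOR_SETS
            + CABLES_PER_SIDE * PATCH_CABLE)


def passive_cost_per_channel_end():
    return passive_cost_per_side() / CHANNELS


def optics_share():
    """Fraction of the per-channel, per-end cost that is the DWDM optic."""
    return XFP_DWDM_ER / (XFP_DWDM_ER + passive_cost_per_channel_end())


if __name__ == "__main__":
    print(passive_cost_per_side())
    print(passive_cost_per_channel_end())
    print(round(optics_share(), 2))
```

Under these assumptions the passive parts come to about 250 US per channel per end, and the optic accounts for roughly nine tenths of the total, which is consistent with the statement that DWDM optics are the dominant cost.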

34 Complete Solution

35 5) Optional - DWDM Media Converter. DWDM to Copper Media Converter, from Carl Stelling at m
–SFP pluggable DWDM to copper media converter: 150 US each, not including DWDM optics (just converter)

36 Quartzite State Nov 2007
–Core Packet Switch with GigE ports (More than ½ Terabit)
–Approximately 30 Channels Lit
–64-port All-Optical Glimmerglass Switch - All Fiber into Quartzite is switchable
–4 port x 8 Lambda DWDM switch at Lucent (On site at Calit2 in Dec)
–4 Channel DWDM Between Calit2 and SDSC: one channel is used for 10Gigabit Production to BIRN Data Racks
Ordered, but waiting for fulfillment
–20 Mux/Demux (8 C-band DWDM Channels (LR) Passband)
–32 DWDM XFPs (Channel – will fill out rest of channels in 2008)
