Modeling Inter-Motif Dependence without Increasing the Complexity. Zhizhuo Zhang.


1 Modeling Inter-Motif Dependence without Increasing the Complexity. Zhizhuo Zhang

2 PWM Model

Positional Weight Matrix (PWM) learned from ten binding sites (TTGACT, TCGACT, TTGACT, TTGAAA, ATGAGG, TTGAAA, GTGAAA, TTGACT, TTGAGG, TTGAAA):

Pos    1    2    3    4    5    6
A    0.1    0    0    1  0.4  0.4
C      0  0.1    0    0  0.4    0
G    0.1    0    1    0  0.2  0.2
T    0.8  0.9    0    0    0  0.4
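The column frequencies above can be reproduced with a short script (a minimal sketch; the ten sites are taken from the slide, the function name is mine):

```python
from collections import Counter

def pwm(sites):
    """Column-wise base frequencies of a set of aligned binding sites."""
    n = len(sites)
    matrix = []
    for column in zip(*sites):            # iterate over motif positions
        counts = Counter(column)
        matrix.append({base: counts[base] / n for base in "ACGT"})
    return matrix

sites = ["TTGACT", "TCGACT", "TTGACT", "TTGAAA", "ATGAGG",
         "TTGAAA", "GTGAAA", "TTGACT", "TTGAGG", "TTGAAA"]
m = pwm(sites)
# position 1 is dominated by T; positions 3 and 4 are fully conserved
```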

3 High-Order Dependency

1st-order dependency: joint 2-mer distribution P(4-5) over the last two positions of the same ten sites (TTGACT, TCGACT, TTGACT, TTGAAA, ATGAGG, TTGAAA, GTGAAA, TTGACT, TTGAGG, TTGAAA):

CT  0.4
AA  0.4
GG  0.2
all other 2-mers  0
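The 2-mer distribution above can be checked the same way (a sketch; string indices are 0-based here, so the slide's pair 4-5 is the last two motif columns):

```python
from collections import Counter

sites = ["TTGACT", "TCGACT", "TTGACT", "TTGAAA", "ATGAGG",
         "TTGAAA", "GTGAAA", "TTGACT", "TTGAGG", "TTGAAA"]

# joint distribution of the last two (diverged) positions
pairs = Counter(s[4:6] for s in sites)
pair_freq = {p: c / len(sites) for p, c in pairs.items()}
# mass concentrates on CT, AA and GG; every other 2-mer has probability 0
```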

4 High-Order Dependency

Assume only one dependency group.

5 Two Modeling Principles

Inter-dependent bases exist only at the diverged positions. There is no inter-dependence relationship across a conserved base.

6 Principle One

KL-divergence is commonly used to measure the dissimilarity between two probability distributions. The goal is to show that the KL-divergence between the (K+1)-order distribution and the combination of a K-order and a 0-order distribution is small when the base at position K+1 is highly conserved.

7 Principle One

The KL-divergence between the (K+1)-order distribution and the combination of a K-order and a 0-order distribution is bounded as follows (the bound appeared as a formula image on the slide and is not preserved in this transcript).
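The principle can be illustrated numerically (a sketch of my own, not the paper's derivation): the KL-divergence between a joint pair distribution and the product of its marginals vanishes when one of the two positions is fully conserved, and is large for a genuinely dependent pair.

```python
import math
from collections import Counter

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) over the support of p."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)

def dependence_kl(sites, i, j):
    """D(joint(i,j) || marginal(i) x marginal(j)) for two motif columns."""
    n = len(sites)
    joint = {p: c / n for p, c in Counter(s[i] + s[j] for s in sites).items()}
    mi = Counter(s[i] for s in sites)
    mj = Counter(s[j] for s in sites)
    prod = {p: (mi[p[0]] / n) * (mj[p[1]] / n) for p in joint}
    return kl(joint, prod)

sites = ["TTGACT", "TCGACT", "TTGACT", "TTGAAA", "ATGAGG",
         "TTGAAA", "GTGAAA", "TTGACT", "TTGAGG", "TTGAAA"]
diverged = dependence_kl(sites, 4, 5)   # the dependent pair
conserved = dependence_kl(sites, 3, 4)  # column 3 is all A
```

Pairing a conserved column with anything gives KL exactly 0, so modeling it independently loses nothing, which is the content of Principle One.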

8 Principle Two

The Cys2His2 zinc finger DNA-binding family is the largest known DNA-binding family in multi-cellular organisms. Each finger binds its DNA sub-site independently.

9 Control the Complexity

The larger the dependence group, the more parameters, and the easier it is to overfit. We want to model k-order dependence using the same number of parameters as a (k+1)-position independent PWM (i.e., 4k+4 parameters).

10 Control the Complexity

Dependence Positional Weight Matrix (DPWM) for the same ten sites: positions 1-4 keep independent PWM columns, while positions 5-6 form one dependence group described by 2-mer probabilities:

Pos    1    2    3    4    Pair (5-6)
A    0.1    0    0    1    CT=0.4  AC=0
C      0  0.1    0    0    AA=0.4  CA=0
G    0.1    0    1    0    GG=0.2  TT=0
T    0.8  0.9    0    0    CC=0    Other=0

11 Control the Complexity

Modeling the problem: given a set of binding site sequences X (each of length k), find a DPWM Ω that maximizes the likelihood P(X | Ω) (or, equivalently, minimizes the KL-divergence) using 4k parameters. We can prove that taking the top 4k-1 k-mer probabilities as the first 4k-1 parameter values is the optimal solution.
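A hedged sketch of that claim for a single dependence group (the function name and the explicit 'other' bucket are my illustration, not the paper's code): with 4k parameters available for a group of k positions, keep the top 4k-1 k-mer probabilities explicitly and let the last parameter absorb everything else.

```python
from collections import Counter

def dpwm_group(kmers, k):
    """Describe a k-position dependence group with 4k parameters:
    the top 4k-1 k-mer probabilities plus one 'other' bucket."""
    n = len(kmers)
    top = Counter(kmers).most_common(4 * k - 1)
    params = {kmer: count / n for kmer, count in top}
    params["other"] = 1.0 - sum(params.values())
    return params

sites = ["TTGACT", "TCGACT", "TTGACT", "TTGAAA", "ATGAGG",
         "TTGAAA", "GTGAAA", "TTGACT", "TTGAGG", "TTGAAA"]
group = dpwm_group([s[4:6] for s in sites], k=2)
# only CT, AA and GG carry mass, so 'other' is 0 for this toy example
```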

12 Exhaustive Search for Dependence

Naive method: enumerate all grouping combinations and pick the maximum-likelihood one. Example for length 5:
1,2,3,4,5
(1,2,3),4,5
(1,2),3,(4,5)
(1,2,4,5),3
(1,2,3,4,5)
…

13 Exhaustive Search for Dependence

Improved method: enumerate only single dependence groups. If D1 and D2 are two independent groups, then the scores of {D1} and {D2} can be combined to compute {D1, D2}; this is, in effect, a greedy search. Example, with combinations sorted by log-likelihood:
(1,2),3,4,5: -32
(1,2,3),4,5: -44
1,2,3,(4,5): -50
…
1,2,3,4,5: -100
The best grouping: (1,2),3,(4,5)
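The improved search can be sketched as a greedy combination of single-group scores (a minimal illustration under the slide's numbers; the gains below are log-likelihood improvements over the fully independent model, e.g. -32 - (-100) = 68 for group (1,2)):

```python
def greedy_partition(gains):
    """Greedily pick non-overlapping dependence groups.

    gains: dict mapping a candidate group (tuple of positions) to its
    log-likelihood gain over the independent model.  Because independent
    groups combine additively, taking the best non-overlapping groups
    first approximates the best full partition without enumerating it.
    """
    chosen, used = [], set()
    for group, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
        if gain > 0 and not used.intersection(group):
            chosen.append(group)
            used.update(group)
    return sorted(chosen)

# single-group gains derived from the slide: -32, -44, -50 vs. -100
gains = {(1, 2): 68, (1, 2, 3): 56, (4, 5): 50}
best = greedy_partition(gains)
# picks (1,2) first, skips the overlapping (1,2,3), then adds (4,5),
# matching the slide's best grouping (1,2),3,(4,5)
```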

14 Result

Ran MEME, Cisfinder, Amadeus, ChIPMunk, HMS, Trawler, Weeder, and JPomoda on 15 ES ChIP-seq datasets, using one half of the ChIP-seq peaks to learn de novo PWMs and the other half to validate performance.

15 Result

AUC per dataset; columns in order: MEME, DP_MEME, Weeder, DP_Weeder, Cisfinder, DP_Cisfinder, Amadeus, DP_Amadeus, HMS, DP_HMS, Trawler, DP_Trawler, ChIPMunk, DP_ChIPMunk, Jpomoda, DP_Jpomoda. An NA cell in the source table appears to span both a tool and its DP variant.

tcfcp2I1   0.9212 0.9375 0.8911 0.9544 0.9328 0.9644 0.8615 0.8752 0.9707 0.9673 NA 0.9703 0.9702 0.9710 NA
klf4       0.8625 0.8596 0.8445 0.8569 0.8487 0.8601 0.8240 0.8389 0.8612 0.8561 0.6310 0.6538 0.8637 0.8592 0.8360 0.8369
suz12      0.6434 0.6438 0.5852 0.5695 0.5760 0.5838 0.5912 0.5919 0.5920 0.5959 NA 0.5963 0.6005
zfx        0.7586 0.7548 0.7717 0.7432 0.7406 0.7433 0.6974 0.7089 0.6166 0.6096 0.7606 0.7624 0.7562 0.7672 0.7522 0.7531
stat3      0.7137 0.7229 0.6989 0.7200 0.7216 0.7323 0.7159 0.7041 0.7035 0.7116 0.6898 0.7090 0.7243 0.7332 0.7455 0.7424
nmyc       0.7785 0.7803 0.7425 0.7406 0.7494 0.7520 0.7425 0.7455 0.7145 0.7358 NA 0.7547 0.7728 0.7640 0.7602
esrrbredo  0.9099 0.9052 0.9076 0.9144 0.8994 0.9051 0.8874 0.8820 0.8807 0.8967 0.8769 0.8876 NA 0.8729 0.8713
cmyc       0.7681 0.7668 0.7550 0.7617 0.7746 0.7728 0.7631 0.7594 0.6855 0.6984 0.7472 0.7675 NA 0.7801 0.7807
e2f1       0.6110 0.6185 0.5714 0.5803 0.5729 0.6039 0.5818 0.5875 0.5884 0.5900 0.5629 0.5767 0.6420 0.6464 0.6208 0.6204
nanog      0.6690 0.6813 0.6649 0.6850 0.6635 0.6835 0.6074 0.6171 0.6722 0.6795 NA 0.5635 0.5554 0.6964 0.6997
oct4       0.6673 0.6827 0.6646 0.6816 0.6460 0.6784 0.6293 0.6470 0.4790 0.4780 NA 0.7194 0.7136 0.6880 0.6891
sox2       0.8449 0.8837 0.8151 0.8514 0.8145 0.8615 0.7369 0.7434 0.5758 0.5823 0.7881 0.8506 0.8323 0.8558 0.8185 0.8427
smad1      0.5848 0.5847 0.6042 0.5765 0.5767 0.5781 0.5718 0.5484 0.5504 0.6048 0.5957 0.6328
ctcf       0.9809 0.9854 0.9708 0.9846 0.9648 0.9855 0.9474 0.9680 0.9790 0.9862 NA 0.9819 0.9818 0.9804 0.9835
p300       0.6198 0.6062 0.5355 0.5224 NA 0.5749 0.5883 NA 0.5831 0.5898 0.5709 NA

16 Adjacent Dependency

MEME CTCF motif; dependence groups 1-2-3 and 10-11.
AUC result: MEME 0.9809, Dependence 0.9854.

17 Large Dependency Group

MEME SOX2 motif; dependence groups 1-2-3-4-5-7 and 14-15.
AUC result: MEME 0.845, Dependence 0.884.

18 Long-Range Dependency

MEME NMYC motif; dependence groups 10-21 and 11-12.
AUC result: MEME 0.7785, Dependence 0.7803.

19 New Server Configuration

20 Model & Price

Hostname: genome
- 3U server
- 2x Intel Xeon X5680 processors (6 cores each)
- 144GB RAM
- 16x 2TB SAS disks
- 2x 1G network interfaces
- Price: 20k SGD

Hostname: biogpu
- 1U server
- 2x Intel Xeon X5680 processors (6 cores each)
- 2x M2050 GPUs
- 48GB RAM
- 3x 2TB SATA2 disks
- 2x 1G network interfaces
- Price: 18k SGD

21 File System

genome: RAID-6, 28TB, CentOS 5.5; /home: 23TB
biogpu: RAID-5, 4TB, Yellow Dog Linux (based on CentOS 5.4); /home: 3TB

22 Server Software

NIS: same accounts across both servers
NFS: home directories on the genome server; public_html on the biogpu server; shared software in /cluster/biogpu/programs/bin/
Apache: biogpu server
MySQL: genome server

23 Current Problems

/home is 99% full:

Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol02  393G  4.7G  368G   2% /
/dev/mapper/VolGroup00-LogVol00  2.0T  199M  1.9T   1% /tmp
/dev/sdb2                         23T   23T  439G  99% /home
/dev/sda1                        920M   47M  826M   6% /boot
tmpfs                             71G     0   71G   0% /dev/shm

24 To Do

- Swap backup
- Connect to Tembusu
- Install SGE

25 GPU Computing

26 Fermi M2050

Peak double precision floating point performance: 515 Gigaflops
Peak single precision floating point performance: 1030 Gigaflops
CUDA cores: 448
Memory size (GDDR5): 3 GigaBytes
Memory bandwidth (ECC off): 144 GBytes/sec

27 Code Example to Add Two Arrays

CUDA C program. Device code (a CUDA kernel):

__global__ void addMatrixG( float *a, float *b, float *c, int N )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    int index = i + j * N;
    if ( i < N && j < N )
        c[index] = a[index] + b[index];
}

Host code:

int main()
{
    ......
    dim3 dimBlk( 16, 16 );
    dim3 dimGrd( N/dimBlk.x, N/dimBlk.y );
    addMatrixG<<< dimGrd, dimBlk >>>( a, b, c, N );
}

28 CUDA Memory Model

Each thread can:
- R/W per-thread registers
- R/W per-thread local memory
- R/W per-block shared memory
- R/W per-grid global memory
- RO per-grid constant memory
- RO per-grid texture memory
The host can R/W global, constant and texture memory.

29 CUDA Memory Hierarchy

The CUDA platform has three primary memory types:
- Local memory: per-thread memory for automatic variables and register spilling.
- Shared memory: per-block, low-latency memory for intra-block data sharing and synchronization. Threads can safely share data through this memory and can perform barrier synchronization through __syncthreads().
- Global memory: device-level memory that may be shared between blocks or grids.

30 Moving Data

CUDA allows us to copy data from one memory type to another, including dereferencing pointers, even in the host's memory (main system RAM). To facilitate this data movement, CUDA provides cudaMemcpy().

31 CUDA Example 1: Vector Addition (1)

// Device code (N is assumed to be defined globally)
__global__ void VecAdd( float *A, float *B, float *C )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < N )
        C[i] = A[i] + B[i];
}

// Host code
int main()
{
    // Allocate vectors in device memory
    size_t size = N * sizeof(float);
    float *d_A;
    cudaMalloc( (void**)&d_A, size );
    float *d_B;
    cudaMalloc( (void**)&d_B, size );
    float *d_C;
    cudaMalloc( (void**)&d_C, size );

32 CUDA Example 1: Vector Addition (2)

    // Copy vectors from host memory to device memory
    // h_A and h_B are input vectors stored in host memory
    cudaMemcpy( d_A, h_A, size, cudaMemcpyHostToDevice );
    cudaMemcpy( d_B, h_B, size, cudaMemcpyHostToDevice );

    // Invoke kernel
    int threadsPerBlock = 256;
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
    VecAdd<<< blocksPerGrid, threadsPerBlock >>>( d_A, d_B, d_C );

    // Copy result from device memory to host memory
    // h_C contains the result in host memory
    cudaMemcpy( h_C, d_C, size, cudaMemcpyDeviceToHost );

    // Free device memory
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
}

33 Optimization

- Minimize divergent paths (if/else branching within a warp)
- Coalesce global memory accesses
- Convert scattering to gathering
- Use shared memory as much as possible

34 Compiling Code

Linux: command line. CUDA provides nvcc, an NVIDIA "compiler-driver"; use it instead of gcc:

nvcc -O3 -o <executable> <source>.cu -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcudart

nvcc separates the code for the CPU from the code for the GPU and compiles both; a regular C compiler must be installed for the CPU side. Makefiles are also provided.
Windows: NVIDIA suggests using Microsoft Visual Studio.

35 CUDA Too Hard?

- Use other software that already has CUDA acceleration
- Use a wrapper library

36 CUDA-Accelerated Software

Libraries: cuBLAS, cudaLAPACK, CudaR, CudaPy

Molecular dynamics & quantum chemistry: ACEMD, AMBER, BigDFT (ABINIT), GROMACS, HOOMD, LAMMPS, NAMD, TeraChem (quantum chemistry), VMD

Bioinformatics: CUDA-BLASTP, CUDA-EC, CUDA-MEME, CUDASW++ (Smith-Waterman), DNADist, GPU Blast, GPU-HMMER, HEX Protein Docking, Jacket (MATLAB plugin), MUMmerGPU, MUMmerGPU++

37 Thrust

- Searching: binary search, vectorized searches
- Copying: gathering, scattering
- Reductions: counting, comparisons, extrema, transformed reductions, logical, predicates
- Reordering: partitioning, stream compaction
- Prefix sums: segmented prefix sums, transformed prefix sums
- Set operations
- Sorting
- Transformations: filling, modifying, replacing

38 Example 1

#include <thrust/device_vector.h>
#include <thrust/count.h>
...
// put three 1s in a device_vector
thrust::device_vector<int> vec(5,0);
vec[1] = 1;
vec[3] = 1;
vec[4] = 1;
// count the 1s
int result = thrust::count(vec.begin(), vec.end(), 1);
// result is three

39 Example 2

#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <cmath>
#include <iostream>

// square computes the square of a number f(x) -> x*x
template <typename T>
struct square
{
    __host__ __device__
    T operator()(const T& x) const { return x * x; }
};

int main(void)
{
    // initialize host array
    float x[4] = {1.0, 2.0, 3.0, 4.0};
    // transfer to device
    thrust::device_vector<float> d_x(x, x + 4);
    // setup arguments
    square<float> unary_op;
    thrust::plus<float> binary_op;
    float init = 0;
    // compute norm
    float norm = std::sqrt( thrust::transform_reduce(d_x.begin(), d_x.end(),
                                                     unary_op, init, binary_op) );
    std::cout << norm << std::endl;
    return 0;
}

40 References

NVIDIA CUDA Programming Guide, Version 2.3
http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/NVIDIA_CUDA_Programming_Guide_2.3.pdf

NVIDIA CUDA C Programming Best Practices Guide, Version 2.3
http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/NVIDIA_CUDA_BestPracticesGuide_2.3.pdf

Thrust documentation
http://code.google.com/p/thrust/wiki/Documentation

