Dissertation work in Functional Genomics Naim Rashid 4 th Year PhD, Biostatistics.

Slides:



Advertisements
Similar presentations
SHI Meng. Abstract The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants,
Advertisements

Welcome Each of You to My Molecular Biology Class.
Correlation and Linear Regression.
Asking translational research questions using ontology enrichment analysis Nigam Shah
CSE Fall. Summary Goal: infer models of transcriptional regulation with annotated molecular interaction graphs The attributes in the model.
. Inferring Subnetworks from Perturbed Expression Profiles D. Pe’er A. Regev G. Elidan N. Friedman.
Bioinformatics Lectures at Rice
The DNA Bead String Exercise Anita L DeStefano, PhD Department of Biostatistics BU School of Public Health Co-Director Biostatistics Program Associate.
Computational biology and computational biologists Tandy Warnow, UT-Austin Department of Computer Sciences Institute for Cellular and Molecular Biology.
University of Louisville The Department of Bioinformatics and Biostatistics.
. Class 1: Introduction. The Tree of Life Source: Alberts et al.
Scientific Data Mining: Emerging Developments and Challenges F. Seillier-Moiseiwitsch Bioinformatics Research Center Department of Mathematics and Statistics.
Recap Sometimes it is necessary to conduct Bad Science – often the product of having too much information Human Genome Project changed natural scientists.
Data analytical issues with high-density oligonucleotide arrays A model for gene expression analysis and data quality assessment.
Analysis of Drug-Gene Interaction Data Florian Ganglberger Sebastian Nijman Lab.
Cristina Manfredotti D.I.S.Co. Università di Milano - Bicocca An Introduction to the Use of Bayesian Network to Analyze Gene Expression Data Cristina Manfredotti.
Introduction to Workshop 10 Choosing Learning and Teaching Approaches and Strategies.
Tse-Chuan Yang, Ph.D The Geographic Information Analysis Core Population Research Institute Social Science Research Institute Pennsylvania State University.
Multiple Linear Regression A method for analyzing the effects of several predictor variables concurrently. - Simultaneously - Stepwise Minimizing the squared.
By Moayed al Suleiman Suleiman al borican Ahmad al Ahmadi
Bioinformatics Jan Taylor. A bit about me Biochemistry and Molecular Biology Computer Science, Computational Biology Multivariate statistics Machine learning.
ANCOVA Lecture 9 Andrew Ainsworth. What is ANCOVA?
MAST: the organisational aspects Lise Kvistgaard Odense University Hospital Denmark Berlin, May 2010.
Using Bayesian Networks to Analyze Expression Data N. Friedman, M. Linial, I. Nachman, D. Hebrew University.
Introduction to Management Science Chapter 1: Hillier and Hillier.
ArrayCluster: an analytic tool for clustering, data visualization and module finder on gene expression profiles 組員:李祥豪 謝紹陽 江建霖.
Data Type 1: Microarrays
A New Oklahoma Bioinformatics Company. Microarray and Bioinformatics.
91587 Mathematics and Statistics Apply systems of simultaneous equations in solving problems Level 3 Credits 2 Assessment Internal.
Correlation and Linear Regression. Evaluating Relations Between Interval Level Variables Up to now you have learned to evaluate differences between the.
Welcome to AP Biology Mr. Levine Ext. # 2317.
The Complexities of Data Analysis in Human Genetics Marylyn DeRiggi Ritchie, Ph.D. Center for Human Genetics Research Vanderbilt University Nashville,
Generating and sharing large datasets: Moving out of our measurement comfort Rita Kukafka and Pamela M. Kato October 16-17, 2012 Bruxelles, Belgique.
We calculated a t-test for 30,000 genes at once How do we handle results, present data and results Normalization of the data as a mean of removing.
Eloise Forster, Ed.D. Foundation for Educational Administration (FEA)
Predicting protein degradation rates Karen Page. The central dogma DNA RNA protein Transcription Translation The expression of genetic information stored.
Ritesh Krishna Department Of Computer Science WPCCS July 1, 2008.
Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.
Genomics I: The Transcriptome RNA Expression Analysis Determining genomewide RNA expression levels.
BIOLOGICAL DATABASES. BIOLOGICAL DATA Bioinformatics is the science of Storing, Extracting, Organizing, Analyzing, and Interpreting information in biological.
Question paper 1997.
Bioinformatics Curriculum Issues, goals, curriculum.
While gene expression data is widely available describing mRNA levels in different cancer cells lines, the molecular regulatory mechanisms responsible.
COMPUTATIONAL BIOLOGIST DR. MARTIN TOMPA Place of Employment: University of Washington Type of Work: Develops computer programs and algorithms to identify.
Gene Expression and Epigenetic Regulation JESSICA LAINE AND ELIZABETH MARTIN.
CSE182 CSE182-L11 Protein sequencing and Mass Spectrometry.
Cluster validation Integration ICES Bioinformatics.
Paper Review on Cross- species Microarray Comparison Hong Lu
Disease Diagnosis by DNAC MEC seminar 25 May 04. DNA chip Blood Biopsy Sample rRNA/mRNA/ tRNA RNA RNA with cDNA Hybridization Mixture of cell-lines Reference.
Sequence Alignment.
Pedagogical Objectives Bioinformatics/Neuroinformatics Unit Review of genetics Review/introduction of statistical analyses and concepts Introduce QTL.
Integration of Bioinformatics into Inquiry Based Learning by Kathleen Gabric.
Lecture 12 RNA – seq analysis.
Teaching Bioinformatics Nevena Ackovska Ana Madevska - Bogdanova.
Review of statistical modeling and probability theory Alan Moses ML4bio.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 6 –Multiple hypothesis testing Marshall University Genomics.
NCode TM miRNA Analysis Platform Identifies Differentially Expressed Novel miRNAs in Adenocarcinoma Using Clinical Human Samples Provided By BioServe.
Microarray: An Introduction
Investigate Plan Design Create Evaluate (Test it to objective evaluation at each stage of the design cycle) state – describe - explain the problem some.
Patricia Gonzalez, OSEP June 14, The purpose of annual performance reporting is to demonstrate that IDEA funds are being used to improve or benefit.
INFERENCE FOR BIG DATA Mike Daniels The University of Texas at Austin Department of Statistics & Data Sciences Department of Integrative Biology.
Bayesian Variable Selection in Semiparametric Regression Modeling with Applications to Genetic Mappping Fei Zou Department of Biostatistics University.
Applied Biostatistics: Lecture 4
Theme (i): New and emerging methods
RNA-Seq analysis in R (Bioconductor)
 The human genome contains approximately genes.  At any given moment, each of our cells has some combination of these genes turned on & others.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Genomewide profiling of chromatin accessibility in prostate cancer specimens Genomewide profiling of chromatin accessibility in prostate cancer specimens.
Presentation transcript:

Dissertation work in Functional Genomics Naim Rashid 4 th Year PhD, Biostatistics

Research in Functional Genomics How various processes dynamically interact, and in turn, affect gene expression and protein translation Misregulation/errors provide insight into how diseases develop and how to treat them The Central Dogma of Molecular Biology

Research in Functional Genomics The Central Dogma of Molecular Biology Protein-DNA InteractionModulation of Open Chromatin Analysis of Gene Expression DNA MethylationCopy Number Abberations

Research in Functional Genomics The Central Dogma of Molecular Biology Protein-DNA InteractionModulation of Open Chromatin Analysis of Gene Expression DNA MethylationCopy Number Abberations Measuring this activity genome-wide for a given data type generates an enormous amount of data. Methods for finding interactions between data types and impact on outcomes (gene expression, disease) are also relevant. This is which is where we come in as statisticians

Research in Functional Genomics The Central Dogma of Molecular Biology Protein-DNA InteractionModulation of Open Chromatin Analysis of Gene Expression DNA MethylationCopy Number Abberations The more basic question of how one measures this activity impacts how one develops methods to analyze it. Dependent on what technology is used to measure activity

Dissertation Work First project is dealing with the more basic question of how to best analyze data from a Next Generation Sequencing platform (new) Microarrays have been the popular platform to assess genomic activity over the last decade Poised to overtake microarrays in coming years Build basic analysis framework for this new platform  answer more complicated questions later

Next Generation Sequencing Microarray

Statistical Issues with NGS The local genomic percentage of G and C nucleotides has been shown to have an effect on signal, and this effect has been shown to differ between procedure types (above) and even individuals of the same cell type. Several other biases have been identified

Statistical Issues with NGS The way one chooses to represent the raw sequencing data impacts the ability one has to model it. Left is easier to model, but has bad resolution and is subject to boundary effects. Right has excellent resolution, but each point is strongly correlated with adjacent ones, harder to model.

Applied Dissertation Project How to adjust for multiple biases simultaneously, and incorporate this information into selecting enriched regions? Which biases to use, how to score them and check if relevant? What is the best type of data representation to select enriched regions? How to overcome the limitations of each If these questions can be addressed, how do you realistically implement this method for biologists to use?

Development of New Method FAIRE procedure (Lieb lab) is affected more by these biases than others on NGS platforms Met with lab on a biweekly basis to address these issues. Had to work closely together to develop the right approach Communication is key, led to greater relevance of method and extensive testing on real data Collaboration ≠

ZINBA Collaboration resulted in a method called ZINBA (Zero- Inflated Negative Binomial Algorithm) Mixture of regression models capturing BG (artificial signal) and enriched regions Regression aspect allows us to estimate effect and relevance of each bias Model uses window representation to select enriched windows, then calculates overlap representation to get refined boundaries within enriched windows

Application to Real Data Our method provides a more appropriate fit to the data Compared to other existing methods, our top calls isolate more relevant regions than others

How The Department Made This Work Possible Genomics training grant allowed me to devote a lot of time to this project Theory classes – Knowledge of Generalized Linear Model Theory – The Expectation-Maximization Algorithm – Poisson Processes Learning C led to implementation of C-based translations of R GLM functions, 12x speedup, memory savings Made Zinba feasible to be used by biologists

Future project topics More theoretical paper addressing issues of variable selection in this mixture model context Comparing multiple samples, look for differential enrichment across samples. Develop longitudinal framework to detect changes in enrichment over several time points Tie together multiple data types to look for patterns related to disease states or gene expression.

Questions? Feel free to me at or catch me later