Regulatory Signatures Inferred From Gene Expression Data Jayanth (Jay) Krishnan SBCNY Fellow Mahopac High School Mount Sinai School of Medicine Bio-Engineering/Bioinformatics.

1 Regulatory Signatures Inferred From Gene Expression Data Jayanth (Jay) Krishnan SBCNY Fellow Mahopac High School Mount Sinai School of Medicine Bio-Engineering/Bioinformatics

2 Central Questions: What causes cells to become malignant? How can we reverse the harmful effects of cancer?

3 The Wetlab Approach: “Onconase and Amphinase, the Antitumor Ribonucleases from Rana pipiens Oocytes.” Ardelt W, Shogen K, Darzynkiewicz Z. – New York Medical College: Cancer Biology – X mol of drug + chemotherapeutic agent + cancer cells = observation of cytostatic and cytotoxic properties – Accurate, but the search space is too large

4 New Methodology: My Approach. Bioinformatics and mathematical modeling to prune the search space: – Efficient – Faster – Economically sound – Easily reproducible. Wet-lab biology = verification.

5 Experimental Goals: Use bioinformatics to identify the regulatory signatures for 60 different cancer cell lines (transcription factors, protein subnetworks, kinases). Identify relationships between cancers and regulatory components. Implement a quantitative method to predict drugs for each cancer cell line.

6 Work Flow: Phase 1. NCI-60 database mRNA profile analysis: use statistical techniques to compute over- and under-expressed genes.

7 Phase 1: NCI-60 database. The database gives gene expression values for each gene/cancer-line pair using several experimental probes. Standard statistics are computed. A Perl program was used to process the data from the NCI-60 database. NCI citation: "DTP - Cell Lines in the In Vitro Screen." Developmental Therapeutics Program NCI/NIH. Web. 10 June 2010.

8 Phase 1: Representation of the NCI-60. Identifying over- and under-expressed genes.
Gene “G”   Cancer 1   Cancer 2   Cancer 3   ...   Cancer 59   Cancer 60
Probe 1    N(1,1)     N(1,2)     N(1,3)     ...   N(1,59)     N(1,60)
Probe 2    N(2,1)     N(2,2)     N(2,3)     ...   N(2,59)     N(2,60)
Probe 3    N(3,1)     N(3,2)     N(3,3)     ...   N(3,59)     N(3,60)
...
Probe S    N(S,1)     N(S,2)     N(S,3)     ...   N(S,59)     N(S,60)
Table 1: Depiction of the NCI-60 database for a single gene. The columns indicate the cancer cell lines while the rows show the probes. The intersections show the mRNA or expression value.

9 Statistics: A two-sided Z test with a 0.025 p-value threshold was used to determine whether a gene is dysregulated.
Sample mean for cancer cell line c over its S probes: $\bar{X}(c) = \frac{1}{S} \sum_{i=1}^{S} N(i,c)$
Population mean across all 60 cancer cell lines: $\mu = \frac{1}{60} \sum_{i=1}^{60} \bar{X}(i)$
Standard deviation: $\sigma = \sqrt{\frac{1}{59} \sum_{i=1}^{60} \left(\bar{X}(i) - \mu\right)^2}$
Test statistic for cancer cell line c: $T(c) = \frac{\bar{X}(c) - \mu}{\sigma}$
Assuming a significance level of $\alpha$:
Gene G is over-expressed for cancer cell line c if $T(c) > Z(\alpha/2)$.
Gene G is under-expressed for cancer cell line c if $T(c) < -Z(\alpha/2)$.
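
The statistics on this slide can be sketched in a few lines of Python. This is only an illustration, not the Perl program used in the project: the function name `classify_gene`, the input layout (a dictionary from cell line name to probe readings), and the default `alpha=0.05` (so that each tail holds 0.025) are all assumptions, and the example data are made up.

```python
import math
from statistics import NormalDist


def classify_gene(probe_values, alpha=0.05):
    """Label one gene as 'over', 'under', or 'normal' in each cell line.

    probe_values maps a cell line name to the list of probe readings
    N(1,c) .. N(S,c) for that gene. alpha=0.05 is an assumed default.
    """
    # Sample mean Xbar(c) of the probe readings for each cell line.
    xbar = {c: sum(v) / len(v) for c, v in probe_values.items()}
    n = len(xbar)
    if n < 2:
        raise ValueError("need at least two cell lines")
    # Mean and standard deviation of the sample means across cell lines
    # (divisor n - 1, matching the 59 used for 60 cell lines).
    mu = sum(xbar.values()) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xbar.values()) / (n - 1))
    if sigma == 0.0:
        # No variation across cell lines: nothing can be called dysregulated.
        return {c: "normal" for c in xbar}
    # Critical value Z(alpha/2) of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    calls = {}
    for c, x in xbar.items():
        t = (x - mu) / sigma
        if t > z_crit:
            calls[c] = "over"
        elif t < -z_crit:
            calls[c] = "under"
        else:
            calls[c] = "normal"
    return calls


# Hypothetical data: eight cell lines, three probes each.
example = {f"line_{i}": [1.0, 1.2, 0.8] for i in range(1, 8)}
example["line_8"] = [10.0, 9.0, 11.0]
print(classify_gene(example))
```

Running the same function once per gene would give the lists of over- and under-expressed genes for every cell line.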

10 Top 223 Over Expressed Genes for MDA_N

11 Work Flow: Phase 2
Phase 1 – NCI-60 database mRNA profile analysis: use statistical techniques to compute over/under expressed genes.
Phase 2 – ChIP Enrichment Analysis (ChEA): determine top ranked transcription factors responsible for the over/under expressed genes.
Phase 2 – Genes2Networks: identify protein sub-networks that “connect” the transcription factors through additional proteins.
Phase 2 – Kinase Enrichment Analysis (KEA): top ranked protein kinases regulating the protein subnetworks.
Diagram labels: Created database, Existing database.

12 Phase 2: Creation of a system to predict transcription factors. ChIP-on-chip and ChIP-Seq data were gathered from prior experiments. Data were extracted from the supplemental Excel spreadsheets and PDF tables. A database of mammalian ChIP data was created.

13 Phase 2: ChIP Enrichment Analysis (ChEA) – 100,000 TF-to-gene interactions extracted from over 60 publications – 80 transcription factors and the thousands of target genes they potentially regulate. The accumulated data are then queried through a user-friendly system that implements Fisher’s exact test.
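
As a rough illustration of how Fisher's exact test can rank transcription factors against an input gene list, here is a minimal Python sketch. It is not the ChEA implementation: the one-sided (enrichment) form of the test, the function names, the `background_size` parameter, and the toy gene and factor names are all assumptions made for the example.

```python
from math import comb


def enrichment_p_value(overlap, list_size, target_size, background_size):
    """One-sided Fisher's exact test for enrichment.

    Returns the probability of seeing at least `overlap` targets when
    `list_size` genes are drawn at random from `background_size` genes,
    of which `target_size` are targets of the transcription factor.
    """
    upper = min(list_size, target_size)
    tail = sum(
        comb(target_size, k) * comb(background_size - target_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(background_size, list_size)


def rank_transcription_factors(gene_list, tf_targets, background_size):
    """Sort transcription factors by ascending p-value (most enriched first).

    Assumes every gene in gene_list and tf_targets belongs to the background.
    """
    genes = set(gene_list)
    results = []
    for tf, targets in tf_targets.items():
        targets = set(targets)
        p = enrichment_p_value(
            len(genes & targets), len(genes), len(targets), background_size
        )
        results.append((tf, p))
    return sorted(results, key=lambda item: item[1])


# Hypothetical toy data: 10 background genes, two transcription factors.
toy_targets = {"TF_A": {"G1", "G2", "G3"}, "TF_B": {"G7", "G8", "G9"}}
toy_genes = {"G1", "G2", "G3"}
print(rank_transcription_factors(toy_genes, toy_targets, background_size=10))
```

In practice the background would be the full set of genes measured, and the p-values would normally be corrected for testing many transcription factors at once.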

14 Software Inputs: The over- and under-expressed genes from the NCI-60 are fed into ChEA to get transcription factors. The top transcription factors are fed into Genes2Networks (Ma’ayan Lab) to get protein subnetworks. The subnetworks are fed into Kinase Enrichment Analysis (Ma’ayan Lab) to get kinases.

15 Materials and Methods: Phase 1

16 Berger SI, Posner JM, Ma'ayan A. Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases. BMC Bioinformatics. 2007 Oct 4;8:372.

17 Alexander Lachmann and Avi Ma'ayan. KEA: Kinase Enrichment Analysis. Bioinformatics 25:684-6 (2009) PMID: 19176546.

18 Work Flow: Phase 3 and 4
Phase 1 – NCI-60 database mRNA profile analysis: use statistical techniques to compute over/under expressed genes.
Phase 2 – ChIP Enrichment Analysis (ChEA): determine top ranked transcription factors responsible for the over/under expressed genes.
Phase 2 – Genes2Networks: identify protein sub-networks that “connect” the transcription factors through additional proteins.
Phase 2 – Kinase Enrichment Analysis (KEA): top ranked protein kinases regulating the protein subnetworks.
Phase 3 – Compute integrated matrices for transcription factors, protein complexes and kinases vs. cancer cell lines.
Phase 4 – Use MATLAB to form heat maps and dendrograms, and use principal component analysis to determine clusters.
Diagram labels: Created database, Existing database.

19 Phase 3: Creation of Integrated Matrices

20 MATLAB: Results and Analysis. MATLAB code was written to find relationships between the regulatory signatures and cancer cell lines. Boxplots, dendrograms, principal component analysis, and similarity heat maps were created.

23 Principal Component Analysis: Convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. An (n x n) covariance matrix is created, with one entry for each pair of signatures or cancer cell lines. Eigenvectors are computed. The eigenvectors with the highest eigenvalues are the principal components. The data are replotted with the principal components as axes.
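
The steps on this slide can be sketched in plain Python. The project used MATLAB; the code below is only an illustration that finds the first principal component by power iteration, a simpler stand-in for a full eigendecomposition. The function names and the example data are assumptions.

```python
import math


def covariance(rows):
    """Return (covariance matrix, centered data) for a list of observations.

    Each row is one observation; each column is one variable.
    """
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    centered = [[r[j] - means[j] for j in range(n)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (m - 1) for j in range(n)]
        for i in range(n)
    ]
    return cov, centered


def first_principal_component(cov, iterations=200):
    """Approximate the largest eigenvalue and its unit eigenvector.

    Power iteration: repeatedly multiply a start vector by the matrix and
    renormalize. It assumes the start vector is not orthogonal to the
    leading eigenvector.
    """
    n = len(cov)
    v = [1.0 / (j + 1) for j in range(n)]
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        eigenvalue = norm
        v = [x / norm for x in w]
    return eigenvalue, v


def project(centered, component):
    """Coordinate of each centered observation along the component."""
    return [sum(c[j] * component[j] for j in range(len(component))) for c in centered]


# Hypothetical data: four observations of two perfectly correlated variables.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
cov, centered = covariance(data)
value, vector = first_principal_component(cov)
scores = project(centered, vector)
print(value, vector, scores)
```

Further components could be found by removing the first component from the covariance matrix (deflation) and repeating, or by using a library eigensolver.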

26 Work Flow: Phase 5 & 6 – Identifying drugs to reverse the effects of cancer. Future research.

27 Predicting Drugs: The CMAP database contains 500 drugs and the associated genes for each drug. The genes down-regulated by the drug are intersected with the genes up-regulated in the cancer. A Jaccard coefficient was calculated for each cancer cell line – The drug with the highest Jaccard coefficient is chosen – Can be calculated at the gene, transcription factor, or kinase level
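
The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. A minimal Python sketch of the drug selection described on this slide follows; the function names, drug labels, and gene names are hypothetical, and the real CMAP data would have to be loaded separately.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_drug(cancer_up_genes, drug_down_genes):
    """Pick the drug whose down-regulated genes best match the cancer's
    up-regulated genes. Returns (drug name, all scores)."""
    if not drug_down_genes:
        raise ValueError("no drugs supplied")
    scores = {
        drug: jaccard(cancer_up_genes, genes)
        for drug, genes in drug_down_genes.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data for one cancer cell line and two drugs.
cancer_up = {"G1", "G2", "G3", "G4"}
drugs = {"drug_A": {"G1", "G2", "G3"}, "drug_B": {"G4", "G9"}}
top, all_scores = best_drug(cancer_up, drugs)
print(top, all_scores)
```

The same two functions work unchanged if the sets hold transcription factors or kinases instead of genes.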

28 Case Studies and Future Research: MG-132 was identified as the top drug for the BR:T47D (breast) cancer cell line. Six case studies were performed, confirming our predictions of the regulatory signatures and drugs by comparing them with wet-lab data. Drugs are being submitted to the Mount Sinai wet lab department.

29 Conclusion: What was accomplished? 1) A web interface was developed and published to identify transcription factors. 2) Entire regulatory signatures were identified for 60 cancer cell lines. 3) MATLAB analysis was used to group cancer lines and regulatory components. 4) Drugs were predicted for all 60 cancer cell lines. 5) Case studies were performed; wet-lab verification is in progress.

30 Acknowledgements: Dr. Avi Ma’ayan – Science Research Mentor; Mr. Mark Langella – Adult Sponsor, Mahopac High School; Mr. Bilyeu – Principal, Mahopac High School; Mr. Manko – Superintendent of Mahopac Schools; Board of Education; Art Department

