Chip arrays and gene expression data. Motivation.


1 Chip arrays and gene expression data

2 Motivation

3 With chip array technology, one can measure the expression of all genes at once (even all exons). It can answer questions such as: 1. Which genes are expressed in a muscle cell? 2. Which genes are expressed during the first week of pregnancy in the mother? In the new baby? 3. Which genes are expressed in cancer?

4 4. If one mutates a transcription factor (TF), which genes are no longer expressed following this change? 5. Which genes are not expressed in the brain of a baby with an intellectual disability? 6. Which genes are expressed when one is asleep versus when the same person is awake?

5 Technology

6 DNA chip: each spot holds a specific DNA molecule (the probe). Upon hybridization with a labeled mRNA (or cDNA) molecule, the intensity of the hybridization can be quantified by light.

7 Affymetrix: The base is a “wafer” (a thin crystalline semiconductor substrate). It is coated with a light-sensitive chemical compound that prevents coupling between the wafer and the first nucleotide of the DNA probe being created.

8 The blue “cap” is light sensitive. A mask is placed over some of the cells. When the chip is illuminated, a reaction with a nucleotide can happen only in the cells that receive light. Affymetrix

9 The nucleotide that is added is also chemically linked with a new “cap” (light sensitive). Affymetrix

10 The entire process is called photolithography. Affymetrix

11

12 Affymetrix: each probe is 25 bp, a part of an exon. One cm² holds more than 10⁶ different oligos. (Images: the reader; the chip itself.) Affymetrix

13 Affymetrix: each probe is 25 nucleotides long. Beyond this length a technological problem arises: the synthesis becomes inaccurate. With such short probes, each mRNA can hybridize to more than one probe. The solution: each gene is “covered” by several probes. Affymetrix
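As a rough illustration of covering a gene with several short probes, the sketch below cuts 25-nucleotide windows from a transcript at evenly spaced offsets. The function name `design_probes`, the even-spacing rule, and the toy sequence are assumptions made for this example; the slides do not describe how probe positions are actually chosen, and real designs also screen probes for specificity.

```python
def design_probes(transcript, probe_len=25, n_probes=4):
    """Return up to n_probes windows of probe_len nt, evenly spaced along the transcript."""
    last_start = len(transcript) - probe_len
    if last_start < 0:
        return []  # transcript is shorter than one probe
    if n_probes == 1 or last_start == 0:
        return [transcript[:probe_len]]
    step = last_start / (n_probes - 1)
    # A set removes duplicate start positions on short transcripts.
    starts = sorted({round(i * step) for i in range(n_probes)})
    return [transcript[s:s + probe_len] for s in starts]

# Toy transcript (hypothetical sequence, for illustration only).
transcript = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAGTTGACCTGAAACGGTTACCAA"
probes = design_probes(transcript)
for p in probes:
    print(p)
```

Each probe is a 25-nucleotide substring of the same transcript, so a signal on several of them supports the conclusion that the gene is expressed.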

14 Affymetrix: one can buy ready-made chips (human genome, mouse genome), or design (“print”) a custom chip (more expensive). Affymetrix

15 Detection: mRNA is isolated from the tissue (cells, viruses). cDNA is synthesized. The cDNA is fluorescently labeled. Sometimes, the cDNA is amplified using PCR. The intensity in each cell (probe) is measured by “the reader”. Affymetrix

16 Microarray movie From:

17 Agilent developed DNA printers: in each spot, picoliters of nucleotides are added, using standard phosphoramidite chemistry. They can make probes of up to 60-mers. (Agilent was spun off from Hewlett-Packard.) Agilent

18 Hybridization to Agilent probes is more accurate. If there is hybridization to a probe, the gene it represents is probably expressed. Agilent

19 But it is impossible to know how many probe molecules are in each cell, so absolute fluorescence intensities are meaningless. Agilent

20 Solution: in the same experiment, hybridize samples from two conditions, e.g., mRNA from healthy cells (red) versus tumor cells (green). The Agilent reader will give the ratio of the two colors. Agilent
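A two-color ratio is commonly reported on a log2 scale, so that equal up- and down-regulation are symmetric around zero. The sketch below shows that arithmetic. The intensities, the gene names, and the `floor` guard against zero signal are assumptions for illustration; the slides do not describe the reader's actual output format.

```python
import math

def log2_ratio(red, green, floor=1.0):
    """log2(red / green), with both channels floored to avoid division by zero."""
    return math.log2(max(red, floor) / max(green, floor))

# Hypothetical spot intensities: (red = healthy, green = tumor).
spots = {
    "gene_a": (800.0, 200.0),  # stronger in healthy cells
    "gene_b": (150.0, 150.0),  # unchanged
    "gene_c": (50.0, 400.0),   # stronger in tumor cells
}
ratios = {gene: log2_ratio(r, g) for gene, (r, g) in spots.items()}
for gene, value in ratios.items():
    print(gene, round(value, 2))
```

Because only the ratio is used, the unknown amount of probe in each spot cancels out, which is the point of hybridizing both samples to the same chip.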

21 Stanford cDNA chips: in this approach, long cDNA sequences (>300 bp) are produced in cells (clones) and are linked to each chip cell. Producing long cDNAs this way is cheaper than synthesizing them one nucleotide at a time. As in the case of Agilent, it is impossible to control the number of probes in each cell.

22

23 Analyzing Output

24 Output. Table layout: columns are the samples (brain tumor females, brain tumor males, w.t); rows are the genes (Gene 1, Gene 2, Gene 3, ..., Gene 25,000). Each cell is either an absolute number or a relative one, depending on the technology used.

25 Repeats. Table layout: columns are the samples (brain tumor female1, brain tumor male2, brain tumor male1, w.t); rows are the genes (Gene 1, Gene 2, Gene 3, ..., Gene 25,000). The repeat can be either the same sample on a different chip, or a “real” biological repeat: a different sample.

26 Expression profile

      bt4  bt3  bt2  bt1  wt4  wt3  wt2  wt1
g1     23   17   16   15    4    5    3    4
g2      9    7    3    6    6    4    5    7
g3     60   30   26   25    5    2    3    2

Genes 1 and 3 show the same trend (both go high under the same conditions). That is, they have the same expression profile.

27 Clustering

      bt4  bt3  bt2  bt1  wt4  wt3  wt2  wt1
g1     23   17   16   15    4    5    3    4
g2      9    7    3    6    6    4    5    7
g3     60   30   26   25    5    2    3    2

In general, we want to find all the genes that share the same expression profile, which is suggestive of a functional linkage. There are clustering algorithms that do exactly that.
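One simple way to quantify whether two genes "share the same expression profile" is the Pearson correlation between their rows; clustering algorithms often build on such a similarity measure. The sketch below is an illustration only: the slides do not name a specific measure or algorithm, and the numbers are the values as read from the slide's table (column order bt4 ... wt1).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Expression values per gene, columns: bt4, bt3, bt2, bt1, wt4, wt3, wt2, wt1.
profiles = {
    "g1": [23, 17, 16, 15, 4, 5, 3, 4],
    "g2": [9, 7, 3, 6, 6, 4, 5, 7],
    "g3": [60, 30, 26, 25, 5, 2, 3, 2],
}
r13 = pearson(profiles["g1"], profiles["g3"])
r12 = pearson(profiles["g1"], profiles["g2"])
print(f"g1 vs g3: {r13:.2f}")
print(f"g1 vs g2: {r12:.2f}")
```

On these numbers g1 and g3 correlate much more strongly than g1 and g2, which matches the observation that genes 1 and 3 follow the same trend.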

28 Clustering

      bt4  bt3  bt2  bt1  wt4  wt3  wt2  wt1
g1     23    0   22    0    4    5    3    4
g2      9    0    8    0    6    4    5    7
g3     16    6   16    0    5    2    3    2

Clustering of the conditions can suggest two types of brain tumor (bt).
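Clustering the conditions rather than the genes means treating each sample (column) as a point and grouping nearby points. The sketch below uses a small average-linkage agglomerative procedure with Euclidean distance. Both choices are assumptions for illustration, since the slides do not specify an algorithm, and the numbers are one reading of the slide's flattened table, listed per sample as (g1, g2, g3).

```python
import math
from itertools import combinations

def average_linkage(points, k):
    """Repeatedly merge the two clusters with the smallest average pairwise distance until k remain."""
    clusters = [[name] for name in points]

    def dist(a, b):
        total = sum(math.dist(points[x], points[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# One point per sample: (g1, g2, g3), as read from the slide (an assumption).
conditions = {
    "bt4": (23, 9, 16), "bt3": (0, 0, 6), "bt2": (22, 8, 16), "bt1": (0, 0, 0),
    "wt4": (4, 6, 5), "wt3": (5, 4, 2), "wt2": (3, 5, 3), "wt1": (4, 7, 2),
}
groups = average_linkage(conditions, k=3)
for g in groups:
    print(g)
```

Asking for three groups separates the wild-type samples from two distinct groups of tumor samples, which is the kind of result that would suggest two tumor types.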

29 Clustering

      bt4  bt3  bt2  bt1  wt4  wt3  wt2  wt1
g1     23    0   22    0    4    5    3    4
g2      9    0    8    0    6    4    5    7
g3      3    7    1    6    5    2    3    2

Bi-clustering: clustering both the conditions and the genes.

30 Applications

31 Think of increasing the glucose concentration in the growth medium of E. coli and running a chip array at various concentrations. One can potentially discover all genes in the glucose pathway. Knocking out a gene → discover all genes that interact with it.

32 Applications Analyzing expression of genes can help reveal the gene network of a given organism.

33 Gene network

34 Clinical. Does someone have a brain tumor? New sample: g1 = 11, g2 = 4, g3 = 0. Reference data:

      bt4  bt3  bt2  bt1  wt4  wt3  wt2  wt1
g1     23    0   22    0    4    5    3    4
g2      9    0    8    0    6    4    5    7
g3     16    6   16    0    5    2    3    2

35 MammaPrint: used to assess the risk that a breast tumor will spread to other parts of the body (metastasis). It is based on the well-known 70-gene breast cancer gene signature. In February 2007, the FDA cleared the MammaPrint test for use in the U.S.

36 Sequence by hybridization. It was thought that the following procedure could work for sequencing a genome: 1. Make a chip containing all x-mers (e.g., x = 25). 2. Hybridize a genome to the chip. 3. By analyzing all the hybridizations with their overlaps, assemble the genome. Problem: it doesn’t work.
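The slide does not say why the procedure fails. One commonly cited difficulty, offered here as an assumption, is that repeats of length x-1 or more make the reconstruction ambiguous: different sequences can produce exactly the same set of hybridizing x-mers. The sketch below demonstrates this with two toy sequences and x = 3; the sequences and the function name `spectrum` are invented for the example.

```python
from collections import Counter

def spectrum(seq, k):
    """Multiset of all k-mers in seq: what an ideal chip of all k-mers could report."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# Two different toy sequences built around the repeat "AA".
s1 = "CAAGAATAAC"
s2 = "CAATAAGAAC"

same_spectrum = spectrum(s1, 3) == spectrum(s2, 3)
print("sequences differ:", s1 != s2)
print("identical 3-mer spectra:", same_spectrum)
```

Since both sequences yield the same 3-mers, no analysis of the hybridization pattern alone can tell them apart.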

37 ChIP-on-chip: a method for measuring protein-DNA interaction. Proteins that bind DNA include: those responsible for transcription regulation, such as transcription factors (TFs); replication proteins; histones…

38 ChIP-on-chip: the first “ChIP” stands for Chromatin ImmunoPrecipitation and the second “chip” refers to the DNA microarray. The method is used mostly to detect TF binding sites.

39 ChIP-on-chip:

40 Tiling arrays: here the chip array should include not only protein-coding genes but also control regions, or simply the entire genome.

41 Deep sequencing movie From: http://www.illumina.com/

42 Deep sequencing reads Yoder-Himes D.R. et al. PNAS (2009)

43 Protein-Protein interaction (PPI)

44 Some facts: the human genome has 20,000-30,000 genes and ~500,000 proteins. At a given time, about 10,000 proteins are present in a cell (the proteome). An estimated >80% of proteins interact. The network includes hubs.

45 Large scale studies of protein-protein interactions (PPIs) give very noisy data: 40-80% of interactions are false negatives (true interactions that are unidentified). 30-60% of interactions are false positives (interactions that are inferred but are not real).

46 Method 1: affinity tag purification of complexes in vivo. Say we want to know what interacts with protein X. We construct a plasmid in which the gene coding for X, which will be used as bait (blue), is fused to a known tag (white).

47 Method 1: affinity tag purification of complexes in vivo. In the cell, protein X (the bait) fused to the tag is expressed and interacts with some proteins. The cells are lysed and the protein complex is isolated using a solid support linked to a ligand that can interact with the tag.

48 Method 1: affinity tag purification of complexes in vivo. Bound proteins are eluted, separated on a gel and identified using mass spectrometry (MS). The method is biased towards proteins of high abundance.

49 Method 2: yeast two-hybrid system. Some transcription factors are composed of two domains: BD, which binds the DNA, and AD (in red), which activates transcription. The two domains need to interact in order for the gene to be expressed.

50 Yeast two-hybrid system. In order to check whether protein A (bait) interacts with protein B (prey), protein A is expressed fused to BD, and protein B fused to AD. Only if A and B interact will the reporter gene be expressed.

51 Protein-protein interactions are fundamental for functional annotation. If X interacts with Y & Y is known to be related to muscle development, maybe X is also related to muscle development. “Guilt by association”
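The "guilt by association" idea can be sketched as a majority vote over a protein's annotated interaction partners. The function name `predict_function`, the toy interaction list, and the annotations are hypothetical; given how noisy large-scale PPI data are, such a vote is at best a suggestion, not an established annotation method from the slides.

```python
from collections import Counter

def predict_function(protein, interactions, annotations):
    """Return the most common annotation among the protein's annotated partners, or None."""
    partners = {b if a == protein else a
                for a, b in interactions if protein in (a, b)}
    votes = Counter(annotations[p] for p in partners if p in annotations)
    return votes.most_common(1)[0][0] if votes else None

# Toy interaction network and known annotations (hypothetical data).
interactions = [("X", "Y"), ("X", "Z"), ("X", "W"), ("W", "Q")]
annotations = {
    "Y": "muscle development",
    "Z": "muscle development",
    "W": "DNA repair",
}
guess_x = predict_function("X", interactions, annotations)
guess_q = predict_function("Q", interactions, annotations)
print("X:", guess_x)
print("Q:", guess_q)
```

Here X inherits the annotation shared by most of its partners, while Q, with a single annotated partner, simply inherits that partner's annotation.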

