Proteomics ABC 23,000 genes in the Genome but Dynamic Range
23,000 genes in the genome, but ca. 1,000,000 proteins, caused by:
- Exon splicing
- 300+ post-translational modifications
Dynamic range:
- Cell: 10^6
- Plasma: 10^12
The dynamic proteome:
- Temporal (milliseconds to months)
- Spatial (cell, organelle)
- Developmental (100+ cell types in the body, years)
All proteins exist in dynamic complexes. These complexes determine their function and are themselves highly dynamic.

While genomics has greatly facilitated proteomics projects, characterizing a proteome is considerably more complex than sequencing a genome. At the most basic level, there are far more proteins than genes in a eukaryotic organism. For example, humans possess approximately 25,000 genes, but are estimated to have between 200,000 and 2 million unique proteins. Many of these proteins are produced by alternative splicing, and the splice variants are likely to have nonoverlapping functions. In addition, the exact proteins that are expressed at any given moment depend on a person’s age, health, and environmental stimuli. To complicate matters further, the diverse chemical properties of proteins make it difficult to develop a “one size fits all” approach to characterizing the proteome. Instead, a wide variety of technologies is necessary.

The point here is that the genome deals with 42 molecules per cell. mRNA is found at a variable number of copies per cell. Both can be amplified using PCR. Proteins, however, cannot be amplified and are found at concentrations of between 1 and 1,000,000 copies per cell, or between 1 and 1,000,000,000,000 copies per litre in the blood.

The aim of the lecture is to introduce you to the basic methods used in modern proteomics research. Afterwards you should be able to understand current literature and papers that refer to the use of these techniques. It is a short overview; for a deeper introduction, please look at:
- Principles of Proteomics. R. M. Twyman. BIOS Scientific Publishers (2004)
- Principles and Practice of Biological Mass Spectrometry. C. Dass. Wiley-Interscience (2006)
If you are seriously considering using proteomics techniques in the lab, then the following (expensive) texts are highly recommended:
- Proteins and Proteomics: A Laboratory Manual. Richard Simpson. Cold Spring Harbor Laboratory Press (2002)
- Purifying Proteins for Proteomics: A Laboratory Manual. Richard Simpson. Cold Spring Harbor Laboratory Press (2004)
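The copy-number ranges quoted in the notes above can be turned into orders of magnitude with a few lines of Python. This is only an illustrative sketch: the function name `orders_of_magnitude` is an assumption made for this example, and the input numbers are the ones given in the notes.

```python
import math


def orders_of_magnitude(lowest, highest):
    """Return how many powers of ten separate the rarest and the most abundant species."""
    return math.log10(highest / lowest)


# Ranges taken from the notes: 1 to 1,000,000 copies per cell,
# and 1 to 1,000,000,000,000 copies per litre of blood.
cell_range = orders_of_magnitude(1, 1_000_000)
plasma_range = orders_of_magnitude(1, 1_000_000_000_000)

print(f"Cell:   about {cell_range:.0f} orders of magnitude")
print(f"Plasma: about {plasma_range:.0f} orders of magnitude")
```

Because proteins cannot be amplified the way DNA and mRNA can with PCR, an analytical method has to cope with this whole span directly.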

Gene Expression Central dogma of molecular biology
The one gene, one protein hypothesis is often presented together with the central dogma of molecular biology, in which DNA is transcribed into RNA and RNA is translated into protein. Although the hypothesis was a reasonable idea in principle, we now know that it is incorrect. Genome sequences have given us the list of parts that a single gene can use to create a working RNA copy. Combined with mRNA sequencing approaches such as SAGE (serial analysis of gene expression), which attempt to sequence and quantify all the mRNA molecules in a cell, they show that one gene usually gives rise to at least 10 different variants.

Gene Structure
In bacteria, with some exceptions, the one gene, one protein hypothesis more or less holds. In multicellular organisms, genes are much more complex and allow much greater flexibility. A single gene may give rise to hundreds of different mRNA transcripts, which can in turn give rise to proteins with vastly different functions depending on the environment of the gene. More recently, small microRNA strands have been found that can regulate gene expression, opening a vast new area of biological study.

Genetic Code
The basis of genomics is the use of three bases to code for each amino acid in a protein. The DNA is the information store, which is transcribed ("photocopied") as mRNA and sent to the ribosomes to be translated into protein. This code was cracked using synthetic polymers between the late 1950s and the mid-1960s. The triplets are called codons and correspond to the 20 amino acids (and, in certain circumstances, the 21st amino acid, selenocysteine). Of the 64 possible codons, one (AUG, which also codes for methionine) serves as the start signal and three (UAA, UAG and UGA) serve as stop signals.
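The bookkeeping of the triplet code can be checked with a short Python sketch. It builds the standard genetic code from the conventional U, C, A, G ordering and counts sense and stop codons. The variable names are chosen for this example only, and selenocysteine is left out because it is not part of the standard table.

```python
from itertools import product

# Standard genetic code, listed with each base position cycling through U, C, A, G.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
encoded_amino_acids = set(CODON_TABLE.values()) - {"*"}

print("Total codons:       ", len(CODON_TABLE))
print("Sense codons:       ", len(sense_codons))
print("Stop codons:        ", stop_codons)
print("Amino acids encoded:", len(encoded_amino_acids))
print("Start codon AUG ->  ", CODON_TABLE["AUG"])
```

Since 61 sense codons encode only 20 amino acids, most amino acids are specified by more than one codon.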

Codon Frequencies
[Figure: codon frequency in genes compared with amino acid frequency in proteins. Amino acids shown: L A S G E V K I T D R P N F Q Y M H C W]
The codon and amino acid frequencies correlate fairly well. The deviations arise because genes differ in how strongly they are expressed: some genes are highly transcribed and translated, which amplifies their contribution to the protein pool, whereas others are transcribed only infrequently. The one and three letter codes are commonly used and will be shown with the structures of the amino acids in later slides.

Proteomics: One gene, many proteins
[Figure: from one gene to many proteins. A gene (DNA; ~23,000 genes) is transcribed (gene expression) and alternatively spliced into mRNA forms A, B and C, which are translated into Protein A, Protein B and Protein C. Post-translational modifications such as phosphorylation (P) and glycosylation then add heterogeneity and conformational variety, for example Protein B as forms B1 to B4.]

Post-translational modifications
Proteolytic cleavage
- Fragmenting the protein
Addition of chemical groups
- Phosphorylation: activation and inactivation of enzymes
- Acetylation: protein stability, used in histones
- Methylation: regulation of gene expression
- Glycosylation: cell–cell recognition, signaling
- GPI anchor: membrane tethering
- Hydroxyproline: protein stability, ligand interactions
- Sulfation: protein–protein and ligand interactions
- Disulfide-bond formation: protein stability
- Deamidation: protein–protein and ligand interactions
- Ubiquitination: destruction signal
- Nitration of tyrosine: inflammation
Protein function may also be altered by posttranslational modifications. Posttranslational modifications are defined as any changes to the covalent bonds of a protein after it has been fully translated. These changes fall into two broad categories: proteolytic cleavage (i.e., fragmenting the protein) and the addition of chemical groups to one or more amino acids on the protein.

Plasma Components
- 40,000 forms of proteins secreted into plasma: 500 gene variants x 2 splices x 20 glycoforms x 2 clip forms
- ~500,000 forms of tissue proteins: 23,000 genes x 5 splice variants x 5 PTMs
- 10,000,000 clonal forms of immunoglobulins
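The estimates on this slide are simple products of the listed factors. The sketch below reproduces the arithmetic. The variable names are assumptions made for this example, and the factors are read off the slide as printed.

```python
from math import prod

# Factors as listed on the slide.
secreted_forms = prod([500, 2, 20, 2])   # gene variants x splices x glycoforms x clip forms
tissue_forms = prod([23_000, 5, 5])      # genes x splice variants x PTMs
immunoglobulin_forms = 10_000_000        # clonal forms, quoted directly on the slide

total_forms = secreted_forms + tissue_forms + immunoglobulin_forms

print(f"Secreted plasma proteins: {secreted_forms:,}")
print(f"Tissue proteins:          {tissue_forms:,}")
print(f"Immunoglobulins:          {immunoglobulin_forms:,}")
print(f"Total distinct forms:     {total_forms:,}")
```

The product for tissue proteins comes to 575,000, so the 500,000 quoted on the slide appears to be a rounded figure.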

Plasma Protein Composition

Genetic Component of Variation

Dynamic Range of Plasma

Protein Structure
Protein structure can be divided into:
- Primary (amino acid sequence)
- Secondary (local folding structure)
- Tertiary (overall fold of the amino acid chain)
- Quaternary (subunits composing the functional protein)
mRNA: 5’-AUGGCUUGUUUACGAAUU...-3’
3 letter code: NH2-Met-Ala-Cys-Leu-Arg-Ile-...-COOH
1 letter code: MACLRI...
In theory, by knowing the gene sequence one can predict the proteins that can be encoded by that gene. If we ignore any post-translational chemical modifications, it should also be possible to predict the three-dimensional structure from the primary sequence alone. So far this has not been possible with any degree of confidence. However, with the rapid increase in the number of experimentally determined structures appearing in the databanks, it should only be a matter of time until robust algorithms are developed.
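The step from mRNA to primary structure shown above can be sketched in Python. The function name `translate` and the lookup tables are written for this example only; the codon assignments are those of the standard genetic code.

```python
from itertools import product

# Standard genetic code with each base position cycling through U, C, A, G; "*" is a stop.
BASES = "UCAG"
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), ONE_LETTER)
}

# One letter to three letter amino acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at the first stop codon."""
    residues = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


one_letter = translate("AUGGCUUGUUUACGAAUU")
three_letter = "-".join(THREE_LETTER[aa] for aa in one_letter)

print(one_letter)
print(three_letter)
```

This recovers only the primary structure. It says nothing about splicing, post-translational modification or folding, which is why the later levels of structure are so much harder to predict.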

Hydrophobic Amino Acids
Aliphatic, Aromatic, Sulphur-containing, Neutral

Hydrophilic Amino Acids
Polar, Charged, Partially Charged

Acid-Base Properties of Amino Acids
All amino acids have acidic and basic functional groups:
- The carboxyl group is acidic
- The amino group is basic
Amino acids that lack charged R groups are zwitterions at neutral pH.
Aspartic and glutamic acids are negatively charged at neutral pH.
Arginine and lysine are positively charged at neutral pH.
[Figure: alanine in its uncharged form, H2N-CH(CH3)-COOH, and as a zwitterion, +H3N-CH(CH3)-COO-]

What is pKa?
Amino acid       pK1    pK2    pKR     pI
Glycine          2.34   9.78   -       6.06
Alanine          2.35   9.69   -       6.02
Isoleucine       2.36   -      -       -
Serine           2.21   9.15   -       5.68
Aspartic Acid    2.09   9.82   3.86    2.97
Asparagine       2.02   8.80   -       5.41
Glutamic Acid    2.19   9.67   4.25    3.22
Glutamine        2.17   9.13   -       5.65
Arginine         -      9.04   12.48   10.76
Lysine           2.18   8.95   10.53   9.74
- The pKa of a functional group is the pH at which the acidic or basic group is ionised on 50% of the molecules in a solution.
- Amino acids can ionise at their N-terminal amino group, their C-terminal carboxyl group and sometimes their side chains.
- At neutral pH (7), the side-chain charges are: Asp, Glu -1; His +1/0; Cys 0/-1; Arg, Lys +1; Tyr 0
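The relationship between pKa, charge and pI in the table can be explored with a small Python sketch based on the Henderson-Hasselbalch equation. The function names are assumptions made for this example, and the sketch assumes that pK1 is the carboxyl group, pK2 the amino group and pKR the side chain. It treats each group as ionising independently, which is a simplification.

```python
def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a group that has lost its proton at a given pH."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


def net_charge(acidic_pkas, basic_pkas, ph):
    """Average net charge: acidic groups go 0 -> -1, basic groups go +1 -> 0 on deprotonation."""
    negative = sum(fraction_deprotonated(pka, ph) for pka in acidic_pkas)
    positive = sum(1.0 - fraction_deprotonated(pka, ph) for pka in basic_pkas)
    return positive - negative


def isoelectric_point(acidic_pkas, basic_pkas):
    """pI as the mean of the two pKa values that bracket the neutral (zero net charge) form."""
    ordered = sorted(acidic_pkas + basic_pkas)
    n_basic = len(basic_pkas)
    return (ordered[n_basic - 1] + ordered[n_basic]) / 2


# pKa values copied from the table above: (acidic groups, basic groups).
AMINO_ACIDS = {
    "Glycine": ([2.34], [9.78]),
    "Aspartic Acid": ([2.09, 3.86], [9.82]),
    "Lysine": ([2.18], [8.95, 10.53]),
}

# At pH = pKa exactly half of the molecules are ionised.
print("Fraction ionised at pH = pKa:", fraction_deprotonated(2.34, 2.34))

for name, (acidic, basic) in AMINO_ACIDS.items():
    pi = isoelectric_point(acidic, basic)
    charge = net_charge(acidic, basic, 7.0)
    print(f"{name}: pI = {pi:.2f}, net charge at pH 7 = {charge:+.2f}")
```

The calculated pI values agree with the pI column of the table to within rounding, and the net charges at pH 7 come out close to 0 for glycine, -1 for aspartic acid and +1 for lysine.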

Primary Structure

Secondary Structure - Alpha Helices

Secondary Structure - Beta Sheets

Tertiary and Quaternary Structure
- Tertiary structure: the fold of a given chain
- Quaternary structure: the protein functional unit

The Four Levels of Protein Structure
