Structure Modeling and Bioimage Informatics
Unit 26
BIOL221T: Advanced Bioinformatics for Biotechnology
Irene Gabashvili, PhD
Abstracts: approximate guidelines
- Motivation: Why do we care? (importance, difficulty, impact)
- Problem statement: What problem are you trying to solve? What is the scope of your work?
- Approach: How did you go about solving or making progress on the problem? What was the extent of your work?
- Results: What's the answer?
Abstracts
- Limits: paragraph, ~ words, one double-spaced page…
- More to include:
  - Numbers, if possible: how many genes, SNPs, sequence identity; xx percent faster, cheaper, smaller, better
  - Conclusions: What are the implications? Have you found a path to change the world, was it a nice hack, or a road sign indicating that this path is a waste of time (all is useful!)? Can you generalize?
How will projects be graded?
- Originality, structure, and scope
- No copy/paste from the web, but it's OK to reference the source (publications and websites)
Proteins play key roles in a living system
Three examples of protein functions:
- Catalysis: Almost all chemical reactions in a living cell are catalyzed by protein enzymes. Example: alcohol dehydrogenase oxidizes alcohols to aldehydes or ketones.
- Transport: Some proteins transport various substances, such as oxygen and ions. Example: haemoglobin carries oxygen.
- Information transfer: For example, hormones. Insulin controls the amount of sugar in the blood.
Amino Acid versus Residue
[Figure: a free amino acid, H2N-CH(R)-COOH, shown next to a residue, -NH-CH(R)-CO-, as it appears within a polypeptide chain]
Amino acid: Basic unit of protein
[Figure: an amino acid, with an amino group (NH3+) and a carboxylic acid group (COO-) attached to a central carbon that also carries H and the side chain R]
Different side chains, R, determine the properties of the 20 amino acids.
The DSSP code ("Dictionary of Protein Secondary Structure")
- G = 3-turn helix (3₁₀ helix). Min length 3 residues.
- H = 4-turn helix (alpha helix). Min length 4 residues.
- I = 5-turn helix (pi helix). Min length 5 residues.
- T = hydrogen-bonded turn (3, 4 or 5 turn)
- E = beta sheet in parallel and/or anti-parallel sheet conformation (extended strand). Min length 2 residues.
- B = residue in isolated beta-bridge (single pair beta-sheet hydrogen bond formation)
- S = bend (the only non-hydrogen-bond based assignment)
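The one-letter codes above lend themselves to a simple lookup table. The following is a minimal Python sketch, not part of DSSP itself: the names `DSSP_CODES`, `THREE_STATE`, and `reduce_to_three_states` are hypothetical, and the reduction to three states (H, E, C) is one commonly used convention assumed here rather than something stated on the slide.

```python
# One-letter DSSP codes as listed on the slide.
DSSP_CODES = {
    "G": "3-10 helix",
    "H": "alpha helix",
    "I": "pi helix",
    "T": "hydrogen-bonded turn",
    "E": "extended strand (beta sheet)",
    "B": "isolated beta-bridge",
    "S": "bend",
}

# Assumed reduction (not from the slide): helices -> H,
# strands and bridges -> E, everything else -> C (coil).
THREE_STATE = {"G": "H", "H": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three_states(dssp_string):
    """Map a per-residue DSSP string onto the assumed H/E/C alphabet."""
    return "".join(THREE_STATE.get(code, "C") for code in dssp_string)


if __name__ == "__main__":
    assignment = "HHGGTSEEB "  # made-up example string, one code per residue
    for code in sorted(set(assignment.strip())):
        print(code, "=", DSSP_CODES[code])
    print(reduce_to_three_states(assignment))
```

Residues without any DSSP code (written as a blank) fall into the coil class in this sketch.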
Protein structure
Primary structure (amino acid sequence)
↓
Secondary structure (α-helix, β-sheet)
↓
Tertiary structure (three-dimensional structure formed by assembly of secondary structures)
↓
Quaternary structure (structure formed by more than one polypeptide chain)
20 Amino acids
Glycine (G), Glutamic acid (E), Aspartic acid (D), Methionine (M), Threonine (T), Serine (S), Glutamine (Q), Asparagine (N), Tryptophan (W), Phenylalanine (F), Cysteine (C), Proline (P), Leucine (L), Isoleucine (I), Valine (V), Alanine (A), Histidine (H), Lysine (K), Tyrosine (Y), Arginine (R)
[Color legend on the slide: yellow = hydrophobic, green = hydrophilic, red = acidic, blue = basic]
Proteins are linear polymers of amino acids
[Figure: two amino acids, +H3N-CH(R1)-COO- and +H3N-CH(R2)-COO-, condense with release of H2O to form a peptide bond; repeating this gives a chain +H3N-CH(R1)-CO-NH-CH(R2)-CO-NH-CH(R3)-CO-]
- A carboxylic acid group condenses with an amino group, releasing a water molecule and forming a peptide bond.
- The amino acid sequence is called the primary structure, e.g. A A F N G G S T S D K.
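Because each peptide bond releases one water molecule, a chain of n amino acids contains n - 1 peptide bonds and weighs n - 1 waters less than its free amino acids combined. The sketch below illustrates that bookkeeping in Python. The function names are hypothetical, and the masses are approximate average values for only three amino acids, included purely for illustration.

```python
# Approximate average masses (g/mol) of FREE amino acids.
# Illustrative subset only (assumption: values rounded to 2 decimals).
FREE_AMINO_ACID_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}
WATER_MASS = 18.02  # approximate mass of one H2O


def peptide_bond_count(sequence):
    """A linear chain of n residues has n - 1 peptide bonds."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence):
    """Sum of free amino acid masses minus one water per peptide bond."""
    total = sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence)
    return total - peptide_bond_count(sequence) * WATER_MASS


if __name__ == "__main__":
    print(peptide_bond_count("GAS"))
    print(round(peptide_mass("GA"), 2))
```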
Amino acid sequence is encoded by DNA base sequence in a gene
DNA molecule = DNA base sequence:
・CGCGAATTCGCG・
・GCGCTTAAGCGC・
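The two strands shown on the slide are base-paired complements of each other (A with T, C with G). A minimal Python sketch of that relationship follows; the function names are hypothetical and only the four unambiguous bases are handled.

```python
# Watson-Crick pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Return the base-paired partner, written in the same left-to-right order."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand):
    """Return the partner strand read in its own 5'->3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "CGCGAATTCGCG"  # sequence from the slide
    print(complement(top))
    print(reverse_complement(top))
```

For the slide's sequence, the reverse complement equals the sequence itself, so both strands read the same in their own 5'→3' direction.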
Amino acid sequence is encoded by DNA base sequence in a gene
First   Second letter                                   Third
letter  T         C         A          G                letter
T       TTT Phe   TCT Ser   TAT Tyr    TGT Cys          T
        TTC Phe   TCC Ser   TAC Tyr    TGC Cys          C
        TTA Leu   TCA Ser   TAA Stop   TGA Stop         A
        TTG Leu   TCG Ser   TAG Stop   TGG Trp          G
C       CTT Leu   CCT Pro   CAT His    CGT Arg          T
        CTC Leu   CCC Pro   CAC His    CGC Arg          C
        CTA Leu   CCA Pro   CAA Gln    CGA Arg          A
        CTG Leu   CCG Pro   CAG Gln    CGG Arg          G
A       ATT Ile   ACT Thr   AAT Asn    AGT Ser          T
        ATC Ile   ACC Thr   AAC Asn    AGC Ser          C
        ATA Ile   ACA Thr   AAA Lys    AGA Arg          A
        ATG Met   ACG Thr   AAG Lys    AGG Arg          G
G       GTT Val   GCT Ala   GAT Asp    GGT Gly          T
        GTC Val   GCC Ala   GAC Asp    GGC Gly          C
        GTA Val   GCA Ala   GAA Glu    GGA Gly          A
        GTG Val   GCG Ala   GAG Glu    GGG Gly          G
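The codon table can be applied mechanically: read the coding strand three bases at a time and look up each codon. The sketch below is a minimal Python illustration using the standard genetic code in T/C/A/G order; the names `CODON_TABLE` and `translate` are hypothetical, and stopping at the first stop codon is a simplifying assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one-letter amino acids, codons ordered
# TTT, TTC, TTA, TTG, TCT, ... GGG ("*" marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):  # complete codons only
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGTTTGGTTAA"))  # made-up example sequence
```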
Gene is protein’s blueprint, genome is life’s blueprint
[Figure: the genome (DNA) contains genes; each gene encodes a protein]
Gene is protein’s blueprint, genome is life’s blueprint
[Figure: genome, gene, protein, and the glycolysis network]
Each protein has a unique structure
Amino acid sequence:
NLKTEWPELVGKSVEE
AKKVILQDKPEAQIIVL
PVGTIVTMEYRIDRVR
LFVDKLDNIAEVPRVG
↓
Folding!
Basic structural units of proteins: Secondary structure
[Figure: α-helix and β-sheet]
Secondary structures, α-helix and β-sheet, have regular hydrogen-bonding patterns.
Three-dimensional structure of proteins
[Figure panels: tertiary structure, quaternary structure]
Close relationship between protein structure and its function
[Figure: example of an enzyme reaction. Of the substrates A and B, the enzyme's shape matches A: it binds A, then digests it.]
Other examples: hormone receptor, antibody
More Links
- BLOCKS: mepage.html
- Eva: Cubic.bioc.columbia.edu/eva
- Jpred
- LOC3D: cubic.bioc.columbia.edu/db/LOC3D
- Pfam
More Links
- PredictProtein
- ProfTMB
- PROSITE
- ProtFun
- PSIPRED
- PSORT
- SAM-T99 (discontinued)
- SOSUI
- TargetP
Databases
- PDB
- MSD
- MMDB
- PDBSum
- TargetDB: targetdb.pdb.org/
PDBsum
PDBsum provides an at-a-glance overview of every macromolecular structure deposited in the Protein Data Bank (PDB), giving schematic diagrams of the molecules in each structure and of the interactions between them.
- srv/databases/pdbsum/
- srv/databases/pdbsum/GetPage.pl
More links
- AbCheck: Antibody Sequence Test
- Atlas of protein side chain interactions: /index.html#
- The beta-turn prediction server: ex.html
More links
- CATH: protein structure classification
- Protein Ligand Interactions
More links
- DB Browser, including protein sequence/structure DBs
- Dictionary of Homologous Superfamilies
- PROCAT, a DB of 3D enzyme active site templates: /PROCAT.html
More links
- DOMPLOT: annotation by ligands
- Enzymes Structure database: ndex.html
- Gene3D
More links
- The Scorecons Server (scores residue conservation in a multiple sequence alignment): bin/valdar/scorecons_server.pl
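To make "residue conservation in a multiple sequence alignment" concrete, the sketch below scores each alignment column by one simple measure: 1 minus the normalized Shannon entropy of the residues in that column. This is an assumed, generic illustration in Python and is not the scoring scheme used by the Scorecons server; all function names are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column):
    """Return 1 - (Shannon entropy / log2(20)); 1.0 means fully conserved.

    Gap characters ('-') are ignored (a simplifying assumption).
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(20)  # 20 possible amino acids


def alignment_conservation(alignment):
    """Score every column of an alignment given as equal-length strings."""
    return [column_conservation(col) for col in zip(*alignment)]


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "AGD"]  # made-up three-sequence alignment
    for position, score in enumerate(alignment_conservation(toy_alignment), 1):
        print(position, round(score, 3))
```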
3D enzyme active site templates
- PROCAT: /PROCAT.html
- PROCAT has now been superseded by the Catalytic Site Atlas: srv/databases/CSA/
More Links
- Protein Nucleic Acid interaction Server: er/
- Protein DNA interaction, tax: prot_dna.html
- SAS (Sequences Annotated by Structure): srv/databases/sas/
More Links
- NACCESS: calculates residue accessibilities
- The SURFNET program generates surfaces and void regions between surfaces from coordinate data supplied in a PDB file: /surfnet.html
Prediction
- Homology modeling: generally needs >30% sequence identity to a protein of known structure
- Threading: picks up where homology leaves off
- Ab initio structure prediction
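The 30% figure refers to sequence identity between the target and a candidate template. A minimal Python sketch of that calculation follows, assuming the two sequences have already been aligned (gaps written as '-'). The function names are hypothetical, and counting identity over gap-free columns is one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def suggests_homology_modeling(aligned_a, aligned_b, threshold=30.0):
    """Apply the slide's >30% rule of thumb (threshold is adjustable)."""
    return percent_identity(aligned_a, aligned_b) > threshold


if __name__ == "__main__":
    target = "NLKTEWPELVGK"    # first residues of the sequence shown earlier
    template = "NLRTEW-ELIGK"  # made-up template, for illustration only
    print(round(percent_identity(target, template), 1))
    print(suggests_homology_modeling(target, template))
```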
Validation
- DSSP
- PROCHECK: ck/procheck.html
- VADAR
- Verify3D
Visualization
- Cn3D
- UCSF Chimera (MidasPlus)
- Rasmol, ProteinExplorer
Bioimaging
- NIH sites for image processing software:
  - NIH IMAGE: http://rsb.info.nih.gov/nih-image/
- Spider & Web
- EMAN
DICOM
- The Digital Imaging and Communications in Medicine standard
- For all medical imaging modalities, such as CT scans, MRIs, and ultrasound
- All image files that comply with Part 10 of the DICOM standard (available in DocSharing) are DICOM format files
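A Part 10 file begins with a 128-byte preamble followed by the four ASCII characters "DICM", which gives a quick way to recognize one. The Python sketch below checks only that signature, using the standard library; the function names are hypothetical, and a real application would use a dedicated DICOM library to read the data set itself.

```python
import io


def is_dicom_part10(stream):
    """True if a binary stream starts with a 128-byte preamble plus b'DICM'."""
    header = stream.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"


def is_dicom_file(path):
    """Apply the same signature check to a file on disk."""
    with open(path, "rb") as handle:
        return is_dicom_part10(handle)


if __name__ == "__main__":
    # In-memory stand-ins for files, for illustration only.
    fake_dicom = io.BytesIO(b"\x00" * 128 + b"DICM" + b"remaining data")
    not_dicom = io.BytesIO(b"plain text, not an image")
    print(is_dicom_part10(fake_dicom))
    print(is_dicom_part10(not_dicom))
```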
Disease models
- Humans: mutant gene → mutant or missing protein → mutant phenotype (disease)
- Animal models: mutant gene → mutant or missing protein → mutant phenotype (disease model)
[Figure panels: SHH -/+, SHH -/-, shh -/+, shh -/-]