Comparing protein structure and sequence similarities Sumi Singh Sp 2015.


Learning goals
To get a good understanding of the vector space model. To be able to compute the similarity between documents. To be able to rank the output documents by their similarity to a query document.

Dataset
Proteins are made up of amino acid sequences of various lengths; the average length is 300 amino acids. There are 20 possible amino acids in total. Proteins are represented in a specific format called the PDB format (discussed later). PDB stands for Protein Data Bank, a very large online repository of protein structures.

Protein Data Bank (PDB)
The Protein Data Bank (PDB) is a large online database that keeps various information on proteins, including sequence information.
Web address:
PDB ID: A 4-character PDB ID is assigned to each new structure at the time of deposition. The IDs are assigned automatically and carry no meaning. However, they serve as the unique, immutable identifier of each entry in the Protein Data Bank. As such, they are used throughout the scientific literature (e.g. in journal articles and in other databases) to refer to entries in the Protein Data Bank. Hence, if the PDB ID of an entry is known, it is the most direct way to retrieve that entry from the database.
How to get a protein file using a PDB ID? Go to the link below for access details. Use the link below with wget to get the uncompressed PDB file for a given protein: pdb/files/xxxx.pdb, where xxxx is the 4-character PDB ID of the protein.
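
The retrieval step could also be scripted instead of calling wget by hand. The following is a minimal sketch, not the course's required method: `pdb_file_url` and `download_pdb` are hypothetical helper names, and `base_url` is assumed to be whatever address ends in the `pdb/files` location given on the slide, supplied by the caller rather than hard-coded here.

```python
import re
import urllib.request
from pathlib import Path


def pdb_file_url(base_url: str, pdb_id: str) -> str:
    """Build the URL of the uncompressed PDB file for a 4-character PDB ID.

    base_url is an assumption: the caller passes the server location that
    ends in ".../pdb/files" (see the slide); it is not filled in here.
    """
    pdb_id = pdb_id.strip()
    if not re.fullmatch(r"[0-9A-Za-z]{4}", pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return f"{base_url.rstrip('/')}/{pdb_id}.pdb"


def download_pdb(base_url: str, pdb_id: str, dest_dir: str) -> Path:
    """Download xxxx.pdb into dest_dir and return the local path."""
    dest = Path(dest_dir) / f"{pdb_id.strip()}.pdb"
    with urllib.request.urlopen(pdb_file_url(base_url, pdb_id)) as response:
        dest.write_bytes(response.read())
    return dest
```

Validating the ID before building the URL keeps malformed user input from ever reaching the network call.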

What to extract
A protein is made up of amino acids. There are ONLY 20 possible amino acids, and each is represented by its three-letter abbreviation. To get the sequence information of a protein, you need to extract the amino acids from the PDB file for that protein.

Sequence information: how to extract
For each PDB file corresponding to a given protein, get the THREE-letter amino acid code from the residue-name column of every line that satisfies the following criteria:
– The record name is ATOM (columns 1-6)
– The atom name is CA (columns 13-16)
Many amino acids will appear repeatedly in the extracted sequence.
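
The two criteria above can be sketched as a small filter over the lines of a PDB file. This is an illustrative sketch: `extract_ca_residues` is a hypothetical name, and reading the residue name from columns 18-20 is an assumption taken from the fixed-column ATOM record layout of the PDB format, since the slide does not state the column.

```python
def extract_ca_residues(pdb_lines):
    """Return the three-letter residue codes of all ATOM records whose
    atom name is CA, in file order."""
    residues = []
    for line in pdb_lines:
        record_name = line[0:6].strip()   # columns 1-6
        atom_name = line[12:16].strip()   # columns 13-16
        if record_name == "ATOM" and atom_name == "CA":
            # Assumption: residue name occupies columns 18-20.
            residues.append(line[17:20].strip())
    return residues


# Usage with a file on disk (the path is a placeholder):
# with open("xxxx.pdb") as handle:
#     sequence = extract_ca_residues(handle)
```

Because each residue has exactly one alpha-carbon (CA) atom, keeping only CA lines yields one code per residue. Files with several chains or alternate locations would need extra handling that this sketch leaves out.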

How to use the extracted information
Save the extracted sequence in a sequence repository so that it is available for future matches. Use the vector space model to represent each protein, with the amino acids as features. Use a distance/similarity measure to calculate the similarity of an unknown protein to the proteins stored locally.
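
One possible reading of this step, sketched below, is to represent each protein as a 20-dimensional vector of amino acid counts and compare vectors with cosine similarity. Both choices are assumptions made for illustration (the slide only asks for "a distance/similarity measure"), and `composition_vector` and `cosine_similarity` are hypothetical names.

```python
import math
from collections import Counter

# The 20 standard amino acids; each one is a feature (vector dimension).
AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]


def composition_vector(residues):
    """Map a list of three-letter codes to a 20-dimensional count vector."""
    counts = Counter(residues)
    return [counts[aa] for aa in AMINO_ACIDS]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


query = composition_vector(["MET", "GLY", "GLY", "ALA"])
stored = composition_vector(["GLY", "ALA", "MET", "GLY", "SER"])
score = cosine_similarity(query, stored)
```

Count vectors ignore residue order, so two proteins with the same composition but different sequences score as identical. Cosine similarity also ignores overall length, which may or may not be desirable for this assignment.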

Requirements of submission
A GUI that gives the user the option to enter a PDB ID. It checks whether the sequence of the protein with that ID is in the local directory/repository. If not, it gets the PDB file for that protein from the online database, extracts the sequence information, and saves it. It then performs the pairwise similarity calculation against the rest of the proteins in the local repository and displays the ranked output with respect to similarity.
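
The non-GUI part of this workflow might be organised as in the sketch below. It is illustrative only. The `.seq` file layout, the function names, and the two injected callables are all assumptions: `fetch_sequence(pdb_id)` is assumed to download the PDB file and return its list of three-letter codes, and `similarity(a, b)` is assumed to score two such lists.

```python
from pathlib import Path


def load_or_fetch(pdb_id, repo_dir, fetch_sequence):
    """Return the residue list for pdb_id, using the local repository first.

    On a miss, fetch_sequence(pdb_id) is called and its result is saved as
    <ID>.seq (a hypothetical, whitespace-separated file format).
    """
    repo = Path(repo_dir)
    repo.mkdir(parents=True, exist_ok=True)
    path = repo / f"{pdb_id.upper()}.seq"
    if path.exists():
        return path.read_text().split()
    residues = fetch_sequence(pdb_id)
    path.write_text(" ".join(residues))
    return residues


def rank_against_repository(pdb_id, repo_dir, similarity, fetch_sequence):
    """Return (other_id, score) pairs, most similar first."""
    query = load_or_fetch(pdb_id, repo_dir, fetch_sequence)
    scores = []
    for path in Path(repo_dir).glob("*.seq"):
        if path.stem == pdb_id.upper():
            continue  # do not compare the query protein with itself
        other = path.read_text().split()
        scores.append((path.stem, similarity(query, other)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Passing the download and similarity functions in as parameters keeps this logic testable without network access. A GUI front end would then only need to call `rank_against_repository` with the ID the user typed and display the returned list.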

References Vector space model: Distance measure: PDB format: format/format33/v3.3.html Contact: