Jack Snoeyink & Matt O’Meara Dept. Computer Science UNC Chapel Hill.

Presentation transcript:

Jack Snoeyink & Matt O’Meara Dept. Computer Science UNC Chapel Hill

Collaborators
• Brian Kuhlman, UNC Biochem
• Many other members of the RosettaCommons
• Richardson lab, Duke Biochem
Funding
• NIH
• NSF

• Scientific Models, esp. for Structural Molecular Biology
  • Models are the lens through which we view data
  • Models are predominantly geometric
  • Computational models are complex
  • Models evolve, so testing becomes crucial
• Focus on statistical/computational models with
  • a sample source, observable local features, chosen functional form, fit parameters, & visualization/testing methods
• Capture assumptions and data used to build models to:
  • Visualize for making design decisions while building
  • Fit parameters to ensure best performance
  • Record as scientific benchmarks
Case Study: Rosetta protein structure prediction software [B]

• Scientific Models, esp. for Struct. Molec. Bio.
  • Why: Model as Lens
  • Who: Scientists as model builders
  • What: Geometric models
  • Which: Conceptual vs. Computational
  • Where: Evaluative vs. Predictive
  • When: Model evolution
  • How: Roles of models

• Physical and Conceptual models
  • Kept simple to aid understanding
• Statistical and Computational models
  • Evolve by combining simple models
  • Even when complex, can still be effective at Validation (MolProbity) or Prediction (Rosetta)

Spiral development, much like software
• Discover problematic features in some data
• Create an energy function to adjust them
• Fit parameters to improve results
• Check into the software as a new option
• Make default option if everyone likes it
• Occasionally refactor and rewrite, removing outdated or unused models
But less support for testing…

Our goal: Capture data and assumptions from model building for use in model visualization and testing.

Abstraction: A simple component of a complex computational model consists of:
• One or more sample sources giving
  • PDB files from native or decoys
• Observable local features having a
  • Hydrogen bond distances and angles
• Chosen functional form that
  • Energy from distances and angles
• Depends on fitting parameters
  • Weights for combining terms
[KMB’03]
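The abstraction above can be written down as a small data structure. The following is a minimal sketch, not Rosetta code: the class name `ModelComponent`, the function `toy_hbond_energy`, the parameter names, and all numeric values are hypothetical and chosen only to show how sample sources, features, a functional form, and fit parameters hang together.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Features = Dict[str, float]
Params = Dict[str, float]


@dataclass
class ModelComponent:
    """One simple term of a larger model (hypothetical structure)."""
    sample_sources: Dict[str, List[Features]]          # source name -> feature records
    feature_names: Sequence[str]                       # observable local features
    functional_form: Callable[[Features, Params], float]
    parameters: Params                                 # values that get fit

    def score(self, features: Features) -> float:
        return self.functional_form(features, self.parameters)

    def mean_score(self, source: str) -> float:
        records = self.sample_sources[source]
        return sum(self.score(r) for r in records) / len(records)


def toy_hbond_energy(features: Features, params: Params) -> float:
    # Toy form for illustration only: a quadratic well in distance
    # plus a penalty that grows as the angle departs from 180 degrees.
    d_term = (features["dist"] - params["ideal_dist"]) ** 2
    a_term = 1.0 - math.cos(math.radians(180.0 - features["angle"]))
    return params["w_dist"] * d_term + params["w_angle"] * a_term


# Made-up feature records standing in for values extracted from PDB files.
params = {"ideal_dist": 1.9, "w_dist": 1.0, "w_angle": 1.0}
component = ModelComponent(
    sample_sources={
        "native": [{"dist": 1.9, "angle": 170.0}, {"dist": 2.0, "angle": 160.0}],
        "decoy": [{"dist": 2.6, "angle": 120.0}],
    },
    feature_names=("dist", "angle"),
    functional_form=toy_hbond_energy,
    parameters=params,
)
```

Keeping the four parts as separate fields is what makes it possible to swap the sample source or refit the weights without touching the functional form.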

[Figure: Tool schematic. Labels: data set A, data set B, ..., data set Z; gather features; filter; transform; SQL query; ggplot2 spec; plots; statistics.]
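The stages named in the schematic (gather features, filter with an SQL query, then compute statistics) can be sketched with Python's standard `sqlite3` module. This is an illustrative sketch only: the table name `hbond_features`, its columns, the helper functions, and the numbers are all assumptions, and the plotting stage (done with a ggplot2 spec in the schematic) is left out.

```python
import sqlite3
import statistics


def gather_features(conn, sample_source, records):
    """Store (distance, angle) records for one sample source."""
    conn.executemany(
        "INSERT INTO hbond_features (sample_source, dist, angle) VALUES (?, ?, ?)",
        [(sample_source, dist, angle) for dist, angle in records],
    )


def filtered_distances(conn, min_angle):
    """Filter with an SQL query, then group distances by sample source."""
    rows = conn.execute(
        "SELECT sample_source, dist FROM hbond_features "
        "WHERE angle >= ? ORDER BY sample_source, dist",
        (min_angle,),
    ).fetchall()
    grouped = {}
    for source, dist in rows:
        grouped.setdefault(source, []).append(dist)
    return grouped


def summarize(grouped):
    """Per-source count and mean, standing in for the statistics output."""
    return {
        source: {"n": len(values), "mean": statistics.fmean(values)}
        for source, values in grouped.items()
    }


# In-memory database with made-up feature values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hbond_features (sample_source TEXT, dist REAL, angle REAL)"
)
gather_features(conn, "data_set_A", [(1.9, 165.0), (2.0, 150.0), (2.4, 110.0)])
gather_features(conn, "data_set_B", [(2.1, 155.0), (2.3, 140.0)])

grouped = filtered_distances(conn, min_angle=120.0)
summary = summarize(grouped)
```

Storing features once and querying them many times keeps the filter step declarative, so the same database can feed different plots and tests.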

Implemented tools
• Compare distributions from sample sources
• Tufte’s small multiples via ggplot
• Kernel density estimation
• Normalization
Opportunities for
• Statistical analysis
• Dimension reduction
…
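As a rough illustration of comparing distributions with kernel density estimation, here is a plain-Python Gaussian KDE evaluated on a grid. It is a sketch under assumptions: the bandwidth, grid, and sample values are made up, and "normalization" here only means rescaling a curve so it integrates to one over the grid, which may be narrower than what the slide intends.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels at each sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def evaluate_on_grid(density, lo, hi, steps):
    """Evaluate a density at the midpoints of equal-width bins."""
    width = (hi - lo) / steps
    xs = [lo + (i + 0.5) * width for i in range(steps)]
    return xs, [density(x) for x in xs], width


def normalize(ys, width):
    """Rescale so the curve integrates to one over the grid."""
    area = sum(ys) * width
    return [y / area for y in ys]


# Made-up distance samples from two sample sources.
native = [1.8, 1.9, 1.9, 2.0, 2.1]
decoy = [2.0, 2.3, 2.5, 2.6, 2.9]

xs, native_ys, width = evaluate_on_grid(gaussian_kde(native, 0.1), 1.0, 4.0, 300)
_, decoy_ys, _ = evaluate_on_grid(gaussian_kde(decoy, 0.1), 1.0, 4.0, 300)
native_ys = normalize(native_ys, width)
decoy_ys = normalize(decoy_ys, width)

# Location of the highest density for each source.
native_peak = xs[native_ys.index(max(native_ys))]
decoy_peak = xs[decoy_ys.index(max(decoy_ys))]
```

Evaluating both sources on the same grid is what lets the curves be overlaid or placed side by side as small multiples.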

[KMB’03] Histogram of Hbond A-H distances in natives

Scientific unit tests
• native, HEAD, ^HEAD
• run on continuous testing server
Knowledge-based score term creation
• native, release, experimental
• turn exploration into living benchmarks
Test design hypotheses
• native, protocol, designs
• how strange is this geometry?
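One way to picture a scientific unit test is as a check that a feature distribution from the current code stays close to the distribution from native structures. The sketch below uses a two-sample Kolmogorov-Smirnov statistic as the comparison; that choice, the threshold `max_ks`, the function names, and the sample values are all assumptions for illustration and are not taken from the talk.

```python
import bisect


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between ECDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    largest_gap = 0.0
    # The largest gap between two step functions occurs at a sample point.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def scientific_unit_test(reference, candidate, max_ks):
    """Pass if the candidate distribution stays within max_ks of the reference."""
    d = ks_statistic(reference, candidate)
    return d <= max_ks, d


# Made-up hydrogen bond distances for three sample sources.
native = [1.8, 1.9, 1.9, 2.0, 2.1]
head = [1.8, 1.9, 2.0, 2.0, 2.1]       # current revision, close to native
drifted = [2.3, 2.4, 2.5, 2.6, 2.7]    # a revision that shifted the geometry

head_ok, head_d = scientific_unit_test(native, head, max_ks=0.3)
drift_ok, drift_d = scientific_unit_test(native, drifted, max_ks=0.3)
```

Run against every revision, a check of this kind would flag a change that shifts the geometry even when ordinary software tests still pass.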

Rotamer recovery

• Scientific Models, esp. for Structural Molecular Biology
  • Models are the lens through which we view data
  • Models are predominantly geometric
  • Computational models are complex
  • Models evolve, so testing becomes crucial
• Focus on statistical/computational models with
  • a sample source, observable local features, chosen functional form, fit parameters, & visualization/testing methods
• Capture assumptions and data used to build models to:
  • Visualize for making design decisions while building
  • Fit parameters to ensure best performance
  • Record as scientific benchmarks
Case Study: Rosetta protein structure prediction software [B]

Visualization plays important roles in all stages of the life cycle of computational models