Jeff LeJeune
Bioinformatics and Data Management
Lecture Goals
Define bioinformatics
Explore NCBI's website
Introduce some useful tools available in Microsoft Excel
Review other computer software used in molecular epidemiology
Bioinformatics Defined
A systematic approach to storing, classifying, and analyzing data, information, and metadata that allows for the acquisition of knowledge and enhances the understanding of biological systems.
Data
Numbers derived from observations, experiments, or calculations
Examples:
  – Positive/Negative
  – A, T, G, C
  – Ill/healthy
Information
Data in context: data with associated explanations and interpretations.
Examples: publications, available sequence data, and other freely available material.
Metadata
Data about data
The context in which information is used
One application's metadata is another application's data
Examples: descriptive studies, reference lists, sources, GenBank accession numbers.
Knowledge and Understanding
This is what we're all doing here!
Advancement of science
Solving of problems
Bioinformatics Defined
Data
Information
Metadata
Knowledge & Understanding
Data Analysis Tools
NCBI
  – Bibliographic information
  – Sequence analysis (nucleotide, protein)
  – Other
Data scanning using Microsoft Excel
Other tools
  – Gel comparisons
  – Spatial data (GIS)
  – Temporal data
  – Statistical considerations
On-line Tools Available at NCBI
BLAST Searches
Microsoft Excel
Easily available
Filters
Pivot Table Reports and Charts
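For readers who want to see what a filter and a pivot table of counts actually compute, the two operations can be sketched outside Excel. This is a minimal illustration using only the Python standard library. The isolate records, the column names (farm, serotype, result), and the two helper functions are hypothetical and do not come from the lecture.

```python
from collections import Counter

# Hypothetical isolate records: one row per sample, one entry per column.
ROWS = [
    {"farm": "A", "serotype": "O157", "result": "Positive"},
    {"farm": "A", "serotype": "O26", "result": "Negative"},
    {"farm": "B", "serotype": "O157", "result": "Positive"},
    {"farm": "B", "serotype": "O157", "result": "Negative"},
    {"farm": "A", "serotype": "O157", "result": "Positive"},
]


def filter_rows(rows, column, value):
    """Keep only rows whose `column` equals `value`, like a spreadsheet filter."""
    return [row for row in rows if row[column] == value]


def pivot_count(rows, row_field, col_field):
    """Count rows for each (row_field, col_field) pair, like a pivot table of counts."""
    return Counter((row[row_field], row[col_field]) for row in rows)


positives = filter_rows(ROWS, "result", "Positive")
table = pivot_count(ROWS, "farm", "result")

# Print the cross-tabulation, one (farm, result) cell per line.
for (farm, result), count in sorted(table.items()):
    print(farm, result, count)
```

The sketch relies on each record holding exactly one value per column, which is also what lets Excel filters and pivot tables work predictably.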
Other Software
ClustalW: multiple sequence alignments
BioNumerics, which handles several data types:
  – Fingerprint types
  – Character types
  – Sequence types
  – 2-D gel types
  – Matrix types
Geographic Information System (GIS) software
Cardinal Rules of Data Management
I. Save your data
II. Back up your data
III. Record the databases and program versions used
IV. Write down sequence numbers
V. Record program parameters
VI. Use E-values
VII. Double-check your results visually
VIII. In spreadsheets, one entry per column
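Rules III, V, and VI can be combined in a small script that stores the program, database, and parameters alongside the hits that pass an E-value cutoff. The sketch below is only one possible approach. The program name, version strings, database name, hit list, and the 1e-5 cutoff are all placeholder assumptions, not values recommended in the lecture.

```python
import json
from datetime import date

# Hypothetical run record: every value here is a placeholder to be replaced
# with the program, database, and parameters actually used.
run_record = {
    "date": date.today().isoformat(),
    "program": "example-search-tool",
    "program_version": "0.0.0",
    "database": "example-db",
    "database_version": "unknown",
    "parameters": {"evalue_cutoff": 1e-5},
}

# Hypothetical search hits as (sequence identifier, E-value) pairs.
hits = [("seq1", 3e-40), ("seq2", 2e-6), ("seq3", 0.5)]


def significant_hits(hits, cutoff):
    """Return the hits whose E-value is at or below the cutoff."""
    return [(name, evalue) for name, evalue in hits if evalue <= cutoff]


kept = significant_hits(hits, run_record["parameters"]["evalue_cutoff"])
run_record["hits_kept"] = [name for name, _ in kept]

# Serialize the record so it can be saved next to the results.
record_text = json.dumps(run_record, indent=2, sort_keys=True)
print(record_text)
```

Saving the printed text in a file beside the results keeps the analysis reproducible. The results should still be checked visually, as rule VII asks.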