1 3. Chemical Data and Data Bases

2 Datasets and Databases
Many small datasets are available
Several commercial databases of compounds and reactions (e.g. CAS)
Large but not comprehensive public databases of compounds are just starting to become available
As of today, there is no large public database of reactions

3 Data: Small Datasets (examples)
Mutag (Mutagenicity)
– 200 compounds (125/63), mutagenicity in Salmonella
PTC (Predictive Toxicity Challenge)
– A few hundred compounds, carcinogenicity (FM, MM, FR, MR)
NCI (Anti-cancer activity)
– 70,000 compounds screened for ability to inhibit growth in 60 human tumor cell lines
Alkanes (Boiling points)
– All 150 non-cyclic alkanes (C_nH_{2n+2}) with n<11 and their boiling points (range [-164, 174] °C)
Benzodiazepines (QSAR)
– 79 1,4-benzodiazepin-2-ones, affinity towards GABA_A
Solubility (Delaney and XLogP)
– 1440 compounds (Delaney); 1991 compounds (XLogP)
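The figure of 150 non-cyclic alkanes with n<11 can be checked by summing the number of constitutional isomers for each carbon count. The sketch below uses the standard textbook isomer counts for n = 1 to 10 (stereoisomers are not counted); the table and the function names are illustrative additions, not part of the original slides.

```python
# Constitutional (structural) isomer counts for acyclic alkanes C_nH_{2n+2},
# n = 1..10. Standard textbook values; stereoisomers are not counted.
ISOMER_COUNTS = {1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 5, 7: 9, 8: 18, 9: 35, 10: 75}


def alkane_formula(n: int) -> str:
    """Return the molecular formula of an acyclic alkane with n carbons."""
    carbon = "C" if n == 1 else f"C{n}"
    return f"{carbon}H{2 * n + 2}"


def total_alkanes(max_n: int) -> int:
    """Count all acyclic alkane isomers with at most max_n carbons (max_n <= 10)."""
    return sum(count for n, count in ISOMER_COUNTS.items() if n <= max_n)


if __name__ == "__main__":
    for n, count in ISOMER_COUNTS.items():
        print(f"{alkane_formula(n):>7}: {count} isomer(s)")
    print("Total with n < 11:", total_alkanes(10))  # 150
```

The total matches the dataset size quoted on the slide, and half of the compounds (75) are decane isomers.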

4 Large Databases
Private/Commercial
– Example: ACS Chemical Registry (CAS) [tens of millions]
– Expensive and cannot be “mined”
– Cambridge Structural DB (CSD) [crystallographic structures, ~350K]
More recent trends
– Example: eMolecules (formerly Chmoogle)
– Free search engine but cannot be “mined”

5 CAS CHEMICAL REGISTRY

6 GROWTH of CAS CHEMICAL REGISTRY SYSTEM

7 Large “Public” Databases
ZINC (UCSF)
ChemBank (Harvard)
PubChem (NIH)
ChemDB (UCI) http://cdb.ics.uci.edu
J. Chen, S. J. Swamidass, Y. Dou, J. Bruand, and P. Baldi. ChemDB: A Public Database of Small Molecules and Related Chemoinformatics Resources. Bioinformatics, 21, 4133-4139 (2005).

8 Example of Large Public DB: ChemDB
~5M unique compounds
Commercially available compounds
PostgreSQL/Oracle
Annotation (Experimental, Computational)
Searchable Web interface
Similarity, in silico reactions, …

9 Example of Statistics

10 Molecular Weight/Solubility

16 [Diagram; labels: ChemDB, RChemDB, NM, Experiments, Filters, RM]

17 Chemo/Bio Informatics
Two Key Ingredients
1. Data
2. Similarity Measures
Bioinformatics analogy and differences:
– Data (GenBank, Swissprot, PDB)
– Similarity (BLAST)
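To make the "similarity measure" ingredient concrete, here is a minimal sketch of one widely used choice in chemoinformatics, the Tanimoto (Jaccard) coefficient over molecular feature sets. The slides do not say which measure is intended, so this choice is an assumption. The feature sets below are made-up toy labels rather than real fingerprints, and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient: |A intersect B| / |A union B|."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty feature sets score 0.
        return 0.0
    return len(a & b) / union


def rank_by_similarity(
    query: FrozenSet[str], database: Dict[str, FrozenSet[str]]
) -> List[Tuple[str, float]]:
    """Score every database entry against the query, most similar first."""
    scored = [(name, tanimoto(query, feats)) for name, feats in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy "database": hand-written bond-type labels, NOT real chemical fingerprints.
TOY_DB: Dict[str, FrozenSet[str]] = {
    "ethanol": frozenset({"C-C", "C-O", "O-H", "C-H"}),
    "methanol": frozenset({"C-O", "O-H", "C-H"}),
    "ethane": frozenset({"C-C", "C-H"}),
}

if __name__ == "__main__":
    for name, score in rank_by_similarity(TOY_DB["ethanol"], TOY_DB):
        print(f"{name:>9}: {score:.2f}")
```

Ranking a database against a query this way plays the role for small molecules that BLAST plays for sequences: the data supplies the candidates and the similarity measure orders them.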

