Introduction
Thor database concepts.
Data (Chemical Structure) hierarchy.
Thor data model.
Daylight/Oracle cartridge data model.
Other considerations.
What are the steps.
Demo.
Thor Database Concepts
Datatrees, Datatypes, Dataitems and Datafields.
–These four concepts are closely related.
Datatrees - The method for representing chemical information.
Datatypes - A set of definitions that indicate the meaning of a dataitem's fields.
Dataitems - A tag plus its datafields; the tag names a datatype, which in turn defines the meaning of each datafield.
Datafields - A string of characters; the unit of data.
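To make the relationship between the four concepts concrete, here is a minimal Python sketch. It is not Thor's API: the class names mirror the terms above, and the field labels ("smiles", "number") are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Datatype:
    """Says what each datafield of a dataitem means."""
    tag: str                 # e.g. "$SMI"
    field_names: List[str]   # hypothetical labels, one per datafield


@dataclass
class Dataitem:
    """A tag plus its datafields; the tag names the datatype."""
    tag: str
    datafields: List[str]    # each datafield is just a string

    def describe(self, datatypes: Dict[str, Datatype]) -> Dict[str, str]:
        """Pair each datafield with the meaning its datatype assigns."""
        dtype = datatypes[self.tag]
        return dict(zip(dtype.field_names, self.datafields))


@dataclass
class Datatree:
    """A dataitem with nested child trees."""
    item: Dataitem
    children: List["Datatree"] = field(default_factory=list)


# Hypothetical datatype definitions, for illustration only.
datatypes = {
    "$SMI": Datatype("$SMI", ["smiles"]),
    "$DB_NO": Datatype("$DB_NO", ["number"]),
}

tree = Datatree(Dataitem("$SMI", ["CC(C)C(O)C1CCCCC1"]))
tree.children.append(Datatree(Dataitem("$DB_NO", ["873"])))

print(tree.item.describe(datatypes))
print(tree.children[0].item.describe(datatypes))
```

The point of the sketch is only that a datafield has no meaning on its own: the dataitem's tag selects the datatype, and the datatype supplies the meaning.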
Data (Chemical Structure) hierarchy
Data stored for each chemical entity are organized hierarchically, with each entity expressed in terms of Parent, Version and Preparation (Oracle only).
Parent is the basic structure, free of salts, solvates and radiolabels.
Version is the translation of a Parent into an actual compound, e.g. a free base, salt or solvate.
Preparation is a discrete batch of that Version.
Data (Chemical Structure) hierarchy
[Diagram: Parent at the top; Version 1, Version 2 ... Version n below it; Prep below the Versions. Label: Thor Database, with all molecule information.]
Daylight Thor data tree model
SMILES       CC(C)C(O)C1CCCCC1
FP
Timestamp    199701291010.03
Graph        CC(C)C(O)C1CCCCC1
DB_NO        873
PISM         CC(C)[C@@H](O)C1CCCCC1..
DB_NO        97
VISM         Cl.CC(C)[C@@H](O)C1CCCCC1
ATOM_STER    2,1,1,4,1,7,1,10,1
SALT         1,10
Thor data tree in lexical form
$SMI $DB_NO PISM ATOM_STER $DB_NO PISM ATOM_STER BOND_STER ISO $DB_NO VISM SALT
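One way to read the lexical form is to treat every tag that starts with "$" as an identifier that opens a new group, with the tags that follow belonging to it. The Python sketch below does only that grouping; it assumes the "$" convention and is not a parser for Thor's actual file format.

```python
from typing import List

# Tag sequence copied from the slide (trailing period dropped).
LEXICAL_FORM = (
    "$SMI $DB_NO PISM ATOM_STER $DB_NO PISM ATOM_STER "
    "BOND_STER ISO $DB_NO VISM SALT"
)


def group_by_identifier(tags: List[str]) -> List[List[str]]:
    """Start a new group at every tag that begins with '$'."""
    groups: List[List[str]] = []
    for tag in tags:
        if tag.startswith("$") or not groups:
            groups.append([tag])
        else:
            groups[-1].append(tag)
    return groups


groups = group_by_identifier(LEXICAL_FORM.split())
for group in groups:
    print(group[0], "->", group[1:])
```

Read this way, the tree has one $SMI root and three $DB_NO groups: two carrying PISM data and one carrying VISM and SALT data.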
Daylight/Oracle Cartridge model
[Diagram: Parent, Version and Prep records linked by DB_NO.]
Parent: DB_NO, SMILES, Data
Version_1, Version_2 ... Version_n: DB_NO, Parent_DB_NO, SMILES, Data
Prep_1, Prep_2, Prep_3: DB_NO, Version_DB_NO, Data
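The Parent / Version / Prep links in this model map naturally onto three tables joined by DB_NO. The sketch below uses Python's built-in sqlite3 as a stand-in so it can run anywhere; it is not Oracle DDL and involves no cartridge functions. Table and column names follow the diagram, while the sample rows (including the prep numbers and "batch" text) are made up for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE parent (
    db_no  INTEGER PRIMARY KEY,
    smiles TEXT NOT NULL,
    data   TEXT
);
CREATE TABLE version (
    db_no        INTEGER PRIMARY KEY,
    parent_db_no INTEGER NOT NULL REFERENCES parent(db_no),
    smiles       TEXT NOT NULL,
    data         TEXT
);
CREATE TABLE prep (
    db_no         INTEGER PRIMARY KEY,
    version_db_no INTEGER NOT NULL REFERENCES version(db_no),
    data          TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce REFERENCES
conn.executescript(SCHEMA)

# Illustrative rows only.
conn.execute("INSERT INTO parent VALUES (873, 'CC(C)[C@@H](O)C1CCCCC1', NULL)")
conn.execute(
    "INSERT INTO version VALUES (97, 873, 'Cl.CC(C)[C@@H](O)C1CCCCC1', NULL)"
)
conn.executemany(
    "INSERT INTO prep VALUES (?, ?, ?)",
    [(1, 97, "batch 1"), (2, 97, "batch 2")],
)

# Walk the hierarchy from Parent down to each Prep.
rows = conn.execute(
    """
    SELECT p.db_no, v.db_no, r.db_no
    FROM parent p
    JOIN version v ON v.parent_db_no = p.db_no
    JOIN prep r ON r.version_db_no = v.db_no
    ORDER BY r.db_no
    """
).fetchall()
print(rows)
```

Keeping the SMILES on both Parent and Version, as the diagram does, lets either level be searched without a join.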
Other Considerations
2D coordinates or connection table.
–2D coordinates as in Thor.
–Connection table (Molfile, RXNfile, TGFfile, etc.).
Data conversion for input/search.
–Nitro groups (charge separated, double bond or don't care).
–Parent and salt molecules.
Indexes for normal column data.
Indexes for chemical structure data.
–e.g. ddexact, ddgraph, ddblob.
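The two data-conversion points above (nitro representation, parent versus salt) can be illustrated with plain string handling. This is a deliberately naive Python sketch, not a Daylight routine: it assumes the nitro group is spelled exactly N(=O)=O and that the longest dot-separated SMILES component is the parent. A production system would use a chemistry toolkit instead of text substitution.

```python
from typing import List, Tuple

DOUBLE_BONDED = "N(=O)=O"
CHARGE_SEPARATED = "[N+](=O)[O-]"


def normalize_nitro(smiles: str) -> str:
    """Rewrite the double-bonded nitro spelling to the charge-separated one.

    Text substitution only: other spellings of the same group
    (for example O=N(=O)...) are not recognised.
    """
    return smiles.replace(DOUBLE_BONDED, CHARGE_SEPARATED)


def split_parent_and_salts(smiles: str) -> Tuple[str, List[str]]:
    """Treat the longest dot-separated component as the parent (heuristic)."""
    parts = [p for p in smiles.split(".") if p]
    if not parts:
        raise ValueError("empty SMILES")
    idx = max(range(len(parts)), key=lambda i: len(parts[i]))
    return parts[idx], parts[:idx] + parts[idx + 1:]


print(normalize_nitro("c1ccccc1N(=O)=O"))
print(split_parent_and_salts("Cl.CC(C)[C@@H](O)C1CCCCC1"))
```

Whichever convention is picked, applying the same conversion to both stored data and incoming queries is what keeps exact-match searches consistent.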
Other Considerations
SMILES column size (what to use?).
–700 bytes or less: Oracle allows both unique indexes and a blob-based index on the column. This may or may not be important to you.
–Greater than 700 bytes: you can use a blob-based index.
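A quick way to decide on the column size is to measure the longest SMILES in the data to be loaded and compare it with the 700-byte figure quoted above. The Python sketch below does that; the function names are hypothetical and the returned labels are descriptive strings, not Oracle keywords.

```python
from typing import Iterable, List

UNIQUE_INDEX_LIMIT_BYTES = 700  # figure quoted on the slide


def longest_smiles_bytes(smiles_list: Iterable[str]) -> int:
    """Length in bytes of the longest SMILES (0 for an empty input)."""
    return max((len(s.encode("utf-8")) for s in smiles_list), default=0)


def index_options(max_smiles_bytes: int) -> List[str]:
    """Index kinds available for a SMILES column of the given width."""
    if max_smiles_bytes <= UNIQUE_INDEX_LIMIT_BYTES:
        return ["unique", "blob-based"]
    return ["blob-based"]


sample = ["CC(C)C(O)C1CCCCC1", "Cl.CC(C)[C@@H](O)C1CCCCC1"]
width = longest_smiles_bytes(sample)
print(width, index_options(width))
```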
What are the steps
Design the data model.
–Consider input, access and modification of your data.
–Consider how chemical data is going to be searched.
Design the database schema.
Export data from Thor (or use SD files).
Create a PL/SQL program to load the data.
If the data is formatted, you may be able to use the sqlldr command to load the tables.
For data from SD files, use the mol2smi procedure.
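For the export-and-load step, "formatted" data usually means a delimited flat file with one row per record. The Python sketch below writes such a file from (DB_NO, SMILES) pairs. The pipe delimiter is an assumption (chosen because it does not occur in the SMILES shown here), the rows are illustrative, and the loader control file is not shown.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_load_file(
    rows: Iterable[Tuple[int, str]], out: TextIO, delimiter: str = "|"
) -> int:
    """Write one delimited line per (db_no, smiles) row; return the row count."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Illustrative rows; a real export would come from Thor or an SD file.
rows = [
    (873, "CC(C)[C@@H](O)C1CCCCC1"),
    (97, "Cl.CC(C)[C@@H](O)C1CCCCC1"),
]

buffer = io.StringIO()  # stands in for an open file
n = write_load_file(rows, buffer)
print(n)
print(buffer.getvalue(), end="")
```

Writing to any text stream keeps the function usable with a real file (`open("parent.dat", "w", newline="")`) as well as in tests.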