
JChem Base chemical database
Szilárd Dóránt, May 2005



Slide 2: Contents
- Introduction
- Structural overview
- Compatibility
- Administration
- JChem tables
- Fingerprints
- Structural search
- Structure cache
- Standardization
- Search options
- JSP example
- API examples
- Performance
- Future plans

Slide 3: Introduction
JChem Base provides high-performance, Java-based tools for the storage, search and retrieval of chemical structures and associated data. These components can be integrated into web-based or standalone applications, together with other ChemAxon tools.

Slide 4: Structural overview
Layers, from storage up to the user interface:
- RDBMS (e.g. Oracle, MySQL): storage and security
- JDBC driver: standard interface to the RDBMS
- JChem Base API: chemical logic, structure cache
- Application / Web application (JSP)
- Web browser

Slide 5: Compatibility and integration
- File formats: SMILES, MDL molfile (V2000 and V3000), MDL SDF, RXN, RDF, MRV
- Integration: 100% Java, extensive API, JChem Cartridge for Oracle
- Database engines: Oracle, MySQL, MS SQL Server, PostgreSQL, MS Access, DB2, etc.
- Operating systems: Windows, Linux, Mac OS X, Solaris, etc.

Slide 6: Administration with JChemManager
User interface for:
- creating tables
- importing and exporting structures
- deleting rows
- dropping tables
Most functions are also available from the command line.

Slide 7: The property table
The property table stores information about JChem structure tables, including:
- fingerprint parameters
- custom standardization rules
- recent changes (used to optimize cache updates)
- other table options and information
- database-related licence keys
More than one property table can be used; each property table represents a separate JChem environment.

Slide 8: The structure of JChem tables
Columns (name: explanation):
- cd_id: unique numeric identifier in the table
- cd_structure: the imported structure in its original format, without modifications (except for the removal of data fields)
- cd_smiles: the standardized structure in ChemAxon Extended SMILES (cxsmiles) format, used by the search process
- cd_formula: the formula of the standardized structure
- cd_molweight: the molecular weight of the standardized structure
- cd_hash: hash code used for duplicate filtering (PERFECT search)
- cd_flags: can store row-specific options, e.g. overriding the chiral flag
- cd_timestamp: the date and time the row was inserted
- cd_fp…: fingerprint columns
- [user fields]: custom data fields can be added by the user
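The cd_hash column makes duplicate filtering cheap: only rows whose hash equals the hash of the new structure need a full comparison. A minimal sketch of that idea, assuming structures have already been standardized into one canonical string each; the class DuplicateFilter is hypothetical and String.hashCode() merely stands in for JChem's own structure hash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of hash-based duplicate filtering (the idea behind cd_hash);
// String.hashCode() stands in for JChem's own structure hash.
public class DuplicateFilter {
    // hash code -> cd_id values of the rows with that hash (different structures may collide)
    private final Map<Integer, List<Integer>> idsByHash = new HashMap<>();
    // cd_id -> standardized structure string (stand-in for the cd_smiles column)
    private final Map<Integer, String> structures = new HashMap<>();
    private int nextId = 1;

    /** Stores an already standardized structure and returns its new cd_id. */
    public int add(String standardized) {
        int id = nextId++;
        structures.put(id, standardized);
        idsByHash.computeIfAbsent(standardized.hashCode(), h -> new ArrayList<>()).add(id);
        return id;
    }

    /** Returns the cd_id of an identical stored structure, or null if there is none. */
    public Integer findDuplicate(String standardized) {
        List<Integer> candidates = idsByHash.get(standardized.hashCode());
        if (candidates == null) {
            return null; // no row with this hash: certainly not a duplicate
        }
        // Only rows sharing the hash need the full comparison.
        for (int id : candidates) {
            if (structures.get(id).equals(standardized)) {
                return id;
            }
        }
        return null; // the hashes matched only by collision
    }

    public static void main(String[] args) {
        DuplicateFilter table = new DuplicateFilter();
        table.add("c1ccccc1");
        table.add("CCO");
        System.out.println(table.findDuplicate("c1ccccc1")); // 1
        System.out.println(table.findDuplicate("CCN"));      // null
    }
}
```

Because the hash is computed from the standardized form, two inputs that differ only in notation (e.g. Kekulé vs aromatic benzene) end up with the same hash once standardization has unified them.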

Slide 9: Chemical Hashed Fingerprints
Chemical Hashed Fingerprints encode structural patterns in bit strings.
If structure A is a substructure of structure B, every bit that is set in A's fingerprint is also set in B's fingerprint.
The Tanimoto similarity of hashed fingerprints, T(A, B) = c / (a + b - c), where a and b are the numbers of bits set in A and B and c is the number of bits set in both, can be used for diversity analysis and similarity search.
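The subset property and the Tanimoto formula can be illustrated with java.util.BitSet. This is a simplified sketch, not ChemAxon's fingerprint algorithm: it assumes each structure is already described by a list of pattern strings (e.g. linear paths such as "C-O"), and each pattern is hashed to a fixed number of bits in a 512-bit fingerprint. The constants and class name are hypothetical.

```java
import java.util.BitSet;
import java.util.List;

// Simplified hashed-fingerprint sketch; patterns are supplied directly instead of
// being enumerated from a molecular graph.
public class HashedFingerprint {
    static final int LENGTH = 512;          // fingerprint length in bits
    static final int BITS_PER_PATTERN = 2;  // bits set for each pattern (hypothetical value)

    public static BitSet fingerprint(List<String> patterns) {
        BitSet fp = new BitSet(LENGTH);
        for (String p : patterns) {
            for (int i = 0; i < BITS_PER_PATTERN; i++) {
                // A different hash for each bit a pattern sets, folded into the fingerprint length.
                int h = (p + "#" + i).hashCode();
                fp.set(Math.floorMod(h, LENGTH));
            }
        }
        return fp;
    }

    /** True if every bit of the query is also set in the target (necessary for a substructure hit). */
    public static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    /** Tanimoto coefficient: bits set in both / bits set in either. */
    public static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        return either.isEmpty() ? 1.0 : (double) both.cardinality() / either.cardinality();
    }

    public static void main(String[] args) {
        BitSet ethanol = fingerprint(List.of("C", "O", "C-C", "C-O", "C-C-O"));
        BitSet methanol = fingerprint(List.of("C", "O", "C-O"));
        // Methanol's patterns are a subset of ethanol's, so its bits must be too.
        System.out.println(mayContain(ethanol, methanol)); // true
        System.out.println(tanimoto(ethanol, methanol));
    }
}
```

The converse does not hold: because different patterns can hash to the same bit, a passing subset test only means the target may contain the query, which is why a second, exact stage is still needed.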

Slide 10: Structural search in the database
A two-stage method provides high performance:
1. Rapid pre-screening reduces the number of possible hit candidates:
   - Chemical Hashed Fingerprints are used for substructure and superstructure searches
   - the hash code is used for duplicate filtering (usually during compound registration)
2. A graph search algorithm determines the final hit list.
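The screen-then-verify pipeline can be sketched with toy stand-ins, assuming strings play the role of molecules, character bigrams play the role of structural patterns, and String.contains plays the role of the graph (atom-by-atom) search. These stand-ins are hypothetical; the point is the control flow: the cheap fingerprint test discards most rows, and only survivors reach the expensive check.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy two-stage search: fingerprint screen, then exact verification of the survivors.
public class TwoStageSearch {
    static final int FP_BITS = 64;

    // Toy fingerprint: one hashed bit per character bigram.
    static BitSet fingerprint(String s) {
        BitSet fp = new BitSet(FP_BITS);
        for (int i = 0; i + 2 <= s.length(); i++) {
            fp.set(Math.floorMod(s.substring(i, i + 2).hashCode(), FP_BITS));
        }
        return fp;
    }

    public record Result(List<Integer> hits, int verified) {}

    public static Result search(List<String> table, String query) {
        BitSet queryFp = fingerprint(query);
        List<Integer> hits = new ArrayList<>();
        int verified = 0;
        for (int id = 0; id < table.size(); id++) {
            String target = table.get(id);
            // Stage 1: a query bit missing from the target rules the target out.
            // (Recomputed here for brevity; a real system stores fingerprints.)
            BitSet missing = (BitSet) queryFp.clone();
            missing.andNot(fingerprint(target));
            if (!missing.isEmpty()) {
                continue;
            }
            // Stage 2: the expensive exact check runs only on screen survivors.
            verified++;
            if (target.contains(query)) {
                hits.add(id);
            }
        }
        return new Result(hits, verified);
    }

    public static void main(String[] args) {
        List<String> table = List.of("CCOC", "c1ccccc1O", "NCCN", "CCCCCC");
        Result r = search(table, "CCO");
        System.out.println(r.hits() + " after verifying " + r.verified() + " candidates");
    }
}
```

The screen never loses a true hit: if the target contains the query, it contains every bigram of the query, so every query bit is also set in the target.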

Slide 11: Structure cache
- Contains the fingerprints used for screening and the ChemAxon Extended SMILES used by ABAS (atom-by-atom search)
- Gives the search process instant access to the structures
- Reduces the load on the database server
- Incremental updates keep the overhead minimal after changes to the table
- Small memory footprint due to:
  - SMILES compression
  - an optimized storage technique
Approximately 100 MB of memory is needed for 1 million typical drug-like structures (using 512-bit fingerprints).
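A quick calculation shows how the 100 MB figure breaks down, assuming 1 MB = 10^6 bytes: a 512-bit fingerprint takes 64 bytes, so 1 million fingerprints take about 64 MB, leaving roughly 36 bytes per structure for the compressed SMILES and bookkeeping. That split is an inference from the slide's numbers, not a measured figure.

```java
// Back-of-the-envelope check of the structure-cache memory figure (1 MB taken as 10^6 bytes).
public class CacheEstimate {
    /** Bytes needed to hold one fingerprint of the given bit length per structure. */
    public static long fingerprintBytes(long structures, int fingerprintBits) {
        return structures * fingerprintBits / 8;
    }

    public static void main(String[] args) {
        long structures = 1_000_000L;
        long fpBytes = fingerprintBytes(structures, 512);   // 512-bit fingerprints
        long totalBytes = 100_000_000L;                      // "approximately 100 MB" from the slide
        long restPerStructure = (totalBytes - fpBytes) / structures;
        System.out.println("Fingerprints: " + fpBytes / 1_000_000 + " MB");             // 64 MB
        System.out.println("Left for compressed SMILES and overhead: "
                + restPerStructure + " bytes per structure");                          // 36 bytes
    }
}
```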

Slide 12: Standardization
Default standardization includes:
- hydrogen removal
- aromatization
Custom standardization can be specified for each table with an XML configuration file, either at table creation or in the Regenerate dialog of JChemManager (jcman).

Slide 13: Custom standardization example
[figure: structures before and after custom standardization]

Slide 14: Database search options
- maximum search time / maximum number of hits
- SQL SELECT statement for pre-filtering
- ordering of results
- result table
- inverse hit list
- Chemical Terms filter constraint
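Several of these options change how the search loop terminates or what it collects. A hypothetical sketch, assuming a per-row match test is given as an IntPredicate over cd_id values: the loop stops at the hit limit or the time limit, and the inverse flag collects the non-matching rows instead. This illustrates the options' meaning only; it is not how JChemSearch is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch of hit limit, time limit and inverse hit list; not the JChemSearch code.
public class SearchOptions {
    public static List<Integer> run(int[] cdIds, IntPredicate matches,
                                    int maxHits, long maxMillis, boolean inverse) {
        long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        List<Integer> results = new ArrayList<>();
        for (int id : cdIds) {
            // Stop early once either limit is reached; the result list is then partial.
            if (results.size() >= maxHits || System.nanoTime() - deadline > 0) {
                break;
            }
            // With inverse == true the non-matching rows are collected instead.
            if (matches.test(id) != inverse) {
                results.add(id);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4, 5, 6};
        IntPredicate even = id -> id % 2 == 0;   // stand-in for a structure match
        System.out.println(run(ids, even, 100, 10_000, false)); // [2, 4, 6]
        System.out.println(run(ids, even, 100, 10_000, true));  // [1, 3, 5]
        System.out.println(run(ids, even, 2, 10_000, false));   // [2, 4]
    }
}
```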

Slide 15: JSP example application
Open source and customizable. Features:
- substructure, superstructure, exact and similarity search
- molecular descriptor similarity search with descriptor coloring
- substructure hit alignment and coloring, inverse hit list
- Chemical Terms filter
- import / export
- export of hits
- insert / modify / delete structures

Slide 16: API example: connecting to a database
    import java.sql.Connection;

    ConnectionHandler ch = new chemaxon.jchem.db.ConnectionHandler();
    ch.setDriver("oracle.jdbc.driver.OracleDriver");
    ch.setPropertyTable("JChemProperties");
    ch.setLoginName("scott");
    ch.setPassword("tiger");
    ch.connect();
    // the java.sql.Connection object is available if needed:
    Connection con = ch.getConnection();
    …
    // closing the connection:
    ch.close();

Slide 17: API example: database import
    Importer importer = new chemaxon.jchem.db.Importer();
    importer.setConnectionHandler(ch);
    importer.setInput("sample.sdf");
    // importer.setInput(is); // alternatively, an input stream can be specified
    importer.setTableName("SCOTT.STRUCTURES");
    importer.setHaltOnError(false);
    importer.setDuplicateImportAllowed(false); // filters duplicates
    // specifying table field - SDFile field pairs:
    String fieldPairs = "DB_Field1=SDF_Field1; DB_Field2=SDF_Field2";
    importer.setFieldConnections(fieldPairs);
    int importedCount = importer.importMols();
    System.out.println("Imported " + importedCount + " structures");
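The field-connection string pairs a table column (left of "=") with an SDFile data field (right of "="), with pairs separated by semicolons. The sketch below parses that format into an ordered map to make the direction of the mapping explicit; it assumes only the syntax shown on the slide, and the FieldConnections class is hypothetical, not JChem's own parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for "DB_Field1=SDF_Field1; DB_Field2=SDF_Field2" style strings.
public class FieldConnections {
    /** Returns table column -> SDF field, in the order given. */
    public static Map<String, String> parse(String spec) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String pair : spec.split(";")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing semicolon
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("missing '=' in: " + trimmed);
            }
            pairs.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("DB_Field1=SDF_Field1; DB_Field2=SDF_Field2"));
        // {DB_Field1=SDF_Field1, DB_Field2=SDF_Field2}
    }
}
```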

Slide 18: API example: database export
    import java.io.FileOutputStream;
    import java.io.OutputStream;

    Exporter exporter = new chemaxon.jchem.db.Exporter();
    exporter.setConnectionHandler(ch);
    exporter.setTableName("structures");
    // data fields to be exported with the structure:
    exporter.setFieldList("cd_id cd_formula name comments");
    String fileName = "output.sdf";
    OutputStream os = new FileOutputStream(fileName);
    exporter.setOutputStream(os);
    exporter.setFormat("sdf");
    int exportedCount = exporter.writeAll();
    os.close(); // release the output file
    System.out.println("Exported " + exportedCount + " structures");

Slide 19: API example: database search
    JChemSearch searcher = new chemaxon.jchem.db.JChemSearch();
    searcher.setConnectionHandler(ch);
    searcher.setSearchType(JChemSearch.SUBSTRUCTURE);
    searcher.setQueryStructure("c1ccccc1");
    searcher.setStructureTable("SCOTT.STRUCTURES");
    // a query that returns cd_id values can be used for prefiltering:
    searcher.setFilterQuery("SELECT structures.cd_id FROM structures, biodata WHERE "
            + "structures.cd_id = biodata.cd_id AND biodata.toxicity < 0.3");
    searcher.setWaitingForResult(true); // otherwise the search runs in a separate thread
    searcher.setStructureCaching(true); // caching speeds up the search
    searcher.run();
    // getting the results as cd_id values:
    int[] results = searcher.getResults();

Slide 20: API example: inserting a structure
    // ConnectionHandler, mode, table name and data field names:
    UpdateHandler uh = new chemaxon.jchem.db.UpdateHandler(
            ch, UpdateHandler.INSERT, "structures", "comment, stock");
    uh.setValueForFixColumns("c1ccccc1"); // the structure
    // specifying data field values:
    uh.setStructureValueForAdditionalColumn(1, "some text");
    uh.setStructureValueForAdditionalColumn(2, Double.valueOf(8.5));
    uh.setDuplicateFiltering(true); // filtering duplicate structures
    int id = uh.execute(true); // getting back the cd_id of the inserted structure
    if (id > 0) {
        System.out.println("Inserted, cd_id value: " + id);
    } else {
        System.out.println("Already exists with cd_id value: " + (-id));
    }
    // storing update information; the database connection remains open:
    uh.close();

Slide 21: Performance (1)
Server parameters: Windows XP; 1 CPU: Intel P4 3.0 GHz; 2 GB RAM; Oracle 9i

Compound registration (elapsed time):
Number of compounds | Duplicates not checked | Duplicates checked
10,000              | 32 s                   | 45 s
100,000             | 4 min 11 s             | 6 min 20 s
200,000             | 8 min 17 s             | 12 min 26 s

Substructure search in a table of 3 million compounds:
[table: Query | Number of hits | Search time (s)]
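Converting the registration timings into throughput makes them easier to compare: for the 100,000 and 200,000 batches, registration runs at roughly 400 compounds per second without duplicate checking and roughly 260 to 270 per second with it, so the duplicate check costs about 40 to 50 percent extra time. The arithmetic below uses only the numbers from the table above.

```java
// Registration throughput computed from the elapsed times in the table above.
public class RegistrationRate {
    public static double perSecond(int compounds, int minutes, int seconds) {
        return compounds / (double) (minutes * 60 + seconds);
    }

    public static void main(String[] args) {
        // {compounds, not-checked min, not-checked s, checked min, checked s}
        int[][] rows = {
            {10_000, 0, 32, 0, 45},
            {100_000, 4, 11, 6, 20},
            {200_000, 8, 17, 12, 26},
        };
        for (int[] r : rows) {
            System.out.printf("%,d compounds: %.0f/s without, %.0f/s with duplicate check%n",
                    r[0], perSecond(r[0], r[1], r[2]), perSecond(r[0], r[3], r[4]));
        }
    }
}
```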

Slide 22: Performance (2)
Similarity search (Tanimoto > 0.8)
Server parameters: Windows XP; 1 CPU: Intel P4 3.0 GHz; 2 GB RAM; Oracle 9i
[table: Query | Number of hits | Search time (s)]

Slide 23: Future plans
- an additional layer: JChem Server (later also as a grid)
- structural keys as an optional extension to the current fingerprints
- tables for storing query structures
- tables for storing generic (Markush) structures
- partial clean option for hit alignment
- installer
- etc.

Slide 24: Summary
ChemAxon's JChem Base toolkit provides sophisticated methods for handling chemical structures and associated data. Fingerprints and the structure cache provide high search performance.

Slide 25: Links
- JChem home page: www.jchem.com
- Live demos: www.jchem.com/examples
- API documentation: www.jchem.com/doc/api
- Brochure: www.chemaxon.com/brochures/JChemBase.pdf

Slide 26
Máramaros köz 3/a, Budapest, 1037 Hungary
Thank you for your attention

