VI EBRCN GM, Paris, 10-11/12/2003
WP4: Current status
Paolo Romano & WP4 group
WP4: Linking to Medline (i)
- Retrieval of PubMed IDs (PMIDs) still ongoing:
  - MK: amended the DSMZ literature database
  - FG: supporting BCCM; manual insertion after restructuring of the databases
  - GS: retrieving PMIDs for the CBS catalogues (not only Medline)
  - PR: retrieving PMIDs for the NCCB and CIP catalogues
- Undefined: CABI, NCIMB (updates?), ECACC?
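
The slides do not say how PMIDs are retrieved, and part of the work is manual. As one possible approach, assuming NCBI's E-utilities `esearch` service is used, a citation can be looked up in PubMed and the returned PMID turned into a link. The endpoint choice, the function names and the link format below are our assumptions, not the WP4 procedure.

```python
import json
from typing import List
from urllib.parse import urlencode

# Assumption: NCBI E-utilities esearch endpoint (not named in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pmid_query(term: str) -> str:
    """Return an esearch URL that searches PubMed for `term` and asks for JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_pmids(response_text: str) -> List[str]:
    """Extract the PMIDs from an esearch JSON response body."""
    data = json.loads(response_text)
    return data["esearchresult"]["idlist"]


def pubmed_link(pmid: str) -> str:
    """Build a PubMed link for a catalogue reference that carries a PMID (hypothetical helper)."""
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
```

For a real lookup, the URL can be fetched with `urllib.request.urlopen(build_pmid_query(term)).read().decode()` and the body passed to `parse_pmids`; a citation that matches several records returns several PMIDs and still needs a manual check.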
WP4: Linking to Medline (ii)
- Catalogue structures: ongoing (collections' tasks); done for ICLC, BCCM/LMBP and DSMZ (possibly others as well)
- CABRI guidelines and SRS configuration files:
  - Catalogue Production Guidelines: revision ongoing; done for cell lines, plasmids and the DSMZ literature
  - SRS structure and syntax files are being updated as catalogues are submitted
WP4: Linking to Medline (iii)
- Catalogues (with links to Medline) updated:
  - ICLC: September 2003 (revision)
  - BCCM/LMBP: December 2003 (revision)
  - DSMZ: November (June) 2003
- Other BCCM, NCCB plasmids: at the next catalogue update
- All catalogues: according to the collections' plans
Linking to EMBL (i)
- Linking on the fly to the EMBL Data Library through SRS, without IDs, gave negative results:
  - Links differ for different materials and can use various EMBL fields: Organism (micro-organisms), Division (viruses and plasmids), Feature Table (definition of the source through Key, Qualifier, Description)
  - Annotation problems (e.g., missing spaces)
  - Indexing problems (e.g., use of dots)
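
The two problems can be illustrated with a small sketch: the same strain may be annotated as `CBS 100.20` or `CBS100.20` (missing space), and an indexer that treats dots as separators breaks `100.20` into two words. The regular expression and the "acronym + number" format assumed below are hypothetical; real catalogue numbering schemes vary much more.

```python
import re
from typing import List, Optional

# Hypothetical format: a collection acronym followed by a number that may
# contain dots, e.g. "CBS 100.20". Real catalogues use many other schemes.
STRAIN_RE = re.compile(r"^\s*([A-Za-z]+)\s*([0-9][0-9.]*)\s*$")


def normalize_strain(text: str) -> Optional[str]:
    """Return a canonical "ACRONYM number" form, tolerating a missing or doubled space."""
    m = STRAIN_RE.match(text)
    if m is None:
        return None
    acronym, number = m.groups()
    return f"{acronym.upper()} {number}"


def naive_tokens(text: str) -> List[str]:
    """Mimic an indexer that splits words on whitespace and on dots."""
    return [t for t in re.split(r"[\s.]+", text.lower()) if t]
```

`naive_tokens("CBS 100.20")` yields `['cbs', '100', '20']`, while `naive_tokens("CBS100.20")` yields `['cbs100', '20']`: the same strain ends up under different index terms, so an on-the-fly search has to try both forms.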
Linking to EMBL (ii)
- (Well-known) example of an on-the-fly search: searching for the filamentous fungi strain CBS 100.20
- Involves: fungi & source & cbs 100.20
- Query:
  ( ( ([emblrelease-FtKey:source] & [emblrelease-FtQualifier:strain] & ( ( [emblrelease-FtDescription:cbs] & [emblrelease-FtDescription:100] ) | [emblrelease-FtDescription:cbs100] ) & [emblrelease-FtDescription:20]) ) < [emblrelease-Organism:fungi*] )
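
The query above follows a regular pattern, so it can be generated from a strain designation. The following sketch rebuilds the slide's query string; the function name and parameters are hypothetical, and only the field names (`FtKey`, `FtQualifier`, `FtDescription`, `Organism`) and the `emblrelease` library name come from the slide.

```python
from typing import List


def srs_strain_query(acronym: str, number: str, organism: str,
                     lib: str = "emblrelease") -> str:
    """Build an SRS query for a strain designation in the EMBL feature table.

    The acronym and the first part of the number are matched either as two
    words ("cbs" & "100") or as one word ("cbs100", missing space); the parts
    after each dot are matched as separate words, since dots split index terms.
    """
    def fd(value: str) -> str:
        return f"[{lib}-FtDescription:{value}]"

    acr = acronym.lower()
    first, *rest = number.split(".")
    either = f"( ( {fd(acr)} & {fd(first)} ) | {fd(acr + first)} )"
    terms: List[str] = [f"[{lib}-FtKey:source]", f"[{lib}-FtQualifier:strain]", either]
    terms += [fd(part) for part in rest]
    inner = " & ".join(terms)
    return f"( ( ({inner}) ) < [{lib}-Organism:{organism}*] )"
```

Calling `srs_strain_query("CBS", "100.20", "fungi")` returns exactly the query shown on the slide; the length of even this simple case is one reason ID-based links were preferred.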
Linking to EMBL (iii)
- Identify cross-references for linking from CABRI catalogues to EMBL (and vice versa) by unique IDs (a single run for all CABRI records)
- Many EMBL records can be linked to a single CABRI item
- Add the links in EMBL and use them when linking from CABRI (fast and effective search by SRS)
- ID-based links to CABRI included in the EMBL Data Library and distributed with it
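
Since many EMBL records can point to the same CABRI item, the cross-reference table produced by a single run is naturally a one-to-many index. A minimal sketch, assuming the run yields (EMBL accession, CABRI item ID) pairs; the function name and the input format are hypothetical, and both directions are kept as lists in case one EMBL record refers to more than one CABRI item.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Index = Dict[str, List[str]]


def build_crossrefs(pairs: Iterable[Tuple[str, str]]) -> Tuple[Index, Index]:
    """Turn (EMBL accession, CABRI item ID) pairs into lookup tables for both directions."""
    cabri_to_embl: Dict[str, Set[str]] = defaultdict(set)
    embl_to_cabri: Dict[str, Set[str]] = defaultdict(set)
    for embl_acc, cabri_id in pairs:
        cabri_to_embl[cabri_id].add(embl_acc)  # many EMBL records per CABRI item
        embl_to_cabri[embl_acc].add(cabri_id)  # reverse direction: EMBL record to CABRI item
    # Sets drop duplicate pairs from the run; sorted lists give stable output.
    return ({k: sorted(v) for k, v in cabri_to_embl.items()},
            {k: sorted(v) for k, v in embl_to_cabri.items()})
```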
Linking to EMBL (iv)
- Agreement with EBI (list of cross-references)
- Work to be done (against EMBL 77) after the upload of the CABRI extracted catalogues to EBI: early 2004
- Cross-references returned to the collections (no obligation to add the links to their catalogues)
- Well-known wrong EMBL sequences possibly removed from the table
- Links from plasmid catalogues to EMBL managed differently (using current remarks)
Linking to other sources
- Links to plasmid maps (BCCM/LMBP) through a dedicated field: December 2003
- Images of micro-organisms (CBS & BCCM) linked from a new field: starting from the next updates
- Enzymes and biochemical pathways (cell lines, micro-organisms): under development
- Further links (nomenclature, acronyms, genes): still under analysis
- Interconnected Biological Resource Database
Extracted databases
- Possible since the new site (SRS 7) became available
- Selected meaningful subset of information: CABRI MDS (Minimum Data Set) + link to the CABRI site (new field Full_details)
- Agreement established with EBI
- Preparation of the extracted databases:
  - Focus on bacteria, fungi & yeasts, human and animal cell lines
  - Set-up of a dedicated Web site: http://export.cabri.org/
  - Set-up of an FTP site for distributing data and SRS configuration files: ftp.cabri.org (not anonymous)
- Upload of catalogues to EBI: early 2004
- Automatic updating by EBI via FTP through SRS Prisma
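
An extracted record could be produced by keeping only the MDS fields and adding the new `Full_details` link back to the CABRI site. In this sketch the MDS field names and the URL template are hypothetical placeholders, not the actual CABRI definitions.

```python
from typing import Dict
from urllib.parse import quote

# Hypothetical field names standing in for the CABRI MDS fields.
MDS_FIELDS = ("Accession_Number", "Species_Name", "Strain_Designations")
# Hypothetical URL template for the link back to the full CABRI entry.
FULL_DETAILS_TEMPLATE = "https://cabri.example/{catalogue}/{item}"


def extract_record(record: Dict[str, str], catalogue: str) -> Dict[str, str]:
    """Keep only the MDS fields present in `record` and add a Full_details link."""
    extracted = {f: record[f] for f in MDS_FIELDS if f in record}
    extracted["Full_details"] = FULL_DETAILS_TEMPLATE.format(
        catalogue=quote(catalogue), item=quote(record["Accession_Number"]))
    return extracted
```

Fields outside the subset (internal notes, for example) are simply dropped, and users who need them follow `Full_details` to the complete entry.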
Inventory of data usage and data sets
- GlobalSearch on partners' sites: ht://Dig can be used to index all partners' sites and search their contents in a single step (only static files, not searchable archives/databases): http://srs711.cabri.org/htdig/index-ebrcn.html
- Virtual BRCs Library (W3C Virtual Library): list of data sets, by category, with links to information sources (a map of site maps); includes links to archives/databases
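
ht://Dig crawls pages and builds a word index that is then searched in a single step. The principle can be shown with a tiny in-memory inverted index; this is not ht://Dig's implementation or configuration, and the page URLs used with it would be hypothetical. As on the slide, only the text handed to the indexer is searchable, not the contents of databases behind the pages.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of page URLs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the URLs of pages containing every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```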
Summary
- MEDLINE
  - Links to Medline already in place for many catalogues
  - New links added with periodic updates
  - New records include the PubMed ID
- EMBL
  - No news; waiting for the extracted catalogues to be available at EBI
- Other external links
  - Plasmid maps and micro-organism images: ongoing
  - Others under study
- Extracted databases
  - Dedicated web and FTP sites available
  - Focus on bacteria, fungi & yeasts, human and animal cell lines
  - Upload to EBI planned for early 2004
- Inventory of data usage and data sets
  - Search on partners' site contents (ht://Dig)
  - List of partners' site contents (a sort of map of site maps, including databases)
Thoughts about the future (i)
- CABRI as it is:
  - Many links to external databases are being set up and are already in place for some of the catalogues
  - Extracted databases will soon be uploaded to EBI
  - Integration made possible (mainly) by the adoption of SRS
  - CABRI sites are now well known and appreciated, and use network services
- GBIF perspective:
  - GBIF has designed a nice and innovative architecture
  - A distributed architecture can ease management by avoiding conversions and updates
  - It requires sound expertise and good computer skills, not always available at collections/BRCs
  - The ABCD Schema is not adequate for catalogue contents
Thoughts about the future (ii)
- We need to keep current links and set up new ones:
  - Current links with the molecular biology world should be kept; SRS is an essential key for this connection
  - The Web Services-based GBIF architecture must be taken into account for future links with the (quickly) evolving biodiversity information environment
- SRS is evolving:
  - Since SRS 6, XML has been incorporated
  - With SRS 7, XML is essential (an alternative to flat files)
  - With SRS 8, Web Services will be added: SRS itself will be able to provide Web Services and to access them remotely
Thoughts about the future (iii)
- Proposal:
  - Start by extending the ABCD Schema to meet our needs
  - Continue with SRS and follow its evolution
  - Adopt the new SRS Web Services features as early as possible and start offering information to GBIF
  - Individual collections/BRCs willing to go autonomous can stop submitting data, provided they offer an agreed interface for remote access by the central SRS-based system
  - Finally, reach a mixed distributed/centralized architecture, based on SRS and offering both standard SRS services and Web Services
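
The proposed mixed architecture can be sketched as a central index that searches its own copies of submitted catalogues and forwards queries to autonomous collections through an agreed interface. The class names, the `search` interface and the record format are hypothetical; in practice the remote side would be a Web Services client, represented here by a plain function.

```python
from typing import Callable, Dict, List, Protocol

Record = Dict[str, str]


class CatalogueSource(Protocol):
    """Hypothetical common interface shared by local and remote catalogues."""
    name: str

    def search(self, term: str) -> List[Record]: ...


class LocalCatalogue:
    """A catalogue submitted to, and stored by, the central system."""

    def __init__(self, name: str, records: List[Record]) -> None:
        self.name = name
        self.records = records

    def search(self, term: str) -> List[Record]:
        t = term.lower()
        return [r for r in self.records if any(t in v.lower() for v in r.values())]


class RemoteCatalogue:
    """An autonomous collection reached through an agreed remote interface.

    `fetch` stands in for a Web Services client; it is a plain function here.
    """

    def __init__(self, name: str, fetch: Callable[[str], List[Record]]) -> None:
        self.name = name
        self.fetch = fetch

    def search(self, term: str) -> List[Record]:
        try:
            return self.fetch(term)
        except OSError:
            # An unreachable collection should not break the whole search.
            return []


class CentralIndex:
    """Fans a query out to every source and groups the hits by source name."""

    def __init__(self, sources: List[CatalogueSource]) -> None:
        self.sources = sources

    def search(self, term: str) -> Dict[str, List[Record]]:
        return {s.name: s.search(term) for s in self.sources}
```

Keeping one `search` method for both kinds of source is what would let a collection switch from submitting data to offering remote access without changes on the central side.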