Meta data and bioinformatics
Bioinformatics is EBI-centred and loosely organised. The term 'bioinformatics' was coined by Paulien Hogeweg around 1970. European bioinformatics started, more or less, under Chris Sander at the EMBL around 1988. Bioinformatics is organised chaos with too many α-males. The EBI is the boss, but it is powerless, and this schizophrenic situation will not change with its Elixir ESFRI project. The EBI's wheel-with-spokes idea for the ESFRI project is wrong. Bioinformatics cannot function without the EBI, even if bioinformaticians wanted it to. Having a boss makes it easy to make decisions about ontologies, meta data, etc.
Meta data and bioinformatics
Bioinformatics has been dealing with data from day one, and bioinformaticians dealt with interoperability even before data could be sent from the Netherlands to Germany. Even the databases of the late 1970s had accession codes. We are all providers; our users sit in life-science labs. We are about to start second-generation bioinformatics. Data deposition is obligatory if you want to publish, and data deposition generates citations (which generate money).
3 The ‘start’ of centralised databases
5 Submission: Chambon 1987. An early submission.
6 The business model [Figure: axes labelled 'Scale', 'Stuff' and 'Years']
7 The boss…
8 EMBRACE is the work of a cohesive community of information engineers. I have to thank you for that, and thank the European Commission for its support. I hope the community persists beyond EMBRACE. EMBRACE has been the FP6 Network of Excellence (NoE) for database and tool interoperability. Outcome: SOAP, the EMBRACE Web service registry, an ontology for bioinformatics, Grid facilities, and a human network.
Citation
Ontology (headed by previous EBI boss…)
Problems
Ontologies aren't ontologies. Metadata aren't metadata. Too many α-males. We did the EMBRACE metadata project as a prelude to semantic applications, without really knowing what 'semantic' means. Data must be stored, and the easiest way to do that is with big monolithic databases; with human genome data, however, that simply isn't possible. Remote data will have to do. But how do we get access? Do we trust each other internationally with the data? (For example, one US database removed anthrax-related data.) What happens if the EBI has to stop for lack of money?
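What 'semantic' can buy you is easiest to see in a toy example. The Python below is a minimal sketch, not anything EMBRACE built: the ontology terms, the entry names and the single is_a relation are all hypothetical. Database entries are annotated with ontology terms (the metadata), and a query for a term also returns entries annotated with any of its subclasses, which is about the simplest inference an engine can draw from such metadata.

```python
from collections import defaultdict

# Hypothetical toy ontology: each (child, parent) pair means "child is_a parent".
IS_A = [
    ("kinase", "enzyme"),
    ("protease", "enzyme"),
    ("enzyme", "protein"),
    ("receptor", "protein"),
]

# Hypothetical database entries, each annotated with ontology terms (the metadata).
ANNOTATIONS = {
    "ENTRY1": {"kinase"},
    "ENTRY2": {"receptor"},
    "ENTRY3": {"protease"},
    "ENTRY4": {"lipid"},  # term that does not occur in the ontology
}


def ancestors(term, is_a):
    """All terms reachable from `term` by following is_a edges (transitive closure)."""
    parents = defaultdict(set)
    for child, parent in is_a:
        parents[child].add(parent)
    seen, stack = set(), [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def semantic_query(term, annotations, is_a):
    """Entries annotated with `term` itself or with any term that is_a `term`."""
    return sorted(
        entry
        for entry, terms in annotations.items()
        if any(t == term or term in ancestors(t, is_a) for t in terms)
    )


print(semantic_query("enzyme", ANNOTATIONS, IS_A))   # ['ENTRY1', 'ENTRY3']
print(semantic_query("protein", ANNOTATIONS, IS_A))  # ['ENTRY1', 'ENTRY2', 'ENTRY3']
```

A plain keyword search for 'enzyme' would find none of these entries, because none is annotated with that exact word; only the is_a metadata lets the query reach ENTRY1 and ENTRY3. If the is_a edges are missing or wrong, the same query silently returns wrong answers.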
Solutions?
Better search methods: MRS searches in/with compressed data; triplet comparison; text analysis technologies based on word vectors; search on distributed data.
Searchability (of the many small databases): continuous communication (Elixir) with the builders of niche databases; follow the EBI (if it stays followable); cross-platform search.
Better science to know which queries are needed (SB): deeper hyperlinks; wider hyperlinks; inference engines (those really, really need metadata).
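Searching in compressed data is easiest to picture with an inverted index whose postings lists stay compressed until a query touches them. The sketch below assumes variable-byte encoding of document-id gaps, a standard textbook scheme; it is not a description of how MRS actually stores its indexes, and all function names are hypothetical. Only the postings lists of the query words are decoded, one integer at a time.

```python
def vbyte_encode(numbers):
    """Variable-byte encode non-negative ints: 7 data bits per byte,
    high bit set on the last byte of each number."""
    out = bytearray()
    for n in numbers:
        groups = []
        while True:
            groups.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        groups.reverse()      # most significant 7-bit group first
        groups[-1] |= 0x80    # mark the final byte of this number
        out.extend(groups)
    return bytes(out)


def vbyte_decode(data):
    """Yield the integers stored by vbyte_encode, one at a time."""
    n = 0
    for byte in data:
        if byte & 0x80:
            yield (n << 7) | (byte & 0x7F)
            n = 0
        else:
            n = (n << 7) | byte


def build_index(docs):
    """Map each lower-cased word to a compressed list of document-id gaps."""
    postings = {}
    for doc_id, text in enumerate(docs):
        for word in set(text.lower().split()):
            postings.setdefault(word, []).append(doc_id)  # ids arrive in ascending order
    return {
        word: vbyte_encode([ids[0]] + [b - a for a, b in zip(ids, ids[1:])])
        for word, ids in postings.items()
    }


def doc_ids(index, word):
    """Decode one postings list lazily, turning gaps back into document ids."""
    current = 0
    for gap in vbyte_decode(index.get(word.lower(), b"")):
        current += gap
        yield current


def search_all(index, words):
    """Ids of documents that contain every query word (an AND query)."""
    result = None
    for word in words:
        ids = set(doc_ids(index, word))
        result = ids if result is None else result & ids
    return sorted(result or [])


docs = [
    "human insulin precursor",
    "bovine insulin",
    "human haemoglobin alpha chain",
]
index = build_index(docs)
print(search_all(index, ["human", "insulin"]))  # [0]
print(search_all(index, ["insulin"]))           # [0, 1]
```

Storing gaps rather than raw ids keeps the numbers small, so most fit in a single byte; the trade-off is that each list has to be decoded sequentially from its start.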
Summary
Bioinformatics is EBI-centred and loosely organised. The EBI is the boss, but powerless. Bioinformatics cannot function without the EBI. Having a boss makes it easy to make decisions: the EBI decides, in consultation, on ontologies, meta data, etc.