Big Data at BiTeM Research Group – (Text|Web) Mining Research Group

1 Big Data at BiTeM Research Group
(Text|Web) Mining Research Group – patrick.ruch@hesge.ch, http://bitem.hesge.ch
Research projects: Digital libraries, Web, Personalized medicine, Patent analytics, Consumer analytics, Pharmacovigilance, Clinical trials…
Specialised in (semi|un)structured data
– We like text, text and more text
– Especially on the noisy/dirty Web
Technological expertise: CouchDB replication, SolrCloud (distributed indexing and search), indexing/searching in SSD/HDFS/Hadoop, SPARQL endpoints…
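As a minimal sketch of the CouchDB replication mentioned above, assuming a CouchDB server reachable over HTTP: CouchDB exposes replication through a `POST /_replicate` endpoint whose JSON body names a `source` and a `target` database, plus options such as `continuous` and `create_target`. The server URL, the database names and the helper `build_replication_request` are hypothetical; the code only builds the request and does not send it.

```python
import json
import urllib.request


def build_replication_request(server, source, target, continuous=True):
    """Build (without sending) a POST request to CouchDB's /_replicate endpoint.

    build_replication_request is a hypothetical helper written for this example.
    """
    body = {
        "source": source,          # database to copy from (name or full URL)
        "target": target,          # database to copy to (name or full URL)
        "continuous": continuous,  # keep pushing new changes as they arrive
        "create_target": True,     # create the target database if it does not exist
    }
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server and database names, for illustration only.
req = build_replication_request(
    "http://localhost:5984",
    "http://collector.example.org:5984/tweets",
    "tweets_replica",
)
print(req.get_method(), req.full_url)  # POST http://localhost:5984/_replicate
# Sending it needs a running CouchDB and (on recent versions) admin credentials:
# urllib.request.urlopen(req)
```

A continuous replication keeps the target in sync as new documents arrive, which suits collectors that ingest data around the clock.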

2 Pharmacovigilance on Big Social Media Data: Dynamic and Real-Time Data Analysis
Inputs: DrugBank, Twitter API, RSS, forums
Storage: CouchDB
Processing: cleaning and normalisation, then trends analysis, correlation analysis and novelty detection
Volumes: 26’000 documents per day; 19’000 drug names checked every 10 min; 7 M documents in 9 months
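The pharmacovigilance pipeline above can be illustrated with a toy sketch, assuming documents arrive in 10-minute windows and drug names come from a dictionary (for example a list extracted from DrugBank). It normalises text, counts per-drug mentions in one window, and flags a window as novel when its count sits well above the earlier windows. The z-score rule and its threshold of 3 are an illustrative stand-in, not necessarily the group's novelty-detection method, and all helper names are hypothetical.

```python
import re
import statistics
from collections import Counter

# Hypothetical helpers; the z-score rule is an illustrative stand-in for novelty detection.


def normalise(text):
    """Cleaning/normalisation: lowercase, turn punctuation into spaces, trim."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def count_mentions(docs, drug_names):
    """Count, for one time window, how many documents mention each drug name."""
    vocab = {normalise(name) for name in drug_names}
    counts = Counter()
    for doc in docs:
        padded = " " + normalise(doc) + " "
        for name in vocab:
            if " " + name + " " in padded:  # whole-word / whole-phrase match only
                counts[name] += 1
    return counts


def is_novel(history, current, z_threshold=3.0):
    """Flag the current window if its count is far above earlier windows (z-score rule)."""
    if len(history) < 2:
        return False  # not enough history to estimate a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history) or 1.0  # flat history: avoid dividing by zero
    return (current - mean) / stdev > z_threshold


# Toy window of three posts and a two-name dictionary (hypothetical data).
window = ["Took aspirin, now dizzy!", "ibuprofen works", "ASPIRIN again"]
counts = count_mentions(window, ["Aspirin", "Ibuprofen"])
print(counts["aspirin"], counts["ibuprofen"])  # 2 1
print(is_novel([1, 2, 1, 2], 9))               # True
```

Testing all 19’000 names against every document, as this sketch does, gets slow at these volumes; an inverted index or an Aho–Corasick automaton would be the more likely choice in practice. As a sanity check on the figures, 26’000 documents per day over roughly 270 days (9 months) gives about 7 million documents, in line with the slide.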

3 Managing the data deluge for protein annotation
23’000’000 articles; proteins are annotated from the literature by curators (annotated articles, GOA)
Completion of manual annotation projected for 2045! (Baumgartner et al.)
40’000 concepts: a big-scale multiclass, multilabel classifier. Lazy learning!
Machine learning based on information retrieval methods
Applications: assisting curators, macro reading of the literature, profiling any textual content
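The slide pairs lazy learning with information-retrieval methods. One common way to combine them is k-nearest-neighbour classification: a new article receives the labels of its most similar already-annotated articles, with no model trained in advance. The sketch below assumes that reading; it uses plain term-frequency cosine similarity instead of a full IR engine, and the toy snippets, GO-style labels and helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term frequencies (a stand-in for TF-IDF or BM25 weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_labels(query, annotated, k=3):
    """Lazy (k-NN) multilabel classification: nothing is trained in advance;
    labels of the k most similar annotated texts vote, weighted by similarity."""
    q = tf_vector(query)
    ranked = sorted(
        ((cosine(q, tf_vector(text)), labels) for text, labels in annotated),
        key=lambda pair: pair[0],
        reverse=True,
    )
    scores = defaultdict(float)
    for sim, labels in ranked[:k]:
        for label in labels:
            scores[label] += sim
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical annotated snippets with GO-style labels.
annotated = [
    ("kinase phosphorylates substrate protein", {"protein kinase activity"}),
    ("protein binds dna in nucleus", {"DNA binding", "nucleus"}),
    ("kinase activity in cytoplasm", {"protein kinase activity", "cytoplasm"}),
]
print(knn_labels("novel kinase phosphorylates protein", annotated, k=2))
# [('protein kinase activity', 1.0), ('cytoplasm', 0.25)]
```

Because all the work happens at query time, adding newly curated articles to the pool improves suggestions immediately, which is the main appeal of lazy learning for a label space of tens of thousands of concepts.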

4 Patent retrieval
The real situation (0.5-1 TB): database of 13 million patents
  Extraction (33 days): XML patents, 0.221 TB
  Normalization (33 days): XML patents + metadata, 0.234 TB
  Indexing (5 days): index, 0.1 TB
Experiments: database of a sample of 1 million patents
  Extraction (2.5 days): XML patents, 17 GB
  Normalization (2.5 days): XML patents + metadata, 18 GB
  Indexing (10 hours): index, 3 GB
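The full collection (13 million patents) and the 1-million sample scale almost linearly: 2.5 days × 13 = 32.5 days of extraction (33 reported), 10 h × 13 = 130 h, about 5.4 days, of indexing (5 reported), and 17 GB × 13 = 221 GB of XML (0.221 TB reported). A small sketch of that extrapolation, assuming the slide's Tb/Gb denote terabytes/gigabytes with 1 TB = 1000 GB; the `extrapolate` helper is hypothetical.

```python
# Linear-scaling check for the patent pipeline (assumes cost grows in
# proportion to the number of patents; 1 TB = 1000 GB).
SAMPLE_PATENTS = 1_000_000
FULL_PATENTS = 13_000_000

# Measurements on the 1-million-patent sample, as given on the slide.
sample = {
    "extraction_days": 2.5,
    "normalization_days": 2.5,
    "indexing_hours": 10,
    "xml_gb": 17,
    "xml_plus_metadata_gb": 18,
}


def extrapolate(value, factor=FULL_PATENTS / SAMPLE_PATENTS):
    """Hypothetical helper: scale a sample measurement to the full collection."""
    return value * factor


print(extrapolate(sample["extraction_days"]))              # 32.5 days (slide: 33)
print(extrapolate(sample["indexing_hours"]) / 24)          # about 5.4 days (slide: 5)
print(extrapolate(sample["xml_gb"]) / 1000)                # 0.221 TB (slide: 0.221)
print(extrapolate(sample["xml_plus_metadata_gb"]) / 1000)  # 0.234 TB (slide: 0.234)
```

Linear extrapolation does not hold for every quantity: it predicts an index of 3 GB × 13 = 39 GB, whereas the slide reports 0.1 TB for the full collection.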
