Tomáš Jurníček, Jakub Jůza, Lenka Kmeťová


1 Big text data mining
Tomáš Jurníček, Jakub Jůza, Lenka Kmeťová

2 Introduction
Text data analysis
Sophisticated analytic methods
Information extraction from data

3 Big data and data mining
Big data: datasets of large size and complexity
  Companies have large amounts of data
  Data needs to be analyzed
  Problem: natural language
Data mining:
  Data cleaning
  Data integration
  Data selection
  Mining methods
  Evaluating results
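The data mining steps listed on this slide can be sketched end to end on a toy example. This is only a minimal illustration using the Python standard library: the sample documents, the stop-word list, the function names, and the `min_support` threshold are all assumptions made for the sketch, not part of the presentation.

```python
import re
from collections import Counter

# Hypothetical sample documents from two sources (invented for illustration).
source_a = ["Patient reports  severe PAIN in the left knee.", ""]
source_b = ["Knee pain improved after rest!",
            "Patient reports  severe PAIN in the left knee."]

STOPWORDS = {"the", "in", "after", "a", "an", "of"}  # assumed, tiny stop-word list


def clean(text):
    """Data cleaning: lowercase, replace non-letters, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(text.split())


def integrate(*sources):
    """Data integration: merge sources, dropping empty and duplicate documents."""
    seen, merged = set(), []
    for source in sources:
        for doc in source:
            cleaned = clean(doc)
            if cleaned and cleaned not in seen:
                seen.add(cleaned)
                merged.append(cleaned)
    return merged


def select(docs):
    """Data selection: keep only tokens that are not stop words."""
    return [[t for t in doc.split() if t not in STOPWORDS] for doc in docs]


def mine(token_lists):
    """Mining method: count in how many documents each term occurs."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(set(tokens))
    return counts


def evaluate(counts, n_docs, min_support=0.6):
    """Evaluating results: keep terms found in at least min_support of the documents."""
    return sorted(t for t, c in counts.items() if c / n_docs >= min_support)


docs = integrate(source_a, source_b)
tokens = select(docs)
frequent = evaluate(mine(tokens), len(docs))
print(frequent)
```

Each function maps to one step on the slide, so a real pipeline could swap in a stronger implementation for any single step without touching the others.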

4 Methods
Information extraction
  Key phrases and relations
  Unstructured text
Categorization
  Assign categories to documents
Clustering
  Group similar documents into clusters
Visualization
  Present data in a form understandable for humans
Summarization
  Long documents
  Expressing only core information
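As an illustration of the clustering method named above, here is a minimal sketch that groups documents by bag-of-words cosine similarity. The greedy single-pass strategy, the `threshold` value, and the sample documents are assumptions chosen for brevity; the presentation does not specify a particular clustering algorithm.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a document as a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it becomes the seed of a new cluster.
    """
    clusters = []  # list of (seed_vector, [document indices])
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Hypothetical documents, invented for illustration.
docs = [
    "stock market prices fall",
    "market prices rise again",
    "football team wins match",
    "team loses football match",
]
print(cluster(docs))  # → [[0, 1], [2, 3]]
```

Categorization differs in that the categories are fixed in advance, whereas here the groups emerge from the similarity of the documents themselves.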

5 Tools
Large companies such as Facebook and LinkedIn work on open-source projects. For example:
Apache Hadoop - for data-heavy distributed applications
Apache S4 - for continuous processing of data streams
Storm (Twitter) - for streaming distributed data
Open-source tools for Big Data mining: Apache Mahout, R, MOA, …

6 Nursing records
A specific area of use for Big data mining
Electronic Medical Record (EMR) = information about patients
This data is not used to its full potential:
  information is written in an unstructured style
  expressions are highly subjective
-> Data mining is more complicated

7 Nursing records
Result analyzed by KeyGraph
Associations and frequent terms that represent basic concepts in the data
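To make "frequent terms" and "associations" concrete, the sketch below counts how often terms occur and how often pairs of terms appear in the same record. This is a simplified co-occurrence count, not the KeyGraph algorithm itself; the sample notes, the stop-word list, and the function name are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical nursing notes (invented; not taken from the presentation).
records = [
    "patient complains of pain at night",
    "pain reduced after medication",
    "patient reports pain after medication",
]
STOPWORDS = {"of", "at", "after"}  # assumed stop-word list


def term_stats(records):
    """Count term frequency and within-record term co-occurrence."""
    term_freq, pair_freq = Counter(), Counter()
    for rec in records:
        # Unique, sorted terms so each pair has one canonical ordering.
        terms = sorted({t for t in rec.lower().split() if t not in STOPWORDS})
        term_freq.update(terms)
        pair_freq.update(combinations(terms, 2))
    return term_freq, pair_freq


term_freq, pair_freq = term_stats(records)

# Frequent terms: candidates for the basic concepts in the data.
print(term_freq.most_common(3))

# Associations: term pairs that co-occur in at least two records.
print(sorted(pair for pair, n in pair_freq.items() if n >= 2))
```

In this toy data, "pain" is the most frequent term and is associated with both "medication" and "patient", which is the kind of structure a term graph would display.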

8 Future
There are a lot of challenges:
Statistical significance – quality of statistical results for large sets of data
Distributed mining – more parallelized methods
Time-evolving data – data changes over time
Hidden big data – a lot of data is unlabeled and unstructured. Currently, only 3% of data is usable for data mining!
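The idea behind distributed mining can be sketched as a map step over partitions followed by a reduce step that merges partial results. The example below is only a single-machine illustration using Python threads; the function names and sample lines are assumptions, and a real deployment would use a framework such as Apache Hadoop rather than this code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(lines):
    """Map step: count words within one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(a, b):
    """Reduce step: combine two partial word counts."""
    return a + b


def distributed_count(lines, n_partitions=2):
    """Split the input, count each partition concurrently, merge the results."""
    partitions = [lines[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(merge, partials, Counter())


# Hypothetical input lines, invented for illustration.
lines = ["big data", "text data mining", "big text data mining"]
total = distributed_count(lines)
print(total["data"])  # → 3
```

Because the merge is associative and commutative, the result does not depend on how the input is partitioned, which is what allows the map step to run on separate machines.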

9 Conclusion
We are at the beginning of a new era in which Big text data mining will allow us to discover new, currently unknown knowledge.

