TDM=Text Mining “automated processing of large amounts of structured digital textual content for purposes of information retrieval, extraction, interpretation.

2 TDM = Text Mining: “automated processing of large amounts of structured digital textual content for purposes of information retrieval, extraction, interpretation and analysis” (Bernie Reilly, Center for Research Libraries, CRL)

3 TDM=Data Mining Overview
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
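
As a rough illustration of "finding correlations or patterns among fields", the Python sketch below checks every pair of fields in a small table and reports the strongly correlated ones. The field names and values are hypothetical, and Pearson correlation is used only as one simple stand-in for the many pattern-finding measures that real data mining tools offer.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.
    Assumes neither sequence is constant (otherwise the variance is zero)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlated_fields(table, threshold=0.8):
    """Return (field_a, field_b, r) for every field pair with |r| >= threshold."""
    results = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            results.append((a, b, round(r, 3)))
    return results

# Hypothetical data: each key is a field (column), each list holds that column's values.
sales = {
    "ad_spend": [10, 20, 30, 40, 50],
    "revenue":  [105, 198, 310, 395, 502],
    "returns":  [7, 3, 8, 2, 6],
}
print(correlated_fields(sales))  # [('ad_spend', 'revenue', 0.999)]
```

With dozens of fields the same pairwise loop grows quadratically, which is one reason dedicated software is used for large relational databases.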

4 Another Definition: “automated tools, techniques or technology to process large volumes of digital content that is often not well structured … to identify and select relevant information; to extract information from the content; to identify relationships within/between/across documents and incidents or events for meta-analysis” (Eefke Smit)

5 Another Definition: SAS
Text mining discovers themes, patterns, emerging issues and insights buried in document collections. By automatically reading text and delivering algorithms for rigorous, advanced analyses, the solution makes it possible to grasp future trends and act on new opportunities more precisely and with less risk. It can include advanced linguistic capabilities within the core data mining solution. (SAS definition)
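
One common, simple way to surface candidate themes in a document collection is TF-IDF weighting: a term scores highly in a document when it is frequent there but rare across the collection. The Python sketch below is an illustration under stated assumptions, not SAS's method; the sample abstracts, the stopword list and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = frozenset({"the", "of", "and", "in", "a", "to", "for", "with"})

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_terms(docs, k=2, stopwords=STOPWORDS):
    """Return the k highest-scoring TF-IDF terms for each document (ties broken alphabetically)."""
    token_lists = [[t for t in tokenize(d) if t not in stopwords] for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    result = []
    for tokens in token_lists:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result

# Hypothetical abstracts.
docs = [
    "Stem cell grafts and stem cell therapy for spinal cord injury.",
    "Exercise training and exercise therapy in spinal cord injury.",
    "Exercise therapy for stroke patients.",
]
for doc, terms in zip(docs, top_terms(docs)):
    print(terms, "<-", doc)
```

A term found in every document (here "therapy") scores zero, which is why TF-IDF favors distinguishing vocabulary over words the whole collection shares.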

6 TDM: Business uses differ from scholarly uses

7 Class Discussion: How might businesses use data mining? How might the health sciences use it? What are its scholarly uses?

8 Reasons for TDM: to enrich content; systematic review of the literature; discovery; computational linguistics research

9 Steps in TDM; Hurdles to Overcome
Steps: Researchers must be able to process large amounts of content automatically. They must identify the questions to be asked, find the right sources to be mined, be able to access those sources, and be able to download the results in order to analyze and interpret them. Hurdles to overcome: Is special software required? Constructing the proper query. Obtaining permission to access content the institution does not subscribe to (licensing problems). Varying formats: there is no standard format for storage.
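
The steps and hurdles above can be sketched as a small Python pipeline. Everything in it is hypothetical: the in-memory catalog stands in for real sources, the "licensed" flag stands in for a permission check, and the two storage formats stand in for the lack of a standard format. A real project would retrieve content through publisher or platform interfaces under the terms of its licenses.

```python
import csv
import io

# Hypothetical catalog: whether each source is licensed for mining, and how it stores content.
CATALOG = [
    {"name": "Journal A", "licensed": True, "format": "text",
     "content": "Spinal cord injury and stem cell therapy."},
    {"name": "Journal B", "licensed": True, "format": "record",
     "content": {"title": "Rehabilitation", "body": "Exercise after spinal cord injury."}},
    {"name": "Journal C", "licensed": False, "format": "text",
     "content": "Spinal cord injury outcomes."},
]

def normalize(source):
    """Convert each (hypothetical) storage format into plain text."""
    if source["format"] == "text":
        return source["content"]
    if source["format"] == "record":
        record = source["content"]
        return f"{record['title']}. {record['body']}"
    raise ValueError(f"unsupported format: {source['format']}")

def run_pipeline(query_terms, catalog):
    """Select accessible sources, process them, and export term counts as CSV."""
    rows, skipped = [], []
    for source in catalog:
        if not source["licensed"]:          # access hurdle: no permission, no mining
            skipped.append(source["name"])
            continue
        text = normalize(source).lower()    # format hurdle: unify before processing
        hits = {term: text.count(term) for term in query_terms}
        rows.append({"source": source["name"], **hits})
    # "Download" step: write results to CSV for later analysis and interpretation.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["source", *query_terms])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue(), skipped

report, skipped = run_pipeline(["spinal cord", "stem cell"], CATALOG)
print(report)
print("Needs permission:", skipped)
```

The query terms play the role of "constructing the proper query"; sources without permission are reported rather than silently dropped, so a librarian can follow up on licensing.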

10 Librarians' Role in Text/Data Mining
Advise on license language: help develop publisher licenses that address TDM (see the work of the California Digital Library, JISC and CRL). Assist researchers in TDM: inform them of the TDM process and of what data mining can do for them, connect them with the tools to accomplish TDM, and develop strategies and “pilot studies” through interviews.

11 Use Case: Since ,000 journal articles on spinal cord injury. There has been an average of 22 journal articles a day on spinal cord injury. How can all this information be analyzed?
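
To put the slide's rate in perspective, here is a back-of-envelope calculation, assuming the average of 22 articles a day holds for every day of a year:

```latex
22\ \frac{\text{articles}}{\text{day}} \times 365\ \frac{\text{days}}{\text{year}} = 8{,}030\ \frac{\text{articles}}{\text{year}}
```

That is roughly 8,000 new articles a year on a single topic, which is the scale that motivates automated processing.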

12 TDM: With the help of automated software, large amounts of data and text are processed to identify entities, instances, actions, relationships and patterns for further analysis.
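
A minimal Python sketch of identifying entities and relationships, assuming a hypothetical hand-built dictionary of entity names and treating two entities mentioned in the same sentence as a candidate relationship. Production text mining systems typically use trained entity recognizers and relation extractors instead; dictionary matching and sentence co-occurrence are simply the easiest techniques to show.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical entity dictionary: surface form -> entity type.
ENTITIES = {
    "spinal cord injury": "Condition",
    "stem cell therapy": "Treatment",
    "methylprednisolone": "Drug",
    "inflammation": "Process",
}

def extract_entities(sentence):
    """Return the sorted dictionary entities found in a sentence (plain substring match)."""
    lowered = sentence.lower()
    return sorted({name for name in ENTITIES if name in lowered})

def cooccurrences(text):
    """Count entity pairs mentioned in the same sentence (candidate relationships)."""
    pairs = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        pairs.update(combinations(extract_entities(sentence), 2))
    return pairs

# Hypothetical abstract text.
TEXT = (
    "Methylprednisolone reduces inflammation after spinal cord injury. "
    "Stem cell therapy is being tested for spinal cord injury. "
    "Inflammation worsens spinal cord injury. "
    "Inflammation limits recovery."
)
for (a, b), n in cooccurrences(TEXT).most_common():
    print(f"{a} ({ENTITIES[a]}) -- {b} ({ENTITIES[b]}): {n}")
```

Pairs that recur across many sentences or articles (here inflammation and spinal cord injury) are the "patterns" a researcher would then examine more closely.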

13 Typical TDM Content: Text mining output typically consists of a new metadata layer over the information in journal articles: clusters and categorizations; indexes; topical maps showing the occurrence of topics and their interrelationships; databases of facts, patterns, relationships, statements, assertions and properties found in the articles; and visualisations such as graphs, mappings, plot graphs and topical maps.
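
Of the outputs listed, an index is the easiest to show in code. The Python sketch below builds an inverted index (term to article IDs) over hypothetical abstracts and uses it to find the articles that contain every query term; the IDs, texts and function names are illustrative assumptions.

```python
import re
from collections import defaultdict

def build_index(articles):
    """Map each term to the sorted list of article IDs whose text contains it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(article_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, *terms):
    """Return the sorted IDs of articles that contain all of the given terms."""
    sets = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []

# Hypothetical article IDs and abstracts.
articles = {
    "A1": "Stem cell therapy for spinal cord injury.",
    "A2": "Exercise after spinal cord injury.",
    "A3": "Stem cell biology.",
}
index = build_index(articles)
print(index["spinal"])                   # ['A1', 'A2']
print(search(index, "stem", "spinal"))   # ['A1']
```

The index is metadata derived from the articles rather than the articles themselves, which matches the slide's description of text mining output as a new layer on top of the content.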

14 Class: Please View Smit, Eefke and Maurits van der Graaf. Content Mining: A Short Introduction to Practices and Policies. Presented for the Center for Research Libraries, July 17, 2013 (CRL Global Resources Forum).

15 Class Please Read

17 Tools for searching the Deep Web
DeepDyve; Deep Web Technologies; WorldWideScience.org; Deep Web Harvester from BrightPlanet

19 Credits
Okerson, Ann. Text & Data Mining: A Librarian Overview. IFLA WLIC, Singapore, August 7, 2013.
Smit, Eefke and Maurits van der Graaf. Content Mining: A Short Introduction to Practices and Policies. Presented for the Center for Research Libraries, July 17, 2013 (CRL Global Resources Forum).
Speirs, Martha A. Data Mining for Scholarly Journals: Challenges and Solutions for Libraries. IFLA WLIC 2013, June 28, 2013.
EMEA Regional Council Meeting Connects Members to the Latest in Library Data Research: Mining Insights from 50 Million Books. NEXTSpace, no. 21, May 2013.
Text/Data Mining, Libraries and Online Publishers. CRL, YouTube, July 17, 2013.
Chiang, Katherine. Data Mining, Data Fusion, and Libraries. June 21, st Annual IATUL Conference, Paper 4.


Download ppt "TDM=Text Mining “automated processing of large amounts of structured digital textual content for purposes of information retrieval, extraction, interpretation."

Similar presentations


Ads by Google