Presentation is loading. Please wait.

Presentation is loading. Please wait.

UB LIS 571 Soergel Lecture 6.2b Document analysis for retrieval and information extraction Dagobert Soergel Department of Library and Information Studies.

Similar presentations


Presentation on theme: "UB LIS 571 Soergel Lecture 6.2b Document analysis for retrieval and information extraction Dagobert Soergel Department of Library and Information Studies."— Presentation transcript:

1

2 UB LIS 571 Soergel Lecture 6.2b Document analysis for retrieval and information extraction Dagobert Soergel Department of Library and Information Studies Graduate School of Education University at Buffalo Includes audio.

3 1 Overview of document/text analysis You should have read Lecture Notes p. 225 – 226 (pink). Read Lecture Notes p. 227 – 228. 2

4 2 Approaches to document/text analysis Read Lecture Notes p. 229. The audio gives brief comments on some of these (6 min). 3

5 3 Linguistic techniques Read Lecture Notes p. 230, take the page out of your binder. 3.1 Query expansion, p. 231. Remember the Medline exercise. Will elaborate on this later, in Part 5. 3.2 Noun phrases, p. 231: Key points Importance of noun phrases as carriers of meaning in English. Many single nouns have multiple meanings, in a noun phrase the meaning is specified. For another example, try some noun phrases that include the word pool. 4

6 3.3 Word sense disambiguation Lecture Notes p. 232 – 233: Listen to the audio while following in the Lecture Notes (5 min.) 5

7 3.4 Named entity recognition Lecture Notes p. 234 – 235: Look at the examples. They should be self-explanatory. The audio talks about why this is important (2 min.). 6

8 3.5 Text structure. Cohesion (grammatical) Coherence (lexical – semantic) Read Lecture Notes p. 236 – 237. Lecture Notes p. 238 – 239: Listen to the audio while following in the Lecture Notes (5 min.) 7

9 3.5.3 Introduction to the analysis of how relationships are expressed in text Application to information extraction Information extraction is extremely important for systems to provide answers rather than pointers to where answers can be found. It helps people and organization cope with ever-increasing volumes of text. Google's Knowledge Graph is a huge database of statements, many extracted from text (using technology being further developed by Google). This database is used now to add substantive data to search results and will be used later to support question-answering. Read Lecture Notes p. 240, then study the second example from Crombie (p. 242) to find all the ways in which the relationship A B is expressed. In the top part, mentions of the same referent (object, action, or adjective) are connected through lines. In the bottom part, relationship types (e.g. cause) are labeled to make your task easier. After your own exploration listen to the audio; it explains the second example (7 min.). Then study p. 243. Explanation in the audio (4 min.). Optional: apply what you have learned to analyzing Example 1 from Crombie. 8

10 3.6 Extracting data through slot filling in frames Lecture Notes p. 244, with examples on p. 245: Read p. 244, first box The audio explains the rest (8 min.) 9


Download ppt "UB LIS 571 Soergel Lecture 6.2b Document analysis for retrieval and information extraction Dagobert Soergel Department of Library and Information Studies."

Similar presentations


Ads by Google