A Financial News Summarisation System based on Lexical Cohesion


1 A Financial News Summarisation System based on Lexical Cohesion
Paulo Cesar Fernandes de Oliveira, Khurshid Ahmad, Lee Gillam
GIDA IST TKE Conference, 28 – 30 August 2002, Nancy, France

2 Introduction
“Stock market news has gone from hard to find (in the 1970s and early 1980s), then easy to find (in the late 1980s), then hard to get away from.” (From Peter Lynch (2000))
Growth in the volume of financial news
Consequence of this growth → the need for text summarisation

3 Introduction Automatic Summarisation
Get an information source;
Extract some content from it;
Present the most important part to the user.

4 Introduction What is a summary?
A summary is a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s). (From Hovy and Lin (1998))

5 Introduction What constitutes a good summary?
Mrs. Coolidge: What did the preacher discuss in his sermon?
President Coolidge: Sin.
Mrs. Coolidge: What did he say?
President Coolidge: He said he was against it.
President Calvin Coolidge, Grace Coolidge, and dog, Rob Roy, c Plymouth Notch, Vermont. (Copyright © 2001 The MITRE Corporation)
Source: Bartlett, J Collection of Familiar Quotations, 15th edition, Citadel Press, (noted by Graeme Hirst)

6 Lexical Cohesion Definition
Lexical cohesion is the tendency of the sentences in a text to carry information about a certain topic through related words, which gives the text a quality of unity.

7 Lexical Cohesion
Halliday and Hasan (1976) looked at the question of cohesion in text. Their focus was on grammatical and on lexical cohesion.
I will deal only with lexical cohesion: ‘selecting the same lexical item twice, or selecting two that are closely related’ (p.12)
Halliday and Hasan introduced new terminology:
Tie → ‘single instance of cohesion’ (p.3)
Texture → a property of ‘being a text’ (p.2)

8 Lexical Cohesion
Hoey (1991) looked at cohesion in text from a lexical perspective. He suggested that cohesion ‘may be crudely defined as the way certain words of a sentence can connect that sentence to its predecessors (and successors) in a text’.
link – occurrence of an item in two separate sentences
bond – ‘connection between any two sentences by virtue of there being a sufficient number of links between them’ (p.91)

9 Lexical Cohesion Links Example
Sentence 23: J&J's stock added 83 cents to $65.49.
Sentence 15: "For the stock market this move was so deeply discounted that I don't think it will have a major impact".
Sentence 26: Flagging stock markets kept merger activity and new stock offerings on the wane, the firm said.
Sentence 42: Lucent, the most active stock on the New York Stock Exchange, skidded 47 cents to $4.31, after falling to a low at $4.30.
Text title: U.S. stocks hold some gains. Collected from Reuters’ Website on 20 March 2002.

10 Lexical Cohesion Bonds Example
17. In other news, Hewlett-Packard said preliminary estimates showed shareholders had approved its purchase of Compaq Computer -- a result unconfirmed by voting officials.
19. In a related vote, Compaq shareholders are expected on Wednesday to back the deal, catapulting HP into contention against International Business Machines for the title of No. 1 computer company.
Text title: U.S. stocks hold some gains. Collected from Reuters’ Website on 20 March 2002.
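The bond between sentences 17 and 19 above can be checked mechanically. The sketch below is only an illustration, not the SummariserPort code: the class and method names are hypothetical, the word sets were picked by hand from the two sentences (closed-class words left out), and the threshold of three links is an assumption, since the definition only asks for "a sufficient number".

```java
import java.util.Set;
import java.util.TreeSet;

class BondCheckSketch {
    // Items that occur in both sentences: one link per shared item.
    static Set<String> sharedItems(Set<String> first, Set<String> second) {
        Set<String> shared = new TreeSet<>(first); // sorted for stable printing
        shared.retainAll(second);
        return shared;
    }

    // Two sentences are bonded when they share at least `threshold` links.
    static boolean isBonded(Set<String> first, Set<String> second, int threshold) {
        return sharedItems(first, second).size() >= threshold;
    }

    public static void main(String[] args) {
        // Hand-picked lexical items from sentences 17 and 19 (lower-cased).
        Set<String> s17 = Set.of("news", "hewlett-packard", "said", "preliminary",
                "estimates", "showed", "shareholders", "approved", "purchase",
                "compaq", "computer", "result", "unconfirmed", "voting", "officials");
        Set<String> s19 = Set.of("related", "vote", "compaq", "shareholders",
                "expected", "wednesday", "back", "deal", "catapulting", "hp",
                "contention", "international", "business", "machines", "title",
                "computer", "company");

        System.out.println(sharedItems(s17, s19)); // prints [compaq, computer, shareholders]
        System.out.println(isBonded(s17, s19, 3)); // prints true
    }
}
```

Only identical word forms are matched here; a pair such as "voting" and "vote" would need the repetition rules described on the following slides.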

11 Lexical Cohesion
Simple Repetition: two identical items (e.g. bear – bear) or two similar items whose difference is ‘entirely explicable in terms of a closed grammatical paradigm’ (e.g. bear (N) – bears (N)) (p.53)
Complex Repetition: results from two items sharing a lexical morpheme but differing with respect to other morphemes or grammatical function (e.g. human (N) – human (Adj.), dampness – damp)
Simple Paraphrase: two different items of the same grammatical class which are ‘interchangeable in the context’ (p.69) and ‘whenever a lexical item may substitute for another without loss or gain in specificity and with no discernible change in meaning’ (p.62) (e.g. sedated – tranquillised)
Complex Paraphrase: two different items of the same or different grammatical class; this is restricted to three situations:
a) antonyms which do not share a lexical morpheme (e.g. hot – cold);
b) two items one of which ‘is a complex repetition of the other, and also a simple paraphrase (or antonym) of a third’ (p.64) (e.g. a complex paraphrase is recorded for ‘finance’ (v) and ‘funds’ (n) if a simple paraphrase has been recorded for ‘finance’ (v) and ‘fund’ (v), and a complex repetition has been recorded for ‘fund’ (v) and ‘funds’ (n));
c) when there is the possibility of substituting an item for another (for instance, a complex paraphrase is recorded between ‘record’ and ‘discotheque’ if ‘record’ can be replaced with ‘disc’).

12 SummariserPort Architecture

13 SummariserPort History
SummariserPort is a revised and object-oriented version of the TelePattan system developed at Surrey by Benbrahim and Tostevin.
The TelePattan system was used to investigate cohesion in technical texts by Trine Dahl, Bergen Business School.
TelePattan was entered in the DARPA-sponsored SUMAC (1997) competition, where its summaries were judged by independent evaluators to be amongst the best machine-produced summaries.

14 SummariserPort Parser
Reads the text file
Segments it into sentences.
BreakIterator - Java class designed specifically to parse natural language into words and sentences.
Features: built-in knowledge of punctuation rules; it does not require any special mark-up.
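A minimal sketch of sentence segmentation with java.text.BreakIterator follows. It shows the standard usage pattern of the class; the class name SentenceSegmenter, the choice of Locale.US, and the trimming of whitespace are assumptions made for the example, not details taken from SummariserPort.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSegmenter {
    // Splits plain text into sentences using the JDK's sentence boundary rules.
    static List<String> split(String text) {
        BreakIterator boundary = BreakIterator.getSentenceInstance(Locale.US);
        boundary.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = boundary.first();
        // Walk from boundary to boundary until the iterator is exhausted.
        for (int end = boundary.next(); end != BreakIterator.DONE;
                start = end, end = boundary.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "J&J's stock added 83 cents to $65.49. "
                + "Flagging stock markets kept merger activity and new stock "
                + "offerings on the wane, the firm said.";
        // Print each detected sentence with its position in the text.
        List<String> sentences = split(text);
        for (int i = 0; i < sentences.size(); i++) {
            System.out.println((i + 1) + ": " + sentences.get(i));
        }
    }
}
```

No mark-up is needed in the input: the iterator works directly on the raw string.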

15 SummariserPort Patterns Extractor
Detects simple repetition
Pattern-matching operation
Includes an optional file of closed class words and other non-lexical items (e.g. pronouns, prepositions, determiners, articles, conjunctions, some adverbs, etc.)
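One way to picture this step is as an index from each lexical item to the sentences in which it occurs, keeping only the items that recur. The sketch below is an assumption-laden illustration: the class name, the tokenisation on non-letters, and the very short CLOSED_CLASS sample are all hypothetical, and only identical word forms are matched (inflectional variants such as singular and plural are not handled).

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SimpleRepetitionSketch {
    // Tiny illustrative sample; a real closed-class word file would be far longer.
    static final Set<String> CLOSED_CLASS =
            Set.of("the", "a", "an", "of", "to", "on", "in", "and", "it", "this");

    // Maps each lexical item to the (0-based) sentences containing it,
    // keeping only items that occur in at least two sentences.
    static Map<String, TreeSet<Integer>> repeatedItems(List<String> sentences) {
        Map<String, TreeSet<Integer>> index = new TreeMap<>();
        for (int i = 0; i < sentences.size(); i++) {
            String lower = sentences.get(i).toLowerCase(Locale.ENGLISH);
            for (String token : lower.split("[^a-z]+")) {
                if (token.isEmpty() || CLOSED_CLASS.contains(token)) {
                    continue; // skip split debris and non-lexical items
                }
                index.computeIfAbsent(token, key -> new TreeSet<>()).add(i);
            }
        }
        index.values().removeIf(where -> where.size() < 2);
        return index;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of(
                "J&J's stock added 83 cents to $65.49.",
                "Lucent, the most active stock on the New York Stock Exchange, "
                        + "skidded 47 cents to $4.31.");
        System.out.println(repeatedItems(sentences)); // prints {cents=[0, 1], stock=[0, 1]}
    }
}
```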

16 SummariserPort Morphological Rules
Detects complex repetition
Instances of complex repetition are identified by means of a list of derivational suffixes encoded in the program.
For the English language, it contains 75 morphology conditions that lead to approximately 2500 possible relations among words.
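The suffix-based lookup can be illustrated with a small sketch. The 75 conditions used by SummariserPort are not listed in the slides, so the five suffixes below are a hypothetical sample, and the class and method names are invented for the example. Pairs that differ only in grammatical function (e.g. human (N) – human (Adj.)) would need part-of-speech information and are not covered.

```java
import java.util.List;
import java.util.Locale;

class ComplexRepetitionSketch {
    // Hypothetical sample of derivational suffixes, not the system's actual list.
    static final List<String> SUFFIXES = List.of("ness", "ment", "ation", "ity", "ly");

    // Strips the first matching suffix, provided a stem of 3+ letters remains.
    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Two different word forms count as a complex repetition when they share a stem.
    static boolean isComplexRepetition(String first, String second) {
        String a = first.toLowerCase(Locale.ENGLISH);
        String b = second.toLowerCase(Locale.ENGLISH);
        return !a.equals(b) && stem(a).equals(stem(b));
    }

    public static void main(String[] args) {
        System.out.println(isComplexRepetition("dampness", "damp")); // prints true
        System.out.println(isComplexRepetition("damp", "damp"));     // prints false
        System.out.println(isComplexRepetition("stock", "bond"));    // prints false
    }
}
```

Plain suffix stripping is crude: it misses spelling changes at the morpheme boundary, which is presumably why the real rule set is expressed as conditions and not as a bare suffix list.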

17 SummariserPort Output
Produces the results
Files created:
Summary File
MoreInfo File: Whole text, Summary, Link Matrix, Bond Matrix, Word Frequency List, List of Sentences (TO, TC, MB)
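The link and bond matrices named in the output can be sketched as follows. This is an illustration under assumptions, not the system's code: the class name is hypothetical, each sentence is represented by a hand-made set of lexical items, a link is counted only for identical items, and the bond threshold is passed in as a parameter because the slides do not state the value used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LinkMatrixSketch {
    // links[i][j] = number of lexical items shared by sentences i and j.
    static int[][] linkMatrix(List<Set<String>> items) {
        int n = items.size();
        int[][] links = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                Set<String> shared = new HashSet<>(items.get(i));
                shared.retainAll(items.get(j));
                links[i][j] = shared.size();
                links[j][i] = shared.size(); // the matrix is symmetric
            }
        }
        return links;
    }

    // bonds[i][j] is true when two distinct sentences share enough links.
    static boolean[][] bondMatrix(int[][] links, int threshold) {
        int n = links.length;
        boolean[][] bonds = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                bonds[i][j] = i != j && links[i][j] >= threshold;
            }
        }
        return bonds;
    }

    // Number of bonds per sentence; heavily bonded sentences are summary candidates.
    static int[] bondCounts(boolean[][] bonds) {
        int[] counts = new int[bonds.length];
        for (int i = 0; i < bonds.length; i++) {
            for (boolean bonded : bonds[i]) {
                if (bonded) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented item sets standing in for three sentences.
        List<Set<String>> items = List.of(
                Set.of("stock", "market", "cents"),
                Set.of("stock", "market", "cents", "merger"),
                Set.of("merger", "deal"));
        int[][] links = linkMatrix(items);
        System.out.println(Arrays.deepToString(links));
        System.out.println(Arrays.toString(bondCounts(bondMatrix(links, 3))));
    }
}
```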

18 SummariserPort Link Matrix Bond Matrix

19 SummariserPort List of Sentences Word Frequency List

20 Evaluation Question Game or Q&A Evaluation
To measure information content (retention)
Some people see the text and create a set of questions about its content (questioners)
Other people (answerers) see:
1. Nothing – but must try to answer the questions (default knowledge)
2. Summary – must answer the same questions
3. Full Text – must answer the same questions again
Compute the quality of Summaries (% answers correct)
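The scoring step is simple arithmetic: the percentage of correct answers under each of the three conditions. The sketch below uses invented answer marks purely for illustration; they are not results from the evaluation, and the class and method names are hypothetical.

```java
class QaEvaluationSketch {
    // Percentage of correctly answered questions (0 for an empty answer sheet).
    static double percentCorrect(boolean[] answers) {
        if (answers.length == 0) {
            return 0.0;
        }
        int correct = 0;
        for (boolean answer : answers) {
            if (answer) {
                correct++;
            }
        }
        return 100.0 * correct / answers.length;
    }

    public static void main(String[] args) {
        // Invented marks for four questions under each condition.
        boolean[] nothing = {false, false, true, false};
        boolean[] summary = {true, false, true, true};
        boolean[] fullText = {true, true, true, true};

        System.out.println("Nothing:   " + percentCorrect(nothing));  // prints Nothing:   25.0
        System.out.println("Summary:   " + percentCorrect(summary));  // prints Summary:   75.0
        System.out.println("Full text: " + percentCorrect(fullText)); // prints Full text: 100.0
    }
}
```

A summary retains more information the closer its score sits to the full-text score and the further it sits above the no-text baseline.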

21 Evaluation

22 Conclusions
We are very keen to devise strategies for independent and objective evaluation of our system.
Human evaluation is continuing within the GIDA project – reviewed by project partners and EU-appointed evaluators.
Machine-based evaluation, based on neural network classification of summarised and original texts, is also continuing.

23 Future Work
Conduct further evaluation tests
Implement Simple Paraphrase
Conduct experiments in Brazilian Portuguese: Complex Repetition, Simple Paraphrase

