A Financial News Summarisation System based on Lexical Cohesion

Paulo Cesar Fernandes de Oliveira, Khurshid Ahmad, Lee Gillam. GIDA IST-2000-31123. TKE Conference, 28–30 August 2002, Nancy, France.

Introduction “Stock market news has gone from hard to find (in the 1970s and early 1980s), then easy to find (in the late 1980s), then hard to get away from” (Peter Lynch, 2000). The volume of financial news keeps growing, and a consequence of this growth is the need for text summarisation.

Introduction Automatic Summarisation: get an information source; extract some content from it; present the most important part to the user.

Introduction What is a summary? A summary is a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s). (From Hovy and Lin (1998))

Introduction What constitutes a good summary? Mrs. Coolidge: what did the preacher discuss in his sermon? President Coolidge: sin. Mrs. Coolidge: what did he say? President Coolidge: he said he was against it. President Calvin Coolidge, Grace Coolidge, and dog, Rob Roy, c.1925. Plymouth Notch, Vermont. (Copyright © 2001 The MITRE Corporation) Source: Bartlett, J. 1983. Collection of Familiar Quotations, 15th edition, Citadel Press, 1983. (noted by Graeme Hirst)

Lexical Cohesion Definition Lexical cohesion is the tendency of the sentences in a text to carry information about a certain topic through related words; this tendency gives the text a quality of unity.

Lexical Cohesion Halliday and Hasan (1976) examined cohesion in text, focusing on grammatical and lexical cohesion. I will deal only with lexical cohesion. Halliday and Hasan introduced the following terminology:
Lexical cohesion: ‘selecting the same lexical item twice, or selecting two that are closely related’ (p.12)
Tie: a ‘single instance of cohesion’ (p.3)
Texture: the property of ‘being a text’ (p.2)

Lexical Cohesion Hoey (1991) looked at cohesion in text from a lexical perspective. He suggested that cohesion ‘may be crudely defined as the way certain words of a sentence can connect that sentence to its predecessors (and successors) in a text’.
Link: an occurrence of an item in two separate sentences.
Bond: a ‘connection between any two sentences by virtue of there being a sufficient number of links between them’ (p.91).

Lexical Cohesion Links Example
Sentence 23: J&J's stock added 83 cents to $65.49.
Sentence 15: "For the stock market this move was so deeply discounted that I don't think it will have a major impact".
Sentence 26: Flagging stock markets kept merger activity and new stock offerings on the wane, the firm said.
Sentence 42: Lucent, the most active stock on the New York Stock Exchange, skidded 47 cents to $4.31, after falling to a low at $4.30.
Text title: U.S. stocks hold some gains. Collected from Reuters’ website on 20 March 2002.

Lexical Cohesion Bonds Example
17. In other news, Hewlett-Packard said preliminary estimates showed shareholders had approved its purchase of Compaq Computer -- a result unconfirmed by voting officials.
19. In a related vote, Compaq shareholders are expected on Wednesday to back the deal, catapulting HP into contention against International Business Machines for the title of No. 1 computer company.
Text title: U.S. stocks hold some gains. Collected from Reuters’ website on 20 March 2002.
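Links and bonds can be made concrete with a minimal sketch in Java (chosen because SummariserPort's parser is built on Java's BreakIterator). The sketch makes several assumptions. It counts only simple repetition of identical word forms, ignoring case. It first removes words on a hypothetical stop list of closed-class words, and it counts each shared item once. The bond threshold is a parameter, and the value of three used below is an assumption chosen for illustration. The class and method names are hypothetical and do not come from SummariserPort.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not SummariserPort code): simple-repetition links and bonds. */
final class LinkCounter {

    /** Lower-cases a sentence, splits it on non-alphanumeric characters and drops stop words. */
    static Set<String> lexicalItems(String sentence, Set<String> stopWords) {
        Set<String> items = new TreeSet<>();
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                items.add(token);
            }
        }
        return items;
    }

    /** Lexical items occurring in both sentences; each distinct shared item counts as one link. */
    static Set<String> links(String first, String second, Set<String> stopWords) {
        Set<String> shared = lexicalItems(first, stopWords);
        shared.retainAll(lexicalItems(second, stopWords));
        return shared;
    }

    /** Two sentences bond when they share at least {@code threshold} links. */
    static boolean bonded(String first, String second, Set<String> stopWords, int threshold) {
        return links(first, second, stopWords).size() >= threshold;
    }
}
```

Take sentences 17 and 19 above with a stop list containing words such as a, in and of. They share three links (compaq, computer, shareholders), so they bond at a threshold of three. Some related pairs, such as vote – voting and HP – Hewlett-Packard, would need complex repetition or paraphrase, and this sketch does not attempt either.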

Lexical Cohesion
Simple Repetition: two identical items (e.g. bear – bear), or two similar items whose difference is ‘entirely explicable in terms of a closed grammatical paradigm’ (e.g. bear (N) – bears (N)) (p.53).
Complex Repetition: results from two items sharing a lexical morpheme but differing with respect to other morphemes or grammatical function (e.g. human (N) – human (Adj.), dampness – damp).
Simple Paraphrase: two different items of the same grammatical class which are ‘interchangeable in the context’ (p.69), and ‘whenever a lexical item may substitute for another without loss or gain in specificity and with no discernible change in meaning’ (p.62) (e.g. sedated – tranquillised).
Complex Paraphrase: two different items of the same or different grammatical class. This is restricted to three situations:
a) antonyms which do not share a lexical morpheme (e.g. hot – cold);
b) two items one of which ‘is a complex repetition of the other, and also a simple paraphrase (or antonym) of a third’ (p.64) (e.g. a complex paraphrase is recorded for ‘finance’ (v) and ‘funds’ (n) if a simple paraphrase has been recorded for ‘finance’ (v) and ‘fund’ (v), and a complex repetition has been recorded for ‘fund’ (v) and ‘funds’ (n));
c) when there is the possibility of substituting an item for another (for instance, a complex paraphrase is recorded between ‘record’ and ‘discotheque’ if ‘record’ can be replaced with ‘disc’).
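Hoey's simple repetition also covers inflectional variants such as bear – bears. The Java sketch below approximates the ‘closed grammatical paradigm’ with a hypothetical list of regular English endings (-s, -es, -ed, -ing). It is not SummariserPort's code. It misses irregular forms such as mouse – mice, and it wrongly conflates some pairs, such as new – news.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical simple-repetition check (not SummariserPort code). */
final class SimpleRepetition {

    // Assumed regular English inflections standing in for the "closed grammatical paradigm".
    private static final List<String> INFLECTIONS = List.of("ing", "es", "ed", "s");

    /** The lower-cased word plus every form obtained by stripping one assumed inflection. */
    static Set<String> candidateBases(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        Set<String> bases = new HashSet<>();
        bases.add(w);
        for (String suffix : INFLECTIONS) {
            // Require a remainder of at least three letters so short words stay intact.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                bases.add(w.substring(0, w.length() - suffix.length()));
            }
        }
        return bases;
    }

    /** True when both forms reduce to a common form, each losing at most one assumed inflection. */
    static boolean isSimpleRepetition(String a, String b) {
        Set<String> shared = candidateBases(a);
        shared.retainAll(candidateBases(b));
        return !shared.isEmpty();
    }
}
```

With this sketch, bear – bears and walked – walking count as simple repetition. Damp – dampness does not, because -ness is derivational, and that pair is complex repetition.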

SummariserPort Architecture

SummariserPort History SummariserPort is a revised, object-oriented version of TelePattan, which was developed at Surrey during 1994–1999 by Benbrahim and Tostevin. TelePattan was used by Trine Dahl of Bergen Business School to investigate cohesion in technical texts. TelePattan was entered in the DARPA-sponsored SUMMAC (1997) competition, where independent evaluators judged its summaries to be among the best machine-produced summaries.

SummariserPort Parser Reads the text file and segments it into sentences using BreakIterator, a Java class designed specifically to parse natural language into words and sentences. Features: built-in knowledge of punctuation rules; it does not require any special mark-up.
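Sentence segmentation with BreakIterator can be sketched as follows. The calls BreakIterator.getSentenceInstance, setText, first, next and the BreakIterator.DONE constant are part of the standard java.text API. The wrapper class and its name are hypothetical, and this is not SummariserPort's actual parser code.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical wrapper around the JDK's locale-sensitive sentence BreakIterator. */
final class SentenceSplitter {

    /** Returns the sentences of {@code text}, trimmed, skipping empty segments. */
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        // Each boundary returned by next() ends the sentence that began at the previous one.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }
}
```

For example, sentences("Stocks rose. Bonds fell! Did oil move?", Locale.ENGLISH) returns the three sentences in order, each with its closing punctuation.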

SummariserPort Patterns Extractor Identifies simple repetition by means of a pattern-matching operation. It can use an optional file of closed-class words and other non-lexical items (e.g. pronouns, prepositions, determiners, articles, conjunctions, some adverbs).
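The slides do not describe the format of the optional closed-class file or exactly how it is applied. Here is a minimal sketch that makes two assumptions. First, the file is a hypothetical plain-text list with one word per line; blank lines and lines starting with # are ignored. Second, the words act as a stop list, so they never take part in a link.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical closed-class filter (file format and class name are assumptions). */
final class ClosedClassFilter {

    /** Reads one closed-class word per line; blank lines and '#' comment lines are skipped. */
    static Set<String> load(Path file) throws IOException {
        Set<String> words = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String w = line.trim().toLowerCase(Locale.ROOT);
            if (!w.isEmpty() && !w.startsWith("#")) {
                words.add(w);
            }
        }
        return words;
    }

    /** Tokenises a sentence and keeps only tokens that are not closed-class words. */
    static List<String> lexicalTokens(String sentence, Set<String> closedClass) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !closedClass.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```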

SummariserPort Morphological Rules Identifies complex repetition. Instances of complex repetition are looked up by means of a list of derivational suffixes encoded in the program. For English, the list contains 75 morphological conditions, which yield approximately 2500 possible relations among words.
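The slide gives only the size of SummariserPort's rule set. The following is a minimal sketch of the idea using a hypothetical, much smaller suffix list (-ness, -ity, -ment, -ation, -al, -ly, -er, -ful, -less). Two different word forms count as complex repetition when they reduce to a common form after each loses at most one listed suffix. The sketch cannot detect human (N) – human (Adj.), which differs only in grammatical function and would need a part-of-speech tagger. It also produces false positives such as corn – corner.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical complex-repetition check (not SummariserPort's 75 rules). */
final class ComplexRepetition {

    // Small illustrative sample of English derivational suffixes.
    private static final List<String> SUFFIXES =
            List.of("ness", "ity", "ment", "ation", "al", "ly", "er", "ful", "less");

    /** The lower-cased word plus every form obtained by stripping one listed suffix. */
    static Set<String> candidateStems(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        Set<String> stems = new HashSet<>();
        stems.add(w);
        for (String suffix : SUFFIXES) {
            // Require a remainder of at least three letters to limit spurious matches.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                stems.add(w.substring(0, w.length() - suffix.length()));
            }
        }
        return stems;
    }

    /** Different forms that share a candidate stem count as complex repetition. */
    static boolean isComplexRepetition(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        if (x.equals(y)) {
            return false; // identical forms are simple repetition, not complex repetition
        }
        Set<String> shared = candidateStems(x);
        shared.retainAll(candidateStems(y));
        return !shared.isEmpty();
    }
}
```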

SummariserPort Output Produces the results. Files created: Summary File, MoreInfo File. Results produced: whole text, summary, link matrix, bond matrix, word frequency list, list of sentences (TO, TC, MB).

SummariserPort Link Matrix Bond Matrix
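The slide shows a link matrix and a bond matrix but not how one is derived from the other. Here is a minimal sketch that assumes cell (i, j) of the link matrix holds the number of links between sentences i and j. A pair bonds when that number reaches a threshold, which is left as a parameter. In Hoey's framework, heavily bonded sentences are treated as central to the text. Counting bonds per sentence is therefore shown as one plausible ranking signal. The slides do not say that SummariserPort ranks sentences this way, and the class name is hypothetical.

```java
/** Hypothetical helpers turning a link matrix into a bond matrix. */
final class BondMatrix {

    /** bonds[i][j] is true when sentences i and j (i != j) share at least threshold links. */
    static boolean[][] fromLinks(int[][] links, int threshold) {
        int n = links.length;
        boolean[][] bonds = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                bonds[i][j] = i != j && links[i][j] >= threshold;
            }
        }
        return bonds;
    }

    /** Number of other sentences each sentence is bonded to. */
    static int[] bondCounts(boolean[][] bonds) {
        int[] counts = new int[bonds.length];
        for (int i = 0; i < bonds.length; i++) {
            for (boolean bonded : bonds[i]) {
                if (bonded) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }
}
```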

SummariserPort List of Sentences Word Frequency List
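A word frequency list can be produced by counting tokens and sorting them by descending frequency. The sketch below assumes that tokens have already been extracted, for example after closed-class filtering, and it breaks ties alphabetically. Both choices are assumptions, since the slide shows only the output. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical word-frequency list: most frequent first, ties in alphabetical order. */
final class WordFrequency {

    static List<Map.Entry<String, Integer>> frequencyList(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum); // add one to the running count
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }
}
```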

Evaluation Question Game or Q&A Evaluation
Aim: to measure information content (retention).
Some people (questioners) see the text and create a set of questions about its content.
Other people (answerers) see:
1. Nothing, but must try to answer the questions (default knowledge);
2. The summary, and must answer the same questions;
3. The full text, and must answer the same questions again.
The quality of the summaries is computed as the percentage of answers correct.
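The scoring step reduces to a percentage per condition. The sketch assumes that each answer has already been marked correct or incorrect by a person, with the marks represented as a list of Boolean values. Scoring the three conditions (nothing, summary, full text) separately lets the summary score be read against the default-knowledge and full-text baselines. The class name is hypothetical.

```java
import java.util.List;

/** Hypothetical scorer for one condition of the Q&A evaluation. */
final class QAScore {

    /** Percentage of answers marked correct, in the range 0 to 100. */
    static double percentCorrect(List<Boolean> marks) {
        if (marks.isEmpty()) {
            throw new IllegalArgumentException("no answers to score");
        }
        long correct = marks.stream().filter(Boolean::booleanValue).count();
        return 100.0 * correct / marks.size();
    }
}
```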

Evaluation

Conclusions We are very keen to devise strategies for independent and objective evaluation of our system. Human evaluation is continuing within the GIDA project – reviewed by project partners and EU-appointed evaluators. Machine-based evaluation, based on neural network classification of summarised and original texts, is also continuing.

Future Work Conduct further evaluation tests. Implement simple paraphrase. Conduct experiments in Brazilian Portuguese (complex repetition, simple paraphrase).