MILAN, Italy, April 18. A small airplane crashed into a government building in the heart of Milan, setting the top floors on fire, Italian police reported. There were no immediate reports on casualties as rescue workers attempted to clear the area in the city's financial district. Few details of the crash were available, but news reports about it immediately set off fears that it might be a terrorist act akin to the Sept. 11 attacks in the United States. Those fears sent U.S. stocks tumbling to session lows in late morning trading. Witnesses reported hearing a loud explosion from the 30-story office building, which houses the administrative offices of the local Lombardy region and sits next to the city's central train station. Italian state television said the crash put a hole in the 25th floor of the Pirelli building. News reports said smoke poured from the opening. Police and ambulances rushed to the building in downtown Milan. No further details were immediately available.
Questions a reader asks: How many victims? Was it a terrorist act? What was the target? What happened? Says who? When, where?
1. How many people were injured? 2. How many people were killed? (age, number, gender, description) 3. Was the pilot killed? 4. Where was the plane coming from? 5. Was it an accident (technical problem, illness, terrorist act)? 6. Who was the pilot? (age, number, gender, description) 7. When did the plane crash? 8. How tall is the Pirelli building? 9. Who was on the plane with the pilot? 10. Did the plane catch fire before hitting the building? 11. What was the weather like at the time of the crash? 12. When was the building built? 13. What direction was the plane flying? 14. How many people work in the building? 15. How many people were in the building at the time of the crash? 16. How many people were taken to the hospital? 17. What kind of aircraft was used?
Questions What kinds of summaries do people want? What are summarizing, abstracting, gisting, ...? How sophisticated must summarization systems be? Are statistical techniques sufficient? Or do we need symbolic techniques and deep understanding as well? What milestones would mark quantum leaps in summarization theory and practice? How do we measure summarization quality?
Definitions Summary definition (Sparck Jones, 1999): “a reductive transformation of source text to summary text through content condensation by selection and/or generalization on what is important in the source.”
Schematic summary processing model: Source text -> (Interpretation) -> Source representation -> (Transformation) -> Summary representation -> (Generation) -> Summary text
Summarizing factors (Sparck Jones 2007)
Input: subject type: domain; genre: newspaper articles, editorials, letters, reports...; form: regular text structure, free-form; source size: single doc, multiple docs (few; many)
Purpose: situation: embedded in larger system (MT, IR) or not?; audience: focused or general; usage: IR, sorting, skimming...
Output: completeness: include all aspects, or focus on some?; format: paragraph, table, etc.; style: informative, indicative, aggregative, critical...
Examples Exercise: summarize the following texts for the following readers: text1: Coup Attempt text2: children’s story reader1: your friend, who knows nothing about South Africa. reader2: someone who lives in South Africa and knows the political position. reader3: your 4-year-old niece. reader4: the Library of Congress.
‘Genres’ of Summary?
Indicative vs. informative: used for quick categorization vs. content processing.
Extract vs. abstract: lists fragments of text vs. re-phrases content coherently.
Generic vs. query-oriented: provides author’s view vs. reflects user’s interest.
Background vs. just-the-news: assumes reader’s prior knowledge is poor vs. up-to-date.
Single-document vs. multi-document source: based on one text vs. fuses together many texts.
A Summarization Machine [Figure: a summarization machine connecting DOC, MULTIDOCS and QUERY to EXTRACTS and ABSTRACTS. Settings shown: Extract / Abstract; Indicative / Informative; Generic / Query-oriented; Background / Just the news; length: Headline, Very Brief, Brief, Long (10%, 50%, 100%). Internal representations shown: CASE FRAMES, TEMPLATES, CORE CONCEPTS, CORE EVENTS, RELATIONSHIPS, CLAUSE FRAGMENTS, INDEX TERMS.]
Computational Approach Top-Down: I know what I want! User needs: only certain types of info. System needs: particular criteria of interest, used to focus search. Bottom-Up: I’m dead curious: what’s in the text? User needs: anything that’s important. System needs: generic importance metrics, used to rate content.
Review of Methods
Bottom-up methods: Text location: title, position. Cue phrases. Word frequencies. Internal text cohesion: word co-occurrences; local salience; co-reference of names, objects; lexical similarity; semantic rep/graph centrality. Discourse structure centrality.
Top-down methods: Information extraction templates. Query-driven extraction: query expansion lists; co-reference with query names; lexical similarity to query.
Query-Driven vs. Text-Driven Focus Top-down: Query-driven focus. Criteria of interest encoded as search specs. System uses specs to filter or analyze text portions. Examples: templates with slots with semantic characteristics; term lists of important terms. Bottom-up: Text-driven focus. Generic importance metrics encoded as strategies. System applies strategies over a representation of the whole text. Examples: degree of connectedness in semantic graphs; frequency of occurrence of tokens.
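A minimal sketch of a text-driven (bottom-up) importance metric, using the token-frequency example named on this slide. The stopword list, the tokenizer, and the choice of averaging frequencies per sentence are illustrative assumptions, not a method taken from the slides.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "it", "on"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def rank_sentences(sentences):
    """Return sentence indices, best first, scored by the average
    whole-text frequency of the tokens each sentence contains."""
    freq = Counter(tok for s in sentences for tok in tokenize(s))
    scored = []
    for idx, sentence in enumerate(sentences):
        toks = tokenize(sentence)
        score = sum(freq[t] for t in toks) / len(toks) if toks else 0.0
        scored.append((score, idx))
    # Highest score first; ties keep original text order.
    return [idx for score, idx in sorted(scored, key=lambda p: (-p[0], p[1]))]

if __name__ == "__main__":
    doc = [
        "A plane crashed into a building in Milan.",
        "The plane hit the Pirelli building.",
        "Stocks fell.",
    ]
    print(rank_sentences(doc))
```

A query-driven variant would replace the generic frequency table with a term list derived from the user's criteria of interest.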
Bottom-Up, using Info. Retrieval IR task: Given a query, find the relevant document(s) from a large set of documents. Summ-IR task: Given a query, find the relevant passage(s) from a set of passages (i.e., from one or more documents). Questions: 1. IR techniques work on large volumes of data; can they scale down accurately enough? 2. IR works on words; do abstracts require abstract representations? [Figure: schematic text with relevant passages highlighted]
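The Summ-IR task can be sketched as passage retrieval with tf.idf weights and cosine similarity. This is one common IR formulation, assumed here for illustration; the smoothed idf formula and the function names are not from the slides.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(texts):
    """Build sparse tf.idf vectors (dicts) for a list of texts."""
    docs = [Counter(tokens(t)) for t in texts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    # Smoothed idf (an assumption): terms present everywhere keep a small weight.
    idf = {term: math.log((1 + n) / (1 + c)) + 1.0 for term, c in df.items()}
    return [{term: tf * idf[term] for term, tf in d.items()} for d in docs]

def cosine(u, v):
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def best_passages(query, passages, k=1):
    """Return indices of the k passages most similar to the query."""
    vecs = tfidf_vectors(passages + [query])
    q = vecs[-1]
    scores = [cosine(q, v) for v in vecs[:-1]]
    order = sorted(range(len(passages)), key=lambda i: (-scores[i], i))
    return order[:k]

if __name__ == "__main__":
    passages = [
        "A small airplane crashed into the Pirelli building.",
        "U.S. stocks tumbled in late morning trading.",
        "Police and ambulances rushed to the building.",
    ]
    print(best_passages("what kind of airplane crashed", passages))
```

With only a handful of passages the idf statistics are thin, which is the "scale down" concern raised in question 1.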
Top-Down, using Info. Extraction IE task: Given a template and a text, find all the information relevant to each slot of the template and fill it in. Summ-IE task: Given a query, select the best template, fill it in, and generate the contents. Questions: 1. IE works only for very particular templates; can it scale up? 2. What about information that doesn’t fit into any template: is this a generic limitation of IE? [Figure: schematic text mapped onto a filled template]
Paradigms: NLP/IE vs. IR/statistics
NLP/IE: Approach: try to ‘understand’ text by re-representing content using ‘deeper’ notation, then manipulate that. Need: rules for text analysis and manipulation, at all levels. Strengths: higher quality; supports abstracting. Weaknesses: speed; still needs to scale up to robust open-domain summarization.
IR/Statistics: Approach: operate at lexical level, using word frequency, collocation counts, etc. Need: large amounts of text. Strengths: robust; good for query-oriented summaries. Weaknesses: lower quality; inability to manipulate information at abstract levels.
Toward the Final Answer... Problem: What if neither IR-like nor IE-like methods work? Solution: semantic analysis of the text (NLP), using adequate knowledge bases that support inference (AI). Mrs. Coolidge: “What did the preacher preach about?” Coolidge: “Sin.” Mrs. Coolidge: “What did he say?” Coolidge: “He’s against it.” Sometimes counting and templates are insufficient, and then you need to do inference to understand. [Figure: scale from word counting to inference]
Example text, segmented into numbered units:
(1) With its distant orbit (50 percent farther from the sun than Earth) and slim atmospheric blanket,
(2) Mars experiences frigid weather conditions.
(3) Surface temperatures typically average about -60 degrees Celsius (-76 degrees Fahrenheit) at the equator and can dip to -123 degrees C near the poles.
(4) Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion,
(5) but any liquid water formed in this way would evaporate almost instantly
(6) because of the low atmospheric pressure.
(7) Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop,
(8) most Martian weather involves blowing dust and carbon monoxide.
(9) Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap.
(10) Yet even on the summer pole, where the sun remains in the sky all day long, temperatures never warm enough to melt frozen water.
[Figure: discourse structure tree over units 1-10; relation labels: Elaboration, Example, Background, Justification, Concession, Antithesis, Contrast, Evidence, Cause]
Marcu 97 [Figure: discourse structure tree over units 1-10, nodes labeled with unit numbers; relation labels: Evidence, Cause, Contrast, Elaboration, Background, Justification, Concession, Antithesis, Example] Summarization = selection of the most important units: 2 > 8 > 3, 10 > 1, 4, 5, 7, 9 > 6
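Once the discourse tree yields an importance ordering over units, producing a summary reduces to cutting that ordering at a length budget. The sketch below takes the ordering from the slide as given. Taking whole importance classes at a time, rather than breaking ties inside a class, is an assumption made here for simplicity.

```python
# Importance classes from the slide: 2 > 8 > 3, 10 > 1, 4, 5, 7, 9 > 6
IMPORTANCE_CLASSES = [[2], [8], [3, 10], [1, 4, 5, 7, 9], [6]]

def select_units(classes, budget):
    """Take whole importance classes, most important first, while the
    running total stays within `budget` units (assumed tie policy)."""
    chosen = []
    for cls in classes:
        if len(chosen) + len(cls) > budget:
            break
        chosen.extend(cls)
    # Present the selected units in their original text order.
    return sorted(chosen)

if __name__ == "__main__":
    print(select_units(IMPORTANCE_CLASSES, 4))
```

With a budget of four units this keeps units 2, 3, 8 and 10 of the Mars text.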
Information Extraction Method Idea: content selection using templates. Predefine a template, whose slots specify what is of interest. Use a canonical IE system to extract from a (set of) document(s) the relevant information; fill the template. Generate the content of the template as the summary. Previous IE work: FRUMP (DeJong, 78): ‘sketchy scripts’ of terrorism, natural disasters, political visits... (Mauldin, 91): templates for conceptual IR. (Rau and Jacobs, 91): templates for business. (McKeown and Radev, 95): templates for news.
Information Extraction Method Example template:
MESSAGE: ID            TSL-COL-0001
SECSOURCE: SOURCE      Reuters
SECSOURCE: DATE        26 Feb 93 Early afternoon
INCIDENT: DATE         26 Feb 93
INCIDENT: LOCATION     World Trade Center
INCIDENT: TYPE         Bombing
HUM TGT: NUMBER        AT LEAST 5
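To make the "generate the content of the template" step concrete, here is a toy realizer over the example template. The dictionary layout, the `realize` function and the sentence pattern are hypothetical; real systems such as the one by McKeown and Radev use a full generation grammar rather than string formatting.

```python
# Slot names and values follow the example template on the slide.
TEMPLATE = {
    "MESSAGE: ID": "TSL-COL-0001",
    "SECSOURCE: SOURCE": "Reuters",
    "SECSOURCE: DATE": "26 Feb 93 Early afternoon",
    "INCIDENT: DATE": "26 Feb 93",
    "INCIDENT: LOCATION": "World Trade Center",
    "INCIDENT: TYPE": "Bombing",
    "HUM TGT: NUMBER": "AT LEAST 5",
}

def realize(t):
    """Turn a filled template into one sentence (hypothetical pattern).
    An empty victim slot is stated explicitly instead of being dropped."""
    parts = [
        f"A {t['INCIDENT: TYPE'].lower()} occurred at the "
        f"{t['INCIDENT: LOCATION']} on {t['INCIDENT: DATE']}"
    ]
    victims = t.get("HUM TGT: NUMBER")
    if victims:
        parts.append(f"with {victims.lower()} human targets reported")
    else:
        parts.append("with no immediate word of casualties")
    return ", ".join(parts) + f", according to {t['SECSOURCE: SOURCE']}."

if __name__ == "__main__":
    print(realize(TEMPLATE))
```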
Full Generation Example Challenge: Pack content densely! Example (McKeown and Radev, 95): Traverse templates and assign values to ‘realization switches’ that control local choices such as tense and voice. Map modified templates into a representation of Functional Descriptions (input representation to Columbia’s NL generation system FUF). FUF maps Functional Descriptions into English.
Generation Example (McKeown and Radev, 95) NICOSIA, Cyprus (AP) – Two bombs exploded near government ministries in Baghdad, but there was no immediate word of any casualties, Iraqi dissidents reported Friday. There was no independent confirmation of the claims by the Iraqi National Congress. Iraq’s state-controlled media have not mentioned any bombings. Points to note: multiple sources and disagreement; explicit mention of “no information”.
Build Word-Word Graph Word relations. Word similarity computation: based on dictionary (WordNet); based on corpus (mutual information)
Build Sentence-Word Graph Relation between sentences and words Similarity computation:
Document Model Assumption 1: If a sentence is important, its closely connected sentences are also important; if a word is important, its closely related words are also important. Assumption 2: The more important words a sentence contains, the more important the sentence is. The more frequently a word occurs in important sentences, the more important the word is.
matrix form: Then we can simultaneously rank sentences (u) and words (v).
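The matrix form itself did not survive on this slide, so the following is only a sketch of the mutual-reinforcement idea in Assumption 2: sentence scores u are recomputed from word scores v and vice versa until they stabilize. It uses the sentence-word graph alone and ignores the sentence-sentence and word-word graphs of Assumption 1; the update rule, the normalization and the function name `co_rank` are assumptions.

```python
def co_rank(W, iterations=50):
    """Jointly score sentences (u) and words (v).

    W[i][j] is the weight linking sentence i and word j.
    Assumes W has at least one nonzero entry in every row and column.
    """
    n_s, n_w = len(W), len(W[0])
    u = [1.0 / n_s] * n_s
    v = [1.0 / n_w] * n_w
    for _ in range(iterations):
        # A sentence is important if it contains important words.
        new_u = [sum(W[i][j] * v[j] for j in range(n_w)) for i in range(n_s)]
        # A word is important if it occurs in important sentences.
        new_v = [sum(W[i][j] * new_u[i] for i in range(n_s)) for j in range(n_w)]
        total_u, total_v = sum(new_u), sum(new_v)
        u = [x / total_u for x in new_u]
        v = [x / total_v for x in new_v]
    return u, v

if __name__ == "__main__":
    # Sentence 0 contains all three words; sentences 1 and 2 contain one each.
    W = [[1, 1, 1],
         [1, 0, 0],
         [0, 0, 1]]
    u, v = co_rank(W)
    print([round(x, 3) for x in u], [round(x, 3) for x in v])
```

This is the same fixed-point computation as HITS on a bipartite graph, with sentences and words playing the two roles.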
Postprocessing Simple processing: Extract the highest-scoring sentences until the length reaches the requirement. Problems: 1: redundancy 2: meaningless words in sentences (rule-based) 3: coherence
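A sketch of the simple extraction step with a basic guard against problem 1 (redundancy). The Jaccard word-overlap test and its threshold are assumptions chosen for illustration; the slides do not say how redundancy was handled.

```python
def jaccard(a, b):
    """Word-set overlap between two sentences, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def select_summary(sentences, scores, max_words, max_overlap=0.5):
    """Greedily pick high-scoring sentences within a word budget,
    skipping any sentence too similar to one already chosen."""
    order = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen, used = [], 0
    for i in order:
        words = sentences[i].lower().split()
        if used + len(words) > max_words:
            continue  # does not fit; a shorter sentence might
        word_set = set(words)
        if any(jaccard(word_set, set(sentences[j].lower().split())) > max_overlap
               for j in chosen):
            continue  # redundant with an already selected sentence
        chosen.append(i)
        used += len(words)
    return chosen

if __name__ == "__main__":
    sents = ["the plane hit the tower",
             "the plane hit the tower today",
             "stocks fell sharply"]
    print(select_summary(sents, [0.9, 0.8, 0.5], max_words=10))
```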
Sentence simplification Delete meaningless words in sentences: news-specific noisy words; content-irrelevant words. Rule-based method: the dateline at the beginning of a news story, e.g., “ALBUQUERQUE, N.M. (AP)”; the initial words in the sentence, such as “and”, “also”, “besides,”, “though,”, “in addition”, “somebody said”, “somebody says”; “somebody (pronoun)/It is said/reported/noticed/thought that”; the parenthesized content in capitalized letters; …
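A few of these rules can be sketched with regular expressions. The patterns below cover only the dateline, a handful of leading connectives, and all-caps parentheticals; the exact patterns and the agency list are assumptions, and the "somebody said" family is left out because it needs name or pronoun detection.

```python
import re

# Dateline such as "ALBUQUERQUE, N.M. (AP) -" at the start of a story.
# The agency list is a hypothetical sample.
DATELINE = re.compile(
    r"^[A-Z]{2,}[A-Za-z .,]*\((?:AP|Reuters|AFP)\)\s*[-:;\u2013\u2014]*\s*"
)
# Sentence-initial connectives named on the slide.
LEADING = re.compile(r"^(?:and|also|besides|though|in addition)\b[, ]*",
                     re.IGNORECASE)
# Parenthesized content written entirely in capital letters.
CAPS_PAREN = re.compile(r"\s*\([A-Z][A-Z .,&-]*\)")

def simplify(sentence):
    s = DATELINE.sub("", sentence.strip())
    s = CAPS_PAREN.sub("", s)
    s = LEADING.sub("", s)
    # Re-capitalize in case a leading connective was removed.
    return s[:1].upper() + s[1:] if s else s

if __name__ == "__main__":
    print(simplify("ALBUQUERQUE, N.M. (AP) - And the fire spread (UPDATE) overnight."))
    print(simplify("In addition, police closed the station."))
```

The dateline rule runs first so that the agency tag is removed as part of the dateline rather than by the all-caps parenthetical rule.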
Sentence ordering Ordering sentences by score gives no logic in the content. Temporal sentence ordering: acquire the time stamp from the original texts; order sentences by the publication time of their documents; for sentences from the same document, order them by their position in the document.
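The temporal ordering rule is a two-level sort, sketched below. The record fields (`published`, `doc_id`, `position`, `text`) are hypothetical names; using `doc_id` as a tie-breaker so that two documents published on the same day do not interleave is an added assumption.

```python
from datetime import date

def order_sentences(selected):
    """Order summary sentences by document publication time, then by
    position inside the document."""
    ranked = sorted(selected,
                    key=lambda s: (s["published"], s["doc_id"], s["position"]))
    return [s["text"] for s in ranked]

if __name__ == "__main__":
    picked = [
        {"published": date(2002, 4, 19), "doc_id": "b", "position": 0,
         "text": "Investigators ruled out terrorism."},
        {"published": date(2002, 4, 18), "doc_id": "a", "position": 3,
         "text": "Smoke poured from the opening."},
        {"published": date(2002, 4, 18), "doc_id": "a", "position": 0,
         "text": "A small airplane crashed into a building."},
    ]
    for line in order_sentences(picked):
        print(line)
```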
Outline Summarization Recap Cross-genre Information Linking (Huang et al., 2012) Cross-genre Summarization
What are Other People Planning? 10K tweets posted each hour Top tweets ranked by TextRank
Desperately Looking for a Better Ranker Top Ranked Tweets from our system
What is Informativeness? After temporal and spatial constraints, informative to a general audience or helpful for tracking events: breaking news, real-time coverage of ongoing events, … Informative Tweet Examples: New Yorkers, find your exact evacuation zone by your address here: http://t.co/9NhiGKG /via @user #Irene #hurricane #NY Hurricane Irene: Latest developments http://t.co/2nQOJLO Non-Informative Tweet Examples: Me, Myself, and Hurricane Irene. I'm ready For hurricane Irene.
Limitation of Previous Work Supervised ranking models require large amount of labeled data and multiple levels of features (e.g. content and user account features) Ignored cross-genre linkages and background knowledge Tweets about events of general interest are sent by many disconnected users Need to handle link sparsity with implicit user network prediction Ignored subjectivity detection and redundancy removal Our relationships have been ignored
Motivations and Hypotheses Informative tweets often contain rich links to diverse networks Hypothesis 1: Informative tweets are more likely to be posted by credible users; and vice versa (credible users are more likely to post informative tweets). Hypothesis 2: Tweets involving many users are more likely to be informative. Similar tweets appear with high frequency Synchronous behavior of users indicates informative information Had fun in the excursion bus for 16 hours Saw some empty ancient caves in darkness Our bus hit a house before the second caves!
Motivations and Hypotheses (Cont’d) Hypothesis 3: Tweets aligned with contents of web documents are more likely to be informative. New Yorkers, find your exact evacuation zone by your address here: http://t.co/9NhiGKG /via @user #Irene #hurricane #NY Details of Aer Lingus flights affected by Hurricane Irene can be found at http://t.co/PCqE74V Hurricane Irene: City by City Forecasts http://t.co/x1t122A
Approach Overview Make use of correlations to formal genre web documents Infer implicit tweet-user relations to enrich network linkages Extend to heterogeneous networks instead of homogeneous networks Effective propagation model to consider global evidence from different genres
Non-Informative Tweet Filtering Capture the characteristics of a noisy tweet by a few patterns very short tweets without a complementary URL tweets with subjective opinions (e.g. include I, me, my…) I'm ready for hurricane Irene I hope New York and New Jersey are ok when the hurricane hits informal tweets containing slang words Precision: 96.59%
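The three filtering patterns can be sketched as simple checks. The token threshold, the pronoun list and the slang list below are hypothetical placeholders; the slide does not give the actual patterns behind the reported precision.

```python
import re

URL = re.compile(r"https?://\S+")
FIRST_PERSON = re.compile(r"\b(?:i|me|my|myself)\b", re.IGNORECASE)
SLANG = {"lol", "omg", "wtf", "lmao"}   # hypothetical sample list
MIN_TOKENS = 5                          # hypothetical length threshold

def is_noninformative(tweet):
    tokens = tweet.split()
    has_url = bool(URL.search(tweet))
    # Pattern 1: very short tweet without a complementary URL.
    if len(tokens) < MIN_TOKENS and not has_url:
        return True
    # Pattern 2: subjective, first-person wording (checked outside URLs).
    if FIRST_PERSON.search(URL.sub(" ", tweet)):
        return True
    # Pattern 3: informal tweets containing slang words.
    words = {w.strip(".,!?;:").lower() for w in tokens}
    return bool(words & SLANG)

if __name__ == "__main__":
    for t in ["I'm ready For hurricane Irene.",
              "Hurricane Irene: Latest developments http://t.co/2nQOJLO"]:
        print(is_noninformative(t), t)
```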
Initializing Ranking Scores Initializing Tweet and Web Document Scores TextRank based on content similarity (cosine & tf.idf) Initializing User Credibility Scores TextRank based on retweet/reply/user mention networks Bayesian Ranking approach to consider user and tweet networks simultaneously (Wang et al 2011) Rank(x): the increase of posterior probability that a user is credible, normalized by prior probability : the percentage of true claims : the percentage of credible users : the explicit tweet-user networks
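For the TextRank initialization, a minimal power-iteration sketch over a precomputed similarity matrix (for example, pairwise tf.idf cosine between tweets). The damping factor 0.85 and the fixed iteration count are conventional defaults assumed here, not values from the slides.

```python
def textrank(sim, damping=0.85, iterations=100):
    """Weighted PageRank over a symmetric similarity matrix.

    sim[i][j] is the similarity between items i and j; the diagonal
    is ignored. Returns one score per item.
    """
    n = len(sim)
    # Total outgoing weight of each node, excluding self-similarity.
    out = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(sim[j][i] / out[j] * scores[j]
                           for j in range(n) if j != i and out[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores

if __name__ == "__main__":
    sim = [[0.0, 0.8, 0.1],
           [0.8, 0.0, 0.1],
           [0.1, 0.1, 0.0]]
    print([round(s, 3) for s in textrank(sim)])
```

The same routine applies to the user graph if `sim` is replaced by retweet, reply or mention counts.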
Constructing Heterogeneous Networks Tweet-User Networks: Explicit tweet-user relations are sparse, so infer implicit tweet-user relations: if U1 posts T1 and sim(T1, T2) exceeds a threshold, a direct edge is created between U1 and T2. Web-Tweet Networks: Ti is aligned with a relevant web document Dj if they are on a similar topic (cosine & tf.idf).
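The implicit-link rule can be sketched as a pairwise pass over tweets. The data layout (a dict from tweet text to author), the Jaccard similarity used in the demo, and the decision to skip tweets by the same author are assumptions; the slides specify only the threshold test.

```python
def infer_implicit_edges(posts, sim, threshold):
    """posts maps tweet -> user who posted it; sim(t1, t2) -> float.

    If a user posted t1 and t2 is similar enough to t1, link that
    user to t2 even though the user did not post t2.
    """
    edges = set()
    for t1, user in posts.items():
        for t2, other in posts.items():
            if t2 == t1 or other == user:
                continue  # explicit edges are already in the network
            if sim(t1, t2) > threshold:
                edges.add((user, t2))
    return edges

def jaccard(t1, t2):
    """Toy similarity for the demo; the slides use cosine over tf.idf."""
    a, b = set(t1.lower().split()), set(t2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

if __name__ == "__main__":
    posts = {"mta closed today": "U1",
             "mta closed now": "U2",
             "nice weather": "U3"}
    print(sorted(infer_implicit_edges(posts, jaccard, threshold=0.4)))
```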
Tri-HITS: preliminaries Heterogeneous networks over web documents (D1, D2), users (U1, U2) and tweets (T1, T2, T3). Initial ranking scores: S0(d), S0(t), S0(u). Implicit links between tweets and web documents: Wtd, Wdt. Explicit and implicit links between tweets and users: Wtu, Wut. [Figure: example similarity matrix Wdt and transition matrix Pdt; values shown: 0.45, 0.8, 0.1, 1.0, 0.5]
Propagation from tweets to web documents [Equation labels: updated score, initial score, propagated score] Tri-HITS: based on the similarity matrix. Co-HITS: based on the transition matrix (Deng et al 2009). Differences between Tri-HITS and Co-HITS: Tri-HITS normalizes the propagated ranking scores based on the original similarity matrix; Co-HITS propagates normalized ranking scores using the transition matrix.
Tri-HITS (con’t) Propagation from tweets to users Propagation from web documents and users to tweets Set to 0 will only consider tweet-user networks Set to 0 will only consider web-tweet networks
An Example over bipartite graph D1 D2 T1 T2 T3 0.45 0.8 0.1 Wtd Pdt Propagated scores in first iteration of Tri-HITS: 0.2 0.3 0.5 0.6 0.4 Propagated scores in first iteration of Co-HITS: Ptd Choose =0.5, the final ranking of tweets Tri-HITS: (0.276, 0.463, 0.261) Co-HITS: (0.202, 0.331, 0.467) Co-HITS can weaken or damage the original meaning of semantic similarity.
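The contrast between the two propagation styles can be sketched for a single step from tweets to web documents. The update formulas on the slides were lost in extraction, so both functions below are reconstructions from the verbal description (raw similarities followed by normalization versus a row-normalized transition matrix). The matrix and scores in the demo are made-up values and do not reproduce the slide's example.

```python
def propagate_similarity(W, src_scores, init_scores, lam):
    """Tri-HITS-style step (assumed form): propagate with the raw
    similarity matrix, then normalize the propagated scores."""
    n_dst = len(init_scores)
    raw = [sum(W[t][d] * src_scores[t] for t in range(len(W)))
           for d in range(n_dst)]
    total = sum(raw)
    prop = [r / total for r in raw] if total > 0 else [0.0] * n_dst
    return [(1 - lam) * s0 + lam * p for s0, p in zip(init_scores, prop)]

def propagate_transition(W, src_scores, init_scores, lam):
    """Co-HITS-style step (assumed form): row-normalize W into a
    transition matrix first, then propagate."""
    P = []
    for row in W:
        total = sum(row)
        P.append([w / total if total > 0 else 0.0 for w in row])
    n_dst = len(init_scores)
    prop = [sum(P[t][d] * src_scores[t] for t in range(len(P)))
            for d in range(n_dst)]
    return [(1 - lam) * s0 + lam * p for s0, p in zip(init_scores, prop)]

if __name__ == "__main__":
    # Rows: tweets T1..T3; columns: documents D1, D2 (illustrative values).
    W = [[1.0, 0.0],
         [0.5, 0.5],
         [0.0, 0.2]]
    tweet_scores = [0.5, 0.3, 0.2]
    doc_init = [0.5, 0.5]
    print(propagate_similarity(W, tweet_scores, doc_init, lam=0.5))
    print(propagate_transition(W, tweet_scores, doc_init, lam=0.5))
```

In the demo, T3 is only weakly similar to D2 (0.2), but row normalization turns that link into a transition probability of 1.0, so the transition-based step gives D2 a higher score than the similarity-based step does. This is the kind of distortion the slide attributes to Co-HITS.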
Overall Performance nDCG@ top n ranked tweets Non-informative tweet filtering is important for informal information from social media. Evidence from multi-genre networks improves TextRank significantly. Knowledge transferred from the Web and the inferred implicit social networks dramatically boosted quality.
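For reference, a short sketch of nDCG@n as it is commonly defined. Several variants exist (for example, an exponential gain of 2^rel - 1); the slides do not say which one was used, so the linear-gain, log2-discount form below is an assumption.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at(ranked_relevances, n):
    """nDCG@n for a ranked list of graded relevance labels.
    The ideal ordering is taken over the same list of labels."""
    ideal = sorted(ranked_relevances, reverse=True)
    best = dcg(ideal[:n])
    return dcg(ranked_relevances[:n]) / best if best > 0 else 0.0

if __name__ == "__main__":
    # Relevance grades of the tweets in the order a system ranked them.
    print(round(ndcg_at([3, 2, 1], 3), 3))
    print(round(ndcg_at([1, 2, 3], 3), 3))
```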
Remaining Error Analysis Topically-relevant tweet identification: Hurricane Kitty: http://t.co/cdIexE3 Non-informative tweet identification by performing deeper linguistic analysis and rumor/sarcasm detection: Hurricane names hurricane names http://t.co/iisc7UY ;) My favorite parts of Hurricane coverage is when the weathercasters stand in those 100 MPH winds right on the beach. Good stuff. Deep semantic analysis to improve inferring implicit linkages: “MTA closed” = “Subway shut down” Subjectivity detection with opinion mining: Damn earthquake & hurricane in the same week… = Worst week to live on the East Coast
Outline Summarization Recap Cross-genre Information Linking Cross-genre Summarization
Tweet Ranking (Liu et al., 2012) Modifying Weights in TextRank Retw: a tweet is more important if it has been re-tweeted more times. Foll: a tweet is more important if it is published by an account with more followers. Readability: sentence length, word length, OOV. They also considered user diversity.
Another Similar Work by Yan et al., 2012 Tweet Graph: Popularity Personalization (user’s topic preference) User Graph #tweets Co-Ranking
Discussion No solid work yet on news summarization using newsworthy tweets. How to proceed? Look at Jesse’s data and results.