2 Outline
- Introduction
- Single Document Summarization
- Multiple Document Summarization
- Application
- Evaluation
- Conclusion
3 Introduction
What is a summary?
- A text produced from one or more texts
- It conveys the important information in the original texts and is no longer than half their length
Three important aspects of a summary:
- Summaries should be short
- Summaries should preserve important information
- Summaries may be produced from a single document or from multiple documents
4 Common terms in the summarization dialect
- Extraction: identifying important sections of the text and reproducing them verbatim
- Abstraction: producing the important material in a new way
- Fusion: combining extracted parts coherently
- Compression: throwing out unimportant sections of the text
5 Single Document Summarization
- Early Works
- Machine Learning Methods
  - Naïve-Bayes Methods
  - Rich Features and Decision Trees
- Deep Natural Language Analysis Methods
  - Lexical Chaining
  - Rhetorical Structure Theory (RST)
6 Early Works
Luhn, 1958
- Summarization based on measuring the significance of words from their frequency
- A significance factor is derived for each sentence from the number of significant words it contains
Edmundson, 1969
- Word frequency and positional importance were incorporated
- The presence of cue words and the skeleton of the document were also incorporated
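The frequency-based idea attributed to Luhn can be sketched roughly as follows. This is a simplified illustration, not Luhn's exact procedure: the stop-word list, the `min_count` cutoff, and the choice of "significant words squared over sentence length" as the significance factor are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed, illustrative stop-word list (not from the slides).
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in", "it"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def significant_words(sentences, min_count=2):
    """Words (excluding stop words) that occur at least min_count times."""
    counts = Counter(
        w for s in sentences for w in tokenize(s) if w not in STOP_WORDS
    )
    return {w for w, c in counts.items() if c >= min_count}

def significance_factor(sentence, significant):
    """Assumed factor: (number of significant words)^2 / sentence length."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in significant)
    return hits * hits / len(words)

def luhn_rank(sentences, min_count=2):
    """Return the sentences sorted from most to least significant."""
    sig = significant_words(sentences, min_count)
    return sorted(
        sentences, key=lambda s: significance_factor(s, sig), reverse=True
    )
```

An extract would then be formed by taking the first few sentences of `luhn_rank(...)`.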
7 Naïve Bayes Method
- A classifier based on applying Bayes' theorem with strong independence assumptions
- s: a particular sentence
- S: the set of sentences that make up the summary
- F1, ..., Fk: the features
- Assuming independence of the features:
$$P(s \in S \mid F_1, F_2, \ldots, F_k) = \frac{\prod_{i=1}^{k} P(F_i \mid s \in S) \cdot P(s \in S)}{\prod_{i=1}^{k} P(F_i)}$$
- Evaluation is done by comparing the output with a human-extracted summary of the document
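As a small worked illustration of the formula above, the score of one sentence can be computed directly from the individual probabilities. The function name and the example probabilities are hypothetical; in practice the probabilities would be estimated from a training corpus of documents with human extracts.

```python
from math import prod

def naive_bayes_score(p_features_given_summary, p_features, p_summary):
    """P(s in S | F1..Fk) under the feature-independence assumption.

    p_features_given_summary: list of P(Fi | s in S), one per feature
    p_features:               list of P(Fi), one per feature
    p_summary:                prior P(s in S)
    """
    return prod(p_features_given_summary) * p_summary / prod(p_features)

# Hypothetical numbers for a sentence with two observed features.
score = naive_bayes_score([0.5, 0.4], [0.25, 0.2], 0.1)
```

Sentences would then be ranked by this score and the top-ranked ones extracted.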
8 Naïve Bayes Method
Term frequency-inverse document frequency (TF-IDF)
- Increases proportionally to the number of times a word appears in the document
- Is offset by the frequency of the word in the corpus
- Takes into account that certain words, e.g. "the" and "is", are more common than others
$$\mathrm{idf}(t, D) = \log \frac{|D|}{|\{d \in D : t \in d\}|}$$
- |D|: total number of documents in the corpus
- |{d ∈ D : t ∈ d}|: number of documents in which the term t appears, i.e. tf(t, d) ≠ 0
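The definition above translates into a few lines of code. This is a minimal sketch that assumes documents are already tokenized into lists of words and that uses the raw count as the term frequency; other TF variants exist.

```python
import math

def tf(term, doc):
    """Raw term frequency: how often the term occurs in the document."""
    return doc.count(term)

def idf(term, corpus):
    """log(|D| / number of documents containing the term)."""
    n_containing = sum(1 for doc in corpus if term in doc)
    if n_containing == 0:
        raise ValueError("term does not occur in the corpus")
    return math.log(len(corpus) / n_containing)

def tf_idf(term, doc, corpus):
    """TF-IDF weight of the term in one document of the corpus."""
    return tf(term, doc) * idf(term, corpus)

# Toy corpus of three tokenized documents (illustrative only).
corpus = [["the", "cat"], ["the", "dog"], ["the", "cat", "cat"]]
```

In this toy corpus "the" appears in every document, so its IDF, and therefore its TF-IDF weight, is zero.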
9 Rich Features and Decision Trees
- Sentences are weighted based on their position
- This arises from the idea that texts generally follow a predictable discourse structure
- The yield of each sentence position was calculated against the topic keywords
- Sentence positions were then ranked by average yield to produce the Optimal Position Policy for topic positions for the genre
- Later, the sentence extraction problem was modeled using decision trees, breaking away from the assumption that features are independent
10 Deep Natural Language Analysis Methods
- Techniques aimed at modeling the text's discourse structure
- Heuristics are used to create document extracts
Lexical Chaining
- Independent of the grammatical structure of the text
- A lexical chain is a list of words that captures a portion of the cohesive structure of the text
- It is a sequence of related words in the text, spanning short or long distances
- The technique is used to identify the central theme of a document
11 Forms of Cohesion
Ellipsis
- Words are omitted when the phrase would otherwise need to be repeated
- Example:
  A: Where are you going?
  B: To town.
Substitution
- A word is not omitted but replaced by another
- Example:
  A: Which ice-cream would you like?
  B: I would like the pink one.
12 Forms of Cohesion
Conjunction
- Expresses a relationship between two clauses
- Examples: "and", "then", "however"
Repetition
- The same word is mentioned again
Reference
- Anaphoric reference: refers to someone or something that has been previously identified
- Cataphoric reference: forward referencing. Example: "Here he comes... It's Brad Pitt"
13 Lexical Chaining
Example: John had mud pie for dessert. Mud pie is made of chocolate. John really enjoyed it.
Steps involved in lexical chaining:
a) Select a set of candidate words.
b) For each candidate word, find an appropriate chain, relying on a relatedness criterion among the members of the chain.
c) If such a chain is found, insert the word into the chain and update the chain accordingly.
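The three steps above can be sketched as a greedy loop. This is a toy illustration under stated assumptions: the `RELATED` table is a hand-written stand-in for a real relatedness criterion, and starting a new chain when no existing chain fits is an assumption, since the steps only describe the case where a chain is found.

```python
# Hand-written relatedness pairs for the mud pie example (assumed, not WordNet).
RELATED = {
    frozenset(("pie", "dessert")),
    frozenset(("pie", "chocolate")),
    frozenset(("dessert", "chocolate")),
}

def related(a, b):
    """Toy relatedness criterion: identical words or a listed pair."""
    return a == b or frozenset((a, b)) in RELATED

def build_chains(candidates):
    """Greedily assign each candidate word to the first chain it relates to."""
    chains = []
    for word in candidates:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)  # step c: insert the word and update the chain
                break
        else:
            chains.append([word])  # assumption: no chain fits, so start a new one
    return chains

# Candidate words picked by hand from the example sentences.
chains = build_chains(["john", "pie", "dessert", "pie", "chocolate", "john"])
```

For the example text this yields one chain about John and one about the dessert.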
14 Lexical Chaining
- Relatedness measure: WordNet distance
- Weights are assigned to chains based on their length and homogeneity
- The strength of a lexical chain is determined by taking into consideration the distribution of its elements throughout the text
- The strength corresponds to the significance of the textual context the chain embodies
- Strong chains provide a basis for identifying the topical units in a document, which are of great importance in document summarization
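One way to turn "length and homogeneity" into a number is sketched below. The slide does not give a formula, so the one used here is an assumption for illustration: the score is the chain length times a homogeneity index defined as one minus the ratio of distinct members to length.

```python
def chain_score(chain):
    """Assumed scoring: length * (1 - distinct members / length)."""
    if not chain:
        return 0.0
    length = len(chain)
    homogeneity = 1.0 - len(set(chain)) / length
    return length * homogeneity

# A chain with four members, three of them distinct.
score = chain_score(["pie", "dessert", "pie", "chocolate"])
```

Under this assumed formula a chain made of a single word occurrence scores zero, so only chains with repeated members count as strong.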
15 Rhetorical Structure Theory (RST)
- A relation holds between two non-overlapping text spans: the nucleus and the satellite
- The nucleus expresses what is more essential to the writer's purpose than the satellite
- Example: a claim followed by evidence for the claim. RST posits an "Evidence" relation between the two spans
  - The claim is more essential to the text than the particular evidence
  - The claim span is a nucleus and the evidence span is a satellite
- The nucleus is independent of the satellite, but not vice versa
17 Multiple Document Summarization
Need and Encouragement
- Extraction of a single summary from multiple documents started in the mid-1990s
- Most applications are in news articles:
  - Google News (news.google.com)
  - Columbia Newsblaster (newsblaster.cs.columbia.edu)
  - News in Essence (NewsInEssence.com)
- Multiple sources of information, which are:
  - supplementary to each other
  - overlapping in content
  - even contradictory at times
18 Early Work
- Extended template-driven message understanding systems
- Abstractive systems rely heavily on internal NLP tools
- Earlier considered to require knowledge of:
  - Language interpretation
  - Generation
- Extractive techniques have been applied, using similarity measures between sentences:
  - identify common themes through clustering and select one sentence to represent each cluster
  - generate a composite sentence from each cluster
- Summarization systems differ in what the final goal is:
  - MEAD: works with extraction techniques on general domains
  - SUMMONS: builds a briefing highlighting differences and updates across news reports
19 Abstraction and Information Fusion
- SUMMONS is the first example of a multi-document summarization system
- It considers events in a narrow domain: news articles about terrorism
- It produces a briefing merging relevant information about each event and its evolution over time
- It reads a database built by a template-based message understanding system
- It is a concatenation of two systems: a Content Planner and a Linguistic Generator
20 SUMMONS - processing the text
Content Planner
- Selects the information to include in the summary through a combination of the input templates
- Uses summary operators, a set of heuristics that perform operations such as change of perspective, contradiction, and refinement
Linguistic Generator
- Selects the right words to express the information in grammatical and coherent text
- Uses connective phrases to synthesize the summary, adapting language generation tools such as FUF/SURGE
21 Theme-based approach - McKeown et al., Barzilay et al.
- Themes: sets of similar text units (paragraphs); finding them is a clustering problem
- Text is mapped to a vector of features, including single words weighted by their TF-IDF scores, nouns, pronouns, and semantic classes of verbs
- For each pair of paragraphs, a vector is computed that represents matches on the different features
- Decision rules learned from data classify each pair as similar or dissimilar; an algorithm then places the most related paragraphs in the same theme
- Information fusion: decides which sentences of a theme should be included in the final summary
22 Information Fusion
- The algorithm compares and intersects the predicate-argument structures of the phrases within each theme to find which are repeated often enough to be included in the summary
- Sentences are parsed using Collins' statistical parser and converted into dependency trees, which capture predicate-argument structure and identify functional roles
- The comparison algorithm traverses the trees recursively, adding identical nodes to the output tree
- Once full phrases are found, they are marked for inclusion in the summary
- Once the summary content is decided, a grammatical text is generated using the FUF/SURGE language generation system
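The recursive comparison step can be illustrated with a toy tree intersection. This is only a sketch under simplifying assumptions: trees are hand-built `(label, children)` tuples rather than parser output, nodes match only when their labels are identical, and functional roles are ignored.

```python
def intersect(tree_a, tree_b):
    """Return the tree of nodes shared by both trees, or None if the roots differ.

    Each tree is a (label, children) tuple; children is a list of such tuples.
    """
    label_a, children_a = tree_a
    label_b, children_b = tree_b
    if label_a != label_b:
        return None
    shared = []
    unmatched = list(children_b)  # each child of tree_b is matched at most once
    for child_a in children_a:
        for child_b in unmatched:
            common = intersect(child_a, child_b)
            if common is not None:
                shared.append(common)
                unmatched.remove(child_b)
                break
    return (label_a, shared)

# Two hand-built toy trees for similar sentences (illustrative only).
tree_1 = ("charged", [("McVeigh", [("27", [])]), ("with", [("bombing", [])])])
tree_2 = ("charged", [("McVeigh", []), ("with", [("bombing", [])]), ("yesterday", [])])
common = intersect(tree_1, tree_2)
```

Here the shared structure keeps "McVeigh ... charged with ... bombing" and drops the details that appear in only one tree.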
23 Decision Tree
[Figure: tree representation of the sentence "McVeigh, 27, was charged with the bombing"]
24 Topic-Driven Summarization
- MMR (Maximal Marginal Relevance) was introduced by Carbonell and Goldstein
- It rewards relevant sentences and penalizes redundant ones by considering a linear combination of two similarity measures
- Q: query or user profile; R: ranked list of documents; S: already selected documents
- Documents are selected one at a time and added to S
- For each document D_i in R \ S:
$$\mathrm{MR}(D_i) = a \cdot \mathrm{Sim}_1(D_i, Q) - (1 - a) \cdot \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j), \quad a \in [0, 1]$$
- The document with the maximum MR(D_i) is selected, until a maximum number of documents or a threshold is reached
- a controls the relative importance of relevance versus redundancy
- Sim1 and Sim2 are similarity measures (e.g. cosine similarity)
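The selection loop described above can be sketched as follows. This is a minimal illustration, assuming documents and the query are sparse term-weight dictionaries, that cosine similarity is used for both Sim1 and Sim2, and that selection stops after `max_items` documents; the function and parameter names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def mmr_select(query, docs, a=0.5, max_items=2):
    """Greedy MMR: return the indices of the selected documents, in order."""
    selected = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < max_items:
        def marginal_relevance(i):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(docs[i], docs[j]) for j in selected), default=0.0
            )
            return a * cosine(docs[i], query) - (1 - a) * redundancy
        best = max(remaining, key=marginal_relevance)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: documents 0 and 1 are duplicates, document 2 covers a new aspect.
query = {"bomb": 1.0, "trial": 1.0}
docs = [{"bomb": 1.0}, {"bomb": 1.0}, {"trial": 1.0}]
```

With `a=0.5` the duplicate is skipped in favor of the document that adds new content; with `a=1.0` redundancy is ignored and the duplicate is selected.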
25 Graph Spreading Activation
- Content is represented as entities and relations, which form the nodes and edges of a graph
- Rather than extracting sentences, the method detects salient regions of the graph
- Topic-driven: the topic is denoted by entry nodes in the graph
- Graph: each node is a single occurrence of a word
- Different kinds of links: Adjacency, Same, Alpha, Phrase, Name, and Coref links
26 Graph Spreading Activation
- Topic nodes are identified through stem comparison and marked as entry nodes
- Spreading activation: the search for semantically related text is propagated from the entry nodes to the other nodes of the graph
- The weight of a neighboring node depends on the links traveled and is an exponentially decaying function of the distance
- For a pair of document graphs: identify common nodes and difference nodes
- Highlight the sentences with the highest common and difference scores
- The user can specify a maximum number to control the size of the output
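The exponential decay with distance can be illustrated with a breadth-first traversal from the entry nodes. This is a simplified sketch: it assumes an unweighted adjacency-list graph, treats all link types alike, and uses a hypothetical `decay` parameter, whereas the slides say the weight also depends on the kinds of links traveled.

```python
from collections import deque

def spread_activation(graph, entry_nodes, decay=0.5):
    """Assign each reachable node the weight decay ** (distance from an entry node)."""
    weights = {node: 1.0 for node in entry_nodes}
    queue = deque((node, 0) for node in entry_nodes)
    while queue:
        node, distance = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in weights:  # first visit is the shortest distance
                weights[neighbor] = decay ** (distance + 1)
                queue.append((neighbor, distance + 1))
    return weights

# Toy word graph (illustrative only); "bomb" is the entry node for the topic.
graph = {"bomb": ["attack"], "attack": ["bomb", "trial"], "trial": ["attack"]}
weights = spread_activation(graph, ["bomb"])
```

Nodes closer to the topic's entry nodes end up with higher weights, which is what marks a region of the graph as salient.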
27 Centroid-based Summarization
- It does not use any language generation module; it is easily scalable and domain-independent
- Topic detection: group together news articles that describe the same event
- An agglomerative clustering algorithm is used; it operates on TF-IDF vector representations, successively adding documents to clusters and recomputing the centroids according to
$$c_j = \frac{\sum_{d \in C_j} \tilde{d}}{|C_j|}$$
- c_j is the centroid of the j-th cluster, C_j is the set of documents that belong to that cluster, and \tilde{d} is the TF-IDF vector of document d
- Centroids can thus be considered pseudo-documents that include those words whose TF-IDF scores are above a threshold in their cluster
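The centroid formula amounts to averaging the TF-IDF vectors of a cluster. The sketch below assumes each document is a sparse dictionary of TF-IDF weights; the `threshold` parameter, which keeps only words whose average weight exceeds it, is a hypothetical way of expressing the pseudo-document idea.

```python
def centroid(cluster, threshold=0.0):
    """Average the TF-IDF vectors in the cluster, keeping words above threshold."""
    if not cluster:
        raise ValueError("cluster must contain at least one document")
    totals = {}
    for vector in cluster:
        for word, weight in vector.items():
            totals[word] = totals.get(word, 0.0) + weight
    n = len(cluster)
    return {
        word: total / n for word, total in totals.items() if total / n > threshold
    }

# Toy cluster of two TF-IDF vectors (weights are made up for illustration).
cluster = [{"bomb": 2.0, "trial": 1.0}, {"bomb": 4.0}]
```

Raising the threshold shrinks the pseudo-document to the words most characteristic of the cluster.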
28 Centroid-based Summarization
- Second stage: identify sentences that are central to the topic of the entire cluster
- Two metrics similar to MMR (but not query-dependent) are defined by Radev et al., 2000:
  - Cluster-based relative utility (CBRU): how relevant a particular sentence is to the general topic of the cluster
  - Cross-sentence informational subsumption (CSIS): a measure of redundancy among sentences
- Given a cluster segmented into n sentences and a compression rate R, nR sentences are selected, in order of appearance in the chronologically arranged documents
- The final score of each sentence is the sum of three scores minus a redundancy penalty (R_s) for sentences that overlap highly ranked sentences:
  - Centroid value (C_i): sum of the centroid values of all the words in the sentence
  - Positional value (P_i): makes leading sentences more important
  - First-sentence overlap (F_i): inner product of the word occurrence vector of sentence i and that of the first sentence of the document
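The sentence score described above can be sketched as a sum of three components minus a penalty. Several details are assumptions made for illustration: the linear form of the positional value, the unweighted sum, and the redundancy penalty being passed in as a precomputed number rather than derived from sentence overlap.

```python
from collections import Counter

def centroid_value(words, centroid):
    """C_i: sum of the centroid values of all words in the sentence."""
    return sum(centroid.get(w, 0.0) for w in words)

def positional_value(index, n_sentences):
    """P_i: assumed linear decay, so leading sentences score higher."""
    return (n_sentences - index) / n_sentences

def first_sentence_overlap(words, first_words):
    """F_i: inner product of the two word-occurrence vectors."""
    a, b = Counter(words), Counter(first_words)
    return sum(a[w] * b[w] for w in a)

def sentence_score(index, sentences, centroid, redundancy_penalty=0.0):
    """C_i + P_i + F_i minus the redundancy penalty for sentence `index`."""
    words = sentences[index]
    return (
        centroid_value(words, centroid)
        + positional_value(index, len(sentences))
        + first_sentence_overlap(words, sentences[0])
        - redundancy_penalty
    )

# Toy document of two tokenized sentences and a made-up centroid.
sentences = [["bomb", "trial"], ["trial", "date"]]
centroid = {"bomb": 3.0, "trial": 0.5}
```

The highest-scoring sentences, up to the nR allowed by the compression rate, would form the extract.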
29 Application
Google News
- A news aggregator that selects the most up-to-date information (from within the past 30 days) from thousands of publications using an automatic aggregation algorithm
- Different versions are available for more than 60 regions in 28 languages
Ultimate Research Assistant
- Performs text mining on Internet search results
- Makes it easier for the user to perform online research by organizing the output
- The user types the name of a topic, and it searches the web for highly relevant resources and organizes the search results
30 Application
Shablast
- A universal search engine
- Produces multi-document summaries from the top 50 results returned by Microsoft's Bing search engine for a set of keywords
iResearch Reporter
- A commercial text extraction and text summarization system
- Produces categorized, easily readable natural-language summary reports covering multiple documents retrieved by entering the user's query into the Google search engine
32 Evaluation
A difficult task
- The absence of a standard human or automatic evaluation metric makes it difficult to compare different systems and establish a baseline
- Manual evaluation is not feasible
- There is a need for an evaluation metric that correlates highly with human scores
Human and automatic evaluation
- Automatically generated summaries are compared with manually written "ideal" summaries
- The text is decomposed into sentences
- A rating between 1 and 4 is given to each system unit (SU) that shares content with a model unit (MU) of the ideal summaries
33 Evaluation
ROUGE
- Based only on content overlap
- Can determine whether the same general concepts are discussed in an automatic summary and a reference summary
- Cannot determine whether the result is coherent or whether the sentences flow together in a sensible manner
- Works better for single-document summarization
Information-theoretic Evaluation of Summaries
- The central idea is to use a divergence measure between a pair of probability distributions
- The first distribution is derived from the automatic summary
- The second is derived from a set of reference summaries
- Suits both single-document and multi-document summarization scenarios
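Both evaluation ideas can be illustrated in a few lines. The sketch below is not the official ROUGE toolkit: it computes a plain n-gram recall against a single reference, and for the information-theoretic variant it assumes unigram distributions and the Jensen-Shannon divergence, neither of which is specified in the slides.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total

def unigram_distribution(tokens):
    """Relative frequency of each word in the token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    def kl(a):
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

candidate = "the cat sat".split()
reference = "the cat sat down".split()
recall = rouge_n_recall(candidate, reference)
divergence = js_divergence(
    unigram_distribution(candidate), unigram_distribution(reference)
)
```

A higher recall and a lower divergence both indicate that the automatic summary is closer in content to the reference; neither says anything about coherence.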
34 Conclusion
- There is a need to develop efficient and accurate summarization systems because of the enormous rate of information growth
- A lot of research is still going on in this field, especially on evaluation techniques
- Multi-document summarization is more widely used than single-document summarization
- Extractive techniques are usually employed rather than abstractive techniques, as they are easier to apply and have produced satisfactory results
35 References
- Dipanjan Das and Andre F. T. Martins, "A Survey on Automatic Summarization" (http://www.cs.cmu.edu/~afm/Home_files/Das_Martins_survey_summarization.pdf)
- Wikipedia
- "Relevance of Cluster Size in MMR Based Summarizer" (http://www.cs.cmu.edu/~madhavi/publications/Ganapathiraju_11-742Report.pdf)