Blog Summarization

We have built a blog summarization system to help people gather opinions from blogs. After identifying topic-relevant sentences, our system takes a further step and picks opinionated sentences to form a summary. Opinions are identified through unsupervised iterative training. Evaluation shows that the sentence-level accuracy of our Opinion Identification Module is 79.7%. The document-level accuracy is 71.8%, which outperforms an existing sentiment analysis system by 2.8%.

Student: Wang Xuan
Supervisor: Kan Min-Yen

Seven students read twelve blogs on “abortion” and had one hour to write a 100-word summary answering “What are people’s opinions towards abortion?”

1. Read the blogs to gain an understanding of their contents.
2. Identify the information in each blog that is relevant to the given question.
3. Use subjective information and discard information that expresses facts.
4. Group the information into categories.
5. Extract and organize the information into well-formed sentences.
6. Combine these sentences into a paragraph.

                    True Positive   True Negative
Predicted Positive  A               B
Predicted Negative  C               D
Unpredicted         E               F

Inspired by the human behavior experiment, we built a three-stage blog summarization system. We employ an Opinion Identification Module to extract opinionated sentences from blog articles, to fit the characteristics of blogs. The Opinion Identification Module uses an unsupervised iterative training process, and more opinion evidence is added with each iteration. In evaluation, the Opinion Identification Module achieved an accuracy of 79.7% at the sentence level. Future work could investigate the relationships between zones.

Summarization Approach:
  Features: location, thematic, fixed phrases, add term
  Similarity of two text units, distance between text units, semantic relationships among words
  Document format, topic structure, rhetorical structure of the text
Sentiment Analysis:
  Identify prior subjectivity and sentiments
  Identify subjective language and its contextual polarity
  Subjective and sentiment analysis in NLP applications

[Poster panel titles: Abstract; Literature Review; Human Summary Survey; Common Behaviors; Conclusion and Future Work; System Design; Evaluation]

Evaluation: Since our main contribution is the Opinion Identification Module, we evaluate the performance of that module only.

System Design:

[Figure: system pipeline. Labels: Blog Articles, Topic Relevance Module, On Topic Sentences, Opinion Identification Module, Opinion Sentences, Summarization Module, Summary, Opinion Query]

[Figure: iterative training flowchart. Labels: Sentences, POS Tagger, Tagged Sentence, Seed List, Seed Words, Polarity Identifier, Identify New Seed Word, New Seed Words, More seed words? (Yes / No), Terminate]

Split sentence into zones
  Sentence: I grew up with all women, and happen to think I hate to generalize, but must they are smarter than men.
  Zone1: I grew up with all women
  Zone2: and happen to think I hate to generalize
  Zone3: but must they are smarter than men.

Polarity Identifier
  Match of seed word and part of speech
  Negation word
  Zone: We do not have the advantage of seeing that.
  Polarity: negative (advantage: positive, not: negation)

Identify New Seed Word
  Influenced by existing seed words based on its part of speech
  Zone: one of the very best movies ever made about the life of movie making
  Potential POS_Seed: movies (best: positive, movie: noun)

Significance of co-existence
  If difference > 1, score = F_p / (F_p + F_n)
  If difference < -1, score = -F_n / (F_p + F_n)

Influence of negation word in zone: whole zone; three-word window
Word to be added to potential seed list: all parts of speech; only noun, adverb, adjective
Influence of seed word in zone: whole zone; six-word window for all words; three words after an adjective or adverb and three words before a noun
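The zone splitting, the negation handling in the Polarity Identifier, and the co-existence score can be illustrated with a minimal sketch. It is not the system's implementation. Several things here are assumptions rather than details from the poster: the seed list and negation words are hypothetical, zones are split at commas because that matches the example sentence, the three-word window is taken to mean the three words before a seed word, part-of-speech matching is left out, "difference" is read as F_p - F_n, and a score of 0 is returned when neither condition holds.

```python
import re

# Hypothetical seed list and negation words, for illustration only.
SEEDS = {"advantage": 1, "best": 1, "hate": -1}  # word -> prior polarity
NEGATIONS = {"not", "no", "never"}
NEGATION_WINDOW = 3  # assumption: the three words before a seed word


def split_zones(sentence):
    """Split a sentence into zones at commas (assumed zone boundary)."""
    return [zone.strip() for zone in sentence.split(",") if zone.strip()]


def zone_polarity(zone):
    """Sum seed-word polarities in a zone, flipping any seed that has a
    negation word within the window before it."""
    words = re.findall(r"[a-z']+", zone.lower())
    total = 0
    for i, word in enumerate(words):
        if word not in SEEDS:
            continue
        polarity = SEEDS[word]
        window = words[max(0, i - NEGATION_WINDOW):i]
        if any(w in NEGATIONS for w in window):
            polarity = -polarity
        total += polarity
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"  # no seed evidence, or evidence cancels out


def coexistence_score(f_p, f_n):
    """Co-existence score; 'difference' is assumed to be F_p - F_n."""
    difference = f_p - f_n
    if difference > 1:
        return f_p / (f_p + f_n)
    if difference < -1:
        return -f_n / (f_p + f_n)
    return 0.0  # assumption: the poster gives no score for this case


sentence = ("I grew up with all women, and happen to think I hate to "
            "generalize, but must they are smarter than men.")
for zone in split_zones(sentence):
    print(zone, "->", zone_polarity(zone))
print(zone_polarity("We do not have the advantage of seeing that."))
print(coexistence_score(3, 1))
```

A real version would run a POS tagger first and only count a seed word when its part of speech also matches, as the Polarity Identifier description states.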
                                     Prec    Recall  F1      Acc     Tie
Sentence Level Sentiment Analysis
  Short seed list          Positive  73.0%   83.7%   78.0%   79.7%   96.1%
                           Negative  86.2%   76.7%   81.2%
  Comprehensive seed list  Positive  65.5%   74.3%   69.6%   65.9%   43.2%
                           Negative  66.6%   56.7%   61.3%
Document Level Sentiment Analysis
  Short seed list          Positive  71.8%   64.0%   67.7%   71.8%   36.8%
                           Negative  71.7%   78.4%   74.9%
  Comprehensive seed list  Positive  63.2%   83.3%   71.9%   68.6%   0.3%
                           Negative  78.0%   54.8%   64.4%

Important parameters
                                                 Prec    Recall  F1      Acc     Tie
  Negation word influence whole zone   Positive  63.1%   79.8%   70.5%   71.1%   96.1%
                                       Negative  80.7%   64.5%   71.7%
  Add words with all part of speech    Positive  68.3%   10.1%   17.5%   53.0%   70.9%
                                       Negative  51.8%   95.4%   67.1%
  Seed word influence whole zone       Positive  59.9%   24.2%   34.5%   58.9%   81.7%
                                       Negative  58.7%   87.0%   70.1%
  Seed word influence six word window  Positive  55.4%   66.1%   60.3%   59.5%   78.3%
                                       Negative  64.5%   53.7%   58.6%
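The poster does not state how the F1 column is computed. The reported values are consistent with the usual harmonic mean of Prec and Recall, which the short check below reproduces for the sentence-level, short-seed-list rows. The function name `f1` is ours, and the definition is the standard one rather than one taken from the poster.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Prec, Recall, reported F1) copied from the sentence-level,
# short-seed-list rows of the results table.
rows = {
    "Positive": (73.0, 83.7, 78.0),
    "Negative": (86.2, 76.7, 81.2),
}
for label, (prec, rec, reported) in rows.items():
    print(f"{label}: computed F1 = {f1(prec, rec):.1f}%, reported F1 = {reported}%")
```

Small differences of about 0.1 can appear in other rows because the published Prec and Recall values are already rounded.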