Text Classification of USENET messages for a Conversation Visualisation System
Final Year Project: Final Presentation
Jolyon Hunter
cs91jh@surrey.ac.uk
www.jrth.co.uk
Tuesday 6th May 2003

Introduction

Aim
“To investigate how messages and conversations on USENET newsgroups can be classified automatically as part of a system to visually represent online discussions.”

Objectives
- To review systems which visualise online discussions, enabling the identification of phenomena to be visualised
- To analyse a 250,000+ word corpus of text and try to identify potential cues for classification
- To specify and design a system for automatic classification of messages/conversations
- To implement, test and evaluate this system

Conversation Visualisation Systems?

For example: “PeopleGarden” (Xiong, Rebecca & Donath, Judith 1999 “PeopleGarden: Creating Data Portraits for Users” MIT Media Laboratory http://smg.media.mit.edu/~becca/)

Others include: “Loom” (Donath et al.), “Netscan” (Smith), “Conversation Map” (Sack) and “CodeZebra” (Diamond et al.)

Phenomena to Visualise… and how to do it!

Phenomena
- Emotion (“Happy”, “Sad”)
- Agreement/Disagreement (“Argument”)
- Involvement: sense of community
- Character traits of users
- and many more…

How to Classify? Automated Text Analysis
- “Smokey” (Spertus)
- “WebSOM” (Kohonen)
- “CLUTO” (Karypis)

Analysis Overview

HOW?
- Initial observations: phenomena + features
- In-depth corpus analysis

WHAT?
- 6000+ messages from various newsgroups (4 million+ words)
- UniS/CodeZebra Workshop: features (words)
- Using System Quirk to extract words; frequency counting (Kontext) >> relative frequencies
- Using gCLUTO to visualise data for interpretation

WHY?
- Formulate programmable rules to code into a system
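The frequency-counting step above can be sketched in Perl, the language the project's rule processor was written in. This is only a minimal illustration, not the project's actual tooling (System Quirk and Kontext did that work): the subroutine name relative_frequencies, the tokenising pattern and the sample message are all assumptions made for the example. Relative frequency is assumed here to mean a word's count divided by the total number of words in the message.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each lower-cased word
# to its relative frequency (count / total words) in the given text.
sub relative_frequencies {
    my ($text) = @_;

    # Crude tokeniser (an assumption): runs of letters and apostrophes.
    my @words = map { lc($_) } ($text =~ /[A-Za-z']+/g);

    my %count;
    $count{$_}++ for @words;

    my $total = scalar @words;
    return {} unless $total;    # empty message: nothing to count

    my %relative = map { $_ => $count{$_} / $total } keys %count;
    return \%relative;
}

# Made-up sample message, for illustration only.
my $relative = relative_frequencies("I agree with you. I really do agree.");
for my $word (sort keys %$relative) {
    printf "%-8s %.3f\n", $word, $relative->{$word};
}
```

A hash of this shape is what a rule such as `$relative{$word} > 0.003` would consult.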

gCLUTO Visualisations
- Visualise clusters and the relationships between clusters
- Possible to see patterns or heuristics to help derive rules
- CLUTO has potential for future use within a system to automatically classify text, e.g. real-time clustering

Analysis: Creating Rules
- Possible to derive example rules from analysis
- More analysis: random sample using 6 classes
- Similar patterns emerge
- Example rules also >>> SYSTEM!

System Development
- Process Model of Software Engineering: Requirements, Design, Implementation, Testing and Evaluation
- “System”: System Quirk > Rules > Program > CLASSIFICATION
- Rule-Based Processor: IF..THEN..
- Rules coded into Perl program to produce classifications

Generic Conversation Visualisation System

“Message Text Analysis” Module

Perl Code: Key points

IF…THEN… RULES (as seen earlier)

CLASS COUNTER:

    if (($word eq "agree") && ($relative{$word} > 0.003)) {
        $AGREEMENT++;
    }

CLASSIFICATIONS…

    if ($AGREEMENT >= 2) {
        $classification = "AGREEMENT";
    }
    if ($ARGUMENT >= 2) {
        $classification = "ARGUMENT";
    }
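The class counter and classification steps above can be combined into one small, self-contained sketch. It is an illustration under stated assumptions, not the project's program: only the "agree" rule and its 0.003 threshold come from the slide, while the other cue words, the rule-table layout, the subroutine name classify and the "INCONCLUSIVE" default label are hypothetical.

```perl
use strict;
use warnings;

# Rule table: cue word => [class, minimum relative frequency].
# Only the "agree" rule comes from the slide; the rest are invented placeholders.
my %RULES = (
    agree    => [ 'AGREEMENT', 0.003 ],
    exactly  => [ 'AGREEMENT', 0.003 ],    # hypothetical
    wrong    => [ 'ARGUMENT',  0.003 ],    # hypothetical
    nonsense => [ 'ARGUMENT',  0.003 ],    # hypothetical
);

# Hypothetical helper: takes a hash ref of word => relative frequency
# and returns a class label.
sub classify {
    my ($relative) = @_;
    my %score = ( AGREEMENT => 0, ARGUMENT => 0 );

    # Class counter: each rule that fires adds one to its class.
    for my $word (keys %RULES) {
        my ($class, $threshold) = @{ $RULES{$word} };
        $score{$class}++
            if exists $relative->{$word} && $relative->{$word} > $threshold;
    }

    # Classification: checked in the same order as on the slide,
    # so ARGUMENT wins when both counters reach 2.
    my $classification = 'INCONCLUSIVE';    # default label is an assumption
    $classification = 'AGREEMENT' if $score{AGREEMENT} >= 2;
    $classification = 'ARGUMENT'  if $score{ARGUMENT}  >= 2;
    return $classification;
}

# Made-up relative frequencies, for illustration only.
print classify({ agree => 0.01, exactly => 0.005 }) . "\n";
print classify({ agree => 0.01 }) . "\n";
```

Keeping the rules in a table rather than in a chain of separate if statements is a design choice of this sketch; it lets new cue words be added without touching the counting logic.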

Testing & Evaluation
- Ten sample messages, each either “Agreement” or “Disagreement” (a small sample)
- Key excerpts given to human testers (ten people), who were asked to rate them
- System vs. Humans!
- System correct 3 times; most results inconclusive
- Human responses correlate with the system, but ambiguities also exist

Conclusions?
- Results not conclusive but show promise > larger sample; more research

Recap: Mission Accomplished?

Aim
“To investigate how messages and conversations on USENET newsgroups can be classified automatically as part of a system to visually represent online discussions.”

Objectives
- To review systems which visualise online discussions, enabling the identification of phenomena to be visualised
- To analyse a 250,000+ word corpus of text and try to identify potential cues for classification
- To specify and design a system for automatic classification of messages/conversations
- To implement, test and evaluate this system

Text Classification of USENET messages for a Conversation Visualisation System
Thanks for listening… Any Questions?

Final Report
The Final Report for this project is also available online at: www.jrth.co.uk

REFERENCES
- “Loom” (Judith Donath): Donath, Judith 2002 “A Semantic Approach to Visualising Online Conversation” Communications of the ACM 45(4): 45-49 http://web.media.mit.edu/~kkarahal/loom/index.html
- “Conversation Map” (Warren Sack): Sack, Warren 2000 “Design for Very Large-Scale Conversations” Ph.D. Thesis, February 2000, MIT Media Laboratory http://www.sims.berkeley.edu/~sack/cm/
- “Netscan” (Marc Smith): Smith, Marc 2001 “Netscan: A tool for measuring and mapping social cyberspaces” http://netscan.research.microsoft.com
- “PeopleGarden” (Rebecca Xiong & Judith Donath): Xiong, Rebecca & Donath, Judith 1999 “PeopleGarden: Creating Data Portraits for Users” MIT Media Laboratory http://smg.media.mit.edu/~becca/
- “CodeZebra” (Sara Diamond): Diamond, Sara (Project Leader), Banff New Media Institute, Canada, plus many others (inc. Dr. A. Salway, University of Surrey) http://www.codezebra.net
- “Smokey” (Ellen Spertus): Spertus, Ellen 1997 “Smokey: Automatic Recognition of Hostile Messages” Innovative Applications of Artificial Intelligence ’97 http://www.spertus.com/ellen/
- “WebSOM” (Teuvo Kohonen): Kohonen, T. 1996 onwards; more details at http://websom.hut.fi/websom/
- “CLUTO” (George Karypis): Karypis, George 2002 “CLUTO”, “gCLUTO” and “wCLUTO” University of Minnesota, MN, USA. Software available from http://www-users.cs.umn.edu/~karypis/cluto/
