Text Annotation By: Harika kode Bala S Divakaruni.


1 Text Annotation By: Harika kode Bala S Divakaruni

2 Goal
We developed this tool for text annotation across various text categories. Intended uses:
- Author identification.
- Language identification.
- Analyzing the differences between two documents.
- Classifying conversations as oral or chat.

3 Classification tasks
Assign the correct class label to a given input/object. In basic classification tasks, each input is considered in isolation from all other inputs, and the set of labels is defined in advance.

Problem                    Object     Label categories
Conversation type          Document   Oral / chat
Author identification      Document   Authors
Text categorization        Document   Topics / classes
Sentiment classification   Document   Positive / negative

4 Structure
Each text is represented by its features. The feature set includes: frequency of each part-of-speech (POS) tag, POS of the first word, POS of the last word, number of words, average number of words, number of sentences, average number of sentences.
Total: 45 features. The set can be expanded, and a feature selector can be added.
Classifier model: Support Vector Machine (SVM).
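The kind of feature vector described on this slide can be sketched as follows. This is an illustration only, not the tool's actual code: the reduced tag set `TAGS`, the function name `extract_features`, the one-hot encoding of the first and last POS, and the reading of "average number of words" as words per sentence are all assumptions. The input is assumed to be already POS-tagged, so the sketch yields 21 values rather than the 45 features mentioned in the slides.

```python
from collections import Counter

# Hypothetical reduced tag set; the slides do not list the tags actually used.
TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "OTHER"]


def normalize(tag):
    """Map any tag outside the reduced set to OTHER."""
    return tag if tag in TAGS else "OTHER"


def extract_features(tagged_sentences):
    """Build a flat feature vector from pre-tagged text.

    tagged_sentences: non-empty list of non-empty sentences,
    each sentence a list of (word, pos_tag) pairs.
    """
    pairs = [pair for sent in tagged_sentences for pair in sent]
    n_words = len(pairs)
    n_sents = len(tagged_sentences)
    counts = Counter(normalize(tag) for _, tag in pairs)

    # Relative frequency of each POS tag.
    features = [counts[t] / n_words for t in TAGS]

    # POS of the first and last word of the text, one-hot encoded.
    first_tag = normalize(tagged_sentences[0][0][1])
    last_tag = normalize(tagged_sentences[-1][-1][1])
    features += [1.0 if first_tag == t else 0.0 for t in TAGS]
    features += [1.0 if last_tag == t else 0.0 for t in TAGS]

    # Size features: word count, sentence count, words per sentence.
    features += [float(n_words), float(n_sents), n_words / n_sents]
    return features


example = [
    [("I", "PRON"), ("agree", "VERB")],
    [("Good", "ADJ"), ("point", "NOUN"), ("!", "PUNCT")],
]
vector = extract_features(example)
```

A vector like this, one per sentence or per turn, is the sort of input an SVM classifier would be trained on.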

5 Feature set
Which features identify the text? This depends on the type of context and the type of problem, so analyze the text first.
Candidate features: function words; whether more common words are present; % of words of each POS; # of sentences per turn; # of turns per dialogue; the POS of the first word of a sentence; etc.
Data format: (Type of conversation)(Seq number) - (Person communicating) - (Turn of the person) - (Sentence of the turn): (Text)
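A line in the data format above could be parsed along these lines. This is a minimal sketch: the sample line `K10 - tu1 - 3 - 2: ...` is invented for illustration, and the assumptions that the conversation type is alphabetic and that the sequence, turn, and sentence fields are integers are not confirmed by the slides. The names `Record` and `parse_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Record(NamedTuple):
    conv_type: str   # type of conversation, e.g. "K" (assumed alphabetic)
    seq: int         # sequence number of the conversation
    speaker: str     # person communicating
    turn: int        # turn of the person
    sentence: int    # sentence within the turn
    text: str


# Assumed concrete syntax: <type><seq> - <speaker> - <turn> - <sentence>: <text>
LINE_RE = re.compile(
    r"^(?P<conv_type>[A-Za-z]+)(?P<seq>\d+)\s*-\s*"
    r"(?P<speaker>[^-]+?)\s*-\s*"
    r"(?P<turn>\d+)\s*-\s*"
    r"(?P<sentence>\d+)\s*:\s*"
    r"(?P<text>.*)$"
)


def parse_line(line: str) -> Optional[Record]:
    """Return a Record for a well-formed line, or None otherwise."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return Record(
        conv_type=match["conv_type"],
        seq=int(match["seq"]),
        speaker=match["speaker"],
        turn=int(match["turn"]),
        sentence=int(match["sentence"]),
        text=match["text"],
    )


# Invented sample line, for illustration only.
record = parse_line("K10 - tu1 - 3 - 2: Can you explain that step again?")
```

Grouping the parsed records by `(conv_type, seq, speaker, turn)` would then give the per-turn units from which features such as sentences per turn can be counted.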

6 Results
K/F, taking each sentence into consideration, testing with all '10' in the turns: Accuracy = 94.44444444444444% (51/54)
K/F, classifying each turn, with '10' in the turns: Accuracy = 87.5609756097561% (359/410)
K/F, classifying each turn, with 10, 20, 30, 40, 50 in the turns: Accuracy = 94.40298507462687% (253/268)
K/F, classifying each turn, with 10, 20, 30, 40, 50 in the turns, POS removed from the feature set: Accuracy = 92.53731343283582% (496/536)
tu1/tu2, classifying each turn, with 10, 20, 30, 40, 50, 11, 21, 31, 41, 51 in the turns, POS removed from the feature set: Accuracy = 62.68656716417911% (168/268)

Here K/F refers to chat and oral conversations, and tu1 and tu2 refer to two tutors: 'michael' and 'rovick'.

Future enhancements:
1. Consider a wider range of features, both semantic and syntactic.
2. Fully automate the annotation of labels.
3. Integrate the SVM with the tool.
4. Include a more efficient feature selection process.
5. Include function words.
6. Add a GUI.

Questions?
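Each accuracy figure on the results slide is simply the number of correctly classified items divided by the total, expressed as a percentage. The short check below recomputes the five reported values from their counts. The helper name `accuracy_percent` is hypothetical and the check is not part of the original tool.

```python
def accuracy_percent(correct, total):
    """Classification accuracy as a percentage."""
    return 100.0 * correct / total


# (correct, total, reported percentage) as given on the results slide.
reported = [
    (51, 54, 94.44444444444444),
    (359, 410, 87.5609756097561),
    (253, 268, 94.40298507462687),
    (496, 536, 92.53731343283582),
    (168, 268, 62.68656716417911),
]

# Largest gap between a recomputed and a reported percentage.
max_gap = max(abs(accuracy_percent(c, t) - pct) for c, t, pct in reported)
```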

