Project Deliverable-1 -Prof. Vincent Ng -Girish Ramachandran -Chen Chen -Jitendra Mohanty.

1 Project Deliverable-1 -Prof. Vincent Ng -Girish Ramachandran -Chen Chen -Jitendra Mohanty

2 Agenda
Pre-processing of tweets
Research literature studied and motivation
Plans for the next 2 weeks

3 Pre-processing
Tasks completed:
Parsed all the files provided by Raytheon and extracted ~18 GB of tweets.
The tweets do not have metadata associated with them for the time being.
Tweets containing non-ASCII characters or newline characters are discarded.
–The POS tagger stopped processing when it reached tweets containing these characters.
Tasks to be addressed:
POS tagging, chunking, and NER of all the tweets currently at our disposal, estimated at approximately 2 weeks.
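
The discard rule on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions, not the team's actual pre-processing code: the function names `is_clean` and `clean_tweets` are hypothetical, and each tweet is assumed to be available as a single string.

```python
from typing import Iterable, Iterator


def is_clean(tweet: str) -> bool:
    """Return True if the tweet is pure ASCII and contains no line breaks."""
    return tweet.isascii() and "\n" not in tweet and "\r" not in tweet


def clean_tweets(tweets: Iterable[str]) -> Iterator[str]:
    """Yield only the tweets that pass is_clean, one at a time.

    A generator is used so that a large corpus never has to be held
    in memory all at once.
    """
    for tweet in tweets:
        if is_clean(tweet):
            yield tweet


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the project data.
    sample = ["hello world", "caf\u00e9 time", "line one\nline two"]
    print(list(clean_tweets(sample)))
```

Filtering as a stream fits a corpus of this size, since tweets can be read from disk and passed to the tagger without loading everything first.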

4 Research Literature Studied
Several research papers have been studied to get an idea of the prior work in this field:
–Sentiment analysis
–Opinion-target pairs
–Latent user attributes
–Event detection
–POS and NER for the Twitter data set
–Domain adaptation
References to all of the literature can be found on the wiki maintained by our team.

5 Motivation behind studying the research literature
Sentiment analysis provides the background for examining a person's sentiment on a topic, an abstract, a discussion, etc.
–It classifies the polarity of a given text at the document, sentence, or feature/aspect level.
–Generally, sentiment means positive, negative, or neutral.
–This could be extended to a person's emotional states, such as angry, sad, or happy.
Latent user attributes
–For our project, we need to construct user profiles.
–A profile is associated with metadata: name, profile ID, tweet ID, location (geo-stationary or profile creation), etc.
–Some attributes are not available as part of the tweet metadata: gender, age, political orientation, region.

6 Motivation behind studying the research literature, contd…
Event detection
–An event is an observable phenomenon or occurrence, e.g. an earthquake, a war, or a flood.
–People have different opinions about an event.
–Zero in on an event and analyze a person's sentiment over a definite period while the event's effect lasts.
POS and NER for the Twitter data set (continuing…)
–An existing tool (such as Alan Ritter's POS tagger for Twitter) is currently being used for part-of-speech tagging and named-entity recognition.
–Its output will be used as features in our learning algorithm.
Domain adaptation
–How the model behaves on a different data set.

7 Plans for the next 2 weeks
Complete POS tagging and NER in the next 2-3 weeks using the existing tool.
Annotate tweets.
Identify the domains/issues that we will concentrate on and find the active users in those domains/issues.
–Choose the keywords used to search for each domain/issue.
–Group the tweets by domain.
–Find the active users in each domain.
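
The three sub-steps above (keywords, grouping, active users) could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the domain names and keywords in `DOMAIN_KEYWORDS` are invented, the helper names are hypothetical, and each tweet is assumed to come with a user identifier, which the current data does not yet provide.

```python
from collections import Counter, defaultdict

# Hypothetical keyword lists; the real domains/issues are still to be chosen.
DOMAIN_KEYWORDS = {
    "disaster": {"earthquake", "flood"},
    "politics": {"election", "senate"},
}


def domains_of(text, domain_keywords):
    """Return the domains whose keywords appear as whole tokens in the text.

    Tokenization is a plain lowercase whitespace split, so a keyword
    followed directly by punctuation will not match.
    """
    tokens = set(text.lower().split())
    return [name for name, words in domain_keywords.items() if tokens & words]


def active_users(tweets, domain_keywords, top_n=3):
    """Count tweets per user within each domain and keep the top_n users.

    `tweets` is an iterable of (user_id, text) pairs (an assumed format).
    """
    counts = defaultdict(Counter)
    for user_id, text in tweets:
        for domain in domains_of(text, domain_keywords):
            counts[domain][user_id] += 1
    return {domain: c.most_common(top_n) for domain, c in counts.items()}


if __name__ == "__main__":
    sample = [
        ("u1", "Earthquake hits city"),
        ("u2", "flood warning issued"),
        ("u1", "flood again today"),
        ("u3", "election day"),
    ]
    print(active_users(sample, DOMAIN_KEYWORDS))
```

Counting tweets per user is only one possible definition of "active"; the slide does not specify the criterion.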

8 Difficulties Faced
Feature selection
POS tagging and NER
Removing non-ASCII characters

