Vincent Fiore, Ange Assoumou, Debarshi Dutta, Kenneth Almodovar

The Correlation between the Topic and Emotion of Tweets through Machine Learning

Our Project
Classifying Tweets based on topic and emotion.
Emotions: Happy, Sad, Angry
Topics: Religion, Politics, Family

Purpose
Searching for a correlation between a topic and an emotion, using machine learning to classify Tweets and then plotting these data points against each other.

Background
Many papers have been written on similar topics, and we used them to decide where to start our research. We performed a literature review in each of these topics to build a better background. The review gave us ideas on preprocessing Tweets, on using word lists, and on edge cases to keep in mind and avoid.

Methodology
Manually categorizing Tweets, which are then used to train the ML algorithm.
Creating large word lists, which are used to build the feature set the algorithm is trained on. This is the raw data fed into the algorithm: it turns Tweets into numbers.
Confirming that the training has worked.
Classifying Tweets beyond the original data set.
Plotting these new classifications to search for correlations.
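The word-list step can be sketched roughly as follows. This is a minimal illustration, not the project's code: the six miniature lists, the `tokenize` and `features` helpers, and the regex-based tokenization are all hypothetical stand-ins. Each Tweet becomes a vector of six counts, one per list, which is the kind of numeric feature set a classifier could be trained on.

```python
import re

# Hypothetical miniature word lists; the project's real lists held hundreds
# of words each (the largest had more than 600).
WORD_LISTS = {
    "politics": {"vote", "senate", "election", "congress"},
    "religion": {"god", "pray", "faith", "church"},
    "family": {"mom", "dad", "kids", "family"},
    "anger": {"angry", "hate", "furious", "unfair"},
    "happy": {"happy", "love", "great", "blessed"},
    "sadness": {"sad", "miss", "lost", "cry"},
}
CATEGORIES = list(WORD_LISTS)

def tokenize(tweet):
    """Lowercase the tweet and keep runs of letters/apostrophes ("#vote" -> "vote")."""
    return re.findall(r"[a-z']+", tweet.lower())

def features(tweet):
    """Turn a tweet into numbers: one hit count per word list, in CATEGORIES order."""
    tokens = tokenize(tweet)
    return [sum(tok in WORD_LISTS[c] for tok in tokens) for c in CATEGORIES]

print(features("Go vote in the election! So happy #vote"))
```

Here the sample Tweet yields `[3, 0, 0, 0, 1, 0]`: three politics hits ("vote" twice, "election") and one happy hit.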

Manual Classification and Word Lists
Manual classification was an incredibly difficult process due to the nature of the topics. Emotion is easy to identify, but topics can be vague. Politics, for example, has changed wildly since this data set was created in 2009. Word lists are also challenging: the number of words on each list has to be balanced against how often those words actually occur. In one case, one emotion's list was much larger than the others, so almost every Tweet was preliminarily marked as that emotion. This had to be fixed.
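The list-size problem is easy to reproduce. The sketch below is hypothetical (the word lists, Tweets, and the `normalize` option are illustrative assumptions, and the slides do not say how the imbalance was actually fixed): labeling each Tweet by raw hit count lets a large list win through sheer coverage, while dividing hits by list size is one possible correction.

```python
from collections import Counter

def preliminary_label(tokens, word_lists, normalize=False):
    """Return the emotion whose word list matches the most tokens.

    With normalize=True, each hit count is divided by the list's length,
    a hypothetical correction so a very large list cannot win by size alone.
    Ties go to the emotion listed first in word_lists.
    """
    scores = {}
    for emotion, words in word_lists.items():
        hits = sum(tok in words for tok in tokens)
        scores[emotion] = hits / len(words) if normalize else hits
    return max(scores, key=scores.get)

def label_distribution(tweets, word_lists, normalize=False):
    """Count how many tweets receive each preliminary label."""
    return Counter(
        preliminary_label(t.lower().split(), word_lists, normalize) for t in tweets
    )

# Illustrative data: the "happy" list is four times larger and holds common words.
WORD_LISTS = {
    "happy": {"happy", "love", "great", "good", "fun", "day",
              "so", "today", "nice", "smile", "best", "yay"},
    "angry": {"hate", "furious", "angry"},
    "sad": {"sad", "cry", "miss"},
}
TWEETS = ["i hate this day so much", "so sad today", "best day ever"]

print(label_distribution(TWEETS, WORD_LISTS))                  # every tweet -> happy
print(label_distribution(TWEETS, WORD_LISTS, normalize=True))  # one tweet per emotion
```

With raw counts, common words such as "so" and "day" push all three Tweets to happy; normalized, the angry and sad Tweets are recovered.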

Implementation
Using the manual classification, we created six word lists, one for each of the following categories: Politics, Religion, Family, Anger, Happy, and Sadness. Each list contains hundreds of words.

Implementation (continued)
Figure 1.2: An example of the word lists. Our biggest list contained more than 600 words.

Implementation (continued)
Figure 1.3: The rate of words per tweet.

Results
Each classifier, on its own, scored roughly 90% accuracy on the test data set. This meant that, when compared to our pre-classified tweets, the algorithm was roughly three times more accurate than random guessing (33%, i.e., randomly choosing one of the three categories).
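The "roughly three times" figure is the ratio of the measured accuracy to the 1-in-3 chance of guessing correctly. A small sketch of both calculations (the function names are our own, for illustration only):

```python
def accuracy(predicted, actual):
    """Fraction of tweets whose predicted label matches the manual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def lift_over_random(acc, n_classes):
    """How many times better than guessing uniformly among n_classes labels."""
    chance = 1 / n_classes
    return acc / chance

# Figures from the Results slide: ~90% accuracy, three categories per classifier.
print(round(lift_over_random(0.90, 3), 2))  # 2.7
```

So 90% accuracy against a 33% baseline works out to 2.7 times better than chance, which the slide rounds to roughly three.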

Results
Classification of Donald Trump's tweets
Classification of Pope Francis's tweets

Results
There are clear trends for individual users. For the most part:
The President was political and either sad or angry.
Religious leaders were religious and happy. The Dalai Lama, for example, was overwhelmingly positive.

Results
Need for further study
The classifier worked best with users whose tweets fell into a clear category. Left: Nancy Pelosi's tweets as classified. They are usually political, but show a large number of religious tweets.

Results
Religious tweets are over-represented
Tweets that did not clearly belong to the political, family, or religious categories tended to be categorized as religious. As a result, certain users' tweets are over-classified as religious. Due to the black-box nature of machine learning, tracking down the cause of this issue proved too problematic.
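One way to put a number on this over-representation is to compare how often the classifier assigns each topic with how often that topic appears among the manually classified tweets. The sketch below is an illustrative assumption, not the project's diagnostic, and the sample labels are made up:

```python
from collections import Counter

def over_representation(predicted, manual):
    """Ratio of each label's share among predictions to its share among manual labels.

    1.0 means the label is assigned about as often as it truly occurs;
    values well above 1.0 flag a label the classifier over-uses.
    Labels that never appear in the manual set are skipped to avoid dividing by zero.
    """
    pred_counts, true_counts = Counter(predicted), Counter(manual)
    return {
        label: (pred_counts[label] / len(predicted)) / (true_counts[label] / len(manual))
        for label in true_counts
    }

# Made-up labels for ten tweets, purely for illustration.
manual = ["politics"] * 4 + ["family"] * 3 + ["religion"] * 3
predicted = ["politics"] * 3 + ["family"] * 2 + ["religion"] * 5

for label, ratio in over_representation(predicted, manual).items():
    print(f"{label}: {ratio:.2f}")
```

In this toy sample religion comes out at about 1.67, meaning it is predicted roughly two-thirds more often than it actually occurs, while politics and family fall below 1.0.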

Results
Tweets from Dr. Phil (left) illustrate the over-classification of religion. However, the emotion assigned to these tweets is still correct.

Conclusion
It proved difficult to extrapolate the results to a universal data set. Most users tend to tweet in their own specific style, and for the most part that style does not carry over across users and topics. Certain topics do show a correlation, like religion and sadness, but for the most part, emotion and topic do not show an obvious link. The link exists only for individual users and their own style of tweeting, i.e., politicians speak about politics and religious leaders about religion. Emotion is even more specific to each individual.
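Whether topic and emotion are linked across all users could be checked more formally by cross-tabulating the two labels and computing Pearson's chi-square statistic for independence. The sketch below is our illustration, not part of the original analysis, and the labeled pairs are made up; `scipy.stats.chi2_contingency` offers the same kind of test with a p-value if SciPy is available.

```python
from collections import Counter

def contingency(pairs):
    """Cross-tabulate (topic, emotion) pairs into a table of counts.

    Rows follow the sorted topics, columns the sorted emotions.
    """
    topics = sorted({topic for topic, _ in pairs})
    emotions = sorted({emotion for _, emotion in pairs})
    counts = Counter(pairs)
    table = [[counts[(t, e)] for e in emotions] for t in topics]
    return topics, emotions, table

def chi_square(table):
    """Pearson's chi-square statistic for independence of rows and columns."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if topic and emotion were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Made-up (topic, emotion) labels for a handful of tweets.
pairs = [("religion", "happy"), ("religion", "happy"), ("religion", "sad"),
         ("politics", "angry"), ("politics", "sad"), ("politics", "sad")]
topics, emotions, table = contingency(pairs)
print(topics, emotions, table)
print(round(chi_square(table), 3))
```

A statistic near 0 means every topic has roughly the same emotion mix; compared against the chi-square distribution with (topics - 1) x (emotions - 1) degrees of freedom, a large value would point to a real topic-emotion link. Real use needs far more tweets than this toy example (a common rule of thumb is an expected count of at least 5 per cell).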

Further study
Further study needs to focus on gathering much larger samples of tweets. This will help uncover any possible trends between topic and emotion on Twitter at large. The classifier also needs more development time. On its own, each classifier was scoring roughly 90% accuracy, but this dropped once they were combined. The errors compounded, causing problems like the religion over-classification we encountered.
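The compounding can be illustrated with a simple model. Assuming (our assumption, not a measured result) that the topic and emotion classifiers make errors independently, a tweet has both labels right only when each classifier is right, so the accuracies multiply. `pipeline_accuracy` is a hypothetical helper name:

```python
from math import prod

def pipeline_accuracy(stage_accuracies):
    """Probability that every stage labels a tweet correctly.

    Hypothetical model: assumes each stage's errors are independent of the others.
    """
    return prod(stage_accuracies)

# Two classifiers at roughly 90% each, as reported in the Results.
print(round(pipeline_accuracy([0.90, 0.90]), 2))  # 0.81
```

Under this model, two 90% classifiers give both labels correct for only about 81% of tweets, roughly doubling the error rate from one in ten to about one in five. Correlated errors could make the real figure better or worse.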
