Tweets Discrimination Analysis

1 Tweets Discrimination Analysis
Shuhan Yuan

2 Discrimination Analysis
Discrimination is treatment or consideration of, or making a distinction in favor of or against, a person or thing based on the group, class, or category to which that person or thing is perceived to belong rather than on individual merit. (Wikipedia)
Tweet discrimination analysis aims to detect whether a tweet discriminates on the basis of gender, race, age, etc.

3 Twitter API
REST-based API (Representational State Transfer): HTTP requests; authentication with OAuth; responses are available in JSON.
Streaming API: supports long-lived HTTP connections; real-time delivery of tweets.
API rate limits: search is rate limited at 180 queries per 15 minutes.
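
The search limit above (180 queries per 15 minutes) can be respected on the client side with a sliding-window throttle. The following is a minimal sketch using only the Python standard library; the class name `WindowThrottle` and its parameters are illustrative and not part of any Twitter library.

```python
import time
from collections import deque


class WindowThrottle:
    """Allow at most `max_calls` calls in any `window_seconds` span."""

    def __init__(self, max_calls=180, window_seconds=15 * 60, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the logic can be tested
        self.calls = deque()    # timestamps of the recent calls

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window_seconds - (now - self.calls[0])

    def record(self):
        """Register that a call was just made."""
        self.calls.append(self.clock())
```

Before each search request, a crawler would call `time.sleep(throttle.wait_time())` and then `throttle.record()`.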

4 Twitter API https://apps.twitter.com
Authentication requires API keys from the Twitter developer site; all you need is a Twitter account. The site gives you four keys (consumer key, consumer secret, access token, and access token secret), which the programs take as parameters. OAuth authentication gives your program permission to make API calls.
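
As a sketch of how the four keys might be passed to a program, the code below reads them from environment variables and hands them to tweepy. The environment variable names and the helper names `load_keys` and `make_api` are hypothetical, and the tweepy calls assume a release that provides `tweepy.OAuthHandler`.

```python
import os

# Hypothetical environment variable names; any naming scheme works.
KEY_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_keys(env=os.environ):
    """Return the four credentials as a tuple, or raise if one is missing."""
    missing = [name for name in KEY_NAMES if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in KEY_NAMES)


def make_api(keys):
    """Build an authenticated tweepy API object from the four keys."""
    import tweepy  # imported lazily so load_keys works without tweepy

    consumer_key, consumer_secret, access_token, access_token_secret = keys
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    # wait_on_rate_limit makes tweepy sleep when the rate limit is reached.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Keeping the keys out of the source code avoids committing them to a repository by accident.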

5 Crawler (python+tweepy)
Search API Stream API
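
The slide's own crawler code is not in the transcript. A minimal sketch of the Search API half might look like the following, assuming an already authenticated tweepy `API` object; the helper names `build_query` and `crawl` are illustrative, and the search method name depends on the tweepy version, so the code looks it up at run time.

```python
def build_query(hashtags):
    """Join hashtags with the OR search operator, e.g. '#sexism OR #racism'."""
    return " OR ".join("#" + tag.lstrip("#") for tag in hashtags)


def crawl(api, hashtags, limit=100):
    """Yield raw tweet dicts for the given hashtags using the Search API."""
    import tweepy  # imported lazily so build_query works without tweepy

    # The method is `search` in tweepy 3.x and `search_tweets` in tweepy 4.x.
    search = getattr(api, "search_tweets", None) or getattr(api, "search")
    for status in tweepy.Cursor(search, q=build_query(hashtags)).items(limit):
        # `_json` holds the parsed JSON payload of the tweet.
        yield status._json
```

The Stream API half is omitted here because its tweepy interface differs substantially between major versions.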

6 Twitter JSON
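
The original slide shows a sample payload that did not survive in the transcript. As a stand-in, here is an abbreviated, made-up tweet in the shape of the REST API response, together with a small helper (`summarize`, a hypothetical name) that pulls out the fields needed later.

```python
import json

# An abbreviated, invented tweet; real payloads carry many more fields.
RAW = """
{
  "id_str": "1",
  "created_at": "Mon Jan 01 00:00:00 +0000 2018",
  "text": "Example tweet #News",
  "user": {"screen_name": "example_user"},
  "entities": {"hashtags": [{"text": "News", "indices": [14, 19]}]}
}
"""


def summarize(tweet):
    """Pull out the id, author, text, and lowercase hashtags of a tweet."""
    entities = tweet.get("entities", {})
    return {
        "id": tweet["id_str"],
        "user": tweet.get("user", {}).get("screen_name"),
        "text": tweet.get("text", ""),
        "hashtags": [h["text"].lower() for h in entities.get("hashtags", [])],
    }


summary = summarize(json.loads(RAW))
```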

7 Why choose MongoDB
Document storage: documents are stored in BSON (binary JSON).
Any valid JSON can be easily imported and queried.
Schema-less; very flexible.
Super easy to install and configure.
BSON is a binary serialization of JSON-like objects. This is extremely powerful because it means Mongo understands JSON natively.

8 Document-oriented
Think of “documents” as database records.
Documents are basically just JSON objects that Mongo stores in binary.
Think of “collections” as database tables.

9 Storing Tweets in MongoDB (python + pymongo)
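
The slide's code is not in the transcript. One plausible sketch with pymongo is shown below; the connection URI, database name, collection name, and helper names are assumptions, and using the tweet id as `_id` is a design choice made here so that re-crawled tweets overwrite themselves instead of being duplicated.

```python
def to_document(tweet):
    """Copy the tweet and use its id as MongoDB's primary key (_id)."""
    doc = dict(tweet)
    doc["_id"] = tweet["id_str"]
    return doc


def store_tweets(tweets, uri="mongodb://localhost:27017",
                 db_name="twitter", collection_name="tweets"):
    """Upsert each tweet and return how many were written."""
    from pymongo import MongoClient  # imported lazily

    client = MongoClient(uri)
    collection = client[db_name][collection_name]
    stored = 0
    for tweet in tweets:
        doc = to_document(tweet)
        # Replace the document with the same _id, or insert it if absent.
        collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        stored += 1
    client.close()
    return stored
```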

10 Concept Mapping
RDBMS                       MongoDB
Table                       Collection
Row                         Document
JOIN                        Embedded document or reference
Queries return record(s)    Queries return a cursor

11

12 Thinking in Documents
Blogging platform
RDBMS (JOIN 5 tables)
MongoDB (data locality)
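
To make the data-locality point concrete, here is a made-up blog post stored as a single document, written as a Python dict. The field names are illustrative; the slide does not say which five tables the relational version joins.

```python
# One document holds what a relational schema would spread over several tables.
post = {
    "_id": "post-1",
    "title": "Thinking in documents",
    "author": {"name": "alice"},          # embedded instead of a separate row
    "tags": ["mongodb", "modeling"],      # embedded list instead of a join table
    "comments": [                         # embedded instead of a comments table
        {"user": "bob", "text": "Nice post"},
        {"user": "carol", "text": "Thanks"},
    ],
}


def comment_authors(document):
    """Read the commenters directly from the post, with no join."""
    return [comment["user"] for comment in document["comments"]]
```

Rendering the post page then needs a single read of one document.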

14 Queries return “cursors” instead of collections
A cursor allows you to iterate through the result set.
This is much more efficient than loading all objects into memory.
The find() function returns a cursor.
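
A short sketch of iterating over a cursor follows. It assumes `collection` is a pymongo collection (or any object whose `find` method returns an iterable) and that tweets are stored with their `entities.hashtags` array intact; `iter_texts` is a hypothetical helper name.

```python
def iter_texts(collection, hashtag):
    """Yield tweet texts one at a time instead of building a full list."""
    # Dot notation matches any element of the embedded hashtags array.
    query = {"entities.hashtags.text": hashtag}
    for tweet in collection.find(query):  # find() returns a lazy cursor
        yield tweet.get("text", "")
```

Because the function is a generator, downstream code can process millions of tweets while holding only one in memory at a time.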

15 Discrimination detection
Assumption: tweets containing specific hashtags such as #discrimination, #sexism, or #racism are discriminatory; tweets containing the #news hashtag are not.
Task: classify each tweet as discriminatory or not.
Candidate classifiers: Naïve Bayes, decision tree, random forest, SVM, neural network.
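
The hashtag assumption can be turned into a labeling function. This is a minimal sketch; the tag sets come from the slide, while the handling of tweets that carry both kinds of hashtag or neither (they are skipped) is an assumption made here.

```python
DISCRIMINATION_TAGS = {"discrimination", "sexism", "racism"}
NEUTRAL_TAGS = {"news"}


def label(hashtags):
    """Return 1 (discriminatory), 0 (not), or None if the tweet is unusable."""
    tags = {tag.lower().lstrip("#") for tag in hashtags}
    positive = bool(tags & DISCRIMINATION_TAGS)
    negative = bool(tags & NEUTRAL_TAGS)
    if positive == negative:  # neither kind of tag, or both: ambiguous
        return None
    return 1 if positive else 0
```

Since the labels come from the hashtags, it is probably sensible to strip the labeling hashtags from the text before training; otherwise a classifier can score well simply by memorizing them.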

16 One slide intro to machine learning
Training (features: BOW, TF-IDF, LSA, embeddings, etc.)
Testing
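
Two of the feature representations named above, bag of words and TF-IDF, can be sketched in a few lines of standard-library Python. The TF-IDF variant used here (term frequency divided by document length, times log(N / document frequency)) is one common choice, not necessarily the one used in the presentation.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each token to the number of times it occurs."""
    return Counter(tokens)


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    document_frequency = Counter()
    for tokens in docs:
        document_frequency.update(set(tokens))
    vectors = []
    for tokens in docs:
        counts = bag_of_words(tokens)
        vectors.append({
            term: (count / len(tokens))
            * math.log(n_docs / document_frequency[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that occurs in every document gets weight 0, which is why TF-IDF down-weights very common words.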

17 Recurrent Neural Networks (RNN)
Recurrent neural networks (RNNs) can condition the model on all previous words in the sequence. The sequential information is preserved in the network’s hidden state, which spans many time steps as it cascades forward to affect the processing of each new word. RNNs take as input not just the current word but also what they perceived one step back in time, so they have two sources of input: the present and the recent past. They can operate on sequential data of variable length.
[Figure: an unrolled recurrent neural network.]
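
The "two sources of input" can be written as a recurrence. The slides do not say which cell is used; the form below is the simple (Elman) RNN, given as an assumption, where $x_t$ is the vector for the $t$-th word, $h_{t-1}$ is the previous hidden state, and $W_x$, $W_h$, $b$ are learned parameters.

```latex
h_t = \tanh\left( W_x x_t + W_h h_{t-1} + b \right), \qquad t = 1, \dots, n
```

The same parameters are reused at every time step, which is what lets the network handle sequences of any length.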

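18 The architecture
[Figure: the words w_1, w_2, w_3, ..., w_n are fed one by one into the RNN, which produces the hidden states h_0, h_1, h_2, ..., h_n; mean pooling over the hidden states gives h; logistic regression on h gives the output y.]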
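
The forward pass of this architecture (RNN, then mean pooling, then logistic regression) can be sketched in plain Python. This assumes a simple tanh recurrence and an all-zero initial state, neither of which the slides confirm; the function and parameter names are illustrative, and training is not shown.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_states(inputs, w_x, w_h, b):
    """Return the hidden state after each input word vector."""
    h = [0.0] * len(b)  # assumed initial state
    states = []
    for x in inputs:
        pre = [a + c + d for a, c, d in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


def mean_pool(states):
    """Average the hidden states element-wise."""
    return [sum(column) / len(states) for column in zip(*states)]


def predict(inputs, w_x, w_h, b, v, c):
    """Probability of the positive class: sigmoid(v . mean(h_t) + c)."""
    pooled = mean_pool(rnn_states(inputs, w_x, w_h, b))
    score = sum(vi * hi for vi, hi in zip(v, pooled)) + c
    return 1.0 / (1.0 + math.exp(-score))
```

Mean pooling gives every word an equal say in the final representation, rather than relying only on the last hidden state.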

19 The last step: coding
Load tweets from MongoDB.
Preprocess tweets (remove stop words, tokenize, ...).
Split the dataset into a training set and a test set.
Train the neural network on the training set and measure its accuracy on the test set.
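
The preprocessing and splitting steps can be sketched with the standard library alone. The stop-word list, the tokenization rule, and the 80/20 split are illustrative assumptions; the presentation does not specify them.

```python
import random
import re

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "rt"}


def preprocess(text):
    """Lowercase, drop URLs and @mentions, tokenize, remove stop words."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"[a-z0-9']+", text)
    return [token for token in tokens if token not in STOP_WORDS]


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle a copy of the items and split it into (train, test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]
```

Fixing the seed keeps the split reproducible, so accuracy figures from different runs are comparable.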

20 Resources https://dev.twitter.com/overview/documentation

21 Thank you

