Tweets Discrimination Analysis

Presentation transcript:

Tweets Discrimination Analysis Shuhan Yuan sy005@email.uark.edu

Discrimination Analysis
Discrimination is treatment or consideration of, or making a distinction in favor of or against, a person or thing based on the group, class, or category to which that person or thing is perceived to belong to rather than on individual merit. (Wikipedia)
Tweets discrimination analysis aims to detect whether a tweet discriminates on the basis of gender, race, age, etc.

Twitter API
- REST-based API (Representational State Transfer)
  - HTTP requests
  - Authentication with OAuth
  - Responses are available in JSON
- Streaming API
  - Supports long-lived HTTP connections
  - Real-time delivery of tweets
- API rate limits
  - Search is rate limited to 180 queries per 15 minutes
https://dev.twitter.com/overview/documentation
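The search rate limit works out to one query every 5 seconds (900 s / 180). A minimal sketch of a client-side throttle that spaces calls accordingly; the `Throttle` class and its parameters are hypothetical names for illustration, not part of any Twitter library:

```python
import time


class Throttle:
    """Space calls evenly so that at most `max_calls` happen per `window` seconds."""

    def __init__(self, max_calls=180, window=15 * 60,
                 clock=time.monotonic, sleep=time.sleep):
        # 900 s / 180 queries = 5 s between consecutive queries
        self.min_interval = window / max_calls
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until the next call is allowed, then record the call time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` before each search request keeps a crawler under the limit. The clock and sleep functions are injectable only so the class can be exercised without real delays.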

Twitter API
https://apps.twitter.com
Authentication requires an API key from the Twitter developers site; all you need is a Twitter account. The site gives you four keys, which your programs pass as parameters. OAuth authentication gives your program permission to make API calls.

Crawler (python+tweepy) Search API Stream API
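A rough sketch of the Search API half of such a crawler, assuming tweepy is installed and that the four keys come from https://apps.twitter.com. The function names `build_api` and `search_tweets` are hypothetical, and the search method name depends on the tweepy version, so treat this as an outline rather than the crawler used in the talk. The Stream API half is not shown.

```python
def build_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Create an authenticated tweepy client from the four keys."""
    import tweepy  # imported lazily so the rest of the sketch loads without tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth)


def search_tweets(api, query, count=100):
    """Run one Search API query and return the raw tweet dictionaries."""
    # tweepy 4.x names the method `search_tweets`; tweepy 3.x calls it `search`.
    search = getattr(api, "search_tweets", None) or api.search
    statuses = search(q=query, count=count)
    # Assumption: each status keeps its parsed JSON in the `_json` attribute.
    return [status._json for status in statuses]
```

A call such as `search_tweets(build_api(...), "#news")` would return a list of dictionaries ready to be stored or inspected.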

Twitter JSON
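To make the JSON structure concrete, here is a small sketch that pulls a few commonly used fields out of a tweet. The sample tweet is made up and heavily trimmed; real tweets carry many more fields. `summarize_tweet` is a hypothetical helper name.

```python
import json

# A made-up, trimmed-down tweet in the shape returned by the REST API.
raw = """
{
  "id_str": "1",
  "created_at": "Wed Oct 10 20:19:24 +0000 2018",
  "text": "Example headline #News",
  "user": {"screen_name": "example_user"},
  "entities": {"hashtags": [{"text": "News", "indices": [17, 22]}]}
}
"""


def summarize_tweet(tweet):
    """Keep only the fields needed for later analysis."""
    return {
        "id": tweet["id_str"],
        "user": tweet["user"]["screen_name"],
        "text": tweet["text"],
        # Hashtags are listed under entities, without the leading '#'.
        "hashtags": [h["text"].lower() for h in tweet["entities"]["hashtags"]],
    }


summary = summarize_tweet(json.loads(raw))
```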

Why choose MongoDB
- Document storage
- Super easy to install and configure
- Documents are stored in BSON (binary JSON)
- Any valid JSON can be easily imported and queried
- Schema-less; very flexible
- BSON is a binary serialization of JSON-like objects; this is extremely powerful because it means Mongo understands JSON natively
http://www.slideshare.net/drumwurzel/intro-to-mongodb

Document-oriented
- Think of “documents” as database records
- Documents are basically just JSON objects that Mongo stores in binary
- Think of “collections” as database tables
http://www.slideshare.net/drumwurzel/intro-to-mongodb

Storing Tweets in MongoDB (python + pymongo)
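One possible way to write crawled tweets into MongoDB with pymongo, assuming a server on localhost. The database name, collection name, and helper names are placeholders chosen for this sketch. Reusing the tweet id as `_id` is a design choice, not a requirement: it makes re-crawled tweets overwrite themselves instead of piling up as duplicates.

```python
def open_collection(uri="mongodb://localhost:27017",
                    db_name="twitter", coll_name="tweets"):
    """Connect to MongoDB and return the collection that holds the tweets."""
    from pymongo import MongoClient  # lazy import: only needed for a real server

    return MongoClient(uri)[db_name][coll_name]


def store_tweets(collection, tweets):
    """Upsert each tweet dictionary, keyed by its Twitter id; return the count."""
    written = 0
    for tweet in tweets:
        doc = dict(tweet)           # copy so the caller's dict is left untouched
        doc["_id"] = doc["id_str"]  # tweet id doubles as the primary key
        collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        written += 1
    return written
```

Typical use would be `store_tweets(open_collection(), tweets)`, where `tweets` is a list of tweet dictionaries.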

Concept Mapping
RDBMS                      MongoDB
Table                      Collection
Row                        Document
JOIN                       Embedded Document or Reference
Queries return record(s)   Queries return a cursor
https://www.mongodb.com/blog/post/thinking-documents-part-1

Thinking in Documents
Blogging platform:
- RDBMS: JOIN 5 tables
- MongoDB: data locality
https://www.mongodb.com/blog/post/thinking-documents-part-1
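As an illustration of data locality, a blog post and its related data can live in a single document. The field names and values below are invented for this sketch and are not taken from the MongoDB blog post.

```python
# Hypothetical blog post stored as ONE document; all field names are illustrative.
post = {
    "_id": "post-1",
    "title": "Thinking in documents",
    "author": {"name": "alice", "email": "alice@example.com"},
    "tags": ["mongodb", "modeling"],
    "comments": [
        {"user": "bob", "text": "Nice post"},
        {"user": "carol", "text": "Thanks"},
    ],
}


def comment_authors(doc):
    """Everything needed is already inside the document, so no JOIN is required."""
    return [comment["user"] for comment in doc["comments"]]
```

In a relational schema the author, tags, and comments would typically sit in separate tables and be joined at query time; here one read returns them all.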

Queries return “cursors” instead of collections
- A cursor allows you to iterate through the result set
- Much more efficient than loading all objects into memory
- The find() function returns a cursor
http://www.slideshare.net/drumwurzel/intro-to-mongodb
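A short sketch of cursor-style iteration from Python, assuming `collection` is a pymongo collection whose documents have a `text` field. `iter_texts` is a hypothetical helper name.

```python
def iter_texts(collection, query=None):
    """Yield tweet texts one at a time instead of building a full list in memory."""
    # The second argument is a projection: fetch only `text` (plus `_id`).
    cursor = collection.find(query or {}, {"text": 1})
    for doc in cursor:
        yield doc["text"]
```

Because the function is a generator over the cursor, a loop such as `for text in iter_texts(collection): ...` processes documents as they arrive rather than loading the whole result set first.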

Discrimination detection
Assumptions:
- Tweets containing certain hashtags, such as #discrimination, #sexism, or #racism, are discriminatory.
- Tweets containing the #news hashtag are not discriminatory.
Classify tweets as discriminatory or not. Candidate classifiers:
- Naïve Bayes
- Decision Tree
- Random Forest
- SVM
- Neural Network
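The hashtag assumption can be turned into a labeling rule along these lines. How to treat a tweet that carries both kinds of hashtag is not stated on the slide; this sketch assumes such tweets are simply left unlabeled. The names `label_tweet`, `DISCRIMINATION_TAGS`, and `NEUTRAL_TAGS` are hypothetical.

```python
DISCRIMINATION_TAGS = {"discrimination", "sexism", "racism"}
NEUTRAL_TAGS = {"news"}


def label_tweet(hashtags):
    """Return 1 (discriminatory), 0 (not discriminatory), or None (unlabeled)."""
    tags = {tag.lower().lstrip("#") for tag in hashtags}
    has_positive = bool(tags & DISCRIMINATION_TAGS)
    has_negative = bool(tags & NEUTRAL_TAGS)
    if has_positive == has_negative:
        # Assumption: no signal, or conflicting signals, means no label.
        return None
    return 1 if has_positive else 0
```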

One slide intro to machine learning
- Training (BOW, TF-IDF, LSA, embeddings, etc.)
- Testing
http://ogrisel.github.io/scikit-learn.org/sklearn-tutorial/tutorial/astronomy/general_concepts.html
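To show what two of these representations look like, here is a plain-Python sketch of bag-of-words counts and one common TF-IDF variant (term frequency times the log of inverse document frequency). Libraries use slightly different formulas; scikit-learn's `TfidfVectorizer`, for example, applies a smoothed variant, so the numbers will not match exactly.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Count how often each token occurs in one document."""
    return Counter(tokens)


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = bag_of_words(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that appears in every document gets weight 0 under this variant, which is the intended effect: words shared by all tweets carry no discriminating information.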

Recurrent Neural Networks (RNN)
- Recurrent Neural Networks (RNNs) are capable of conditioning the model on all previous words in the corpus.
- The sequential information is preserved in the recurrent network’s hidden state, which manages to span many time steps as it cascades forward to affect the processing of each new word.
- RNNs take as their input not just the current word they see, but also what they perceived one step back in time. So recurrent networks have two sources of input: the present and the recent past.
- Can operate on sequential data of variable length
Figure: An unrolled recurrent neural network. http://colah.github.io/posts/2015-08-Understanding-LSTMs/

The architecture
y
Logistic Regression
h
Mean pooling
h_0   h_1   h_2   ...   h_n
RNN   RNN   RNN   ...   RNN
w_1   w_2   w_3   ...   w_n
http://deeplearning.net/tutorial/lstm.html
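The forward pass of this architecture can be sketched in plain Python. This version assumes a simple tanh recurrence rather than the LSTM cell of the linked tutorial, and the parameter names (`W`, `U`, `b`, `v`, `c`) are placeholders; training is not shown.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, W, U, b):
    """Return the hidden state after each word vector of a tanh RNN."""
    h = [0.0] * len(b)  # initial hidden state
    states = []
    for x in inputs:
        wx, uh = matvec(W, x), matvec(U, h)
        # New state depends on the current word and the previous state.
        h = [math.tanh(p + q + r) for p, q, r in zip(wx, uh, b)]
        states.append(h)
    return states


def mean_pool(states):
    """Average the hidden states over all time steps."""
    return [sum(column) / len(states) for column in zip(*states)]


def predict_proba(inputs, W, U, b, v, c):
    """Logistic regression on the pooled hidden state."""
    h = mean_pool(rnn_forward(inputs, W, U, b))
    z = sum(vi * hi for vi, hi in zip(v, h)) + c
    return 1.0 / (1.0 + math.exp(-z))
```

Mean pooling is what lets tweets of different lengths map to a fixed-size vector before the logistic regression layer.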

The last step: coding
- Load tweets from MongoDB
- Preprocess tweets (remove stop words, tokenize, …)
- Split the dataset into a training set and a test set
- Train the neural network on the training set and measure its accuracy on the test set
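The preprocessing and splitting steps might look like the sketch below. The stop-word list is a tiny illustrative sample, and the helper names are hypothetical. Dropping hashtags is an assumption of this sketch: if labels are derived from hashtags, leaving them in the text would let a classifier read the label directly.

```python
import random
import re

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "rt"}

# Match links first, then words with an optional leading '@' or '#'.
TOKEN_RE = re.compile(r"https?://\S+|[@#]?\w+")


def preprocess(text):
    """Lowercase, tokenize, and drop links, mentions, hashtags, and stop words."""
    tokens = []
    for token in TOKEN_RE.findall(text.lower()):
        if token.startswith(("http://", "https://", "@", "#")):
            continue
        if token not in STOP_WORDS:
            tokens.append(token)
    return tokens


def train_test_split(examples, test_ratio=0.2, seed=0):
    """Shuffle reproducibly and return (train, test) lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]
```

The fixed seed keeps the split reproducible between runs, which makes accuracy figures comparable across experiments.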

Resources
https://dev.twitter.com/overview/documentation
https://github.com/tweepy/tweepy
http://www.ark.cs.cmu.edu/TweetNLP/
https://docs.mongodb.org/manual/
http://www.deeplearning.net/tutorial/
https://education.github.com/
http://stats.seandolinar.com/collecting-twitter-data-introduction/
http://adilmoujahid.com/posts/2014/07/twitter-analytics/

Thank you