PREDICTING INFLUENTIAL PERSON IN SOCIAL NETWORK USING EXTRA TREE CLASSIFIER WITH RECURSIVE FEATURE ELIMINATION

Aman Jain, Jitender Gangwar, Lucky Suman
ABV-Indian Institute of Information Technology and Management, Gwalior

ABSTRACT

Social networks affect individuals and their decisions: people follow other people and tend to follow in their footsteps. The model aims to find the people who sit at the center of these social networks and whose decisions affect a large number of people around them. People's actions and social mentions are swayed most effectively by the "power middle": intermediary influencers such as bloggers reach a smaller audience but hold a strong grip on it, and are 16 times more persuasive than paid media and "mega influencers." Given two persons and their social network features, our job is to predict which one is more influential. The project uses training samples from Kaggle that are labeled by human judgment. We use several different models to make predictions, such as a Neural Network, a Decision Tree classifier and an Extra Tree classifier. The model also uses auxiliary techniques such as cross-validation, feature selection and data preprocessing.

INTRODUCTION

A person's reputation on a social network is directly related to their influence in the network: the more influential a person is on social media, the better their reputation. Search engines like Google, Bing and Yahoo also factor social media influence into their page ranking algorithms. With the emergence of digital marketing, a form of marketing that targets key individuals in a network rather than a target market as a whole, there has been increased pressure on companies to find these social media influencers. A celebrity with a huge audience that retweets everything they post is not an influencer for a specific brand until they start talking about that specific brand or product. To be an influencer, a person does not always need to reach a large audience; sometimes reaching a small but targeted audience can be much more valuable. Consider the query, "Whom to target to spread information to a larger crowd?" The answer is a person who is much better connected than others and therefore ought to play a correspondingly greater role in spreading information and viruses throughout society.

METHODS AND MATERIALS

The project uses several models, some of which are explained here, to predict which of two users is more influential. The data set is collected from Kaggle, where it is made available. It contains a total of 5500 tuples, and each tuple has 22 features, 11 for each person. The project considers two users at a time. Based on features of the users extracted from Twitter, it estimates each user's degree of influence and predicts whether one user is more influential than the other. The different models used are:

1. Artificial Neural Network
2. Decision Tree Classifier
3. Random Forest Classifier
4. Extra Tree Classifier

RESULTS

In binary classification, the F1 score measures a test's accuracy by combining its precision p and its recall r into a single score:

F1 Score = 2 × (precision × recall) / (precision + recall)

Here p is the number of correct positive results divided by the number of all positive results returned, and r is the number of correct positive results divided by the number of positive results that should have been returned.

1. The maximum accuracy obtained using the ANN, with 60 neurons in the hidden layer, hyperbolic tangent as the activation function and a learning rate of 0.01, is 80.90%.
2. The maximum accuracy obtained using the Decision Tree classifier with AdaBoost, with 50 estimators and a learning rate of 0.1, is 83.41%.
3. To improve accuracy, the model increases randomness by using the Extra Tree classifier; the accuracy obtained is 81.77%.
4. A further increase in accuracy, to 83.65%, is obtained using RFE with the Extra Tree classifier, with 100 estimators and a minimum of 7 samples per split.

DISCUSSION

The probabilistic values of the output are found to be accurate. The models seem to fit the various user profiles using the various features of the Twitter dataset. Moreover, the system shows an increase in accuracy as we shift from linear to non-linear models, because the dataset does not fit a linear model. Chart 1 shows the predictions of the models alongside the target results from the dataset. The project uses different models for the task at hand and finds that the Extra Tree classifier with RFE works best, while the worst results are given by the Decision Tree classifier. The central premise of feature selection is removing redundancy from the data: the model uses RFE to remove the less important features from the dataset in order to achieve higher accuracy. The proposed model correctly predicts the probabilistic outcome; the maximum accuracy of 83.65% is obtained with the Extra Tree classifier and RFE, and the worst accuracy of 70.24% is given by the Decision Tree classifier.

CONCLUSIONS

The model is able to predict the more influential of two persons on the basis of a dataset from one particular social network, Twitter. The outcomes are tested against the actual target values, and the project obtains a maximum accuracy of 83.65% with the Extra Tree classifier and RFE. The accuracy obtained by the non-linear models is also limited to a certain value, possibly because the given dataset is widely scattered.

CONTACT

Jitender Gangwar
ABV IIITM, Gwalior

Figure 1. Influence of users
Graph 1. Effect of retweets
Graph 2. Effect of mentions
Chart 1. Performance of Models
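
As a small worked example of the F1 definition used to score the models, the sketch below counts true positives, false positives and false negatives on a made-up set of labels and applies F1 = 2pr / (p + r). The function name and the label vectors are illustrative assumptions and are not taken from the project's code.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correct positive predictions
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted positive, actually negative
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 1 means "user A is more influential than user B".
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

p, r, f1 = precision_recall_f1(y_true, y_pred)
print(p, r, f1)
```

With these labels there are 3 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all come out to 0.75.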

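
The best-performing configuration reported above (Extra Tree classifier with RFE, 100 estimators, minimum of 7 samples per split) might be set up roughly as follows with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation: the data here is synthetic (22 random columns, 11 per user, with a hypothetical labelling rule), the number of features kept by RFE (11) is not given in the poster, and the train/test split is arbitrary. The accuracies it prints therefore say nothing about the reported 83.65%.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle data: 22 columns, 11 for user A then 11 for user B.
rng = np.random.default_rng(0)
n_samples, n_per_user = 1000, 11
X = rng.lognormal(mean=3.0, sigma=1.0, size=(n_samples, 2 * n_per_user))

# Hypothetical labelling rule, only so the toy data has a learnable signal:
# A is "more influential" when A's first three features outweigh B's (plus noise).
score_a = np.log(X[:, :3]).sum(axis=1)
score_b = np.log(X[:, n_per_user:n_per_user + 3]).sum(axis=1)
y = (score_a + rng.normal(scale=0.5, size=n_samples) > score_b).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hyperparameters from the poster: 100 estimators, minimum of 7 samples per split.
base = ExtraTreesClassifier(n_estimators=100, min_samples_split=7, random_state=0)

# RFE refits the estimator repeatedly, dropping the least important feature each round.
# Keeping 11 of the 22 features is an assumption, not a value from the poster.
selector = RFE(estimator=base, n_features_to_select=11, step=1)
selector.fit(X_train, y_train)

# predict() applies the feature mask and then the classifier fitted on the kept features.
y_pred = selector.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
kept = np.flatnonzero(selector.support_)

print(f"kept feature indices: {kept.tolist()}")
print(f"accuracy={accuracy:.3f}  F1={f1:.3f}")
```

On real data, the number of features to keep would normally be chosen by cross-validation rather than fixed in advance.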
