FYP Presentation DATA FUSION OF CONSUMER BEHAVIOR DATASETS USING SOCIAL MEDIA Madhav Kannan A0088653R 1.


1 FYP Presentation DATA FUSION OF CONSUMER BEHAVIOR DATASETS USING SOCIAL MEDIA Madhav Kannan A0088653R

2 Traditional recommendation engines recommend products using techniques such as content-based and collaborative filtering. While these recommendations keep customers engaged, they are restricted to products within the same site. This project explores using review data from multiple sites, together with social media, to enhance recommendations. Outline: Motivation, Background, Procedure, Observations, Conclusion.

3 Among the many reviews of books and hotels on Amazon and TripAdvisor, and tweets on Twitter, the number of users who have both read book P and stayed at hotel Q can be determined. Using this information, it is possible to infer whether someone else who has read book P will stay at hotel Q in the future. This project aims to achieve superior user segmentation and recommendation capabilities by developing an algorithm that estimates the probability of a consumer purchasing product A given that they have purchased product B, by fusing data from social media (Twitter) with consumer review data.

4 LDA (Latent Dirichlet Allocation): LDA posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics.
[Figure: LDA plate diagram; labels: z, w, N, M, θ, β, α, ϕ, κ]
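The generative story that LDA assumes can be sketched in a few lines of Python. This is only an illustration of the standard model, not the project's code: the vocabulary, the two topics, and the `ALPHA` value below are hypothetical toy values, and the function names are made up for this sketch.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topic_word, alpha, vocab, n_words, rng):
    """Generate one document under the LDA generative story."""
    theta = sample_dirichlet(alpha, rng)            # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)          # topic assignment for this word
        w = sample_categorical(topic_word[z], rng)  # word drawn from that topic
        words.append(vocab[w])
    return theta, words

# Hypothetical toy vocabulary and two hand-made topics (not from the project data).
VOCAB = ["room", "pool", "airport", "recipe", "chef", "food"]
TOPIC_WORD = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "hotel"-like topic
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],  # a "cooking"-like topic
]
ALPHA = [0.5, 0.5]  # assumed prior; small values favor documents dominated by few topics

rng = random.Random(0)
theta, doc = generate_document(TOPIC_WORD, ALPHA, VOCAB, n_words=10, rng=rng)
print(theta, doc)
```

Fitting LDA runs this story in reverse: given only the words, it infers the topic mixtures and the per-topic word distributions.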

5 Scraping TripAdvisor and Amazon data: 28981 reviews pertaining to 3091 hotels in 186 cities were scraped from TripAdvisor. Hotels can be categorized as ‘Business’, ‘Solo’, ‘Couple’ or ‘Family’. 25004 reviews pertaining to 2561 products from 15 categories were scraped from Amazon. There were 2987 reviews (>10% of all reviews) whose usernames, each longer than 3 characters, matched across the two sites, covering 2010 unique usernames.
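The username-matching step could look roughly like the following. This is a minimal sketch under stated assumptions: reviews are represented as hypothetical `(username, text)` pairs, matching is exact string equality, and "instances" is taken to mean the number of reviews on either site written under a shared username. The slides do not specify any of these details.

```python
def match_usernames(reviews_a, reviews_b, min_chars=3):
    """Find usernames longer than min_chars that appear on both sites.

    Returns the set of shared usernames and the number of reviews
    (across both sites) written under one of those usernames.
    """
    users_a = {user for user, _ in reviews_a if len(user) > min_chars}
    users_b = {user for user, _ in reviews_b if len(user) > min_chars}
    shared = users_a & users_b
    instances = sum(1 for user, _ in reviews_a + reviews_b if user in shared)
    return shared, instances

# Hypothetical sample data, for illustration only.
AMAZON = [
    ("traveler88", "Great read."),
    ("bob", "Too long."),
    ("jane_d", "Inspiring."),
    ("traveler88", "Second review."),
]
TRIPADVISOR = [
    ("traveler88", "Lovely pool."),
    ("bob", "Noisy room."),
    ("kim_lee", "Close to the metro."),
]

shared, instances = match_usernames(AMAZON, TRIPADVISOR)
print(shared, instances)
```

Short usernames such as `bob` are excluded because they are more likely to belong to different people on different sites.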

6 Scraping Twitter data: 26978 tweets pertaining to TripAdvisor hotels and 27653 tweets pertaining to Amazon products were scraped. There were 2287 tweets (over 8%) with matching Twitter handles, covering 1920 unique handles.

7 Topic Modeling: Running LDA on the review text yields topics that summarize the entire text corpus.

8 Category Recommendation: Recommendations are made across sub-categories of Amazon products and categories of TripAdvisor hotels. The corpus is prepared by creating documents from the reviews and tweets related to each product. In the following example, the ‘non-fiction’ subcategory of the ‘books’ category has been chosen. Over 140 reviews and tweets form the basis of our evaluation.

9 Topic Modeling in Amazon
Top 5 topics under category ‘Non-fiction Books’:
Topic 1: 0.137*travel + 0.063*location + 0.061*exotic + 0.041*destinations + 0.04*hotels + 0.039*shopping + 0.029*food
Topic 2: 0.13*business + 0.117*world + 0.087*trends + 0.044*hard + 0.03*knowledgeable + 0.029*work + 0.028*night + 0.02*book
Topic 3: 0.185*food + 0.154*delicious + 0.067*recipes + 0.049*chef + 0.046*experienced + 0.028*restaurant + 0.023*cuisines
Topic 4: 0.129*first + 0.089*love + 0.053*family + 0.053*learn + 0.035*culture + 0.022*values + 0.021*adults + 0.02*parents
Topic 5: 0.255*biography + 0.123*really + 0.091*very + 0.08*point + 0.065*interesting + 0.043*inspirational + 0.025*science

10 Topic Modeling in TripAdvisor
Top 5 topics under hotels:
Topic 1: 0.141*restaurant + 0.091*pool + 0.06*experience + 0.045*lovely + 0.03*price + 0.029*hostel + 0.025*cheap
Topic 2: 0.153*room + 0.12*excellent + 0.087*expensive + 0.076*star + 0.065*nights + 0.034*loved + 0.028*family + 0.022*comfortable
Topic 3: 0.132*hotel + 0.096*food + 0.063*place + 0.045*restaurant + 0.04*everything + 0.031*week + 0.029*cuisines
Topic 4: 0.143*great + 0.088*stayed + 0.075*clean + 0.057*work + 0.046*business + 0.039*nights + 0.028*company + 0.022*laundry
Topic 5: 0.156*rooms + 0.132*location + 0.109*airport + 0.065*hotels + 0.032*booked + 0.027*find + 0.023*metro

11 Similarity Measure
Highest similarity within topics:
Topic 1 (Amazon): 0.137*travel + 0.063*location + 0.061*exotic + 0.041*destinations + 0.04*hotels + 0.039*shopping + 0.029*food
Topic 5 (TA): 0.156*rooms + 0.132*location + 0.109*airport + 0.065*hotels + 0.032*booked + 0.027*find + 0.023*metro
These topics gave a cosine similarity of 0.377. 34 reviews and tweets form the basis of our evaluation.

12 Similarity Measure
Highest similarity within topics:
Topic 2 (Amazon): 0.13*business + 0.117*world + 0.087*trends + 0.044*hard + 0.03*knowledgeable + 0.029*work + 0.028*night + 0.02*book
Topic 4 (TA): 0.143*great + 0.088*stayed + 0.075*clean + 0.057*work + 0.046*business + 0.039*nights + 0.028*company + 0.022*laundry
These topics gave a cosine similarity of 0.523. 47 reviews and tweets form the basis of our evaluation.

13 Similarity Measure
Highest similarity within topics:
Topic 3 (Amazon): 0.185*food + 0.154*delicious + 0.067*recipes + 0.049*chef + 0.046*experienced + 0.028*restaurant + 0.023*cuisines
Topic 3 (TA): 0.132*hotel + 0.096*food + 0.063*place + 0.045*restaurant + 0.04*everything + 0.031*week + 0.029*cuisines
These topics gave a cosine similarity of 0.478. 43 reviews and tweets form the basis of our evaluation.
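Cosine similarity between two topics can be sketched as follows, treating each topic as a sparse word-to-weight vector. The weights are the top terms of Topic 3 (Amazon) and Topic 3 (TA) as listed on the slide. Because the slides show only the top few terms of each topic and do not say exactly which vectors were compared, this sketch should not be expected to reproduce the reported 0.478; the function name is an assumption.

```python
import math

def cosine_similarity(topic_a, topic_b):
    """Cosine similarity between two sparse word -> weight vectors."""
    shared = set(topic_a) & set(topic_b)
    dot = sum(topic_a[w] * topic_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in topic_a.values()))
    norm_b = math.sqrt(sum(v * v for v in topic_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty topic is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Top terms only, copied from the slide; the full topic distributions are not shown.
AMAZON_TOPIC_3 = {
    "food": 0.185, "delicious": 0.154, "recipes": 0.067, "chef": 0.049,
    "experienced": 0.046, "restaurant": 0.028, "cuisines": 0.023,
}
TA_TOPIC_3 = {
    "hotel": 0.132, "food": 0.096, "place": 0.063, "restaurant": 0.045,
    "everything": 0.04, "week": 0.031, "cuisines": 0.029,
}

score = cosine_similarity(AMAZON_TOPIC_3, TA_TOPIC_3)
print(round(score, 3))
```

Only the words the two topics share (here food, restaurant and cuisines) contribute to the numerator, which is why topics with overlapping vocabulary score higher.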

14 Product Recommendation: The text corpus for a single product is too small to determine meaningful topics through topic modeling. The recommendation value for a product is therefore computed as an average of the score for the sub-category that the product belongs to (previous section) and the cosine similarity between reviews of the product and reviews of a particular hotel. In the following experiment, a book (‘Power Broker’) was taken from the ‘Business’ sub-subcategory of the non-fiction subcategory.
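One possible reading of this scoring rule is an equal-weight average of the two similarities, sketched below. The equal weighting, the function names, and the candidate hotels with their scores are all assumptions made for illustration; the slides do not give the exact formula.

```python
def product_score(subcategory_similarity, review_similarity):
    """Equal-weight average of the two scores (the weighting is an assumption)."""
    return (subcategory_similarity + review_similarity) / 2.0

def rank_hotels(candidates):
    """Rank hotels by combined score, highest first.

    candidates maps a hotel name to a pair:
    (sub-category similarity, product-to-hotel review similarity).
    """
    scored = {
        hotel: product_score(sub_sim, review_sim)
        for hotel, (sub_sim, review_sim) in candidates.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical hotels and scores, for illustration only.
CANDIDATES = {
    "Hotel A": (0.6, 0.4),
    "Hotel B": (0.3, 0.5),
}

ranking = rank_hotels(CANDIDATES)
print(ranking)
```

Averaging in the sub-category score lets the larger category-level corpus stabilize a product-level similarity that is computed from only a handful of reviews.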

15 Similarity Measure: The reviews and tweets of ‘Power Broker’ had their highest similarity score, 0.472, with the reviews and tweets of ‘Hotel Langham Palace’ in New York City, after weighting the scores with the sub-category recommendations. Moreover, 20% of the reviewers had reviewed both the book and the hotel. Additionally, 10% of Twitter users had tweeted about both the book and the hotel.

16 Limitations: The work assumes that matching usernames are accounts held by the same person, which is not necessarily true. The scraped tweets are extremely noisy, with poor grammar and punctuation. Ground truth is limited. The possibility of random agreement may be somewhat high.

17 Conclusion: The findings suggest a correlation between the results obtained and the ground truth established. Across multiple experiments, over 70% of the recommendations carried weight, in that at least 10% of the reviewers of the product had mentioned the hotel in either reviews or tweets. Work to be done: use KL divergence as an alternative similarity measure, apply the model to restaurant recommendations, and compare results.
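For the proposed future work, KL divergence between two topic distributions could be computed along these lines. This is a sketch, not part of the presented work: the smoothing constant `eps`, the function name, and the two toy distributions are assumptions.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) between two word -> probability maps.

    Both maps are extended to a shared vocabulary, smoothed with eps so that
    no probability is zero, and renormalized before the divergence is computed.
    """
    vocab = sorted(set(p) | set(q))
    p_smooth = [p.get(w, 0.0) + eps for w in vocab]
    q_smooth = [q.get(w, 0.0) + eps for w in vocab]
    p_total, q_total = sum(p_smooth), sum(q_smooth)
    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p_smooth, q_smooth)
    )

# Hypothetical toy distributions, for illustration only.
P = {"business": 0.6, "work": 0.4}
Q = {"business": 0.5, "clean": 0.5}

print(kl_divergence(P, Q), kl_divergence(Q, P))
```

Unlike cosine similarity, KL divergence is a dissimilarity (lower means closer) and is not symmetric, so a symmetrized variant may be preferable when comparing topics from two sites.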

