
Big Data Analysis Chin-Chih Chang 張欽智 Computer Science and Information Engineering Chung Hua University 2014/03/24 1.


1 Big Data Analysis
Chin-Chih Chang 張欽智, changc@chu.edu.tw
Computer Science and Information Engineering, Chung Hua University
2014/03/24

2 Big Data Analysis
- What is Big Data?
- Why is Big Data important?
- What can we do with these data?
- Example: A Recommender System Combining Social Networks for Tourist Attractions

3 What is Big Data?
Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data. Big data in many sectors today ranges from a few dozen terabytes (10^12 bytes) to multiple petabytes (10^15 bytes).

4 What is big data?
Big Data is not only about the size of the data (volume); it also involves data variety and data velocity. Together, these three attributes form the three V's of Big Data.

5 Data Types
Structured data: data organized according to a relational schema (e.g., rows and columns in a standard database). Its consistent configuration allows it to answer simple queries and yield usable information based on an organization's parameters and operational needs.

6 Data Types
Semi-structured data: a form of structured data that does not conform to an explicit, fixed schema. The data is inherently self-describing and contains tags or other markers that enforce hierarchies of records and fields. Examples include weblogs and social media feeds.
Unstructured data: data in formats that cannot easily be indexed into relational tables for analysis or querying. Examples include images, audio, and video files.
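To make the distinction concrete, the following Python sketch (an illustration, not part of the original slides) parses two hypothetical semi-structured social media posts, whose field names travel with the data and whose shapes differ, and flattens them into rows of one fixed, structured schema. The field names `user`, `text`, `location`, and `tags` and the column list are invented for the example.

```python
import json
from typing import Optional, Tuple

# Two hypothetical posts: self-describing, but not the same shape.
raw_posts = [
    '{"user": "alice", "text": "Great view!", "tags": ["travel", "sunset"]}',
    '{"user": "bob", "text": "Lunch stop", "location": {"city": "Hsinchu"}}',
]

# The fixed relational schema every row must follow.
COLUMNS = ("user", "text", "city", "tag_count")


def to_row(raw: str) -> Tuple[Optional[str], Optional[str], Optional[str], int]:
    """Flatten one JSON record into the fixed schema; absent fields become None or 0."""
    record = json.loads(raw)
    city = record.get("location", {}).get("city")
    return (record.get("user"), record.get("text"), city, len(record.get("tags", [])))


rows = [to_row(post) for post in raw_posts]
for row in rows:
    print(dict(zip(COLUMNS, row)))
```

The semi-structured input tolerates missing or extra fields; the structured output forces every record into the same four columns.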

7 What is big data?

8 How much big data?
Multiples of bytes: orders of magnitude of data

Decimal value | Metric symbol | Metric name
1000 | kB | kilobyte
1000^2 | MB | megabyte
1000^3 | GB | gigabyte
1000^4 | TB | terabyte
1000^5 | PB | petabyte
1000^6 | EB | exabyte
1000^7 | ZB | zettabyte
1000^8 | YB | yottabyte

Binary value | JEDEC symbol | JEDEC name | IEC symbol | IEC name
1024 | KB | kilobyte | KiB | kibibyte
1024^2 | MB | megabyte | MiB | mebibyte
1024^3 | GB | gigabyte | GiB | gibibyte
1024^4 | -- | -- | TiB | tebibyte
1024^5 | -- | -- | PiB | pebibyte
1024^6 | -- | -- | EiB | exbibyte
1024^7 | -- | -- | ZiB | zebibyte
1024^8 | -- | -- | YiB | yobibyte
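A small Python helper can show why the decimal and binary columns diverge: the same byte count reads differently depending on whether prefixes step by 1000 or by 1024. The function name `human_size` and its output format are illustrative choices, not part of any standard library.

```python
DECIMAL_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
BINARY_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]


def human_size(num_bytes: int, binary: bool = False) -> str:
    """Format a byte count with decimal (1000**n) or IEC binary (1024**n) prefixes."""
    base = 1024 if binary else 1000
    units = BINARY_UNITS if binary else DECIMAL_UNITS
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base
    return f"{value:.2f} {units[-1]}"  # anything larger stays in the top unit


print(human_size(10**12))               # 1.00 TB
print(human_size(10**12, binary=True))  # 931.32 GiB
```

The gap grows with each prefix step: 10^12 bytes is exactly 1 TB but only about 931 GiB, which is why a "1 TB" disk appears smaller when a tool reports binary units.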

9 How much data?

10 “We expect to create 12.6 exabytes of data every day in 2014, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is ‘big data.’”

11 Big Data is everywhere!
Lots of data is being collected and warehoused:
- Web data, e-commerce
- Purchases at department/grocery stores
- Bank/credit card transactions
- Social networks
- Instant messaging
- Internet of Things

12 Types of Data
- Relational data (tables/transaction/legacy data)
- Text data (Web)
- Semi-structured data (XML)
- Graph data: social networks, Semantic Web (RDF), …
- Streaming data: you can only scan the data once
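The constraint on streaming data (each item can be scanned only once) rules out algorithms that revisit earlier items. As one illustration, chosen for this example rather than taken from the slides, Welford's online algorithm computes a count, mean, and sample variance in a single pass without storing the stream:

```python
from typing import Iterable, Tuple


def one_pass_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Count, mean, and sample variance from a single scan (Welford's algorithm)."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in stream:  # each item is seen exactly once and never stored
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the already-updated mean
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


# A generator can only be iterated once, mimicking a data stream.
readings = (float(v) for v in range(1, 6))
print(one_pass_stats(readings))  # (5, 3.0, 2.5)
```

Memory use stays constant no matter how long the stream is, which is the property streaming settings require.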

13 Why is Big Data important?
Success stories:
- Netflix: movies
- Supermarkets
- …

14 What to do with these data?
- Aggregation and statistics: data warehouse and OLAP
- Indexing, searching, and querying: keyword-based search, pattern matching (XML/RDF)
- Knowledge discovery: data mining, statistical modeling

15 What is Data Mining?
- Discovery of useful, possibly unexpected, patterns in data
- Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
- Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

16 Data Mining Tasks
- Classification [Predictive]
- Clustering [Descriptive]
- Association Rule Discovery [Descriptive]
- Sequential Pattern Discovery [Descriptive]
- Regression [Predictive]
- Deviation Detection [Predictive]
- Collaborative Filtering [Predictive]

17 Example: A Recommender System Combining Social Networks for Tourist Attractions

18 Outline
- Abstract
- Introduction
- Related Work
- System Design and Mechanism
- System Implementation and Experiments
- Experimental Results
- Conclusion and Future Work

19 Abstract
In this paper we present a recommender system for tourist attractions that incorporates social networks. Three mechanisms are analyzed:
- Using similarity among users and their trustability
- Using information collected from social networks
- Combining similarity and social networks

20 Introduction
A recommender system suggests items that users might be interested in after learning their preferences. It can help users cope with information overload. Social networks have become a common platform for people to share their thoughts and extend their friendships into a virtual world.

21 Introduction
Incorporating social network information has high potential to enhance recommender systems, but how to use this information effectively is still an open research question. A tourist information system is convenient for people who are preparing to travel or are already on the road.

22 Introduction
The same information overload can occur in tourist information systems. In this paper, we present a tourist information system that combines a recommender system with social networks.

23 Related Work: Recommender Systems
A recommender system helps users find the items they prefer more quickly and accurately by suggesting the right items to them. There are four main approaches to recommendation: content-based filtering, collaborative filtering, knowledge-based approaches, and hybrid approaches.

24 Related Work: Recommender Systems
- Content-based filtering: recommends items that are similar to the ones the user liked in the past.
- Collaborative filtering: recommends items that are likely to be used by people whose interests are similar to the user's.
- Knowledge-based approaches: for example, asking users directly about their requirements and recommending items based on the criteria they provide.
- Hybrid approaches: combinations of the above methods.
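As a rough illustration of content-based filtering, the Python sketch below builds a user profile by averaging the feature vectors of the attractions the user liked, then ranks the remaining attractions by cosine similarity to that profile. The attraction names and the four binary features are hypothetical, not data from this presentation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical attractions described by four binary features:
# (nature, history, food, shopping).
ITEMS: Dict[str, List[float]] = {
    "mountain_trail": [1.0, 0.0, 0.0, 0.0],
    "old_street":     [0.0, 1.0, 1.0, 1.0],
    "night_market":   [0.0, 0.0, 1.0, 1.0],
    "temple":         [0.0, 1.0, 0.0, 0.0],
    "lake_park":      [1.0, 0.0, 1.0, 0.0],
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_content_based(liked: List[str], k: int = 2) -> List[str]:
    """Rank attractions the user has not liked by similarity to the user's profile."""
    if not liked:
        return []  # no history, no profile: content-based filtering cannot start
    dims = len(next(iter(ITEMS.values())))
    # User profile = mean feature vector of the attractions the user liked.
    profile = [sum(ITEMS[name][d] for name in liked) / len(liked) for d in range(dims)]
    candidates = [name for name in ITEMS if name not in liked]
    # list.sort is stable even with reverse=True, so ties keep insertion order.
    candidates.sort(key=lambda name: cosine(profile, ITEMS[name]), reverse=True)
    return candidates[:k]


print(recommend_content_based(["night_market"]))  # ['old_street', 'lake_park']
```

Items that share no features with the profile score zero and never surface, a small-scale view of the over-specialization drawback listed for content-based filtering.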

25 Related Work: Recommender Systems
Comparison of recommendation techniques

Technique | Advantages | Drawbacks
Content-based filtering | Effective in locating items that are relevant to the topic | Captures only certain aspects of the content; over-specialization
Collaborative filtering | Items are recommended based on users' ratings | Rating coverage can be very sparse; new items are not recommended; the algorithm is not very efficient
Knowledge-based approaches | Does not rely on the existence of a purchase history | Detailed knowledge about items may be required
Hybrid approaches | Efficient and more accurate | Not as simple

26 Related Work: Social Network Sites
Social network sites are Web-based services that support online social networks and relationships. They are one type of social media, that is, any platform where people can create, share, and exchange their activities, views, interests, experiences, or information.

27 Related Work: Social Network Sites
Social media have become part of our daily life: it is hard not to notice people focused on their mobile devices, using Facebook or LINE wherever they are. User profiles, friends, and comments are the three key components of social network sites.

28 Related Work: Social Network Sites
The number of social network users has been growing dramatically. There are approximately 800 million users on Facebook; some even call it the “Facebook country.” Social recommendation uses a user's social network and related information to make recommendations.

29 Related Work: Social Network Sites
The number of social network users has been growing dramatically. There are approximately 800 million users on Facebook; some even call it the “Facebook country.” A common technique for social recommendation is collaborative filtering. It is based on two assumptions: people who are socially connected are more likely to share common interests, and users are easily influenced by the friends they trust.
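The second assumption (users are influenced by friends they trust) is often put into practice as a trust-weighted average of friends' ratings. The sketch below is a generic version of that idea, not the presentation's own equations; the trust values, friend names, and the choice to give untrusted friends zero weight are assumptions.

```python
from typing import Dict, Optional


def predict_from_friends(trust: Dict[str, float],
                         friend_ratings: Dict[str, float]) -> Optional[float]:
    """Trust-weighted average of friends' ratings for one attraction.

    trust: how much the target user trusts each friend (assumed to be in [0, 1]).
    friend_ratings: ratings that friends gave this attraction.
    Friends without a trust value get weight 0; returns None if no weight remains.
    """
    weighted_sum = 0.0
    total_trust = 0.0
    for friend, rating in friend_ratings.items():
        weight = trust.get(friend, 0.0)
        weighted_sum += weight * rating
        total_trust += weight
    return weighted_sum / total_trust if total_trust > 0 else None


# Hypothetical example: the user trusts Amy much more than Ben.
trust = {"amy": 0.9, "ben": 0.3, "cal": 0.6}
ratings = {"amy": 8.0, "ben": 4.0}
print(predict_from_friends(trust, ratings))  # approximately 7.0
```

The prediction (about 7.0) sits much closer to Amy's 8 than to Ben's 4, reflecting the higher trust placed in her.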

30 Related Work: Tourist Information Systems
A tourist information system provides travel guides, maps, and information on accommodation and transportation. A system that can recommend tourist attractions would be very helpful to any tourist.

31 System Design and Mechanism
Our design aims to build a tourist information system that lets users access attraction information from an information kiosk. The system is linked to Facebook. Whenever an interface device is equipped with an RFID reader, users can log into the system with an RFID card instead of typing their account name and password.

32 System Design and Mechanism
The interactions between users and the attraction information website are shown as follows.

33 System Design and Mechanism
The system operation is shown as follows:

34 System Design and Mechanism: System Operation
1. A Facebook App interface is available to users.
2. Users can use the Facebook App with their Facebook accounts to share, like, comment on, and rate attractions. First-time users need to choose their interests and can register their RFID cards.
3. A Web server and a database management system (DBMS) run on a server machine.
4. Users who have registered an RFID card can log directly into the system with it.
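Steps 2 and 4 amount to keeping a mapping from card identifiers to accounts. The following Python sketch models only that mapping; the actual system was implemented in PHP with the Facebook SDK, and the class name, method names, and the case-insensitive handling of card UIDs here are hypothetical. A real deployment would also need to restore the user's Facebook session, which this sketch omits.

```python
from typing import Dict, Optional


class CardRegistry:
    """Hypothetical in-memory map from RFID card UIDs to Facebook user IDs."""

    def __init__(self) -> None:
        self._owner_by_card: Dict[str, str] = {}

    @staticmethod
    def _normalize(card_uid: str) -> str:
        # Assumption: UIDs are hex strings whose letter case may vary between readers.
        return card_uid.strip().upper()

    def register(self, card_uid: str, facebook_user_id: str) -> None:
        """Step 2: an already logged-in user binds a card to their account."""
        self._owner_by_card[self._normalize(card_uid)] = facebook_user_id

    def login(self, card_uid: str) -> Optional[str]:
        """Step 4: a card tap resolves to an account, or None if the card is unknown."""
        return self._owner_by_card.get(self._normalize(card_uid))


registry = CardRegistry()
registry.register("04a1b2c3", "fb_user_1001")
print(registry.login("04A1B2C3"))  # fb_user_1001
print(registry.login("deadbeef"))  # None
```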

35 Personalized Social Recommendation (PSR)
1. Acquire users' appraisals of each attraction and their activities on the social network site.
2. Use collaborative filtering, or track activities on the social network site, for recommendation.
3. Calculate the score for each attraction.
4. Rank the attractions by score.
   a. If two scores are the same, compare the appraisal times: the more recent evaluation ranks higher.
5. Recommend the attractions with the top 3 scores and show the top-ranked attraction on the main page.
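Steps 4 and 5 of PSR can be sketched as a single sort with a compound key: score first, then appraisal time, both descending, so that ties go to the more recent evaluation. The `ScoredAttraction` record, its field names, and the sample scores and dates are assumptions made for illustration; how the score itself is computed (step 3) is left to the recommendation methods.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class ScoredAttraction:
    name: str
    score: float            # output of step 3 (CF, social, or combined score)
    appraised_at: datetime  # assumed: time of the attraction's latest appraisal


def rank_attractions(items: List[ScoredAttraction],
                     top_n: int = 3) -> Tuple[List[str], str]:
    """Steps 4-5: rank by score, break ties by more recent appraisal,
    and return the top-N names plus the single attraction for the main page."""
    if not items:
        raise ValueError("no attractions to rank")
    ranked = sorted(items, key=lambda a: (a.score, a.appraised_at), reverse=True)
    top = [a.name for a in ranked[:top_n]]
    return top, top[0]


# Hypothetical scores: A1 and A3 tie, and A3 was appraised more recently.
candidates = [
    ScoredAttraction("A1", 0.8, datetime(2014, 3, 1)),
    ScoredAttraction("A2", 0.9, datetime(2014, 2, 1)),
    ScoredAttraction("A3", 0.8, datetime(2014, 3, 10)),
    ScoredAttraction("A4", 0.5, datetime(2014, 3, 20)),
]
top3, main_page = rank_attractions(candidates)
print(top3, main_page)  # ['A2', 'A3', 'A1'] A2
```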

36 List of Recommendation Methods
No. | Recommendation method
1 | Collaborative filtering based on users' appraisals and trustability evaluations: Equations (1), (2), (3)
2 | Social recommendation based on users' activities on social network sites: Equations (4), (5)
3 | Combination of 1 and 2: Equation (6)

37 Attraction appraisals of different users
Columns: User 0, User 1, User 2, User 3, User 4, User 5
A1: 1084 7
A2: 424226
A3: 84688
A4: 43 855
A5: 528 4

38 Recommendation Methods: Collaborative Filtering

39 Recommendation Methods: Collaborative Filtering

40 Recommendation Methods: Collaborative Filtering
Trustability of users

Attraction | User 1 | User 2 | User 3 | User 4 | User 5
A1 | 0.922 | 0.132 | 0.420 | 0.559 | 0.410
A2 | 0.523 | 0.723 | 0.532 | 0.523 | 0.273
A3 | 0.198 | 0.523 | 0.730 | 0.723 | 0.273
A4 | 0.252 | 0.132 | 0.359 | -- | 0.667
A5 | 0.112 | 0.482 | 0.191 | 0.296 | 0.182
Average | 0.401 | 0.398 | 0.446 | 0.555 | 0.361

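41 Recommendation Methods: Collaborative Filtering
Similarity matrix among users (row i, column j; each entry equals 1 - 0.1|i - j|)

i \ j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
1 | 1 | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 | 0.4 | 0.3 | 0.2 | 0.1
2 | 0.9 | 1 | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 | 0.4 | 0.3 | 0.2
3 | 0.8 | 0.9 | 1 | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 | 0.4 | 0.3
4 | 0.7 | 0.8 | 0.9 | 1 | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 | 0.4
5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 | 0.9 | 0.8 | 0.7 | 0.6 | 0.5
6 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 | 0.9 | 0.8 | 0.7 | 0.6
7 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 | 0.9 | 0.8 | 0.7
8 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 | 0.9 | 0.8
9 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 | 0.9
10 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1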

42 Recommendation Methods: Collaborative Filtering

43 Recommendation Methods: Collaborative Filtering
Average similarity between User 0 and other users

User 1 | User 2 | User 3 | User 4 | User 5
0.760 | 0.660 | 0.780 | 0.860 | 0.800

44 Recommendation Methods: Collaborative Filtering

j | User 1 | User 2 | User 3 | User 4 | User 5
R_{0,j} (User 0) | 0.305 | 0.263 | 0.348 | 0.477 | 0.289
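The R_{0,j} values on slide 44 match, to three decimals, the product of User 0's average similarity to user j (slide 43) and the last row of the trustability table on slide 40; for example, 0.760 × 0.401 ≈ 0.305. The Python check below reproduces all five values under that assumption. Equation (3) itself is not shown in this transcript, so treat the product form as an inference from the numbers rather than the authors' stated formula.

```python
# Transcribed from the slides: User 0's average similarity to each user j (slide 43)
# and each user's value in the last row of the trustability table (slide 40).
avg_similarity = {1: 0.760, 2: 0.660, 3: 0.780, 4: 0.860, 5: 0.800}
avg_trust = {1: 0.401, 2: 0.398, 3: 0.446, 4: 0.555, 5: 0.361}

# Inferred (not stated in the transcript): R_{0,j} = similarity x trustability.
weights = {j: round(avg_similarity[j] * avg_trust[j], 3) for j in avg_similarity}
print(weights)  # {1: 0.305, 2: 0.263, 3: 0.348, 4: 0.477, 5: 0.289}

# The user with the largest R_{0,j} would carry the most weight for User 0.
most_influential = max(weights, key=weights.get)
print(most_influential)  # 4
```

Under this reading, User 4 combines the highest similarity to User 0 with the highest trustability, so it would be User 0's most influential neighbor.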

45 Recommendation Methods: Social Network Activities

46 Recommendation Methods: CF plus Social Recommendation
We then combine Equation (3) for collaborative filtering and Equation (4) for social recommendation into Equation (6), giving each method a weight of 0.5:
T = 0.5R + 0.5P    (6)
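Equation (6) is a weighted sum per attraction. The Python sketch below generalizes it with a weight parameter `w` (w = 0.5 reproduces Equation (6)). It assumes R and P are already on comparable scales, for example both normalized to [0, 1], and treats an attraction missing from one method as scoring 0 there; both are assumptions, since the transcript does not say how the authors handle them. The sample scores are hypothetical.

```python
from typing import Dict


def combined_scores(cf: Dict[str, float], social: Dict[str, float],
                    w: float = 0.5) -> Dict[str, float]:
    """T = w*R + (1 - w)*P for each attraction; w = 0.5 gives Equation (6).

    Assumes R (cf) and P (social) share a scale; an attraction missing from
    one method contributes 0 for that method.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    names = sorted(set(cf) | set(social))
    return {a: w * cf.get(a, 0.0) + (1.0 - w) * social.get(a, 0.0) for a in names}


# Hypothetical normalized scores for three attractions.
R = {"A1": 0.8, "A2": 0.4}
P = {"A1": 0.2, "A3": 0.6}
T = combined_scores(R, P)
print(max(T, key=T.get))  # A1
```

Raising `w` shifts weight toward collaborative filtering, which may help when social network activity is low.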

47 System Implementation and Experiments
Development environment

Hardware
CPU | Intel Core i5-560M, 2.67 GHz
Memory | 4 GB DDR3
Network interface card | Atheros AR8131 PCI-E Gigabit Ethernet Controller

Software
OS | Windows 7 Enterprise
Development | Zend Studio 8.0.1, Apache
Programming language | PHP
SDK | Facebook SDK for PHP
Database | PostgreSQL 9.0

48 System architecture

49 Map of Web pages

50 Main page

51 Page of an attraction

52 Experimental Results
Our experiments contain 1,360 records in total, based on 20 attractions and 68 participants (68 × 20 = 1,360). We test the three methods. A better method should include more attractions without being affected by low activity on social network sites.

53 Experimental Results: Collaborative Filtering

54 Experimental Results: Social Recommendation

55 Experimental Results: Collaborative Filtering plus Social Recommendation

56 Conclusions and Future Work
In this paper we present a recommendation mechanism that takes users' social networks into consideration. The system has three key features:
1. Social recommendation is integrated into the system.
2. Personalization is taken into account.
3. The system is practical, cost-effective, and expandable.

57 Conclusions and Future Work
To enhance the recommendation mechanism, further factors that could affect recommendations should be investigated. Another issue is to develop more methods for mining social network sites. In the future, our design can be applied to other systems, such as a museum information system.

58 Thank you for listening! Q & A

