From RDBMS to Hadoop A case study Mihaly Berekmeri School of Computer Science University of Manchester Data Science Club, 14th July 2016 Hayden Clark,

1 From RDBMS to Hadoop: A case study
Mihaly Berekmeri, School of Computer Science, University of Manchester
Data Science Club, 14th July 2016
Hayden Clark, Robert Wakeling, Xiao-Jun Zeng, John Keane
Project supported by EPSRC IAA

2 Introduction
The context:
– Wadaro Limited, a leading provider of quality of experience (QoE) monitoring and performance analysis solutions for mobile networks.
The challenge:
– provide real-time QoE monitoring of many millions of mobile network customers.
The aim:
– develop big data analytics based on the existing bottom-up dynamic hierarchical structure
– real-time performance analysis and prediction for any size of mobile network.

3 Use case: Wadaro
Wadaro Limited
– SIM-based QoE (SIM applet)
– gathers Key Performance Indicators (KPIs) from subscribers
– measures service performance from the user perspective
Usage scenarios
– network coverage benchmark
– importance of cells based on coverage
– benchmark customer experience of devices

4 Current Setup & Challenges
– Current setup:
    Linux LAMP (MySQL)
    In-house and hosted
– Challenges:
    Growing database
    Sluggish query performance
    Overnight aggregations
    Added complexity
    Predicted exponential increase in incoming data
    Little capacity to run data analytics whilst maintaining operational support
New approach needed for data ingestion and analytics!

5 New architecture expectations
– Scalable data ingestion:
    Capable of handling millions of devices
– Scalable, cheap storage:
    Terabytes of data
– Scalable, fast analytics engine:
    SQL compatibility
    Avoid overnight aggregations

7 Scalable data ingest: Flume
Ingesting data into Hadoop
Advantages
– distributed service for getting data into the cluster
– captures and processes data asynchronously
– acts as a mediator between data producers and the data store
Cons
– the complexity of writing custom agents
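The mediator role described above can be illustrated with a minimal sketch. It is not Flume's API: it is plain Python using a bounded in-memory queue as the buffer between a producer and a store, with hypothetical names (`run_pipeline`, `channel`, `store`) and made-up event strings. It only shows the general idea of decoupling producers from a slower store and writing in batches.

```python
import queue
import threading


def run_pipeline(events, batch_size=3):
    """Move events from a producer to a store through a bounded buffer."""
    channel = queue.Queue(maxsize=100)  # buffer between producer and store
    store = []                          # stand-in for the data store
    stop = object()                     # sentinel marking the end of input

    def sink():
        # Drain the buffer and write to the store in batches.
        batch = []
        while True:
            item = channel.get()
            if item is stop:
                break
            batch.append(item)
            if len(batch) == batch_size:
                store.append(list(batch))  # one batched write
                batch.clear()
        if batch:
            store.append(list(batch))      # flush the remainder

    worker = threading.Thread(target=sink)
    worker.start()
    for event in events:
        # put() returns once the event is buffered; it blocks only
        # if the buffer is full.
        channel.put(event)
    channel.put(stop)
    worker.join()
    return store


batches = run_pipeline([f"kpi-{i}" for i in range(7)])
print(batches)
```

The producer never waits for the store itself, only for space in the buffer, which is the property that lets ingestion absorb bursts from many devices.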

8 Scalable data ingest: Flume [figure]

9 Scalable data analytics: IBM BigSQL
Massively parallel processing engine
Advantages
– fully ANSI SQL compliant (DB2)
– audited TPC benchmarks show Big SQL to be faster than competitors Hive and Impala
– enterprise ready
– compatible with BI tools such as Tableau
Cons
– specific to IBM

10 Scalable data analytics: IBM BigSQL [figure]

11 Scalable data analytics: IBM BigSQL
Technical issues encountered
– issues with MySQL compatibility
    DB2-specific commands
– lacking documentation
    few examples, low community support
– no update/delete support
– complex architecture
    new skills and expertise required
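The slides do not say how the missing update/delete support was handled. One common pattern for append-only stores, shown here purely as an assumption, is to insert a new version of a row and read back only the latest version. The sketch uses Python's built-in `sqlite3` as a generic SQL engine, not Big SQL, and the table and column names (`device_status`, `version`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE device_status (device_id TEXT, status TEXT, version INTEGER)"
)


def record_status(device_id, status):
    """Append a new version of the row instead of issuing an UPDATE."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM device_status WHERE device_id = ?",
        (device_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO device_status VALUES (?, ?, ?)",
        (device_id, status, latest + 1),
    )


def current_status():
    """Return the most recent status per device."""
    rows = conn.execute(
        """
        SELECT s.device_id, s.status
        FROM device_status AS s
        JOIN (SELECT device_id, MAX(version) AS version
              FROM device_status
              GROUP BY device_id) AS latest
          ON s.device_id = latest.device_id AND s.version = latest.version
        ORDER BY s.device_id
        """
    ).fetchall()
    return dict(rows)


record_status("sim-1", "active")
record_status("sim-2", "active")
record_status("sim-1", "roaming")  # the "update": a newer version is appended
print(current_status())
```

The trade-off is that old versions accumulate, so reads must always filter to the latest one and storage grows until old versions are compacted.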

12 Scalable data analytics: IBM BigSQL
Key development process insights:
Query type, database analysis:
– data partitioning granularity (for example by day, month, year) improves query response times and efficiency
– storage format (Parquet, Avro, etc.)
– SQL or NoSQL approach?
    Table scans, aggregations -> columnar, Hive
    Frequent, small data lookups -> columnar, HBase
    What if you need to do both?
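Why partitioning granularity matters can be seen in a small sketch. It is plain Python, not Hive or Big SQL, and the records, cell names and signal values are invented for illustration. Rows are grouped into per-day partitions, and a date-range query reads only the partitions that fall inside the range.

```python
from collections import defaultdict
from datetime import date

# Hypothetical KPI records: (measurement day, cell id, signal strength in dBm)
records = [
    (date(2016, 7, 12), "cell-A", -71),
    (date(2016, 7, 12), "cell-B", -95),
    (date(2016, 7, 13), "cell-A", -68),
    (date(2016, 7, 14), "cell-A", -80),
    (date(2016, 7, 14), "cell-B", -90),
]


def partition_by_day(rows):
    """Group rows into one partition per measurement day."""
    parts = defaultdict(list)
    for day, cell, value in rows:
        parts[day].append((cell, value))
    return parts


def mean_signal(parts, first_day, last_day):
    """Average signal over a date range, reading only partitions in range.

    Assumes at least one partition falls inside the range.
    """
    touched = [d for d in parts if first_day <= d <= last_day]
    values = [v for d in touched for _, v in parts[d]]
    return sum(values) / len(values), len(touched)


partitions = partition_by_day(records)
avg, scanned = mean_signal(partitions, date(2016, 7, 14), date(2016, 7, 14))
print(avg, scanned, len(partitions))
```

A single-day query here touches one partition out of three. Partitions that are too fine produce many tiny files, while partitions that are too coarse force large scans, which is why the granularity has to match the typical query.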

13 Results
– Experimental setup:
    70 GB, 152 million rows
    1 hosted MySQL instance on a top-end server
    3 in-house cheap desktops
– Key results:
    Best case: 45-minute queries -> 10 seconds
    Worst case: 3-hour queries -> 5 minutes
    Improvement of roughly 1.5 to 2.5 orders of magnitude (36x in the worst case, 270x in the best case)
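The two headline timings can be turned into speed-up factors directly. The sketch below uses only the numbers quoted on the slide and takes them at face value; the variable names are illustrative.

```python
import math

# Timings quoted on the slide, converted to seconds: (before, after)
cases = {
    "best": (45 * 60, 10),
    "worst": (3 * 60 * 60, 5 * 60),
}

speedups = {}
for name, (before_s, after_s) in cases.items():
    factor = before_s / after_s
    speedups[name] = factor
    print(f"{name}: {factor:.0f}x faster, "
          f"{math.log10(factor):.1f} orders of magnitude")
```

This gives 270x for the best case and 36x for the worst case, that is, about 2.4 and 1.6 orders of magnitude respectively.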

14 Conclusions
Is the switch from RDBMS to Hadoop worth it?
– Significant cost reduction
– Scalable data ingest
– Scalable, cheap storage
– Scalable data analytics
– Opens up data for future analysis
Commercially, it offers the possibility of expansion into massive markets.
Prepared for business evolution.

15 Future work
– Stream analysis
– Model the time-varying and interacting relationships between raw streaming data and various network performance indicators
– Online learning algorithms
– Feed the knowledge back into the network
– ML: classification, regression, clustering

16 Thank you for your attention! Questions?

