Clearstorydata.com Using Spark and Shark for Fast Cycle Analysis on Diverse Data 12.2.13 Vaibhav Nivargi.

1 clearstorydata.com Using Spark and Shark for Fast Cycle Analysis on Diverse Data 12.2.13 Vaibhav Nivargi

2 About ClearStory Data

3 Analysis in the New Data Landscape
New use cases seen in all industries.
- Live situational analysis requiring fast-cycle analysis across internal data and sources of external data
- Multi-source analysis with data refreshing on new insights, as data from sources evolves
- Large-scale analysis of structured and unstructured data combined in integrated insights

4 Example: Interactive Multi-source Analysis
More data and more people change the analysis.
- Facebook: Shares, Likes, Comments
- News Coverage: Online, Print, Television
- Twitter: Followers, Tweets, Retweets
- Donations: New Members, Donations
- Website Traffic: Traffic, Referrals, Content
- Data Intelligence: Interactive analysis on diverse internal & external data
- Corporate Sponsors: Corporate Engagement, New Inquiries

5 Today’s Need is Speed, Scale & Ad Hoc Flexibility
With more sources, more data and more people.

6 Why Spark and Shark?
- RDDs
  - Low latency & scale
  - Iterative and interactive computation
- Lineage and fault tolerance
  - Able to re-derive data
- Expressive power of Scala and SQL
  - Operations beyond aggregations, joins, and statistical operators
  - Advanced: ML, data mining, segmentation, approximate queries, graphs …
- Support for structured and semi-structured data
- BDAS Stack & AMPLab
  - Tachyon, MLBase, BlinkDB, GraphX …
- Community and adoption
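The "lineage and fault tolerance" point on this slide can be illustrated with a small sketch. This is a toy model in plain Python, not Spark's API: the class `ToyDataset` and its methods are hypothetical names made up for illustration. It only shows the idea that a dataset which remembers how it was derived can be rebuilt after its cached copy is lost.

```python
from typing import Callable, Iterable, List, Optional


class ToyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark code).

    It records its parent and the transformation applied to it, so its
    contents can be re-derived on demand instead of being stored durably.
    """

    def __init__(self,
                 source: Optional[Callable[[], Iterable]] = None,
                 parent: Optional["ToyDataset"] = None,
                 transform: Optional[Callable[[Iterable], Iterable]] = None):
        self._source = source          # only set on a root dataset
        self._parent = parent          # lineage: where the data came from
        self._transform = transform    # lineage: how it was derived
        self._cache: Optional[List] = None
        self.computations = 0          # how many times this node was computed

    def map(self, fn: Callable) -> "ToyDataset":
        return ToyDataset(parent=self,
                          transform=lambda rows: (fn(r) for r in rows))

    def filter(self, pred: Callable) -> "ToyDataset":
        return ToyDataset(parent=self,
                          transform=lambda rows: (r for r in rows if pred(r)))

    def cache(self) -> "ToyDataset":
        # Materialize once and keep the result in memory.
        self._cache = self._compute()
        return self

    def drop_cache(self) -> None:
        # Simulates losing the in-memory copy (for example, a failed node).
        self._cache = None

    def _compute(self) -> List:
        self.computations += 1
        if self._parent is None:
            return list(self._source())
        return list(self._transform(self._parent.collect()))

    def collect(self) -> List:
        if self._cache is not None:
            return list(self._cache)
        return self._compute()         # re-derive from lineage


base = ToyDataset(source=lambda: range(10))
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()

first = evens_squared.collect()    # served from the cached copy
evens_squared.drop_cache()         # the cached copy is lost
second = evens_squared.collect()   # rebuilt by replaying the lineage
```

Both `collect()` calls return the same rows. The second one replays the filter and map steps because the lineage survived even though the cached data did not.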

7 The ClearStory Solution
[Diagram] Columns: Data Sources, ClearStory Platform, ClearStory Application
[Diagram] Components: Data Inference & Profiling; Harmonization; Visualization; Collaboration; In-Memory Data Units

8 Where do Spark & Shark fit?
[Diagram] Labels: Public; Premium; Web; RDBMS; Hadoop; ClearStory API; User Application; Data Access, Inference and Lineage; Data Source API; Files; Spark Cluster + ClearStory IP; Harmonization Engine and Blended Data Processing

9 How we leverage Spark & Shark
- User intent captured and translated to custom API
- Harmonization-as-a-Service
- Manages Spark and Shark query execution
- Read cached data from HDFS
- RESTful
- Merges datasets (RDDs) on the fly, on user request
- Support conversion of user actions to backend queries
- Query optimizations
- Performance optimizations
- Mixed-mode execution (sql2rdd & Spark native)
- Caching
- Pre-computation
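As a rough illustration of "merges datasets on the fly", the sketch below joins two keyed datasets on a shared key. It is plain Python, not ClearStory's harmonization engine and not Spark or Shark code. The function name `merge_by_key`, the choice of a date string as the join key, and the sample rows are all assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (key, value); the key is assumed to be a date string


def merge_by_key(left: Iterable[Row],
                 right: Iterable[Row]) -> List[Tuple[str, int, int]]:
    """Inner-join two keyed datasets, summing values that share a key."""

    def totals(rows: Iterable[Row]) -> Dict[str, int]:
        acc: Dict[str, int] = defaultdict(int)
        for key, value in rows:
            acc[key] += value
        return acc

    left_totals, right_totals = totals(left), totals(right)
    # Keep only keys present on both sides, in sorted order.
    shared = sorted(left_totals.keys() & right_totals.keys())
    return [(k, left_totals[k], right_totals[k]) for k in shared]


# Made-up sample rows, for illustration only.
tweets = [("2013-12-01", 5), ("2013-12-01", 3), ("2013-12-02", 7)]
donations = [("2013-12-02", 100), ("2013-12-03", 50)]

merged = merge_by_key(tweets, donations)
```

Only `2013-12-02` appears in both inputs, so `merged` holds a single row that combines the tweet count and the donation total for that day.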

10 How we leverage Spark & Shark
- Query results returned to the application for scalable visualization and ClearStory-specific viz techniques
- RDDs cached/un-cached and materialized at strategic points based on usage patterns and signals
- Data updates automatically processed as source data changes
- ClearStory’s own deployment, packaging, and integrated monitoring for operations at scale
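The idea of caching "based on usage patterns" and reprocessing when source data changes can be sketched as a simple policy. The slides do not describe the actual signals or thresholds, so everything here is an assumption: `UsageAwareCache`, its `threshold` parameter, and the access-count rule are hypothetical and only show one possible shape of such a policy.

```python
from typing import Callable, Dict, List


class UsageAwareCache:
    """Materialize a result only after it has been requested often enough.

    Hypothetical policy for illustration; the real signals are not
    described in the slides.
    """

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.hits: Dict[str, int] = {}        # requests seen per dataset
        self.store: Dict[str, List] = {}      # materialized results
        self.recomputes: Dict[str, int] = {}  # times a result was computed

    def get(self, name: str, compute: Callable[[], List]) -> List:
        self.hits[name] = self.hits.get(name, 0) + 1
        if name in self.store:
            return self.store[name]
        result = compute()
        self.recomputes[name] = self.recomputes.get(name, 0) + 1
        if self.hits[name] >= self.threshold:
            self.store[name] = result         # popular enough: keep it
        return result

    def invalidate(self, name: str) -> None:
        # Called when the source data changes, so the next request recomputes.
        self.store.pop(name, None)
        self.hits.pop(name, None)


cache = UsageAwareCache(threshold=2)
for _ in range(4):
    rows = cache.get("daily_totals", lambda: [1, 2, 3])

computed_times = cache.recomputes["daily_totals"]
was_cached = "daily_totals" in cache.store
cache.invalidate("daily_totals")
still_cached = "daily_totals" in cache.store
```

With a threshold of 2, the first two requests compute the result and the last two are served from the store. Invalidation drops the stored copy so a later request recomputes it.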

11 Spark Developments: What We Like
- Query cancellation, progress indication (0.8.1 and beyond)
- More performance breakthroughs
- Workload Management
- BlinkDB
- MLBase
- Tachyon
- GraphX

12 We’re Hiring!
- Working with the community, giving back
- Lots of exciting new developments
- This is like the early days of Hadoop: massive momentum gathering
- The First Spark Summit!
- More Meet-ups!


