Tech Evangelist, Microsoft
Responsible for Azure evangelism in Denmark
Background in economics and statistics (Aarhus University)
Twitter: @sebastianbk
Blog: https://sebastianbrandes.com
What is Big Data (according to Microsoft)?
Hadoop and the Hadoop ecosystem: some stats
Introduction to Microsoft Azure and HDInsight
Provisioning a Hadoop cluster in Azure
Installing R on the cluster
Running MapReduce jobs using R
Azure Machine Learning + R
Wrapping up
Storage: $100 buys 3 million times more storage than it did 30 years ago
Compute: 10 MIPS per dollar in 1980; 10 million MIPS per dollar in 2005
>5.5 billion (70+% of the global population)
>2 billion users
Web traffic: 130 exabytes (10^18 bytes) in 2010; 1.6 zettabytes (10^21 bytes) in 2015
>10 billion
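The growth figures above can be restated as implied annual growth rates and doubling times using compound growth, (1 + r)^t = F, where F is the overall growth factor over t years. The following is a minimal Python sketch; the helper names `annual_growth` and `doubling_time` are hypothetical, and the inputs are simply the slide's own numbers.

```python
import math


def annual_growth(factor: float, years: float) -> float:
    """Compound annual growth rate r such that (1 + r) ** years == factor."""
    return factor ** (1.0 / years) - 1.0


def doubling_time(factor: float, years: float) -> float:
    """Years needed to double, given that `factor` growth took `years` years."""
    return years * math.log(2) / math.log(factor)


if __name__ == "__main__":
    # Figures taken from the slide: (overall growth factor, number of years)
    cases = {
        "storage per $100 (x3,000,000 over 30 years)": (3e6, 30),
        "MIPS per $ (10 -> 10M, 1980-2005)": (10e6 / 10, 2005 - 1980),
        "web traffic (130 EB -> 1.6 ZB, 2010-2015)": (1600 / 130, 2015 - 2010),
    }
    for label, (factor, years) in cases.items():
        print(f"{label}: {annual_growth(factor, years):.1%} per year, "
              f"doubling every {doubling_time(factor, years):.2f} years")
```

For storage, this works out to roughly 64% growth per year, or a doubling about every 1.4 years.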
“Big data is a collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization.” – Wikipedia
[Figure: data sources plotted by volume (gigabytes, 10^9 bytes; terabytes, 10^12; petabytes, 10^15; exabytes, 10^18) against velocity and variety. ERP / CRM: sales pipeline, payables, payroll, inventory, contacts, deal tracking. Modern Web: mobile, advertising, collaboration, eCommerce, digital marketing, search marketing, web logs, recommendations. Internet of Things: audio/video, log files, text/image, social sentiment, data market feeds, eGov feeds, weather, wikis/blogs, click stream, sensors/RFID/devices, spatial and GPS coordinates.]
How do I optimize my services based on patterns of weather, traffic, etc.? What’s the social sentiment of my product? How do I better predict future outcomes?
"The Large Hadron Collider (LHC) is the world's largest and most powerful particle collider, and the largest single machine in the world, built by the European Organization for Nuclear Research (CERN) from 1998 to 2008." – Wikipedia
Integrates the R statistical environment with the Hadoop Distributed File System (HDFS) and the MapReduce computation engine
Moves algorithm execution closer to the data
Provides access to a large number of high-quality statistical libraries
Speeds up work by processing in parallel
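To make the MapReduce model concrete, here is a minimal local Python sketch of the map, shuffle and reduce phases. It computes a per-key mean over hypothetical "station,temperature" records that are split into blocks. This is not RHadoop itself: with RHadoop's rmr2 package, the map and reduce steps would be R functions passed to `mapreduce()` and run by Hadoop on the nodes that hold the data. In this sketch, a thread pool only stands in for parallel map tasks, and all function names and sample records are illustrative assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, float]


def parse_map(split: List[str]) -> List[Pair]:
    """Map phase: turn each 'key,value' record in one input split into a pair."""
    pairs = []
    for record in split:
        key, value = record.split(",")
        pairs.append((key.strip(), float(value)))
    return pairs


def mean_reduce(key: str, values: List[float]) -> Pair:
    """Reduce phase: combine every value emitted for one key into its mean."""
    return key, sum(values) / len(values)


def run_mapreduce(
    splits: List[List[str]],
    mapper: Callable[[List[str]], List[Pair]],
    reducer: Callable[[str, List[float]], Pair],
    workers: int = 4,
) -> Dict[str, float]:
    # Map: each split is processed independently (threads stand in for
    # map tasks running on the nodes that store each data block).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, splits))

    # Shuffle: group all intermediate values by key.
    groups: Dict[str, List[float]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce: one reducer call per key, in key order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    # Hypothetical sample data: two "blocks" of station,temperature records.
    splits = [
        ["aarhus,10.0", "copenhagen,14.0"],
        ["aarhus,12.0", "copenhagen,16.0", "odense,11.0"],
    ]
    print(run_mapreduce(splits, parse_map, mean_reduce))
```

Each split is mapped without looking at any other split, so the map phase parallelizes naturally. Only the shuffle has to bring values for the same key together before they are reduced.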