DBSI Teaser presentation The Beckman Report On Database Research

Presentation transcript:

DBSI Teaser Presentation: The Beckman Report on Database Research Presented by: Akshita Anand (2012015), Priyanka Balotra (MT14018), Sakshi Agarwal (MT14043). In October 2013, 30 leaders of the database community (28 database researchers plus two invited speakers) met at the Beckman Center at the University of California, Irvine, to discuss big data as "a defining challenge of our time." Hello everyone; we are going to present "The Beckman Report on Database Research." Well-known researchers such as Rakesh Agrawal, Daniel Abadi, Raghu Ramakrishnan, and many others took part in this meeting.

Content Characteristics of Big Data Research Challenges Community Challenges Conclusion This will be our roadmap for the presentation.

Characteristics Of Big Data Big data has been identified as a defining challenge for the database field. Big data is a broad term for data sets so large and complex that traditional data-processing applications are inadequate. Its characteristics are commonly summarized as the three Vs: Volume, Velocity, and Variety. Volume refers to the sheer amount of data; velocity to how fast data arrives and how quickly operations must be performed on it; and variety to the fact that big data is collected from many different sources and so comes in many forms.

Research Challenges 1. Scalable big/fast data infrastructures 2. Coping with diversity in data management 3. End-to-end processing of data At the end of that meeting, the researchers identified these as the main challenges for big data, alongside cloud services and the roles of people in the data life cycle. We will go through these challenges in this presentation.

Challenge #1: Scalable Big Data Infrastructure Talking about the first challenge, scalable big data infrastructures, let's look at it in more detail. Parallel and distributed processing: the database world has seen great success with parallel processing, data warehousing, and higher-level languages that make data processing easier, as in Hadoop. Still, we cannot ignore that more powerful cost-aware query processors and optimizers are needed to fully exploit large clusters, and for that we require new hardware. For example, fields like graphics processing and integrated-circuit design produce very large data volumes, so processing them requires more heterogeneous environments, with specialized processors where needed. At the same time we cannot overlook cost-efficient storage: both server-attached and network-attached storage architectures need to be considered, HDFS being one example. Even if we satisfy all these requirements, data will keep arriving at ever higher speeds, so we also need algorithms that can process such streams of data. Another important challenge is late-bound schemas: we need query engines that can run efficiently over raw files that are processed only once (since they are processed only once, storing and indexing them is costly, so they are often kept as plain binary files). For consistency, many systems such as NoSQL stores have been developed, but most of them provide only basic data access with weak atomicity and isolation guarantees, so programming models for data consistency need to be revisited. Lastly, scalability should be measured not only in petabytes of data and queries per second, but also in total cost of ownership, end-to-end processing speed, and usability. To measure progress against such broader metrics, new types of benchmarks will be required.
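The data-parallel paradigm behind Hadoop can be illustrated with a minimal, single-machine sketch of MapReduce word counting in plain Python (the document names Hadoop but shows no code, so this is an illustrative assumption, not the report's own example). Systems like Hadoop distribute exactly these map, shuffle, and reduce phases across a cluster:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one document.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group intermediate pairs by key (the word).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big clusters", "data parallel processing"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = reduce_phase(shuffle(pairs))
```

In a real cluster the map calls run on different machines over different file blocks, and the shuffle moves data over the network; the logic per phase stays this simple, which is why the model scales so well.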

Challenge #2: Diversity in Data Management No one size fits all. Cross-platform integration: integration of platforms, hiding heterogeneity, and optimization of performance. Programming models: diversity in programming abstractions and reusability, the need for more than one language, and a focus on domain-specific languages. Data processing workflows need platforms that can span both "raw" and "cooked" data, for example querying data with SQL and then analyzing it with R. 1. Platforms need to be integrated or federated to enable data analysts to combine and analyze data across systems. This involves hiding the heterogeneity of data formats and access languages, and also optimizing the performance of accesses and data flows in big data systems. Disconnected devices raise further challenges for reliable data ingestion, query processing, and data consistency in such sometimes-connected, wide-area environments. 2. Diverse programming abstractions are needed to operate on very large datasets, along with reusable middle-layer components. A single data-analysis language does not meet everyone's needs: users should be free to analyze their data in the language they are comfortable with, be it R, Python, or SQL. We also need tools that simplify the implementation of new scalable, data-parallel languages.
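The "query with SQL, then analyze in another language" workflow mentioned above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 and statistics modules in place of a warehouse and R; the table and values are invented for the example:

```python
import sqlite3
import statistics

# Step 1: use SQL for what it is good at -- declarative selection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("north", 80.0), ("south", 200.0)])
rows = conn.execute(
    "SELECT amount FROM sales WHERE region = 'north'").fetchall()

# Step 2: hand the result set to a general-purpose analytics library.
amounts = [amount for (amount,) in rows]
mean_north = statistics.mean(amounts)
```

The seam between the two steps, converting a SQL result set into the analysis language's native data structures, is exactly where the heterogeneity the slide talks about has to be hidden and optimized.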

Challenge #3: End-to-End Processing of Data Data-to-knowledge pipeline: the steps of the raw-data-to-knowledge pipeline include data acquisition; selection, assessment, cleaning, and transformation; and extraction and integration, all in the face of a greater diversity of data and users. Tool diversity: multiple tools are needed to cover each step of the raw-data-to-knowledge pipeline. Tool customizability: tools should exploit domain knowledge, such as dictionaries, knowledge bases, and rules, and be easy to customize to a new domain. Hand-crafted rules are needed alongside machine learning, for example in precision-sensitive applications like e-commerce. Throughout the pipeline we must also take human feedback into account, and tools must be usable by subject-matter experts, not just by IT professionals. For example, a journalist may want to clean, map, and publish data from a spreadsheet file of crime statistics. Tools must also be tailored to data scientists, the new class of data-analysis professionals that has emerged. Data comes in a wide variety of formats, structured and unstructured, yet we need to use them together in a structured fashion, seamlessly integrated and easy to use for both lay and expert users. We will need machine learning as well as hand-crafted rules here: hand-crafted rules give us the precision needed in sensitive systems and let us cover the corner cases in such applications, so our tools should support this as well.
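A few pipeline steps named above (selection, assessment, cleaning, transformation) can be made concrete with a small stdlib-only sketch. The crime-statistics data below is hypothetical, echoing the journalist example, and the cleaning rules are simple hand-crafted ones of the kind the slide argues must complement machine learning:

```python
import csv
import io

# Hypothetical raw spreadsheet export: stray whitespace, blank rows,
# inconsistent casing, and an unparsable count.
raw = """city,offence,count
 New York , burglary ,  312
,,
chicago,Burglary,287
Boston,theft,n/a
"""

def clean(record):
    # Hand-crafted cleaning rule: trim surrounding whitespace.
    return {k: v.strip() for k, v in record.items()}

rows = []
for rec in csv.DictReader(io.StringIO(raw)):
    rec = clean(rec)
    if not rec["city"]:              # selection: drop empty rows
        continue
    if not rec["count"].isdigit():   # assessment: skip unparsable values
        continue
    rows.append({"city": rec["city"].title(),     # transformation:
                 "offence": rec["offence"].lower(),  # normalize case
                 "count": int(rec["count"])})        # and types

# rows now holds structured, typed records ready for integration.
```

Each `if` is a rule a subject-matter expert could read and adjust, which is the customizability the slide calls for; a production tool would additionally log the rejected rows for human feedback instead of silently dropping them.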

Open source: there are few open-source tools today; most products are expensive and proprietary. Open-source tools are needed for data capture, data processing, analysis, and the generation of outputs. Understanding data: this requires capturing and managing appropriate meta-information. For example, Facebook automatically identifies faces in images so that users can optionally tag them. Knowledge bases: the more knowledge tools have about a target domain, the better they can analyze that domain. Understanding data also requires support for filtering, summarization, visualization, and so on.

Community Challenges Some of these challenges are new and some are old, but big data makes them increasingly important: database education and data science. Conclusion Database research has long been shaped by the rigors of the enterprise and of relational database systems. It must now handle data diversity and exploit new hardware, software, and cloud-based platforms. It is also time to rethink our approaches to education, our involvement with data consumers, and our value system and its impact on how we evaluate research. This is an exciting time for database research: the rise of big data and the vision of a data-driven world present many exciting new research challenges.

THANK YOU