Discussion MySQL&Cassandra ZhangGang 2012/11/22. Optimize MySQL.

Discussion MySQL&Cassandra ZhangGang 2012/11/22

Optimize MySQL

Index in MySQL: the dump includes the indexes.
– type_*_job has no index.
– in_*_job has two indexes.
– key_*_* has three indexes.

Optimize MySQL
innodb_file_per_table [we can set it in my.cnf]
– Default: innodb_file_per_table = OFF. All tables and indexes are stored in one big file named ibdata1.
– With innodb_file_per_table = ON, each InnoDB table and its indexes are stored in their own file.
– Effect:

Query                                before   now
Diskspace group by ProcessingType    94s      80s
CPUTime group by Site (08-10)        86s      73s
CPUTime group by Site (10-12)        97s      81s
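To put the before/now timings in proportion, the short sketch below computes the share of run time saved for each query. The numbers are copied from the slide; the function and variable names are only illustrative.

```python
# Timings in seconds copied from the slide: (before, now).
timings = {
    "Diskspace group by ProcessingType": (94, 80),
    "CPUTime group by Site (08-10)": (86, 73),
    "CPUTime group by Site (10-12)": (97, 81),
}


def percent_saved(before, now):
    """Share of the original run time that was saved, in percent."""
    return (before - now) / before * 100


# One rounded percentage per query.
savings = {
    name: round(percent_saved(before, now), 1)
    for name, (before, now) in timings.items()
}

for name, pct in savings.items():
    print(f"{name}: {pct}% less time")
```

All three queries come out at roughly 15-16% less run time, so the gain looks fairly uniform across these queries rather than specific to one of them.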

Optimize MySQL
innodb_buffer_pool_size:
– From what I have read on the web, 70-80% of memory is a safe bet (that advice usually assumes a machine dedicated to the database).
– My computer has 2GB of memory, and I set innodb_buffer_pool_size=1GB. But when I run the script that communicates with MySQL, the computer becomes very slow and has to be restarted.

Learning Cassandra

RDBMS: uses the ‘join’ operation, increasing normalization and reducing redundancy.
NoSQL: in contrast with the RDBMS, to get better performance and high scalability it gets rid of the ‘join’ operation, which means denormalizing the data and maintaining multiple copies of it (increasing redundancy). This is what Cassandra does.

Learning Cassandra
Column-oriented or row-oriented?
– Cassandra is based on Dynamo and BigTable, so it is not incorrect to say it is column-oriented. But each row has a unique key, which makes its data accessible, so it may be more helpful to think of it as an indexed, row-oriented store.
– Cassandra stores data in a multidimensional hash table. That means you don’t have to decide ahead of time precisely what your data structure must look like, or what fields your records will need.
– In Cassandra, we should think of our queries first, and then provide the data that answers them.

Learning Cassandra
Installing Cassandra
– Compared with Hadoop HBase, installing Cassandra is simple: download the source code, set JAVA_HOME correctly, and run the command “ant”. Cassandra is then built and ready to run.
– Start the Cassandra server: >>bin/cassandra -f
– Use the command-line interface: >>bin/cassandra-cli

Learning Cassandra
The Cassandra Data Model
– Cassandra also has concepts like row, column family, and column, but their meaning is different from the RDBMS ones.
– The column is a name/value pair (roughly a cell in an RDBMS).
– The column family is a container for rows that have similar, but not identical, column sets (roughly a table).
– The keyspace is the outermost container for data in Cassandra (roughly a database).

Learning Cassandra
– If we want to create a group of related columns, Cassandra allows us to do this with something called a super column family. A super column family can be thought of as a map of maps.

Learning Cassandra
A keyspace has a name and a set of attributes that define keyspace-wide behavior. There are some basic attributes that we can set per keyspace:
– Replication factor: the number of nodes that will hold a copy of each row of data.
– Replica placement strategy: how the replicas will be placed in the cluster (SimpleStrategy, OldNetworkTopologyStrategy, NetworkTopologyStrategy).
– Column families: a keyspace is a container for a list of one or more column families. Column families represent the structure of our data.
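How the replication factor and a placement strategy interact can be sketched in a few lines. The example below is a simplified model of SimpleStrategy, assuming a token ring in which the first replica goes to the node owning the row's token and the remaining replicas go to the next nodes clockwise. The node names, the evenly spaced tokens, and the use of a plain MD5 digest as the token are assumptions for illustration; they are not Cassandra's actual code.

```python
import bisect
import hashlib


def token_for(key):
    """Map a row key to a position on the ring (MD5 stands in for the partitioner)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas_for_token(ring, token, replication_factor):
    """Return the nodes that hold a row with the given token.

    ring is a list of (token, node_name) pairs sorted by token. The first
    replica is the first node whose token is >= the row token (wrapping
    around); the others are the next nodes clockwise.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))  # cannot exceed the node count
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def simple_strategy_replicas(ring, key, replication_factor):
    """Convenience wrapper: hash the row key, then walk the ring."""
    return replicas_for_token(ring, token_for(key), replication_factor)


# Four hypothetical nodes with evenly spaced tokens in the 128-bit MD5 range.
ring = [(i * 2**126, f"node{i}") for i in range(4)]

print(simple_strategy_replicas(ring, "job_1001", 3))
```

With a replication factor of 3 on this four-node ring, every row lands on three consecutive nodes, so any single node can fail without a row losing all of its copies.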

Learning Cassandra
Column Families: a column family is a container for an ordered collection of rows. It is like a table in an RDBMS, but it is not one.
– It is schema-free because, although the column families are defined, the columns are not.
– A column family has two attributes: a name and a comparator. The comparator value indicates how columns will be sorted when they are returned to us in a query.
– A Cassandra column family can be thought of as a four-dimensional hash: [Keyspace][ColumnFamily][Key][Column]
– If the column family is defined as super, it becomes a five-dimensional hash: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
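The four- and five-dimensional hash picture can be made concrete with nested Python dictionaries. This is only a mental model of the addressing scheme, not how Cassandra stores data, and every keyspace, column family, row, and column name below is hypothetical.

```python
# Nested dictionaries standing in for [Keyspace][ColumnFamily][Key][...].
store = {
    "Monitoring": {                      # keyspace
        "Jobs": {                        # standard column family
            # Rows may carry different column sets (schema-free).
            "job_1001": {"site": "CC", "status": "Done"},
            "job_1002": {"site": "CC", "cpu_time": "86"},
        },
        "JobsBySite": {                  # super column family: a map of maps
            "CC": {
                "job_1001": {"status": "Done", "cpu_time": "73"},
                "job_1002": {"status": "Running"},
            },
        },
    },
}


def get_column(keyspace, column_family, key, column):
    """Four-dimensional lookup: [Keyspace][ColumnFamily][Key][Column]."""
    return store[keyspace][column_family][key][column]


def get_sub_column(keyspace, column_family, key, super_column, sub_column):
    """Five-dimensional lookup through a super column."""
    return store[keyspace][column_family][key][super_column][sub_column]


print(get_column("Monitoring", "Jobs", "job_1001", "status"))
print(get_sub_column("Monitoring", "JobsBySite", "CC", "job_1001", "cpu_time"))
```

The only difference between the two lookups is the extra level of nesting, which is what the slide means by calling a super column family a map of maps.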

Learning Cassandra
Column Family Options: there are a few additional parameters that we can define for each column family:
– keys_cached
– rows_cached
– read_repair_chance
– preload_row_cache
– …

Learning Cassandra
Column Sorting: in Cassandra, we specify how column names will be compared for sort order when results are returned to the client. Here are some choices:
– AsciiType
– BytesType
– LongType
– UTF8Type
– …
Sorting is a design decision
– In an RDBMS we can use ORDER BY to change the order at query time. In Cassandra, we cannot change the order once we have fixed it when creating the column family.
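Why the comparator is a design decision shows up as soon as column names look like numbers. The sketch below models a column family whose comparator is chosen once at creation time. The class, its methods, and the two-entry comparator table are hypothetical stand-ins that only imitate text ordering (UTF8Type) and integer ordering (LongType).

```python
# Sort keys imitating two of the comparators named on the slide.
COMPARATORS = {
    "UTF8Type": lambda name: name,       # compare column names as text
    "LongType": lambda name: int(name),  # compare column names as integers
}


class ColumnFamilySketch:
    """Toy column family whose column order is fixed when it is created."""

    def __init__(self, name, comparator):
        self.name = name
        self._sort_key = COMPARATORS[comparator]  # cannot be changed later
        self._rows = {}

    def insert(self, row_key, columns):
        self._rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key):
        """Return (name, value) pairs in comparator order."""
        row = self._rows[row_key]
        return [(name, row[name]) for name in sorted(row, key=self._sort_key)]


columns = {"10": "c", "9": "b", "100": "d", "1": "a"}

by_text = ColumnFamilySketch("EventsText", "UTF8Type")
by_long = ColumnFamilySketch("EventsLong", "LongType")
by_text.insert("row1", columns)
by_long.insert("row1", columns)

print([name for name, _ in by_text.get("row1")])  # ['1', '10', '100', '9']
print([name for name, _ in by_long.get("row1")])  # ['1', '9', '10', '100']
```

The same four columns come back in two different orders, and because there is no ORDER BY to fall back on, the order a query needs has to be decided when the column family is created.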

Learning Cassandra
Secondary Indexes:
– Secondary indexes have been supported since Cassandra 0.7. This means we can create indexes on column values.
Denormalization:
– Normalization is not an advantage when working with Cassandra, because it performs best when the data model is denormalized.
– Instead of modeling the data first and then writing queries, with Cassandra we model the queries and let the data be organized around them. Think of the most common query paths the application will use, and then create the column families that we need to support them.

Learning Cassandra
Design patterns:
– Materialized View: writing our data to a second column family that is created specifically to answer a particular query.
– Valueless Column: the column name itself can carry the useful information, leaving the value empty. This is often used in materialized views.
– Aggregate Key: when we use the Valueless Column pattern, we may also need the Aggregate Key pattern. The row key then looks like xxx:xxx (using a colon as the separator).
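The three patterns tend to appear together, which a small in-memory sketch can show. Plain dictionaries stand in for column families here, and the job, site, and status names are hypothetical. Every write goes both to the main column family and to a second one built for the query "which jobs at site X have status Y".

```python
# Main column family: row key = job id, columns = job attributes.
jobs = {}

# Materialized view: row key = "site:status" (aggregate key),
# column names = job ids, column values left empty (valueless columns).
jobs_by_site_status = {}


def add_job(job_id, site, status):
    """Write the job twice: once as data, once in the query-shaped view."""
    jobs[job_id] = {"site": site, "status": status}
    view_key = f"{site}:{status}"  # colon-separated aggregate key
    jobs_by_site_status.setdefault(view_key, {})[job_id] = ""


def job_ids(site, status):
    """Answer the query from the view alone, with no join and no scan."""
    return sorted(jobs_by_site_status.get(f"{site}:{status}", {}))


add_job("job_1001", "CC", "Done")
add_job("job_1002", "CC", "Running")
add_job("job_1003", "CC", "Done")

print(job_ids("CC", "Done"))  # ['job_1001', 'job_1003']
```

The sketch leaves out what happens when a job changes status; in that case the old entry in the view would also have to be removed, which is the maintenance cost of keeping denormalized copies.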

Learning Cassandra
API & Python library:
– There is a client generation layer, provided by the Thrift API and the Avro project.
– There are also high-level Cassandra clients for different languages. For Python there is a library named pycassa, which makes it easy to communicate with Cassandra from Python. I am getting familiar with it now.

Thanks now discussing…