Data Inglorious Atlas: “All this data sure is heavy.” Data: “Indeed, may I suggest moving it to the cloud.”

Presentation transcript:

1 Data Inglorious Atlas: “All this data sure is heavy.” Data: “Indeed, may I suggest moving it to the cloud.”

2 database defined A database is a collection of data, which is organized into files called tables. These tables provide a systematic way of accessing, managing, and updating data. A relational database is one that contains multiple tables of data that relate to each other through special key fields. Relational databases are far more flexible (though harder to design and maintain) than what are known as flat file databases, which contain a single table of data.
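As a minimal sketch of the definition above, the following uses Python's built-in sqlite3 module with an in-memory database to show a table being created, filled, updated, and read. The table name `players` and its columns are hypothetical, chosen only for illustration; SQLite stands in here for the MySQL systems mentioned on later slides.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table is a named set of rows that all share the same columns.
# The table and column names are hypothetical.
conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, farms INTEGER)"
)

# Managing data: add rows.
conn.executemany(
    "INSERT INTO players (id, name, farms) VALUES (?, ?, ?)",
    [(1, "Ada", 2), (2, "Grace", 5)],
)

# Updating data: change one field of one row.
conn.execute("UPDATE players SET farms = farms + 1 WHERE id = ?", (1,))

# Accessing data: read the rows back in a defined order.
rows = conn.execute(
    "SELECT id, name, farms FROM players ORDER BY id"
).fetchall()
print(rows)
```

The `?` placeholders let the driver pass values separately from the SQL text, which is the usual way to avoid quoting mistakes.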

3 overview, the payload: Oracle Internet Directory (OID), Zynga Games/Farmville, Facebook, bioinformatics, Calmail

4 ex. oracle OID Oracle Internet Directory: 400,000 operations per second on a 500-million-user database

5 ex. zynga games 65 million players a day, millions of web browsers open, millions of farms (Farmville game), millions of frontiers, millions of objects bought and sold, all recorded in a database. A 500,000 operations-per-second database sits behind Farmville. http://www.readwriteweb.com/cloud/2010/08/membase-the-database-powering.php

6 ex. facebook 60,000 servers; 1,800 MySQL servers; 400 million active users, 200 million a day; 50 million operations per second

7 ex. bioinformatics DNA sequence data = prime candidate for study with database systems; homologous strings. Nucleobases: Adenine, Guanine, Cytosine, Thymine. 3.4 billion base pairs in the human genome, expressed as a string of A, G, C, and T. Human Genome Project: 3.4 billion letters of the human genome. Sanger Institute: 1 billion on MySQL.
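Because a DNA sequence is just a string over the letters A, G, C, and T, it can be stored in an ordinary text column and searched with string functions. The sketch below is an assumption-laden illustration: the table `sequences`, the sample sequences, and the motif are all made up, SQLite stands in for MySQL, and the exact substring match is far simpler than a real homology search.

```python
import sqlite3

DNA_ALPHABET = set("AGCT")

def is_valid_dna(bases: str) -> bool:
    """Return True if the string is non-empty and uses only A, G, C, T."""
    return bool(bases) and set(bases) <= DNA_ALPHABET

# Hypothetical sample data, not real genomic sequences.
samples = [
    ("seq_a", "AGCTTAGGCA"),
    ("seq_b", "TTAGGCAGCT"),
    ("seq_c", "CCCCAAAA"),
]
assert all(is_valid_dna(bases) for _, bases in samples)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (name TEXT PRIMARY KEY, bases TEXT)")
conn.executemany("INSERT INTO sequences VALUES (?, ?)", samples)

# Find every stored sequence containing the motif.
# SQLite's instr() gives the 1-based position of the first match, or 0.
motif = "TAGG"
hits = conn.execute(
    "SELECT name, instr(bases, ?) FROM sequences "
    "WHERE instr(bases, ?) > 0 ORDER BY name",
    (motif, motif),
).fetchall()
print(hits)
```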

8 ex. calmail Calmail: 4 million e-mails offered a day, 1 million served; MySQL backend, which just failed

9 flat file v. relational Imagine the needs of two small companies that take customer orders for their products. Company A uses a flat file database with a single table named orders to record orders they receive, while Company B uses a relational database with two tables: orders and customers.

When a customer places an order with Company A, a new record (or row) in the table orders is created. Because Company A has only one table of data, all the information pertaining to that order must be put into a single record. This means that the customer's general information, such as name and address, is stored in the same record as the order information, such as product description, quantity, and price. If customers place more than one order, their general information will need to be re-entered and thus duplicated for each order they place. Whenever there is duplicate data, as in the case above, many inconsistencies may arise when users try to query the database. Additionally, a customer's change of address would require the database manager to find all records in orders that the customer placed, and change the address data for each one.

Company B is much better off with its relational database. Each of its customers has one and only one record of general information stored in the table customers. Each customer's record is identified by a unique customer code which will serve as the relational key. When a customer orders from Company B, the record in orders need contain only a reference to the customer's code, because all of the customer's general information is already stored in customers. This approach to entering data solves the problems of duplicate data and making changes to customer information. The database manager need change only one record in customers if someone changes addresses.

Source: Indiana University Knowledge Base, document ahrp, last modified on April 24, 2006. http://kb.iu.edu/data/ahrp.html
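The difference between the two companies can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not a design taken from the slides: the column names, the sample customer, and the name `flat_orders` (used so that Company A's single table can sit beside Company B's `orders` and `customers` in one in-memory database) are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Company A: one flat table; customer details repeat on every order.
conn.execute(
    "CREATE TABLE flat_orders (order_id INTEGER PRIMARY KEY, "
    "customer_name TEXT, customer_address TEXT, product TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO flat_orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Pat Lee", "1 Old Road", "widget", 2),
        (2, "Pat Lee", "1 Old Road", "gadget", 1),
    ],
)

# Company B: each customer is stored once; orders carry only the customer code.
conn.execute(
    "CREATE TABLE customers (customer_code INTEGER PRIMARY KEY, "
    "name TEXT, address TEXT)"
)
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_code INTEGER REFERENCES customers(customer_code), "
    "product TEXT, quantity INTEGER)"
)
conn.execute("INSERT INTO customers VALUES (100, 'Pat Lee', '1 Old Road')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 100, "widget", 2), (2, 100, "gadget", 1)],
)

# A change of address touches every order row in the flat table...
flat_changed = conn.execute(
    "UPDATE flat_orders SET customer_address = ? WHERE customer_name = ?",
    ("9 New Street", "Pat Lee"),
).rowcount

# ...but only one customer row in the relational design.
relational_changed = conn.execute(
    "UPDATE customers SET address = ? WHERE customer_code = ?",
    ("9 New Street", 100),
).rowcount

# The key field lets a join rebuild the full order view on demand.
joined = conn.execute(
    "SELECT o.order_id, c.name, c.address, o.product "
    "FROM orders AS o JOIN customers AS c "
    "ON c.customer_code = o.customer_code ORDER BY o.order_id"
).fetchall()
print(flat_changed, relational_changed, joined)
```

With two orders on file, the flat table needs two rows rewritten while the relational design needs one, and the gap grows with every additional order the customer places.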

10 flat file v. relational Single table (flat file) v multiple tables (relational)

11 web Connection Example: Plone Content Management System connection to a MySQL database

12 go graphic, phpMyAdmin A graphic interface tool for working with MySQL

13 phpMyAdmin GSPP and phpMyAdmin localhost

14 other database systems
Hadoop: distributed processing of large data sets http://code.zynga.com/2011/06/deciding-how-to-store-billions-of-rows-per-day/
Membase: new for games and other apps http://www.readwriteweb.com/cloud/2010/08/membase-the-database-powering.php
CouchDB: no schema http://couchdb.apache.org/docs/intro.html

