SERIALIZED DATA STORAGE Within a Database James Devens (devensj)


1 SERIALIZED DATA STORAGE Within a Database James Devens (devensj)

2 THE IDEA
• Serialized data can be used to store the current state of objects in a database.
• It is a good alternative to deprecated object-based databases.
• Separate data values are stored together in a single byte array.
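As a minimal sketch of the idea, the Java class below packs several separate values into one byte array and reads them back. The class name, the field names, and the sample values are hypothetical, and it uses the JDK's `DataOutputStream` as a stand-in; the presentation itself serializes with Protocol Buffers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type; the fields are illustrative, not the presentation's schema.
class PackedRecordSketch {
    final String name;
    final int rank;
    final long count;

    PackedRecordSketch(String name, int rank, long count) {
        this.name = name;
        this.rank = rank;
        this.count = count;
    }

    // Pack all fields into a single byte array, the value that would go into one BLOB column.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(name);   // 2-byte length prefix followed by the encoded characters
            out.writeInt(rank);   // 4 bytes
            out.writeLong(count); // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object by reading the fields in the same order they were written.
    static PackedRecordSketch fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String name = in.readUTF();
            int rank = in.readInt();
            long count = in.readLong();
            return new PackedRecordSketch(name, rank, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PackedRecordSketch original = new PackedRecordSketch("EXAMPLE", 7, 12345L);
        byte[] packed = original.toBytes();
        PackedRecordSketch restored = fromBytes(packed);
        System.out.println(packed.length + " bytes, restored name: " + restored.name);
    }
}
```

The byte array is opaque to the database: only code that knows the field order can turn it back into the original values.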

3 TOOLS USED
• MySQL Workbench
• DigitalOcean Server Hosting
• PuTTY
• WinSCP
• Microsoft Excel & PowerPoint
• Vim (Java Source)
• Protocol Buffers (Google)
• JDBC (Java Database Connectivity)
• United States 2000 Census

4 PREDICTIONS
• Data will usually take less storage as byte arrays.
• Basic queries will take less time (on a non-indexed database).
• Serialized data will be harder to access in a relational database.
  • It can defeat the purpose of relational databases.

5 DATABASE STRUCTURE
• Census Table
• Census_pb Table

6 INSERTING DATA
• Data was inserted into both tables using JDBC Prepared Statements
  • Prevents SQL injection
  • Allows similar queries to execute FASTER
• Data was serialized using Protocol Buffers
  • Developed by Google
  • More secure and portable than Java serialization
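The insert step might look roughly like the sketch below. The table names `Census` and `Census_pb` come from the slides; the column names (`name`, `total`, `payload`), the class, and the method signatures are assumptions made for illustration. The `payload` bytes are taken as given here; with Protocol Buffers they would typically come from a generated message's `toByteArray()`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class CensusInsertSketch {
    // Column names are hypothetical; only the table names appear in the presentation.
    static final String INSERT_PLAIN =
        "INSERT INTO Census (name, total) VALUES (?, ?)";
    static final String INSERT_SERIALIZED =
        "INSERT INTO Census_pb (name, payload) VALUES (?, ?)";

    // Values are bound as parameters, never concatenated into the SQL text,
    // which is what protects against SQL injection.
    static void insertPlain(Connection conn, List<String> names, List<Long> totals)
            throws SQLException {
        // The statement is prepared once and reused for every row.
        try (PreparedStatement ps = conn.prepareStatement(INSERT_PLAIN)) {
            for (int i = 0; i < names.size(); i++) {
                ps.setString(1, names.get(i));
                ps.setLong(2, totals.get(i));
                ps.executeUpdate();
            }
        }
    }

    // Same pattern, but the object state travels as one opaque byte array.
    static void insertSerialized(Connection conn, List<String> names, List<byte[]> payloads)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SERIALIZED)) {
            for (int i = 0; i < names.size(); i++) {
                ps.setString(1, names.get(i));
                ps.setBytes(2, payloads.get(i));
                ps.executeUpdate();
            }
        }
    }
}
```

Running it requires an open `Connection` from a JDBC driver (for example MySQL Connector/J), which is outside this sketch.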

7 INSERTING DATA (NON-SERIALIZED)

8 INSERTING DATA (SERIALIZED)

9 QUERYING DATA
• Use an array of names
• Each of these names is queried
• The process repeats a specified number of times (default 1000)
• Number of Queries = NumLoops * Names.length * 2
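The loop described above can be sketched as follows, assuming the factor of 2 in the formula means one query per table for each name. The class name and the two `Consumer` callbacks are hypothetical stand-ins for the real JDBC lookups.

```java
import java.util.function.Consumer;

class QueryLoopSketch {
    // Runs every name against both tables numLoops times and
    // returns the number of queries issued.
    static long run(String[] names, int numLoops,
                    Consumer<String> queryPlain, Consumer<String> querySerialized) {
        long issued = 0;
        for (int loop = 0; loop < numLoops; loop++) {
            for (String name : names) {
                queryPlain.accept(name);       // lookup in the non-serialized table
                issued++;
                querySerialized.accept(name);  // lookup in the serialized table
                issued++;
            }
        }
        return issued;
    }

    public static void main(String[] args) {
        String[] names = {"A", "B", "C"}; // placeholder names
        long start = System.nanoTime();
        // No-op callbacks here; real code would execute a prepared SELECT.
        long issued = run(names, 1000, n -> { }, n -> { });
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(issued + " queries in " + elapsedMs + " ms");
    }
}
```

The returned count equals NumLoops * Names.length * 2, matching the formula on the slide.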

10 QUERYING DATA

11 DATA COLLECTION
• Modified the simple query class to record data
• Exported to .csv for Microsoft Excel
• Each data sample consisted of 5 names being queried 10000 times
• 5000 data samples were taken
• Number of Queries = 50000 * 5000 * 2 = 500,000,000 queries

12 DATA COLLECTION

13 RESULTS (INSERTS)
• Results:
  • Non-Serialized
    • INSERT Dump Success!
    • Took: 204651 ms to complete.
  • Serialized
    • INSERT Dump Success!
    • Took: 190233 ms to complete.

14 RESULTS (DATA COLLECTION)
• Results:
  • Took 27623887 ms to complete (7.67 hours).
  • 5000 loops, and 500000000 queries executed.
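The totals reported here can be rechecked with a few lines of arithmetic. The sketch below uses only figures from the slides (5 names, 10000 repetitions per sample, 5000 samples, 27623887 ms); the class and method names are hypothetical. The mean time per query is a derived figure that averages over both tables and includes loop and recording overhead, so it should be read as a rough indication only.

```java
class QueryTotalsSketch {
    // names * loops * 2 queries per sample, multiplied by the number of samples.
    static long totalQueries(long names, long loopsPerSample, long samples) {
        return names * loopsPerSample * 2 * samples;
    }

    // Convert a duration in milliseconds to hours.
    static double hours(long millis) {
        return millis / 3_600_000.0;
    }

    // Average wall-clock time per query over the whole run.
    static double meanMillisPerQuery(long totalMillis, long queries) {
        return (double) totalMillis / queries;
    }

    public static void main(String[] args) {
        long queries = totalQueries(5, 10_000, 5_000);
        long totalMillis = 27_623_887L;
        System.out.println("queries: " + queries);
        System.out.printf("hours: %.2f%n", hours(totalMillis));
        System.out.printf("mean ms per query: %.4f%n", meanMillisPerQuery(totalMillis, queries));
    }
}
```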

15 RESULTS (DATA COLLECTION) Every 50,000 Queries

16 RESULTS (STORAGE)
• Non-Serialized Data Space
• Serialized Data Space
• 4194303 bytes (4.19 MB) difference
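One possible contributor to a smaller serialized footprint is that Protocol Buffers encodes integer fields as base-128 varints, so small values occupy fewer bytes than a fixed-width column would. The slides do not say which field types were used, so whether this explains the measured difference is an assumption. The class name below is hypothetical, and the sketch only illustrates the encoding for non-negative values.

```java
import java.io.ByteArrayOutputStream;

class VarintSketch {
    // Encode a non-negative value as a base-128 varint:
    // 7 data bits per byte, least significant group first,
    // with the high bit set on every byte except the last.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 150 needs two bytes (0x96 0x01) instead of the four of a fixed 32-bit integer.
        for (long v : new long[] {1, 150, 300, 1_000_000}) {
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
        }
    }
}
```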

17 CONCLUSION
• Storage was reduced (by about 4.19 MB in this test), so serialized data is space-efficient to store
• The query speeds were roughly the same
• Serialization is a good way to store object states
• Serialization is NOT a good way to store frequently changing objects
  • If the object class is modified, previously stored data may no longer deserialize correctly
• It is NOT relational-friendly (for the most part)
  • You cannot access the original data values inside the byte array without another program's help

18 FUTURE WORK
• Write a program to convert the byte array back to the original object (easy)
• Use a different .proto file with many more data values (e.g. 2000 doubles)
• Find more test statistics and collect more data
• Index the data to see how it affects query speeds of both methods

