Introduction & Options for Storing Connected Data


Presentation on theme: "Introduction & Options for Storing Connected Data"— Presentation transcript:

1 Introduction & Options for Storing Connected Data
Chungbuk National University, School of Information and Communication Engineering, Minsu Kim

2 Contents
NOSQL Overview
What is a Graph?
Usage of graph theory
A High-Level View of the Graph space
Options for Storing Connected Data
Q&A

3 NOSQL Overview
NOSQL: “Not Only SQL”? “No to SQL”?
As a term, NOSQL defines what these data stores are not (they are not SQL-centric relational databases) rather than what they are: an interesting and useful set of storage technologies whose operational, functional, and architectural characteristics are many and varied.
The relational database problem:
Large datasets become unwieldy when stored in relational databases.
Query execution times increase as the size of tables and the number of joins grow (so-called join pain).

4 NOSQL Overview ACID vs BASE
ACID: Atomic, Consistent, Isolated, Durable
BASE:
Basic availability: the store appears to work most of the time.
Soft state: stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
Eventual consistency: stores exhibit consistency at some later point.

5 NOSQL Overview NOSQL store quadrants

6 NOSQL Overview NOSQL store quadrants Document Stores
Document databases store and retrieve documents, just like an electronic filing cabinet.
Records are independent at write time, so there is no need to transact across replicas.
Example: CouchDB

7 NOSQL Overview NOSQL store quadrants Key-Value Stores
The lineage of key-value stores comes from Amazon’s Dynamo database.
They are distributed hashmap data structures that store and retrieve opaque values by key.
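The "distributed hashmap" idea can be sketched in a few lines of Python. This is only an illustrative toy, not Dynamo's actual design: the class name `TinyKeyValueStore`, the partition count, and the use of SHA-256 to pick a partition are all assumptions made for the example.

```python
import hashlib


class TinyKeyValueStore:
    """Toy hash-partitioned key-value store (hypothetical, for illustration)."""

    def __init__(self, num_partitions=4):
        # Each dict stands in for one server holding an ordinary hashmap.
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition_for(self, key):
        # A stable hash of the key decides which partition owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def put(self, key, value):
        # The value is opaque: the store never looks inside it.
        self._partition_for(key)[key] = value

    def get(self, key, default=None):
        return self._partition_for(key).get(key, default)


store = TinyKeyValueStore()
store.put("user:alice", b'{"name": "Alice"}')
store.put("user:bob", b'{"name": "Bob"}')
```

Because values are opaque, the only supported question is "what is stored under this key?"; anything relating one value to another has to be done by the application.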

8 NOSQL Overview NOSQL store quadrants Column Family
Column family stores are modeled on Google’s BigTable.

9 NOSQL Overview NOSQL store quadrants Graph Databases (Hypergraphs)
A hypergraph is a generalized graph model in which a relationship (called a hyperedge) can connect any number of nodes.
(Figure: property graph vs. hypergraph)
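A small Python sketch may help contrast the two models. It assumes a hypothetical ownership example (two people jointly owning three cars; the names are made up) and shows how one hyperedge expands into several two-node relationships of the kind a property graph uses.

```python
from itertools import product

# One hyperedge: (label, source nodes, target nodes). Hypothetical data.
hyperedge = ("OWNS", {"Alice", "Bob"}, {"Mini", "Range Rover", "Prius"})


def to_binary_edges(edge):
    """Expand a hyperedge into relationships that each connect exactly two nodes."""
    label, sources, targets = edge
    return sorted((s, label, t) for s, t in product(sources, targets))


binary_edges = to_binary_edges(hyperedge)
```

Here the single hyperedge becomes six binary relationships (2 owners x 3 cars). The information is the same; the property-graph form is more verbose but lets each relationship carry its own properties.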

10 NOSQL Overview NOSQL store quadrants Graph Databases (Triples)
Triple stores come from the Semantic Web movement, where researchers are interested in large-scale knowledge inference by adding semantic markup to the links that connect web resources
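As a rough illustration, a triple store can be thought of as a set of subject-predicate-object statements that is queried by pattern. The sketch below is a toy in-memory version with made-up data and a hypothetical `match` helper; real triple stores use RDF and a query language such as SPARQL.

```python
# Each statement is a (subject, predicate, object) triple. Hypothetical data.
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "zach"),
    ("alice", "worksFor", "acme"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


everything_about_alice = match(triples, subject="alice")
who_knows_whom = match(triples, predicate="knows")
```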

11 What is a Graph? Graph history 7 bridges of Königsberg
Königsberg was a beautiful medieval city in the Prussian empire, situated on the river Pregel.
Today it lies in Russia, between Poland and Lithuania.
Seven bridges connected the four different parts of the city.
The essence of the problem people were trying to solve was to take a tour of the city, visiting every one of its parts and crossing every one of its bridges, without walking any bridge or street twice.
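Euler's classic answer to this puzzle can be checked in a few lines of Python. In a connected graph, a walk that crosses every edge exactly once exists only if zero or two nodes have an odd number of edges. The sketch below assumes the usual layout of the puzzle, with the four land masses given the arbitrary labels A to D (A being the central island).

```python
from collections import Counter

# The seven bridges as edges between four land masses (labels are arbitrary).
bridges = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges):
    # Assumes the graph is connected, which holds for Königsberg.
    return len(odd_degree_nodes(edges)) in (0, 2)


tour_possible = has_euler_walk(bridges)
```

All four land masses have odd degree (5, 3, 3, 3), so no such tour exists.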

12 What is a Graph? Graph history 7 bridges of Königsberg

13 What is a Graph? Graph: a graph is just a collection of vertices and edges (nodes and relationships)

14 Usage of graph theory Social studies

15 Usage of graph theory Biological studies

16 Usage of graph theory Computer science

17 Usage of graph theory Flow problems

18 Usage of graph theory Route problems

19 Usage of graph theory Web Search

20 A High-Level View of the Graph space
From 10,000 feet, the graph space divides into two parts:
Technologies used primarily for transactional online graph persistence, typically accessed directly in real time from an application. These technologies are called graph databases and are the main focus of this book. They are the equivalent of “normal” online transactional processing (OLTP) databases in the relational world.
Technologies used primarily for offline graph analytics, typically performed as a series of batch steps. These technologies can be called graph compute engines. They can be thought of as being in the same category as other technologies for analysis of data in bulk, such as data mining and online analytical processing (OLAP).

21 A High-Level View of the Graph space
Graph Databases
The underlying storage: some graph databases use native graph storage that is optimized and designed for storing and managing graphs. Others serialize the graph data into a relational database, an object-oriented database, or some other general-purpose data store.
The processing engine: some definitions require that a graph database use index-free adjacency, meaning that connected nodes physically “point” to each other in the database.

22 A High-Level View of the Graph space
Graph Databases

23 A High-Level View of the Graph space
Graph Compute Engines
A graph compute engine is a technology that enables global graph computational algorithms to be run against large datasets.
Example question: “How many relationships, on average, does everyone in a social network have?”
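The example question is a global computation: it touches every node rather than starting from one. A minimal Python sketch, assuming a tiny made-up friendship list in which every relationship is undirected, looks like this; a real compute engine would run the same kind of calculation in batch over a far larger dataset.

```python
# Hypothetical undirected friendships in a very small social network.
friendships = [
    ("Alice", "Bob"),
    ("Alice", "Zach"),
    ("Bob", "Zach"),
    ("Zach", "Dana"),
]


def average_relationships(edges):
    """Average number of relationships per person in an undirected graph."""
    people = {person for edge in edges for person in edge}
    if not people:
        return 0.0
    # Every relationship counts once for each of its two endpoints.
    return 2 * len(edges) / len(people)


average = average_relationships(friendships)
```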

24 Options for Storing Connected Data
Relational Databases Lack Relationships
Relational databases were initially designed to codify paper forms and tabular structures.
Join tables add accidental complexity.
Foreign key constraints add development and maintenance overhead just to make the database work.
Sparse tables with nullable columns require special checking in code, despite the presence of a schema.
Several expensive joins are needed just to discover what a customer bought.

25 Options for Storing Connected Data
Relational Databases Lack Relationships

26 Options for Storing Connected Data
Relational Databases Lack Relationships Alice’s friends-of-friends
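To see why a friends-of-friends question is awkward in a relational database, here is a hedged sketch using Python's built-in `sqlite3` module. The table names, column names, and rows are assumptions chosen for illustration (the slide's original figure is not available); the point is that one extra hop costs one extra join on the join table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE PersonFriend (person_id INTEGER, friend_id INTEGER);
INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob'), (99, 'Zach');
INSERT INTO PersonFriend VALUES (1, 2), (2, 1), (2, 99), (99, 1);
""")

# Two joins on the join table are needed to go two hops from Alice,
# plus a filter so that Alice is not reported as her own friend-of-friend.
rows = conn.execute("""
SELECT DISTINCT fof.name
FROM Person AS p
JOIN PersonFriend AS pf1 ON pf1.person_id = p.id
JOIN PersonFriend AS pf2 ON pf2.person_id = pf1.friend_id
JOIN Person AS fof ON fof.id = pf2.friend_id
WHERE p.name = 'Alice' AND pf2.friend_id <> p.id
ORDER BY fof.name
""").fetchall()

friends_of_friends = [name for (name,) in rows]
```

Each additional degree of separation adds another self-join, which is where query time starts to grow on large tables.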

27 Options for Storing Connected Data
NOSQL Databases Also Lack Relationships
Most NOSQL databases, whether key-value-, document-, or column-oriented, store sets of disconnected documents/values/columns. This makes it difficult to use them for connected data and graphs.

28 Options for Storing Connected Data
NOSQL Databases Also Lack Relationships
There’s another weak point in this scheme: asking the database who has bought a particular product is an expensive operation.

29 Options for Storing Connected Data
NOSQL Databases Also Lack Relationships
There’s another weak point in this scheme:
“Who are Bob’s friends?” => Alice, Zach
“Who is friends with Bob?” => expensive operation
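The asymmetry between the two questions can be sketched with a plain Python dictionary standing in for a document or key-value store. The documents and the `friends` field are hypothetical; only Bob's friend list (Alice, Zach) comes from the slide.

```python
# Each document embeds outgoing "friend" references only. Hypothetical data.
documents = {
    "Alice": {"friends": ["Bob"]},
    "Bob": {"friends": ["Alice", "Zach"]},
    "Zach": {"friends": ["Alice"]},
}


def friends_of(name):
    """Cheap: a single lookup by key."""
    return documents[name]["friends"]


def who_is_friends_with(name):
    """Expensive: no reverse pointer exists, so every document is scanned."""
    return sorted(
        owner for owner, doc in documents.items() if name in doc["friends"]
    )


bobs_friends = friends_of("Bob")
friends_with_bob = who_is_friends_with("Bob")
```

The reverse question costs a scan of the whole dataset unless the application maintains a second, reverse index itself.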

30 Options for Storing Connected Data
Graph Databases Embrace Relationships
Connected data is stored as connected data.

31 Supplementary Presentation Key-Value Stores
Keys are stored and looked up using a hash table.
Closer in concept to a large-capacity cache store than to a database.
Examples: Amazon SimpleDB, Redis

32 Supplementary Presentation Column family stores
Built to store and process very large amounts of data distributed across many servers.
Examples: Cassandra, HBase

33 Supplementary Presentation Document DB
Basically similar to a key-value store.
A document is a collection of many key-value collections, stored in a format such as JSON.
The next-generation version of the key-value DB.
Examples: CouchDB, MongoDB

34 Supplementary Presentation Graph
The graph database world is populated with both technology designed to be “graph first,” known as native, and technology where graphs are an afterthought, classified as non-native.

35 Supplementary Presentation Native Graph Database (Neo4j)
Non-native graph storage uses a relational database, a columnar database, or some other general-purpose data store rather than being specifically engineered for the uniqueness of graph data.
Neo4j: native graph database

36 Supplementary Presentation Native Graph Processing
A graph database has native processing capabilities if it uses index-free adjacency. This means that each node directly references its adjacent nodes, acting as a micro-index for all nearby nodes.
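Index-free adjacency can be approximated in Python with objects that hold direct references to their neighbors. This is only an analogy: the `Node` class below is hypothetical, and in-memory object references merely stand in for the physical record pointers a native graph database would use on disk.

```python
class Node:
    """A node that directly references its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct references, no global index lookup

    def connect(self, other):
        # Relationships can be followed in either direction.
        self.neighbors.append(other)
        other.neighbors.append(self)


def friends_of_friends(start):
    """Follow references two hops out, skipping the start node and direct friends."""
    direct = set(start.neighbors)
    found = set()
    for friend in start.neighbors:
        for candidate in friend.neighbors:
            if candidate is not start and candidate not in direct:
                found.add(candidate.name)
    return sorted(found)


alice, bob, zach = Node("Alice"), Node("Bob"), Node("Zach")
alice.connect(bob)
bob.connect(zach)
result = friends_of_friends(alice)
```

The cost of the traversal depends on how many neighbors are visited, not on the total number of nodes in the graph, which is the property the "micro-index" description refers to.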

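37 Supplementary Presentation Non-native Graph Processing

38 Supplementary Presentation Non-native vs Neo4j Neo4j: 128GB machine

39 Supplementary Presentation Graph Databases

40 Supplementary Presentation FlockDB (source: FlockDB GitHub) Supports MySQL

41 Supplementary Presentation Brief description of graph DBs (

42 Q & A Thank you.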

