Elasticsearch and SQL Server Integration


2 Elasticsearch and SQL Server Integration
Chris Miller Xceligent

3 Xceligent Data-driven commercial real estate company
We are hiring DBAs, ETL Engineers, QA, DevOps and Developers (.NET, Node, Angular). Great benefits, great location (KC Crossroads) and we’re working on really important stuff. See me after the presentation and we can exchange info.

4 Overview What is Elasticsearch and when should I use it?
How does Elasticsearch work? How do I get data into Elasticsearch from SQL Server?

5 What is Elasticsearch? Enterprise-ready full-text search engine
JSON Document store Either schema-less or strictly typed

6 What Cool Stuff does it Do?
Very fast auto-complete for dropdowns on forms. Typical big data scenarios of large data volumes with low change rates. Scales down to small volumes better than other big data solutions, so it’s easy to develop with.

7 Enterprise Features
Clustering for redundancy and performance
Storage engine compression
Multi-platform and cloud hosted

8 Clustering Run multiple service instances across multiple nodes.
Automatically redundant. Adding nodes to the cluster spreads the shards over more hardware, which generally makes it faster. Nodes can be specialized as data nodes or master nodes. Scales up and down well.

9 Storage Engine Compression
Stored data is compressed with LZ4 by default, which is fast and portable; DEFLATE (the algorithm behind GZip) is available through the best_compression index codec for a better ratio. Data is compressed on its way to disk and decompressed when read into memory. Reduces I/O load, no noticeable CPU load.

10 Multiplatform and Cloud Friendly
Java runtime required.
Runs on Linux, Mac OS X, and Windows out of the box.
Runs on AWS EC2, Microsoft Azure Search, and Elastic’s found.no service.

11 What’s Not in the Box?
Management is done via add-ons, specifically head and Marvel.
Security and security models don’t exist outside of add-on modules or web proxies. Shield is from Elastic and works for user-level security, and NGINX works really well for network filtering and user-based authentication.
Marvel, Shield, and support are the business model for Elastic. Support contracts are very reasonable and include those packages.
found.no’s SaaS has those add-ons and support built into the base price.

12 When NOT to use Elasticsearch
High-volume updates
Very transactional data
Multi-user updates
If you can’t monitor, manage, and maintain the system.

13 Mapping Elasticsearch onto structures you know.
Cluster = Server
Index = Database
Mapping = Table
Document = Row

14 JSON Document Store Documents are the rows of Elasticsearch.
Documents may be strictly or dynamically mapped. Each document has a primary key, which is either assigned at load time or auto-assigned. Also supports parent-child relationships between documents.

15 What is a JSON Document? Documents contain fields, which may hold simple values, objects with several values, or arrays of either. Very rich document structures are possible and manageable. Support for the usual base types plus geo points, auto-complete optimized strings, and a few other things.

16 Sample JSON Document The fields that start with _ are metadata.
The stuff inside _source is the document.
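The sample document from the slide did not survive in this transcript. As a rough stand-in, here is a sketch of what a response to a GET for a single document might look like, and how the metadata separates from the document. The index, type, and field names are invented for illustration.

```python
import json

# Hypothetical response from GET /properties/listing/42.
# Index, type, and field names are made up; only the overall shape
# (underscore-prefixed metadata plus _source) follows the slide.
response_text = """
{
  "_index": "properties",
  "_type": "listing",
  "_id": "42",
  "_version": 1,
  "found": true,
  "_source": {
    "address": "123 Main St",
    "city": "Kansas City",
    "square_feet": 12000
  }
}
"""


def split_hit(hit):
    """Separate metadata (keys starting with '_') from the document body."""
    metadata = {k: v for k, v in hit.items()
                if k.startswith("_") and k != "_source"}
    document = hit.get("_source", {})
    return metadata, document


metadata, document = split_hit(json.loads(response_text))
print(metadata)
print(document)
```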

17 Mapping Mappings define the document structure.
May be left out, if you’re either really careful or not real smart. Loosely relate to table structures in a database, but aren’t restricted to 2 dimensions.

18 Mapping Sample “strict”: docs must follow this structure.
StyleList: Possibly an array?
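The mapping shown on the slide is also missing here, so the following is only a sketch of a strict mapping under assumed names (a "listing" type with a few invented fields). It uses the "string" field type and per-type mappings from the Elasticsearch versions of this talk's era; newer versions changed both. On the StyleList question: Elasticsearch has no separate array type, so any field can hold one value or many without a mapping change.

```python
import json


def strict_mapping(type_name, properties):
    """Build a mapping that rejects documents containing unmapped fields."""
    return {type_name: {"dynamic": "strict", "properties": properties}}


# Hypothetical fields; StyleList is declared as a plain string field and
# may still receive an array of strings at index time.
mapping = strict_mapping("listing", {
    "address": {"type": "string"},
    "square_feet": {"type": "integer"},
    "StyleList": {"type": "string"},
})

# Body that could be sent when creating the index.
print(json.dumps({"mappings": mapping}, indent=2))
```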

19 _id: Primary Key Each document has a field called _id associated with it. _ids are unique and will be auto-generated if you don’t supply one when inserting data. Using meaningful _ids is highly recommended where possible. The type is a string; generated ones are 20-character alphanumeric. The _id is part of the metadata, not part of the mapping.
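One way to get meaningful _ids when loading from SQL Server is to derive them from the source table's primary key, so reloading a row overwrites the same document instead of creating a duplicate. This is a minimal sketch; the helper name and the "Table-key" format are assumptions, not something the talk prescribes.

```python
def make_doc_id(table, key_values):
    """Build a deterministic _id from a table name and its primary-key values."""
    parts = [table] + [str(v) for v in key_values]
    return "-".join(parts)


print(make_doc_id("Listing", [42]))         # Listing-42
print(make_doc_id("ListingUnit", [42, 7]))  # ListingUnit-42-7
```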

20 Parent-Child Modeling
Mappings in an index can declare another mapping as a parent. Works like a foreign key: a child document must name its parent when it is inserted, and the insert is denied if it doesn’t. Enables the has_parent and has_child queries.

21 Child Model Example Has a “parent” tag. Can be complex
Can be searched, paged independently of parent.
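The child model from the slide is not in the transcript, so here is a sketch of the three pieces involved, assuming a hypothetical "unit" type whose parent is "listing". The _parent mapping entry and the parent URL parameter match the Elasticsearch versions current at the time of this talk; later versions replaced them with a join field.

```python
import json
from urllib.parse import urlencode

# Child mapping: the "_parent" entry is the "parent" tag from the slide.
child_mapping = {
    "unit": {
        "_parent": {"type": "listing"},
        "properties": {"status": {"type": "string"}},
    }
}


def child_index_url(base, index, child_type, doc_id, parent_id):
    """URL for indexing a child; the parent id must be supplied."""
    query = urlencode({"parent": parent_id})
    return f"{base}/{index}/{child_type}/{doc_id}?{query}"


def has_child_query(child_type, inner_query):
    """Find parent documents that have at least one matching child."""
    return {"query": {"has_child": {"type": child_type, "query": inner_query}}}


url = child_index_url("http://localhost:9200", "properties", "unit", "7", "42")
query = has_child_query("unit", {"match": {"status": "vacant"}})
print(url)
print(json.dumps(query))
```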

22 Schema-less Documents
You can create a schema-less type. Field names are case sensitive. Analysis bug pre-2.0. Great if you need absolute flexibility. Bad if you don’t need absolute flexibility. There is a _mapping api to adjust mappings and add columns.

23 How to use Elasticsearch
Install Elasticsearch. Great documentation online with step-by-step instructions for your server. Scales well: single-node clusters are easy to set up for dev and test environments. Operates as a web server, so make sure you have curl or another tool that can GET, POST, and DELETE (Postman, etc.).

24 Elasticsearch APIs Elasticsearch uses standard HTTP to communicate (HTTPS is available via add-ons or a reverse proxy). Sending a GET request with an ID returns a document. Use POST to send requests for _search and others. Data is sent with a POST request to the _bulk API.
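Because everything is plain HTTP, the standard library is enough to talk to a node. The sketch below only builds the requests and does not send them; the base URL, index, and type names are assumptions for a local dev node.

```python
import json
import urllib.request

BASE = "http://localhost:9200"  # assumed single-node dev cluster


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request against the node."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(BASE + path, data=data,
                                  headers=headers, method=method)


get_doc = build_request("GET", "/properties/listing/42")
delete_doc = build_request("DELETE", "/properties/listing/42")
search = build_request("POST", "/properties/_search",
                       {"query": {"match_all": {}}})

for req in (get_doc, delete_doc, search):
    print(req.get_method(), req.full_url)

# To actually send one: urllib.request.urlopen(get_doc).read()
```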

25 API Example Standard request; this one is in Perl, and other languages are similar. The message that is posted is 2 lines per document (a format unique to the _bulk API).
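The Perl from the slide is not reproduced in this transcript. As an illustration of the two-line format in Python instead, the sketch below builds a _bulk body: one action line naming the target, then one line holding the document. Index, type, and field names are invented, and the body must end with a newline.

```python
import json


def bulk_body(index, doc_type, docs):
    """Serialize (doc_id, source) pairs into newline-delimited _bulk format."""
    lines = []
    for doc_id, source in docs:
        # Line 1: the action and where the document goes.
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Line 2: the document itself, on a single line.
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


body = bulk_body("properties", "listing", [
    ("Listing-42", {"address": "123 Main St", "city": "Kansas City"}),
    ("Listing-43", {"address": "456 Oak Ave", "city": "Overland Park"}),
])
print(body)
```

The resulting string would be POSTed as-is to the _bulk endpoint.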

26 Another API Example Posted to http://server/index/_search
Returns rows.
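The request and response from this slide are missing as well. The following sketch shows a simple paged match query and how the hits in a response flatten into row-like dictionaries. The field names and the sample response are invented; the hits.hits[]._source layout is the standard search response shape.

```python
def search_body(field, text, page=0, page_size=10):
    """Full-text match on one field, with from/size paging."""
    return {
        "query": {"match": {field: text}},
        "from": page * page_size,
        "size": page_size,
    }


def rows_from_response(response):
    """Flatten search hits into row-like dicts, keeping the _id."""
    return [dict(hit["_source"], _id=hit["_id"])
            for hit in response["hits"]["hits"]]


# Hypothetical response, trimmed to the parts used here.
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {"_id": "Listing-42", "_score": 1.3,
             "_source": {"address": "123 Main St", "city": "Kansas City"}},
        ],
    }
}

print(search_body("address", "main", page=2))
print(rows_from_response(sample_response))
```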

27 Copying Data to Elasticsearch
Lots of different scenarios, here are 3 common ones:
Data Warehouse Load
Operational Data Store Load
Real(ish) Time Search Load

28 Data Warehouse Load
Get a listing of the things that changed that the warehouse cares about.
Write an ETL to extract, transform and serialize, and load documents.

29 Data Warehouse Example
Uses Logstash and the JDBC connector.
Extracts, transforms/serializes, and loads in one package.
Write each extract into a separate index and use an alias to join them all together.
Versatile, open source engine, and part of the Elasticsearch family.
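The Logstash configuration itself is not in the transcript. The alias step can be sketched on its own, though: each extract lands in its own index, and one request to the _aliases API adds the new index to a shared alias (and optionally drops an old one). The index and alias names below are hypothetical.

```python
import json


def alias_actions(alias, add_indexes, remove_indexes=()):
    """Body for POST /_aliases that adds and removes indexes in one call."""
    actions = [{"add": {"index": name, "alias": alias}}
               for name in add_indexes]
    actions += [{"remove": {"index": name, "alias": alias}}
                for name in remove_indexes]
    return {"actions": actions}


# Hypothetical daily extracts joined under one searchable alias.
body = alias_actions(
    "listings",
    add_indexes=["listings-2015-10-02"],
    remove_indexes=["listings-2015-09-01"],
)
print(json.dumps(body, indent=2))
```

Searches then go to the alias name, so adding or retiring an extract does not change the queries.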

30 ODS Example Lighter weight extract than DW
Designed to be run hourly or more often.
Still runs in batches.
Uses change tracking and polling.
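A polling pass over SQL Server change tracking might look roughly like the sketch below. CHANGETABLE(CHANGES ...) and its SYS_CHANGE_OPERATION and SYS_CHANGE_VERSION columns are SQL Server's change tracking interface; the dbo.Listing table, its columns, and the index and type names are assumptions. The code does not connect to a database: it only turns rows already fetched (as dictionaries) into a _bulk body and reports the version to start from on the next poll.

```python
import json

# Query a poller could run with the last synced version as the parameter
# ("?" is a driver-style placeholder; table and columns are hypothetical).
CHANGES_SQL = """
SELECT ct.ListingId, ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_VERSION,
       l.Address, l.City
FROM CHANGETABLE(CHANGES dbo.Listing, ?) AS ct
LEFT JOIN dbo.Listing AS l ON l.ListingId = ct.ListingId
ORDER BY ct.SYS_CHANGE_VERSION
"""


def changes_to_bulk(rows, index, doc_type):
    """Turn change rows into a _bulk body and the highest version seen."""
    lines = []
    max_version = None
    for row in rows:
        meta = {"_index": index, "_type": doc_type,
                "_id": str(row["ListingId"])}
        if row["SYS_CHANGE_OPERATION"] == "D":
            # Deletes are a single action line with no document.
            lines.append(json.dumps({"delete": meta}))
        else:
            # Inserts (I) and updates (U) both reindex the whole document.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(
                {"address": row["Address"], "city": row["City"]}))
        version = row["SYS_CHANGE_VERSION"]
        max_version = version if max_version is None else max(max_version, version)
    body = "\n".join(lines) + "\n" if lines else ""
    return body, max_version


rows = [
    {"ListingId": 1, "SYS_CHANGE_OPERATION": "I", "SYS_CHANGE_VERSION": 10,
     "Address": "123 Main St", "City": "Kansas City"},
    {"ListingId": 2, "SYS_CHANGE_OPERATION": "U", "SYS_CHANGE_VERSION": 11,
     "Address": "456 Oak Ave", "City": "Overland Park"},
    {"ListingId": 3, "SYS_CHANGE_OPERATION": "D", "SYS_CHANGE_VERSION": 12,
     "Address": None, "City": None},
]
body, next_version = changes_to_bulk(rows, "properties", "listing")
print(body)
print(next_version)
```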

31 Real-ish time search You *could* build this with SQL Server Service Broker, or any other queue system. You should build it using something like Apache Camel, BusinessWorks, or Mule. This is ESB territory. Doing it the right way requires a mature, modern software stack with a data access layer. Doing it the wrong way will cause lots of sync problems.

32 Closing Thoughts
Engineering and prototyping
Don’t overbuild
Monitor carefully

