Solr Power FTW Alex #solrnosql. What Will I Cover? Who I am What Bazaarvoice does SOLR and NoSQL Can SOLR handle 20K queries per second?

Slides:



Advertisements
Similar presentations
DynaTrace Platform.
Advertisements

Inner Architecture of a Social Networking System Petr Kunc, Jaroslav Škrabálek, Tomáš Pitner.
Module 13: Performance Tuning. Overview Performance tuning methodologies Instance level Database level Application level Overview of tools and techniques.
2 Proprietary & Confidential What is Sharding Benefits of Sharding Alternatives of Sharding When to start Sharding Agenda.
Relational Database Alternatives NoSQL. Choosing A Data Model Relational database underpin legacy applications and meet business needs However, companies.
By: Chris Hayes. Facebook Today, Facebook is the most commonly used social networking site for people to connect with one another online. People of all.
Reporter: Haiping Wang WAMDM Cloud Group
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
NoSQL and NewSQL Justin DeBrabant CIS Advanced Systems - Fall 2013.
CS 405G: Introduction to Database Systems 24 NoSQL Reuse some slides of Jennifer Widom Chen Qian University of Kentucky.
Chapter 9 Overview  Reasons to monitor SQL Server  Performance Monitoring and Tuning  Tools for Monitoring SQL Server  Common Monitoring and Tuning.
1 Yasin N. Silva Arizona State University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Performance and Scalability. Performance and Scalability Challenges Optimizing PerformanceScaling UpScaling Out.
First Indico Workshop Database Technology Pedro Ferreira May 2013 CERN.
USING HADOOP & HBASE TO BUILD CONTENT RELEVANCE & PERSONALIZATION Tools to build your big data application Ameya Kanitkar.
AN INTRODUCTION TO NOSQL DATABASES Karol Rástočný, Eduard Kuric.
ZhangGang, Fabio, Deng Ziyan /31 NoSQL Introduction to Cassandra Data Model Design Implementation.
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
Getting Biologists off ACID Ryan Verdon 3/13/12. Outline Thesis Idea Specific database Effects of losing ACID What is a NoSQL database Types of NoSQL.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
Modern Databases NoSQL and NewSQL Willem Visser RW334.
NoSQL Not Only SQL Edel Sherratt. What is NoSQL? Not Only SQL Large volumes of data No schema Partition tolerance – scale by adding more commodity servers.
Cloud Computing Clase 8 - NoSQL Miguel Johnny Matias
Methodological Foundations of Biomedical Informatics (BMSC-GA 4449) Himanshu Grover.
When bet365 met Riak and discovered a true, “always on” database.
Processing of the WLCG monitoring data using NoSQL J. Andreeva, A. Beche, S. Belov, I. Dzhunov, I. Kadochnikov, E. Karavakis, P. Saiz, J. Schovancova,
MongoDB is a database management system designed for web applications and internet infrastructure. The data model and persistence strategies are built.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
NoSQL Systems Motivation. NoSQL: The Name  “SQL” = Traditional relational DBMS  Recognition over past decade or so: Not every data management/analysis.
NOSQL DATABASE Not Only SQL DATABASE
Grid Technology CERN IT Department CH-1211 Geneva 23 Switzerland t DBCF GT IT Monitoring WG Technology for Storage/Analysis 28 November 2011.
Your Data Any Place, Any Time Performance and Scalability.
Distributed Logging Facility Castor External Operation Workshop, CERN, November 14th 2006 Dennis Waldron CERN / IT.
Cloudera Kudu Introduction
Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, From SQL to NoSQL Xiao Yu Mar 2012.
21 Copyright © 2008, Oracle. All rights reserved. Enabling Usage Tracking.
Group members: Phạm Hoàng Long Nguyễn Huy Hùng Lê Minh Hiếu Phan Thị Thanh Thảo Nguyễn Đức Trí 1 BIG DATA & NoSQL Topic 1:
Improve query performance with the new SQL Server 2016 query store!! Michelle Gutzait Principal Consultant at
Cofax Scalability Document Version Scaling Cofax in General The scalability of Cofax is directly related to the system software, hardware and network.
This document is provided for informational purposes only and Microsoft makes no warranties, either express or implied, in this document. Information.
Intro to NoSQL Databases Tony Hannan November 2011.
Plan for Final Lecture What you may expect to be asked in the Exam?
CSE-291 (Distributed Systems) Winter 2017 Gregory Kesden
CS 405G: Introduction to Database Systems
Mail call Us: / / Hadoop Training Sathya technologies is one of the best Software Training Institute.
DBSI Teaser Presentation
and Big Data Storage Systems
Understanding and Improving Server Performance
Designing High Performance BIRT Reports
Table General Guidelines for Better System Performance
Table spaces.
CS122B: Projects in Databases and Web Applications Winter 2017
Introduction In the computing system (web and business applications), there are enormous data that comes out every day from the web. A large section of.
MongoDB Er. Shiva K. Shrestha ME Computer, NCIT
Java 9: The Quest for Very Large Heaps
Modern Databases NoSQL and NewSQL
NOSQL.
NOSQL databases and Big Data Storage Systems
Software Architecture in Practice
CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden
Blazing-Fast Performance:
1 Demand of your DB is changing Presented By: Ashwani Kumar
NOSQL and CAP Theorem.
ColdFusion Performance Troubleshooting and Tuning
Table General Guidelines for Better System Performance
Cloud computing mechanisms
(A Research Proposal for Optimizing DBMS on CMP)
NoSQL databases An introduction and comparison between Mongodb and Mysql document store.
build a real time operational data lake in minutes.
Presentation transcript:

Solr Power FTW Alex #solrnosql

What Will I Cover? Who I am What Bazaarvoice does SOLR and NoSQL Can SOLR handle 20K queries per second? Lessons learned: large scale multi data center deployment Conclusion

Alex Pinkin Software Engineering Lead, Data Infrastructure team, Bazaarvoice Loves to play with SQL and

Bazaarvoice Bazaarvoice is a software as a service company powering user generated content such as ratings and reviews on thousands of web sites 5 billion page views per month 230 billion impressions 75 million UGC

NoSQL ?

SQL vs NoSQL

NoSQL is Not Only SQL Departs from relational model No fixed schema No joins Eventual consistency is OK Scale horizontally

Types of NoSQL Key-value (Redis, Riak, Voldemort) Document (MongoDB, CouchDB) Graph (Neo4J, FlockDB) Column family (Cassandra, HBase)

SOLR as NoSQL Non-relational model - Check No fixed schema - Check (dynamic fields) No joins - Check (denormalization) Horizontal scaling - Check (with work)

SOLR stats - Bazaarvoice

SOLR Case Study

SOLR Case Study

Life Before SOLR Indexes for sorting and filtering Aggregate tables for stats Nightly jobs Bugs...

Enter SOLR Index content and product catalog De-normalization Filtering and sorting Index every 15 minutes (20 seconds NRT)

SOLR - Statistics COUNT, SUM, AVG, MIN, MAX (StatsComponent) Stored fields Whenever content changes, re-calc stats for all affected subjects

Scaling reads - Replication

Replication - Multiple Data Centers

Chatty if using multiple cores Relay Core auto-warming disabled Connection wait and read timeouts increased Replication poll interval increased (15 min) Compression enabled :15:00 internal...

SOLR Cloud - Bazaarvoice version Multiple cores (100+ per server) Re-balance indexes across cores and servers o Automatic o Manual Deployment map stored in MySQL o Host - Core - Partition o Statistics Partition lifecycle

Schema Changes Re-indexing is time consuming for large indexes Process 1.Full re-index off-line prior to the release 2.Incremental indexing after the release Bottleneck: reading from MySQL Goal: Transparent re-indexing

Performance Tuning Heap size Cache sizing Auto-warming Stored fields Merge factor Commit frequency Optimize frequency Process: Simulate and measure Replay logs Analyze metrics Monitor GC

Performance Tuning - GC # Java memory usage settings # Force the NewSize to be larger than the JVM typically allocates. # In practice, the JVM has been allocating an extremely small Young generation which objects to be prematurely promoted to the Tenured generation JAVA_MEM_OPTS="-Xms27g -Xmx27g -XX:NewRatio=8" # -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps --> Turn on GC Logging # -XX:+UseConcMarkSweepGC --> Use the concurrent collector # -XX:+CMSIncrementalMode --> Incremental mode for the concurrent collector # -XX:+CMSIncrementalPacing --> Let the JVM adjust the amount of incremental collection JAVA_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps - XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=55 - XX:ParallelGCThreads=8 -XX:SurvivorRatio=4"

SOLR Performance - Summary SOLR loves RAM! Log replay Same config, same hardware Get the most out of one instance SOLR

Conclusion - SOLR Strengths Lightning fast given enough RAM Good scale out support including multi-data center Great community

Conclusion - SOLR's Gaps Not fully elastic Real time takes work Secondary data store = sync overhead Schema changes