Data modeling for microservices with Cassandra and Spark


1 Data modeling for microservices with Cassandra and Spark
Jeff Carpenter, Choice Hotels International. Thanks for staying for the last session of the conference. I'm Jeff Carpenter, system architect at Choice Hotels International. In this session we're going to talk about why data modeling is important for both transactional and analytics systems, and how we've put this into practice at Choice as we're building new systems.

2 Agenda
1 IT Transformation – Distribution and Analytics
2 Creating a Data Architecture
3 Data Modeling for Microservices
4 Using Metadata for Diagnostics and Analytics
5 Challenges
Overview of what we'll cover.

3 IT Capabilities
Domains: Corporate IT, Guest, Franchise Relations, Hotel Management, Business Intelligence, Distribution (this talk)
Choice Hotels is a technology-centric company. We operate according to a franchise model, and a lot of the value proposition of our IT organization is based on the services we provide to our franchisees. As a result we're continually innovating and looking to modernize key systems. We've divided our IT capabilities into several domains, including guest, franchise relations, hotel property management, and our corporate IT services. This presentation focuses especially on systems in the distribution domain, such as our reservation system, and the business intelligence domain, which includes analytics and reporting systems. In particular, we're going to focus on the relationship between the distribution domain and the BI domain.

4 Distribution - Central Reservation System
Diagram: Guest Domain, Franchisee Domain, Hotel Management Domain, Distribution Domain (CRS, Web and Mobile, External Channels), Business Intelligence Domain, Customer & Loyalty, Billing, Property Systems, Reporting & Analytics
Key systems in the distribution domain include the Central Reservation System (CRS) and our website and mobile apps. Last year we launched new versions of our website and mobile app, and we are currently working on a new reservation system. The reservation system interfaces with many of our other IT systems, so replacing it is a major undertaking. Internal channels like our website and mobile applications allow customers to shop and book rooms, and external channels do as well. We interface with property systems so our franchisees can tell us about their room types, rates, and inventory. We interface with customer and loyalty systems to credit stays and support reward reservations. Reporting and billing systems pull information about reservations.

5 Current Reservation System – By The Numbers
25 years | 6,000 hotels | 50 distribution channels | 4,000 transactions / second | 1 instance
Our current reservation system is over 25 years old, written in C and running on a large UNIX box with a traditional RDBMS. We're currently making reservations for over 6,000 hotels worldwide and distributing to over 50 different channels – everything from our own website and mobile apps to GDS and OTA partners. This system is very performant and reliable, servicing over 4,000 TPS. However, the system scales vertically; we need horizontal scalability for future growth.

6 New Systems: Distribution and Data Platforms
Distribution Platform | Data Platform | Realtime data | History
This talk: how we model data and use the self-service platform. See: "Choice Hotels's journey to better understand its customers through self-service analytics."
Pulling back the covers a bit, we have the unique opportunity to make major improvements to two important areas of the enterprise at once. We're replacing our legacy Central Reservation System with a new Distribution Platform, which we hope will have the longevity of the previous one. We're also modernizing our business intelligence approach with a new data platform. One of the major themes of this talk is the relationship between these two systems and the role that data modeling plays in it. Specifically, we're using the data platform to capture changes as they occur in the distribution platform for analytics purposes. There are also some use cases in the distribution system where we need to access that historic data for customer service and diagnostic purposes, so we are actually implementing a limited capability to pull some of that data back out. I'm not going to give a detailed presentation of the self-service data platform because my colleague Narasimhan from Choice and Avinash from Clairvoyant have already given a great talk on that earlier today. Instead we'll focus on how the distribution platform makes use of capabilities of the data platform.

7 Distribution Platform - Architecture Tenets
Cloud-native | Microservices | Open Source Infrastructure | Extensibility | Stable, Scalable, Secure
Here are some of the tenets of our architecture. We designed for the cloud, to run anywhere, in multiple data centers worldwide. We wanted a microservices architecture based primarily on RESTful APIs and event publishing. We use open source infrastructure as much as feasible. Since this system needs to work for the next 25 years, we want a design which is easily extensible to new features and business areas. The key architectural -ilities we repeat again and again are scalability, stability, and security.

8 What is a Microservice? (one definition)
Diagram: Composing Service, Entity Service, Message Driven Service, Client, REST API, Events, AMQ, Persistence, DB – Data Ownership
So when I say we have a microservice-based architecture, that could mean a lot of things. We use a mixture of synchronous and asynchronous approaches in order to support shopping and booking hotel rooms and notification of various partners. In our architecture, a typical microservice exposes a RESTful API which allows it to be accessed by clients. The entity service manages the persistence of a specific data type, and publishes events when data is created, updated, or deleted. We have other types of services which compose the entity services, and message-driven services which respond to events and generate other events or interface with external systems. But the bedrock principle I want to call out is data ownership: every data type is owned by a single service, and that service owns its persistence.
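The entity-service pattern described above can be sketched in a few lines. This is an illustrative, minimal sketch only – the class and method names (EventBus, HotelService) are hypothetical and not Choice's actual APIs – but it shows the bedrock idea: one service owns one data type's persistence and publishes an event on every mutation.

```python
import json
import uuid


class EventBus:
    """Stand-in for an AMQ/Kafka publisher (hypothetical, for illustration)."""

    def __init__(self):
        self.published = []

    def publish(self, event):
        # Events are serialized as JSON, matching the event format shown later
        self.published.append(json.dumps(event))


class HotelService:
    """Entity service: sole owner of hotel records and their persistence."""

    def __init__(self, bus):
        self._store = {}   # stand-in for the Cassandra-backed persistence
        self._bus = bus

    def create(self, hotel):
        hotel_id = str(uuid.uuid4())
        self._store[hotel_id] = hotel
        # Publish a change event so downstream systems can react
        self._bus.publish({"type": "CREATE", "id": hotel_id, "newEntity": hotel})
        return hotel_id

    def get(self, hotel_id):
        return self._store.get(hotel_id)


bus = EventBus()
svc = HotelService(bus)
hid = svc.create({"name": "Example Hotel"})
```

No other service touches the hotel store directly; composing and message-driven services go through the API or consume the events.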

9 How can we design our data architecture & models to be…
Scalable? Extensible? Maintainable? Analytics-ready?
So, we're designing a new distribution platform – a greenfield design, with so many choices available; the world is new…

10 Our Data Stack
Non-relational storage | Long-term storage | Logging | Reporting & Analytics | Metrics
First, let's talk about our technology selection – these are a few of the elements in our data stack. We use Cassandra as our primary data store. We use Amazon S3 and Glacier for medium- to long-term storage. We use the ELK stack for logging. We use Spark, Impala, and other technologies in our data platform for reporting and analytics. We use KairosDB as our metrics store and Grafana to construct operational dashboards. For messaging, we use ActiveMQ when message ordering is required and Kafka for fast streaming between systems.

11 Data Modeling – Then and Now
Then: isolated systems → data dictionary; SOA → canonical data model. Now: services own data.
Conceptual data model – identifying domains and relationships
Logical data model – identifying data types and relationships
Physical models – Java APIs, RESTful APIs (JSON), Events (JSON), Cassandra schemas
With that technology foundation, we recognized at the project outset the important role that data modeling would play in the development of our reservation system, especially since it touches so many other systems. In the past, our enterprise consisted of multiple stovepipe systems, each with their own data models. There were efforts to reconcile these in a corporate data dictionary, which was a massive undertaking. As we started doing more SOA-style work, we began wrapping many of these systems, and developed a canonical data model as a way to enforce a common language across these services. This is a centralized definition, which proved difficult to maintain, so when we started work on the new reservation system, we decided to allow each service to own its own data model. We used the classic levels of data modeling – conceptual, logical, and physical – to drive our identification of data types in the system and the microservices that manage each data type, and then the various physical representations we need to drive software development. The next few slides take us through that process.

12 Conceptual Data Model - Domains
hotels | rates | inventory | offers | reservations
Let's introduce some of the key data types within the distribution domain. Hotels – descriptive data about the hotels, their products, and their policies; quite static. Rates – the prices that are charged for the products; these can change many times a day, and could include an automated pricing system. Inventory – constantly changes as rooms are booked, cancelled, etc.; data quality and currency are extremely important here so we don't oversell our hotels. Reservations – the contract with the customer; generally only changed when initiated by the customer, so changes are infrequent. (Talk to nuances on inventory buckets, rate plans, packages, rules.)

13 Conceptual Data Model – Domain Relationships
Diagram: Distribution Domain (hotels, rates, inventory, offers, reservations); Guest Domain (guest, loyalty); Hotel Management Domain (stay)
An important part of our conceptual data model was defining the boundaries and relationships between domains. This includes the distribution domain and its sub-domains, and relationships to other domains. As we see, the inventory and rates domains reference the hotel domain – the inventory and rates are for products at specific hotels. Offers and reservations, in turn, reference hotels, inventory, and rates. As we look to relationships outside the distribution domain, a reservation can reference customer and loyalty accounts that are managed by other systems. In these cases our reservation system holds references to those external data types, since the system of record is external. Another interesting case comes when there are data types that form the boundary of relationships between systems. This occurs in the case of the reservation. Reservations are created and managed both by central reservation systems and by property management systems, which reside in the Hotel Management domain. In our case, we have an internal representation of a reservation which forms the basis of our exchange with the property management systems. They also need a copy of the reservation so they can manage the guest stay. While it is possible for a reservation itself to be updated while the guest is on property (for example, adding an extra night), we've made a clear boundary so that stay information doesn't start creeping into our reservation definition.

14 Logical Data Model – Identifying Types
Diagram: Rates Domain – Composite Rate Service, Rate Plan Service, Rate Service
Rate Plan: id, code, hotelId, effectiveDates, conditions
Rate: id, ratePlanId, productId, hotelId, dateSpan
Product: id, code, hotelId, features
Price: condition, amount
Let's consider the rates domain; this is a sub-domain within distribution. As we begin to model this in UML, we see there are distinct types for rates and rate plans. Rate plans comprise the rules that hotels use to describe how to get access to a particular set of rates. The rates describe what customers will be charged on a given day, at a given hotel, in association with a rate plan. The rates themselves may consist of multiple price points – for example, a one-person rate, a two-person rate, an extra-person rate. We may have references from these data types to data types outside the rates domain. For example, a rate references a product or products to which it applies; this is a unique ID reference. We draw boundaries around portions of the data model to be owned by each service. In this example we derive microservices for rate plans and rates. In this way each service represents a bounded context. However, from a deployment perspective, it may make sense to reduce the number of services, especially if rates and rate plans are most frequently accessed together (as they are). Identifying potential services at a fairly low level and then potential compositions helps us make sure we maintain an extensible design.
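The logical model above translates directly into types. As a sketch – assuming Python dataclasses in place of the Java classes the project actually uses, and with illustrative field values – note how the rate holds only an ID reference to the product, since products live outside the rates bounded context:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Price:
    condition: str   # e.g. "single occupancy" – illustrative value
    amount: float


@dataclass
class RatePlan:
    id: str
    code: str
    hotelId: str
    effectiveDates: str
    conditions: List[str] = field(default_factory=list)


@dataclass
class Rate:
    id: str
    ratePlanId: str   # ID reference within the rates domain
    productId: str    # ID reference across the domain boundary, not an object
    hotelId: str
    dateSpan: str
    prices: List[Price] = field(default_factory=list)
```

Keeping cross-domain references as plain IDs is what lets each service own its data model independently.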

15 Standardizing Common Data Types
Instead of a canonical data model, we standardize basic building blocks: Feature, Category, Brand; geospatial; financial; time; contact information.
Address: lines[], city, subdivision, country, postalCode
An example of the common building blocks that demonstrates why standardization is important is the concept of an address. A problem with some of our historic systems has been inconsistent support across systems for a varying number of address lines – many were coded for two lines, some for three. We're constructing our new services to support addresses with a variable number of lines, and using validation to control how long the list can be.

16 Data Types → Microservice Identification
Internal / External Client Apps | Data Maintenance Apps
Hotel Service, Rates Service, Inventory Service, Offer Service, Reservation Service (Hotel, Rates, Inventory, Offer, and Reservation Domains)
This summarizes the sort of results that arise from identifying services based on the logical data model – we end up with microservices organized around the domains of hotels, rates, inventory, offers, and reservations. Each of the services serves as the owner of a specific set of data types, approaching a shared-nothing architecture style. We're then able to build client applications such as our website and mobile apps on top of these services, and build integrations with external partners. We also built data maintenance applications to: synchronize data from other systems – our legacy system as well as some other systems that will stay in operation, such as property management systems; verify data accuracy across systems and across service boundaries; and correct data issues caused by defects.

17 Physical Data Models
Physical models: Java APIs, RESTful APIs (JSON), Events (JSON), Cassandra schemas
JSON = primary definition of the data type owned by each service
Once we've identified our services, we can approach the physical data modeling associated with each service as an internal concern of that service. This includes the Java and RESTful APIs, events published by the service, and Cassandra schemas for data storage. These representations are all derived from the logical data model. We decided that the JSON representation of the resource owned by the service was the authoritative definition of the resource from the perspective of external services.

18 Key Data Types → RESTful Resource Paths
Hotel Service /hotels | Offer Service /offers | Rates Service /rates | Reservation Service /reservations | Inventory Service /inventory
Services are organized around the RESTful resource paths they own. Consider these RESTful resource paths as namespaces – they need to be managed as well.

19 Java and RESTful APIs – common pattern
RESTful API → Java API
GET /types/<id> → Type getTypeById()
GET /types?<query parameters> → Type[] searchType(TypeSearchCriteria)
POST /types/ (JSON body) → createType(Type)
PUT /types/ (JSON body) → updateType(Type)
DELETE /types/<id> → deleteType(TypeId)
Our usage of RESTful APIs helped reinforce the focus on data types. While we do not adhere to some of the strictest definitions of what is RESTful, focusing on resources rather than actions helped keep our APIs clean and relatively free of RPC-style interactions. The common pattern that emerged for both our Java and RESTful APIs was to have simple CRUD operations. The cookie-cutter approach was helpful in being able to generate common templates to kickstart development on each new service.

20 Cassandra Data Modeling
(an idealized view)

21 Cassandra Data Modeling – Access Patterns

22 Cassandra Data Modeling – Chebotko Diagrams

23 Cassandra Data Modeling - Physical

24 Cassandra Data Modeling - Schemas
CREATE KEYSPACE hotel WITH replication =
  {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TYPE hotel.address (
  street text,
  city text,
  state_or_province text,
  postal_code text,
  country text
);

CREATE TABLE hotel.hotels_by_poi (
  poi_name text,
  hotel_id text,
  name text,
  phone text,
  address frozen<address>,
  PRIMARY KEY ((poi_name), hotel_id)
) WITH CLUSTERING ORDER BY (hotel_id ASC);

25 And now… Back to reality

26 Access Patterns and Denormalization
Keyspace hotel. Example tables: hotels_by_id, hotels_by_brand, hotels_by_postal_code, … Example access patterns: locate hotel by identifier; find hotels by brand; find hotels by postal code; find hotels by city, state, country; find hotels by amenity; find hotels within X miles of point Y; and so on.
One of the tenets of Cassandra data modeling is to identify access patterns and design tables around those access patterns. We followed this pretty strictly at first, but soon ran into cases where adding a table per unique access pattern proved to be too much. Take, for example, hotels and the number of ways by which various clients can search for them. Since hotel records are quite large, imagine the impact of all of these tables on our cluster size and storage requirements. We reined this in by designing tables to support multiple queries, selective usage of indexes, and some filtering at the service layer, which helped control our computing costs. We're also looking to move to Cassandra 3.X in order to take advantage of materialized views, which will allow us to shift some of the processing burden back to the database.
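The "one table, multiple queries" compromise can be sketched as follows. This is an illustrative sketch, not our actual schema: the in-memory rows stand in for a single location-keyed table, the country/state/city narrowing stands in for what the database key would do, and the amenity check shows residual filtering done at the service layer instead of in another denormalized table.

```python
# (country, state, city, hotel_id, amenities) – illustrative data
HOTELS_BY_LOCATION = [
    ("US", "AZ", "Phoenix",    "AZ123", {"pool", "wifi"}),
    ("US", "AZ", "Scottsdale", "AZ456", {"wifi"}),
    ("US", "CA", "San Diego",  "CA789", {"pool"}),
]


def find_hotels(country, state=None, city=None, amenity=None):
    # Key-based narrowing: what the database would do for us
    rows = [r for r in HOTELS_BY_LOCATION if r[0] == country]
    if state:
        rows = [r for r in rows if r[1] == state]
    if city:
        rows = [r for r in rows if r[2] == city]
    # Residual predicate: filtered at the service layer, not by another table
    if amenity:
        rows = [r for r in rows if amenity in r[4]]
    return [r[3] for r in rows]
```

One table now serves several access patterns, at the cost of some filtering work moving into the service.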

27 Metadata – Request Context
Diagram: Incoming request → Service; Request Context (Requestor, Tracking ID, Token, Locale) flows into events (AMQ) and logs (ELK Stack).
Switching gears: the concept of metadata was very important to us. Keeping a common request context helps us track interactions between services and events, find key interactions in log files, and so on.

28 Asynchronous Events
Request Context: Requestor, Tracking ID, Token, Locale
Event: Type (Create / Update / Delete), Request Context, Entity (old/new) Id, Old entity, New entity
Sample inventory event:
{
  "type": "UPDATE",
  "trackingId": "0da7b794-f2c3-…",
  "requestor": "Legacy CRS",
  "newEntity": {
    "hotelId": "AZ123",
    "productId": "NSK",
    "date": " ",
    "consumedCount": "22",
    "totalCount": "25"
  },
  "oldEntity": {
    "productId": "NSK",
    "date": " ",
    "consumedCount": "20",
    "totalCount": "25"
  }
}
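Because each event carries both the old and new entity, a consumer can compute exactly what changed without calling back into the owning service. A small sketch, using a trimmed version of the inventory event above (the helper name `changed_fields` is hypothetical):

```python
def changed_fields(event):
    """Return the fields whose values differ between oldEntity and newEntity."""
    old = event.get("oldEntity", {})
    new = event.get("newEntity", {})
    return {k: (old.get(k), new.get(k))
            for k in set(old) | set(new)
            if old.get(k) != new.get(k)}


# Trimmed inventory event, modeled on the sample above
inventory_event = {
    "type": "UPDATE",
    "trackingId": "0da7b794-f2c3-…",
    "requestor": "Legacy CRS",
    "newEntity": {"productId": "NSK", "consumedCount": "22", "totalCount": "25"},
    "oldEntity": {"productId": "NSK", "consumedCount": "20", "totalCount": "25"},
}
```

This old/new pairing is what makes the event stream usable as a change history downstream.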

29 Putting It Together – Diagnostics
Diagram: Incoming Request → Service → C* node; data → Data Platform (history); logs → ELK Stack; metrics → Metrics Store
Putting this all together in the context of a service: a service receives an incoming request whose data includes metadata. The data is written to Cassandra. An event is generated, which is captured as history. The operation, metadata, and data identifiers are logged. The elapsed time for the operation is captured as metrics. The operations team is then able to configure alarms on service state and metrics.
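The diagnostics flow above can be sketched as a request handler that stamps a tracking ID, emits a structured log line, and captures elapsed time as a metric. This is an illustrative sketch only; the function and field names are assumptions, not our actual service framework.

```python
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("example-service")  # hypothetical service name


def handle_request(operation, entity_id, context=None, work=lambda: None):
    # Ensure every request carries a tracking ID so the same interaction
    # can be found in events, logs, and metrics.
    context = dict(context or {})
    context.setdefault("trackingId", str(uuid.uuid4()))

    start = time.monotonic()
    work()  # the actual business operation (write to Cassandra, publish event)
    elapsed_ms = int((time.monotonic() - start) * 1000)

    # Structured log line: operation, metadata, and data identifiers
    log.info(json.dumps({"operation": operation, "entityId": entity_id,
                         "trackingId": context["trackingId"]}))

    # Elapsed time would be shipped to the metrics store
    return {"trackingId": context["trackingId"], "elapsedMs": elapsed_ms}
```

The same trackingId value then appears in the event, the log line, and the metric, which is what gives operations a single thread to pull on.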

30 Putting It Together – Long Term Storage
Data Platform ELK Stack Metrics Store C* node Long Term Storage We have policies in place to ensure all of our application, logging and metrics data is captured in appropriate tools for long term storage and archival in S3 and Glacier

31 Separating Active and History Data
Diagram: Rate + Inventory Data over time – now vs. yesterday; yesterday's data is ancient history.
We have separated the shopping and booking concerns from our analysis and history uses, which means that in the reservation system, data in the past is not of much use. As we insert data into Cassandra, we set the TTL for when it will no longer be needed, which saves us from developing our own cleanup process and reduces our storage footprint. We still need the historic data for analysis and customer service purposes, though, so we store it in a separate data platform, which we feed from the reservation system using asynchronous event processing.
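Setting the TTL at insert time amounts to computing how long the row is still useful. A minimal sketch of that computation, assuming a hypothetical one-day grace period after the stay date (not our actual retention policy):

```python
from datetime import datetime, timedelta, timezone

def ttl_seconds(stay_date, grace=timedelta(days=1), now=None):
    """Seconds until a rate/inventory row for stay_date can expire.

    grace is an assumed retention buffer, for illustration only.
    """
    now = now or datetime.now(timezone.utc)
    remaining = (stay_date + grace) - now
    # Never return a negative TTL; 0 means "already expirable"
    return max(0, int(remaining.total_seconds()))
```

The resulting value would be supplied with the write (e.g. `INSERT … USING TTL <seconds>` in CQL), letting Cassandra handle the cleanup.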

32 Data Platform - Cloudera
History capture: Service → AMQ → Kafka → Spark node → S3 (other subscribers as well)
History retrieval: Customer Service Apps → History Service → Impala
Let's talk about how we architected the history features of our platform. Allowing past data to expire via TTL means our data needs to be present somewhere else. Since we're already capturing the event streams in the data platform, we can reach back into that platform to access data for our customer service applications. These are the applications we use to help answer customer questions about their reservation, including what was changed, when it was changed, and who changed it. We also use these applications when things go wrong to diagnose problems. My colleague Narasimhan has introduced our data platform in a separate talk, so I won't reproduce it here; I'll refer you to his talk. On ingestion, we tie into ActiveMQ event queues published by our services and bridge the events to Kafka to stream them into our Spark cluster backed by S3. To retrieve historical data, our customer service apps call history services, which make SQL queries via Impala. The teams work closely together to manage the SLAs for this data retrieval.

33 Microservice Data Challenges
No Joins? | Data Maintenance | Data Integrity | Cascading Deletes | Transactions
Here are some of the challenges we've encountered in working with this kind of architecture. When data is spread across multiple services, it can be hard to get a picture of the relationships between data types – you can't just do a join in Cassandra. We're investigating the use of Spark in some environments in order to support ad hoc searching and exploration. There are also challenges to maintaining data in this environment. Teams have to be conscious that changing data in one service may affect other services. What happens if I delete a hotel using the Hotel Service but don't delete the inventory and rates? We've had to put tighter controls around deletes and manage cascading deletes at the application level to prevent these issues. Thankfully, data deletion is more of a maintenance activity than a regular operational practice. Another issue comes when we need to commit changes to multiple data types at the same time. Let's look at an example.

34 Distributed Transactions, Anyone?
Diagram: Booking Client → Reservation Service (commit the contract; reservations) and → Inventory Service (reserve the inventory; inventory); Data Maintenance Apps also perform data synchronization against inventory.
One of the challenges of a microservices architecture is keeping changes in sync across service boundaries. One example situation is booking a reservation. Since the reservation represents our contract with the customer to reserve a specific room at a specific price and with certain conditions, we need to mark a reservation as committed at the same time as we reserve the inventory. This is important so that we don't accidentally overbook our hotel. Making the situation more complicated, there could be simultaneous bookings and data maintenance activities also trying to access the same inventory. Since these types are split across microservice boundaries, there is no transaction mechanism. In fact, since the data is in different rows (and different tables), Cassandra's lightweight transactions are of no use to us here. We solved this with a layered approach – LWTs to protect inventory counts, retries within the booking service, and compensating processes to detect and clean up failures.
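The first two layers of that approach – a conditional write protecting the inventory count, plus retries in the booking service – can be sketched as follows. This is a simplified, single-process sketch: `reserve_if` mimics the read-then-conditional-write shape of a Cassandra LWT (`UPDATE … IF consumed_count = ?`), and the class and function names are hypothetical.

```python
class InventoryStore:
    """Stand-in for an inventory row protected by an LWT-style conditional write."""

    def __init__(self, total):
        self.consumed = 0
        self.total = total

    def reserve_if(self, expected):
        # Applies only if our earlier read is still current AND a room remains;
        # this is the compare-and-set a lightweight transaction provides.
        if self.consumed != expected or self.consumed >= self.total:
            return False
        self.consumed += 1
        return True


def book(store, retries=3):
    """Retry loop in the booking service around the conditional write."""
    for _ in range(retries):
        expected = store.consumed       # read the current count
        if store.reserve_if(expected):  # attempt the conditional write
            return True                 # safe to commit the reservation contract
    return False  # give up; a compensating process would detect and clean up
```

The third layer, compensation, runs separately: it scans for reservations whose inventory write never succeeded (or vice versa) and repairs them.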

35 Alternatives to Distributed Transactions
(Strong consistency → eventual consistency)
Approach | Example | Scope
C* lightweight transaction | Updating inventory counts | Data tier
C* logged batch | Writing to multiple denormalized hotel tables | Data tier
Retrying failed calls | Data synchronization, reservation processing | Service
Compensating transactions | Verifying reservation processing | System
Thankfully we have a variety of tools in our toolbox for guaranteeing consistency. Some of these are provided by Cassandra, but some of them are architectural approaches.

36 Final Thoughts
Data Models > Microservices – use data modeling to identify key types and bounded contexts, and let that drive the microservice design.
Events = Streams – events are great for decoupling services and systems, and you can leverage the streams for history as well.
Use Metadata Everywhere – use metadata across services and infrastructure to allow a common thread for debugging and performance analysis.

37 Now Available! Cassandra: The Definitive Guide, 2nd Edition
Completely reworked for Cassandra 3.X: data modeling in CQL, SASI indexes, materialized views, lightweight transactions, DataStax drivers. New chapters on security, deployment, and integration.

38 Contact Info @choicehotels careers.choicehotels.com @jscarp jeffreyscarpenter

