1 SQL Server 2017 Everything Built In—Technical Overview

2 Microsoft vision for a new era
United platform for the modern service provider: customer datacenter, service provider, and Microsoft Azure on one consistent platform.
Enterprise-grade: global reach, scale, and security to meet business demands.
Hybrid cloud: a consistent platform across multiple environments and clouds.
People-focused: expands technical skill sets to the cloud for new innovation.
Speaker notes: This slide shows the Microsoft cloud vision. Microsoft is uniquely able to deliver the same enterprise-grade SQL Server experience, with the same tools, on-premises, in a service provider datacenter, and in Azure. Partners and customers use this consistency to reduce infrastructure costs and to improve database resilience, high availability, and performance. SQL Server 2017 adds new functionality that helps partners innovate in and modernize the database business, covered in the slides that follow.

3 SQL Server 2017: industry-leading performance and security, now on Linux and Docker
Choice of platform and language: T-SQL, Java, C/C++, C#/VB.NET, PHP, Node.js, Python, Ruby.
Industry-leading performance: #1 OLTP performance, #1 DW performance, #1 price/performance.
Most secure database over the last 7 years.
Only commercial database with AI built in: R and Python with in-memory processing at massive scale, plus native T-SQL scoring.
End-to-end mobile BI on any device, at a fraction of the cost (self-service BI per user: Microsoft $120, Tableau $480, Oracle $2,230).
In-memory across all workloads; the most consistent data platform from private to public cloud; about 1/10th the cost of Oracle.
Speaker notes: That's not all it can do. SQL Server 2017 continues to deliver industry-leading capabilities. Our latest performance benchmarks on Windows and Linux surpass our previous records: we hold the #1 OLTP TPC-E performance result, and we have the fastest-performing data warehouse with the best price/performance. We offer the most secure database: according to the US National Institute of Standards and Technology (NIST), SQL Server has had fewer vulnerabilities over the last seven years than Oracle or IBM, and fewer vulnerabilities mean less patching for you. SQL Server is the first commercial database with advanced analytics using R and Python built in. Why does this matter? You can now use SQL Server to operationalize your data science models in a secure and performant way, and use native T-SQL commands to score data in near real time. Unlike our competitors, mobile BI on every device comes built in, and you can add access to powerful self-service BI visualizations through Power BI at a fraction of competitors' cost. SQL Server 2017 gives you your choice of platform and language, the most consistent on-premises-to-cloud environment, and does all this for about 1/10th the cost of Oracle.

4 SQL Server 2017 Meeting you where you are
It's the same SQL Server Database Engine, with many features and services available for all your applications, regardless of your operational ecosystem. Languages: T-SQL, Java, C/C++, C#/VB.NET, PHP, Node.js, Python, Ruby.
Any data: Access diverse data, including video, streaming, documents, and relational data, both external and internal to your organization. Use PolyBase to access Hadoop big data and Azure Blob storage with the simplicity of T-SQL. Use Azure DocumentDB, a NoSQL document database service, for native JSON support and JavaScript built directly into the database engine.
Any application: Use the T-SQL skills of your talent base to run advanced analytics through R/Python models and to access structured and unstructured data. Take advantage of Microsoft-created database connectivity drivers and open-source drivers that enable developers to build any application using the platforms and tools of their choice, including Python, Ruby, and Node.js.
Anywhere: Flexible on-premises and in the cloud, with easy backup to the cloud. You can now migrate a SQL Server workload to Azure SQL Database; the feature parity is there, and the notion that SQL Server doesn't map to Azure SQL Database is no longer relevant. Keep more historical data at your fingertips by dynamically stretching tables to the cloud with Stretch Database.
Choice of platform: Aligns to your operating system environment. SQL Server is now available on Windows/Windows Server, Linux, and Docker. Benefit from continued integration with Windows Server for industry-leading performance, scale, and virtualization on Windows.
(Note: Tux penguin image created by Larry Ewing.)

5 How we develop SQL
SQL Server and APS on-premises, SQL Server in Azure Virtual Machines, Azure SQL Database (DB), and Azure SQL Data Warehouse (DW) share a common engineering approach.
Cloud-first, but not cloud-only.
We use Azure SQL Database to improve core SQL Server features and release cadence.
Many interesting and compelling on-premises-to-cloud scenarios.

6 Consistency and integration
A consistent experience from SQL Server on-premises to Microsoft Azure IaaS and PaaS:
On-premises, private cloud, and public cloud.
SQL Server local (Windows and Linux), VMs (Windows and Linux), containers, and SQL Database.
Common development, management, and identity tools, including Active Directory, Visual Studio, Hyper-V, and System Center.
Scalability, availability, security, identity, backup and restore, and replication.
Many data sources; reporting, integration, processing, and analytics.
All supported in the hybrid cloud.

7 SQL Server 2017—new features

8 Database Engine new features
Linux/Docker support: RHEL, Ubuntu, SLES, and Docker.
Adaptive query processing (faster queries just by upgrading): interleaved execution, batch-mode memory grant feedback, and batch-mode adaptive joins.

9 Database Engine new features
Automatic tuning:
Automatic plan correction: identify, and optionally fix, problematic query execution plans causing query performance problems.
Automatic index management: make index recommendations (Azure SQL Database only).
Graph:
Store relationships using nodes/edges, and analyze interconnected data using node/edge query syntax:
SELECT r.name
FROM Person AS p, likes AS l1, Person AS p2, likes AS l2, Restaurant AS r
WHERE MATCH(p-(l1)->p2-(l2)->r)
AND p.name = 'Chris';
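For context, the MATCH query above assumes node and edge tables already exist. A minimal sketch of how such tables might be created, reusing the Person, Restaurant, and likes names from the query (the rating column and the sample restaurant name are illustrative, not from the slide):
-- Node tables hold the entities; AS NODE adds the graph metadata columns.
CREATE TABLE Person (
    ID INT PRIMARY KEY,
    name NVARCHAR(100)
) AS NODE;
CREATE TABLE Restaurant (
    ID INT PRIMARY KEY,
    name NVARCHAR(100)
) AS NODE;
-- Edge tables hold the relationships; AS EDGE adds $from_id and $to_id columns.
CREATE TABLE likes (rating INT) AS EDGE;
-- Connect two nodes by inserting their $node_id values into the edge table.
INSERT INTO likes ($from_id, $to_id, rating)
SELECT p.$node_id, r.$node_id, 5
FROM Person AS p, Restaurant AS r
WHERE p.name = 'Chris' AND r.name = 'Contoso Bistro';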

10 Database Engine new features
Enhanced performance for natively compiled T-SQL modules: OPENJSON, FOR JSON, and JSON operations; CROSS APPLY; computed columns.
New string functions: TRIM, CONCAT_WS, TRANSLATE, and STRING_AGG with support for WITHIN GROUP (ORDER BY); see the sketch below.
Bulk import now supports the CSV format and Azure Blob storage as a file source.
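As a quick illustration of the new string functions, here is a hedged sketch; the dbo.Orders table and its columns are assumptions for the example, not from the slide:
-- STRING_AGG concatenates values across rows; WITHIN GROUP controls the order.
SELECT customer_id,
       STRING_AGG(product_name, ', ') WITHIN GROUP (ORDER BY product_name) AS products
FROM dbo.Orders
GROUP BY customer_id;
-- TRIM strips leading/trailing spaces; TRANSLATE does character-for-character
-- replacement; CONCAT_WS joins values with a separator.
SELECT TRIM('  padded value  ') AS trimmed,
       TRANSLATE('2*[3+4]/{7-2}', '[]{}', '()()') AS translated,
       CONCAT_WS(' - ', 'SQL Server', '2017', 'Linux') AS joined;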

11 Database Engine new features
Native scoring with the T-SQL PREDICT function.
Resumable online index rebuild: pause and resume online index rebuilds (see the sketch below).
Clusterless read-scale availability groups: unlimited, geo-distributed, linear read scaling.
(Diagram: one primary replica P with read-only secondaries S1 to S4.)
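A minimal sketch of the resumable rebuild workflow; the index and table names are hypothetical:
-- Start an online, resumable rebuild capped at 60 minutes of run time.
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders
REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 60 MINUTES);
-- Pause the rebuild (for example, ahead of a heavy workload window)...
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders PAUSE;
-- ...and pick it up again later from where it left off.
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders RESUME;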

12 Integration Services new features
Integration Services scale out: distribute SSIS package execution more easily across multiple workers, and manage executions and workers from a single master computer.
Integration Services on Linux: run SSIS packages on Linux computers (currently with some limitations).
Connectivity improvements: connect to the OData feeds of Microsoft Dynamics AX Online and Microsoft Dynamics CRM Online with the updated OData components.

13 Analysis Services new features
1400 compatibility level for tabular models.
Object-level security for tabular models.
Get Data enhancements: new data sources with parity with Power BI Desktop and Excel 2016; a modern experience for tabular models.
Enhanced ragged hierarchy support: a new Hide Members property to hide blank members in ragged hierarchies.
Detail Rows: a custom row set contributing to a measure value; drill through to more detail than the aggregated level in tabular models.

14 Reporting Services new features
Comments: comments are now available for reports, to add perspective and collaborate with others; you can also include attachments with comments.
Broader DAX support: with Report Builder and SQL Server Data Tools, you can create native DAX queries against supported tabular data models by dragging desired fields to the query designers.
Standalone installer: SSRS is no longer distributed through SQL Server setup.
Power BI Report Server.

15 Machine Learning Services new features
Python support: Python and R scripts are now supported. revoscalepy, the Pythonic equivalent of RevoScaleR, provides parallel algorithms for data processing with a rich API.
MicrosoftML: a package of machine learning algorithms and transforms (with Python bindings), as well as pretrained models for image feature extraction or sentiment analysis.
Note: Microsoft R Services has been renamed Microsoft Machine Learning Services. A sketch of in-database Python execution follows.
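To show what running Python inside the engine looks like, here is a minimal, hedged sketch using sp_execute_external_script; the input query and column names are illustrative, and external scripts must be enabled first:
-- One-time setup: allow external scripts (the Launchpad service runs them).
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;
-- Run a Python script over a T-SQL result set; InputDataSet/OutputDataSet are
-- the conventional pandas data frames exchanged with the engine.
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet.assign(doubled = InputDataSet.val * 2)',
    @input_data_1 = N'SELECT 1 AS val UNION ALL SELECT 2'
WITH RESULT SETS ((val INT, doubled INT));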

16 SQL Server on Linux

17 Evolution of SQL Server
Businesses are embracing choice, and Microsoft is delivering on it. The world is demanding SQL Server on Linux: more than 20,000 applications signed up for the private preview, a large part of the enterprise DB market runs on Linux, and roughly 36% of Azure runs Linux.
Heterogeneous environments: HDInsight on Linux; R Server on Linux; multiple data types; different development languages (T-SQL, PHP, Java, Node.js, C/C++, Python, C#/VB.NET, Ruby).
SQL Server drivers and connectivity for on-premises, cloud, and hybrid environments.
Visual Studio Code extension for SQL Server.

18 Power of the SQL Server Database Engine on the platform of your choice
Windows: Windows Server/Windows 10; Windows containers.
Linux distributions: Red Hat Enterprise Linux (RHEL), Ubuntu, and SUSE Linux Enterprise Server (SLES); Linux containers.
Docker: Windows and Linux containers.
Speaker notes: To get the most from their data, customers need flexibility in their choice of platform, programming languages, and data infrastructure. Why? In most IT environments, platforms, technologies, and skills are as diverse as they have ever been. The data platform of the future needs to let you build intelligent applications on any data, on any platform, in any language, both on-premises and in the cloud. SQL Server manages your data across platforms that require all kinds of skills, both on-premises and in the cloud. Our goal is to meet you where you are, on any platform, with the tools and languages of your choice. The SQL Server Database Engine now supports Windows, Linux, and Docker containers.

19 Same license, new choice
Buying a SQL Server license gives you the option to use it on Windows Server, Linux, or Docker. Regardless of where you run it (VM, Docker, physical, cloud, or on-premises), the licensing model is the same; available features depend on which edition of SQL Server you use.

20 Linux-native user experience
Standard installation process: package-based installation using yum for Fedora-based distributions, apt-get for Debian-based distributions, and zypper for SLES; existing package update/upgrade processes apply to SQL Server upgrades.
Familiar experience: the SQL Server service runs natively using systemd; Linux file paths are supported in T-SQL statements and scripts (defining/changing the path, database backup files); popular Linux high-availability solutions like Pacemaker and Corosync are supported.
Cross-platform tools: SQL Server command-line tools (sqlcmd, bcp) are available for Linux, with macOS versions available as a preview at the time of writing; existing Windows tools such as SQL Server Management Studio (SSMS), SQL Server Data Tools (SSDT), and the PowerShell module (sqlps) can manage SQL Server on Linux from Windows; the Visual Studio Code extension for SQL Server runs on macOS, Linux, or Windows.
Speaker notes: Microsoft has focused on providing a Linux-native user experience for SQL Server, starting with the installation process. Installing SQL Server 2017 uses the standard package-based installation method for Linux. Administrators can update SQL Server 2017 instances on Linux using their existing package update/upgrade processes. The SQL Server service runs natively using systemd, and performance can be monitored through the file system, as for other system daemons. High-availability clustering can be managed with popular Linux solutions like Pacemaker and Corosync. Microsoft also offers tools such as Migration Assistant, supported on Linux, to assist with moving existing workloads to SQL Server.

21 Supported platforms
Platform | Supported version(s) | Supported file system(s)
Red Hat Enterprise Linux | 7.3 | XFS or EXT4
SUSE Linux Enterprise Server | v12 SP2 | EXT4
Ubuntu | 16.04 | EXT4
Docker Engine (on Windows, Mac, or Linux) | 1.8+ | N/A
The SQL Server Docker container is built with Ubuntu and SQL Server for Linux. For full system requirements, see "System requirements for SQL Server on Linux" in the documentation.

22 Cross-system architecture
SQL Platform Abstraction Layer (SQLPAL): a host extension maps to OS system calls (I/O, memory, CPU scheduling) on either Windows or Linux. Above it sit Win32-like APIs, the SQL OS API, and SQL OS v2, which host the RDBMS, AS, IS, and RS components. System-resource and latency-sensitive code paths go through SQL OS; everything else goes through the Win32-like APIs.
Speaker notes: The Platform Abstraction Layer (PAL) enables SQL Server to run on Linux and Docker by consolidating OS/platform-specific code so that the rest of the SQL Server code base can be OS-agnostic. The SQL Server team set strict requirements to ensure that functionality, performance, and scale were not compromised when deployed to Linux. Part of what makes this possible is the integration of certain aspects of the Microsoft Research (MSR) Drawbridge project. Drawbridge provided an abstraction between the underlying operating system and the application for the purposes of secure containers. Drawbridge was combined with SQL Server OS, which provides memory management, thread scheduling, and I/O services, to create SQLPAL. In short, the PAL allows the same time-proven core code base for SQL Server to run in new environments such as Docker and Linux, as opposed to porting the Windows code base to multiple operating environments. SQL Server 2017 is not a rewrite or a port; it is the same performant, scalable product that Microsoft customers have relied on for years.

23 Installing SQL Server on Linux
Add the SQL Server repository to your package manager.
Install the mssql-server package.
Run mssql-conf setup to configure the SA password and edition.
Configure the firewall to allow remote connections (optional).
The commands below are for RHEL 7:
sudo curl -o /etc/yum.repos.d/mssql-server.repo
sudo yum update
sudo yum install -y mssql-server
sudo /opt/mssql/bin/mssql-conf setup
sudo firewall-cmd --zone=public --add-port=1433/tcp --permanent
sudo firewall-cmd --reload
Speaker notes: Installing SQL Server on Linux is easy, and follows a pattern familiar to Linux administrators. SQL Server Tools, SQL Server Agent, Integration Services, and Full-Text Search each have their own additional installers. Follow the links from the SQL Server on Linux overview page to find detailed installation instructions for your platform.

24 What's installed?
SQL Server runtime and associated libraries: /opt/mssql/bin/ and /opt/mssql/lib/
Data and log files for SQL Server databases: /var/opt/mssql/data/ and /var/opt/mssql/log/
SQL Server Agent, Full-Text Search, SSIS, and mssql-tools are installed independently of the SQL Server service.

25 Tools and programmability
Windows-based SQL Server tools, like SSMS, SSDT, and Profiler, work when connected to SQL Server on Linux.
All existing drivers and frameworks are supported, and third-party tools continue to work.
Native command-line tools: sqlcmd and bcp.
Visual Studio Code mssql extension.
Speaker notes: Because the Linux and Windows versions of SQL Server use the same code base, existing applications, drivers, frameworks, and tools connect to and operate with SQL Server on Linux without modification. (The original slide's screenshot shows the Visual Studio Code mssql extension in operation.) A quick platform check from any of these tools follows.
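One quick way to confirm, from any connected tool, which platform an instance is running on is the sys.dm_os_host_info DMV introduced in SQL Server 2017; a minimal sketch:
-- Returns one row describing the host OS,
-- e.g. host_platform = 'Linux', host_distribution = 'Ubuntu'.
SELECT host_platform, host_distribution, host_release
FROM sys.dm_os_host_info;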

26 Client connectivity
SQL Server client drivers are available for many programming languages, on Windows, Linux, and macOS, including:
C#: Microsoft ADO.NET for SQL Server
Java: Microsoft JDBC Driver for SQL Server
PHP: PHP SQL Driver for SQL Server
Node.js: Node.js Driver for SQL Server
Python: Python SQL Driver
Ruby: Ruby Driver for SQL Server
C++: Microsoft ODBC Driver for SQL Server
This list of languages and drivers is not exhaustive. Any language that supports ODBC data sources should be able to use the ODBC drivers, and any language based on the JVM should be able to use the JDBC or ODBC drivers. The Microsoft ODBC Driver for SQL Server is available in native versions for Windows, Linux, and macOS.

27 What’s available on Linux?
DMVs; table partitioning; SQL Server Agent; Full-Text Search; Integration Services; Active Directory (integrated) authentication; TLS for encrypted connections.
Operations features: support for RHEL, Ubuntu, SLES, and Docker; package-based installs; support for OpenShift and Docker Swarm; failover clustering via Pacemaker; backup/restore; SSMS on Windows connected to Linux; command-line tools (sqlcmd, bcp); Transparent Data Encryption; backup encryption; SCOM management pack.
Speaker notes: SQL Server on Linux aims to support the core relational database engine capabilities. In general, our goal is "it's just SQL Server." 95 percent of features just work, particularly anything app- or coding-related. Some features have partial support (for example, SQL Server Agent will not launch a Windows command prompt), some are in progress, and some we will never support (for example, FileTable, which exposes a Win32 share for files that show up in the engine). A later slide contains more details. While it's the same SQL Server that you may (or may not) be used to, we are putting in a lot of effort to be a good Linux citizen.
Call-outs: Package-based installs: if SQL Server is coming to Linux, we're going to do it right. Failover clustering: resilience against OS/SQL failures, with automatic failover within seconds. Log shipping: warm standbys for DR. Cross-platform CLI: sqlcmd lets you connect and query from any OS; bcp lets you bulk copy data. In-Memory: 30-100x performance increases by keeping tables in memory and using natively compiled queries. Columnstore: why SQL Server leads the Gartner Magic Quadrant for Data Warehousing and holds the top three slots in the performance benchmark. Always Encrypted: protect your most sensitive data, even from high-privileged database administrators. AD authentication: no need to manage separate credentials for SQL Server on Linux.

28 What’s available on Linux?
CLR; JSON and XML; third-party tools.
Programming features: all major language driver compatibility; In-Memory OLTP; columnstore indexes; Query Store; compression; Always Encrypted; Row-Level Security and Data Masking; auditing; Service Broker.

29 Features not currently supported on Linux
Database Engine: transactional replication; merge replication; Stretch DB; PolyBase; distributed query with third-party connections; system extended stored procedures (xp_cmdshell, etc.); FileTable; CLR assemblies with the EXTERNAL_ACCESS or UNSAFE permission set; Buffer Pool Extension.
SQL Server Agent: subsystems (CmdExec, PowerShell, Queue Reader, SSIS, SSAS, SSRS); alerts.
Log Reader Agent; Change Data Capture; Managed Backup.
High availability: Database Mirroring.
Security: Extensible Key Management.
Services: SQL Server Browser; SQL Server R Services; StreamInsight; Analysis Services; Reporting Services; Data Quality Services; Master Data Services.
NB: This list is correct as of SQL Server 2017 RC2; the set of unsupported features might change in later releases. See the documentation for more information.

30 Operational features

31 In-Memory OLTP

32 In-Memory Online Transaction Processing (OLTP)
In-Memory OLTP is the premier technology available in SQL Server and Azure SQL Database for optimizing the performance of transaction processing, data ingestion, data load, and transient data scenarios. Memory-optimized tables outperform traditional disk-based tables, leading to more responsive transactional applications. They also improve throughput and reduce latency for transaction processing, and can help improve the performance of transient data scenarios such as temp tables and ETL. In-Memory OLTP is available in all editions of SQL Server 2017 (including Express Edition). This change was introduced in SQL Server 2016 Service Pack 1; before that, In-Memory OLTP was restricted to Enterprise Edition.

33 Steps for In-Memory OLTP
SQL Server provides In-Memory OLTP features that can greatly improve the performance of application systems.
It is recommended to set the database to the latest compatibility level, particularly for In-Memory OLTP:
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140;
GO
When a transaction involves both a disk-based table and a memory-optimized table, the memory-optimized portion of the transaction must operate at the SNAPSHOT transaction isolation level; this setting elevates it automatically:
ALTER DATABASE CURRENT SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;
GO
Before you can create a memory-optimized table, you must first create a memory-optimized filegroup and a container for data files:
ALTER DATABASE AdventureWorks ADD FILEGROUP AdventureWorks_mod CONTAINS MEMORY_OPTIMIZED_DATA;
GO
ALTER DATABASE AdventureWorks ADD FILE (NAME='AdventureWorks_mod', FILENAME='c:\var\opt\mssql\data\AdventureWorks_mod') TO FILEGROUP AdventureWorks_mod;

34 Memory-optimized tables
In short, memory-optimized tables are stored in main memory as opposed to on disk. Memory-optimized tables are fully durable by default; data is persisted to disk in the background. Memory-optimized tables can be accessed with T-SQL, but are accessed more efficiently with natively compiled stored procedures.

35 Memory-optimized tables
The primary store for memory-optimized tables is main memory; unlike disk-based tables, data does not need to be read into memory buffers from disk.
To create a memory-optimized table, use the MEMORY_OPTIMIZED = ON clause (the bucket count below is illustrative; size it to the expected number of distinct index values):
CREATE TABLE dbo.ShoppingCart (
    ShoppingCartId INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
    UserId INT NOT NULL INDEX ix_UserId NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    CreatedDate DATETIME2 NOT NULL,
    TotalPrice MONEY
) WITH (MEMORY_OPTIMIZED = ON);
GO
Insert records into the table (the IDENTITY column is populated automatically):
INSERT dbo.ShoppingCart VALUES (8798, SYSDATETIME(), NULL);
INSERT dbo.ShoppingCart VALUES (23, SYSDATETIME(), 45.4);
INSERT dbo.ShoppingCart VALUES (80, SYSDATETIME(), NULL);
INSERT dbo.ShoppingCart VALUES (342, SYSDATETIME(), 65.4);

36 Natively compiled stored procedures
Natively compiled stored procedures are Transact-SQL stored procedures that are compiled to native code and can access memory-optimized tables, allowing efficient execution of the queries and business logic in the stored procedure. Native compilation enables faster data access and more efficient query execution than interpreted (traditional) Transact-SQL.
What's the difference? An interpreted stored procedure is compiled at first execution, whereas a natively compiled stored procedure is compiled when it is created. With natively compiled stored procedures, many error conditions (such as arithmetic overflow, type conversion, and some divide-by-zero conditions) can be detected at create time and will cause creation of the natively compiled stored procedure to fail. With interpreted stored procedures, these error conditions typically do not cause a failure when the stored procedure is created, although all executions will fail. Natively compiled stored procedures implement a subset of T-SQL. A sketch of the syntax follows.
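A minimal, hedged sketch of a natively compiled procedure against the dbo.ShoppingCart table from the earlier slide; the procedure name and logic are illustrative:
-- NATIVE_COMPILATION requires SCHEMABINDING and an ATOMIC block.
CREATE PROCEDURE dbo.usp_AddShoppingCartItem
    @UserId INT,
    @TotalPrice MONEY
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH
    (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- Runs as machine code; the whole block is one atomic transaction.
    INSERT dbo.ShoppingCart (UserId, CreatedDate, TotalPrice)
    VALUES (@UserId, SYSDATETIME(), @TotalPrice);
END;
GO
EXEC dbo.usp_AddShoppingCartItem @UserId = 451, @TotalPrice = 19.99;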

37 In-Memory OLTP enhancements (SQL Server 2016)
Better T-SQL coverage, including: full collations support in native modules; query surface area improvements; nested stored procedures (EXECUTE); natively compiled scalar user-defined functions; Query Store support.
T-SQL surface area additions: {LEFT | RIGHT} OUTER JOIN; disjunction (OR, NOT); UNION [ALL]; SELECT DISTINCT; subqueries (EXISTS, IN, scalar).
Other improvements: full schema change support (add/alter/drop column/constraint) via ALTER TABLE; increased size allowed for durable tables; Multiple Active Result Sets (MARS) support.
Hash indexes can be rebuilt with a new bucket count (the value below is illustrative):
ALTER TABLE Sales.SalesOrderDetail ALTER INDEX PK_SalesOrderID REBUILD WITH (BUCKET_COUNT = 2000000);
Speaker notes: In-Memory OLTP was introduced in SQL Server 2014 and enhanced in SQL Server 2016, which removed several limitations. SQL Server 2014 did not allow any changes to memory-optimized tables after creation; in SQL Server 2016, ALTER TABLE can be used on memory-optimized tables to add, drop, or alter columns, or to add, drop, or rebuild indexes. SQL Server 2016 also broadened the T-SQL query surface area (left and right outer joins, UNION ALL, DISTINCT, and more) and supports nesting of natively compiled stored procedures. Other T-SQL native compilation improvements include full collations support in native modules, natively compiled scalar user-defined functions (accessible from both native and interop code, with improved performance in traditional workloads when no table access is required), and Query Store support. Scaling also improved through a large increase in the total size of durable tables, increased socket availability, and larger log generation capacity for Always On configurations. Additional improvements include full collations support in index key columns; all code pages supported for (var)char columns; FOREIGN KEY, CHECK, and UNIQUE constraints; natively compiled AFTER DML triggers on memory-optimized tables; Row-Level Security; temporal tables (history is disk-based); MARS support; parallel plans in interop for reporting queries; and Transparent Data Encryption (TDE). SQL Server Management Studio adds the Transaction Performance Analysis report to evaluate In-Memory OLTP and improve database application performance, plus lightweight migration reports and checklists that show unsupported features used in current disk-based tables and interpreted T-SQL stored procedures.

38 In-Memory OLTP enhancements (SQL Server 2017)
sp_spaceused is now supported for memory-optimized tables.
sp_rename is now supported for memory-optimized tables and natively compiled T-SQL modules.
CASE statements are now supported in natively compiled T-SQL modules.
The limitation of eight indexes on memory-optimized tables has been eliminated.
TOP (N) WITH TIES is now supported in natively compiled T-SQL modules.
ALTER TABLE against memory-optimized tables is now substantially faster in most cases.
Transaction log redo of memory-optimized tables is now done in parallel, which speeds up recovery and significantly increases the sustained throughput of Always On Availability Group configurations.
Memory-optimized filegroup files can now be stored on Azure Storage, and backup/restore of memory-optimized files on Azure Storage is supported.
Support for computed columns in memory-optimized tables, including indexes on computed columns (see the sketch below).
Full support for JSON functions in natively compiled modules, and in check constraints.
CROSS APPLY operator in natively compiled modules.
Performance of B-tree (nonclustered) index rebuilds for memory-optimized tables during database recovery has been significantly optimized, substantially reducing database recovery time when nonclustered indexes are used.
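A small hedged sketch of the computed-column support mentioned above, extending the illustrative dbo.ShoppingCart table; the column name and the 10% tax rate are assumptions made purely for the example:
-- Computed columns on memory-optimized tables are new in SQL Server 2017.
ALTER TABLE dbo.ShoppingCart
ADD TotalPriceWithTax AS (TotalPrice * 1.10);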

39 Real-time analytics/HTAP

40 Real-time analytics/HTAP
SQL Server's support for columnstore indexes and In-Memory OLTP lets you generate analytics in real time, directly from your transactional databases. This pattern is called hybrid transactional/analytical processing (HTAP), because it combines OLTP and OLAP in one database. Analytics can be performed on operational data with minimal overhead, and improving the timeliness of analytics adds significant business value.

41 Traditional operational/analytics architecture
Key issues: complex implementation; two servers required (capital and operational expenditures); data latency in analytics; high demand that requires real-time analytics.
Architecture: the application tier (IIS server) writes to an operational SQL Server database; ETL jobs run hourly, daily, or weekly into a SQL Server relational data warehouse; SQL Server Analysis Services feeds the presentation layer for BI analysts (BI and analytics, dashboards, reporting).

42 Minimizing data latency for analytics
Challenges: analytics queries are resource intensive and can cause blocking; the impact on operational workloads must be minimized; execution of analytics on a relational schema is sub-optimal.
Benefits: no data latency, no ETL, and no separate data warehouse. Analytics-specific indexes are added directly to the operational SQL Server database, and the same presentation layer (IIS server, SQL Server Analysis Services) serves BI analysts (BI and analytics, dashboards, reporting).
This is operational analytics: the ability to run analytics queries concurrently with operational workloads using the same schema. It is not a replacement for: extreme analytics query performance that is possible using customized schemas (star/snowflake) and pre-aggregated cubes; data coming from nonrelational sources; or data coming from multiple relational sources requiring integrated analytics.

43 Real-time analytics/HTAP
Real-time analytics/HTAP is the ability to run analytics queries concurrently with operational workloads using the same schema.
Goals: minimal impact on operational workloads from concurrent analytics; strong analytics performance against the operational schema.
Not a replacement for: extreme analytics performance that is possible only with customized schemas (for example, star/snowflake) and preaggregated cubes; data coming from nonrelational sources; data coming from multiple relational sources requiring integrated analytics.

44 HTAP: disk-based tables

45 HTAP with columnstore index
Key points:
Create an updateable nonclustered columnstore index (NCCI) for analytics queries, and drop all other indexes that were created for analytics.
No application changes are required; the columnstore index is maintained just like any other index.
The query optimizer will choose the columnstore index where needed. The sketch below shows the creation syntax.
(Diagram: a relational table (clustered index/heap) with B-tree indexes over hot data, plus an NCCI with delta row groups and a delete bitmap.)
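A hedged sketch of creating such an index on a hypothetical operational table; the table and column names are illustrative:
-- One updateable NCCI covering the columns analytics queries need;
-- OLTP DML continues to hit the rowstore and the NCCI delta store.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders
ON dbo.Orders (OrderId, CustomerId, OrderDate, Quantity, UnitPrice);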

46 Columnstore index overhead
DML operations on an OLTP workload:
Operation | B-tree (NCI) | Nonclustered columnstore index (NCCI)
Insert | Insert row into B-tree. | Insert row into B-tree (delta store).
Delete | Seek row(s) to be deleted; delete the row. | Seek row in delta stores (there can be multiple rows); if found, delete the row; if not found, insert the key into the delete row buffer.
Update | Seek the row(s); update. | Delete the row (same steps as above); insert the updated row into the delta store.

47 Minimizing columnstore overhead
Key points:
Create a columnstore index only on cold data, using a filtered predicate, to minimize maintenance.
Analytics queries access both the columnstore and the "hot" data transparently.
Example (order management application; the column list is elided on the slide):
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders_Shipped ON dbo.Orders (...) WHERE order_status = 'SHIPPED';
(Diagram: a relational table (clustered index/heap) with B-tree indexes over hot rows, and a filtered nonclustered columnstore index with delta row groups and a delete bitmap over cold rows.)

48 Using Availability Groups instead of data warehouses
Key points:
Mission-critical operational workloads are typically configured for high availability using Always On Availability Groups.
You can offload analytics to a readable secondary replica.
Speaker notes: To configure an Always On Availability Group to support read-only routing in SQL Server 2016, you use either Transact-SQL or PowerShell. Read-only routing refers to the ability of SQL Server to route qualifying read-only connection requests to an available Always On readable secondary replica (that is, a replica that is configured to allow read-only workloads when running under the secondary role). To support read-only routing, the availability group must possess an availability group listener. Read-only clients must direct their connection requests to this listener, and the clients' connection strings must specify the application intent as "read-only"; that is, they must be read-intent connection requests. A sketch of the routing configuration follows.
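A hedged sketch of configuring read-only routing with T-SQL; the availability group and server names are hypothetical:
-- Allow read-intent connections when SQLNODE2 is in the secondary role,
-- and advertise the URL used for read-only connections.
ALTER AVAILABILITY GROUP [AG1]
MODIFY REPLICA ON N'SQLNODE2' WITH
    (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY,
                     READ_ONLY_ROUTING_URL = N'TCP://sqlnode2.contoso.com:1433'));
-- On the primary, define the routing list for read-intent requests.
ALTER AVAILABILITY GROUP [AG1]
MODIFY REPLICA ON N'SQLNODE1' WITH
    (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'SQLNODE2', N'SQLNODE1')));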

49 HTAP: In-Memory Tables

50 Columnstore on In-Memory tables
Key points:
Rows (the tail) not yet in the columnstore stay in the In-Memory OLTP table; there is no explicit delta row group, and no columnstore index overhead when operating on the tail.
A background task migrates rows from the tail to the columnstore in chunks of 1 million rows.
A Deleted Rows Table (DRT) tracks deleted rows.
Columnstore data is fully resident in memory and is persisted together with the operational data.
No application changes are required.
(Diagram: an In-Memory OLTP table with hash and nonclustered indexes, a hot tail, a deleted rows table, and a columnstore index.)
Speaker notes: For an overview of columnstore indexes, see Columnstore Indexes Described. To create a clustered columnstore index, first create a rowstore table as a heap or clustered index, and then use the CREATE CLUSTERED COLUMNSTORE INDEX (Transact-SQL) statement to convert the table; if you want the clustered columnstore index to have the same name as the clustered index, use the DROP_EXISTING option. You can add data to an existing clustered columnstore index using any of the standard loading methods; for example, the bcp bulk loading tool, Integration Services, and INSERT ... SELECT can all load data into a clustered columnstore index. Clustered columnstore indexes use the deltastore to prevent fragmentation of column segments in the columnstore. Rows accumulate in the deltastore until the count reaches the maximum number of rows allowed for a rowgroup, at which point SQL Server marks the rowgroup as CLOSED; a background process called the tuple-mover finds CLOSED rowgroups and moves them into the columnstore, where each rowgroup is compressed into column segments. There can be multiple deltastores for each clustered columnstore index: if a deltastore is locked, SQL Server tries to obtain a lock on a different deltastore, and if none is available, it creates a new one; for a partitioned table, there can be one or more deltastores per partition. Clustered columnstore indexes support insert, update, and delete DML operations. INSERT adds the row to the deltastore. DELETE marks a columnstore row as logically deleted (physical storage is reclaimed only when the index is rebuilt), while a deltastore row is logically and physically deleted. UPDATE marks a columnstore row as logically deleted and inserts the updated row into the deltastore, while a deltastore row is updated in place.

51 Operational analytics: Columnstore overhead
DML operations on In-Memory OLTP (HK = Hekaton, the In-Memory OLTP engine; HK-CCI = a memory-optimized table with a clustered columnstore index):
Operation | Hash or range index | HK-CCI
Insert | Insert row into HK. | Insert row into HK (the tail); no columnstore work is needed.
Delete | Seek the row(s) to be deleted; delete the row. | Delete the row in HK; if the row is in the tail, return; if not, insert <colstore-RID> into the DRT.
Update | Seek the row(s) to be updated; update (delete/insert). | Seek the row(s) to be updated; update (delete/insert) in HK.
Speaker notes (DML triggers): SQL Server now supports AFTER triggers for INSERT, UPDATE, and DELETE operations on memory-optimized tables. Triggers on memory-optimized tables must be natively compiled, and natively compiled triggers are only supported on memory-optimized tables. Like all native modules, native triggers must be schemabound and consist of an ATOMIC block. To create a trigger, right-click the Triggers node for the memory-optimized table in Management Studio and select New Natively Compiled Trigger.

52 Minimizing columnstore index overhead
Key points:
A delta rowgroup is only compressed after the compression_delay duration, which minimizes or eliminates index fragmentation.
Syntax: CREATE NONCLUSTERED COLUMNSTORE INDEX <name> ON <table> (<columns>) WITH (COMPRESSION_DELAY = 30);
(Diagram: an In-Memory OLTP table with hash and range indexes, a hot tail, a DRT, and an updateable CCI; the tail acts like a delta rowgroup.)
Speaker notes: Changes to data in a columnstore index are aggregated into delta rowgroups before being compressed and added to the index. If many delta rowgroups are added to the index (because of the speed of data change), the columnstore index can become fragmented. The overhead of maintaining columnstore indexes in an HTAP system can be minimized by setting the compression_delay property, which reduces or eliminates fragmentation by delaying the compression of delta rowgroups for a number of minutes.

53 High availability

54 Mission critical availability
Reliable: detects failures reliably; handles multiple failures at once.
Integrated: provides a unified, simplified solution; streamlines deployment, management, and monitoring.
Flexible: reuses existing investments; offers SAN/DAS environments.
Efficient: allows use of HA hardware resources; supports fast, transparent failover.

55 Always On: Failover cluster instances and Availability Groups
Failover cluster instances (for servers): failover at the SQL Server instance level; shared storage (SAN/SMB); failover can take minutes, based on load; multi-node clustering; passive secondary nodes.
Availability Groups (for groups of databases): failover at the database level; direct-attached storage; failover takes seconds; multiple secondaries; active secondaries.

56 Failover cluster instances
Server failover with shared storage; multi-node clustering; passive secondary nodes; failover in minutes. Both Windows and Linux failover clusters are supported.
(Diagram: two cluster nodes, each running SQL Server 2017, form a SQL Server failover cluster instance over shared storage.)
Speaker notes: Failover clustering is supported on both Windows and Linux. On Windows, failover clusters are built on Windows Server Failover Clustering (WSFC). On Linux, failover clusters are built on a third-party cluster manager, such as Pacemaker.

57 Configuring failover clusters on Linux
Set up and configure the operating system on each cluster node.
Install and configure SQL Server on each cluster node.
Configure shared storage and move the database files.
Install and configure Pacemaker on each cluster node.
Create the cluster.
Speaker notes: You can configure a shared-storage high-availability cluster on Linux to allow the SQL Server instance to fail over from one node to another. In a typical cluster, two or more servers (the cluster nodes) are connected to shared storage. Failover clusters provide instance-level protection and improve recovery time by allowing SQL Server to fail over between two or more nodes. Configuration steps depend on the Linux distribution and clustering solution; RHEL and SLES require a subscription for the HA add-on to support clustering configurations. NB: At this point, SQL Server's integration with Pacemaker on Linux is not as coupled as it is with WSFC on Windows. From within SQL Server, there is no knowledge of the presence of the cluster; all orchestration is outside-in, and the service is controlled as a standalone instance by Pacemaker. Also, the virtual network name is specific to WSFC; there is no equivalent in Pacemaker. See the documentation for full instructions and for guidance on using Linux failover clusters with availability groups.

58 Always On Availability Groups
Availability Groups: a high availability and disaster recovery solution in which one or several databases fail over together. SQL Server 2017 supports one primary and up to eight secondaries, for a total of nine replicas. Secondaries can be enabled as read-only replicas, which can be load balanced. An availability group can be built as a hybrid from a mixture of on-premises and Azure resources.
(Diagram: failover clusters on-premises and in Azure regions, each with its own storage, participating in a single availability group.)

59 High availability and disaster recovery
Simple HADR:
VM failure: resilience against guest- and OS-level failures; planned and unplanned events; minimum downtime for patching and upgrades; minutes RTO.
Backup/restore: protection against accidental or malicious data corruption; DR protection; minutes-to-hours RTO.
Log shipping: warm standbys for DR.
Standard HADR:
Failover cluster: instance-level protection; automatic failure detection and failover; seconds-to-minutes RTO; resilience against OS and SQL Server failures.
Basic Availability Groups: an AG with two replicas; replaces Database Mirroring.
Mission critical HADR:
Availability Groups: database-level protection; seconds RTO; no data loss; recover from unplanned outages; no downtime for planned maintenance; offload read/backup workloads to active secondaries; failover to a geographically distributed secondary site.

60 Availability Groups and failover clustering (Windows)
Always On: Failover Cluster Instances and Availability Groups work together to ensure data remains accessible despite failures.
(Diagram: a WSFC cluster spanning two network subnets and five nodes; a two-node Always On SQL Server Failover Cluster Instance over shared storage hosts the primary replica, standalone instances host the secondary replicas, and clients connect through the availability group listener's virtual network name.)
Speaker notes: Several layers of relationships exist between SQL Server Always On and Windows Server Failover Clustering (WSFC) features and components. Always On Availability Groups are hosted on SQL Server instances. A client request that specifies a logical availability group listener network name, to connect to a primary or secondary database, is redirected to the appropriate instance network name of the underlying SQL Server instance or SQL Server Failover Cluster Instance (FCI). SQL Server instances are actively hosted on a single node. If present, a standalone SQL Server instance always resides on a single node with a static instance network name; a SQL Server FCI is active on one of two or more possible failover nodes with a single virtual instance network name. Nodes are members of a WSFC cluster, and WSFC configuration metadata and status for all nodes are stored on each node. Each server might provide asymmetric storage or shared storage (SAN) volumes for user or system databases, and each server has at least one physical network interface on one or more IP subnets. The WSFC service monitors health and manages configuration for the group of servers, propagating changes to WSFC configuration metadata and status to all nodes in the cluster; partial metadata and status might be stored on a WSFC quorum-witness remote file share. Two or more active nodes or witnesses constitute a quorum to vote on the health of the WSFC cluster. Always On Availability Group registry keys are subkeys of the WSFC cluster. NB: The configuration shown on this slide is not one that customers typically use (one primary in an FCI, one local secondary replica, and two remote secondary replicas). The most common layout is the primary replica in an FCI in the primary datacenter for high availability (HA), with a secondary replica in a remote datacenter (different subnet) for disaster recovery (DR); in that case, the multiple secondary replicas might be used for read scale-out.

61 Availability Groups and failover clustering (Linux)
Always On: Failover Cluster Instances and Availability Groups work together to ensure data remains accessible despite failures; on Linux, the cluster manager is Pacemaker rather than WSFC.
(Diagram: the same topology as the previous slide, with a Pacemaker cluster in place of WSFC and a manually registered DNS name for the Pacemaker cluster virtual IP.)
Speaker notes: At this point, SQL Server's integration with Pacemaker on Linux is not as coupled as it is with WSFC on Windows. From within SQL Server, there is no knowledge of the presence of the cluster; all orchestration is outside-in, and the service is controlled as a standalone instance by Pacemaker. The virtual network name is specific to WSFC; there is no equivalent in Pacemaker. Always On dynamic management views that query cluster information will return empty rows. You can still create a listener to use for transparent reconnection after failover, but you will have to manually register the listener name in the DNS server with the IP used to create the virtual IP resource. NB: The configuration shown on this slide is not one that customers typically use (one primary in an FCI, one local secondary replica, and two remote secondary replicas). The most common layout is the primary replica in an FCI in the primary datacenter for HA, with a secondary replica in a remote datacenter (different subnet) for DR; in that case, the multiple secondary replicas might be used for read scale-out.

62 Mission critical availability on any platform
Always On cross-platform capabilities:
Always On Availability Groups on Linux (new) and Windows for HA and DR.
Flexibility for HA architectures (new): cross-operating-system availability groups with sync/async replicas spanning Linux and Windows, which enables testing and migrations.
Ultimate HA with OS-level redundancy and failover.
Load balancing of readable secondaries; offload backups; scale BI reporting.
Speaker notes: In SQL Server 2017, we enable the same high availability (HA) and disaster recovery (DR) solutions on all platforms supported by SQL Server, including Windows and Linux. Always On Availability Groups is SQL Server's flagship solution for HA and DR. An availability group can have up to eight readable secondary replicas, and each of these secondary replicas can have its own replicas; daisy-chained together, these readable replicas create massive scale-out for analytics workloads. This scale-out scenario lets you replicate around the globe, keeping read replicas close to your business analytics users; it is of particularly high interest to users with large data warehouse implementations, and it is easy to set up. In fact, you can now create availability groups that span Windows and Linux nodes, and scale out your analytics workloads across multiple operating systems. There is new flexibility to do HA without Windows Server Failover Clustering: failover clustering with Pacemaker and more, through integration scripts and guides. Also supported: Always On Availability Groups with automatic failover, listener, synchronous replication, and read-only secondaries; shared-disk failover clusters; backup and restore (.bak, .bacpac, and .dacpac); and log shipping.

63 Enhanced Always On Availability Groups (SQL Server 2016)
Greater scalability: load balancing across readable secondaries; an increased number of automatic failover targets; improved log transport performance; Distributed Availability Groups.
Improved manageability: DTC support; database-level health monitoring; Group Managed Service Accounts; domain-independent Availability Groups; basic HA in Standard Edition.
A unified HA solution: for example, synchronous data movement from New York (primary) to New Jersey (secondary) and asynchronous data movement to Hong Kong (secondary), behind AG_Listener.
Speaker notes: Load balancing of read-intent connection requests is now supported across a set of read-only replicas; the previous behavior always directed connections to the first available read-only replica in the routing list (see Configure load balancing across read-only replicas). The number of replicas that support automatic failover has increased from two to three. Group Managed Service Accounts are now supported for Always On Failover Clusters; for Windows Server 2012 R2, an update is required to avoid temporary downtime after a password change (see "GMSA-based services can't log on after a password change in a Windows Server 2012 R2 domain"). Always On Availability Groups supports distributed transactions and the DTC on Windows Server 2016 (see "SQL Server 2016 Support for DTC and Always On Availability Groups"). You can now configure Always On Availability Groups to fail over when a database goes offline, by setting the DB_FAILOVER option to ON in the CREATE AVAILABILITY GROUP (Transact-SQL) or ALTER AVAILABILITY GROUP (Transact-SQL) statements.

64 Enhanced Always On Availability Groups (SQL Server 2017)
Guarantee commits on synchronous secondary replicas: use REQUIRED_COPIES_TO_COMMIT with CREATE AVAILABILITY GROUP or ALTER AVAILABILITY GROUP. When REQUIRED_COPIES_TO_COMMIT is set to a value higher than 0, transactions at the primary replica's databases wait until the transaction is committed on the specified number of synchronous secondary replicas' transaction logs. If not enough synchronous secondary replicas are online, write transactions on the primary stop until communication with sufficient secondary replicas resumes. See the documentation for more information about distributed transactions for databases in availability groups. (A combined sketch of this option and CLUSTER_TYPE appears after the next slide.)
A unified HA solution: for example, synchronous data movement from New York (primary) to New Jersey (secondary) and asynchronous data movement to Hong Kong (secondary), behind AG_Listener.

65 Enhanced Always On Availability Groups (SQL Server 2017)
CLUSTER_TYPE: use with CREATE AVAILABILITY GROUP. Identifies the type of server cluster manager that manages the availability group. It can be one of the following (see the sketch below):
WSFC: Windows Server failover cluster. On Windows, this is the default value for CLUSTER_TYPE.
EXTERNAL: a cluster manager that is not a Windows Server failover cluster; for example, Pacemaker on Linux.
NONE: no cluster manager; used for a read-scale availability group.
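A hedged sketch combining the two options above, as they might appear when creating an availability group on Linux with Pacemaker; the server, database, and endpoint names are hypothetical, and the option names follow this deck's SQL Server 2017 syntax:
CREATE AVAILABILITY GROUP [ag1]
WITH (CLUSTER_TYPE = EXTERNAL,            -- managed by Pacemaker, not WSFC
      REQUIRED_COPIES_TO_COMMIT = 1)      -- commit must reach 1 sync secondary
FOR DATABASE [SalesDB]
REPLICA ON
    N'linuxnode1' WITH (
        ENDPOINT_URL = N'tcp://linuxnode1:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = EXTERNAL,          -- failover is driven by the external cluster manager
        SEEDING_MODE = AUTOMATIC),
    N'linuxnode2' WITH (
        ENDPOINT_URL = N'tcp://linuxnode2:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = EXTERNAL,
        SEEDING_MODE = AUTOMATIC);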

66 Build a mission critical enterprise application
Scenario All-Linux infrastructure Application-level protection Automatic and "within seconds" failover during unplanned outages No downtime during planned maintenance Performance-sensitive application DR required for regulatory compliance Solution HADR with Always On Availability Groups on Linux or Windows [Diagram: primary (P) with synchronous log synchronization to local HA replicas, which serve reports and backups, and asynchronous log synchronization to a DR replica]

67 Provide responsive regional BI with Azure and AG
Scenario Primary replica in on-premises datacenter Secondary read-only replicas in on-premises datacenter used for reporting/BI BI generated in other geographical regions performs poorly because of network bandwidth limitations No on-premises datacenters in other geographical regions Solution Hybrid Availability Group with read-only secondary in Azure (other region) [Diagram: primary (P) and secondaries (S1, S2) on-premises in Region A; read-only secondary (S3) and Power BI Server in the Azure cloud, Region B] The Availability Group spans both on-premises and cloud resources—a hybrid. In addition to bringing the BI replica geographically closer to the users in Region B, employing a read-only replica in Azure offers the opportunity to use Power BI Server in Azure for richer reporting.

68 Scale/DR with Distributed Availability Groups
Scenario Availability Group must span multiple datacenters Not possible to add all servers to a single WSFC (datacenter networks/inter-domain trust) Secondary datacenter provides DR Geographically distributed read-only replicas required Solution Distributed Always On Availability Groups on Linux or Windows [Diagram: Distributed Availability Group spanning AG 1 and AG 2 with asynchronous log synchronization between them] A Distributed Availability Group is a special type of availability group that spans two separate availability groups. The underlying availability groups are configured on two different Windows Server Failover Clustering (WSFC) clusters. The availability groups that participate in a Distributed Availability Group do not need to be in the same location. They can be physical, virtual, on-premises, in the public cloud, or anywhere that supports an availability group deployment. Provided the two availability groups can communicate, you can configure a distributed availability group with them. A traditional availability group has resources configured in a WSFC cluster. A Distributed Availability Group does not configure anything in the WSFC cluster—it's all maintained within SQL Server. A Distributed Availability Group requires that the underlying availability groups each have a listener. Rather than providing the underlying server name for a standalone instance (or, in the case of a SQL Server FCI, the value associated with the network name resource) as you would with a traditional availability group, you specify the configured listener of each underlying availability group with the ENDPOINT_URL parameter when you create the Distributed Availability Group. Although each underlying availability group has a listener, the Distributed Availability Group itself has no listener. World map image licensed under Creative Commons Attribution-Share Alike 3.0 Unported.

69 Migration/testing Scenarios Solution
ISV solution built on SQL Server on Windows Linux certification Enterprise moving to an all-Linux infrastructure Rigorous business requirements Seamless migration Solution Minimum downtime and HA for cross-platform migrations with Distributed Availability Groups Migration/testing [Diagram: migration/testing via a Distributed Availability Group spanning AG 1 and AG 2]

70 Improve read concurrency with read-scale Availability Groups
Scenario SaaS app (website) Catalog database with high volume of concurrent read-only transactions Bottlenecks on the Availability Group primary due to read workloads Increased response time HA/DR elements of Availability Groups not required Solution Read-scale Availability Groups No cluster required Both Linux and Windows [Diagram: primary (P) with geographically distributed read-only replicas S1 through S4] Read-scale availability groups are a new feature in SQL Server 2017. Read-scale availability groups can be created without the need for a cluster manager. Up to 17 read-only replicas can be added to a read-scale availability group. Read-only replicas might be geographically distributed. Note that a read-scale availability group is not an HADR solution. World map image licensed under Creative Commons Attribution-Share Alike 3.0 Unported.
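A minimal sketch (server, database, and endpoint names assumed) of a read-scale availability group; note CLUSTER_TYPE = NONE and manual failover, since no cluster manager is involved:

    -- Sketch only; all names are placeholders.
    CREATE AVAILABILITY GROUP [ReadScaleAG]
        WITH (CLUSTER_TYPE = NONE)
        FOR DATABASE [CatalogDB]
        REPLICA ON
            N'PrimaryServer' WITH (
                ENDPOINT_URL = N'tcp://PrimaryServer:5022',
                AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL,
                SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL)),
            N'ReadReplica1' WITH (
                ENDPOINT_URL = N'tcp://ReadReplica1:5022',
                AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL,
                SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL));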

71 Automatic tuning

72 Automatic tuning Automatic plan correction identifies problematic plans and fixes SQL plan performance problems: Adapt Verify Learn Automatic tuning is a continuous monitoring and analysis process that constantly learns about the characteristics of your workload and identifies potential issues and improvements. Automatic tuning in SQL Server 2017 notifies you whenever a potential performance issue is detected, and lets you apply corrective actions—or lets the Database Engine automatically fix performance problems. Automatic tuning in SQL Server 2017 enables you to identify and fix performance issues caused by SQL plan choice regressions. Automatic tuning in Azure SQL Database creates necessary indexes and drops unused indexes. The Database Engine monitors the queries that are executed on the database and automatically improves performance of the workload. The Database Engine has a built-in intelligence mechanism that can automatically tune and improve the performance of your queries by dynamically adapting the database to your workload. Two automatic tuning features are available: Automatic plan correction (available in SQL Server 2017), which identifies problematic plans and fixes SQL plan performance problems. Automatic index management (available in Azure SQL Database), which identifies indexes that should be added in your database, and indexes that should be removed. Constantly monitoring performance can be a hard and tedious task, especially when dealing with many databases. Managing a huge number of databases might be impossible to do efficiently. Instead of monitoring and tuning your database manually, you might consider using the automatic tuning feature to delegate some of the monitoring and tuning actions to the Database Engine.

73 Automatic plan choice detection
The Database Engine continuously collects query plan performance information Potential plan choice regression is identified when a change of query plan for a query corresponds to a drop in query performance Regressions (together with suggested mitigations) are reported in the DMV sys.dm_db_tuning_recommendations Automatic plan monitoring is enabled by default in SQL Server 2017.
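As a quick illustration, the recommendations, including the ready-to-run script that forces the last good plan, can be read straight from the DMV; the JSON path below follows the documented details format:

    SELECT reason,
           score,
           JSON_VALUE(details, '$.implementationDetails.script') AS force_plan_script
    FROM sys.dm_db_tuning_recommendations;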

74 Automatic plan correction
Automatically apply the mitigation identified in sys.dm_db_tuning_recommendations by enabling the AUTOMATIC_TUNING database property: ALTER DATABASE current SET AUTOMATIC_TUNING ( FORCE_LAST_GOOD_PLAN = ON ); In addition to detection, the Database Engine can automatically switch to the last known good plan whenever a regression is detected. When the Database Engine applies a recommendation, it automatically monitors the performance of the forced plan. If the forced plan performs better than the regressed plan, it is retained until a recompile occurs (for example, after a statistics or schema change). If the forced plan is not better than the regressed plan, the new plan is unforced and the Database Engine compiles a new plan. Enabling automatic plan choice correction The user can enable automatic tuning per database and specify that the last good plan should be forced whenever a plan change regression is detected. Automatic tuning is enabled using the following command: ALTER DATABASE current SET AUTOMATIC_TUNING ( FORCE_LAST_GOOD_PLAN = ON ); When you turn on this option, the Database Engine automatically forces any recommendation where the estimated CPU gain is higher than 10 seconds, or where the number of errors in the new plan is higher than the number of errors in the recommended plan—and it verifies that the forced plan is better than the current one.
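To confirm what automatic tuning is doing for a database, a simple sanity check against the catalog view:

    -- Shows desired vs. actual state of FORCE_LAST_GOOD_PLAN,
    -- and why they differ, for the current database.
    SELECT name, desired_state_desc, actual_state_desc, reason_desc
    FROM sys.database_automatic_tuning_options;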

75 Adaptive query processing

76 Adaptive query processing
Three features to improve query performance: Interleaved Execution, Batch Mode Memory Grant Feedback, and Batch Mode Adaptive Joins Enabled when the database is in SQL Server 2017 compatibility mode (140) Adaptive query processing provides three techniques to improve execution plan selection: Batch mode memory grant feedback. Batch mode adaptive joins. Interleaved execution for multistatement table valued functions. ALTER DATABASE current SET COMPATIBILITY_LEVEL = 140;

77 Query processing and cardinality estimation
During optimization, the cardinality estimation (CE) process is responsible for estimating the number of rows processed at each step in an execution plan CE uses a combination of statistical techniques and assumptions When estimates are accurate (enough), we make informed decisions around order of operations and physical algorithm selection [this slide has animation – bullets appear in sequence]

78 Common reasons for incorrect cardinality estimates
Missing statistics Stale statistics Inadequate statistics sample rate Bad parameter sniffing scenarios Out-of-model query constructs For example, MSTVFs, table variables, XQuery Assumptions not aligned with data being queried For example, independence versus correlation [this slide has animation] There are many reasons that cardinality estimates might be inaccurate. [click] When statistics do not exist. When statistics exist but are out of date, because the profile of the data has changed but the statistics have not. When statistics exist, but are not based on a representative sample of the data. When a cached query plan is optimized for a nonrepresentative parameter value (parameter sniffing). When the query uses constructs for which cardinality estimates cannot be directly inferred. When assumptions inferred from the statistics, such as correlation, are not correct.

79 Cost of incorrect estimates
Slow query response time due to inefficient plans Excessive resource utilization (CPU, Memory, IO) Spills to disk Reduced throughput and concurrency T-SQL refactoring to work around off-model statements [this slide has animation – bullets appear in sequence]

80 Interleaved execution
Problem: Multi-statement table valued functions (MSTVFs) are treated as a black box by the query processor (QP), which uses a fixed optimization guess. Interleaved execution will materialize row counts for MSTVFs. Downstream operations will benefit from the corrected MSTVF cardinality estimate. [Diagram: pre-2017, a single optimize-then-execute flow guesses 100 rows for MSTVFs, causing performance issues if the data is skewed; in 2017+, optimization pauses when an MSTVF is identified, executes it (here materializing 500,000 rows), and resumes with the actual count, giving good performance] Interleaved execution changes the unidirectional boundary between the optimization and execution phases for a single-query execution and enables plans to adapt, based on the revised cardinality estimates. During optimization, if we encounter a candidate for interleaved execution—currently multi-statement table valued functions (MSTVFs)—we pause optimization, execute the applicable subtree, capture accurate cardinality estimates, and then resume optimization for downstream operations. MSTVFs have a fixed cardinality guess of 100 in SQL Server 2014 and SQL Server 2016, and 1 for earlier versions. Interleaved execution helps workload performance issues that are due to these fixed cardinality estimates associated with MSTVFs.
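For illustration, a hypothetical MSTVF of the kind that benefits (dbo.Orders and dbo.Customers are assumed tables): under compatibility level 140, the optimizer executes the function first and plans the join with the real row count instead of the fixed guess of 100.

    CREATE OR ALTER FUNCTION dbo.ufn_RecentOrders (@cutoff date)
    RETURNS @result TABLE (OrderID int, CustomerID int, OrderDate date)
    AS
    BEGIN
        -- The row count of this intermediate result is opaque to the
        -- optimizer before interleaved execution.
        INSERT INTO @result
        SELECT OrderID, CustomerID, OrderDate
        FROM dbo.Orders          -- assumed table
        WHERE OrderDate >= @cutoff;
        RETURN;
    END;
    GO
    -- Downstream operators now get an accurate MSTVF estimate.
    SELECT c.CustomerID, r.OrderID
    FROM dbo.ufn_RecentOrders('2017-01-01') AS r
    JOIN dbo.Customers AS c ON c.CustomerID = r.CustomerID;  -- assumed table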

81 Batch mode memory grant feedback
Problem: Queries can spill to disk or take too much memory, based on poor cardinality estimates. Memory grant feedback (MGF) will adjust memory grants based on execution feedback. MGF will remove spills and improve concurrency for repeating queries.
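Memory grant feedback is on by default under compatibility level 140. If a particular workload regresses, a per-database opt-out is available; a sketch:

    -- Disable batch mode memory grant feedback for the current database.
    ALTER DATABASE SCOPED CONFIGURATION
    SET DISABLE_BATCH_MODE_MEMORY_GRANT_FEEDBACK = ON;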

82 Batch mode adaptive joins
Problem: If cardinality estimates are skewed, we might choose an inappropriate join algorithm. Batch mode adaptive joins (AJ) will defer the choice of hash join or nested loop until after the first join input has been scanned. AJ uses nested loop for small inputs, and hash joins for large inputs. [Diagram: after the build input is scanned, its row count is compared to the adaptive threshold—hash join if above, nested loop if below]

83 About interleaved execution
Expected performance improvements? Benefits workloads with skews and downstream operations

84 About interleaved execution
Expected overhead? Minimal, because MSTVFs are always materialized Cached plan considerations The first execution's plan is cached and used by consecutive executions Plan attributes Contains interleaved execution candidates; is interleaved executed XEvents Execution status, CE update, disabled reason [this slide has animation – bullets appear in sequence]

85 Interleaved execution candidates
SELECT statements 140 compatibility level MSTVF not used on the inside of a CROSS APPLY Not using plan forcing Not using USE HINT with DISABLE_PARAMETER_SNIFFING (or TF 4136)

86 About batch mode memory grant feedback
Expected performance improvements? Benefits workloads with spills or overages [Diagram: before-and-after comparison of memory grants]

87 About batch mode memory grant feedback
Expected overhead? If a plan has oscillating memory requirements, the feedback loop for that plan is disabled XEvents Spill report, and updates by feedback Expected decrease and increase size? For spills—spill size plus a buffer; for overages—reduce based on waste, and add a buffer RECOMPILE or eviction scenarios Memory grant size will go back to original [this slide has animation – bullets appear in sequence] In the first bullet, "oscillating" refers to the scenario where different executions of the same query do not have consistent memory requirements—and oscillate between high and low memory grants.

88 About batch mode adaptive join
Expected performance benefit? Performance gains occur for workloads where, prior to adaptive joins being available, the optimizer chooses the wrong join type due to incorrect cardinality estimates.

89 About batch mode adaptive join
Queries involving columnstore indexes can dynamically switch between nested loop join and hash join operators at execution time: The batch mode adaptive joins feature enables the choice of a hash join or nested loop join method to be deferred until after the first input has been scanned. The adaptive join operator defines a threshold that is used to decide when to switch to a nested loop plan. Your plan can therefore dynamically switch to a better join strategy during execution. Here's how it works: If the row count of the build join input is small enough that a nested loop join would be more optimal than a hash join, your plan switches to a nested loop algorithm. If the build join input exceeds a specific row count threshold, no switch occurs and your plan continues with a hash join.
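Adaptive joins require batch mode, which in SQL Server 2017 implies a columnstore index somewhere in the query. A sketch under assumed table and column names, plus the per-database opt-out switch:

    -- A nonclustered columnstore index makes batch mode (and therefore
    -- adaptive joins) available on an otherwise rowstore table.
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders
    ON dbo.Orders (OrderID, CustomerID, OrderDate);  -- assumed table

    -- Opt out per database if adaptive joins regress a workload.
    ALTER DATABASE SCOPED CONFIGURATION
    SET DISABLE_BATCH_MODE_ADAPTIVE_JOINS = ON;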

90 About batch mode adaptive join
Expected overhead? Memory is granted even for a nested loop scenario—if nested loop is always optimal, there is a greater overhead Plan attributes Adaptive threshold rows, estimated and actual join type XEvents Adaptive join skipped Cached plan considerations Single compiled plan can accommodate low and high row scenarios

91 About batch mode adaptive join
Eligible statements The join is eligible to be executed either by an indexed nested loop join or by a hash join physical algorithm. The hash join uses batch mode—either through the presence of a columnstore index in the query overall, or through a columnstore indexed table being referenced directly by the join. The generated alternative solutions of the nested loop join and hash join should have the same first child (outer reference).

92 Adaptive join threshold

93 Query Store

94 Problems with query performance
"Website is down" "Database is not working"—a plan choice change can cause these problems Fixing query plan choice regressions is difficult The query plan cache is not well suited for performance troubleshooting Long time to detect the issue (TTD) Which query is slow? Why is it slow? What was the previous plan? Long time to mitigate (TTM) Can I modify the query? How do I use a plan guide? Temporary performance issues Impossible to predict root cause DB upgraded Regression caused by new bits

95 The solution: Query Store
The solution: Query Store Dedicated store for query workload performance data Captures the history of plans for each query Captures the performance of each plan over time Persists the data to disk (works across restarts, upgrades, and recompiles) Significantly reduces TTD/TTM Find regressions and other issues in seconds Allows you to force previous plans from history The DBA is now in control For more information, see Monitoring Performance By Using the Query Store (August 14, 2015).

96 Query Store architecture
Collects query texts (plus all relevant properties) Stores all plan choices and performance metrics Works across restarts/upgrades/recompiles Dramatically lowers the bar for performance troubleshooting New views Intuitive and easy plan forcing [Diagram: compile and execute messages flow into the Query Store, which writes to the plan store and runtime stats asynchronously; durability latency is controlled by the DB option DATA_FLUSH_INTERVAL_SECONDS] Query Store is a new feature that provides DBAs with insight on query plan choice and performance. It simplifies performance troubleshooting by enabling you to quickly find performance differences caused by changes in query plans. The feature automatically captures a history of queries, plans, and runtime statistics, and retains these for your review. It separates data by time windows, allowing you to see database usage patterns and understand when query plan changes happened on the server. The Query Store presents information by using a Management Studio dialog box, and lets you force the query to one of the selected query plans. For more information, see Monitoring Performance By Using the Query Store.

97 Query Store write architecture
Query Store write architecture Query Store captures data in-memory to minimize I/O overhead Data is persisted to disk asynchronously in the background [Diagram: compile and execute events feed the in-memory query and plan store and runtime stats store; both are written asynchronously to internal tables on disk]

98 Query Store read architecture
Query Store read architecture Views merge in-memory and on-disk content Users always see "latest" data [Diagram: Query Store views merge results from the in-memory query and plan store and runtime stats store with the internal tables persisted on disk]

99 Keeping stability while upgrading to SQL Server 2017
Query Optimizer (QO) enhancements tied to database compatibility level Install bits keep existing compatibility level Run Query Store (create a baseline) Move to compatibility level 140 Fix regressions with plan forcing Maintaining query performance stability For queries that are executed multiple times, you might notice that SQL Server used different plans that resulted in different resource utilization and duration. With Query Store, you can easily detect when query performance regressed, and determine the optimal plan within a period of interest. You can then force that optimal plan for future query executions. You can also identify inconsistent query performance for a query with parameters (either auto-parameterized or manually parameterized). Among the different plans, you can identify a plan that is fast and optimal enough for all or most of the parameter values. You then force that plan, keeping predictable performance for the wider set of user scenarios. Force a plan for a query (apply forcing policy) When a plan is forced for a certain query, every time the query comes to execution, it is executed with the forced plan: EXEC sp_query_store_force_plan @query_id = 49, @plan_id = 49; When using sp_query_store_force_plan, you can only force plans that were recorded by Query Store as a plan for that query. In other words, the only plans available for a query are those that were already used to execute it while Query Store was active. Remove plan forcing for a query To rely again on the SQL Server query optimizer to calculate the optimal query plan, use sp_query_store_unforce_plan to unforce the plan that was selected for the query: EXEC sp_query_store_unforce_plan @query_id = 49, @plan_id = 49; Other notes: De-risk upgrades with Query Store plan forcing and compatibility level—all plans are frozen in the pre-upgrade state, and the QP keeps CE changes under compatibility level. Questions: Would the described scenario work in your environment? Plan freezing will keep your current forcing decisions and apply the LAST plan for other queries. Is that what you would expect?

100 Monitoring performance by using the Query Store
The Query Store feature provides DBAs with insight on query plan choice and performance This slide emphasizes that Query Store comes with an extraordinary UI that helps a broader set of users immediately benefit from collected performance data. SSMS focuses on a handful of the most important scenarios that make the feature instantly useful for a typical DBA in their everyday activities. We want to encourage people to try the new UI and learn from it: it's a great knowledge source, because you can easily learn the first steps of using the Query Store DMVs by analyzing the queries that SSMS generates.

101 Working with Query Store
DB-level feature exposed through T-SQL extensions ALTER DATABASE Catalog views (settings, compile, and runtime stats) Stored procedures (plan forcing, query/plan/stats cleanup) /* (1) Turn ON Query Store */ ALTER DATABASE MyDB SET QUERY_STORE = ON; /* (2) Review current Query Store parameters */ SELECT * FROM sys.database_query_store_options; /* (3) Set new parameter values */ ALTER DATABASE MyDB SET QUERY_STORE ( OPERATION_MODE = READ_WRITE, CLEANUP_POLICY = ( STALE_QUERY_THRESHOLD_DAYS = 30 ), DATA_FLUSH_INTERVAL_SECONDS = 3000, MAX_SIZE_MB = 500, INTERVAL_LENGTH_MINUTES = 15 ); /* (4) Clear all Query Store data */ ALTER DATABASE MyDB SET QUERY_STORE CLEAR; /* (5) Turn OFF Query Store */ ALTER DATABASE MyDB SET QUERY_STORE = OFF; /* (6) Performance analysis using Query Store views */ SELECT q.query_id, qt.query_text_id, qt.query_sql_text, SUM(rs.count_executions) AS total_execution_count FROM sys.query_store_query_text qt JOIN sys.query_store_query q ON qt.query_text_id = q.query_text_id JOIN sys.query_store_plan p ON q.query_id = p.query_id JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id GROUP BY q.query_id, qt.query_text_id, qt.query_sql_text ORDER BY total_execution_count DESC; /* (7) Force plan for a given query */ EXEC sp_query_store_force_plan @query_id = 12, @plan_id = 14; Enabling the Query Store Query Store is not active for new databases by default. By using the Query Store page in Management Studio: In Object Explorer, right-click a database, and then click Properties. (Note: requires at least the SQL Server 2016 Community Technology Preview 2 (CTP2) version of Management Studio.) In the Database Properties dialog box, select the Query Store page. In the Enable box, select True. By using Transact-SQL statements: Use the ALTER DATABASE statement to enable the Query Store—for example: ALTER DATABASE AdventureWorks2012 SET QUERY_STORE = ON; For more syntax options related to the Query Store, see ALTER DATABASE SET Options (Transact-SQL).

102 Query Store enhancements (SQL Server 2017)
Query Store now tracks wait stats summary information. Tracking wait stats categories per query in Query Store enables the next level of performance troubleshooting experience. It provides even more insight into the workload performance and its bottlenecks while preserving the key Query Store advantages.
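A small sketch of how the new wait statistics can be queried, using the SQL Server 2017 catalog view sys.query_store_wait_stats:

    -- Top wait categories per plan, summed across collection intervals.
    SELECT ws.plan_id,
           ws.wait_category_desc,
           SUM(ws.total_query_wait_time_ms) AS total_wait_ms
    FROM sys.query_store_wait_stats AS ws
    GROUP BY ws.plan_id, ws.wait_category_desc
    ORDER BY total_wait_ms DESC;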

103 Live query statistics View CPU/memory usage, execution time, query progress, and more. Enables rapid identification of potential bottlenecks for troubleshooting query performance issues. Allows drill-down to live operator-level statistics: Number of generated rows Elapsed time Operator progress Live warnings SQL Server Management Studio provides the ability to view the live execution plan of an active query. This live query plan provides real-time insights into the query execution process as control flows from one query plan operator to another. The live query plan displays the overall query progress and operator-level runtime execution statistics—for example, the number of rows produced, elapsed time, operator progress, and so on. Because this data is available in real time without needing to wait for the query to complete, these execution statistics are extremely useful for debugging query performance issues. Remarks The statistics profile infrastructure must be enabled before live query statistics can capture information about the progress of queries. Specifying Include Live Query Statistics in Management Studio enables the statistics infrastructure for the current query session. There are two other ways to enable the statistics infrastructure that can be used to view the live query statistics from other sessions (such as from Activity Monitor): Execute SET STATISTICS XML ON; or SET STATISTICS PROFILE ON; in the target session. Enable the query_post_execution_showplan extended event. This is a server-wide setting that enables live query statistics on all sessions. To enable extended events, see Monitor System Activity Using Extended Events. Limitations Queries using columnstore indexes are not supported. Queries with memory-optimized tables are not supported. Natively compiled stored procedures are not supported.

104 Summary: Query Store Capability Benefits
Query Store helps customers quickly find and fix query performance issues Query Store is a "flight data recorder" for database workloads Benefits Greatly simplifies query performance troubleshooting Provides performance stability across SQL Server upgrades Allows deeper insight into workload performance

105 Resource Governor

106 Resource Governor Resource Governor enables you to specify limits on the amount of CPU, physical IO, and memory that incoming application requests to the Database Engine can use. With Resource Governor, you can: Provide multitenancy and resource isolation on single instances of SQL Server that serve multiple client workloads. Provide predictable performance and support SLAs for workload tenants in a multiworkload and multiuser environment. Isolate and limit runaway queries or throttle IO resources for operations such as DBCC CHECKDB that can saturate the IO subsystem and negatively affect other workloads. Add fine-grained resource tracking for resource usage chargebacks and to provide predictable billing to consumers of the server resources.

107 Resource Governor architecture
There's an incoming connection for a session (Session 1 of n). The session is classified (Classification). The session workload is routed to a workload group (for example, Group 4). The workload group uses the resource pool with which it is associated (for example, Pool 2). The resource pool provides and limits the resources required by the application (for example, Application 3). The following three concepts are fundamental to understanding and using Resource Governor: Resource pools. A resource pool represents the physical resources of the server. You can think of a pool as a virtual SQL Server instance inside a SQL Server instance. Two resource pools (internal and default) are created when SQL Server is installed. Resource Governor also supports user-defined resource pools. For more information, see: Resource Governor Resource Pool. Workload groups. A workload group serves as a container for session requests that have similar classification criteria. A workload group allows for aggregate monitoring of the sessions, and defines policies for the sessions. Each workload group is in a resource pool. Two workload groups (internal and default) are created and mapped to their corresponding resource pools when SQL Server is installed. Resource Governor also supports user-defined workload groups. For more information, see: Resource Governor Workload Group. Classification. The classification process assigns incoming sessions to a workload group based on the characteristics of the session. You can tailor the classification logic by writing a user-defined function, called a classifier function. Resource Governor also supports a classifier user-defined function for implementing classification rules. For more information, see: Resource Governor Classifier Function.

108 Defining resource pools
A resource pool represents the physical resources of the server. A pool is defined as minimum and/or maximum constraints on server resources (CPU, memory, and physical IO): MIN_CPU_PERCENT and MAX_CPU_PERCENT CAP_CPU_PERCENT MIN_MEMORY_PERCENT and MAX_MEMORY_PERCENT AFFINITY MIN_IOPS_PER_VOLUME and MAX_IOPS_PER_VOLUME
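To make the pieces concrete, a minimal sketch (pool, group, and login names are all assumed) of a pool, a workload group, and a classifier function that routes one login into it:

    USE master;
    GO
    -- Cap a reporting workload at 30% CPU and 40% memory.
    CREATE RESOURCE POOL ReportingPool
        WITH (MAX_CPU_PERCENT = 30, MAX_MEMORY_PERCENT = 40);
    CREATE WORKLOAD GROUP ReportingGroup
        USING ReportingPool;
    GO
    -- Classifier runs for each new session and returns a group name.
    CREATE FUNCTION dbo.fnClassifier() RETURNS sysname
    WITH SCHEMABINDING
    AS
    BEGIN
        -- Route the assumed ReportUser login; all other sessions
        -- fall through to the default workload group.
        IF SUSER_NAME() = N'ReportUser'
            RETURN N'ReportingGroup';
        RETURN N'default';
    END;
    GO
    ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnClassifier);
    ALTER RESOURCE GOVERNOR RECONFIGURE;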

109 Data warehousing and big data

110 Columnstore

111 SQL Server performance features: Columnstore
A technology for storing, retrieving, and managing data by using a columnar data format called a columnstore. You can use columnstore indexes for real-time analytics on your operational workload. Key benefits Provides a very high level of data compression, typically 10x, to reduce your data warehouse storage cost significantly. Indexing on a column with repeated values vastly improves performance for analytics. Improved performance: More data fits in memory Batch-mode execution Data stored as columns [Diagram: an existing table is divided into row groups and column segments, which are compressed into the columnstore]

112 Columnstore: Clustered vs. nonclustered indexes
Rowstore Data that is logically organized as a table with rows and columns, and then physically stored in a row-wise data format. Columnstore Data that is logically organized as a table with rows and columns, and physically stored in a column-wise data format. In SQL Server, rowstore refers to a table where the underlying data storage format is a heap, clustered index, or memory-optimized table.

113 Columnstore: Clustered vs. nonclustered indexes
Nonclustered index A secondary index on the standard table (rowstore). Clustered index The primary storage for the entire table. Both columnstore indexes offer high compression (10x) and improved query performance. Nonclustered columnstore indexes enable a standard OLTP workload on the underlying rowstore, and a separate simultaneous analytical workload on the columnstore—with negligible impact on performance (Real-Time Operational Analytics).

114 Steps to creating a columnstore (NCCI)
Add a columnstore index to the table by executing the T-SQL: CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_SalesOrderDetail_ColumnStore] ON Sales.SalesOrderDetail (UnitPrice, OrderQty, ProductID) GO Execute the query that should use the columnstore index to scan the table: SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice, SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty FROM Sales.SalesOrderDetail GROUP BY ProductID ORDER BY ProductID Verify that the columnstore index was used by looking up its object_id and confirming that it appears in the usage stats for the table: SELECT * FROM sys.indexes WHERE name = 'IX_SalesOrderDetail_ColumnStore' GO SELECT * FROM sys.dm_db_index_usage_stats WHERE database_id = DB_ID('AdventureWorks') AND object_id = OBJECT_ID('AdventureWorks.Sales.SalesOrderDetail'); With a nonclustered columnstore index (NCCI), you can run an analytics workload in conjunction with the existing transactional workload. The OLTP workload continues to run on the standard rowstore table; the analytics workload runs on the columnstore. Because the two workloads are effectively running on two different data stores, there is minimal performance impact on either. This is particularly effective for the analytics workload, because the NCCI is optimized for repeated values—similar to what you would see in a BI model. This is the basis for Real-Time Operational Analytics (RTOA).

115 Columnstore index enhancements (SQL Server 2016)
Improvements SQL Server 2016 Clustered columnstore index Master copy of the data (10x compression) Additional B-tree indexes for efficient equality, short-range searches, and PK/FK constraints Locking granularity at row level using NCI index path DDL: ALTER, REBUILD, REORGANIZE Updatable nonclustered index Updatable Ability to mix OLTP and analytics workload Ability to create filtered NCCI Partitioning supported Equality and short-range queries Optimizer can choose NCI on column C1; index points directly to rowgroup No full index scan Covering NCI index String predicate pushdown Apply filter on dictionary entries Find rows that refer to dictionary entries that qualify (R1) Find rows not eligible for this optimization (R2) Scan returns (R1 + R2) rows Filter node applies string predicate on (R2) Row returned by Filter node = (R1 + R2) Objective: This slide summarizes the feature comparison of columnstore indexes based on their availability in SQL Server 2014 and 2016. Talking points: This table summarizes the improvement of key features for columnstore indexes in SQL 2014 and SQL 2016. SQL Server 2016 adds key enhancements to improve the performance and flexibility of columnstore indexes. This enables Real-Time Operational Analytics. A rowstore table can have one updatable nonclustered columnstore index. Previously, the nonclustered columnstore index was read-only. The nonclustered columnstore index definition supports the use of a filtered condition. Use this feature to create a nonclustered columnstore index on only the cold data of an operational workload. By doing this, the performance impact of having a columnstore index on an OLTP table will be minimal. An in-memory table can have one columnstore index. Previously, only a disk-based table could have a columnstore index. A clustered columnstore index can have one or more nonclustered rowstore indexes. Previously, the columnstore index did not support nonclustered indexes. SQL Server automatically maintains the nonclustered indexes for DML operations. Support for primary and foreign keys is through the use of a B-tree index to enforce these constraints on a clustered columnstore index.

116 Columnstore index enhancements (SQL Server 2017)
Clustered columnstore indexes now support LOB columns (nvarchar(max), varchar(max), varbinary(max)) Online nonclustered columnstore index build and rebuild support added
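For example, the nonclustered columnstore index from the earlier slide can now be rebuilt online, keeping the table available for reads and writes (a sketch, reusing that example's index and table names):

    ALTER INDEX [IX_SalesOrderDetail_ColumnStore]
    ON Sales.SalesOrderDetail
    REBUILD WITH (ONLINE = ON);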

117 PolyBase

118 Interest in big data spurs customer demand
Increase in number and variety of data sources that generate large quantities of data Realization that data is "too valuable" to delete Dramatic decline in hardware cost, especially storage Adoption of big data technologies like Hadoop

119 PolyBase Query relational and non-relational data with T-SQL
Capability T-SQL for querying relational and non-relational data across SQL Server and Hadoop Benefits New business insights across your data lake Use existing skill sets and BI tools Faster time to insights and a simplified ETL process [Diagram: apps issue a single T-SQL query to SQL Server, which queries relational data and Hadoop, on-premises and in Azure] When it comes to key BI investments, we are making it much easier to manage relational and non-relational data. PolyBase technology allows you to query Hadoop data and SQL Server relational data through a single T-SQL query. One of the challenges we see is that there are not enough people knowledgeable in Hadoop and MapReduce—this technology simplifies the skill set needed to manage Hadoop data. This can also work across your on-premises environment or SQL Server running in Azure.

120 PolyBase in SQL Server 2017 PolyBase view
Execute T-SQL queries against relational data in SQL Server and semi-structured data in Hadoop or Azure Blob storage Use existing T-SQL skills and BI tools to gain insights from different data stores [Diagram: a single query against SQL Server returns combined results from Hadoop and Azure Blob storage]

121 PolyBase use cases Load data Interactively query Age out data
Use Hadoop as an ETL tool to cleanse data before loading to the data warehouse with PolyBase Interactively query Analyze relational data with semi-structured data using split-based query processing Age out data Age out data to HDFS and use it as “cold” but queryable storage

122 PolyBase components PolyBase Engine Service Head Node
PolyBase Data Movement Service (with HDFS Bridge) External table constructs MapReduce pushdown computation support [Diagram: head node running SQL Server 2017 with the PolyBase Engine and PolyBase DMS] Introduced in SQL Server 2016.

123 PolyBase architecture
[Diagram: a PolyBase group with a head node (SQL Server 2017 with PolyBase Engine and PolyBase DMS) and compute nodes, connected to a Hadoop cluster (namenode, datanodes, file system). PolyBase T-SQL queries are submitted to the head node, and can only refer to tables and/or external tables on the head node]

124 Supported big data sources
-- different numbers map to various Hadoop flavors -- example: value 4 stands for HDP 2.x on Linux, value 5 for HDP 2.x on Windows, value 6 for CDH 5.x on Linux Supported big data sources Hortonworks HDP 1.3 on Linux/Windows Server Hortonworks HDP 2.x on Windows Server Hortonworks HDP 2.x on Linux Cloudera CDH 4.3 on Linux Cloudera CDH 5.x on Linux Azure Blob storage What happens behind the scenes? Loading the right client jars to connect to the Hadoop distribution
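Behind the scenes, the Hadoop flavor is selected with the 'hadoop connectivity' configuration option; a sketch using value 4 (HDP 2.x on Linux, per the mapping noted above; verify the value for your distribution):

    EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 4;
    RECONFIGURE;
    -- A SQL Server restart is required for the setting to take effect.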

125 After setup [Diagram: PolyBase group with a head node and compute nodes connected to a Hadoop cluster (namenode, datanodes, file system)] Compute nodes are used for scale-out query processing on external tables in HDFS Tables on compute nodes cannot be referenced by queries submitted to the head node The number of compute nodes can be dynamically adjusted by the DBA Hadoop clusters can be shared among multiple PolyBase groups Improved PolyBase query performance with scale-out computation on external data (PolyBase scale-out groups). Improved PolyBase query performance with faster data movement from HDFS to SQL Server, and between the PolyBase Engine and SQL Server.

126 Creating Polybase objects
Create an external data source: CREATE EXTERNAL DATA SOURCE HadoopCluster WITH( TYPE = HADOOP, LOCATION = 'hdfs://<namenode address>:8020' ); Create an external file format: CREATE EXTERNAL FILE FORMAT CommaSeparatedFormat WITH( FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ',', USE_TYPE_DEFAULT = TRUE) ); Create an external table for unstructured data: CREATE EXTERNAL TABLE [dbo].[SensorData]( vin varchar(255), speed int, fuel int, odometer int, city varchar(255), datatimestamp varchar(255) ) WITH( LOCATION = '/apps/hive/warehouse/sensordata', DATA_SOURCE = HadoopCluster, FILE_FORMAT = CommaSeparatedFormat );

127 Polybase queries Query external data table as SQL data
Query an external data table as SQL data (data is returned as defined in the external table): SELECT [vin], [speed], [datetimestamp] FROM dbo.SensorData Join SQL data with external data (join data between an internal and an external table): SELECT [make], [model], [modelYear] FROM dbo.AutomobileData LEFT JOIN dbo.SensorData ON dbo.AutomobileData.[vin] = dbo.SensorData.[vin] All T-SQL commands are supported PolyBase will optimize between a SQL-side query and pushdown to MapReduce

128 Resumable online indexing

129 Resumable online indexing
With resumable online index rebuild you can resume a paused index rebuild operation from where the rebuild operation was paused, rather than having to restart the operation at the beginning. In addition, this feature rebuilds indexes using only a small amount of log space. Resume an index rebuild operation after an index rebuild failure, such as following a database failover or after running out of disk space. There's no need to restart the operation from the beginning. This can save a significant amount of time when rebuilding indexes for large tables. Pause an ongoing index rebuild operation and resume it later—for example, to temporarily free up system resources to execute a high-priority task. Instead of aborting the index rebuild process, you can pause the index rebuild operation, and resume it later without losing prior progress. Rebuild large indexes without using a lot of log space or holding a long-running transaction that blocks other maintenance activities. This helps log truncation and avoids the out-of-log errors that are possible for long-running index rebuild operations.

130 Using resumable online index rebuild
Start a resumable online index rebuild ALTER INDEX test_idx on test_table REBUILD WITH (ONLINE=ON, RESUMABLE=ON) ; Pause a resumable online index rebuild ALTER INDEX test_idx on test_table PAUSE ; Resume a paused online index rebuild ALTER INDEX test_idx on test_table RESUME ; Abort a resumable online index rebuild (which is running or paused) ALTER INDEX test_idx on test_table ABORT ; Source: View metadata about resumable online index operations SELECT * FROM sys.index_resumable_operations ;

131 Partitioning

132 Rows in a partitioned table
Partitioning SQL Server supports partitioning of tables and indexes. In partitioning, a logical table (or index) is split into two or more physical partitions, each containing a portion of the data. Allocation of data to (and retrieval from) the partitions is managed automatically by the Database Engine, based on a partition function and partition scheme that you define. Partitioning can enhance the performance and manageability of large data sets, enabling you to work with a subset of the data. [Diagram: rows in a partitioned table mapped to physical partitions] Partitioning is available in all editions of SQL Server 2017 (including SQL Server 2017 Express Edition). This is a change introduced in SQL Server 2016 Service Pack 1—partitioning was previously restricted to Enterprise Edition.

133 Steps to create a partitioned table
Create a partition function -- Creates a partition function called myRangePF1 that will partition a table into four partitions CREATE PARTITION FUNCTION myRangePF1 (int) AS RANGE LEFT FOR VALUES (1, 100, 1000) ; GO -- Creates a partition scheme called myRangePS1 that applies myRangePF1 to four database filegroups CREATE PARTITION SCHEME myRangePS1 AS PARTITION myRangePF1 TO (test1fg, test2fg, test3fg, test4fg) ; GO Create a partition scheme (assumes four filegroups, test1fg to test4fg) -- Creates a partitioned table called PartitionTable that uses myRangePS1 to partition col1 CREATE TABLE PartitionTable (col1 int PRIMARY KEY, col2 char(10)) ON myRangePS1 (col1) ; GO Create a partitioned table based on the partition scheme New data added to the table will be assigned to a partition, based on the values provided for col1.

134 Manage large tables with table partitioning
Scenario Log data table grows by millions of rows a day Removing old data (for regulatory compliance) exceeds maintenance window Solution Partition the table with a partition function based on date (day or month) New data loaded into the current active partition Historic data can be removed by clearing down partitions
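A sketch of the clear-down step under assumed table names; since SQL Server 2016, individual partitions can be truncated directly, or a partition can be switched out to a matching staging table:

    -- Remove the oldest partition's rows in one metadata-light operation.
    TRUNCATE TABLE dbo.LogData WITH (PARTITIONS (1));

    -- Alternative: switch the partition to an identically structured
    -- staging table (assumed to exist on the same filegroup), then drop it.
    ALTER TABLE dbo.LogData SWITCH PARTITION 1 TO dbo.LogData_Staging;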

135 Security

136 Security Protect data Control access Monitor access
Security SQL Server provides enterprise-grade security capabilities on Windows and on Linux—all built in. Protect data Encryption at rest Transparent Data Encryption Backup Encryption Cell-Level Encryption Encryption in transit Transport Layer Security (SSL/TLS) Encryption in use (client) Always Encrypted Control access Database access SQL Authentication Active Directory Authentication Granular Permissions Application access Row-Level Security Dynamic Data Masking Monitor access Tracking activities Fine-Grained Audit [this slide has animation] SQL Server provides enterprise-grade security capabilities on Windows and on Linux. [click] We think about security for SQL Server in terms of layers. At the center, you have your data and how you protect it, typically using encryption. SQL Server supports a variety of encryption features to help protect your data against different types of threats. Transparent Data Encryption (TDE) encrypts your whole database at rest, without requiring any application changes. Backup Encryption encrypts your backup files, and Cell-Level Encryption gives you granular control over the encryption of individual cells of data. To encrypt data in transit to and from the database, SQL Server supports the industry-standard TLS 1.2 protocol. And finally, Always Encrypted enables you to encrypt sensitive data client-side, so that even privileged SQL Server administrators are unable to see it in plaintext. The next layer is about controlling access to the data. SQL Authentication allows users to authenticate via a username and password, while Active Directory Authentication allows users to authenticate using single sign-on through Active Directory and Kerberos. Granular permissions enable you to control access to individual tables or even columns of data. Row-Level Security allows you to control read/write permission for individual rows of data based on a customizable policy, and Dynamic Data Masking allows you to easily mask fields (such as account numbers) so that only part of the data can be seen. The final layer is about monitoring who accesses the data. SQL Server's Fine-Grained Audit feature allows you to enforce a data audit policy and track which users are performing which actions. So that's where we're going with security for SQL Server on Linux. Almost all of these features are already available for you to try in the CTP1 preview release—please try them and share your feedback with us. Support for TLS is in progress, and will become available in an upcoming build. Similarly, Active Directory Authentication is one of our most requested features, and will become available in an upcoming build.

137 SQL Server 2017 and GDPR compliance
Control access to personal data Authentication Authorization Dynamic Data Masking Row-Level Security Safeguarding data Transparent Data Encryption Transport Layer Security (TLS) Always Encrypted SQL Server Audit SQL Server includes a number of features to help you meet the data security requirements of the European General Data Protection Regulation (GDPR) that comes into force in May 2018. Many of these features are covered in more depth in subsequent slides.

138 Always Encrypted

139 Always Encrypted Always Encrypted allows clients to encrypt sensitive data inside client applications, and never reveal the encryption keys to the Database Engine. As a result, Always Encrypted provides a separation between those who own the data (and can view it) and those who manage the data (but should have no access). Always Encrypted makes encryption transparent to applications. An Always Encrypted-enabled driver installed on the client computer achieves this by automatically encrypting and decrypting sensitive data in the client application. The driver encrypts the data in sensitive columns before passing the data to the Database Engine, and automatically rewrites queries so that the semantics to the application are preserved. Similarly, the driver transparently decrypts data that is stored in encrypted database columns, and contained in query results.
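A minimal sketch of the server-side metadata (key names, certificate path, and table are all assumed; the ENCRYPTED_VALUE shown is a placeholder that client tooling such as the SSMS wizard generates):

    -- Column master key: a reference to a key held outside the database.
    CREATE COLUMN MASTER KEY CMK1
    WITH (
        KEY_STORE_PROVIDER_NAME = N'MSSQL_CERTIFICATE_STORE',
        KEY_PATH = N'CurrentUser/My/<certificate thumbprint>'  -- placeholder
    );

    -- Column encryption key, wrapped by the column master key.
    CREATE COLUMN ENCRYPTION KEY CEK1
    WITH VALUES (
        COLUMN_MASTER_KEY = CMK1,
        ALGORITHM = 'RSA_OAEP',
        ENCRYPTED_VALUE = 0x016E000001630075  -- placeholder; tool-generated
    );

    -- Deterministic encryption (needed for equality lookups) requires
    -- a BIN2 collation on character columns.
    CREATE TABLE dbo.Patients (
        Name nvarchar(60),
        SSN char(11) COLLATE Latin1_General_BIN2
            ENCRYPTED WITH (
                COLUMN_ENCRYPTION_KEY = CEK1,
                ENCRYPTION_TYPE = DETERMINISTIC,
                ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256')
    );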

140 Protect your data at rest and in motion (without impacting database performance)
Always Encrypted Trusted Apps [Diagram: a trusted client app issues SELECT Name FROM Patients WHERE @SSN='<plaintext value>'; the enhanced ADO.NET library uses the column master key and column encryption key to encrypt the parameter, so the server sees SELECT Name FROM Patients WHERE @SSN=0x7ff654ae6d, and the result set is decrypted client-side] [this slide has animation] A SQL Server database contains a table (dbo.Patients) that contains sensitive information, like SSN. [click] When Always Encrypted is enabled on the SSN column, the values are encrypted with a column encryption key. The encrypted value (the ciphertext) is stored in the database, and can no longer be read directly from the database without access to the column master key. An enhanced database driver library (with Always Encrypted support) is used by client applications to access the data. A client application executes a query to return a row from dbo.Patients by SSN. The query is routed through the enhanced database driver. The database driver uses the column master key to encrypt the search term, and the matching row is returned by the database engine. Encryption of the search term is transparent to the application. The matching row is returned to the application; the database driver decrypts the Always Encrypted column value. Decryption is transparent to the application. [Diagram: dbo.Patients showing Name, SSN (stored as ciphertext such as 0x7ff654ae6d), and Country columns]

141 Row-Level Security

142 Row-Level Security Row-Level Security (RLS) enables customers to control access to rows in a database table based on the characteristics of the user executing a query (for example, group membership or execution context). The database system applies the access restrictions every time a tier attempts to access data. This makes the security system more reliable and robust by reducing the surface area of your security system. RLS works with a predicate (condition) which, when true, allows access to appropriate rows: Can be either a filter or block predicate A filter predicate "filters out" rows from a query—the filter is transparent, and the end user is unaware of any filtering A block predicate prevents unauthorized action, and will throw an exception if the action cannot be performed

143 Configure Row-Level Security
Create user accounts to test Row-Level Security: USE AdventureWorks2014; GO CREATE USER Manager WITHOUT LOGIN; CREATE USER SalesPerson280 WITHOUT LOGIN; Grant read access to the users on a required table: GRANT SELECT ON Sales.SalesOrderHeader TO Manager; GRANT SELECT ON Sales.SalesOrderHeader TO SalesPerson280; Create a new schema and inline table-valued function: CREATE SCHEMA Security; GO CREATE FUNCTION Security.fn_securitypredicate(@SalesPersonID AS int) RETURNS TABLE WITH SCHEMABINDING AS RETURN SELECT 1 AS fn_securitypredicate_result WHERE ('SalesPerson' + CAST(@SalesPersonID AS VARCHAR(16)) = USER_NAME()) OR (USER_NAME() = 'Manager'); Create a security policy, adding the function as both a filter and block predicate on the table: CREATE SECURITY POLICY SalesFilter ADD FILTER PREDICATE Security.fn_securitypredicate(SalesPersonID) ON Sales.SalesOrderHeader, ADD BLOCK PREDICATE Security.fn_securitypredicate(SalesPersonID) ON Sales.SalesOrderHeader WITH (STATE = ON); Execute queries against the table as each user to see the filtered result (you can also alter the security policy to disable it)

144 Dynamic Data Masking

145 Dynamic Data Masking Prevent the abuse of sensitive data by hiding it from users Configuration made easy in the new Azure portal Policy-driven at the table and column level, for a defined set of users Data masking applied in real time to query results based on policy Multiple masking functions available (for example, full, partial) for various sensitive data categories (credit card numbers, SSN, and so on) [Diagram: real-time full and partial masking of a field such as Table.CreditCardNo in SQL Database and SQL Server 2017] Dynamic Data Masking limits sensitive data exposure by masking it to nonprivileged users. Dynamic Data Masking helps prevent unauthorized access to sensitive data by enabling customers to designate how much of the sensitive data to reveal with minimal impact on the application layer. It's a policy-based security feature that hides the sensitive data in the result set of a query over designated database fields, while the data in the database is not changed. Dynamic Data Masking is easy to use with existing applications, because masking rules are applied in the query results, and there is no need to modify existing queries. For example, a call center support person might identify callers by taking several digits from their social security number or credit card number, but those data items should not be fully exposed to the support person. A developer can define a masking rule to be applied to each query result that masks all but the last four digits of any social security number or credit card number in the result set. For another example, by using the appropriate data mask to protect personally identifiable information (PII) data, a developer can query production environments for troubleshooting purposes without violating compliance regulations. Dynamic Data Masking limits the exposure of sensitive data and prevents accidental viewing by engineers who directly access databases for troubleshooting purposes—or nonprivileged application users. Dynamic Data Masking doesn't aim to prevent privileged database users from connecting directly to the database and running exhaustive queries that expose pieces of the sensitive data. Dynamic Data Masking is complementary to other SQL Server security features (such as auditing, encryption, and row-level security) and is highly recommended to help to better protect the sensitive data in the database. Because data is masked just before being returned to the user, changing the data type to an unmasked type will return unmasked data. Dynamic Data Masking is available in SQL Server 2016 and later (in early SQL Server 2016 previews, enabling it required trace flags 209 and 219). For SQL Database, see Get started with SQL Database Dynamic Data Masking (Azure Preview portal).

146 Dynamic Data Masking walkthrough
Dynamic Data Masking walkthrough Security officer defines a Dynamic Data Masking policy in T-SQL over sensitive data in the Employee table. Application user selects from the Employee table. The Dynamic Data Masking policy obfuscates the sensitive data in the query results. Security Officer ALTER TABLE [Employee] ALTER COLUMN [SocialSecurityNumber] ADD MASKED WITH (FUNCTION = 'SSN()') ALTER TABLE [Employee] ALTER COLUMN [Email] ADD MASKED WITH (FUNCTION = 'email()') ALTER TABLE [Employee] ALTER COLUMN [Salary] ADD MASKED WITH (FUNCTION = 'RANDOM(1,20000)') GRANT UNMASK to admin1 [this slide has animation] In this scenario, the application database contains employee information that should not be visible to nonadmin users. [click] The organization security officer alters the sensitive information columns in the Employee table to mask their values, then grants the admin1 user permission to view the unmasked data. Users execute queries against the Employee table. [click] Nonadmin users see masked data, but the administrative login sees the unmasked data. admin1 login other login SELECT [Name], [SocialSecurityNumber], [Email], [Salary] FROM [Employee]

147 Configure Dynamic Data Masking
Use an ALTER TABLE statement to add a masking function to the required column in the table:
USE AdventureWorks2014;
GO
ALTER TABLE Person.EmailAddress ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()');
Create a new user with SELECT permission on the table:
CREATE USER TestUser WITHOUT LOGIN;
GRANT SELECT ON Person.EmailAddress TO TestUser;
Then execute a query as that user to verify that the masking function changes the required column to a masked field:
EXECUTE AS USER = 'TestUser';
SELECT EmailAddressID, EmailAddress FROM Person.EmailAddress;
REVERT;

148 Auditing

149 SQL Server 2017 Auditing
SQL Server Audit is the primary auditing tool in SQL Server
Track and log server-level events in addition to individual database events
SQL Server Audit uses Extended Events to help create and run audit-related events
SQL Server Audit includes several components:
Server audit: This container holds the audit specifications, for either server-level or database-level audits. You can define multiple server audits to run simultaneously.
Server audit specification: This tracks server-level events and invokes the necessary Extended Events as defined by the user. You can define only one server audit specification per server audit.
Database audit specification: This object also belongs to a server audit. User-defined database-level events are tracked and logged. Predefined templates help you define a database audit.

150 SQL Server Audit
The server audit is the parent component of a SQL Server audit and can contain both:
Server audit specifications
Database audit specifications
It resides in the master database, and is used to define where the audit information will be stored, the file rollover policy, the queue delay, and how SQL Server should react if auditing is not possible.
The following server audit configuration is required:
The server audit name
The action to take if auditing is not possible: continue and ignore the log issue, shut down the server, or fail the operation
The audit destination
Auditing cannot yet be configured at column level.
Permissions required: ALTER ANY SERVER AUDIT or CONTROL SERVER
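As an illustration, a minimal server audit might be created as follows (the audit name and file path are hypothetical):
-- Create a server audit that writes to files under C:\SQLAudit (hypothetical path)
USE master;
GO
CREATE SERVER AUDIT Audit_Example
TO FILE (FILEPATH = 'C:\SQLAudit\', MAXSIZE = 100 MB)
WITH (QUEUE_DELAY = 1000, ON_FAILURE = CONTINUE);
GO
-- Audits are created in a disabled state; enable explicitly
ALTER SERVER AUDIT Audit_Example WITH (STATE = ON);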

151 Database Audit Specification
This is at the database level. Using more granular auditing can minimize the performance impact on your server. This is done by using a database audit specification, which is only available in Enterprise Edition. Using the database audit specification, auditing can be performed at object or user level. Configuration includes:
The database audit specification name (optional; a default name will be assigned)
The server audit that the specification must be linked to
The audit action type. There are both:
Audit actions (such as SELECT, INSERT, UPDATE, or DELETE)
Audit action groups
The object name of the object to be audited when an audit action has been selected
The schema of the selected object
The principal name. To audit all users, use the keyword "public" in this field
Auditing cannot yet be configured at column level.
Permissions required: ALTER ANY DATABASE AUDIT SPECIFICATION, plus ALTER or CONTROL permission on the database to which you would like to add the audit
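A sketch of a database audit specification tied to the server audit created above (object names are hypothetical, using an AdventureWorks table):
USE AdventureWorks2014;
GO
-- Audit SELECT and UPDATE on one table, for all users (public)
CREATE DATABASE AUDIT SPECIFICATION Audit_Pay_Spec
FOR SERVER AUDIT Audit_Example
ADD (SELECT, UPDATE ON OBJECT::HumanResources.EmployeePayHistory BY public)
WITH (STATE = ON);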


153 Advanced analytics

154 Machine Learning Services

155 In-Database analytics with SQL Server
In SQL Server 2016, Microsoft launched two server platforms for integrating the popular open source R language with business applications:
SQL Server R Services (In-Database), for integration with SQL Server
Microsoft R Server, for enterprise-level R deployments on Windows and Linux servers
In SQL Server 2017, the names have changed to reflect support for the popular Python language:
SQL Server Machine Learning Services (In-Database) supports both R and Python for in-database analytics
Microsoft Machine Learning Server supports R and Python deployments on Windows servers; expansion to other supported platforms is planned for late 2017
Microsoft Machine Learning Services brings the compute to the data by allowing R and Python to run on the same computer as the database. Machine Learning Services includes the Trusted Launchpad service that runs outside the SQL Server process and communicates securely with the R or Python runtime. You use Machine Learning Services to train models, generate plots, perform scoring, and easily move data between SQL Server and R or Python. Data scientists who are testing and developing solutions can send scripts from a remote development computer to run code securely on the server, or they can deploy completed solutions to SQL Server by embedding machine learning code in SQL stored procedures. When you install machine learning for SQL Server, you get a distribution of the open source R and/or Python language, in addition to the scalable R and Python libraries provided by Microsoft. The SQL Server Database Engine also includes new components designed to bolster connectivity and ensure faster, more secure communication with external languages such as R and Python.

156 Machine Learning Services
(Diagram: SQL Server analytical engines integrate with R/Python on top of the data management layer.)
Data management layer: relational data; T-SQL interface; stream data in-memory
Analytics library: share and collaborate; manage and deploy models
Data scientists: publish algorithms, interact directly with data
Business analysts: analyze through T-SQL, tools, and vetted algorithms
DBAs: manage storage and analytics together
Capability: extensible in-database analytics, integrated with R, exposed through T-SQL; centralized enterprise library for analytic models
Benefits: no data movement, resulting in faster time to insights; real-time analytics on transactional data; integration with existing workflows; unified governance across analytics and storage

157 Enhanced Machine Learning Services (SQL Server 2017)
Python support
MicrosoftML package included
Process multiple related models in parallel with the rxExecBy function
Create a shared script library with R script package management
Native scoring with the T-SQL PREDICT function
In-place upgrade of R components
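As a sketch of native scoring, PREDICT takes a pre-trained model stored in a table and scores rows entirely in T-SQL; the table and column names here are hypothetical, and the model is assumed to have been serialized with rxSerializeModel:
DECLARE @model VARBINARY(MAX) = (SELECT model FROM dbo.models WHERE model_name = 'iris_model');
-- Score new rows without invoking the R or Python runtime
SELECT d.*, p.predicted_species
FROM PREDICT(MODEL = @model, DATA = dbo.iris_new AS d)
WITH (predicted_species VARCHAR(30)) AS p;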

158 Setup and configuration
SQL Server setup
Install Machine Learning Services (In-Database)
Consent to install Microsoft R Open/Python
Optional: install additional R packages on the SQL Server machine
Database configuration
Enable external script execution on the instance:
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE;
Grant users permission to run external scripts:
GRANT EXECUTE ANY EXTERNAL SCRIPT TO DataScientistsRole; /* User-defined role / users */
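Once configured (the SQL Server and Launchpad services must be restarted after enabling the option), a minimal smoke test runs a trivial R script through the external script runner:
-- Echo a one-row input data set back as the result set
EXEC sp_execute_external_script
  @language = N'R',
  @script = N'OutputDataSet <- InputDataSet;',
  @input_data_1 = N'SELECT 1 AS ok'
WITH RESULT SETS ((ok INT));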

159 Management and monitoring
ML runtime usage
Resource governance via external resource pools
Monitoring via DMVs
Troubleshooting via XEvents/DMVs
CREATE EXTERNAL RESOURCE POOL ML_runtimes
WITH (MAX_CPU_PERCENT = 20, MAX_MEMORY_PERCENT = 10);
ALTER RESOURCE GOVERNOR RECONFIGURE;
SELECT * FROM sys.dm_resource_governor_external_resource_pools WHERE name = 'ML_runtimes';

160 External script usage from SQL Server
Original R script:
IrisPredict <- function(data, model){
  library(e1071)
  predicted_species <- predict(model, data)
  return(predicted_species)
}
library(RODBC)
conn <- odbcConnect("MySqlAzure", uid = myUser, pwd = myPassword);
Iris_data <- sqlFetch(conn, "Iris_Data");
Iris_model <- sqlQuery(conn, "select model from my_iris_model");
IrisPredict(Iris_data, Iris_model);
Calling the R script from SQL Server:
/* Input table schema */
create table Iris_Data (name varchar(100), length int, width int);
/* Model table schema */
create table my_iris_model (model varbinary(max));
declare @model varbinary(max) = (select model from my_iris_model);
exec sp_execute_external_script
  @language = N'R',
  @script = N'
    IrisPredict <- function(data, model){
      library(e1071)
      predicted_species <- predict(model, data)
      return(predicted_species)
    }
    IrisPredict(input_data_1, model);
  ',
  @input_data_1 = N'select * from Iris_Data',
  @params = N'@model varbinary(max)',
  @model = @model
with result sets ((name varchar(100), length int, width int, species varchar(30)));
In the original script, the SQL queries are embedded in the R code; in the SQL Server version, the R variables (input_data_1, model) bind to the SQL input query and parameters by name.

161 The SQL extensibility architecture
(Diagram: the MSSQLSERVER service (sqlservr.exe) communicates over a named pipe with the MSSQLLAUNCHPAD service (launchpad.exe, one per SQL Server instance), which decides what and how to launch and starts a Windows satellite process (Bxlserver.exe with sqlsatellite.dll); sp_execute_external_script runs in SQLOS and raises an XEvent.)
[this slide has animation]
The extensibility architecture on which Machine Learning Services is based relies on orchestration between three components: the SQL Server Database Engine service, the SQL Server Launchpad service, and satellite processes (the R and Python engines). [click] You execute an R or Python script by calling the sp_execute_external_script stored procedure. When the stored procedure is executed, a named pipe is opened between the SQL Server service and the Launchpad service, and the relevant details of the external script and its context are passed to Launchpad from SQL Server. Launchpad passes the script and context to a satellite process (the R or Python interpreter, as appropriate), which executes the script. The satellite process uses sqlsatellite.dll to return the results of the external script to SQL Server. An extended event records when the execution is triggered.

162 SQL Server Machine Learning Services is scalable
More efficient than standalone clients
Data does not all have to fit in memory
Reduced data transmission over the network
Most R Open (and Python) functions are single threaded; the ScaleR and RevoScalePy APIs in scripts support multi-threaded processing on the SQL Server computer
Data can be streamed in parallel and in batches between SQL Server and the script
Use the power of SQL Server and ML to develop, train, and operationalize: SQL Server compute context (remote compute context), T-SQL queries, memory-optimized tables, columnstore indexes, data compression, parallel query execution, stored procedures
Enterprise Edition gives you the optimum scalability. It's all about using the compute power of the server close to the data.

163 SQL Server Machine Learning Services is secure
Reduced surface area and isolation
"external scripts enabled" is required
Script execution outside of the SQL Server process space
Script execution requires explicit permission
sp_execute_external_script requires EXECUTE ANY EXTERNAL SCRIPT for non-admins
SQL Server login/user required, plus database/table access
Satellite processes have limited privileges
Satellite processes run under low-privileged, local user accounts in the SQLRUserGroup
Each execution is isolated: different users run with different accounts
Windows firewall rules block outbound traffic

164 MicrosoftML package
MicrosoftML is a package for Microsoft R Server, Microsoft R Client, and SQL Server Machine Learning Services that adds state-of-the-art data transforms, machine learning algorithms, and pretrained models to Microsoft R functionality.
Data transforms help you compose, in a pipeline, a custom set of transforms that are applied to your data before training or testing. The primary purpose of these transforms is to allow you to featurize your data.
Machine learning algorithms enable you to tackle common machine learning tasks such as classification, regression, and anomaly detection. You run these high-performance functions locally on Windows or Linux machines or on Azure HDInsight (Hadoop/Spark) clusters.
Pretrained models for sentiment analysis and image featurization can also be installed and deployed with the MicrosoftML package.

165 Hybrid cloud

166 Back up to Azure

167 Back up to Azure
Managed backup
Granular control of the backup schedule
Local staging support for faster recovery and resiliency to transient network issues
Support for system databases
Support for simple recovery mode
Back up to Azure block blobs
Cost savings on storage
Significantly improved restore performance
More granular control over Azure Storage
Azure Storage snapshot backup
Fastest method for creating backups and running restores
Requires SQL Server database files on Azure Blob storage

168 Managed backup
Support for system databases
Support for databases in simple recovery mode
Backup to block blobs: more granular control
Customized backup schedules: full backup and log backup
Customers have found that the one-size-fits-all backup schedule is insufficient. More importantly, there can be a lack of control over the exact backup window. Therefore, we now provide options for customers to specify the schedule for full and log backups, in addition to the backup window. Managed Backup V1 only supports user databases, and so is incomplete as a backup strategy. We recognize that several customers have databases that are in simple recovery mode; allowing full backups for these databases is also important. The most consistent feedback we received on Managed Backup V1 was that direct backup to the cloud was the only option. Customers are concerned about intermittent network errors causing backup failures, and are seeking a two-step approach. We will add the option to allow backups to be taken locally first and then uploaded asynchronously to the cloud to: a) avoid network errors causing the backups to fail; b) prevent the backup window from growing significantly. All of these changes will also be reflected in the UI so that our target customers (SMBs) find it easy to use.

169 Customized scheduling
Step 1: Run the scheduling SP to configure custom scheduling:
EXEC msdb.managed_backup.sp_backup_config_schedule
  @database_name = 'testDB',
  @scheduling_option = 'Custom',
  @full_backup_freq_type = 'weekly',
  @days_of_week = 'Saturday',
  @backup_begin_time = '11:00',
  @backup_duration = '02:00',
  @log_backup_freq = '00:05';
Step 2: Run the basic SP to configure managed backup:
EXEC msdb.managed_backup.sp_backup_config_basic
  @database_name = 'testDB',
  @enable_backup = 1,
  @container_url = 'https://<storage account name>.blob.core.windows.net/<container name>',
  @retention_days = 30;

170 Back up to Azure block blobs
Half the storage cost
Backup striping and faster restore
Maximum backup size is 12 TB+
Granular access and unified credential story (SAS URLs)
Support for all existing backup/restore features (except append)
CREATE CREDENTIAL [https://<storage account>.blob.core.windows.net/<container>]
WITH IDENTITY = 'Shared Access Signature',
SECRET = 'sig=mw3K6dpwV%2BWUPj8L4Dq3cyNxCI';
BACKUP DATABASE <database>
TO URL = N'https://<storage account>.blob.core.windows.net/<container>/<backup file 1>.bak',
   URL = N'https://<storage account>.blob.core.windows.net/<container>/<backup file 2>.bak';

171 Back up to Azure with file snapshots
(Diagram: a SQL Server instance whose database files (MDF, LDF) live in Azure Storage; the backup (BAK) is a set of snapshot pointers.)
[this slide has animation]
Available to users whose database files are located in Azure Storage. When you create a FILE_SNAPSHOT backup, [click] the backup process copies the database using a virtual snapshot within Azure Storage. The database data does not move between the storage system and the server instance, removing an IO bottleneck. A backup file (.bak) is created, which is a pointer to the snapshots.
BACKUP DATABASE <database>
TO URL = N'https://<storage account>.blob.core.windows.net/<container>/<backup file>.bak'
WITH FILE_SNAPSHOT;

172 Back up to Azure with file snapshots
Available to users whose database files are located in Azure Storage
Copies the database using a virtual snapshot within Azure Storage
Database data does not move between the storage system and the server instance, removing the IO bottleneck
Uses only a fraction of the space that a traditional backup would consume
Very fast

173 Point-in-time restore with file snapshots
Traditional backup: multiple backup types (full, differential, log); complex point-in-time restore process
Back up to Azure with file snapshots: full backup only once; point-in-time restore needs only two adjacent backups (full + log)

174 SQL Server on Azure VM

175 Why SQL Server in an Azure VM?
1. Reduced capex/pay-as-you-go pricing
2. Fast deployment
3. Reduced configuration
4. Elastic scale
5. Lift and shift legacy applications

176 SQL Server in Azure VM—deploying
Microsoft gallery images
SQL Server 2008 R2 / 2012 / 2014 / 2016 / 2017
SQL Server Web / Standard / Enterprise / Developer / Express Editions
Windows Server 2008 R2 / 2012 R2 / 2016
Linux RHEL / Ubuntu
SQL licensing
Based on SQL Server edition and core count (VM sizes)
Pay-per-minute
Bring your own license
Move an existing license to Azure through BYOL images
Commissioned in ~10 minutes
Connect via RDP, ADO.NET, OLEDB, JDBC, PHP, and so on
Manage via Azure portal, SSMS, PowerShell, CLI, System Center, and so on
[this slide has animation – each top-level bullet point flies in]
Microsoft gallery images are regularly patched with service packs, cumulative updates, and security fixes.

177 Azure VM sizes
Recommended for SQL Server production workloads:
DSv3, DSv2, DS, and FS series VMs (premium performance)
GS and Ls series VMs (Intel Xeon processor E5 v3 family)
Local SSD storage
Premium Storage
The portal optimizes the VM for SQL Server workloads
The information on this slide is subject to change as Azure VM classes go in to and come out of production. Azure VMs are available in a range of performance/storage classes. For the latest information, see the Azure virtual machine sizes documentation.

178 Azure VM—availability
An Azure availability set distributes VMs across different fault domains (racks) and upgrade domains
VMs are not impacted at the same time by:
Rack/host failures
Azure host upgrades
Managed disks
Distributes disks of different VMs to different storage stamps
Higher isolation for Always On or SQL HA

179 Azure VM—availability SLAs
Single-VM SLA: 99.9% (<43 minutes downtime per month); 99.46% of single VMs achieve 99.999% (<26 seconds downtime per month)
Multi-VM SLA: 99.95% (<22 minutes downtime per month); 99.99% of multi-VM deployments meet or exceed the SLA
Includes:
Planned downtime due to host OS patching
Unplanned downtime due to physical failures
Doesn't include servicing of the guest OS or software inside (for example, SQL Server)
SQL Server multi-VM deployments need Always On
If a VM becomes unavailable, fail over to another (~15s)
Detects SQL instance failures (for example, service down or hung)

180 Azure VM—storage
Each disk has three copies in Azure Storage
An extent is replicated three times across different fault and upgrade domains
With random selection for where to place replicas, for fast MTTR
Remote storage connected over a high-speed network
Quorum-write
Checksum all stored data
Verified on every client read
Scrubbed every few days
Automated disk verification and decommissioning
Re-replicate on disk/node/rack failure or checksum mismatch
[this slide has animation – each top-level bullet point flies in]

181 Azure VM—fast storage
Virtual machine solid-state drives (SSDs)
Up to 7,500 IOPS or 250 MB/s per disk
Average latency less than 5 ms
Support local read cache (SSD)
Average 1 ms latency
Frees VM bandwidth to Azure Storage (for log)
(Diagram: a VM using cached and uncached premium storage disks plus a local SSD, with disk-level throttling and VM-level cached and uncached throttling against Azure Storage blobs and the server SSD.)
[this slide has animation]

182 SQL Azure VM—many layers of security
Physical security: datacenters monitored constantly; Microsoft Ops and Support personnel don't have access to customer storage
Infrastructure security: virtual networks (deployments are isolated in their own private networks); storage (encrypted and authenticated via strong keys)
Many certifications: ISO 27001/27002, SOC 1/SSAE 16/ISAE 3402 and SOC 2, Cloud Security Alliance CCM, FISMA, HIPAA, EU model clauses, FIPS 140-2
SQL security: auto patching; encryption of databases and backups; encryption of connections; authentication (Windows/SQL); Row-Level Security and Always Encrypted (SQL Server 2016)
[this slide has animation – each top-level bullet point flies in]

183 Azure VM—connectivity
Over the internet
Over a secure site-to-site tunnel
On a public connection
Dedicated connection (ExpressRoute)—recommended
Apps transparently connect to the primary replica via a listener
Listeners are supported through Azure Load Balancer
Internal (VNET) or external (internet)
Hybrid (VNET to VNET)
(Diagram: an Azure virtual network with three subnets behind an Azure gateway, connected by site-to-site VPN to an on-premises VPN device or Windows RRAS.)

184 Stretch Database

185 Ever-growing data, ever-shrinking IT
Massive tables (hundreds of millions/billions of rows, TBs in size) Users want/need to retain data indefinitely Cold data infrequently accessed but must be online Datacenter consolidation Maintenance challenges Business SLAs at risk What to do? Expand server and storage Move data elsewhere Delete [this slide has animation – “What to do” options]

186 Stretch SQL Server into Azure
Securely stretch cold tables to Azure with remote query processing
Capability: stretch large operational tables from on-premises to Azure with the ability to query
Benefits:
Cost-effective online cold data
Entire table is online and remains queryable from on-premises apps
No application changes
Support for Always Encrypted and Row-Level Security
Stretching the history tables of temporal tables is a great scenario
[this slide has animation]
Stretch Database lets you archive your historical data transparently and securely. Introduced in SQL Server 2016, Stretch Database stores your historical data in the Microsoft Azure cloud. After you enable Stretch Database, it silently migrates your historical data to an Azure SQL Database. You don't have to change existing queries and client apps. You continue to have seamless access to both local and remote data. Your local queries and database operations typically run faster against current data, and you typically enjoy reduced cost and complexity.

187 Stretch Database architecture
How it works:
Creates a secure linked server definition in the on-premises SQL Server
Targets the remote endpoint with the linked server definition
Provisions remote resources and begins to migrate eligible data, if migration is enabled
Queries against tables run against both the local database and the remote endpoint
(Diagram: the local database with local and eligible data, connected across the internet boundary through a linked server to the remote endpoint holding remote data.)
Source: Concepts and architecture for Stretch Database
Terms:
Local database. The on-premises SQL Server 2016 database.
Remote endpoint. The location in Microsoft Azure that contains the database's remote data. In SQL Server 2016, this is an Azure SQL Database server. This is subject to change in the future.
Local data. Data in a database with Stretch Database enabled that will not be moved to Azure, based on the Stretch Database configuration of the tables in the database.
Eligible data. Data in a database with Stretch Database enabled that has not yet been moved, but will be moved to Azure, based on the Stretch Database configuration of the tables in the database.
Remote data. Data in a database with Stretch Database enabled that has already been moved to Azure.
Architecture:
Stretch Database uses the resources in Microsoft Azure to offload archival data storage and query processing. When you enable Stretch Database on a database, it creates a secure linked server definition in the on-premises SQL Server. This linked server definition has the remote endpoint as the target. When you enable Stretch Database on a table in the database, it provisions remote resources and begins to migrate eligible data, if migration is enabled. Queries against tables with Stretch Database enabled automatically run against both the local database and the remote endpoint. Stretch Database uses the processing power in Azure to run queries against remote data by rewriting the query. You can see this rewriting as a "remote query" operator in the new query plan.

188 Typical workflow to enable Stretch Database
High-level steps:
Configure the local server for remote data archive
Create a credential with administrator permission
Alter the specific database for remote data archive
Create a filter predicate (optional) to select rows to migrate
Alter the table to enable Stretch for a table
The Stretch Wizard in SQL Server Management Studio makes all this easy (it does not currently support creating filter predicates)
-- Enable the local server
EXEC sp_configure 'remote data archive', '1';
RECONFIGURE;
-- Provide an administrator credential to connect to Azure SQL Database
CREATE CREDENTIAL <server_address> WITH IDENTITY = <administrator_user_name>, SECRET = <administrator_password>;
-- Alter the database for remote data archive
ALTER DATABASE <database name> SET REMOTE_DATA_ARCHIVE = ON (SERVER = '<server name>');
GO
-- Alter the table for remote data archive
ALTER TABLE <table name> SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));
GO

189 Queries continue working
Business applications continue working without disruption
DBA scripts and tools work as before (all controls still held in the local SQL Server)
Developers continue building or enhancing applications with existing tools and methods

190 Advanced security features supported
Data in motion always travels via secure channels (TLS 1.1/1.2)
Always Encrypted supported if enabled by the user (the encryption key remains on-premises)
Row-Level Security and auditing supported

191 Backup and restore benefits
DBAs only back up/restore the local SQL Server hot data
Stretch Database ensures remote data is transactionally consistent with local data
Upon completion of the local restore, SQL Server reconciles with the remote data using a metadata operation, not a data copy
Restore time for remote data is not dependent on the size of the data

192 Current limitations that block stretching a table
Tables with more than 1,023 columns or more than 998 indexes cannot be stretched
FileTables or FILESTREAM data not supported
Replicated tables, memory-optimized tables
CLR data types (including geometry, geography, hierarchyid, and CLR user-defined types)
Column types (COLUMN_SET, computed columns)
Constraints (default and check constraints)
Foreign key constraints that reference the table in a parent-child relationship—you can stretch the child table (for example, Order_Detail)
Full-text indexes
XML indexes
Spatial indexes
Indexed views that reference the table

193 Programmability and data structures

194 Graph processing

195 What is a graph? A graph is a collection of nodes and edges
(Diagram: examples of an undirected graph, a directed graph, a weighted graph (edge weight 10), and a property graph in which two Person nodes (Shreya and Arvind, each with Name and Phone properties) are connected by a Manages edge carrying a startDate property.)
A graph is a concept from computer science and mathematics that is used to represent connections (called edges) between entities (called nodes). Graphs fall into several subtypes:
Undirected. The relationship between the connected nodes is not directional—for example, connections between cities in a highway network.
Directed. The relationship between the connected nodes is directional—for example, a management hierarchy, a SQL Server execution plan, or connections between cities in a river network.
Weighted. The relationship between connected nodes carries a numerical weight, possibly representing a probability or preference for a connection to exist or to be selected. Weighted graphs can be directed or undirected.
Property graph. In addition to connecting nodes, nodes and/or edges in the graph can hold properties that give the relationship context. Property graphs can be directed or undirected.

196 Typical scenarios for graph databases
Hierarchical or interconnected data; entities with multiple parents
Analyze interconnected data; materialize new information from existing facts
Identify connections that are not obvious
Complex many-to-many relationships
Organically grow connections as the business evolves
Graphs can be challenging to model and query in traditional relational databases. Specialized graph databases, which treat nodes and edges as first-order entities, exist to simplify storing and working with graph data. SQL Server 2017 includes support for graph database objects and queries.

197 Introducing SQL Server Graph
A collection of node and edge tables in the database
Language extensions:
DDL extensions—create node and edge tables
DML extensions—SELECT: T-SQL MATCH clause to support pattern matching and traversals; DELETE, UPDATE, and INSERT support graph tables
Graph support is integrated into the SQL Server ecosystem
(Diagram: a database contains a graph, which is a collection of node tables and edge tables; node tables have properties; edges connect nodes; edge tables may or may not have properties.)
Each SQL Server 2017 database supports up to one graph. A graph is a collection of node and edge tables; node and edge tables can be distributed across more than one schema in the database, but all belong to the same logical graph. Nodes and edges are stored in standard SQL Server tables, so most of the operations that can be carried out on a standard SQL Server table can also be carried out on graph tables. When a table is created as a node or an edge table, a number of metadata columns are automatically added to the table—these are used to provide graph support. Additional columns are added to the sys.tables and sys.columns system views to allow graph tables to be identified.

198 DDL Extensions Create node and edge tables
Properties can be associated with nodes and edges
CREATE TABLE Product (ID INTEGER PRIMARY KEY, name VARCHAR(100)) AS NODE;
CREATE TABLE Supplier (ID INTEGER PRIMARY KEY, name VARCHAR(100)) AS NODE;
CREATE TABLE hasInventory AS EDGE;
CREATE TABLE located_at (address VARCHAR(100)) AS EDGE;
The code on the slide creates two node tables (Product and Supplier) and two edge tables (hasInventory and located_at).
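Rows are inserted into node tables like ordinary tables; edge rows connect two nodes through the automatically added $from_id and $to_id pseudo-columns. A minimal sketch using the tables above (the sample values are hypothetical):
-- Insert nodes
INSERT INTO Product (ID, name) VALUES (1, 'Widget');
INSERT INTO Supplier (ID, name) VALUES (10, 'Contoso Supply');
-- Insert an edge connecting the supplier to the product via graph pseudo-columns
INSERT INTO hasInventory ($from_id, $to_id)
VALUES ((SELECT $node_id FROM Supplier WHERE ID = 10),
        (SELECT $node_id FROM Product WHERE ID = 1));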

199 DML Extensions
Multihop navigation and join-free pattern matching using the MATCH predicate:
SELECT Prod.name AS ProductName, Sup.name AS SupplierName
FROM Product Prod, Supplier Sup, hasInventory hasIn,
     located_at supp_loc, Customer Cus, located_at cust_loc,
     orders, location loc
WHERE MATCH(Cus-(orders)->Prod<-(hasIn)-Sup
        AND Cus-(cust_loc)->loc<-(supp_loc)-Sup);
The query on the slide uses the MATCH clause to perform a join-free search of the graph to find products and suppliers where a customer at the same location as the supplier has ordered a product that the supplier has in stock. These orders could potentially be fulfilled by the supplier dispatching directly to the customer.

200 Spatial

201 Spatial
Spatial data represents information about the physical location and shape of geometric objects. These objects can be point locations, or lines, or more complex objects such as countries, roads, or lakes.
SQL Server supports two spatial data types: the geometry data type and the geography data type. The geometry type represents data in a Euclidean (flat) coordinate system. The geography type represents data in a round-earth coordinate system.
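As a quick illustration of the two types, the following sketch builds one instance of each from well-known text (the coordinates are arbitrary):
-- Planar (flat-earth) geometry: Euclidean distance between two points
DECLARE @g1 geometry = geometry::STGeomFromText('POINT(0 0)', 0);
DECLARE @g2 geometry = geometry::STGeomFromText('POINT(3 4)', 0);
SELECT @g1.STDistance(@g2) AS PlanarDistance; -- 5
-- Round-earth geography: distance in meters between two long/lat points (SRID 4326)
DECLARE @h1 geography = geography::STGeomFromText('POINT(-122.33 47.61)', 4326);
DECLARE @h2 geography = geography::STGeomFromText('POINT(-122.12 47.67)', 4326);
SELECT @h1.STDistance(@h2) AS DistanceInMeters;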

202 Spatial functionality
Simple and compound spatial data types supported
Import and export spatial data to industry-standard formats (Open Geospatial Consortium WKT and WKB)
Functions to query the properties of, the behaviors of, and the relationships between spatial data instances
Spatial columns can be indexed to improve query performance

203 Spatial enhancements (SQL Server 2017)
The FullGlobe geometry data type—FullGlobe is a special type of polygon that covers the entire globe. FullGlobe has an area, but no borders or vertices.

204 JSON and XML

205 JSON support
Not a built-in data type—JSON is stored as varchar or nvarchar
Format SQL data or query results as JSON
Convert JSON to SQL data
Query JSON data
Index JSON data

206 FOR JSON
Export data from SQL Server as JSON, or format query results as JSON, by adding the FOR JSON clause to a SELECT statement. When you use the FOR JSON clause, you can specify the structure of the output explicitly, or let the structure of the SELECT statement determine the output.
When you use PATH mode with the FOR JSON clause, you maintain full control over the format of the JSON output. You can create wrapper objects and nest complex properties. When you use AUTO mode with the FOR JSON clause, the JSON output is formatted automatically based on the structure of the SELECT statement.
Use the FOR JSON clause to delegate the formatting of JSON output from your client applications to SQL Server.

207 FOR JSON
In PATH mode, you use the dot syntax—for example, 'Item.Price'—to format nested output. This example also uses the ROOT option to specify a named root element.
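A sketch of the pattern, with hypothetical table and column names:
-- Dot syntax in column aliases produces nested JSON objects
SELECT OrderNumber AS [Order.Number],
       OrderDate AS [Order.Date],
       UnitPrice AS [Item.Price]
FROM dbo.SalesOrder
FOR JSON PATH, ROOT('Orders');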

208 OPENJSON Import JSON data into SQL Server by using the OPENJSON rowset function. You can also use OPENJSON to convert JSON data to rows and columns You can call OPENJSON with or without an explicit schema: Use JSON with the default schema. When you use OPENJSON with the default schema, the function returns a table with one row for each property of the JSON object or for each element in the JSON array. Use JSON with an explicit schema. When you use OPENJSON with an explicit schema, the function returns a table with the schema that you define in the WITH clause. In the WITH clause, you specify the output columns, their types, and the paths of the JSON source properties for each output column.
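A minimal sketch of both forms over an ad hoc JSON string (OPENJSON requires database compatibility level 130 or higher):
DECLARE @json NVARCHAR(MAX) = N'[{"name":"Ann","age":25},{"name":"Bo","age":31}]';
-- Default schema: one row per array element, with key/value/type columns
SELECT * FROM OPENJSON(@json);
-- Explicit schema: the WITH clause maps JSON paths to typed output columns
SELECT * FROM OPENJSON(@json)
WITH (name VARCHAR(50) '$.name', age INT '$.age');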

209 OPENJSON

210 Query JSON data Built-in functions for JSON:
ISJSON tests whether a string contains valid JSON:
SELECT id, json_col FROM tab1 WHERE ISJSON(json_col) > 0
JSON_VALUE extracts a scalar value from a JSON string:
SELECT JSON_VALUE(jsonInfo, '$.info.address.town') FROM Person.Person
JSON_QUERY extracts an object or array from a JSON string:
SELECT FirstName, LastName, JSON_QUERY(jsonInfo, '$.info.address') AS Address FROM Person.Person ORDER BY LastName
JSON_MODIFY updates the value of a property in a JSON string and returns the updated JSON string:
DECLARE @info NVARCHAR(100) = '{"name":"John","skills":["C#","SQL"]}'
SET @info = JSON_MODIFY(@info, '$.name', 'Mike')

211 XML support Built-in data type (since SQL Server 2005)
Format SQL data or query results as XML Convert XML to SQL data Query XML data Index XML data

212 FOR XML
Export data from SQL Server as XML, or format query results as XML, by adding the FOR XML clause to a SELECT statement. When you use the FOR XML clause, you can specify the structure of the output explicitly, or let the structure of the SELECT statement determine the output.
RAW mode generates a single <row> element per row in the rowset that is returned by the SELECT statement. You can generate XML hierarchy by writing nested FOR XML queries.
AUTO mode generates nesting in the resulting XML by using heuristics based on the way the SELECT statement is specified. You have minimal control over the shape of the XML generated. Nested FOR XML queries can be written to generate XML hierarchy beyond the XML shape that is generated by AUTO mode heuristics.
EXPLICIT mode allows more control over the shape of the XML. You can mix attributes and elements at will in deciding the shape of the XML. It requires a specific format for the resulting rowset that is generated because of query execution.
PATH mode, together with the nested FOR XML query capability, provides the flexibility of the EXPLICIT mode in a simpler manner.
Use the FOR XML clause to delegate the formatting of XML output from your client applications to SQL Server.

213 FOR XML
In PATH mode, you can use the @ symbol to return columns as attributes. This example also uses the ROOT option to specify a named root element.
SELECT Date AS [@OrderDate],
       Number AS [@OrderNumber],
       Customer AS AccountNumber,
       Price AS UnitPrice,
       Quantity AS UnitQuantity
FROM SalesOrder AS Orders
FOR XML PATH('Order'), ROOT('Orders')
<Orders>
  <Order OrderDate="2011-05-31T00:00:00" OrderNumber="SO43659">
    <AccountNumber>AW29825</AccountNumber>
    <UnitPrice>59.99</UnitPrice>
    <UnitQuantity>1</UnitQuantity>
  </Order>
  <Order OrderDate="2011-06-01T00:00:00" OrderNumber="SO43661">
    <AccountNumber>AW73565</AccountNumber>
    <UnitPrice>24.99</UnitPrice>
    <UnitQuantity>3</UnitQuantity>
  </Order>
</Orders>

214 Query XML data
The xml data type supports methods to query XML data using XQuery, which is based on XPath:
query()—return matching XML nodes as XML
value()—return matching XML nodes as SQL Server data types
exist()—verify whether a matching node exists
nodes()—shred XML into multiple rows
modify()—update or insert matching nodes
DECLARE @myDoc xml = '<ROOT><a>111</a></ROOT>';
SELECT @myDoc.query('/ROOT/a') AS Result;
Result
<a>111</a>

215 Temporal tables

216 Why temporal? Data changes over time Temporal in DB
Tracking and analyzing changes is often important
Temporal in the database:
Automatically tracks history of data changes
Enables easy querying of historical data states
Advantages over workarounds:
Simplifies app development and maintenance
Efficiently handles complex logic in the DB engine
Scenarios: time travel, data audit, slowly changing dimensions, repairing record-level corruptions
A temporal table provides correct information about stored facts at any point in time. Each temporal table consists of two tables—one for the current data and one for the historical data. The system automatically ensures that, when the data changes in the table with the current data, the previous values are stored in the historical table. Querying constructs are provided to hide this complexity from users. For more information, see Temporal Tables.
Introduction to key components and concepts
What is a temporal table? A temporal table is a table for which a PERIOD definition exists. It contains system columns with a data type of datetime2 into which the period of validity is recorded by the system, and it also has an associated history table into which the system records all prior versions of each record with their period of validity. With a temporal table, the value of each record at any point in time can be determined, rather than just the current value of each record. A temporal table is also referred to as a system-versioned table.
Why temporal? Real data sources are dynamic and, more often than not, business decisions rely on insights that analysts can get from data evolution. Use cases for temporal tables include:
Understanding business trends over time
Tracking data changes over time
Auditing all changes to data
Maintaining a slowly changing dimension for decision support applications
Recovering from accidental data changes and application errors

217 No change in programming model
How does temporal work? ANSI compliant; no change in programming model; new insights
Temporal querying: FOR SYSTEM_TIME AS OF, FROM..TO, BETWEEN..AND, CONTAINED IN
DDL: CREATE temporal TABLE PERIOD FOR SYSTEM_TIME…; ALTER regular_table TABLE ADD PERIOD…
DML: INSERT / BULK INSERT, UPDATE, DELETE, MERGE
Querying: SELECT * FROM temporal
[this slide contains animations]
How does temporal work? System-versioning for a table is implemented as a pair of tables, a current table and a history table. [click] Within each of these tables, two additional datetime2 columns are used to define the period of validity for each record—a system start time (SysStartTime) column and a system end time (SysEndTime) column. The current table contains the current value for each record. The history table contains the previous value for each record, if any, and the start time and end time for the period for which it was valid.
INSERTS: On an INSERT, the system sets the value for the SysStartTime column to the UTC time of the current transaction, based on the system clock, and assigns the value for the SysEndTime column to the maximum value of 9999-12-31—this marks the record as open.
UPDATES: On an UPDATE, the system stores the previous value of the record in the history table and sets the value for the SysEndTime column to the UTC time of the current transaction, based on the system clock. This marks the record as closed, with a period recorded for which the record was valid. In the current table, the record is updated with its new value and the system sets the value for the SysStartTime column to the UTC time for the transaction, based on the system clock. The value for the updated record in the current table for the SysEndTime column remains the maximum value of 9999-12-31.
DELETES: On a DELETE, the system stores the previous value of the record in the history table and sets the value for the SysEndTime column to the UTC time of the current transaction, based on the system clock. This marks the record as closed, with a period recorded for which the previous record was valid. In the current table, the record is removed. Queries of the current table will not return this value. Only queries that deal with history data return data for which a record is closed.
MERGE: On a MERGE, the statement behaves as an INSERT, an UPDATE, or a DELETE, based on the condition for each record.
Built-in functions facilitate querying temporal data. The SQL Server implementation of temporal tables is ANSI SQL 2011 compliant, and enables you to find new insights into your data without changing the familiar programming model.
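A minimal sketch of the DDL, with hypothetical table and column names:
-- System-versioned temporal table: the two period columns and the
-- PERIOD FOR SYSTEM_TIME clause are required; the history table is
-- created automatically if it does not already exist
CREATE TABLE dbo.Employee (
  EmployeeID INT PRIMARY KEY,
  Name NVARCHAR(100) NOT NULL,
  Salary DECIMAL(10,2) NOT NULL,
  SysStartTime DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
  SysEndTime DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
  PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));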

218 Temporal database support: BETWEEN
Provides correct information about stored facts at any point in time, or between two points in time.
There are two orthogonal sets of scenarios with regard to temporal data:
System (transaction) time
Application time
DECLARE @AsOfFrom DATETIME2 = '2017-01-01';
DECLARE @AsOfTo DATETIME2 = '2017-06-30';
SELECT * FROM Person.BusinessEntityContact
FOR SYSTEM_TIME BETWEEN @AsOfFrom AND @AsOfTo
WHERE ContactTypeID = 17;
Returns a table with the values for all record versions that were active within the specified time range, regardless of whether they started being active before the <start_date_time> parameter value for the FROM argument or ceased being active after the <end_date_time> parameter value for the TO argument. Internally, a union is performed between the temporal table and its history table, and the results are filtered to return the values for all row versions that were active at any time during the time range specified. Records that became active exactly on the lower boundary defined by the FROM endpoint are included, in addition to records that became active exactly on the upper boundary defined by the TO endpoint.

219 How does system time work?
(Diagram: inserts write to the temporal table (actual data); updates and deletes move old versions to the history table.)
[this slide contains animations]
[click] The first time a record is inserted into a temporal table, it's written to the temporal table with a start date of the current system time. When the record is changed, the original row is moved from the temporal table to the history table, with an end date of the current system time. The new version of the record is written to the temporal table with a start date of the current system time. [continues on next slide]

220 How does system time work?
(Diagram: regular queries (current data) read the temporal table; temporal queries (time travel, and so on) read a union of the temporal table and the history table, including historical versions.)
Queries on current data are returned from the temporal table. [click] Queries on all data are returned from a union of the temporal table with the history table.
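The querying surface is plain T-SQL; two hedged examples against the hypothetical dbo.Employee table sketched earlier:
-- Current rows only (ordinary query against the temporal table)
SELECT * FROM dbo.Employee WHERE EmployeeID = 1;
-- The row as it looked at a past instant (current + history union under the covers)
SELECT * FROM dbo.Employee
FOR SYSTEM_TIME AS OF '2017-06-01T00:00:00'
WHERE EmployeeID = 1;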

221 Temporal enhancements (SQL Server 2017)
System-versioned temporal tables now support CASCADE DELETE and CASCADE UPDATE Temporal tables retention policy support added

222 Upgrading and migrating to SQL Server 2017

223 Upgrade and migration tools
Data Migration Assistant (DMA) Upgrade from previous version of SQL Server (on-premises or SQL Server in Azure VM) SQL Server Migration Assistant Migrate from Oracle, MySQL, SAP ASE, DB2, or Access to SQL Server (on-premises or SQL Server 2017 in Azure VM) Azure Database Migration Service Migrate from SQL Server, Oracle, or MySQL to Azure SQL Database or SQL Server 2017 in Azure VM

224 Upgrading to SQL Server 2017
In-place or side-by-side upgrade path from:
SQL Server 2008
SQL Server 2008 R2
SQL Server 2012
SQL Server 2014
SQL Server 2016
Side-by-side upgrade path from:
SQL Server 2005
Use Data Migration Assistant to prepare for migration. Data Migration Assistant replaces the SQL Server Upgrade Advisor.

225 DMA: Assess and upgrade schema
1. Assess and identify issues
2. Fix issues
3. Upgrade database
[this slide contains animations]
In assessments, Data Migration Assistant (DMA) automates the potentially overwhelming process of checking database schema and static objects for potential breaking changes from prior versions. DMA also offers performance and reliability recommendations on the target server. [click] The first phase is to use DMA to assess the legacy SQL Server database and identify issues. In the second phase, issues are fixed. The first and second phases are repeated until all issues are addressed. Finally, the database is upgraded to SQL Server 2017.

226 Choosing a migration target “What’s the best path for me?”
Intent: visualize the decision point when migrating to Azure SQL Database or an Azure VM, and answer the question, "what's the best path for me?"
Two options for cloud migration:
Infrastructure-as-a-service (IaaS)—SQL Server in an Azure Virtual Machine (VM) allows you to run SQL Server inside a virtual machine in the cloud.
Platform-as-a-service (PaaS)—Microsoft Azure SQL Database is a relational database-as-a-service.
Both cloud offerings provide enterprise-level database support, but their characteristics, capabilities, and costs differ.

227 Migrating to SQL Server 2017 from other platforms
Identify apps for migration; use migration tools and partners (SQL Server Migration Assistant and/or the global partner ecosystem); deploy to production on SQL Server 2017 on Windows or SQL Server 2017 on Linux
Supported sources include Oracle, SAP ASE, and DB2
Note: SAP ASE was formerly known as SAP Sybase ASE/Sybase.

228 Database and application migration process
Scoping and planning: database discovery; architecture requirements (HADR, performance, locale, maintenance, dependencies, and so on)
Migration assessment: complexity, effort, risk
Database migration: schema conversion; data migration; embedded SQL statements; ETL and batch; system and DB interfaces
Application conversion: using the Migration Assistant
Application deployment: database connectivity; user login and permissions; performance tuning

229 SQL Server Migration Assistant (SSMA)
Automates and simplifies all phases of database migration:
Migration analyzer: assess migration complexity
Schema converter: convert schema and business logic
Data migrator: migrate data
Migration tester: validate converted database code
Supports migration from DB2, Oracle, SAP ASE, MySQL, or Access to SQL Server

230 Using SQL Server Migration Assistant (SSMA)
SSMA automates components of database migrations to SQL Server; DB2, Oracle, Sybase, Access, and MySQL analyzers are available
SSMA migration analyzer: assess the migration project
SSMA schema converter: migrate schema and business logic
SSMA data migrator: migrate data
Then: convert the application; test, integrate, and deploy

231 Azure solution paths
Infrastructure-as-a-service (IaaS): SQL Server in an Azure Virtual Machine
Full control and flexibility: a highly customized system to address the application's specific performance and availability requirements
Scenarios: move existing apps; development and test environments; hybrid HA and disaster recovery; extend on-premises apps
Platform-as-a-service (PaaS): Azure SQL Database
Simplified administration: you do not have to manage any VMs, OS, or database software, including upgrades, high availability, and backups
Scenarios: cloud-designed business apps; websites and mobile apps

232 Azure migration tools and services
Assess and migrate:
Data Migration Assistant: rich assessments at scale; feature recommendations; schema conversions
Azure Database Migration Service: Microsoft and non-Microsoft source support; built for scale and reliability; built with enterprise security and privacy
We have acknowledged feedback from our customers and are now acting on it. We are addressing migration concerns by releasing:
Data Migration Assistant (DMA)—built by the SQL engineering team using the latest knowledge base of all SQL versions. This helps with assessing and planning.
Azure Database Migration Service (DMS)—the newest Azure service that helps you move your on-premises databases to Azure at scale.

233 DMA: Assess and migrate schema
1. Assess and identify issues
2. Fix issues
3. Convert and deploy schema
[this slide contains animations]
In assessments, Data Migration Assistant (DMA) automates the potentially overwhelming process of checking database schema and static objects for potentially breaking changes from prior versions. DMA also offers performance and reliability recommendations on the target server. [click] The first phase is to use DMA to assess the legacy SQL Server database and identify issues. In the second phase, issues are fixed. The first and second phases are repeated until all issues are addressed. Finally, the database is converted and deployed to Azure.

234 Azure Database Migration Service
Accelerating your journey to the cloud
Streamline database migration to Azure SQL Database (PaaS)
Managed service platform for migrating databases
Migrate SQL Server and third-party databases (such as Oracle) to Azure SQL Database
As organizations look to optimize their IT infrastructure so that they have more time and resources to focus on business transformation, Microsoft is committed to helping accelerate these initiatives. Microsoft has announced that a new migration service is coming to Azure to streamline customers' journey to the cloud. This service will streamline the tasks required to move existing competitive and SQL Server databases to Azure. Deployment options will include Azure SQL Database and SQL Server in an Azure VM.
Managed service platform for migrating databases; Azure SQL Database and managed instance as targets; competitive databases (Oracle and more); meets enterprise nonfunctional requirements (NFRs) for compliance, security, costs, and so on. Zero data loss and near-zero downtime migration with the Azure platform service.

235 Editions, features, and capacity

236 SQL Server Editions
Enterprise: The premium offering, SQL Server Enterprise Edition delivers comprehensive high-end datacenter capabilities with extremely fast performance, unlimited virtualization, and end-to-end business intelligence—enabling high service levels for mission-critical workloads and end-user access to data insights.
Standard: SQL Server Standard Edition delivers basic data management and a business intelligence database for departments and small organizations to run their applications. It supports common development tools for on-premises and the cloud—enabling effective database management with minimal IT resources.
Web: SQL Server Web Edition is a low total-cost-of-ownership option for web hosters and web VAPs to provide scalability, affordability, and manageability capabilities for small- to large-scale web properties.
Developer: SQL Server Developer Edition lets developers build any kind of application on top of SQL Server. It includes all the functionality of Enterprise Edition, but is licensed for use as a development and test system, not as a production server. SQL Server Developer is an ideal choice for people who build and test SQL Server applications.
Express: Express Edition is the entry-level, free database and is ideal for learning and building desktop and small server data-driven applications. It's the best choice for independent software vendors, developers, and hobbyists who build client applications. If you need more advanced database features, SQL Server Express can be seamlessly upgraded to other higher-end versions of SQL Server. SQL Server Express LocalDB is a lightweight version of Express that has all of its programmability features, yet runs in user mode and has a fast, zero-configuration installation and a short list of prerequisites.

237 Capacity limits by edition
Maximum compute capacity used by a single instance (Database Engine): Enterprise, operating system maximum; Standard, limited to lesser of four sockets or 24 cores; Web, limited to lesser of four sockets or 16 cores; Express, limited to lesser of one socket or four cores
Maximum compute capacity used by a single instance (Analysis Services or Reporting Services): Enterprise, operating system maximum; Standard, limited to lesser of four sockets or 24 cores; Web, limited to lesser of four sockets or 16 cores; Express, limited to lesser of one socket or four cores
Maximum memory for buffer pool per instance of Database Engine: Enterprise, operating system maximum; Standard, 128 GB; Web, 64 GB; Express, 1410 MB
Maximum memory for columnstore segment cache per instance of Database Engine: Enterprise, unlimited memory; Standard, 32 GB; Web, 16 GB; Express, 352 MB
Maximum memory-optimized data size per database in Database Engine: Enterprise, unlimited memory; Standard, 32 GB; Web, 16 GB; Express, 352 MB
Maximum relational database size: 524 PB in all editions except Express, which is limited to 10 GB
(Developer Edition has the same limitations and features as Enterprise Edition, but is licensed only for nonproduction workloads.)

238 SQL Server features
SQL Server Database Engine: provides the core service for storing, processing, and securing data, replication, full-text search, tools for managing relational and XML data, in-database analytics integration, and PolyBase integration for access to Hadoop and other heterogeneous data sources, plus the Data Quality Services (DQS) server.
Analysis Services: includes the tools for creating and managing online analytical processing (OLAP) and data mining applications.
Reporting Services: includes server and client components for creating, managing, and deploying tabular, matrix, graphical, and free-form reports. Reporting Services is also an extensible platform that you can use to develop report applications.
Integration Services: a set of graphical tools and programmable objects for moving, copying, and transforming data. It also includes the Data Quality Services (DQS) component for Integration Services.
Master Data Services: Master Data Services (MDS) is the SQL Server solution for master data management. MDS can be configured to manage any domain (products, customers, accounts) and includes hierarchies, granular security, transactions, data versioning, and business rules, in addition to an add-in for Excel that can be used to manage data.
Machine Learning Services (In-Database): supports distributed, scalable machine learning solutions using enterprise data sources. SQL Server 2017 supports R and Python.
Machine Learning Server (Standalone): supports the deployment of distributed, scalable machine learning solutions on multiple platforms and using multiple enterprise data sources, including Linux, Hadoop, and Teradata. SQL Server 2017 supports R and Python.

239 Features by Edition
Some SQL Server features (and sub-features) are available only in certain editions. For a complete list, see the "Editions and supported features of SQL Server 2017" documentation.

240 The Microsoft data platform
(Diagram: the Microsoft Azure data platform. Collect and manage: relational, nonrelational, analytical, and streaming data from apps, with information management and orchestration. Transform and analyze: extract, transform, load; prediction. Visualize and decide: dashboards, reports, ask, mobile, for internal and external consumers.)

241 © 2017 Microsoft Corporation. All rights reserved
© 2017 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation.  Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.  MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

