Server-Side Development for the Cloud


Server-Side Development for the Cloud Michael Rosenblum www.dulcian.com

Who Am I? – "Misha"
Oracle ACE
Co-author of 3 books: PL/SQL for Dummies, Expert PL/SQL Practices, Oracle PL/SQL Performance Tuning Tips & Techniques
Known for: SQL and PL/SQL tuning, complex functionality, code generators, repository-based development

Yet another cloud presentation?! NO, because: I have been building actual systems for the last two decades, and I have hosted them both in the cloud (public/private) and on-site. Warning: I don't work for Oracle/Amazon/Microsoft/etc. … so I WILL use the right of free speech guaranteed by the First Amendment.

Parts of the Equation
Cloud … i.e. what is the environment?
Servers … i.e. what is going on with the hardware?
Development … i.e. what is the "right way"?

1. State of the Cloud

It’s Growing!

It’s … well… Amazon […and everybody else]

… And it will stay that way [for now]

Observations
IaaS/PaaS is the "new norm." On-premises solutions are "ancient." Even private clouds are "not cool" anymore.
The cloud market has stabilized. All niches have been occupied. New players are less probable. Companies can (more or less) plan long-term.

2. State of Servers

Data is growing!

Databases are the same

Houston, we have a problem! Growth of data + "same old" technologies means more pressure on "tried and true" solution patterns … which means critical resources become limited more quickly … which means design mistakes become obvious … which means looking for quick-fixes (and scapegoats).

Cloud??? Old-way quick fix: "Throw more hardware at it!!!" New-way quick fix: "Let's move it to the cloud!!!"

Cloud…
Resource utilization is easily (and efficiently) monitored by providers … because that's how they charge you!
Hardware resources are no longer static … because providers hope that you will soon need MORE of EVERYTHING!

Cloud!!! Sudden discovery: resource elasticity works both ways!
1. Solving problems by adding resources = spending money
2. Solving problems by tuning = NOT spending money
3. Tuning the existing system = SAVING MONEY

Eureka! Total quality of the code base (including tuning efforts) has a DIRECT cost impact!

Impact
Political:
"Good vs. bad" is quantified → objectivity in decision-making
Good architecture pays off → good architects are nurtured and supported by management
Performance tuning is back in style → quality control
Solutions usually cross boundaries → DBAs and developers are forced to work together
Technical:
Developers are constantly reminded about resource utilization → less sloppy code

What resources?!
I/O and storage: cloud storage means SSD. Low latency → much faster I/O → no longer a bottleneck. Storage is cheap → less worry about data growth → less need to spend resources compressing data.
Memory: significant price drop → more in-memory operations → even less I/O workload.
CPU: prices are OK, but Oracle licenses are CPU-based! Major impact on the overall cost of ownership → the new bottleneck!

It's all about CPU now! Shift to cloud → going from I/O-bound to CPU-bound:
On-premises servers usually had CPUs over-allocated: storage is upgradable and scalable, CPU is not; support the heaviest workload (Black Friday!) just in case; license cost calculations were often "inventive."
Cloud servers usually have CPUs under-allocated … since everybody thinks "we can add them later" … but at that later moment there is usually no budget!

3. CPU-friendly development

Ways to Lower CPU Workload
Know your system and don't waste resources: logging, profiling
Apply efficient coding techniques: avoid context switches, don't reinvent the wheel, work in SETs

3.1 - Logging

System Logging
Levels of information:
Core info – process, session
Granular info – client, module, action
Why bother? A stateLESS implementation spreads one logical session across multiple physical sessions.

Setting Granular Info (1)

-- Client stuff
begin
    -- Set it to anything you want to describe the session; otherwise useless.
    DBMS_APPLICATION_INFO.SET_CLIENT_INFO ('This is my test-run');
    -- Key setting for debugging! This ID is traceable.
    DBMS_SESSION.SET_IDENTIFIER ('misha01');
end;
/
-- Visibility:
select sid, client_info, client_identifier from v$session;

Setting Granular Info (2)

-- Additional info: module and action
begin
    DBMS_APPLICATION_INFO.SET_MODULE (module_name=>'HR', action_name=>'SALARY_MAINT');
end;
/
-- Visibility:
select sid, module, action from v$session;

Application Logging
Advantages: customized information when needed
Disadvantages: requires discipline of the whole development group

Indestructible Log

create or replace package log_pkg is
    procedure p_log (i_tx varchar2);
    procedure p_log (i_cl CLOB);
end;
/
create or replace package body log_pkg is
    procedure p_log (i_tx varchar2) is
        pragma autonomous_transaction;
    begin
        insert into t_log (id_nr, timestamp_dt, log_tx, log_cl)
        values (log_seq.nextval, systimestamp,
                case when length(i_tx)<=4000 then i_tx else null end,
                case when length(i_tx)>4000 then i_tx else null end);
        commit;
    end;
    procedure p_log (i_cl CLOB) is
        pragma autonomous_transaction;
    begin
        insert into t_log (id_nr, timestamp_dt, log_cl)
        values (log_seq.nextval, systimestamp, i_cl);
        commit;
    end;
end;
/
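
A minimal usage sketch (assuming the t_log table and log_seq sequence behind the package above): because p_log runs as an autonomous transaction, the log entries survive even when the caller's own transaction rolls back.

declare
    v_result number;
begin
    log_pkg.p_log('Starting salary recalculation');
    v_result := 1/0;                    -- force an error just for the demo
exception
    when others then
        log_pkg.p_log('Failure: '||dbms_utility.format_error_stack);
        rollback;                       -- the log rows above are already committed
        raise;
end;
/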

3.2 - Profiling

PL/SQL Hierarchical Profiler
Gathers hierarchical statistics of all calls (both SQL and PL/SQL) for the duration of the monitoring … into a portable trace file
Has powerful aggregation utilities … both within the database and via a command-line interface
Available since Oracle 11.1 [replaced the old PL/SQL Profiler] … and constantly improved, even in 18c
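
The "within the database" aggregation path is worth a quick sketch (assuming the profiler tables have already been created with the dbmshptab.sql script shipped with the database, and the IO directory/HProf.txt file used in the demo that follows): DBMS_HPROF.ANALYZE loads a raw trace file into the DBMSHP_* tables and returns a run ID you can then query.

declare
    v_runid number;
begin
    -- load the raw trace file into the DBMSHP_* tables; returns a run ID
    v_runid := dbms_hprof.analyze (location => 'IO', filename => 'HProf.txt');
    dbms_output.put_line('Run ID: '||v_runid);
end;
/
-- then, for example, find the most expensive subprograms of that run:
select function, calls, function_elapsed_time
from   dbmshp_function_info
where  runid = :run_id           -- bind the run ID printed above
order  by function_elapsed_time desc;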

Intro (1)

SQL> CREATE DIRECTORY IO AS 'C:\IO';    -- destination folder: WRITE privilege is enough
SQL> exec dbms_hprof.start_profiling (location=>'IO', filename=>'HProf.txt');
SQL> DECLARE
  2      PROCEDURE p_doSomething (pi_empno NUMBER) IS
  3      BEGIN
  4          dbms_lock.sleep(0.1);       -- spend time
  5      END;
  6      PROCEDURE p_main IS
  7      BEGIN
  8          dbms_lock.sleep(0.5);       -- spend time
  9          FOR c IN (SELECT * FROM emp) LOOP
 10              p_doSomething(c.empno);
 11          END LOOP;
 12      END;
 13  BEGIN
 14      p_main();
 15  END;
 16  /
SQL> exec dbms_hprof.stop_profiling;

Intro (2)
Raw file (C:\IO\HProf.txt) is not very readable…
(Legend: P#C = call, P#X = elapsed time between events, P#R = return from sub-program)

P#V PLSHPROF Internal Version 1.0
P#! PL/SQL Timer Started
P#C PLSQL."".""."__plsql_vm"
P#X 8
P#C PLSQL."".""."__anonymous_block"
P#X 6
P#C PLSQL."".""."__anonymous_block.P_MAIN"#980980e97e42f8ec #6
P#X 63
P#C PLSQL."SYS"."DBMS_LOCK"::9."__pkg_init"
P#X 7
P#R
P#X 119
P#C PLSQL."SYS"."DBMS_LOCK"::11."SLEEP"#e17d780a3c3eae3d #197
P#X 500373
P#X 586
P#C SQL."".""."__sql_fetch_line9" #9."4ay6mhcbhvbf2"
P#! SELECT * FROM SCOTT.EMP
P#X 3791
P#X 17
<<… and so on …>>

Intro (3)
… but you can make it readable via the command-line utility:

C:\Utl_File\IO>plshprof -output hprof_intro HProf.txt
PLSHPROF: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
[8 symbols processed]
[Report written to 'hprof_intro.html']

Real World Story
Client's Help Desk performance complaints:
The developer checked a 10046 trace and couldn't find anything suspicious.
I noticed that the core query contains a user-defined PL/SQL function.
Action: wrap the suspicious call in HProf start/stop in the TEST instance (with the same volume of data).

Suspect (note: there are only 26 distinct owners, and the function does basic formatting)

SQL> exec dbms_hprof.start_profiling ('IO', 'HProf_Case1.txt');
SQL> declare
  2      v_tx varchar2(32767);
  3  begin
  4      select listagg(owner_tx,',') within group (order by 1)
  5      into v_tx
  6      from (
  7          select distinct scott.f_change_tx(owner) owner_tx
  8          from scott.test_tab
  9           );
 10  end;
 11  /
SQL> exec dbms_hprof.stop_profiling;

Profile: 50k function calls?! That is where the time goes!

Findings
Problem: time is wasted on a very cheap function that is fired many times … because the original developer "guessed" at the query behavior … i.e. they knew the function was doing basic formatting, so the output would also be distinct … but forgot to tell that to the CBO → GIGO!
Solution: rewrite the query in a way that helps the CBO … and remind all developers: the number of function calls in SQL will surprise you if you don't measure them.

Fix

SQL> exec dbms_hprof.start_profiling ('IO', 'HProf_Case1_fix.txt');
SQL> declare
  2      v_tx varchar2(32767);
  3  begin
  4      select listagg(owner_tx,',') within group (order by 1)
  5      into v_tx
  6      from (
  7          select scott.f_change_tx(owner) owner_tx
  8          from (select distinct owner          -- filter first!
  9                from scott.test_tab)
 10           );
 11  end;
 12  /
SQL> exec dbms_hprof.stop_profiling;

Updated Profile: 26 calls – 28 times faster!

Real World Story: Impact
The module was in the most frequently called part of the system. The overall CPU footprint dropped from an average of 70% to about 35%. Plans to double the number of CPUs for that instance were postponed.

3.3 - Avoid Context Switches

Context Switches are Expensive!
Java-to-SQL context switches are much more expensive than PL/SQL-to-SQL → the "Thick Database Approach" rules! Proof by Toon Koppelaars: https://www.youtube.com/watch?v=8jiJDflpw4Y
Even PL/SQL-to-SQL context switch costs can add up:
Know (and decrease!) call frequency – various caching techniques
Decrease the cost of calls – PRAGMA UDF

PRAGMA UDF (1)
Meaning: the PL/SQL function is compiled in a way that is optimized for SQL usage (different memory management).
Example:

CREATE FUNCTION f_change_udf_nr (i_nr NUMBER)
RETURN NUMBER IS
    PRAGMA UDF;
BEGIN
    counter_pkg.v_nr:=counter_pkg.v_nr+1;
    RETURN i_nr+1;
END;
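
The counter_pkg used by these demo functions is not shown on the slides; a minimal sketch of what such a call counter could look like (names inferred from the calls, so treat it as an assumption):

create or replace package counter_pkg is
    v_nr number := 0;          -- incremented inside each measured function
    procedure p_check;         -- print and reset the counter
end;
/
create or replace package body counter_pkg is
    procedure p_check is
    begin
        dbms_output.put_line('Fired:'||v_nr);
        v_nr := 0;
    end;
end;
/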

PRAGMA UDF (2)
Much faster in SQL:

SQL> SELECT MAX(f_change_nr(object_id)) FROM TEST_TAB;
MAX(F_CHANGE_NR(OBJECT_ID))
---------------------------
                      51485
Elapsed: 00:00:00.48

SQL> SELECT MAX(f_change_udf_nr(object_id)) FROM TEST_TAB;
MAX(F_CHANGE_UDF_NR(OBJECT_ID))
-------------------------------
Elapsed: 00:00:00.06

Functions in WITH Clause
Meaning: functions with a visibility scope of just one SQL query; compilation is tightly linked with the SQL.

SQL> WITH FUNCTION f_changeWith_nr (i_nr number)
  2  RETURN NUMBER IS
  3  BEGIN
  4      RETURN i_nr+1;
  5  END;
  6  SELECT max(f_changeWith_nr(object_id))
  7  FROM test_tab
  8  /
MAX(F_CHANGEWITH_NR(OBJECT_ID))
-------------------------------
                          51485
Elapsed: 00:00:00.07    -- comparable to the PRAGMA UDF timing

Side Effects of SELECT from DUAL
Definitions:
A scalar sub-query returns a single column of a single row (or from an empty rowset).
Scalar sub-query caching is an Oracle internal mechanism that preserves the results of such queries while processing more complex ones. It is implemented as an in-memory hash table; the cache size is defined by "_query_execution_cache_max_size" [65536 bytes by default]; the cache is preserved for the duration of the query; the last value is preserved even if the hash table is full.
Example:

SQL> SELECT empno, f_change_tx(job) FROM emp;
SQL> exec counter_pkg.p_check;
Fired:14
SQL> SELECT empno, (SELECT f_change_tx(job) FROM dual)
  2  FROM emp;
Fired:5      -- only 5 distinct jobs

Same OUT for the same IN
DETERMINISTIC clause: if the function does the same thing for exactly the same IN, it can be defined as DETERMINISTIC. Oracle may reuse already calculated values via in-memory hash tables [same mechanism and parameter/limit as for scalar sub-query caching]. Oracle does not check whether the function really is deterministic!
Example:

CREATE FUNCTION f_change_det_tx (i_tx VARCHAR2)
RETURN VARCHAR2 DETERMINISTIC IS
BEGIN
    counter_pkg.v_nr:=counter_pkg.v_nr+1;
    RETURN lower(i_tx);
END;

SQL> select empno, f_change_tx(job) from emp;
SQL> exec counter_pkg.p_check;
Fired:14
SQL> select empno, f_change_det_tx(job) from emp;
Fired:5      -- the same 5 calls

PL/SQL Function Result Cache
Database-level cache (cross-session), stored in the SGA.
Automatic cache monitoring and invalidation (Oracle 11g R2+).
Example:

create function f_getdept_dsp (i_deptno number)
return varchar2
result_cache
is
    v_out_tx varchar2(256);
begin
    if i_deptno is null then
        return null;
    end if;
    select initcap(dname) into v_out_tx
    from dept
    where deptno=i_deptno;
    counter_pkg.v_nr:=counter_pkg.v_nr+1;
    return v_out_tx;
end;

Result Cache Basics

SQL> SELECT empno, f_getDept_dsp(deptno) dept_dsp
  2  FROM emp;
     EMPNO DEPT_DSP
---------- ------------------------
      7369 Research
...
14 rows selected.
SQL> exec counter_pkg.p_check;
Fired:3
SQL> SELECT empno, f_getDept_dsp(deptno) dept_dsp
  2  FROM emp;
     EMPNO DEPT_DSP
---------- ----------
      7369 Research
Fired:0      -- no calls at all!

Result Cache Stats

SQL> SELECT * FROM v$result_cache_statistics;

 ID NAME                           VALUE
--- ------------------------------ ----------
  1 Block Size (Bytes)             1024
  2 Block Count Maximum            15360
  3 Block Count Current            32
  4 Result Size Maximum (Blocks)   768
  5 Create Count Success           3          -- 3 distinct INs
  6 Create Count Failure           0
  7 Find Count                     25
  8 Invalidation Count             0
  9 Delete Count Invalid           0
 10 Delete Count Valid             0
 11 Hash Chain Length              1
 12 Find Copy Count                25

Real World Story: Moving Business Logic to the Database
Original implementation [by a previous contractor]: OO-like database APIs – get/set for every attribute; business logic in the middle tier; 60k round-trips to the database ~ 30 minutes per call.
Moving everything to the database: exact logic (replicated) ~ 23 seconds; bulk processing ~ 1.7 seconds.
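
A hedged before/after illustration of the idea (table and column names are invented for the sketch): instead of one get/set round-trip per attribute per row, the same business rule becomes a single server-side, set-based statement.

-- Chatty style (conceptual): the middle tier loops over rows and calls
-- get_salary(), get_grade(), set_salary() ... => tens of thousands of round-trips.

-- Set-based server-side equivalent: one call from the client, one SQL statement inside.
create or replace procedure p_apply_raise (i_pct number) is
begin
    update emp_salary e                       -- hypothetical table
    set    e.salary_nr = e.salary_nr * (1 + i_pct/100)
    where  e.eligible_yn = 'Y';
end;
/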

3.4 - Don’t reinvent the wheel

These features are FREE!!! Just a reminder about: analytic functions (RANK, LEAD, LAG…), Pivot/Unpivot, MODEL, LOB APIs, efficient JSON and XML support, etc.
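
As one small example of the kind of code these features replace (standard EMP demo table assumed), comparing every salary with the previously hired employee's salary takes a single LAG call instead of a self-join or a PL/SQL loop:

select empno,
       hiredate,
       sal,
       sal - lag(sal) over (order by hiredate) as diff_vs_prev_hire
from   emp
order  by hiredate;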

Real Story: CLOB Scare
Original implementation: all CLOB changes are made using concatenation; I/O is far above the expected level; management is blaming Oracle and contemplating switching to a different platform.
Using the DBMS_LOB APIs (APPEND/WRITEAPPEND): two orders of magnitude performance improvement; the I/O problems are gone; the DBAs got bonuses (for TALKING to developers).
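
A hedged sketch of that fix (the source cursor is invented for illustration): building a CLOB by "||" concatenation rewrites the whole LOB on every change, while DBMS_LOB.WRITEAPPEND only writes the new piece.

declare
    v_doc   clob;
    v_piece varchar2(32767);
begin
    dbms_lob.createtemporary(v_doc, true);
    for c in (select line_tx from report_lines order by line_nr) loop   -- hypothetical source
        v_piece := c.line_tx || chr(10);
        -- slow:  v_doc := v_doc || v_piece;
        dbms_lob.writeappend(v_doc, length(v_piece), v_piece);           -- fast: append in place
    end loop;
    dbms_lob.freetemporary(v_doc);
end;
/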

3.5 - Work in SETs

Work in SETs:
Must use and understand object types … and be aware of the memory impact.
Read in SETs / write in SETs: BULK COLLECT, FORALL (see the sketch below).
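
A generic read-in-sets/write-in-sets sketch (source and target tables are assumed to have the same shape), combining BULK COLLECT with FORALL so that both the fetch and the DML run in batches instead of row by row:

declare
    cursor c_src is select * from src_tab;               -- hypothetical source table
    type src_tt is table of c_src%rowtype;
    v_tt src_tt;
begin
    open c_src;
    loop
        fetch c_src bulk collect into v_tt limit 1000;   -- read in sets
        exit when v_tt.count = 0;
        forall i in 1 .. v_tt.count                      -- write in sets
            insert into tgt_tab values v_tt(i);          -- tgt_tab assumed to match the row shape
    end loop;
    close c_src;
    commit;
end;
/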

Real Story: Bulk Operations Optimization
Task: data needs to be retrieved from a remote location via DBLink; each row must be processed locally; the source table contains 50,000 rows.
Problem: the existing system is very CPU-bound.

RowByRow

SQL> connect scott/TIGER@localDB;
SQL> declare
  2      v_nr number;
  3  begin
  4      for c in (select * from test_tab@remotedb) loop
  5          v_nr := c.object_id;
  6      end loop;
  7  end;
  8  /
Elapsed: 00:00:00.77
SQL> select name, value from stats where name in
  2  ('STAT...session pga memory max',
  3   'STAT...SQL*Net roundtrips to/from dblink');
NAME                                         VALUE
----------------------------------------- -------
STAT...session pga memory max              2609504
STAT...SQL*Net roundtrips to/from dblink       510

BULK LIMIT

SQL> declare
  2      type collection_tt is table of
  3           test_tab@remotedb%rowtype;
  4      v_tt collection_tt;
  5      v_nr number;
  6      v_cur sys_refcursor;
  7      v_limit_nr binary_integer:=5000;     -- the limit can be changed!
  8  begin
  9      open v_cur for select * from test_tab@remotedb;
 10      loop
 11          fetch v_cur bulk collect into v_tt
 12              limit v_limit_nr;
 13          exit when v_tt.count()=0;
 14          for i in v_tt.first..v_tt.last loop
 15              v_nr:=v_tt(i).object_id;
 16          end loop;
 17          exit when v_tt.count<v_limit_nr;
 18      end loop;
 19      close v_cur;
 20  end;
 21  /

Analysis
Results:

Limit size   Time (s)   Max PGA      Roundtrips
       100       0.78    2,543,968          510
       250       0.58    2,675,040          210
       500       0.49    2,806,112          110
      1000       0.44    3,133,792           60
      5000       0.40    4,247,904           20
     10000       0.41    7,590,240           15
     20000       0.43   14,340,448           12

Summary: as the bulk limit increases, processing time eventually stops dropping because memory management becomes costly! This point is different for different hardware/software.
Conclusion: run your own tests and find the most efficient bulk limit (a rough harness is sketched below).
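
A rough harness for such a test (measuring elapsed time only; the PGA and round-trip figures would come from the v$ views shown on the RowByRow slide), assuming the same remote table as above:

declare
    type collection_tt is table of test_tab@remotedb%rowtype;
    v_tt       collection_tt;
    v_cur      sys_refcursor;
    v_start_nr number;
begin
    for r in (select column_value limit_nr
              from   table(sys.odcinumberlist(100, 1000, 5000, 10000))) loop
        v_start_nr := dbms_utility.get_time;              -- centiseconds
        open v_cur for select * from test_tab@remotedb;
        loop
            fetch v_cur bulk collect into v_tt limit r.limit_nr;
            exit when v_tt.count = 0;
        end loop;
        close v_cur;
        dbms_output.put_line(r.limit_nr||': '||
            (dbms_utility.get_time - v_start_nr)/100||' sec');
    end loop;
end;
/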

Real World Story: Impact
Introduced new functionality with minimal impact: total CPU workload increased ~2%, while the original estimate was in the range of 10-15%.
Reminder: tuning is often about a tradeoff – save the least available resource by utilizing a more available one.

Summary
The cloud is here to stay … so you have to build your systems the right way.
Well-written code in a cloud-based system SAVES LOTS OF MONEY … so developers are now visible to the CFO.
Oracle provides enough tools to create well-written code … so you have to learn new tricks - sorry!

Contact Information
Michael Rosenblum – mrosenblum@dulcian.com
Dulcian, Inc. website – www.dulcian.com
Blog: wonderingmisha.blogspot.com