1
How to Diagnose Issues in the Cloud
Nathan Reynolds, Samuel To, Scott Oaks
Sr. Architects, Oracle Cloud PSR
October 22nd, 2018
Confidential – Oracle Internal/Restricted/Highly Restricted
2
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
3
Your Chance to Influence Product Direction and Design!
The Cloud Platform UX team would love to get your feedback on new designs. Sign up here:
4
Oracle Cloud – PSR Charter
Unified process for Complete Cloud Stack
Global monitoring and Diagnostics
Production Efficiency
Cloud needs to be up
5
Program Agenda
1 Tips from the PSR Team
2 Java, Cloud and Docker
3 Java Cloud Service Tips
4 Oracle Management Cloud
5 Burning Questions – now it's your turn!
7
Diagnostic Tools
Database – swingbench
I/O – dd, vdbench
Example – dd if=/dev/zero of=./testfile bs=8K count= oflag=direct
Example – dd if=/dev/zero of=./testfile bs=1M count=1000
Network – iperf, netchecker
Example – iperf3 -c VM1 -i 1 -t 30 -p O 5 -w 1M -P 8
Netchecker - &sourceId= &id=
CPU – SPECcpu2017, unixbench
Caution: Be careful when downloading and using tools. Google cannot find netchecker since it requires a login to access.
8
Java – HotSpot GC Logging
Enable GC logging (JDK 8 flags; JDK 9 and later use unified logging, -Xlog:gc*):
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Xloggc:file
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=4 -XX:GCLogFileSize=5M
Analyze GC logs with https://gceasy.io
Caution: Be careful when accessing websites.
9
Java – HotSpot Heap Dump on OutOfMemoryError
Enable heap dumps:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=path
Analyze heap dumps with https://heaphero.io
Caution: Be careful when accessing websites.
10
PSR Top Three Questions about Java in Cloud and Containers
Why is Java running differently in my VM or Docker container?
How can I monitor Java in my remote VM or Docker container?
How can I inspect Java in my Docker container?
11
PSR Top Three Java in Container Questions
Why is Java running differently in my VM?
Java Ergonomics is different in all environments
Environment | Runtime.availableProcessors() | Default GC Threads | Default Max Heap
JDK 8, Bare Metal Machine, 32 HW threads, 128GB memory | 32 | 23 | 32GB
JDK 11, Bare Metal Machine | | |
JDK 8, VM.Standard2.8 VM, 16 HW threads, 120GB memory | 16 | 13 | 30GB
JDK 11, VM.Standard2.8 VM | | |
JDK 8, Docker -it -m60GB --cpus 8 on VM.Standard2.8 | | |
JDK 11, Docker -it -m60GB --cpus 8 on VM.Standard2.8 | 8 | | 15GB
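The numbers on this slide can be checked from inside any JVM. The sketch below prints what the running JVM sees and compares it with two widely cited HotSpot defaults: parallel GC threads equal to the CPU count up to 8 plus 5/8 of each CPU beyond that, and a max heap of one quarter of visible memory. Both rules are assumptions about HotSpot's defaults on common platforms, not something the slides state. They do reproduce the slide's values (32 CPUs gives 23 threads, 128GB gives 32GB). The class and method names are made up for illustration.

```java
// Minimal sketch, not an official tool: compares what the running JVM reports
// with commonly cited HotSpot ergonomic defaults (assumed, see lead-in).
class ErgonomicsCheck {

    // Assumed default for parallel GC worker threads:
    // every CPU up to 8, then 5/8 of each additional CPU.
    static int expectedGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    // Assumed default max heap on large-memory hosts:
    // one quarter of the memory visible to the JVM.
    static long expectedMaxHeapGb(long visibleMemoryGb) {
        return visibleMemoryGb / 4;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        int cpus = rt.availableProcessors();
        long maxHeapMb = rt.maxMemory() / (1024L * 1024L);

        System.out.println("availableProcessors = " + cpus);
        System.out.println("expected GC threads = " + expectedGcThreads(cpus));
        System.out.println("max heap (MB)       = " + maxHeapMb);
    }
}
```

Running the same class on bare metal, in the VM, and inside the container makes the differences between environments visible without any extra tooling.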
12
PSR Top Three Java Questions
How do I monitor Java in my remote container?
Use tools that support remote management
jstat: make sure your image runs jstatd
jconsole: make sure your JVM starts with remote management options
Can tunnel through firewalls
Make sure to set up your VM security rules appropriately
Oracle WebLogic: the -Dmanagement flag is your friend
For Cloud VMs, nothing beats shell access
But what about Docker…
13
PSR Top Three Java in Container Questions
How do I inspect Java in my Docker container?
Log into a Docker container:
$ pid=$(docker inspect --format '{{.State.Pid}}' <container>)
$ sudo nsenter --target $pid --mount --uts --ipc --net --pid
# jstat
-bash: jstat: command not found
Include the JDK in your image
If not… mount your local filesystem into your Docker container
14
PSR Top Three Java in Container Questions
# Mount local filesystem into docker container
# Based on work from Jérôme Petazzoni
# (github.com/jpetazzo)
# Assumes DOCKERPID (the container's PID on the host) and CONTPATH
# (the mount point inside the container) are already set.
dockerExec() {
    sudo nsenter --target $DOCKERPID --mount --uts \
        --ipc --net --pid -- sh -c "$*"
}
REALPATH=$(readlink --canonicalize /path/to/jdk)
FILESYS=$(df -P $REALPATH | tail -n 1 | awk '{print $6}')
while read DEV MOUNT JUNK
do
    [ $MOUNT = $FILESYS ] && break
done </proc/mounts
REALDEV=$(readlink --canonicalize $DEV)
# DEV == /dev/mapper/foo; REALDEV=/dev/dm-0
while read A B C SUBROOT MOUNT JUNK
do
    [ $MOUNT = $FILESYS ] && break
done < /proc/self/mountinfo
SUBPATH=$(echo $REALPATH | sed s,^$FILESYS,,)
DEVDEC=$(printf "%d %d" \
    $(stat --format "0x%t 0x%T" $REALDEV))
DEVDIR=$(dirname $DEV)
dockerExec "[ -x $DEVDIR ] || mkdir -p $DEVDIR"
if [ $DEV = $REALDEV ]; then
    dockerExec "[ -b $DEV ] || mknod --mode 0600 $DEV b $DEVDEC"
else
    dockerExec "[ -b $REALDEV ] || mknod --mode 0600 $REALDEV b $DEVDEC"
    dockerExec "[ -h $DEV ] || ln -s $REALDEV $DEV"
fi
dockerExec mkdir /tmpmnt
dockerExec mount $REALDEV /tmpmnt
dockerExec mkdir -p $CONTPATH
dockerExec mount -o bind \
    /tmpmnt/$SUBROOT/$SUBPATH $CONTPATH
dockerExec umount /tmpmnt
dockerExec rmdir /tmpmnt
16
Java Cloud Service Tips – Connection Tagging
It is useful to “tag” connections so they are easy to monitor from the database
You can use properties in the data source to set v$session.program and v$session.process easily
Values set in the “Properties” field under “Configuration->Connection Pool” are applied to all managed servers, so they all get the same value
Use “System Properties” to set a different value for each managed server
17
Java Cloud Service Tips – Connection Tagging
If using “System Properties”, the value will be picked up from the -D parameter with the same name passed to WebLogic on managed server start
The -D parameter can be set differently per managed server via the “Arguments” field in the “Configuration->Server Start” tab on the managed server settings pages
In this example, v$session.program will be set to “DS/MyDomain/MyServer_1” for this managed server
18
Java Cloud Service Tips – Connection Tagging
Once these are set, it is very easy to check connection usage from the perspective of the database
select program, process, inst_id, count(*)
from gv$session
where type = 'USER'
group by program, process, inst_id
order by 4 desc;
PROGRAM PROCESS INST_ID COUNT(*)
DS/MyDomain/MyServer_ ApplicationDB
DS/MyDomain/MyServer_ MetadataDB
DS/MyDomain/MyServer_ ApplicationDB
DS/MyDomain/MyServer_ ApplicationDB
DS/MyDomain/MyServer_ MetadataDB
DS/MyDomain/MyServer_ ApplicationDB
....
19
Java Cloud Service Tips – Connection Tagging
You can use the setClientInfo API on java.sql.Connection to set other v$session properties: MODULE/ACTION/CLIENTID
// "conn" is an instance of java.sql.Connection:
conn.setClientInfo("OCSID.CLIENTID", "Alice_HR_Payroll");
conn.setClientInfo("OCSID.MODULE", "APP_HR_PAYROLL");
conn.setClientInfo("OCSID.ACTION", "PAYROLL_REPORT");
These are stored and displayed in Active Session History and SQL Monitor reports, and are very useful for a DBA tracking down the source of expensive SQL
More info at database/18/jjdbc/JDBC-standards-support.html#GUID-1987FAC4-E93A-49A5-9EB4-A78B465E6938
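One way to apply these tags consistently is to wrap the three calls in a small helper that runs right after a connection is borrowed from the pool. The sketch below is illustrative only: the ConnectionTagger class, the tag method, and the recording proxy are hypothetical names, and only the OCSID.* keys and java.sql.Connection.setClientInfo come from the slide. The proxy stands in for a real connection so the example runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLClientInfoException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: sets the three OCSID.* client-info keys in one call.
class ConnectionTagger {

    static void tag(Connection conn, String clientId, String module, String action) {
        try {
            conn.setClientInfo("OCSID.CLIENTID", clientId);
            conn.setClientInfo("OCSID.MODULE", module);
            conn.setClientInfo("OCSID.ACTION", action);
        } catch (SQLClientInfoException e) {
            // Surface driver rejections as an unchecked exception.
            throw new IllegalStateException("Could not tag connection", e);
        }
    }

    // Stand-in for a real connection: records setClientInfo(String, String)
    // calls into the given map and rejects every other method.
    static Connection recordingConnection(Map<String, String> sink) {
        return (Connection) Proxy.newProxyInstance(
                ConnectionTagger.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("setClientInfo")
                            && args != null && args.length == 2) {
                        sink.put((String) args[0], (String) args[1]);
                        return null;
                    }
                    throw new UnsupportedOperationException(method.getName());
                });
    }

    public static void main(String[] args) {
        Map<String, String> recorded = new LinkedHashMap<>();
        tag(recordingConnection(recorded),
                "Alice_HR_Payroll", "APP_HR_PAYROLL", "PAYROLL_REPORT");
        System.out.println(recorded);
    }
}
```

With a real Oracle JDBC connection, only the tag method would be needed. Because pooled connections are reused, tags should be set on every borrow so a previous caller's values do not linger.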
20
Java Cloud Service Tips – Diagnosing Connection Leaks
WebLogic provides options to detect connection leaks
Enable the “Profile Connection Leak” option to collect information about threads that have reserved a connection from the data source
To specify how long a reserved connection can stay inactive before it is considered “leaked”:
Use Connection Leak Timeout Seconds in 12c. This only generates a record in datasource.log
Use Inactive Connection Timeout in 11g (under the Connection Pool configuration tab). This also forcibly releases the connection
21
Java Cloud Service Tips – Diagnosing Connection Leaks
When leaks are detected, an entry is written to <domain>/servers/<server>/logs/datasource.log
22
Java Cloud Service Tips – Diagnosing “Stuck” Threads
WebLogic provides options to set how long a request can run before it is considered “Stuck”
Stuck threads are not necessarily “stuck” at a particular line of code; the request has simply taken longer than the configured threshold
The default is 600s, set under the “Overload” tab of the managed server configuration
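The point that “stuck” only means “over the threshold” can be shown with a few lines of code. The sketch below is not WebLogic's implementation. It is a hypothetical StuckRequestCheck class that flags any request whose elapsed time exceeds a maximum, using the 600-second default quoted above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a request counts as "stuck" purely because of its age,
// regardless of what the thread is currently doing.
class StuckRequestCheck {

    // Mirrors the 600 s default mentioned on the slide.
    static final long DEFAULT_MAX_SECONDS = 600;

    // True when the request has run for more than maxSeconds.
    static boolean isStuck(long startedMillis, long nowMillis, long maxSeconds) {
        return (nowMillis - startedMillis) / 1000 > maxSeconds;
    }

    // Returns the names of all requests that exceed the threshold.
    static List<String> findStuck(Map<String, Long> startTimes, long nowMillis, long maxSeconds) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Long> e : startTimes.entrySet()) {
            if (isStuck(e.getValue(), nowMillis, maxSeconds)) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        Map<String, Long> startTimes = new LinkedHashMap<>();
        startTimes.put("slow-request", now - 608_000L); // running for 608 s
        startTimes.put("fast-request", now - 5_000L);   // running for 5 s
        System.out.println(findStuck(startTimes, now, DEFAULT_MAX_SECONDS));
    }
}
```

A thread flagged this way may still be making progress, which is why the stack trace captured at detection time matters more than the flag itself.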
23
Java Cloud Service Tips – Diagnosing “Stuck” Threads
By default there is a diagnostic module “Module-FMWDFW” configured to generate an “incident” when a stuck thread is detected
<Oct 2, 2018, 12:05:30,705 AM UTC> <Error> <WebLogicServer> <BEA > <[STUCK] ExecuteThread: '30' for queue: 'weblogic.kernel.Default (self-tuning)' [waiting for S:myhost.com:[11408,-1,-1,-1,-1,-1,-1]:MyDomain:MyServer_1] has been busy for "608" seconds working on the request "Workmanager: WM-UI, Version: 0, Scheduled=true, Started=true, Started time: ms[GET /myApp/processPayment/secure HTTP/1.1 … ]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds in "server-failure-trigger". Stack trace:
java.lang.Object.wait(Native Method)
weblogic.rjvm.ResponseImpl.waitForDataInternal(ResponseImpl.java:111)
weblogic.rjvm.ResponseImpl.waitForData(ResponseImpl.java:95)
weblogic.rjvm.ResponseImpl.getTxContext(ResponseImpl.java:151)
weblogic.rjvm.BasicOutboundRequest.sendReceive(BasicOutboundRequest.java:111)
weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:362)
…
<Oct 2, 2018, 12:05:31,141 AM UTC> <Emergency> <oracle.dfw.incident> <BEA > <incident 1092 created with problem key "BEA [WebLogicServer]">
24
Java Cloud Service Tips – Diagnosing “Stuck” Threads
Sample incident
26
Oracle Management Cloud
Application Performance Monitoring (APM)
27
Oracle Management Cloud
Log Analytics (LA)
28
Oracle Management Cloud
Infrastructure Monitoring
29
Oracle Management Cloud
Database Monitoring
30
Oracle Management Cloud
Sessions to See
Tuesday
11:15am: Best Practices for Deploying Oracle Management Cloud [PRO4280]
3:45pm: Metrics, Logs, Traces: Monitoring Multitiered Applications [PRO4479]
Thursday
10am: Improve Operational Efficiency of Oracle Cloud Infrastructure [PRO4369]
10am: Monitoring Large-Scale Distributed Applications in Oracle Cloud [DEV6151]
12pm: Extending and Customizing Oracle Management Cloud [PRO4450]
Check out demo booths at the Exchange in Moscone South
31
Oracle Cloud Infrastructure Compute Health Monitoring
Provides critical insight into the health of bare metal compute instances
Monitors the health of the hardware, including CPU, DIMM, mainboards and storage devices
Provides notifications on hardware issues
Use the notifications to proactively redeploy instances and improve availability
Available for all customers
Automatically enabled on all bare metal compute instances in all OCI regions
32
Performance, Scalability and Reliability Sessions
Tuesday 12:30pm: Oracle Cloud Performance Best Practices for IaaS and PaaS, Marriott Marquis Nob Hill A/B
Thursday 9am: Optimizing Spark and Hive Workloads in Big Data Cloud Service, Moscone West 3024
33
Program Agenda
1 Tips from the PSR Team
2 Java, Cloud and Docker
3 Java Cloud Service Tips
4 Oracle Management Cloud and Diagnostic Tools
5 Burning Questions – now it's your turn!
34
Burning Questions
Now it’s your turn!
Please ask questions and share your experience.