1 ESM Database Performance from the Bottom-Up
Kerry Adkins Senior Technical Support Engineer - ArcSight Sept 2010

2 Agenda
ArcSight ESM Database Overview
- ArcSight ESM Database Server components
- Disk Layout
- ArcSight ESM Database Configuration - Templates
Optimizing for Performance
- Oracle Critical Patch Updates (CPU)
- Read vs Write Troubleshooting Steps
Diagnostic Tools For Performance Troubleshooting
- GetPartitionInfo – PartitionInfo.log
- Remote Diagnostics Agent (RDA)

3 ArcSight Database Server Defined
The database server is hardware that contains:
- Disks – Local and attached storage
- An operating system
- Memory/CPUs
For optimal performance, the ArcSight database:
- Requires a dedicated instance and machine
- Cannot tolerate contention from any other application
** Note: If you purchased the embedded license from ArcSight, you are not licensed to run any other applications against the ArcSight database.
[Diagram: Storage, Database Server, Oracle Instance, Operating System, CPUs/Memory]

4 Storage Recommendations
Recommended: RAID 10 (or 1+0) for ALL volumes
Reasons ArcSight recommends RAID 10 over RAID 5. RAID 5 implementations have been known to cause:
- Performance problems
  - Small random writes trigger the typical read-read-write-write sequence on RAID 5
  - The ArcSight database constantly performs many random writes because of the large number of event insertions
- Availability problems
  - If one disk fails, the controller has to recalculate the missing data using the checksum (parity), causing severe read performance degradation
  - If two disks fail, all data is lost
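The read-read-write-write point can be made concrete with a rough calculation. The sketch below assumes the textbook write penalties of 4 back-end I/Os per small random write on RAID 5 (read data, read parity, write data, write parity) and 2 on RAID 1+0 (one write per mirror side). The disk count and per-disk IOPS are illustrative values only, not measurements from an ArcSight system.

```python
# Rough estimate of host-visible random-write IOPS for a RAID group.
# The penalties are textbook values; the disk count and per-disk IOPS
# used in the demo are made-up illustration values.

WRITE_PENALTY = {
    "raid10": 2,  # each write goes to both sides of the mirror
    "raid5": 4,   # read data, read parity, write data, write parity
}


def effective_write_iops(disks: int, iops_per_disk: int, level: str) -> float:
    """Return the approximate random-write IOPS the host can sustain."""
    raw_iops = disks * iops_per_disk
    return raw_iops / WRITE_PENALTY[level]


if __name__ == "__main__":
    for level in ("raid10", "raid5"):
        print(level, effective_write_iops(disks=8, iops_per_disk=150, level=level))
```

With the same spindles, the RAID 5 group sustains about half the random-write rate of the RAID 1+0 group under these assumptions, which matters for an insert-heavy event database.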

5 Storage and Server Tips
- 64 bit Database Server
- at least GB of memory – as much as you can afford
- Fewer, faster CPUs are better than more, slower CPUs
- Direct connect via at least 1 GB cable between Manager and DB server
- Don’t use software-based RAID; the extra software layer may hurt performance
- Solid State Drives (SSDs) are more readily available
  - More costly than standard drives by at least 10%
  - Best use: redo logs and arc_event_index files

6 Database Disk Layout – Volume 1
- SAN needs to be capable of high volume read/writes
- Can use Oracle’s ORION tool for benchmarking
- Seek support from the SAN vendor for optimal performance
Volume 1 (System):
- Oracle installation directory (i.e. $ORACLE_BASE)
- ArcSight ESM database software installation directory (i.e. ARCSIGHT_HOME)
[Diagram: Volume 1 holds the database install software (ORACLE_HOME, ARCSIGHT_HOME)]

7 Database Disk Layout - Volume 2
- ARC_SYSTEM_DATA holds resources, which are mostly cached on the Manager
- ARC_EVENT_INDEX is 2 times the size of ARC_EVENT_DATA
- Most I/O load is on ARC_EVENT_INDEX (random reads/writes)
Volume 2 (Database):
- Oracle default tablespaces (i.e. $ORACLE_BASE/oradata/arcsight by default)
- ArcSight System Data Tablespace (ARC_SYSTEM_DATA)
- ArcSight System Index Tablespace (ARC_SYSTEM_INDEX)
- ArcSight Event Data Tablespace (ARC_EVENT_DATA)
- ArcSight Event Index Tablespace (ARC_EVENT_INDEX)
- ArcSight temp tablespace (ARC_TEMP)
- ArcSight undo tablespace (ARC_UNDO)

8 Database Disk Layout – Volume 3
Volume 3 (Redo): Oracle Redo logs
- High rate of sequential writes
- Event throughput drops 20% when the redo logs are not on a separate volume

9 Database Disk Layout – Volume 4
Volume 4 (Archive) (optional):
- Oracle Archived Redo Logs (required for hot backups)
- Partition Archiver files
Notes:
- Required if the DB is in Archivelog mode or if running Partition Archiver
- Written to sequentially for archived redo logs
- Written to once/day for oldest-partition archiving
- Needs to be as fast as Volume 3 (Redo); otherwise redo archiving will cause intermittent slowness
- Monitor space or Oracle will hang
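Since a full archive volume can hang Oracle, a scheduled free-space check is one simple safeguard. The following is a minimal sketch using only the Python standard library; the 15% threshold and the example path are assumptions for illustration, not ArcSight or Oracle recommendations.

```python
import shutil


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Fraction of the volume that is still free (0.0 to 1.0)."""
    return free_bytes / total_bytes


def archive_volume_low(path: str, min_free_fraction: float = 0.15) -> bool:
    """Return True when the volume holding `path` is below the free-space threshold.

    The default threshold is an arbitrary illustration value.
    """
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) < min_free_fraction


if __name__ == "__main__":
    # Hypothetical mount point for the archive volume; adjust to your layout.
    archive_path = "."
    if archive_volume_low(archive_path):
        print("WARNING: archive volume is running low on space")
```

A check like this could run from cron or a scheduler and raise an alert well before the archiver destination fills up.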

10 ArcSight Database Server Defined (continued)
An Oracle Database Server is made up of two parts: an Instance and a Database.
Database Instance:
- SGA: Shared Pool, DB Buffer Cache, Redo Log Buffer, Large Pool, Java Pool
- Oracle Processes
Database:
- ArcSight Tablespaces: Arc_event_data, Arc_event_index, Arc_system_data, Arc_system_index
- Control Files
- Online Redo Logs
- Datafiles

11 ArcSight Database Templates
Types of Database Templates:
- Small, Medium, Standard, Large, XLarge, XXLarge
- Located in ARCSIGHT_HOME/installer/oracle10g/<OS>/dbca, where <OS> is Windows or Unix
How to Choose your Database Template Size:
- Depends on the amount of memory on your database server
- Select the correct template from the beginning, as making changes later is difficult (e.g. db block size)
An ArcSight Database Template is:
- A set of predefined Oracle instance initialization parameters
- Optimized to meet ArcSight’s unique performance requirements

12 ArcSight Database Template Tips
- Use the Standard template only for 32 bit systems
- The XXLarge template is recommended for 64 bit systems
- To configure extra SGA for Oracle, see Knowledge Base Article 3897
- By default the largest sga_target is 11 GB for Windows 64-bit servers; on *nix it is 70% of total memory
- Largest pga_target is 4 GB

13 Optimizing for Performance

14 Oracle Critical Patch Updates
Definition: Oracle Critical Patch Updates (CPUs) are quarterly patches that contain security and important code fixes
- ArcSight certifies CPUs with major releases of ESM
- CPUs should be applied when patching or upgrading ESM
- Many performance issues seen by Support have been alleviated by applying the Oracle CPU certified with that version of ArcSight
- If you are not sure whether your system has an Oracle CPU applied, refer to Knowledge Base article 808

15 Types of Performance Challenges
Write Performance common challenges:
- ESM error message that states: “Database Failure Manager for Events Broker: it appears the database is hung. There are insertion threads that have not returned from the database insert call for more than 120 seconds”
- Connectors caching
- Threads being rejected
Read Performance common challenges:
- Active channel (AC) not loading or hanging
- Reports not completing / Trends getting disabled
- ORA snapshot too old
- Other ORA- errors

16 Write Performance Typical Troubleshooting Steps
Step 1: Data Collection
- Four to five thread dumps and dbsessions taken during the slowness, to show what the Manager threads are busy doing and which queries are running in the database
- $ORACLE_HOME/admin/arcsight/bdump/alert_arcsight.log, to look for: ORA- errors, connectivity timeouts, data file corruption
- ARCSIGHT_HOME/logs/default/server.std.log(s), to look for: temporary loss of database connections, network connectivity problems, high persistence rates
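Scanning the alert log for ORA- errors by hand gets tedious on a busy system. A minimal sketch of a tally script follows; it assumes only that Oracle error codes appear in the log as "ORA-" followed by five digits, and the log path passed on the command line is whatever applies to your install.

```python
import re
import sys
from collections import Counter

# Oracle error codes are written as ORA- followed by five digits, e.g. ORA-01555.
ORA_CODE = re.compile(r"ORA-\d{5}")


def count_ora_errors(lines):
    """Count each ORA-nnnnn code found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(ORA_CODE.findall(line))
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_ora.py /path/to/alert_arcsight.log
    with open(sys.argv[1], errors="replace") as log:
        for code, occurrences in count_ora_errors(log).most_common():
            print(code, occurrences)
```

The most frequent codes are usually the best starting point when opening a Support ticket.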

17 Write Performance Typical Troubleshooting Steps (cont’d)
Step 2: Identifying persisted events/caching Check (AgentStateTracker) Good persistence (Estimated Cache Size=0) Bad Persistence (Estimated Cache Size=14,000)
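The cache-size check can be scripted. The sketch below assumes the relevant log lines contain the literal text "Estimated Cache Size=<number>", as shown on the slide; the rest of the line format is not known here, so the pattern matches only that fragment.

```python
import re

# Matches the fragment shown on the slide, e.g. "Estimated Cache Size=14,000".
# Everything else about the log line layout is an assumption.
CACHE_SIZE = re.compile(r"Estimated Cache Size=(\d[\d,]*)")


def cache_sizes(lines):
    """Extract every Estimated Cache Size value as an integer."""
    sizes = []
    for line in lines:
        match = CACHE_SIZE.search(line)
        if match:
            sizes.append(int(match.group(1).replace(",", "")))
    return sizes


def connectors_are_caching(lines):
    """True when any reported cache size is above zero (bad persistence)."""
    return any(size > 0 for size in cache_sizes(lines))
```

A steady stream of zero values indicates good persistence; non-zero values mean connectors are caching because the Manager cannot insert events fast enough.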

18 Write Performance Typical Troubleshooting Steps (cont’d)
How is write performance resolved?
- Applying the latest recommended Oracle CPU
- Correcting network connectivity between Database and Manager
- Storage, possible changes of:
  - RAID level
  - Storage layout
  - SAN device

19 Read Performance Typical Troubleshooting Steps
Types of data to collect:
- Details on use case
- PartitionInfo.log - dbcheck
- Support provided script - GetExplainPlanData.zip
- Other items as requested by Support
ArcSight Support reviews the output, looking for:
- Expected index use
- Unexpected full table scans

20 Read Performance Common Solutions
- Applying the latest recommended Oracle CPU
- Regenerating stats
- Rewriting queries more optimally
- Adding space to ARC_UNDO or ARC_TEMP
- Resetting I/O transfer speed: GatherSystemStats.sql, Knowledge Base Article 3903

21 Best practices for Performance Troubleshooting
Type: Reports
Behavior: Reports not completing or receiving errors
Typical troubleshooting steps:
- Use Query Tuner on queries
  - The Query Tuner evaluates the query
  - It provides suggestions on possible hints
- Address any ORA- errors using Knowledge Base article 2852
- Spread reports throughout the day
  - Don’t run them during Partition Compressor run-time
- Use Trends
  - These collect smaller chunks of data in shorter time periods for future use

22 Best practices for Performance Troubleshooting (cont’d)
Type: General report retrieval issues
Behavior: Reports not returning “fast enough”
Typical troubleshooting steps:
- Determine if indexes can be used
- Determine if indexes should be added
- Based on the issue found, Support can provide scripts to add indexes if they are needed
Note: we don’t test or certify the product with custom indexes, and if an issue arises we may ask that the custom indexes be removed

23 Batched Data – Performance Issues
The ArcSight ARC_EVENT table is partitioned by event end_time. Batched (late-arriving) events cause:
- Data access delays, as the data is not stored sequentially as expected
- Incorrect explain plans
- A high clustering factor on indexes
- Compressor errors: “The partition is still active. 25 rows were updated in the exchange table.”
Knowledge Base Article 3324: find connectors sending late events
Consider modifying connectors to use their receipt time as end time while backing up the original end time in another field
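The suggested remapping can be illustrated in a few lines. This is a conceptual sketch only: the dictionary keys (end_time, receipt_time, original_end_time) and the 24-hour lateness threshold are hypothetical names and values chosen for illustration, and the code does not use any ArcSight connector API.

```python
from datetime import datetime, timedelta


def is_late(event, max_lag=timedelta(hours=24)):
    """True when the event ended long before it was received (a batched event).

    The 24-hour default is an arbitrary illustration value.
    """
    return event["receipt_time"] - event["end_time"] > max_lag


def remap_end_time(event):
    """Return a copy that uses receipt time as end time, keeping the original."""
    remapped = dict(event)
    remapped["original_end_time"] = event["end_time"]  # back up the real end time
    remapped["end_time"] = event["receipt_time"]       # lands in the current partition
    return remapped


if __name__ == "__main__":
    event = {
        "end_time": datetime(2010, 9, 1, 8, 0),
        "receipt_time": datetime(2010, 9, 3, 8, 0),
    }
    if is_late(event):
        event = remap_end_time(event)
    print(event)
```

After remapping, late events are written to the currently active partition instead of an older one, while the original end time remains available for reporting.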

24 Performance Monitoring Best Practices
- Have a baseline
  - You can use Bleep - Knowledge Base Article 3024
  - Joe Burke has an excellent document on Protect: Benchmarking Your ArcSight SIM
- Set up Official/Unofficial Change Control to be aware of changes to:
  - Network (switches, firewall, DNS…)
  - New or changed devices or agents
  - Operational procedures (backups…)
  - Content (reports, filters…)
  - Upgrades

25 What to do when Performance goes awry
When experiencing a performance issue:
- For write issues: turn on SQL tracing to get the actual SQL statements (Knowledge Base Article 2111)
- For read issues: turn on explain plan tracing to get the execution plans (Knowledge Base Article 57)
- Identify all recent changes by DBAs, sysadmins, network admins, and security analysts
- Open a ticket with Support with the information you’ve collected

26 Diagnostic Tools for Performance Troubleshooting

27 What is the PartitionInfo.log
The GetPartitionInfo40.sql script (to be replaced with dbcheck in the next patch for 5.0):
- Outputs the PartitionInfo.log file
- Is a group of scripts that show the status of the ArcSight Database partitions and configuration
For more information on how to download and run it, refer to Knowledge Base article 572
For details of each section in PartitionInfo.log see: Protect 724 > Browse > Blog Posts > Active Blogs > Kerry’s Blog

28 Overview: Details of a Partitioninfo.log
Verify timestamp Identifies the platform, version, bit the database is on Validate the version of Oracle that is supported. for 4.0 SP1 for 4.0 SP3/4.5/5.0

29 PartitionInfo.log
- Validate that Automatic Memory Management is on: shared_pool_size, large_pool_size, java_pool_size and db_cache_size are set to 0
- Check value for undo_retention – it should be or depending on version installed

30 PartitionInfo.log – missing stats
- If lines are missing (holes) or the LAST_ANALYZED date for ARC_EVENT is not current, there could be a stats issue
- Hash lines (####) are due to column overflow; download GetPartitionInfo40.sql from Knowledge Base Article 572

31 Oracle Global Statistics vs Partition Statistics
- Starting with version 5.0, Oracle Global Statistics are disabled on the arcsight schema
- This is especially important at over 50 million events per day
- Only daily partition statistics are needed (run, by default, 6 times/day)
- See Knowledge Base article 2230 for evaluating and turning off global statistics on the arcsight schema
- Missing stats can be regenerated, e.g. by running this command in ARCSIGHT_HOME\bin on your database machine: ./arcdbutil sql RegenerateEventStats.sql

32 Defining Oracle RDA
Oracle Remote Diagnostics Agent
- System configuration: CPUs, Disks, Memory and … much more!
- Database configuration: Init Parameters, Disk allocation, Invalid objects, … much more!
- How to run it can be found in Knowledge Base Article 62
- Supplied in the ArcSight software install: ARCSIGHT_HOME/utilities/database/oracle/common/rda

33 RDA top screen Open RDA_start.htm in a web browser
Left hand side shows options. Of importance for performance issues:
- Operating System Setup
- Performance
- RDBMS

34 RDA - Operating System Setup
Selected Operating System Setup Review hardware CPUs Memory Disk Drives

35 RDA – RDBMS – Database Files
- Shows datafile to tablespace allocation
- Check drive allocations: are the files spread across drives?
- Size of files

36 RDA- RDBMS Log/Trace Files- Alert log
- Shows the Oracle alert log file
- Can scroll through or search for errors (ORA-)
- Check redo log file switching times (in this example the switches are too frequent: every 5 minutes)
- Redo log resizing: Knowledge Base Article 640
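Redo log switch frequency can also be measured directly from the alert log text. The sketch below assumes the Oracle 10g alert log layout, in which a timestamp line such as "Mon Sep 13 10:00:00 2010" precedes a message line containing "advanced to log sequence"; if your log differs, the format string and marker text need adjusting.

```python
from datetime import datetime

# Assumed 10g alert log timestamp layout, e.g. "Mon Sep 13 10:00:00 2010".
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
SWITCH_MARKER = "advanced to log sequence"


def log_switch_times(lines):
    """Return the timestamp that precedes each redo log switch message."""
    switches = []
    last_timestamp = None
    for line in lines:
        text = line.strip()
        try:
            last_timestamp = datetime.strptime(text, TIMESTAMP_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line
        if SWITCH_MARKER in text and last_timestamp is not None:
            switches.append(last_timestamp)
    return switches


def switch_intervals_minutes(lines):
    """Minutes elapsed between consecutive redo log switches."""
    times = log_switch_times(lines)
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])]
```

Consistently short intervals during peak load suggest the redo logs are undersized for the event rate.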

37 RDA – Performance - AWR Report
AWR Knowledge Base Article: 1447
Top 5 Timed Events:
- SQLNET items here mean I/O or network issues
- These 5 events, in various order, are normal for ArcSight

38 RDA – AWR – SGA Target Advisory
View SGA Size Factor:
- 1.0 is currently 6,000 MB: Est DB Time 204,818; Est Physical Reads 280,478,810
- If SGA is increased to 12,000 MB: Est DB Time goes to 199,452; Est Physical Reads goes to 278,459,363
- For 2x the amount of SGA, the gain is a decrease of less than 1% in estimated physical reads
- sga_max_size and sga_target: Knowledge Base Article 3897
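Working the advisory figures through shows how to judge whether extra SGA is worth it. The short sketch below uses only the numbers quoted on the slide; the helper name is arbitrary.

```python
def percent_decrease(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Figures from the SGA Target Advisory on the slide.
current = {"db_time": 204_818, "physical_reads": 280_478_810}    # 6,000 MB
doubled = {"db_time": 199_452, "physical_reads": 278_459_363}    # 12,000 MB

if __name__ == "__main__":
    for metric in ("db_time", "physical_reads"):
        gain = percent_decrease(current[metric], doubled[metric])
        print(f"{metric}: {gain:.2f}% lower with 2x SGA")
```

The same calculation can be applied to any row of the advisory to compare the cost of additional memory against the estimated reduction in DB time and physical reads.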

39 Take Aways
ArcSight Database Overview
- Datafile layout and Template size are important
- More memory is good for a database server
Optimizing for Performance
- Apply Oracle CPU with every upgrade
- Identify issue – read vs write
- Monitor changes in your environment
Diagnostic Tools
- GetPartitionInfo and dbcheck
- RDA

40 References
- Protect 724 – Blogs
- ArcSight Knowledge Base from the Support Portal: choose Knowledge Base, log in, and enter the Knowledge Base numbers referenced in the presentation
- ArcSight ESM 5.0 Administrators Guide – Troubleshooting section - Query and Trend Performance Tuning
- Oracle I/O Numbers Calibration Tool (ORION): a tool for benchmarking I/O performance of an Oracle database without installing Oracle or its database

41 Knowledge Base Article Quick Reference Sheet
Topic | KB Number | Slide Reference Number
Oracle CPU | 808 | 14
Report Fails to Run and Ora-01555 | 2852 | 18
Running GatherSystemStats.sql | 3903 | 20
Compressor – Partition Still Active | 3324 | 22
Bleep Utility | 3024 | 24
Steps to Perform Thread Dumps and dbsessions | 2111 | 25
How To Enable Explain Plans | 57 |
GetPartitionInfo.sql | 572 | 27
Change Oracle Global Stats | 2230 | 30
RDA – Installing and Running | 62 | 32
Resize the Oracle Redo Logs | 640 | 36
AWR/Statspack on Oracle 10g | 1447 | 37
Setting sga_target and sga_max_size | 3897 | 11, 38

42 To learn more, contact ArcSight at: info@arcsight
ArcSight, Inc. 5 Results Way, Cupertino, CA 95014, USA