The Answer to Free Memory, Swap, Oracle and everything A presentation about using memory where it’s needed most Christo Kutrovsky The Pythian Group 2007.


1 The Answer to Free Memory, Swap, Oracle and everything A presentation about using memory where it’s needed most Christo Kutrovsky The Pythian Group 2007 April

2 The Answer to Free Memory, Swap, Oracle and everything A presentation about using memory where it’s needed most Christo Kutrovsky The Pythian Group 2007 April The 45 minutes version

3 Who Am I? Joined Pythian in 2003 Became team lead for one of Pythian's service delivery teams in 2006 Notable clients: Palm Coast Data, Freshdirect.com Presented at Collaborate '06, '07, RMOUG Special interest in 11g, RAC, Disk IO performance, and memory Pythian's delegate to the 11g beta, participated at the camp level (two visits)

4 Who is Pythian? Provides turnkey global data architecture and operations teams on a linear-cost-to-effort basis Founded in 1997, headquartered in Ottawa, Canada, with offices in India and Australia Supporting almost 100 clients worldwide and more than 600 production databases Almost 50 production engineers engaged in client service delivery Broad data infrastructure expertise primarily focused on Oracle, Microsoft SQL Server, and MySQL on enterprise hardware

5 Agenda Types of memory Virtual Memory areas How do we monitor memory usage  And make sense of it Oracle examples Case studies

6 Questions How many developers How many managing linux How many managing unix (AIX, solaris) How many have root access How many have control of database memory consumption

7 Terminology What is memory  The ability of a computer system to store data

8 Types of Memory Short term  RAM (memory) Long term (“permanent”)  Disk, tape (storage)

9 Types of Memory - physical CPU Registers  fastest, very limited CPU Cache (L1/L2/L3)  some latency, LRU maintained RAM  major latency (relatively), partially LRU Disk  do something else while you wait

10 What is RAM Faster, temporary storage A work area A place where you put your data while you process it

11 The Many caches CPU CPU Registers 2 ns CPU Cache 8 ns 1:4 Main Memory (RAM) 100ns 1:12 Disk – Long term memory 3’000’000 ns 1:30’000 TAPE – even longer
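The ratios on this slide can be checked from the quoted latencies. A small Python sketch, using only the figures above (the names LATENCY_NS and slowdown_ratios are illustrative, not from the presentation):

```python
# Access latencies in nanoseconds, as quoted on the slide.
LATENCY_NS = [
    ("CPU registers", 2),
    ("CPU cache", 8),
    ("Main memory (RAM)", 100),
    ("Disk", 3_000_000),
]


def slowdown_ratios(levels):
    """Return (faster, slower, ratio) for each adjacent pair of levels."""
    ratios = []
    for (fast_name, fast_ns), (slow_name, slow_ns) in zip(levels, levels[1:]):
        ratios.append((fast_name, slow_name, slow_ns / fast_ns))
    return ratios


for fast, slow, ratio in slowdown_ratios(LATENCY_NS):
    print(f"{fast} -> {slow}: 1:{ratio:,.1f}")
```

The cache-to-RAM step works out to 1:12.5, which the slide rounds to 1:12.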

12 CPU Cache & CPU Registers CPU Registers – your two hands (or more)  You use them to hold the items while you work on them CPU Cache – your desk  You use it as a quickly accessible location to store your most used items  Represents your current tasks

13 Main Memory - RAM RAM – Random Access Memory It’s like your office  Need to get up from your desk to grab items to work on  You usually grab multiple at a time to save roundtrips

14 Our office CPU Your hands 2 seconds CPU Cache “Desk” 4 sec. Main Memory (RAM) “Your office” 12 seconds Disk “Flying to Australia” 8 hours TAPE – use a cargo ship to go

15 Growing your office You always need more Your “office” needs to handle all your active clients, or they will be unhappy  Running out of space in your office is not acceptable

16 The Disk – extending the memory The Solution? Ship some of your least needed binders to Australia Relatively complex process  need to find the least needed binders  need to know how to return them, when they are needed

17 Introduction to virtual memory Processes “see” memory independently, as if each were alone on the system Each process is free to use addresses in the whole “user address space” Typically – 3 Gb user space, 1 Gb system space (on 32 bit)

18 Virtual memory mapping [Diagram: processes P1 and P2, each with a 32 bit addressing space marked at 0, 1, 2, 3 and 4 gb, mapped onto RAM] RAM split into 4 kb chunks Reserved virtual region for the system (kernel)

19 VM Management Implemented via per process page table Indicates:  page location (disk/memory)  page permissions (read/write/execute)  page attributes (ex. copy on write)

20 Virtual memory PTE table P1 PTE Table for P1 rw – in RAM – 0xFFA rw – in RAM – 0xFFB in RAM – 0xFFC – copy on write w – unallocated rw – on disk - SWAP rx – on disk - FILE unallocated RAM FILESWAP
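The page table idea from the last two slides can be sketched as a toy lookup in Python. This is only an illustration: a real page table is a multi-level structure walked by the hardware, and the assignment of virtual pages to entries below is made up (only the frame numbers 0xFFA and 0xFFB come from the slide).

```python
PAGE_SIZE = 4096  # 4 kb pages, as on the slides


def split_address(vaddr, page_size=PAGE_SIZE):
    """Split a virtual address into (virtual page number, offset in page)."""
    return vaddr // page_size, vaddr % page_size


# Toy page table for process P1: virtual page number -> entry.
# Which virtual page holds which entry is a hypothetical layout.
PAGE_TABLE_P1 = {
    0: {"perms": "rw", "location": "ram", "frame": 0xFFA},
    1: {"perms": "rw", "location": "ram", "frame": 0xFFB},
    2: {"perms": "rw", "location": "swap", "frame": None},
    3: {"perms": "rx", "location": "file", "frame": None},
}


def translate(page_table, vaddr, access):
    """Return ("ok", physical address) or ("fault", reason) for one access."""
    vpn, offset = split_address(vaddr)
    entry = page_table.get(vpn)
    if entry is None:
        return ("fault", "unallocated")
    if access not in entry["perms"]:
        return ("fault", "permission")
    if entry["location"] != "ram":
        # The page exists but must first be read from swap or from its file.
        return ("fault", "page-in from " + entry["location"])
    return ("ok", entry["frame"] * PAGE_SIZE + offset)


print(translate(PAGE_TABLE_P1, 0x10, "r"))
print(translate(PAGE_TABLE_P1, 2 * PAGE_SIZE, "r"))
```

Each process gets its own table, which is why the same virtual address in P1 and P2 can point at different physical pages, or at the same one for shared memory.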

21 Additional benefits from VM Protection Features  memory mapped files  in memory file system  shared memory  shared memory – copy on write Use more than what you have

22 Concept types of memory Shared  initially exists on disk file cache(linux), buffers, system cache  initially does not exist on disk anonymous(linux), computed(aix) Private  does not exist on disk  special case copy on write

23 Linux VM Components direct “user” dependent types of memory  Buffers (shared)  Cached (shared)  Anonymous (private or shared)  Hugepages indirect (system) managed areas  Slab – kernel structures  PageTables

24 VM areas with Oracle [Diagram of memory areas, labelled System and User: Cached, Slab, PageTables, Buffers, Mapped, IPC Memory (SGA), Anonymous (PGA, PL/SQL arrays)]

25 Monitoring Monitoring Memory with Oracle in mind

26 TOP top  most commonly used tool  most commonly misinterpreted

27 top – sample output top - 22:03:11 up 3:19, 2 users, load average: 2.98, 1.22, 0.52 Tasks: 89 total, 1 running, 88 sleeping, 0 stopped, 0 zombie Cpu0 : 0.7% us, 0.8% sy, 0.0% ni, 0.3% id, 98.0% wa, 0.2% hi, 0.0% si Cpu1 : 0.0% us, 0.8% sy, 0.0% ni, 97.6% id, 1.4% wa, 0.2% hi, 0.0% si Cpu2 : 0.0% us, 0.2% sy, 0.0% ni, 99.7% id, 0.2% wa, 0.0% hi, 0.0% si Cpu3 : 0.2% us, 0.2% sy, 0.0% ni, 33.6% id, 66.1% wa, 0.0% hi, 0.0% si Mem: k total, k used, k free, 36620k buffers Swap: k total, 572k used, k free, k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 8494 oracle m 1.6g 1.5g D :03.15 oracletest (LOCAL=YES) 4796 oracle m 1.5g 1.5g S :03.91 ora_dbw1_test 4794 oracle m 1.5g 1.5g S :12.23 ora_dbw0_test 4798 oracle m 1.5g 1.5g S :03.97 ora_dbw2_test 4800 oracle m 1.5g 1.5g S :04.09 ora_dbw3_test 1 root S :00.86 init [3] 2 root RT S :00.00 [migration/0] 3 root S :00.00 [ksoftirqd/0]

28 Top – data comes from /proc/ /status cat /proc/10450/status Name: oracle State: S (sleeping) SleepAVG: 98% Tgid: Pid: PPid: 1 TracerPid: 0 Uid: Gid: FDSize: 256 Groups: VmSize: kB VmLck: 0 kB VmRSS: kB VmData: 1612 kB VmStk: 124 kB VmExe: kB VmLib: 8420 kB …

29 top – additional columns top can have additional columns  swap file usage computed  code  data THEY ARE ALL WRONG

30 vmstat vmstat 2 procs memory swap io system cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa r – run queue – how many processes currently waiting for or running on the CPU b – how many processes waiting, usually waiting on IO swpd – swap memory usage free – free memory cache – file system cache

31 vmstat cont vmstat 2 procs memory swap io system cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa si/so – swap in / out – in Kb/sec bi/bo – blocks in / out (read from / written to block devices) – in Kb/sec cs – context switches us/sy/id/wa – user/system/idle/wait time for CPUs
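A minimal Python sketch of reading one vmstat sample by column name, assuming the 16-column layout shown on these slides. The sample values are hypothetical, since the numbers did not survive in the transcript, and the helper names are illustrative.

```python
# Column order as shown in the vmstat header on the slide.
VMSTAT_COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def parse_vmstat_line(line):
    """Turn one numeric vmstat output line into a {column: int} dict."""
    values = [int(v) for v in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("unexpected number of vmstat columns")
    return dict(zip(VMSTAT_COLUMNS, values))


def is_swapping(sample):
    """True if pages moved to or from swap during the interval."""
    return sample["si"] > 0 or sample["so"] > 0


# Hypothetical sample: some swap is in use (swpd), but nothing is moving.
sample = parse_vmstat_line("1 2 572 20000 36620 1500000 0 0 5200 40 1100 900 1 1 33 65")
print(sample["swpd"], sample["wa"], is_swapping(sample))
```

The distinction the sketch makes is the one that matters in practice: a non-zero swpd only says swap holds something, while non-zero si/so says the system is actively swapping.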

32 /proc/meminfo cat /proc/meminfo MemTotal: kB MemFree: kB Buffers: kB Cached: kB SwapCached: 0 kB Active: kB Inactive: kB HighTotal: kB HighFree: 8768 kB LowTotal: kB LowFree: kB SwapTotal: kB SwapFree: kB Dirty: 100 kB Writeback: 0 kB Mapped: kB Slab: kB Committed_AS: kB PageTables: 3952 kB VmallocTotal: kB VmallocUsed: 5964 kB VmallocChunk: kB HugePages_Total: 2200 HugePages_Free: 1088 Hugepagesize: 2048 kB

33 /proc/meminfo – 64 bit cat /proc/meminfo MemTotal: kB MemFree: kB Buffers: kB Cached: kB SwapCached: kB Active: kB Inactive: kB HighTotal: 0 kB HighFree: 0 kB LowTotal: kB LowFree: kB SwapTotal: kB SwapFree: kB Dirty: 252 kB Writeback: 0 kB Mapped: kB Slab: kB CommitLimit: kB Committed_AS: kB PageTables: kB VmallocTotal: kB VmallocUsed: 2992 kB VmallocChunk: kB HugePages_Total: 2000 HugePages_Free: 128 Hugepagesize: 2048 kB
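The fields discussed on the following slides are easier to work with once /proc/meminfo is parsed into a dictionary. A minimal Python sketch; the function name is illustrative, and the MemTotal and MemFree values in the sample are hypothetical (the Dirty and HugePages values are the ones shown on the 64 bit slide).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: int}.

    Values with a kB unit stay in kB; unit-less lines such as
    HugePages_Total are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


SAMPLE = """\
MemTotal:        4000000 kB
MemFree:          100000 kB
Dirty:               252 kB
HugePages_Total:    2000
HugePages_Free:      128
Hugepagesize:       2048 kB
"""

info = parse_meminfo(SAMPLE)
# Memory currently backed by huge pages that are in use, in kB.
hugepages_in_use_kb = (info["HugePages_Total"] - info["HugePages_Free"]) * info["Hugepagesize"]
print(hugepages_in_use_kb)
```

On a Linux host the same function can be fed the live file with parse_meminfo(open("/proc/meminfo").read()).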

34 MemTotal Total memory visible to the OS If it’s not what you’ve put in the machine, you probably have a bad SIMM/DIMM

35 MemFree Memory that is currently un-occupied and available to use immediately Not the maximum amount of memory available at the moment Controlled by (Linux RH4) /proc/sys/vm/min_free_kbytes

36 MemFree – example grep MemFree /proc/meminfo MemFree: kB echo > /proc/sys/vm/min_free_kbytes grep MemFree /proc/meminfo MemFree: kB

37 Buffers Cache of raw disk blocks Usually occupied with ext3 metadata  Mostly ext3 pointers (extent management)  Not the cache of actual user data In older kernels, was controllable

38 Cached File system cache  If direct IO is not used for datafiles – will have your datafiles cached Binary (for execution) memory  includes the “oracle” binary caching  all the libraries caching Does not mean “occupied” – usually can be released immediately The Oracle SGA – when not using hugepages

39 Cached – example part 1 ~]# cat /proc/meminfo … MemFree: kB Buffers: 9328 kB Cached: kB … du -smc indx01_* 1714 indx01_01.dbf 1761 indx01_02.dbf 1722 indx01_03.dbf 5197 total … cat indx01_* > /dev/null

40 Cached – example part 2 ~]# vmstat 2 procs memory swap io system cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa … cat /proc/meminfo … MemFree: kB Buffers: kB Cached: kB …

41 Cached – example #2 part 1 cat indx01_* >newfile vmstat 2 procs memory swap io system cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa … … cat /proc/meminfo … MemFree: kB Buffers: 2192 kB Cached: kB … Dirty: kB Writeback: 0 kB …

42 Cached – example #2 part 2 cat /proc/meminfo … MemFree: kB Buffers: 3300 kB Cached: kB … Dirty: 0 kB Writeback: 0 kB … rm newfile procs memory swap io system cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa

43 Swap SwapTotal SwapFree SwapCached  written to swap, but still in memory  applies only to anonymous memory  OS will anticipate memory needs, and pre-swap inactive data, but keep it in memory Actual swapping (memory that will need to be read from disk) = SwapTotal - SwapFree - SwapCached
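The formula on this slide, written out as a small Python function. The input is assumed to be a dictionary of /proc/meminfo values in kB; the numbers in the example are hypothetical.

```python
def actual_swapped_kb(meminfo):
    """Memory that would really have to be read back from disk, in kB.

    SwapCached pages are written to swap but still held in RAM, so they
    are subtracted from the swap space in use.
    """
    in_use = meminfo["SwapTotal"] - meminfo["SwapFree"]
    return in_use - meminfo["SwapCached"]


# Hypothetical values in kB.
print(actual_swapped_kb({"SwapTotal": 2096472, "SwapFree": 2095900, "SwapCached": 0}))
print(actual_swapped_kb({"SwapTotal": 2096472, "SwapFree": 2095900, "SwapCached": 200}))
```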

44 Active/Inactive Active – recently used memory  Includes all types of memory (cached, buffers, anonymous)  OS will try to keep it in RAM Inactive – memory that will be first reused  “free” memory Can be used to gauge the “working set”

45 High/Low Total/Free 32 bit limitations, no high memory on 64 bit Some kernel structures cannot be allocated in “high memory” Used to be a problem in older kernels, newer kernels protect low memory

46 Dirty & Writeback Dirty – cache/buffers memory that must be written to disk  thresholds can be adjusted Writeback – memory actively being written to disk  Can reach high values with async writes with large queue

47 Committed_AS & Mapped Committed_AS  Total memory requested on the system  Not used, just requested  If every process in the system is to touch and use the memory it has requested, this is how much would be used Mapped  memory used for in-memory mapped files  all anonymous memory  includes committed & touched memory

48 Committed_AS - example cat grab.c main() {void *p; p=malloc( ); sleep(60);} cat /proc/meminfo ... MemFree: kB ... Committed_AS: kB ./grab cat /proc/meminfo ... MemFree: kB ... Committed_AS: kB

49 Slab Slab – “in-kernel data structures cache”  similar to Oracle’s “shared_pool”  designed to prevent memory fragmentation  detailed monitoring: /proc/slabinfo slabtop Basically “system space”

50 slabtop – ordered by cache size Active / Total Objects (% used) : / (63.8%) Active / Total Slabs (% used) : 5839 / 5846 (99.9%) Active / Total Caches (% used) : 90 / 132 (68.2%) Active / Total Size (% used) : K / K (74.2%) Minimum / Average / Maximum Object : 0.01K / 0.17K / K OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME % 0.27K K radix_tree_node % 0.05K K buffer_head % 4.00K K size % 0.54K K ext3_inode_cache % 0.50K K size % 3.00K K biovec-(256) % 0.15K K dentry_cache % 0.06K K size % 0.12K K size % 0.25K K size-256 …

51 HugePages 2Mb pages organized in a separate memory pool locked in memory available only to shared memory requests pre-allocated via kernel parameter

52 Shared memory mapping [Diagram: processes P1 and P2, each with a 32 bit addressing space marked at 0, 1, 2, 3 and 4 gb, mapping the same region of RAM] Reserved virtual region for the system (kernel)

53 Shared memory mapping (huge) [Diagram: processes P1 and P2, each with a 32 bit addressing space marked at 0, 1, 2, 3 and 4 gb, mapping a HugePages region of RAM] HugePages Pre-Allocated Memory pool Locked in RAM

54 VLM – 32 bit workaround 32 bit address space is 4 Gb 32 bit systems with PAE (Intel)  up to 64 Gb of ram Memory filesystem  opens a file in /dev/shm for buffer cache  shared pool still in Shared Memory Beware of small Oracle block size

55 VLM – using 3gb+ on 32 bit [Diagram: processes P1 and P2, each with a 32 bit addressing space marked at 0, 1, 2, 3 and 4 gb, with an mmap region mapped onto /dev/shm/ora_ in ramfs, backed by RAM]

56 USE_INDIRECT_BUFFERS RedHat/SUSE  shmfs – needs size  tmpfs – does not need size  ramfs – does not need size + Locked None of these can use HugePages  shared pool can still use HugePages  double-memory access due to mapping

57 DirectIO Direct IO (O_DIRECT) – bypasses the file system cache and accesses the files directly DB activity does not pollute OS cache DB activity does not compete with PGA/PLSQL memory

58 PageTables Memory for per-process page tables  B-Tree like structure – this number shows leaf blocks space  Memory to manage memory  One entry of ~8 bytes (4 bytes on 32 bit without PAE) per process, per used 4kb of memory  In Oracle’s case, assuming an SGA of 2gb: 524’288 pages * 8 bytes = 4 Mb per process; 1000 sessions = 4 Gb of memory, to manage 2gb of SGA
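The arithmetic on this slide, as a short Python sketch. It uses the 8-byte entry size from the slide's own calculation and counts leaf entries only; the function name and parameters are illustrative.

```python
GIB = 1024 ** 3


def page_table_bytes(mapped_bytes, page_size=4096, entry_bytes=8):
    """Leaf page table space one process needs to map `mapped_bytes`."""
    pages = -(-mapped_bytes // page_size)  # ceiling division
    return pages * entry_bytes


sga = 2 * GIB
per_process = page_table_bytes(sga)  # every session maps the whole SGA
sessions = 1000
total = per_process * sessions

print(f"pages per process: {sga // 4096}")
print(f"per process: {per_process / 1024 ** 2:.0f} MiB")
print(f"{sessions} sessions: {total / GIB:.2f} GiB")
```

The per-process figure is exactly 4 MiB; 1000 sessions come to roughly 3.9 GiB, which the slide rounds to 4 Gb. The cost scales with the number of sessions because each process keeps its own page table, even though the SGA itself is shared.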

59 Case studies PageTables using a lot of ram

60 PageTables – bad example Config:  1.7 Gb sga (max on 32 bit without VLM)  1400 Mb in db_cache_size  table sized to fit exactly in cache Start 100 sessions, that full scan the table (cached) in order to touch the memory and allocate the PTEs  Sessions will wait via dbms_lock.allocate to be released Show before and after PageTables usage

61 PageTables – bad example cont. Before starting the sessions (db is UP) cat /proc/meminfo … MemFree: kB … Committed_AS: kB PageTables: 4932 kB … After sessions have finished touching the memory cat /proc/meminfo … MemFree: kB … Committed_AS: kB PageTables: kB

62 HugePages & Oracle Locks SGA in memory  no part of the SGA will ever be swapped out, or even considered for swapping Reduces the number of page table entries  Assuming 2 Gb SGA with 2 Mb pages: ~1’000 PTEs * 8 bytes = 8 Kb per process  1000 sessions = 8 Mb of memory instead of 4 Gb, a 512 fold reduction

63 HugePages & Performance Releases more memory for PGA or more db_cache Guarantees that SGA will always be in memory Improves TLB hit ratio  TLB is a CPU level cache of virtual to physical memory mappings, improving performance 8% improvement in a memory only TPC test  not including the fact there is more memory available

64 HugePages & 100 sessions The test from a few slides before Before starting 100 sessions (db up) cat /proc/meminfo … Committed_AS: kB PageTables: 3056 kB … After … Committed_AS: kB PageTables: kB …

65 No hugepages [Diagram of memory areas, labelled System and User: Cached, Slab, PageTables, Buffers, Mapped, IPC Memory (SGA), Anonymous (PGA, PL/SQL arrays)]

66 With HugePages [Diagram of memory areas, labelled System and User: Cached, Slab, PageTables, Buffers, Mapped, IPC Memory (SGA) in Hugepages, Anonymous (PGA, PL/SQL arrays)]

67 HugePages – on Red Hat what you need to set up RH4  /proc/sys/vm/nr_hugepages  /proc/sys/vm/hugetlb_shm_group  /etc/security/limits.conf
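A rough Python sketch for sizing nr_hugepages from a target SGA size, assuming the 2048 kB Hugepagesize shown in the /proc/meminfo slides. This is a lower bound only: the pool has to hold every shared memory segment in full, so real configurations usually leave some headroom. The function name is illustrative.

```python
def hugepages_needed(sga_bytes, hugepage_kb=2048):
    """Smallest number of huge pages that can hold `sga_bytes`."""
    hugepage_bytes = hugepage_kb * 1024
    return -(-sga_bytes // hugepage_bytes)  # ceiling division


print(hugepages_needed(2 * 1024 ** 3))     # a 2 Gb SGA
print(hugepages_needed(1700 * 1024 ** 2))  # a 1700 Mb SGA
```

The result is the value that would go into /proc/sys/vm/nr_hugepages; the group and memlock settings from the other two files on the slide still need to be set separately.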

68 Case studies Where is my free memory going?

69 Freshly booted vmstat 2 procs memory swap io system cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa Reading ~1.2 Gb of data, free memory drops twice as much  file system cache  oracle SGA being touched

70 Freshly booted – hugepages vmstat 2 procs memory swap io system cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa … Reading 1.2 Gb of data, free memory drops with same amount  file system cache consuming memory

71 Freshly booted – with directIO vmstat 2 procs memory swap io system cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa … Gb of data read: instead of a 1.2 Gb drop in free memory, NO CHANGE

72 DIRECT_IO Bugs bug  filesystemio_options=directio is ignored on linux  fixed in Note:  bug introduced - O_DIRECT flag was not passed to the open() system call  fixed in Basically you need

73 Shared memory monitoring How to see shared memory?  ipcs – shows the “IPC” shared memory If you kill Oracle without freeing up shared memory  ipcrm – to remove

74 ipcs Shared Memory Segments key shmid owner perms bytes nattch status 0x oracle x oracle x7157be oracle Semaphore Arrays key semid owner perms nsems 0xb1adfd8c oracle Message Queues key msqid owner perms used-bytes messages

75 To remove orphan segments Identified via “sysresv”  or number attached from ipcs  or pmap of an oracle pid Use ipcrm to remove
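The "number attached from ipcs" check can be sketched in Python as a filter over ipcs output. The sample text below is hypothetical and follows the column layout from the ipcs slide; the function name is illustrative. A segment with nattch 0 is only a candidate, so it is worth confirming with sysresv before running ipcrm.

```python
def orphan_candidates(ipcs_text, owner="oracle"):
    """Return shmids of `owner`'s shared memory segments with nattch == 0."""
    shmids = []
    in_shm_section = False
    for line in ipcs_text.splitlines():
        if "Shared Memory Segments" in line:
            in_shm_section = True
            continue
        if "Semaphore Arrays" in line or "Message Queues" in line:
            in_shm_section = False
            continue
        fields = line.split()
        # Data rows: key shmid owner perms bytes nattch [status]
        if not in_shm_section or len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        shmid, seg_owner, nattch = fields[1], fields[2], fields[5]
        if seg_owner == owner and nattch == "0":
            shmids.append(int(shmid))
    return shmids


# Hypothetical ipcs output.
SAMPLE_IPCS = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 32768      oracle     640        4194304    12
0x00000000 65537      oracle     640        1593835520 0
0x7157be00 98306      oracle     640        2097152    12

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xb1adfd8c 131072     oracle     640        154

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
"""

print(orphan_candidates(SAMPLE_IPCS))
```

Tracking the section header matters because message queue rows also have six columns and would otherwise be mistaken for shared memory segments.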

76 The End Thank you, Questions? Visit my blog at Christo Kutrovsky The Pythian Group 2007 April

