The Answer to Free Memory, Swap, Oracle and everything A presentation about using memory where it’s needed most Christo Kutrovsky The Pythian Group 2007.


1 The Answer to Free Memory, Swap, Oracle and everything A presentation about using memory where it’s needed most Christo Kutrovsky The Pythian Group 2007 April

2 The Answer to Free Memory, Swap, Oracle and everything A presentation about using memory where it’s needed most Christo Kutrovsky The Pythian Group 2007 April The 45 minutes version

3 Who Am I?
- Joined Pythian in 2003
- Became team lead for one of Pythian's service delivery teams in 2006
- Notable clients: Palm Coast Data, Freshdirect.com
- Presented at Collaborate '06 and '07, and at RMOUG
- Special interest in 11g, RAC, disk IO performance, and memory
- Pythian's delegate to the 11g beta; participated at the camp level (two visits)

4 Who is Pythian?
- Provides turnkey global data architecture and operations teams on a linear-cost-to-effort basis
- Founded in 1997, headquartered in Ottawa, Canada, with offices in India and Australia
- Supporting almost 100 clients worldwide and more than 600 production databases
- Almost 50 production engineers engaged in client service delivery
- Broad data infrastructure expertise, primarily focused on Oracle, Microsoft SQL Server, and MySQL on enterprise hardware

5 Agenda
- Types of memory
- Virtual memory areas
- How do we monitor memory usage
  - and make sense of it
- Oracle examples
- Case studies

6 Questions
- How many are developers?
- How many manage Linux?
- How many manage Unix (AIX, Solaris)?
- How many have root access?
- How many have control of database memory consumption?

7 Terminology
- What is memory?
  - The ability of a computer system to store data

8 Types of Memory
- Short term
  - RAM (memory)
- Long term (“permanent”)
  - Disk, tape (storage)

9 Types of Memory - physical
- CPU registers
  - fastest, very limited
- CPU cache (L1/L2/L3)
  - some latency, LRU maintained
- RAM
  - major latency (relatively), partially LRU
- Disk
  - do something else while you wait

10 What is RAM?
- Faster, temporary storage
- A work area
- A place where you put your data while you process it

11 The Many Caches
- CPU registers: 2 ns
- CPU cache: 8 ns (1:4 relative to the registers)
- Main memory (RAM): 100 ns (1:12 relative to the CPU cache)
- Disk, long-term memory: 3’000’000 ns (1:30’000 relative to RAM)
- Tape: even longer

12 CPU Cache & CPU Registers
- CPU registers: your two hands (or more)
  - you use them to hold the items while you work on them
- CPU cache: your desk
  - you use it as a quickly accessible location to store your most-used items
  - represents your current tasks

13 Main Memory - RAM
- RAM: Random Access Memory
- It’s like your office
  - you need to get up from your desk to grab items to work on
  - you usually grab several at a time to save round trips

14 Our Office
- CPU: your hands, 2 seconds
- CPU cache: “desk”, 4 sec.
- Main memory (RAM): “your office”, 12 seconds
- Disk: “flying to Australia”, 8 hours
- Tape: use a cargo ship to go

15 Growing Your Office
- You always need more
- Your “office” needs to handle all your active clients, or they will be unhappy
  - running out of space in your office is not acceptable

16 The Disk: Extending the Memory
- The solution? Ship some of your least-needed binders to Australia
- Relatively complex process
  - need to find the least-needed binders
  - need to know how to get them back when they are needed

17 Introduction to Virtual Memory
- Processes “see” memory independently, as if each were alone on the system
- Each process is free to use addresses in the whole “user address space”
- Typically 3 GB of user space and 1 GB of system space (on 32 bit)

18 Virtual memory mapping
[Figure: processes P1 and P2, each with a 32-bit addressing space (0 to 4 GB, with a reserved virtual region for the system (kernel)), mapped onto RAM, which is split into 4 KB chunks]

19 VM Management
- Implemented via a per-process page table
- Indicates:
  - page location (disk/memory)
  - page permissions (read/write/execute)
  - page attributes (e.g. copy on write)

20 Virtual Memory PTE Table
[Figure: the PTE table for process P1, with entries pointing into RAM, a file, or swap]
- rw, in RAM, 0xFFA
- rw, in RAM, 0xFFB
- in RAM, 0xFFC, copy on write
- w, unallocated
- rw, on disk (SWAP)
- rx, on disk (FILE)
- unallocated
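The lookup described on slides 19 and 20 can be sketched in C. This is a toy, single-level table with hypothetical names (`toy_pte`, `toy_translate`); real x86 hardware walks a multi-level tree and the kernel packs these fields into bits of each entry, so treat it only as an illustration of how a virtual page number selects an entry and the offset is carried over unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified page table entry with the fields from the slide:
 * where the page lives, its permissions, and a copy-on-write attribute.
 * Real x86 PTEs pack this information into bits of a 4- or 8-byte word. */
enum page_location { PAGE_UNALLOCATED, PAGE_IN_RAM, PAGE_ON_SWAP, PAGE_IN_FILE };

struct toy_pte {
    enum page_location location;
    uint32_t frame;                 /* physical frame number when in RAM */
    unsigned readable : 1;
    unsigned writable : 1;
    unsigned executable : 1;
    unsigned copy_on_write : 1;
};

#define TOY_PAGE_SHIFT 12u                         /* 4 KB pages */
#define TOY_PAGE_MASK  ((1u << TOY_PAGE_SHIFT) - 1u)

/* Translate a 32-bit virtual address using a flat per-process table.
 * Returns 1 and stores the physical address if the page is in RAM;
 * returns 0 where real hardware would raise a page fault (page
 * unallocated, or its contents must first be read from swap or a file).
 * Permissions are not checked in this sketch. */
int toy_translate(const struct toy_pte *table, uint32_t n_entries,
                  uint32_t vaddr, uint32_t *paddr)
{
    assert(table != NULL && paddr != NULL);
    uint32_t vpn = vaddr >> TOY_PAGE_SHIFT;        /* virtual page number */
    uint32_t offset = vaddr & TOY_PAGE_MASK;       /* byte offset in the page */
    if (vpn >= n_entries || table[vpn].location != PAGE_IN_RAM)
        return 0;
    *paddr = (table[vpn].frame << TOY_PAGE_SHIFT) | offset;
    return 1;
}
```

For example, with entry 0 pointing at frame 0xFFA, virtual address 0x0123 translates to physical 0xFFA123, while any address in a page marked as swapped out returns 0, which is where the kernel would step in and read the page back.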

21 Additional Benefits from VM
- Protection
- Features
  - memory-mapped files
  - in-memory file system
  - shared memory
  - shared memory, copy on write
- Use more than what you have

22 Conceptual Types of Memory
- Shared
  - initially exists on disk: file cache (Linux), buffers, system cache
  - initially does not exist on disk: anonymous (Linux), computational (AIX)
- Private
  - does not exist on disk
  - special case: copy on write

23 Linux VM Components
- Direct, “user”-dependent types of memory
  - Buffers (shared)
  - Cached (shared)
  - Anonymous (private or shared)
  - Hugepages
- Indirect, system-managed areas
  - Slab: kernel structures
  - PageTables

24 VM Areas with Oracle
[Figure: memory areas on a system running Oracle. System: Slab, PageTables. User: Buffers, Cached, Mapped, IPC memory (SGA), Anonymous (PGA, PL/SQL arrays)]

25 Monitoring
Monitoring memory with Oracle in mind

26 top
- most commonly used tool
- most commonly misinterpreted

27 top – sample output
    top - 22:03:11 up 3:19, 2 users, load average: 2.98, 1.22, 0.52
    Tasks: 89 total, 1 running, 88 sleeping, 0 stopped, 0 zombie
    Cpu0 : 0.7% us, 0.8% sy, 0.0% ni, 0.3% id, 98.0% wa, 0.2% hi, 0.0% si
    Cpu1 : 0.0% us, 0.8% sy, 0.0% ni, 97.6% id, 1.4% wa, 0.2% hi, 0.0% si
    Cpu2 : 0.0% us, 0.2% sy, 0.0% ni, 99.7% id, 0.2% wa, 0.0% hi, 0.0% si
    Cpu3 : 0.2% us, 0.2% sy, 0.0% ni, 33.6% id, 66.1% wa, 0.0% hi, 0.0% si
    Mem: 8310308k total, 8049068k used, 261240k free, 36620k buffers
    Swap: 7823644k total, 572k used, 7823072k free, 3395900k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    8494 oracle 16 0 1662m 1.6g 1.5g D 2.0 19.8 0:03.15 oracletest (LOCAL=YES)
    4796 oracle 16 0 1626m 1.5g 1.5g S 1.0 19.5 0:03.91 ora_dbw1_test
    4794 oracle 15 0 1626m 1.5g 1.5g S 0.7 19.5 0:12.23 ora_dbw0_test
    4798 oracle 16 0 1626m 1.5g 1.5g S 0.7 19.5 0:03.97 ora_dbw2_test
    4800 oracle 16 0 1626m 1.5g 1.5g S 0.7 19.5 0:04.09 ora_dbw3_test
    1 root 16 0 2384 600 512 S 0.0 0.0 0:00.86 init [3]
    2 root RT 0 0 0 0 S 0.0 0.0 0:00.00 [migration/0]
    3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 [ksoftirqd/0]

28 top – data comes from /proc/<pid>/status
    cat /proc/10450/status
    Name: oracle
    State: S (sleeping)
    SleepAVG: 98%
    Tgid: 10450
    Pid: 10450
    PPid: 1
    TracerPid: 0
    Uid: 503 503 503 503
    Gid: 503 503 503 503
    FDSize: 256
    Groups: 503 603
    VmSize: 83424 kB
    VmLck: 0 kB
    VmRSS: 1484204 kB
    VmData: 1612 kB
    VmStk: 124 kB
    VmExe: 52720 kB
    VmLib: 8420 kB
    …
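Since top builds its per-process columns from this file, a program can read a single field directly. The sketch below is Linux-only and the function name `proc_status_kb` is our own; it takes the pid as a string so that "self" also works. Keep in mind that VmRSS counts every resident page a process maps, including the shared SGA pages it has touched, which is why each Oracle process in the top output shows about 1.5g RES and why summing RES across processes overcounts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the numeric value of one field of /proc/<pid>/status (e.g. VmRSS,
 * in kB), or -1 if the file or the field is missing. `pid` is a string such
 * as "10450" or "self". Linux-specific. */
long proc_status_kb(const char *pid, const char *field)
{
    assert(pid != NULL && field != NULL && field[0] != '\0');
    char path[64], line[256];
    size_t len = strlen(field);
    long value = -1;

    snprintf(path, sizeof path, "/proc/%s/status", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Require "Field:" so that a prefix cannot match a longer name. */
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            if (sscanf(line + len + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

For the process on the slide, `proc_status_kb("10450", "VmRSS")` would return the VmRSS figure shown there, in kB.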

29 top – additional columns
top can show additional columns:
- swap file usage (computed)
- code
- data
THEY ARE ALL WRONG

30 vmstat
    vmstat 2
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 0 3631424 11096 120204 0 0 35 31 255 20 0 0 99 0
    0 0 0 3631488 11096 120204 0 0 0 0 1014 18 0 0 100 0
    0 0 0 3631488 11096 120204 0 0 0 0 1012 16 0 0 100 0
- r: run queue, how many processes are currently waiting for or running on a CPU
- b: how many processes are blocked, usually waiting on IO
- swpd: swap memory in use
- free: free memory
- cache: file system cache

31 vmstat cont.
Same vmstat 2 output as on the previous slide; the remaining columns:
- si/so: swap in / out, in KB/s
- bi/bo: blocks in / out from/to block devices, per second (Linux blocks are 1 KB, so effectively KB/s)
- in: interrupts per second
- cs: context switches per second
- us/sy/id/wa: user/system/idle/IO-wait time for the CPUs, in percent
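To track columns such as free and cache over time, as the later examples do by eye, each data row of `vmstat 2` can be parsed into a struct. This is a minimal sketch with hypothetical names (`vmstat_row`, `parse_vmstat_row`), assuming the 16-column layout shown on these slides; other vmstat versions may print extra columns (for example `st`), which this sketch would simply ignore.

```c
#include <assert.h>
#include <stdio.h>

/* One data row of "vmstat 2" output, in the column order shown above. */
struct vmstat_row {
    long r, b;                      /* runnable / blocked processes */
    long swpd, free, buff, cache;   /* memory, in KB */
    long si, so;                    /* swap in / out, KB/s */
    long bi, bo;                    /* blocks in / out per second */
    long in, cs;                    /* interrupts, context switches per second */
    long us, sy, id, wa;            /* CPU time, percent */
};

/* Returns 1 if all 16 columns were parsed, 0 otherwise (e.g. header lines). */
int parse_vmstat_row(const char *line, struct vmstat_row *v)
{
    assert(line != NULL && v != NULL);
    return sscanf(line,
                  "%ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld %ld",
                  &v->r, &v->b, &v->swpd, &v->free, &v->buff, &v->cache,
                  &v->si, &v->so, &v->bi, &v->bo, &v->in, &v->cs,
                  &v->us, &v->sy, &v->id, &v->wa) == 16;
}
```

Header lines start with text, so `sscanf` fails on the first field and the row is skipped; only numeric rows are accepted.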

32 /proc/meminfo
    cat /proc/meminfo
    MemTotal: 8310308 kB
    MemFree: 93448 kB
    Buffers: 132036 kB
    Cached: 3413324 kB
    SwapCached: 0 kB
    Active: 1658252 kB
    Inactive: 1942032 kB
    HighTotal: 7470528 kB
    HighFree: 8768 kB
    LowTotal: 839780 kB
    LowFree: 84680 kB
    SwapTotal: 7823644 kB
    SwapFree: 7823072 kB
    Dirty: 100 kB
    Writeback: 0 kB
    Mapped: 82500 kB
    Slab: 92028 kB
    Committed_AS: 490700 kB
    PageTables: 3952 kB
    VmallocTotal: 106488 kB
    VmallocUsed: 5964 kB
    VmallocChunk: 99900 kB
    HugePages_Total: 2200
    HugePages_Free: 1088
    Hugepagesize: 2048 kB

33 /proc/meminfo – 64 bit
    cat /proc/meminfo
    MemTotal: 8165032 kB
    MemFree: 106428 kB
    Buffers: 219484 kB
    Cached: 2864760 kB
    SwapCached: 69256 kB
    Active: 1508428 kB
    Inactive: 1915392 kB
    HighTotal: 0 kB
    HighFree: 0 kB
    LowTotal: 8165032 kB
    LowFree: 106428 kB
    SwapTotal: 4816888 kB
    SwapFree: 4192148 kB
    Dirty: 252 kB
    Writeback: 0 kB
    Mapped: 1350480 kB
    Slab: 461584 kB
    CommitLimit: 6851404 kB
    Committed_AS: 4959776 kB
    PageTables: 46668 kB
    VmallocTotal: 536870911 kB
    VmallocUsed: 2992 kB
    VmallocChunk: 536867847 kB
    HugePages_Total: 2000
    HugePages_Free: 128
    Hugepagesize: 2048 kB
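Scripts that watch these counters usually need one field at a time. A minimal sketch, with the hypothetical name `meminfo_value`, that looks up a key in the text of /proc/meminfo (already read into a string) and returns its number. It insists that the key starts a line and is followed by a colon, because a plain substring search for Cached would otherwise also hit SwapCached.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up "Key:" in the text of /proc/meminfo and return its value
 * (kB for most fields, a page count for HugePages_*), or -1 if absent. */
long meminfo_value(const char *text, const char *key)
{
    assert(text != NULL && key != NULL && key[0] != '\0');
    size_t len = strlen(key);
    const char *p = text;

    while ((p = strstr(p, key)) != NULL) {
        /* The key must start a line and be followed by ':' so that
         * "Cached" does not match inside "SwapCached". */
        int at_line_start = (p == text || p[-1] == '\n');
        if (at_line_start && p[len] == ':') {
            long value;
            return sscanf(p + len + 1, "%ld", &value) == 1 ? value : -1;
        }
        p += len;
    }
    return -1;
}
```

With the 64-bit output above, `meminfo_value(text, "HugePages_Free")` would give 128 and `meminfo_value(text, "Cached")` would give 2864760.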

34 MemTotal
- Total memory visible to the OS (physical RAM minus what the kernel reserves for itself)
- If it is well below what you’ve put in the machine, you probably have a bad SIMM/DIMM

35 MemFree
- Memory that is currently unoccupied and available for immediate use
- Not the maximum amount of memory that could be made available at the moment
- The minimum amount the kernel keeps free is controlled by (Linux RH4) /proc/sys/vm/min_free_kbytes

36 MemFree – example
    grep MemFree /proc/meminfo
    MemFree: 26568 kB
    echo 900000 > /proc/sys/vm/min_free_kbytes
    grep MemFree /proc/meminfo
    MemFree: 210056 kB

37 Buffers
- Cache of raw disk blocks
- Usually occupied by ext3 metadata
  - mostly ext3 block pointers (the block maps that locate file data)
  - not a cache of actual user data
- In older kernels, it was controllable

38 Cached
- File system cache
  - if direct IO is not used for the datafiles, it will hold cached copies of your datafiles
- Binary (for execution) memory
  - includes caching of the “oracle” binary
  - and of all the libraries
- Does not mean “occupied”: it can usually be released immediately
- Also holds the Oracle SGA when hugepages are not used
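The point that Cached is mostly releasable, except for the SGA when hugepages are off, suggests a rough estimate of how much memory could be freed quickly. The sketch below is an approximation under stated assumptions, not a kernel formula: it treats Buffers as reclaimable, and the caller has to supply the shared memory size (for example the sum of the bytes column of ipcs, divided by 1024), because these meminfo versions do not report it separately. Dirty pages have to be written before they can be dropped, so the result is optimistic.

```c
#include <assert.h>

/* Rough, optimistic estimate (kB) of memory that could be made available
 * quickly: MemFree + Buffers + Cached, minus the part of Cached that is
 * really shared memory (the SGA without hugepages), which cannot simply
 * be dropped. All inputs in kB. */
long rough_available_kb(long mem_free, long buffers, long cached,
                        long shm_in_cache)
{
    assert(mem_free >= 0 && buffers >= 0 && cached >= 0 && shm_in_cache >= 0);
    long reclaimable_cache = cached - shm_in_cache;
    if (reclaimable_cache < 0)      /* guard against inconsistent inputs */
        reclaimable_cache = 0;
    return mem_free + buffers + reclaimable_cache;
}
```

With the 32-bit meminfo values (MemFree 93448, Buffers 132036, Cached 3413324) and no shared memory in the cache, this gives 3638808 kB, far more than the 93448 kB that MemFree alone suggests.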

39 Cached – example part 1
    [root@ ~]# cat /proc/meminfo
    …
    MemFree: 8232512 kB
    Buffers: 9328 kB
    Cached: 28372 kB
    …
    du -smc indx01_*
    1714 indx01_01.dbf
    1761 indx01_02.dbf
    1722 indx01_03.dbf
    5197 total
    …
    cat indx01_* > /dev/null

40 Cached – example part 2
    [root@ ~]# vmstat 2
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 0 8093888 10808 163392 0 0 0 0 1012 17 0 0 100 0
    0 0 0 8093952 10808 163392 0 0 0 0 1012 16 0 0 100 0
    0 1 0 7956736 10948 300272 0 0 68602 0 1567 1126 0 2 76 22
    0 1 0 7808576 11092 448068 0 0 73992 80 1623 1210 0 2 75 23
    …
    0 1 0 2847616 16104 5397616 0 0 65792 0 1542 1076 0 2 75 23
    0 0 0 2766272 16180 5479180 0 0 40698 0 1341 675 0 1 85 14
    0 0 0 2766208 16192 5479168 0 0 0 114 1033 22 0 0 100 0
    cat /proc/meminfo
    …
    MemFree: 2766464 kB
    Buffers: 16192 kB
    Cached: 5479168 kB
    …

41 Cached – example #2 part 1
    cat indx01_* >newfile
    vmstat 2
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 0 2765312 17044 5479356 0 0 0 0 1012 17 0 0 100 0
    0 3 0 2405376 17428 5833612 0 0 16 36866 1324 144 1 18 76 6
    0 2 0 2143616 17688 6091532 0 0 4 111748 2000 213 0 16 50 34
    …
    0 1 0 16832 6784 8198556 0 0 8556 26684 1942 1267 0 2 74 24
    1 1 0 16832 6856 8198744 0 0 12518 20720 2130 1767 0 3 74 23
    …
    cat /proc/meminfo
    …
    MemFree: 16768 kB
    Buffers: 2192 kB
    Cached: 8196908 kB
    …
    Dirty: 277468 kB
    Writeback: 0 kB
    …

42 Cached – example #2 part 2
    cat /proc/meminfo
    …
    MemFree: 20672 kB
    Buffers: 3300 kB
    Cached: 8191900 kB
    …
    Dirty: 0 kB
    Writeback: 0 kB
    …
    rm newfile
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 0 23296 3380 8189480 0 0 0 28 1015 18 0 0 100 0
    0 1 0 3257472 3948 4996372 0 0 284 0 1084 160 0 14 78 8
    0 1 0 3255552 5828 4996572 0 0 940 0 1247 485 0 1 75 24
    0 1 0 3253696 7616 4996344 0 0 884 96 1237 470 0 2 75 23
    0 0 0 3253440 7988 4996492 0 0 186 0 1061 112 0 0 95 4
    0 0 0 3253440 7988 4996492 0 0 0 0 1012 14 0 0 100 0

43 Swap
- SwapTotal
- SwapFree
- SwapCached
  - written to swap, but still in memory
  - applies only to anonymous memory
  - the OS anticipates memory needs and pre-swaps inactive data, but keeps it in memory
- Actual swapping (memory that would need to be read back from disk) = SwapTotal - SwapFree - SwapCached
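The formula on this slide translates directly into code. A minimal sketch (the function name is hypothetical), taking the three values in kB as /proc/meminfo reports them:

```c
#include <assert.h>

/* Swap that exists only on disk and would have to be read back in:
 * SwapTotal - SwapFree - SwapCached, all in kB as in /proc/meminfo. */
long swap_on_disk_kb(long swap_total, long swap_free, long swap_cached)
{
    assert(swap_free >= 0 && swap_cached >= 0 && swap_total >= swap_free);
    long used = swap_total - swap_free;   /* swap slots in use */
    long on_disk = used - swap_cached;    /* minus pages still also in RAM */
    return on_disk > 0 ? on_disk : 0;
}
```

Applied to the 64-bit meminfo sample (SwapTotal 4816888, SwapFree 4192148, SwapCached 69256), about 610 MB of swap is in use, but 69256 kB of that is still in RAM, leaving 555484 kB that would actually need to be read from disk.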

44 Active/Inactive
- Active: recently used memory
  - includes all types of memory (cached, buffers, anonymous)
  - the OS will try to keep it in RAM
- Inactive: memory that will be reused first
  - “free” memory
- Can be used to gauge the “working set”

45 High/Low Total/Free
- 32-bit limitation; there is no high memory on 64 bit
- Some kernel structures cannot be allocated in “high memory”
- Used to be a problem in older kernels; newer kernels protect low memory

46 Dirty & Writeback
- Dirty: cache/buffer memory that must be written to disk
  - thresholds can be adjusted
- Writeback: memory actively being written to disk
  - can reach high values with async writes and a large queue

47 Committed_AS & Mapped
- Committed_AS
  - total memory requested on the system
  - not used, just requested
  - if every process in the system were to touch and use all the memory it has requested, this is how much would be used
- Mapped
  - memory used for in-memory mapped files
  - all anonymous memory
  - includes committed & touched memory

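48 Committed_AS - example
    cat grab.c
    #include <stdlib.h>
    #include <unistd.h>
    int main(void) { void *p = malloc(1073741824); sleep(60); free(p); return 0; }
    cat /proc/meminfo
    ...
    MemFree: 3230592 kB
    ...
    Committed_AS: 49972 kB
    ./grab &
    cat /proc/meminfo
    ...
    MemFree: 3230464 kB
    ...
    Committed_AS: 1098808 kB
grab requests 1 GB and sleeps without touching it: Committed_AS grows by about 1 GB while MemFree barely moves.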
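The memory becomes used (not just committed) only once its pages are touched. A sketch of a touching variant of grab.c, using hypothetical helper names `touch_pages` and `grab_and_touch`: it writes one byte per page, which forces the kernel to back every page with RAM. The `volatile` pointer is there so the compiler cannot drop the writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Write one byte in every page of buf so that the kernel has to back each
 * page with physical RAM. Returns the number of pages touched. */
size_t touch_pages(char *buf, size_t len, size_t page_size)
{
    assert(buf != NULL && page_size > 0);
    volatile char *p = buf;         /* keeps the writes from being optimized out */
    size_t pages = 0;
    for (size_t off = 0; off < len; off += page_size) {
        p[off] = 1;
        pages++;
    }
    return pages;
}

/* Hypothetical variant of grab.c: allocate size bytes and touch all of
 * them, so the memory is in use rather than just committed.
 * The caller frees the result. */
char *grab_and_touch(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096;                /* fallback if the page size is unknown */
    char *buf = malloc(size);
    if (buf != NULL)
        touch_pages(buf, size, (size_t)page);
    return buf;
}
```

Calling `grab_and_touch(1073741824)` in place of the bare `malloc` in grab.c should make MemFree drop by roughly 1 GB in addition to the Committed_AS increase.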

49 Slab
- Slab: “in-kernel data structures cache”
  - similar to Oracle’s “shared_pool”
  - designed to prevent memory fragmentation
  - detailed monitoring: /proc/slabinfo, slabtop
- Basically “system space”

50 slabtop – ordered by cache size
    Active / Total Objects (% used) : 88874 / 139343 (63.8%)
    Active / Total Slabs (% used) : 5839 / 5846 (99.9%)
    Active / Total Caches (% used) : 90 / 132 (68.2%)
    Active / Total Size (% used) : 17286.03K / 23311.27K (74.2%)
    Minimum / Average / Maximum Object : 0.01K / 0.17K / 128.00K

    OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
    32382 24900 76% 0.27K 2313 14 9252K radix_tree_node
    56925 40013 70% 0.05K 759 75 3036K buffer_head
    364 363 99% 4.00K 364 1 1456K size-4096
    2485 2471 99% 0.54K 355 7 1420K ext3_inode_cache
    2376 413 17% 0.50K 297 8 1188K size-512
    256 256 100% 3.00K 128 2 1024K biovec-(256)
    4576 4481 97% 0.15K 176 26 704K dentry_cache
    10248 4548 44% 0.06K 168 61 672K size-64
    4340 1215 27% 0.12K 140 31 560K size-128
    1980 316 15% 0.25K 132 15 528K size-256
    …

51 HugePages
- 2 MB pages, organized in a separate memory pool
- Locked in memory
- Available only to shared memory requests
- Pre-allocated via a kernel parameter

52 Shared memory mapping
[Figure: processes P1 and P2, each with a 32-bit addressing space (0 to 4 GB, with a reserved virtual region for the system (kernel)), map the same shared region of RAM]

53 Shared memory mapping (huge)
[Figure: P1 and P2 (32-bit addressing space, 0 to 4 GB) map shared memory from the HugePages pre-allocated memory pool, which is locked in RAM]

54 VLM – 32-bit workaround
- The 32-bit address space is 4 GB
- 32-bit systems with PAE (Intel)
  - up to 64 GB of RAM
- Memory filesystem
  - opens a file in /dev/shm for the buffer cache
  - the shared pool is still in shared memory
- Beware of small Oracle block sizes

55 VLM – using 3 GB+ on 32 bit
[Figure: P1 and P2 (32-bit addressing space, 0 to 4 GB) each map a window of the /dev/shm/ora_ file (ramfs) through an mmap region, backed by RAM]

56 USE_INDIRECT_BUFFERS
- Red Hat/SUSE
  - shmfs: needs a size
  - tmpfs: does not need a size
  - ramfs: does not need a size, and is locked in memory
- None of them can use HugePages
  - the shared pool can still use HugePages
- Double memory access due to the mapping

57 DirectIO
- Direct IO (O_DIRECT) bypasses the file system cache and accesses the files directly
- DB activity does not pollute the OS cache
- DB activity does not compete with PGA/PL/SQL memory

58 PageTables
- Memory for per-process page tables
  - B-tree-like structure; this number shows the space used by the leaf blocks
  - memory to manage memory
  - one entry (4 bytes, or 8 bytes with PAE or on 64 bit) per process, per 4 KB page of memory used
- In Oracle’s case, assuming a 2 GB SGA: 524’288 pages * 8 bytes = 4 MB per process
- 1000 sessions = 4 GB of memory, to manage 2 GB of SGA
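The arithmetic on this slide can be checked with a few lines of C. This sketch (hypothetical function name) counts only the leaf entries, as the slide does, and ignores the upper levels of the page-table tree; the entry size is a parameter because it is 4 bytes on 32-bit x86 without PAE and 8 bytes with PAE or on x86-64.

```c
#include <assert.h>

/* Leaf page-table memory one process needs to map `mapped_bytes`:
 * one entry of pte_size bytes per page of page_size bytes. */
unsigned long long page_table_bytes(unsigned long long mapped_bytes,
                                    unsigned long long page_size,
                                    unsigned pte_size)
{
    assert(page_size > 0 && pte_size > 0);
    unsigned long long pages = (mapped_bytes + page_size - 1) / page_size;
    return pages * pte_size;
}
```

For a 2 GB SGA with 4 KB pages and 8-byte entries this gives 4194304 bytes (4 MB) per process, so 1000 sessions need about 4 GB. Passing a 2 MB page size instead gives 8192 bytes per process, the 512-fold reduction that HugePages bring.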

59 Case studies
PageTables using a lot of RAM

60 PageTables – bad example
- Config:
  - 1.7 GB SGA (max on 32 bit without VLM)
  - 1400 MB in db_cache_size
  - table sized to fit exactly in cache
- Start 100 sessions that full scan the table (cached), in order to touch the memory and allocate the PTEs
  - sessions will wait via dbms_lock.allocate to be released
- Show PageTables usage before and after

61 PageTables – bad example cont.
Before starting the sessions (db is up):
    cat /proc/meminfo
    …
    MemFree: 1070472 kB
    …
    Committed_AS: 1881544 kB
    PageTables: 4932 kB
    …
After the sessions have finished touching the memory:
    cat /proc/meminfo
    …
    MemFree: 473496 kB
    …
    Committed_AS: 4708552 kB
    PageTables: 295068 kB

62 HugePages & Oracle
- Locks the SGA in memory
  - no part of the SGA will ever be swapped out, or even considered for swapping
- Reduces the number of PTE entries
  - assuming a 2 GB SGA: ~1’000 PTEs * 8 bytes = 8 KB per process
  - 1000 sessions = 8 MB of memory, a 512-fold reduction

63 HugePages & Performance
- Frees more memory for PGA or a larger db_cache
- Guarantees that the SGA will always be in memory
- Improves the TLB hit ratio
  - the TLB is a CPU-level cache of virtual-to-physical memory mappings; more hits improve performance
- 8% improvement in a memory-only TPC test
  - not counting the benefit of the extra memory made available

64 HugePages & 100 sessions
The test from a few slides before.
Before starting 100 sessions (db up):
    cat /proc/meminfo
    …
    Committed_AS: 332264 kB
    PageTables: 3056 kB
    …
After:
    …
    Committed_AS: 3124640 kB
    PageTables: 23100 kB
    …
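Dividing the PageTables growth by the number of sessions makes the two runs directly comparable. A trivial sketch with a hypothetical function name, using integer kB as meminfo reports it:

```c
#include <assert.h>

/* Average PageTables growth per session, in kB, from two /proc/meminfo
 * readings taken before and after starting `sessions` sessions. */
long pagetables_per_session_kb(long before_kb, long after_kb, long sessions)
{
    assert(sessions > 0 && after_kb >= before_kb);
    return (after_kb - before_kb) / sessions;
}
```

For the 4 KB-page run (4932 kB to 295068 kB) this gives 2901 kB per session; for the HugePages run (3056 kB to 23100 kB) it gives 200 kB per session, roughly a fourteenfold difference in this test.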

65 No HugePages
[Figure: memory areas without hugepages. System: Slab, PageTables. User: Buffers, Cached, Mapped, IPC memory (SGA), Anonymous (PGA, PL/SQL arrays)]

66 With HugePages
[Figure: the same memory areas with hugepages enabled. System: Slab, PageTables. User: Buffers, Cached, Mapped, IPC memory (SGA), Hugepages, Anonymous (PGA, PL/SQL arrays)]

67 HugePages on Red Hat: what you need to set up (RH4)
- /proc/sys/vm/nr_hugepages
- /proc/sys/vm/hugetlb_shm_group
- /etc/security/limits.conf
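Sizing nr_hugepages means converting the SGA's shared memory segments into a count of huge pages. A sketch under the assumption that each segment is rounded up to whole huge pages on its own (a hugepage-backed segment is sized in multiples of the huge page size); the function name is hypothetical, and in practice you may want some headroom above the result.

```c
#include <assert.h>
#include <stddef.h>

/* Number of huge pages needed to back a set of shared memory segments,
 * rounding each segment up to whole huge pages separately. */
unsigned long hugepages_needed(const unsigned long long *segment_bytes,
                               size_t n_segments, unsigned long hugepage_kb)
{
    assert(segment_bytes != NULL || n_segments == 0);
    assert(hugepage_kb > 0);
    unsigned long long hp_bytes = (unsigned long long)hugepage_kb * 1024;
    unsigned long total = 0;
    for (size_t i = 0; i < n_segments; i++)
        total += (unsigned long)((segment_bytes[i] + hp_bytes - 1) / hp_bytes);
    return total;
}
```

For the three segments in the ipcs output on slide 74 (2097152, 1342177280 and 278921216 bytes) and a 2048 kB Hugepagesize, this gives 1 + 640 + 133 = 774 huge pages.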

68 Case studies
Where is my free memory going?

69 Freshly booted
    vmstat 2
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 0 7725800 13036 497864 0 0 0 16 1013 25 0 0 100 0
    0 1 0 7663272 13144 556516 0 0 14806 70 1164 393 2 2 80 16
    0 1 0 7513128 13252 706168 0 0 37494 0 1318 631 2 3 75 20
    0 1 0 7310824 13408 908032 0 0 50502 64 1429 862 3 4 75 18
    ...
    0 1 0 5503208 14724 2709556 0 0 59144 16 1493 987 3 4 75 18
    1 0 0 5263080 14856 2948884 0 0 59838 128 1518 995 3 5 75 18
    0 0 0 5111272 14944 3106096 0 0 39344 6 1332 663 2 5 82 11
    0 0 0 5111272 14944 3106096 0 0 0 16 1013 26 0 0 100 0
    0 0 0 5111144 14960 3106080 0 0 0 30 1016 36 0 0 100 0
Reading ~1.2 GB of data, free memory drops twice as much:
- file system cache
- Oracle SGA being touched

70 Freshly booted – hugepages
    vmstat 2
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 0 3436280 14236 286324 0 0 0 56 1027 54 0 0 100 0
    0 1 0 3395192 14336 323924 0 0 18902 24 1193 447 2 1 81 16
    0 1 0 3303928 14480 415040 0 0 45572 48 1387 775 3 1 75 21
    …
    0 1 0 2566776 15560 1150020 0 0 49228 6 1416 828 3 1 75 20
    0 1 0 2452152 15720 1264260 0 0 57230 18 1492 977 3 1 75 20
Reading 1.2 GB of data, free memory drops by the same amount:
- file system cache consuming memory

71 Freshly booted – with directIO
    vmstat 2
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 0 3436280 14236 286324 0 0 0 56 1027 54 0 0 100 0
    0 1 0 3395192 14336 323924 0 0 18902 24 1193 447 2 1 81 16
    0 1 0 3303928 14480 415040 0 0 45572 48 1387 775 3 1 75 21
    …
    0 1 0 2566776 15560 1150020 0 0 49228 6 1416 828 3 1 75 20
    0 1 0 2452152 15720 1264260 0 0 57230 18 1492 977 3 1 75 20
1.2 GB of data, 1.2 GB drop in free memory: NO CHANGE

72 DIRECT_IO Bugs
- bug 3186847
  - filesystemio_options=directio is ignored on Linux
  - fixed in 9.2.0.6
- Note: 297521.1
  - bug 2448994 introduced: the O_DIRECT flag was not passed to the open() system call
  - fixed in 9.2.0.7
- Basically, you need 9.2.0.7

73 Shared memory monitoring
- How to see shared memory?
  - ipcs shows the “IPC” shared memory
- If you kill Oracle without freeing up the shared memory
  - ipcrm to remove it

74 ipcs
    ------ Shared Memory Segments --------
    key shmid owner perms bytes nattch status
    0x00000000 4915200 oracle 600 2097152 14
    0x00000000 4947969 oracle 600 1342177280 14
    0x7157be04 4980738 oracle 600 278921216 14
    ------ Semaphore Arrays --------
    key semid owner perms nsems
    0xb1adfd8c 622592 oracle 640 354
    ------ Message Queues --------
    key msqid owner perms used-bytes messages

75 To remove orphan segments
- Identify them via “sysresv”
  - or by the number of attached processes (nattch) from ipcs
  - or with pmap of an oracle pid
- Use ipcrm to remove them
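The nattch column that ipcs prints comes from the segment's `shmid_ds` structure, which a program can read with `shmctl(IPC_STAT)`. A minimal sketch with a hypothetical function name; a count of 0 on an Oracle-owned segment only hints that it may be orphaned (a live instance shows 14 in the ipcs output above), so confirm with sysresv before removing anything with `ipcrm -m <shmid>`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Number of processes attached to shared memory segment shmid (the
 * "nattch" column of ipcs), or -1 if the segment cannot be queried. */
long shm_attach_count(int shmid)
{
    struct shmid_ds ds;
    assert(shmid >= 0);
    if (shmctl(shmid, IPC_STAT, &ds) == -1)
        return -1;
    return (long)ds.shm_nattch;
}
```

Attaching to a segment with `shmat` raises the count by one and `shmdt` lowers it again; `shmctl(shmid, IPC_RMID, NULL)` is the system call behind `ipcrm -m`.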

76 The End
Thank you. Questions?
kutrovsky@pythian.com
Visit my blog at http://www.pythian.com/blogs/kutrovsky/
Christo Kutrovsky, The Pythian Group, 2007 April
http://www.pythian.com/

