
1 Linux I/O Tuning Anshul Gupta

2 Agenda
- Performance Gathering Tools
- I/O Elevators
- File System
- Swap and Caches

3 I/O Performance Gathering

4 Performance Tuning Process
The performance tuning process:
- Gathering
- Measuring
- Tuning

5 Performance Gathering
- Gathering is the most important step
- Various tools are available to collect performance-related data: iostat, sar, vmstat, top, strace, etc.
- Data can be stored as text, CSV, or in a database

6 Tools – I/O Utilization
- iostat
- sar -b
- vmstat
- top

7 iostat

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.73   2.45     2.72     1.02    0.00  93.09

Device:    tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda       6.15      120.77       80.33    528908    351792
sdb      15.30       92.05        4.04   1003495     25490
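A quick way to spot busy devices in output like this is to filter on the tps column with awk. A sketch over the sample figures above (the 10 tps threshold is an arbitrary illustration; in live use you would pipe iostat -d output through the same filter):

```shell
# Device lines from the iostat sample above, captured in a variable
# for illustration; live use would pipe real iostat -d output instead.
sample='sda 6.15 120.77 80.33 528908 351792
sdb 15.30 92.05 4.04 1003495 25490'

# Print devices whose tps (field 2) exceeds 10
busy=$(printf '%s\n' "$sample" | awk '$2 > 10 {print $1}')
echo "$busy"
```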

8 sar -b

12:00:01 AM    tps   rtps   wtps  bread/s  bwrtn/s
12:05:01 AM   0.11   0.00   0.11     0.00     1.55
12:15:01 AM   0.10   0.00   0.10     0.01     1.21
12:25:01 AM   0.14   0.00   0.14     0.00     2.04
12:35:01 AM   0.09   0.00   0.09     0.00     1.09
12:45:01 AM   0.10   0.00   0.10     0.00     1.32
12:55:01 AM   0.10   0.00   0.10     0.00     1.57
01:05:01 AM   0.10   0.00   0.10     0.00     1.21
01:15:01 AM   0.10   0.00   0.10     0.00     1.36
01:25:01 AM   0.13   0.00   0.13     0.00     1.60
01:35:01 AM   0.10   0.00   0.10     0.00     1.28

9 vmstat

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd  free  buff   cache    si  so    bi    bo   in    cs  us sy id wa
 1  0  24876  6572 54732  242940     1  16  1046  1632  300  1199   6 17 21 57
 0  2  25496  6344 54944  242820   128   0  2036   132  326  1641   6 19  0 75
 0  2  25496  5924 55316  242932     0   0  1312     0  334  1153   6 19  0 75
 1  1  25456  6240 55148  242608   192   0  1282     0  330  1113   6 17  0 77
 1  1  25456  6016 55308  242584     0   0  1089     0  341  1051   4 19  0 77
 1  1  25456  5864 55596  242432     0   0  1118     0  333  1391   4 16  0 80
 0  2  25456  6040 55880  241808     0   0  1330     4  311  1110   4 20  0 76
 0  2  25456  6312 56636  240916     0   0  1744     0  315  1284   4 16  0 80
 0  2  25456  6056 57116  240776     0   0  1129     0  313  1062   4 12  0 84
 0  2  25456  6452 56952  240464     0   0  1229     0  342  1075   6 10  0 84
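The wa column (time the CPU spent waiting on I/O) is the key figure in this output, and averaging it over the sampled interval is easy to script. A sketch using the ten wa values from the slide:

```shell
# Average the wa (I/O wait) figures from the vmstat sample above.
wa_avg=$(printf '%s\n' 57 75 75 77 77 80 76 80 84 84 |
         awk '{ s += $1; n++ } END { printf "%.1f", s / n }')
echo "average iowait: $wa_avg%"
```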

10 top top - 09:53:34 up 1:19, 3 users, load average: 0.00, 0.02, 0.00 tasks: 84 total, 1 running, 83 sleeping, 0 stopped, 0 zombie cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.3%hi, 0.0%si, 0.0%st mem: 515880k total, 466100k used, 49780k free, 60548k buffers swap: 208804k total, 0k used, 208804k free, 296452k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1 root 16 0 1628 540 448 S 0.0 0.1 0:00.82 init 2 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0 3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0 4 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 5 root 10 -5 0 0 0 S 0.0 0.0 0:00.01 events/0 6 root 10 -5 0 0 0 S 0.0 0.0 0:00.02 khelper 7 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kthread 9 root 10 -5 0 0 0 S 0.0 0.0 0:00.05 kblockd/0 10

11 I/O Elevators

12 I/O Subsystem
- The slowest subsystem of all, and the most tunable
- Determining the workload is very important
- Elevators – I/O schedulers

13 Elevators
- Schedule disk I/O requests
- Read and write patterns are very important
- An incorrectly configured elevator can do more harm than good

14 2.4 Linux I/O Scheduler
- The elevator was first introduced in kernel 2.4 as elevator_linus
- Each request carries an elevator_sequence field
- The kernel decrements elevator_sequence each time a new I/O request is scheduled
- Requests whose elevator_sequence reaches 0 are serviced
- Tuned with elvtune

15 2.6 Linux I/O Schedulers
- Concept of selectable I/O elevators:
  - Completely Fair Queuing (cfq)
  - Deadline (deadline)
  - Anticipatory (as)
  - No-op (noop)
- Select at boot by passing "elevator=deadline|as|cfq|noop"
- Tunables live in /sys/block/<device>/queue/iosched
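On a 2.6 kernel the active elevator can also be inspected and switched per device at runtime through sysfs. A sketch, assuming a disk named sda (writing requires root):

```shell
# Show the available schedulers; the active one appears in brackets,
# e.g. "noop anticipatory deadline [cfq]"
cat /sys/block/sda/queue/scheduler

# Switch this disk to the deadline elevator at runtime
echo deadline > /sys/block/sda/queue/scheduler
```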

16 CFQ I/O Scheduler
- Completely Fair Queuing provides good performance across multiple workloads
- Algorithm:
  - Assigns incoming requests to specific queues based on the issuing process
  - Within each queue, requests are sorted by sector
  - Services the queues round-robin (4 requests per queue)
- Provides fairness at the per-process level

17 CFQ Tunables
- quantum – total number of requests placed on the dispatch queue per cycle
- queued – maximum number of requests allowed per request queue

18 Deadline I/O Scheduler
- Written to prevent the starvation elevator_linus could cause
- Each request gets an expiry time (in ms)
- Algorithm:
  - Maintains a request queue sorted by physical location on disk
  - Also inserts each request into a second, FIFO queue by type (read/write)
  - Pulls requests from the sorted queue until the request at the head of the read or write FIFO expires

19 Deadline Tunables
- read_expire – milliseconds before a read request expires
- write_expire – milliseconds before a write request expires
- fifo_batch – number of requests to process per batch
- writes_starved – how strongly reads are preferred over writes
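These tunables are plain files under the scheduler's iosched directory, so they can be read and set with ordinary shell tools. A sketch, assuming device sda and root privileges (the 250 ms figure is only an example value):

```shell
# List the deadline scheduler's current tunables for sda
grep . /sys/block/sda/queue/iosched/*

# Tighten the read deadline to 250 ms (example value)
echo 250 > /sys/block/sda/queue/iosched/read_expire
```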

20 Anticipatory I/O Scheduler
- Minimizes read latency at the expense of global throughput
- Handles a read request within the normal expiry window, then waits briefly for another nearby read
- Suited to workloads with many dependent reads

21 Anticipatory Tunables
- read_expire – milliseconds before reads expire
- read_batch_expire
- write_expire – milliseconds before writes expire
- write_batch_expire
- antic_expire – how long to wait for a further I/O

22 NOOP I/O Scheduler
- I/O requests are placed directly in the queue
- Scheduling is left to the hardware
- No tunables

23 Filesystem

24 Filesystem
- VFS subsystem layer
- Filesystem tunables
- Tuning for RAID
- Fragmentation
- Journaling

25 Virtual File System (VFS)

26 Filesystem Tunables
- At filesystem creation time:
  - Block size
  - Number of inodes
- At creation time or later:
  - Reserved space
  - Journaling options

27 RAID Tuning
- Chunk size
  - Based on workload, number of disks, and RAID level
  - A multiple of 4 KB
- Stride size
  - The number of filesystem blocks per chunk
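The stride follows directly from the chunk size and the filesystem block size. A sketch of the arithmetic with example figures (the 64 KiB chunk, 4 KiB block size, and /dev/md0 device name are all assumptions for illustration):

```shell
# ext2/3 stride = RAID chunk size / filesystem block size,
# i.e. the number of filesystem blocks per chunk
chunk_kb=64   # example RAID chunk size in KiB
block_kb=4    # example filesystem block size in KiB
stride=$((chunk_kb / block_kb))

# Print (rather than run) the resulting mkfs invocation
echo "mkfs.ext3 -b $((block_kb * 1024)) -E stride=$stride /dev/md0"
```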

28 Fragmentation
- Increases the average time for sequential reads
- Larger filesystem block size: less chance of fragmentation, but wasted space
- Smaller filesystem block size: conserves space, but fragmentation can still occur (e.g. when files are deleted)
- Block pre-allocation helps

29 Journaling Operation

30 Journaling Modes
Three mount-time options:
- 'journal' – both filesystem data and metadata are journaled; slowest
- 'ordered' – filesystem data is written to disk before metadata is committed to the journal
- 'writeback' – filesystem data may not be written to disk before metadata is committed to the journal; fastest
mount -o data=<mode> /filesystem

31 Journaling
Journal placement:
- Internal – located on the same partition as the filesystem
- External – located on a separate partition; faster
  - Created with mkfs.ext3 -O journal_dev
  - The journal's block size should match the filesystem's

32 Filesystem Mount Options
- Disable access-time updates:
  - mount -o noatime
  - mount -o relatime (relative access time)
- Lengthen the interval between journal commits:
  - mount -o commit=<seconds>   # default = 5 sec
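These options combine on a single mount invocation. A sketch with example device and mountpoint names:

```shell
# Disable access-time updates and stretch journal commits to 30 s
# (/dev/sda3 and /srv/data are example names)
mount -o noatime,commit=30 /dev/sda3 /srv/data
```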

33 Swap and Caches

34 Swap Areas
- Expand dynamic memory
- Can be a disk partition or a file
- Up to 32 swap areas
- Maximum size of a single swap area is 64 GB

35 Swap Performance Considerations
- Must be initialized with mkswap
- Prefer disk-partition swap over file-based swap
- Use a lower-numbered partition
- Several distributed swap areas perform better than one large area
- Assign higher priority to faster disks
- Swap areas with the same priority are used round-robin
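Priorities are assigned when a swap area is activated. A sketch, assuming /dev/sda5 sits on the faster disk (device names and priority values are examples):

```shell
# Higher-priority swap is used first; equal priorities are used round-robin
swapon -p 10 /dev/sda5   # faster disk: higher priority
swapon -p 5  /dev/sdb5   # slower disk: lower priority
```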

36 Swap Tunables
- vm.kswapd (2.4):
  - tries_base
  - tries_min
  - swap_cluster
- vm.swappiness (2.6): 0-100
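On 2.6, swappiness is set through sysctl. A sketch (the value 10 is an example; lower values make the kernel keep application pages in preference to growing the page cache):

```shell
# Check the current value, then lower it (persist it via /etc/sysctl.conf)
cat /proc/sys/vm/swappiness
sysctl -w vm.swappiness=10
```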

37 Page and Buffer Cache
- Write frequency is handled by the page/buffer flushing threads:
  - bdflush, kupdated (2.4)
  - pdflush (2.6)
- Read-ahead window sizes:
  - vm.min-readahead, vm.max-readahead (2.4)
  - /sys/block/<device>/queue/read_ahead_kb (2.6)
- Direct I/O – faster for self-caching applications
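The 2.6 read-ahead window is likewise a plain sysfs file, in KiB. A sketch, assuming device sda (512 KiB is an example value for a streaming-read workload):

```shell
# Inspect, then enlarge, the per-device read-ahead window
cat /sys/block/sda/queue/read_ahead_kb
echo 512 > /sys/block/sda/queue/read_ahead_kb
```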

38 38 Q & A

