Ftrace Tutorial Steven Rostedt


1 Ftrace Tutorial Steven Rostedt (rostedt@goodmis.org) srostedt@redhat.com

2 Introduction ● Kernel internal tracer ● Derived from -rt patch Latency Tracer ● Plugin tracers – ftrace : function tracer – irqsoff : interrupt disabled latency – wakeup : latency of highest priority task to wake up – sched_switch: task context switches – (more) ● Ring buffer ● Saved traces (snap shots) – used to save maximum latency traces

3 The Debug File System ● /sys/kernel/debug ● I prefer: – mkdir /debug – mount -t debugfs nodev /debug ● /etc/fstab – debugfs /sys/kernel/debug debugfs defaults 0 0 – debugfs /debug debugfs defaults 0 0
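Since the mount point is a matter of preference, a script may want to discover it instead of hard-coding `/debug`. A minimal sketch, assuming a mounts table in the usual `/proc/mounts` layout (device, mount point, filesystem type, ...); the function name `debugfs_mountpoint` is made up for this example:

```shell
#!/bin/sh
# Print the first mount point whose filesystem type is debugfs.
# $1: mounts table to read (assumed /proc/mounts layout); defaults to
#     /proc/mounts. Returns non-zero if no debugfs mount is listed.
debugfs_mountpoint() {
    mounts_file="${1:-/proc/mounts}"
    # Field 2 is the mount point, field 3 the filesystem type.
    awk '$3 == "debugfs" { print $2; found = 1; exit }
         END { exit !found }' "$mounts_file"
}
```

Typical use would be something like `dir=$(debugfs_mountpoint) && ls "$dir/tracing"`.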

4 /debug/tracing ● available_tracers ● current_tracer ● tracing_enabled ● trace ● latency_trace ● trace_pipe ● iter_ctrl ● tracing_max_latency ● tracing_cpumask ● trace_entries

5 Selecting a tracer
    # cat /debug/tracing/available_tracers
    wakeup preemptirqsoff preemptoff irqsoff ftrace sysprof sched_switch none
    # echo wakeup > /debug/tracing/current_tracer
    # cat /debug/tracing/current_tracer
    wakeup
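The two files can be combined so that a tracer is only selected when the kernel actually offers it. A minimal sketch using the file names from the slide; the `TRACING_DIR` override and the function name `select_tracer` are assumptions added for illustration:

```shell
#!/bin/sh
# Directory holding the tracing control files. The slides use /debug/tracing;
# the TRACING_DIR override is an assumption of this sketch.
TRACING_DIR="${TRACING_DIR:-/debug/tracing}"

# Switch to tracer $1, but only if available_tracers lists it.
select_tracer() {
    name="$1"
    # available_tracers holds space-separated tracer names.
    for t in $(cat "$TRACING_DIR/available_tracers"); do
        if [ "$t" = "$name" ]; then
            echo "$name" > "$TRACING_DIR/current_tracer"
            return 0
        fi
    done
    echo "tracer '$name' not available" >&2
    return 1
}
```

For example, `select_tracer wakeup` would write `wakeup` to `current_tracer`, while a misspelled name leaves the current tracer untouched.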

6 The “none” tracer ● No tracer selected ● “none” is special – it is not a tracer ● echo none > /debug/tracing/current_tracer

7 Starting a trace ● do not rely on tracing being enabled ● echo 1 > /debug/tracing/tracing_enabled – note, make sure to have a space between the '1' and the '>'. This has burnt many a kernel programmer. – The “enabled” stays across tracers. ● echo 1 > /debug/tracing/tracing_enabled ● echo ftrace > /debug/tracing/current_tracer ● echo irqsoff > /debug/tracing/current_tracer
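The space matters because the shell treats a digit immediately followed by `>` as a file-descriptor number: `echo 1>file` redirects stdout of an argument-less `echo`, so the file receives an empty line instead of `1`. A small demonstration on a scratch file (not a real tracing file); the variable names are made up for this sketch:

```shell
#!/bin/sh
# Demonstrate the pitfall on a scratch file instead of a real tracing file.
scratch=$(mktemp)

echo 1 > "$scratch"          # argument "1", stdout redirected: file holds "1"
with_space=$(cat "$scratch")

echo 1> "$scratch"           # "1>" redirects fd 1; echo gets no argument
without_space=$(cat "$scratch")

echo "with space:    '$with_space'"
echo "without space: '$without_space'"
rm -f "$scratch"
```

The second write leaves only a newline in the file, which is why tracing silently stays in its previous state.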

8 Stopping a trace
● echo 0 > /debug/tracing/tracing_enabled
– do not forget that space!
● Or in a program:
    int trace_fd;
    [...]
    int main(int argc, char *argv[])
    {
        [...]
        trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY);
        [...]
        if (condition_hit()) {
            write(trace_fd, "0", 1);
        }
        [...]
    }

9 Reading the Output ● latency_trace ● trace ● trace_pipe

10 Latency Trace Output
    # tracer: irqsoff
    #
    irqsoff latency trace v1.1.5 on 2.6.26-tip
    --------------------------------------------------------------------
     latency: 971 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
        -----------------
        | task: swapper-0 (uid:0 nice:20 policy:0 rt_prio:0)
        -----------------
     => started at: acpi_os_acquire_lock
     => ended at:   cpuidle_idle_call
    #                _------=> CPU#
    #               / _-----=> irqs-off
    #              | / _----=> need-resched
    #              || / _---=> hardirq/softirq
    #              ||| / _--=> preempt-depth
    #              |||| /
    #              |||||     delay
    #  cmd     pid ||||| time  |   caller
    #     \   /    |||||   \   |   /
      <idle>-0     1d..1    1us!: _spin_lock_irqsave (acpi_os_acquire_lock)
      <idle>-0     1d..1  971us : acpi_idle_enter_bm (cpuidle_idle_call)
      <idle>-0     1d..2  972us : trace_hardirqs_on (cpuidle_idle_call)

11 Various outputs
    <idle>-0 1d.h2 1335164us : tick_sched_timer (__run_hrtimer)
    <idle>-0 0.Ns2 1386686us+: _spin_lock_irq (run_timer_softirq)
    <idle>-0 1d.H4 1388217us : ktime_get_ts (ktime_get)
    bash-3498 1.... 1576794us : rw_verify_area (vfs_write)
    bash-3498 1d..4 120768us+: 0:140:R + 3096:120:S gnome-terminal
    bash-3498 1d..3 120796us!: 3498:120:S ==> 0:140:R

12 trace output
    <idle>-0 [01] 1977.853298: read_hpet <-getnstimeofday
    <idle>-0 [01] 1977.853300: set_normalized_timespec <-ktime_get_ts
    <idle>-0 [01] 1977.853301: _spin_lock <-hrtimer_interrupt
    bash-3498 [01] 2057.488856: 0:140:R + 3096:120:S
    gnome-terminal-3096 [00] 2057.488882: 3096:120:S ==> 0:140:R

13 iter_ctrl ● print-parent ● sym-offset ● sym-addr ● verbose ● raw ● hex ● binary ● block ● stacktrace ● sched-tree

14 Using iter_ctrl
    <idle>-0 [01] 2975.463936: tick_program_event <-tick_broadcast_oneshot_control
    # echo noprint-parent > /debug/tracing/iter_ctrl
    <idle>-0 [01] 2975.463936: tick_program_event
    # echo sym-offset > /debug/tracing/iter_ctrl
    <idle>-0 [01] 2975.463936: tick_program_event+0xb/0x6a
    # echo sym-addr > /debug/tracing/iter_ctrl
    <idle>-0 [01] 2975.463936: tick_program_event+0xb/0x6a
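As the `noprint-parent` example shows, an option is switched off by writing its name with a `no` prefix and switched on by writing the bare name. That convention can be wrapped in a small helper. This is only a sketch: the function name `set_iter_option` and the `TRACING_DIR` override are assumptions, not part of ftrace.

```shell
#!/bin/sh
# Directory holding the tracing control files (override is an assumption).
TRACING_DIR="${TRACING_DIR:-/debug/tracing}"

# set_iter_option NAME on|off
# Writes NAME to iter_ctrl to enable the option, or noNAME to disable it.
set_iter_option() {
    option="$1"
    case "$2" in
        on)  echo "$option"   > "$TRACING_DIR/iter_ctrl" ;;
        off) echo "no$option" > "$TRACING_DIR/iter_ctrl" ;;
        *)   echo "usage: set_iter_option NAME on|off" >&2; return 2 ;;
    esac
}
```

For example, `set_iter_option print-parent off` performs the same write as the second command on the slide.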

15 The tracers ● sched_switch ● ftrace ● wakeup ● irqsoff ● preemptoff ● preemptirqsoff

16 Available Tracers?
    # cat /debug/tracing/available_tracers
    wakeup preemptirqsoff preemptoff irqsoff ftrace sysprof sched_switch none

17 sched_switch
● Traces task wakeups
● Traces task context switches
    bash-3498 [01] 5459.824565: 0:140:R + 7971:120:R
    <idle>-0 [00] 5459.824836: 0:140:R ==> 7971:120:R
    bash-3498 [01] 5459.824984: 3498:120:S ==> 0:140:R
    <idle>-0 [01] 5459.825342: 0:140:R ==> 7971:120:R
    ls-7971 [00] 5459.825380: 7971:120:R + 3: 0:S
    ls-7971 [00] 5459.825384: 7971:120:R ==> 3: 0:R
    migration/0-3 [00] 5459.825401: 3: 0:S ==> 0:140:R
    ls-7971 [01] 5459.825565: 7971:120:R + 598:115:S
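In this output a wakeup is marked with ` + ` and a context switch with ` ==> `, which makes the trace easy to summarize with standard tools. A minimal sketch, assuming the line format shown on the slide; the function name `count_sched_events` is made up for this example:

```shell
#!/bin/sh
# Count wakeups (" + ") and context switches (" ==> ") in sched_switch
# output read from stdin. Matching on the separators avoids depending on
# field positions, which shift for entries such as "3: 0:S".
count_sched_events() {
    awk '/ ==> / { switches++ }
         / [+] / { wakeups++ }
         END { printf "wakeups=%d switches=%d\n", wakeups, switches }'
}
```

It could be fed directly from the trace file, for example `count_sched_events < /debug/tracing/trace`.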

18 stacktrace
● iter_ctrl that affects the tracing itself
    bash-3498 [01] 6216.772637: 0:140:R + 8495:120:R
    bash-3498 [01] 6216.772639: do_fork <= sys_clone <= ptregscall_common <= <= 0 <= 0 <= 0 <= 0
    <idle>-0 [00] 6216.773108: 0:140:R ==> 8495:120:R
    <idle>-0 [00] 6216.773109: schedule <= cpu_idle <= rest_init <= <= 0 <= 0 <= 0 <= 0
    bash-3498 [01] 6216.773234: 3498:120:S ==> 0:140:R
    bash-3498 [01] 6216.773235: schedule <= do_wait <= sys_wait4 <= tracesys <= <= 0 <= 0 <= 0
    ls-8495 [00] 6216.773719: 8495:120:R + 3: 0:S
    ls-8495 [00] 6216.773720: wake_up_process <= sched_exec <= do_execve <= sys_execve <= stub_execve <= <= 0 <= 0
    ls-8495 [00] 6216.773889: 8495:120:R ==> 3: 0:R

19 ftrace - function tracer
● Traces at every non-inline function
● Other functions not traced
– annotated with “notrace”
– Makefile with “CFLAGS_REMOVE_... = -pg”
● Must have /proc/sys/kernel/ftrace_enabled=1
● Appears in most other tracers
● Very verbose
    init-1 [00] 6710.079562: _spin_lock <-tick_sched_timer
    init-1 [00] 6710.079563: do_timer <-tick_sched_timer
    bash-3498 [01] 6710.079810: mutex_unlock <-tracing_ctrl_write
    bash-3498 [01] 6710.079812: dnotify_parent <-vfs_write
    bash-3498 [01] 6710.079813: _spin_lock <-dnotify_parent
    bash-3498 [01] 6710.079814: _spin_unlock <-dnotify_parent
    bash-3498 [01] 6710.079815: inotify_dentry_parent_queue_event <-vfs_write
    bash-3498 [01] 6710.079815: inotify_inode_queue_event <-vfs_write
    bash-3498 [01] 6710.079817: syscall_trace_leave <-int_very_careful

20 Latency Tracers ● Stores the last maximum latency trace ● wakeup : scheduling latency of RT tasks ● irqsoff : interrupts off ● preemptoff : preemption off ● preemptirqsoff: interrupts and/or preemption off ● tracing_max_latency

21 wakeup - sched latency
● Only traces RT tasks
– use LatencyTop for non-RT tasks
● Records and traces the maximum latency an RT task took from wake up to schedule
● Remember to reset tracing_max_latency
    # echo wakeup > /debug/tracing/current_tracer
    # echo 0 > /debug/tracing/tracing_max_latency
    # echo 1 > /debug/tracing/tracing_enabled
    # chrt -f 10 usleep 10
    # echo 0 > /debug/tracing/tracing_enabled
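The five commands form a fixed sequence around the workload, so they can be wrapped in one function that never forgets the reset. A minimal sketch: the function name `measure_wakeup_latency` and the `TRACING_DIR` override are assumptions, and reading `tracing_max_latency` back afterwards is assumed to return the recorded maximum.

```shell
#!/bin/sh
# Directory holding the tracing control files (override is an assumption).
TRACING_DIR="${TRACING_DIR:-/debug/tracing}"

# measure_wakeup_latency COMMAND [ARGS...]
# Runs COMMAND under the wakeup tracer, then prints tracing_max_latency.
measure_wakeup_latency() {
    echo wakeup > "$TRACING_DIR/current_tracer"
    echo 0 > "$TRACING_DIR/tracing_max_latency"   # reset the previous maximum
    echo 1 > "$TRACING_DIR/tracing_enabled"
    "$@"                                          # the workload to measure
    status=$?
    echo 0 > "$TRACING_DIR/tracing_enabled"
    cat "$TRACING_DIR/tracing_max_latency"
    return "$status"                              # pass on the workload's status
}
```

The slide's example would then read `measure_wakeup_latency chrt -f 10 usleep 10`.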

22 Wakeup without function tracing
    # tracer: wakeup
    #
    wakeup latency trace v1.1.5 on 2.6.26-tip
    --------------------------------------------------------------------
     latency: 9 us, #2/2, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
        -----------------
        | task: migration/1-7663 (uid:0 nice:-5 policy:1 rt_prio:99)
        -----------------
    #                _------=> CPU#
    #               / _-----=> irqs-off
    #              | / _----=> need-resched
    #              || / _---=> hardirq/softirq
    #              ||| / _--=> preempt-depth
    #              |||| /
    #              |||||     delay
    #  cmd     pid ||||| time  |   caller
    #     \   /    |||||   \   |   /
      usleep-10237 1d..2    2us+: try_to_wake_up (wake_up_process)
      usleep-10237 1d..3    9us : schedule (preempt_schedule)

23 With function tracing
    # tracer: wakeup
    #
    wakeup latency trace v1.1.5 on 2.6.26-tip
    --------------------------------------------------------------------
     latency: 19 us, #18/18, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
        -----------------
        | task: usleep-10133 (uid:0 nice:0 policy:1 rt_prio:10)
        -----------------
    #                _------=> CPU#
    #               / _-----=> irqs-off
    #              | / _----=> need-resched
    #              || / _---=> hardirq/softirq
    #              ||| / _--=> preempt-depth
    #              |||| /
    #              |||||     delay
    #  cmd     pid ||||| time  |   caller
    #     \   /    |||||   \   |   /
      <idle>-0     0d.h4    1us : try_to_wake_up (wake_up_process)
      <idle>-0     0dNh4    2us : _spin_unlock_irqrestore (try_to_wake_up)
      <idle>-0     0dNh3    3us : _spin_lock (__run_hrtimer)
      <idle>-0     0dNh4    4us : _spin_unlock (hrtimer_interrupt)
      <idle>-0     0dNh3    5us : tick_program_event (hrtimer_interrupt)
    [...]
      <idle>-0     0.N.2   14us : _spin_lock_irqsave (hrtick_set)
      <idle>-0     0dN.3   15us+: _spin_unlock_irqrestore (hrtick_set)
      <idle>-0     0dN.2   16us : _spin_lock (schedule)
      <idle>-0     0d..3   18us : marker_probe_cb (schedule)
      <idle>-0     0d..3   19us : schedule (cpu_idle)

24 irqsoff

25 preemptoff

26 preemptirqsoff

27 trace_entries ● Not enough data recorded ● Too much data recorded ● Run-time configurable ● Must be done with “none” tracer or it will give an -EBUSY ● Number is number of entries, but the buffers are allocated via pages. – If more entries can fit on a page that was allocated to handle requested entries, the remaining page will be filled with entries
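Because the resize only succeeds while the “none” tracer is selected, a helper can park the current tracer, resize, and restore it. A minimal sketch, assuming the new size is set by writing the entry count to `trace_entries`; the function name `resize_trace_entries` and the `TRACING_DIR` override are made up for illustration:

```shell
#!/bin/sh
# Directory holding the tracing control files (override is an assumption).
TRACING_DIR="${TRACING_DIR:-/debug/tracing}"

# resize_trace_entries COUNT
# Temporarily selects the "none" tracer, writes COUNT to trace_entries,
# then restores whichever tracer was active before.
resize_trace_entries() {
    entries="$1"
    saved=$(cat "$TRACING_DIR/current_tracer")
    echo none > "$TRACING_DIR/current_tracer"
    echo "$entries" > "$TRACING_DIR/trace_entries"
    status=$?                       # non-zero if the write was rejected
    echo "$saved" > "$TRACING_DIR/current_tracer"
    return "$status"
}
```

Since buffers are allocated in pages, the value read back from `trace_entries` afterwards may be larger than the count requested.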

28 Dynamic Ftrace (the fun begins!) ● Produces non-measurable overhead ● Requires kernel thread “ftraced” to check for more updates ● Calls kstop_machine to execute text modification – Not safe to modify code text in SMP environment ● /debug/tracing/ftraced_enabled

29 How it works?
● With the gcc profiler switch “-pg”
● Every non-inline function calls “mcount”
    00001adb :
        1adb: 55                    push %ebp
        1adc: 89 e5                 mov %esp,%ebp
        1ade: 57                    push %edi
        1adf: 56                    push %esi
        1ae0: 53                    push %ebx
        1ae1: 83 ec 1c              sub $0x1c,%esp
        1ae4: e8 fc ff ff ff        call 1ae5
                    1ae5: R_386_PC32 mcount
        1ae9: 89 c3                 mov %eax,%ebx
        1aeb: 89 c7                 mov %eax,%edi
        1aed: 81 e3 00 00 00 02     and $0x2000000,%ebx
        1af3: 89 ce                 mov %ecx,%esi

30 Non dynamic i386 mcount
    ENTRY(mcount)
        cmpl $ftrace_stub, ftrace_trace_function
        jnz trace
    .globl ftrace_stub
    ftrace_stub:
        ret
        /* taken from glibc */
    trace:
        pushl %eax
        pushl %ecx
        pushl %edx
        movl 0xc(%esp), %eax
        movl 0x4(%ebp), %edx
        subl $MCOUNT_INSN_SIZE, %eax
        call *ftrace_trace_function
        popl %edx
        popl %ecx
        popl %eax
        jmp ftrace_stub
    END(mcount)

31 Dynamic i386 mcount
    ENTRY(mcount)
        pushl %eax
        pushl %ecx
        pushl %edx
        movl 0xc(%esp), %eax
        subl $MCOUNT_INSN_SIZE, %eax
    .globl mcount_call
    mcount_call:
        call ftrace_stub
        popl %edx
        popl %ecx
        popl %eax
        ret
    END(mcount)

32 Call ftrace_record_ip
    ENTRY(mcount)
        pushl %eax
        pushl %ecx
        pushl %edx
        movl 0xc(%esp), %eax
        subl $MCOUNT_INSN_SIZE, %eax
    .globl mcount_call
    mcount_call:
        call ftrace_record_ip
        popl %edx
        popl %ecx
        popl %eax
        ret
    END(mcount)

33 ftrace_record_ip

34 ftraced

35 nop
    00001adb :
        1adb: 55                    push %ebp
        1adc: 89 e5                 mov %esp,%ebp
        1ade: 57                    push %edi
        1adf: 56                    push %esi
        1ae0: 53                    push %ebx
        1ae1: 83 ec 1c              sub $0x1c,%esp
        1ae4: 90 8d 74 26 00        nop
        1ae9: 89 c3                 mov %eax,%ebx
        1aeb: 89 c7                 mov %eax,%edi
        1aed: 81 e3 00 00 00 02     and $0x2000000,%ebx
        1af3: 89 ce                 mov %ecx,%esi

36 Starting of ftrace
    00001adb :
        1adb: 55                    push %ebp
        1adc: 89 e5                 mov %esp,%ebp
        1ade: 57                    push %edi
        1adf: 56                    push %esi
        1ae0: 53                    push %ebx
        1ae1: 83 ec 1c              sub $0x1c,%esp
        1ae4: e8 fc ff ff ff        call 1ae5
                    1ae5: R_386_PC32 ftrace_caller
        1ae9: 89 c3                 mov %eax,%ebx
        1aeb: 89 c7                 mov %eax,%edi
        1aed: 81 e3 00 00 00 02     and $0x2000000,%ebx
        1af3: 89 ce                 mov %ecx,%esi

37 ftrace_caller
    ENTRY(ftrace_caller)
        pushl %eax
        pushl %ecx
        pushl %edx
        movl 0xc(%esp), %eax
        movl 0x4(%ebp), %edx
        subl $MCOUNT_INSN_SIZE, %eax
    .globl ftrace_call
    ftrace_call:
        call ftrace_stub
        popl %edx
        popl %ecx
        popl %eax
    .globl ftrace_stub
    ftrace_stub:
        ret
    END(ftrace_caller)

38 Registering an ftrace caller
    ENTRY(ftrace_caller)
        pushl %eax
        pushl %ecx
        pushl %edx
        movl 0xc(%esp), %eax
        movl 0x4(%ebp), %edx
        subl $MCOUNT_INSN_SIZE, %eax
    .globl ftrace_call
    ftrace_call:
        call function_trace_call
        popl %edx
        popl %ecx
        popl %eax
    .globl ftrace_stub
    ftrace_stub:
        ret
    END(ftrace_caller)

39 Selective function tracer ● tracing is dynamically enabled ● have a list of functions that need to be traced ● Why not filter which functions we trace?

40 Picking what functions to trace ● /debug/tracing/available_filter_functions ● /debug/tracing/set_ftrace_filter ● /debug/tracing/set_ftrace_notrace

41 available_filter_functions
    $ cat /debug/tracing/available_filter_functions | head
    filelock_init
    __rcu_read_lock
    kmem_cache_create
    notifier_call_chain
    down_write
    __rcu_read_unlock
    _spin_lock_irq
    _spin_unlock_irq
    _spin_lock
    __kmalloc

42 set_ftrace_filter
    # echo sys_open > /debug/tracing/set_ftrace_filter
    # echo ftrace > /debug/tracing/current_tracer
    # echo 1 > /debug/tracing/tracing_enabled
    # ls > /dev/null
    # echo 0 > /debug/tracing/tracing_enabled
    # cat /debug/tracing/trace
    # tracer: ftrace
    #
    #           TASK-PID   CPU#    TIMESTAMP  FUNCTION
    #              | |      |          |         |
                ls-5652  [00]  2320.450897: sys_open <- tracesys
                ls-5652  [01]  2320.452238: sys_open <- tracesys
                ls-5652  [01]  2320.452311: sys_open <- tracesys
                ls-5652  [01]  2320.452436: sys_open <- tracesys
                ls-5652  [01]  2320.452548: sys_open <- tracesys
                ls-5652  [01]  2320.452651: sys_open <- tracesys
                ls-5652  [01]  2320.452749: sys_open <- tracesys
                ls-5652  [01]  2320.452854: sys_open <- tracesys
                ls-5652  [01]  2320.452950: sys_open <- tracesys
                ls-5652  [01]  2320.453551: sys_open <- tracesys
                ls-5652  [01]  2320.453767: sys_open <- tracesys
                ls-5652  [01]  2320.453851: sys_open <- tracesys
              bash-5495  [00]  2320.453977: sys_open <- tracesys
                ls-5652  [01]  2320.454030: sys_open <- tracesys

43 set_ftrace_notrace ● Modify like set_ftrace_filter ● Acts like a “notrace” added to the function ● The function will not be traced even if in the set_ftrace_filter

44 set_ftrace_* wildcards
● Prefix: echo 'sys_*' > /debug/tracing/set...
● Postfix: echo '*lock' > /debug/tracing/set...
● Included: echo '*device*' > /debug/tracing/set...
● Anything else:
– use grep on available_filter_functions
    # grep '^selinux.*open$' /debug/tracing/available_filter_functions > \
          /debug/tracing/set_ftrace_notrace
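The grep approach can be wrapped so that a pattern with no matches does not truncate the filter file, which is what a plain redirect would do. A minimal sketch: the function name `filter_by_regex` and the `TRACING_DIR` override are assumptions for this example, and the pattern is an ordinary grep regular expression as on the slide.

```shell
#!/bin/sh
# Directory holding the tracing control files (override is an assumption).
TRACING_DIR="${TRACING_DIR:-/debug/tracing}"

# filter_by_regex TARGET PATTERN
# TARGET is set_ftrace_filter or set_ftrace_notrace. Every function in
# available_filter_functions matching PATTERN is written to TARGET.
# If nothing matches, TARGET is left untouched and 1 is returned.
filter_by_regex() {
    target="$1"
    pattern="$2"
    matches=$(grep -- "$pattern" "$TRACING_DIR/available_filter_functions") || {
        echo "no function matches '$pattern'" >&2
        return 1
    }
    printf '%s\n' "$matches" > "$TRACING_DIR/$target"
}
```

The slide's command would become `filter_by_regex set_ftrace_notrace '^selinux.*open$'`.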

45 Todo: ● ftrace dump on OOPS ● change sleep interval of “ftraced” thread ● use CPU clock (aka TSC) for interrupt and preemption latency traces ● option to force per CPU trace interleaving integrity ● printk like hooks (for debugging purposes only) ● Hooks for “tuna” to show in the oscilloscope

