From Perf Wiki
Revision as of 15:46, 5 May 2015 by Acme
- Enable callchains for guests (used by perf kvm). Doing this at least for the guest kernel should be quite feasible.
- The feature tests should be performed only when a file that needs those tests will be rebuilt, or at least only when some .c or .h file will be rebuilt.
- Packages needed for the build should be checked before we start building object files, such as bison (bpetkov)
- Forward port the page fault tracepoints and use them in 'trace'.
- Use the highest precision level available by default, e.g.: cycles:pp
- Use Kconfig to allow selecting features and building a minimal version of perf, e.g. one with just 'record' for use in embedded platforms.
  - David Ahern prototyped this; dig up those patches and update them.
- Make the instruction augmentation in the annotate browser platform specific.
  - Right now they are x86 specific but are in the common code.
- Cherry pick the plugin support in libtraceevent.
- Add build id support in PERF_RECORD_MMAP, so that we can support long running sessions where updates of components may take place.
- Allow automatic downloading of DSOs with richer symtabs and DWARF info from debuginfo servers such as darkserver (https://fedoraproject.org/wiki/Darkserver).
- Limit the size of the build id cache (~/.debug), in a way similar to how ccache manages its cache.
- Adopt Vince Weaver's suite of tests in 'perf test'.
- Add reference counters to the dso and thread structs, so that in tools like 'top' we can remove unused threads from the dead_threads list and also unload symbol tables not referenced by any maps.
- Accumulate callchain info in order to get cumulative period info like 'sysprof'.
- Systemtap SDT support in 'perf probe'
- Move build-id trimming from perf-record to perf-archive:
  - At perf archive time, process all samples and trim the result, so that the tarball is smaller.
- Implement --initial-delay, already available in 'perf stat', on 'perf trace'.
- Fix 'perf top --stdio -g' to limit the number of lines displayed. It currently does not take the callchains into account, so the screen scrolls and we can't see the top entries. Perhaps we need to wire this up with the logic for '--max-stack', which is already available for 'perf top'.
- What I want is that if I am on bar*(), it annotates bar*(): no samples, just the disassembly of the call site (obtained from the callchain). This is useful because in many cases there may be multiple call sites within a function and there may be inlines in between. It is hard to track down if you cannot figure out the surrounding addresses of the call site. (Request made by Stephane Eranian)
- Check for control+C, Q, when processing events in 'perf report', so that we can exit the tool when processing big files (acme)
- Factorize the multidimensional sorting between perf report and annotate (will be used by perf trace)
- Implement a perf cmp (profile comparison between two perf.data files) (DONE, it's called 'perf diff')
- Implement a perf view (GUI) (Partially done, see 'perf report --gtk')
- Enhance perf trace:
  - Handle the cpu field
  - Handle the timestamp
  - Use the in-perf ip -> symbol resolving
  - Use the in-perf pid -> cmdline resolving
  - Implement multidimensional sorting by field name
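
For the item above about accumulating callchain info to get cumulative period info like 'sysprof', here is a minimal sketch of the idea. It does not use perf's own data structures: `struct sym_stat`, `struct sym_table`, `sym_table__find_or_add()` and `sym_table__add_sample()` are hypothetical names made up for illustration, and symbols are identified by plain strings. Each sample's period is added to the "self" total of the sampled function and to the "cumulative" total of every distinct function in its callchain, so a recursive function is counted only once per sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SYMS 64

/* Hypothetical per-symbol totals, not perf's actual hist entry layout. */
struct sym_stat {
	const char *name;
	uint64_t self_period;  /* samples that hit this symbol directly */
	uint64_t cumul_period; /* samples with this symbol anywhere in the chain */
};

struct sym_table {
	struct sym_stat syms[MAX_SYMS];
	size_t nr;
};

/* Linear lookup by name; appends a zeroed entry if the symbol is new. */
struct sym_stat *sym_table__find_or_add(struct sym_table *t, const char *name)
{
	for (size_t i = 0; i < t->nr; i++)
		if (strcmp(t->syms[i].name, name) == 0)
			return &t->syms[i];

	assert(t->nr < MAX_SYMS);
	t->syms[t->nr] = (struct sym_stat){ .name = name };
	return &t->syms[t->nr++];
}

/*
 * chain[0] is the sampled function, chain[nr - 1] the outermost caller.
 * The period goes to chain[0] as self time and to every distinct entry
 * in the chain as cumulative time.
 */
void sym_table__add_sample(struct sym_table *t, const char **chain,
			   size_t nr, uint64_t period)
{
	if (nr == 0)
		return;

	sym_table__find_or_add(t, chain[0])->self_period += period;

	for (size_t i = 0; i < nr; i++) {
		int seen = 0;

		/* Skip symbols already counted for this sample (recursion). */
		for (size_t j = 0; j < i; j++) {
			if (strcmp(chain[j], chain[i]) == 0) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			sym_table__find_or_add(t, chain[i])->cumul_period += period;
	}
}
```

With this sketch, feeding the chains {leaf, mid, main} with period 10 and {mid, main} with period 5 leaves main with a self period of 0 and a cumulative period of 15, which is the kind of view 'sysprof' gives.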