From Perf Wiki
Revision as of 00:06, 27 May 2015 by Acme (Talk | contribs)

  • Enable callchains for guests (used by perf kvm). Doing this at least for the guest kernel should be quite feasible.
  • The feature tests should be performed only when a file that needs those tests will be rebuilt, or at least only when some .c or .h file will be rebuilt.
    • An initial step would be for 'make install-doc' not to run the feature tests, since they are not needed there at all.
  • Packages needed for the build should be checked before we start building object files, such as bison (bpetkov)
  • Use the highest precision level available by default, e.g.: cycles:pp
  • Use Kconfig to allow selecting features and building a minimal version of perf, e.g. one with just 'record' for use in embedded platforms.
    • David Ahern prototyped this; dig up those patches and update them.
  • Make the instruction augmentation in the annotate browser platform specific.
    • Right now they are x86 specific but are in the common code.
  • Cherry pick the plugin support in libtraceevent.
  • Add build id support in PERF_RECORD_MMAP, so that we can support long running sessions during which components may be updated.
  • Limit the size of the build id cache (~/.debug), in a way similar to how ccache manages its cache.
  • Add reference counters to the dso and thread structs, so that in tools like 'top' we can remove unused threads from the dead_threads list and also unload symbol tables not referenced by any maps.
  • Accumulate callchain info in order to get cumulative period info like 'sysprof'.
  • Move build-id trimming from perf-record to perf-archive:
    • Just write the build-id for all DSOs, without trying to process all samples at perf-record time to find out which DSOs had samples and thus should be included in the build-id header
    • At perf archive time, process all samples and trim the result, so that the tarball is smaller.
    • Perhaps even add a heuristic to figure out whether the savings would be worth the trouble of processing all samples: look at the build-id table and sum the file sizes. If the sum is below some threshold, don't process the samples and just pack those files straight away; do the sample processing only if the sum is above that threshold.
  • Resolve samples in callchains to DSOs and stash their build ids in the file header (acme)
  • Implement --initial-delay, already available in 'perf stat', on 'perf trace'.
  • Fix 'perf top --stdio -g' to limit the number of lines displayed. It currently does not take the callchains into account, so the output scrolls the screen and the top entries can't be seen. Perhaps this needs to be wired up with the logic for '--max-stack', which is already available for 'perf top'.
  • What I want is that if I am on bar*(), it annotates bar*(): no samples, just the disassembly of the call site (obtained from the callchain). This is useful because in many cases there may be multiple call sites within a function and there may be inlines in between. They are hard to track down if you cannot figure out the surrounding addresses of the call site. (Request made by Stephane Eranian)
  • Check for control+C, Q, when processing events in 'perf report', so that we can exit the tool when processing big files (acme)
  • Adopt the kernel ERR_PTR() macro to avoid passing pointer-to-pointer arguments just to return a pointer from a function. Instead, return either the valid pointer or an encoded error such as ERR_PTR(-ENOENT); see include/linux/err.h in the kernel sources. (acme)
  • Adopt the Hints warnings provided by 'perf trace' for permission checks in the other tools (acme)
  • Make pressing 'V' multiple times cycle through the various verbosity levels in 'perf top', so that info that is present in 'perf top -v' can be obtained without having to restart the tool (acme).
  • 'perf top' should, just like the other tools (trace, stat, record), be able to start a workload to observe it. I.e. 'perf top -e probe_perf:map__get,probe_perf:map__put perf top' should work.
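
The item above about limiting the size of the build id cache (~/.debug) in a ccache-like way could be prototyped roughly as follows. This is only a sketch of one possible eviction policy (least recently accessed files go first), not how perf or ccache actually do it. The function name `trim_cache` and the `max_bytes` parameter are hypothetical, and it assumes access times are meaningful on the filesystem holding the cache (they may not be on noatime mounts).

```python
import os


def trim_cache(cache_dir, max_bytes):
    """Remove least recently accessed files until the cache fits in max_bytes.

    Hypothetical helper: returns the list of removed paths, oldest first.
    Symlinks are skipped, so any links pointing at removed files would
    need a separate cleanup pass.
    """
    entries = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue
            st = os.stat(path)
            entries.append((st.st_atime, st.st_size, path))

    total = sum(size for _atime, size, _path in entries)
    removed = []
    # Oldest access time first.
    for _atime, size, path in sorted(entries):
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

A real implementation would probably also want to evict whole build-id entries (the file plus its .build-id link) as a unit rather than individual files.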

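The perf-archive heuristic described above (sum the sizes of the files in the build-id table and only process samples when the sum exceeds a threshold) can be sketched as below. The function name `should_trim_by_samples` and the `threshold_bytes` parameter are made up for illustration, and the list of DSO paths is assumed to have already been extracted from the build-id table by some other means.

```python
import os


def should_trim_by_samples(dso_paths, threshold_bytes):
    """Return True if processing samples to trim the archive looks worthwhile.

    Hypothetical helper: sums the on-disk sizes of the DSOs listed in the
    build-id table and compares the sum against threshold_bytes.
    """
    total = 0
    for path in dso_paths:
        try:
            total += os.path.getsize(path)
        except OSError:
            # Missing or unreadable files cannot be packed, so they
            # contribute nothing to the archive size.
            pass
    return total > threshold_bytes
```

When this returns False the files would simply be packed straight away; the threshold value itself is something that would have to be chosen by experiment.
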
Old entries

  • Factorize the multidimensional sorting between perf report and annotate (will be used by perf trace)
  • Implement a perf cmp (profile comparison between two profiles) (DONE, it's called 'perf diff')
  • Implement a perf view (GUI) (Partially done, see 'perf report --gtk')
  • Enhance perf trace:
    • Handle the cpu field
    • Handle the timestamp
    • Use the in-perf ip -> symbol resolving
    • Use the in-perf pid -> cmdline resolving
    • Implement multidimensional sorting by field name
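
As an illustration of what "multidimensional sorting by field name" could mean, the sketch below sorts a list of entries by an ordered list of field names, with earlier fields taking priority. It is not taken from perf; the function name `sort_by_fields` and the example field names are assumptions chosen for the example.

```python
def sort_by_fields(entries, fields):
    """Sort dict-like entries by several fields, in priority order.

    Hypothetical helper: 'fields' is a list of key names; ties on the
    first field are broken by the second, and so on.
    """
    return sorted(entries, key=lambda entry: tuple(entry[f] for f in fields))


if __name__ == "__main__":
    # Illustrative data only.
    events = [
        {"comm": "bash", "dso": "libc.so", "period": 10},
        {"comm": "awk", "dso": "libm.so", "period": 20},
        {"comm": "awk", "dso": "libc.so", "period": 30},
    ]
    for event in sort_by_fields(events, ["comm", "dso"]):
        print(event["comm"], event["dso"], event["period"])
```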