Todo

Enable callchains for guests (used by perf kvm). Doing this at least for the guest kernel should be quite feasible.

The feature tests should be performed only when a file that needs them is about to be built, or at least only when some .c or .h file will be rebuilt.
- An initial step would be for 'make install-doc' not to run the feature tests, as they are not needed there at all.

Packages needed for the build, such as bison, should be checked for before we start building object files (bpetkov).

Use Kconfig to allow selecting features and building a minimal version of perf, e.g. one with just 'record' for use on embedded platforms.
- David Ahern prototyped this; dig up those patches and update them.

Make the instruction augmentation in the annotate browser platform specific.
- Right now it is x86 specific but lives in the common code.

Add build id support in PERF_RECORD_MMAP, so that we can support long-running sessions during which components may be updated.

Allow automatic downloading of DSOs with richer symtabs and DWARF info from debuginfo servers such as darkserver (https://fedoraproject.org/wiki/Darkserver).

Limit the size of the build id cache (~/.debug), in a way similar to how ccache manages its cache.
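One possible policy, loosely in the spirit of ccache's size limit, is to evict the least recently used entries until the total fits under the configured limit. The sketch below only models this in memory: `struct cache_entry`, its fields and `cache_trim()` are hypothetical names, and a real implementation would walk ~/.debug and unlink files instead of setting a flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical in-memory view of one build id cache entry. */
struct cache_entry {
	const char *build_id;   /* hypothetical key */
	size_t size;            /* bytes used on disk */
	time_t last_used;       /* e.g. atime/mtime of the cached file */
	int evicted;            /* set when the entry would be removed */
};

/* Order entries from least to most recently used. */
static int by_last_used(const void *a, const void *b)
{
	const struct cache_entry *x = a, *y = b;

	if (x->last_used < y->last_used)
		return -1;
	return x->last_used > y->last_used;
}

/*
 * Mark the least recently used entries as evicted until the total size
 * fits in 'limit'. Sorts 'entries' in place and returns the bytes kept.
 */
static size_t cache_trim(struct cache_entry *entries, size_t n, size_t limit)
{
	size_t total = 0, i;

	for (i = 0; i < n; i++)
		total += entries[i].size;

	qsort(entries, n, sizeof(*entries), by_last_used);

	for (i = 0; i < n && total > limit; i++) {
		entries[i].evicted = 1;  /* a real tool would unlink here */
		total -= entries[i].size;
	}
	return total;
}
```

Using last-use time rather than insertion time keeps DSOs that recent sessions still reference.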

Add reference counters to the dso and thread structs, so that in tools like 'top' we can remove unused threads from the dead_threads list and also unload symbol tables not referenced by any maps.
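A minimal sketch of the get/put pattern this implies, assuming a hypothetical `struct dso_sketch` standing in for a dso (a thread would work the same way). Each map or list that points at the object takes a reference, and the symbol table is released only when the last reference is dropped. The counter here is a plain `int`, so the sketch is not thread-safe; a real implementation would need atomic counters.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dso that owns a symbol table. */
struct dso_sketch {
	int refcnt;
	char *symtab;   /* placeholder for symbol table memory */
};

static int nr_freed;    /* counts releases, for illustration only */

static struct dso_sketch *dso_sketch__new(void)
{
	struct dso_sketch *d = calloc(1, sizeof(*d));

	if (d) {
		d->refcnt = 1;            /* the creator holds the first reference */
		d->symtab = malloc(64);   /* pretend we loaded symbols */
	}
	return d;
}

/* Take a reference, e.g. when a map starts pointing at this dso. */
static struct dso_sketch *dso_sketch__get(struct dso_sketch *d)
{
	if (d)
		d->refcnt++;
	return d;
}

/* Drop a reference; free the symbols once nothing references them. */
static void dso_sketch__put(struct dso_sketch *d)
{
	if (d && --d->refcnt == 0) {
		free(d->symtab);
		free(d);
		nr_freed++;
	}
}
```

With this in place, 'top' could simply put a dead thread once it is off the dead_threads list, and the memory goes away when no hist entry or map still holds it.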

Accumulate callchain info in order to get cumulative period info like 'sysprof'.
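As a sketch of what accumulating could mean, assuming symbols are identified by small integer ids (a hypothetical simplification) and that `chain[0]` is the sampled symbol followed by its callers: each sample adds its period to the leaf's self time and to the cumulative time of every distinct symbol in its callchain. Crediting each symbol at most once per sample keeps recursion from inflating its cumulative period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SYMS 8  /* hypothetical: symbols are ids 0..NR_SYMS-1 */

struct period_stats {
	uint64_t self[NR_SYMS];       /* period where the symbol was the leaf */
	uint64_t cumulative[NR_SYMS]; /* period where it appears in the chain */
};

/* chain[0] is the sampled symbol, chain[1..depth-1] are its callers. */
static void account_sample(struct period_stats *st, const int *chain,
			   size_t depth, uint64_t period)
{
	int seen[NR_SYMS] = { 0 };
	size_t i;

	if (depth == 0)
		return;

	st->self[chain[0]] += period;

	for (i = 0; i < depth; i++) {
		if (!seen[chain[i]]) {  /* credit each symbol once per sample */
			seen[chain[i]] = 1;
			st->cumulative[chain[i]] += period;
		}
	}
}
```

A caller near the root of many chains (e.g. main()) then shows a large cumulative period even though its self period is close to zero.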

Move build-id trimming from perf-record to perf-archive:
- Just write the build-id for all DSOs, without trying to process all samples at perf-record time to find out which DSOs had samples and thus should be included in the perf.data build-id header.
- At perf archive time, process all samples and trim the result, so that the tarball is smaller.
- Perhaps even a heuristic to decide whether the savings are worth the cost of processing all samples: look at the build-id table and sum the sizes of the files it references. If the total is below some threshold, don't process the samples, just pack those files straight away; process the samples only when the total exceeds that threshold.
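The heuristic in the last item boils down to a size sum and a comparison. A minimal sketch, where `struct buildid_entry`, `should_trim()` and the `threshold` value are hypothetical names, and the file sizes are assumed to have been obtained beforehand (e.g. with stat(2) on each DSO):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry of the perf.data build-id table. */
struct buildid_entry {
	const char *path;
	uint64_t file_size;  /* bytes */
};

/*
 * Return true if perf-archive should process the samples to trim the
 * build-id list, false if it should just pack every listed DSO.
 */
static bool should_trim(const struct buildid_entry *tbl, size_t n,
			uint64_t threshold)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += tbl[i].file_size;

	return total > threshold;
}
```

Summing sizes is cheap compared to a full pass over the samples, which is what makes the check worth doing first.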

Fix 'perf top --stdio -g' to limit the number of lines displayed: it does not take the callchains into account, so the output scrolls the screen and we can't see the top entries. Perhaps we need to wire this up with the logic for '--max-stack', which is already available for 'perf top'.

What I want is that if I am on bar*() in a callchain, it annotates bar*(), showing no samples, just the disassembly around the call site (obtained from the callchain). This is useful because in many cases there may be multiple call sites within a function, and there may be inlines in between. It is hard to track down the call site if you cannot see the addresses surrounding it. (Request made by Stephane Eranian)

Factor out the multidimensional sorting shared by perf report and perf annotate (will be used by perf trace)
Implement a perf cmp (profile comparison between two perf.data files) (DONE, it's called 'perf diff')
Implement a perf view (GUI) (Partially done, see 'perf report --gtk')
Enhance perf trace:
- Handle the cpu field
- Handle the timestamp
- Use the in-perf ip -> symbol resolving
- Use the in-perf pid -> cmdline resolving
- Implement multidimensional sorting by field name