https://perf.wiki.kernel.org/api.php?action=feedcontributions&user=Dantruong&feedformat=atomPerf Wiki - User contributions [en]2024-03-29T05:11:42ZUser contributionsMediaWiki 1.19.24https://perf.wiki.kernel.org/index.php/Main_PageMain Page2009-09-24T19:46:28Z<p>Dantruong: /* Internals */</p>
<hr />
<div><b><big><i><center>...More than just counters...</center></i></big></b><br />
<br />
<br />
<big>'''Performance Counters for Linux Wiki'''</big><br />
<br />
This is the wiki page for the perfcounters subsystem in Linux.<br />
<br />
Performance counters are special hardware registers available on most modern <br />
CPUs. These registers count the number of certain types of hardware events, such <br />
as instructions executed, cache misses suffered, or branches mispredicted, <br />
without slowing down the kernel or applications. These registers can also <br />
trigger interrupts when a threshold number of events has passed, and can <br />
thus be used to profile the code that runs on that CPU. <br />
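<br />
As a rough illustration of how threshold interrupts turn a counter into a profiler, here is a minimal Python sketch. It is a simulation only, not the kernel's implementation: the event stream, the function name and the period of 3 are all assumptions made up for this example.<br />

```python
from collections import Counter


def sample_profile(event_stream, period):
    """Simulate overflow-based sampling (illustrative only).

    event_stream: iterable of code locations, one entry per counted event
                  (hypothetical data standing in for a running program).
    period:       number of events between two simulated interrupts.
    Returns a Counter mapping location -> number of samples taken there.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    samples = Counter()
    count = 0
    for location in event_stream:
        count += 1
        if count == period:          # threshold reached: the "interrupt" fires
            samples[location] += 1   # record where the program was
            count = 0                # re-arm the counter
    return samples


# Hypothetical stream: 'hot' causes 9 events, 'cold' causes 3.
stream = ["hot"] * 9 + ["cold"] * 3
profile = sample_profile(stream, period=3)
```

Locations that generate more events collect proportionally more samples, which is what makes the resulting histogram usable as a profile.<br />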
<br />
The Linux Performance Counter subsystem provides rich abstractions over these <br />
hardware capabilities. It provides per-task, per-CPU and per-workload counters,<br />
counter groups, and sampling capabilities on top of those, and more.<br />
<br />
It also provides abstraction for 'software events' - such as minor/major page faults, task migrations, task context-switches and tracepoints.<br />
<br />
There is a new tool ('perf') that makes full use of this new kernel subsystem. It can be used to optimize, validate and measure applications, workloads or the full system.<br />
<br />
'perf' is hosted in the upstream kernel repository and can be found under: tools/perf/<br />
<br />
== Getting Started ==<br />
<br />
Once you have installed 'perf' on your system, the simplest way to start profiling a userspace program is to use the "perf record" and "perf report" commands as follows:<br />
<br />
$ <b>perf record -f -- git gc</b><br />
&nbsp;<br />
Counting objects: 1283571, done.<br />
Compressing objects: 100% (206724/206724), done.<br />
Writing objects: 100% (1283571/1283571), done.<br />
Total 1283571 (delta 1070675), reused 1281443 (delta 1068566)<br />
[ perf record: Captured and wrote 31.054 MB perf.data (~1356768 samples) ]<br />
&nbsp;<br />
$ <b>perf report --sort comm,dso,symbol</b> | head -10<br />
# Samples: 1355726<br />
#<br />
# Overhead Command Shared Object Symbol<br />
# ........ ............... ....................................... ......<br />
#<br />
31.53% git /usr/bin/git [.] 0x0000000009804f<br />
13.41% git-prune /usr/bin/git-prune [.] 0x000000000ad06d<br />
10.05% git /lib/tls/i686/cmov/libc-2.8.90.so [.] _nl_make_l10nflist<br />
5.36% git-prune /usr/lib/libz.so.1.2.3.3 [.] 0x00000000009d51<br />
4.48% git /lib/tls/i686/cmov/libc-2.8.90.so [.] memcpy<br />
<br />
For more examples of how 'perf' can be used see [[perf examples]].<br />
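<br />
The report rows above are plain text, so they are easy to post-process. The following Python sketch sums the overhead per command. It assumes the column order shown above (--sort comm,dso,symbol), a bracketed marker such as [.] before the symbol, and command names without spaces; the helper names are hypothetical and not part of 'perf'.<br />

```python
def parse_report_line(line):
    """Split one 'perf report' row into (overhead, command, dso, symbol).

    Returns None for blank lines, comment lines starting with '#',
    and rows that do not have the five expected columns.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split(None, 4)  # at most 5 whitespace-separated fields
    if len(fields) != 5:
        return None
    overhead, command, dso, _marker, symbol = fields
    return float(overhead.rstrip("%")), command, dso, symbol


def overhead_by_command(lines):
    """Add up the overhead percentage of all rows, grouped by command."""
    totals = {}
    for line in lines:
        row = parse_report_line(line)
        if row is not None:
            overhead, command, _dso, _symbol = row
            totals[command] = totals.get(command, 0.0) + overhead
    return totals


# Rows copied from the sample output above.
rows = [
    "# Samples: 1355726",
    "31.53% git /usr/bin/git [.] 0x0000000009804f",
    "13.41% git-prune /usr/bin/git-prune [.] 0x000000000ad06d",
    "10.05% git /lib/tls/i686/cmov/libc-2.8.90.so [.] _nl_make_l10nflist",
    "5.36% git-prune /usr/lib/libz.so.1.2.3.3 [.] 0x00000000009d51",
    "4.48% git /lib/tls/i686/cmov/libc-2.8.90.so [.] memcpy",
]
totals = overhead_by_command(rows)
```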
<br />
== TODO list ==<br />
<br />
=== Perf tools ===<br />
<br />
* Factorize the multidimensional sorting between perf report and annotate (will be used by perf trace)<br />
* Implement a perf cmp (profile comparison between two perf.data)<br />
* Implement a perf view (GUI)<br />
* Enhance perf trace:<br />
** Handle the cpu field<br />
** Handle the timestamp<br />
** Use the in-perf ip -> symbol resolving<br />
** Use the in-perf pid -> cmdline resolving<br />
** Implement multidimensional sorting by field name<br />
<br />
== Internals ==<br />
<br />
* Performance Monitoring Units (PMUs)<br />
** [[Nehalem | Intel(TM) x86 Nehalem PMU]]<br />
** [[Montecito | Intel(TM) Itanium(TM) 2 PMU]]<br />
* Performance Counters for Linux<br />
** [[PCLstruct| PCL core kernel data structures]]<br />
** [[PCL internals | PCL core kernel internals]]<br />
** [[perf internals | perf tool internals]]<br />
<br />
== Notes ==<br />
<br />
* INSTALLATION:<br />
** To get the documentation installed, you'll need the packages listed below.<br />
** On RHEL5 I used: yum install asciidoc xmlto<br />
*** asciidoc<br />
*** tetex-fonts<br />
*** tetex-dvips<br />
*** dialog<br />
*** tetex<br />
*** tetex-latex<br />
*** xmltex<br />
*** passivetex<br />
*** w3m<br />
*** xmlto<br />
** Don't forget to go into tools/perf and do 'make install-man'<br />
** without doing the above, you won't be able to run 'perf help <command>'</div>Dantruonghttps://perf.wiki.kernel.org/index.php/File:Montecito_pmc.jpgFile:Montecito pmc.jpg2009-09-11T23:18:57Z<p>Dantruong: uploaded a new version of "File:Montecito pmc.jpg"</p>
<hr />
<div>Itanium PMCs bit fields</div>Dantruonghttps://perf.wiki.kernel.org/index.php/MontecitoMontecito2009-09-11T22:49:29Z<p>Dantruong: /* Overview */</p>
<hr />
<div>== Intel® Itanium® 2 Montecito® processor PMU ==<br />
<br />
<br />
=== Introduction ===<br />
<br />
Intel defines the architected PMU in the 'Intel IA-64 Architecture Software Developer's Manual'. However, the architected PMU is a bare-bones version of what is actually implemented. It is noteworthy that the Itanium 2<br />
PMU has not changed much, so supporting the whole family does not require rewriting the tools for each member<br />
of the family. Readers interested in the capabilities of the Itanium PMU should therefore go directly to the updated model-specific documentation of a processor's functionality.<br />
<br />
The Itanium-1 (Merced) is obsolete, so two PMU implementations are available: the McKinley class PMU (with 4 counters) and the Montecito class PMU (with 12 counters). The Montecito processor supports hardware threads<br />
(hyperthreading), so the 4 architected counters can monitor the activity of either the whole core or the hardware thread,<br />
while the 8 new counters can only monitor thread activity.<br />
<br />
== Overview ==<br />
<br />
Itanium defines configuration and data registers: PMCs and PMDs. Counting registers are paired PMC/PMD: you program what the counter will count in the PMC and read the actual count in the PMD. Other PMCs and PMDs that are not counters may not be paired up.<br />
<br />
The Itanium PMU is a synchronized subsystem with a global state, where all counters can be started or stopped synchronously. When a counter wraps around, it can trigger a PMU overflow interrupt and freeze the whole PMU, which allows for consistent measurement of multiple events. The state of the PMU can be read in the overflow PMCs. Counters are physically 47 bits wide but are presented as a 64-bit sign-extended entity.<br />
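<br />
The sign extension of a 47-bit counter to 64 bits can be sketched in Python as follows. This is an illustration of the arithmetic only; the function names are hypothetical, and the only fact taken from the text is the 47-bit physical width.<br />

```python
COUNTER_BITS = 47  # physical counter width stated in the text


def sign_extend(raw, bits=COUNTER_BITS):
    """Interpret the low `bits` bits of `raw` as a two's-complement value."""
    raw &= (1 << bits) - 1    # keep only the physical counter bits
    sign = 1 << (bits - 1)    # position of the sign bit (bit 46)
    return (raw ^ sign) - sign


def as_u64(value):
    """Show a signed value as the 64-bit pattern software would read."""
    return value & 0xFFFFFFFFFFFFFFFF


small = sign_extend(5)                      # sign bit clear: value unchanged
all_ones = sign_extend((1 << 47) - 1)       # all 47 bits set: reads as -1
pattern = as_u64(sign_extend(1 << 46))      # only the sign bit set
```

With only bit 46 set, bits 46 through 63 of the 64-bit pattern are all ones, which is what "sign-extended" means here.<br />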
<br />
The PMU can filter events through multiple mechanisms to focus on specific areas. It can filter by privilege level with a counter's plm bitmask, by instruction op-codes, and by virtual address range using the debug registers.<br />
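<br />
A privilege-level bitmask of this kind can be modelled in a few lines of Python. The sketch assumes one mask bit per privilege level 0 to 3, with level 0 the most privileged (kernel) and level 3 the least (user). That encoding is an assumption made for illustration; consult the processor manual for the authoritative field layout. The helper names are hypothetical.<br />

```python
def plm_mask(*levels):
    """Build a privilege-level mask with one bit per level (assumed layout)."""
    mask = 0
    for level in levels:
        if not 0 <= level <= 3:
            raise ValueError("privilege level must be in 0..3")
        mask |= 1 << level
    return mask


def counts_at(mask, level):
    """Return True if a counter with this mask counts events at `level`."""
    return bool(mask & (1 << level))


USER_ONLY = plm_mask(3)           # count only at the least privileged level
KERNEL_ONLY = plm_mask(0)         # count only at the most privileged level
USER_AND_KERNEL = plm_mask(0, 3)  # count at both
```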
<br />
The PMU also has '''EARs''', event address registers, which are special PMCs & PMDs that collect address traces and memory latencies; and the ETB, the event trace buffer, which collects branch history.<br />
<br />
[[File:Montecito pmu list.PNG]]<br />
<br />
Montecito control registers go beyond controlling event counters. The following picture shows the bit field configuration for the control registers.<br />
<br />
[[File:Montecito_pmc.jpg]]<br />
<br />
Brief description:<br />
* CNT: regular performance counter configuration<br />
* OVF: overflow bitmask. A bit set to 1 indicates the associated event counter overflowed. PMC0.fr indicates that the whole PMU was frozen by an overflowing counter.<br />
* IARC: filter event collection to a range of instruction addresses. Events happening while instructions are outside that range will be ignored.<br />
* OM: filter event collection to specific instruction op-codes.<br />
* MPEC: filter event collection to specific memory events.<br />
* IEAR: gather a trace sample for specific instruction events into the EAR.<br />
* DEAR: gather a trace sample for specific data events into the EAR.<br />
* BTB: collect a trace stack of branches into the ETB buffer.<br />
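<br />
To make the OVF description concrete, here is a small Python sketch that decodes an overflow status word into a freeze flag and a list of overflowed counters. The bit positions are assumptions chosen for illustration (freeze flag in bit 0, counter n reported in bit n, counters numbered from 4); only the count of 12 counters comes from the Introduction. Check the register diagram above and the Intel manual for the real layout.<br />

```python
FREEZE_BIT = 0       # assumed position of the PMC0.fr flag
FIRST_COUNTER = 4    # assumed index of the first counter's overflow bit
NUM_COUNTERS = 12    # Montecito counter count, from the Introduction


def decode_overflow(status):
    """Return (frozen, overflowed_counters) for an overflow status word."""
    frozen = bool(status & (1 << FREEZE_BIT))
    overflowed = [
        n
        for n in range(FIRST_COUNTER, FIRST_COUNTER + NUM_COUNTERS)
        if status & (1 << n)
    ]
    return frozen, overflowed


# Hypothetical status word: PMU frozen, counters 4 and 5 overflowed.
frozen, overflowed = decode_overflow(0b110001)
```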
<br />
== External links ==<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Itanium | Intel Itanium ]]<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Montecito_(processor) | Montecito processor ]]<br />
* Intel [[ http://www.intel.com/design/itanium/manuals/iiasdmanual.htm | Intel® Itanium® Architecture ]]<br />
* Intel [[ http://cache-www.intel.com/cd/00/00/21/93/219348_software_optimization.pdf | Introduction to Microarchitectural Optimization for Itanium2 Processors ]]<br />
* Intel [[ http://download.intel.com/design/Itanium2/manuals/30806501.pdf | Dual-Core Update to the Intel® Itanium® 2 Processor Reference Manual]] (PDF 2.29MB)<br />
<br />
----<br />
Last edit by: --[[User:Dantruong|Dantruong]] 01:48, 28 August 2009 (UTC)</div>Dantruonghttps://perf.wiki.kernel.org/index.php/MontecitoMontecito2009-09-11T22:46:18Z<p>Dantruong: /* Overview */</p>
<hr />
<div>== Intel® Itanium® 2 Montecito® processor PMU ==<br />
<br />
<br />
=== Introduction ===<br />
<br />
Intel defines the architected PMU in the 'Intel IA-64 Architecture Software Developer's Manual'. However, the architected PMU is a bare-bones version of what is actually implemented. It is noteworthy that the Itanium 2<br />
PMU has not changed much, so supporting the whole family does not require rewriting the tools for each member<br />
of the family. Readers interested in the capabilities of the Itanium PMU should therefore go directly to the updated model-specific documentation of a processor's functionality.<br />
<br />
The Itanium-1 (Merced) is obsolete, so two PMU implementations are available: the McKinley class PMU (with 4 counters) and the Montecito class PMU (with 12 counters). The Montecito processor supports hardware threads<br />
(hyperthreading), so the 4 architected counters can monitor the activity of either the whole core or the hardware thread,<br />
while the 8 new counters can only monitor thread activity.<br />
<br />
== Overview ==<br />
<br />
Itanium defines configuration and data registers: PMCs and PMDs. Counting registers are paired PMC/PMD: you program what the counter will count in the PMC and read the actual count in the PMD. Other PMCs and PMDs that are not counters may not be paired up.<br />
<br />
The Itanium PMU is a synchronized subsystem with a global state, where all counters can be started or stopped synchronously. When a counter wraps around, it can trigger a PMU overflow interrupt and freeze the whole PMU, which allows for consistent measurement of multiple events. The state of the PMU can be read in the overflow PMCs. Counters are physically 47 bits wide but are presented as a 64-bit sign-extended entity.<br />
<br />
The PMU can filter events through multiple mechanisms to focus on specific areas. It can filter by privilege level with a counter's plm bitmask, by instruction op-codes, and by virtual address range using the debug registers.<br />
<br />
The PMU also has '''EARs''', event address registers, which are special PMCs & PMDs that collect address traces and memory latencies; and the ETB, the event trace buffer, which collects branch history.<br />
<br />
[[File:Montecito pmu list.PNG]]<br />
<br />
Montecito control registers go beyond controlling event counters. The following picture shows the bit field configuration for the control registers.<br />
<br />
[[File:Montecito_pmc.jpg]]<br />
<br />
Brief description:<br />
* CNT: Event counter configuration<br />
* OVF: overflow bitmask. A bit set to 1 indicates the associated event counter overflowed. PMC0.fr indicates that the whole PMU was frozen by an overflowing counter.<br />
* IARC: filter event collection to a range of instruction addresses. Events happening while instructions are outside that range will be ignored.<br />
* OM: filter event collection to specific instruction op-codes.<br />
* MPEC: filter event collection to specific memory events.<br />
* IEAR: gather a trace sample for specific instruction events into the EAR.<br />
* DEAR: gather a trace sample for specific data events into the EAR.<br />
* BTB: collect a trace stack of branches into the ETB buffer.<br />
<br />
== External links ==<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Itanium | Intel Itanium ]]<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Montecito_(processor) | Montecito processor ]]<br />
* Intel [[ http://www.intel.com/design/itanium/manuals/iiasdmanual.htm | Intel® Itanium® Architecture ]]<br />
* Intel [[ http://cache-www.intel.com/cd/00/00/21/93/219348_software_optimization.pdf | Introduction to Microarchitectural Optimization for Itanium2 Processors ]]<br />
* Intel [[ http://download.intel.com/design/Itanium2/manuals/30806501.pdf | Dual-Core Update to the Intel® Itanium® 2 Processor Reference Manual]] (PDF 2.29MB)<br />
<br />
----<br />
Last edit by: --[[User:Dantruong|Dantruong]] 01:48, 28 August 2009 (UTC)</div>Dantruonghttps://perf.wiki.kernel.org/index.php/MontecitoMontecito2009-09-11T22:37:22Z<p>Dantruong: /* Overview */</p>
<hr />
<div>== Intel® Itanium® 2 Montecito® processor PMU ==<br />
<br />
<br />
=== Introduction ===<br />
<br />
Intel defines the architected PMU in the 'Intel IA-64 Architecture Software Developer's Manual'. However, the architected PMU is a bare-bones version of what is actually implemented. It is noteworthy that the Itanium 2<br />
PMU has not changed much, so supporting the whole family does not require rewriting the tools for each member<br />
of the family. Readers interested in the capabilities of the Itanium PMU should therefore go directly to the updated model-specific documentation of a processor's functionality.<br />
<br />
The Itanium-1 (Merced) is obsolete, so two PMU implementations are available: the McKinley class PMU (with 4 counters) and the Montecito class PMU (with 12 counters). The Montecito processor supports hardware threads<br />
(hyperthreading), so the 4 architected counters can monitor the activity of either the whole core or the hardware thread,<br />
while the 8 new counters can only monitor thread activity.<br />
<br />
== Overview ==<br />
<br />
Itanium defines configuration and data registers: PMCs and PMDs. Counting registers are paired PMC/PMD: you program what the counter will count in the PMC and read the actual count in the PMD. Other PMCs and PMDs that are not counters may not be paired up.<br />
<br />
The Itanium PMU is a synchronized subsystem with a global state, where all counters can be started or stopped synchronously. When a counter wraps around, it can trigger a PMU overflow interrupt and freeze the whole PMU, which allows for consistent measurement of multiple events. The state of the PMU can be read in the overflow PMCs. Counters are physically 47 bits wide but are presented as a 64-bit sign-extended entity.<br />
<br />
The PMU can filter events through multiple mechanisms to focus on specific areas. It can filter by privilege level with a counter's plm bitmask, by instruction op-codes, and by virtual address range using the debug registers.<br />
<br />
The PMU also has '''EARs''', event address registers, which are special PMCs & PMDs that collect address traces and memory latencies; and the ETB, the event trace buffer, which collects branch history.<br />
<br />
[[File:Montecito pmu list.PNG]]<br />
<br />
Montecito control registers go beyond controlling event counters. The following picture shows the bit field configuration for the control registers.<br />
<br />
[[File:Montecito_pmc.jpg]]<br />
<br />
== External links ==<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Itanium | Intel Itanium ]]<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Montecito_(processor) | Montecito processor ]]<br />
* Intel [[ http://www.intel.com/design/itanium/manuals/iiasdmanual.htm | Intel® Itanium® Architecture ]]<br />
* Intel [[ http://cache-www.intel.com/cd/00/00/21/93/219348_software_optimization.pdf | Introduction to Microarchitectural Optimization for Itanium2 Processors ]]<br />
* Intel [[ http://download.intel.com/design/Itanium2/manuals/30806501.pdf | Dual-Core Update to the Intel® Itanium® 2 Processor Reference Manual]] (PDF 2.29MB)<br />
<br />
----<br />
Last edit by: --[[User:Dantruong|Dantruong]] 01:48, 28 August 2009 (UTC)</div>Dantruonghttps://perf.wiki.kernel.org/index.php/File:Montecito_pmc.jpgFile:Montecito pmc.jpg2009-09-11T22:34:11Z<p>Dantruong: Itanium PMCs bit fields</p>
<hr />
<div>Itanium PMCs bit fields</div>Dantruonghttps://perf.wiki.kernel.org/index.php/NehalemNehalem2009-08-31T21:00:01Z<p>Dantruong: /* PMU overview */</p>
<hr />
<div>== Intel x86-64 PMU (Nehalem) ==<br />
<br />
The x86-64 PMUs are specific to each underlying architecture. The Netburst architecture of the Pentium-4 has given way to the Core architecture, used in Nehalem (i7) processors.<br />
<br />
The recent PMUs appear to be structured similarly across Core and Atom CPU models.<br />
<br />
<br />
== PMU overview ==<br />
<br />
The diagram of PMU registers shows the registers available depending on which processor family you are using. At least 4 counters are available.<br />
<br />
[[File:X86-64_pmu.PNG]]<br />
<br />
The Nehalem PMU supports a hardware buffering scheme to collect multiple samples from the PMU. When a counter overflows, the CPU interrupts into firmware, which in turn copies the counters into a PEBS (Precise Event-Based Sampling) buffer record. The following diagram shows the indirection levels and the content of each record.<br />
<br />
[[File:X86-64_pebs.PNG]]</div>Dantruonghttps://perf.wiki.kernel.org/index.php/File:X86-64_pebs.PNGFile:X86-64 pebs.PNG2009-08-31T20:57:33Z<p>Dantruong: Nehalem x86-64 PMU PEBS</p>
<hr />
<div>Nehalem x86-64 PMU PEBS</div>Dantruonghttps://perf.wiki.kernel.org/index.php/NehalemNehalem2009-08-31T20:01:18Z<p>Dantruong: </p>
<hr />
<div>== Intel x86-64 PMU (Nehalem) ==<br />
<br />
The x86-64 PMUs are specific to each underlying architecture. The Netburst architecture of the Pentium-4 has given way to the Core architecture, used in Nehalem (i7) processors.<br />
<br />
The recent PMUs appear to be structured similarly across Core and Atom CPU models.<br />
<br />
<br />
== PMU overview ==<br />
<br />
The diagram of PMU registers shows the registers available depending on which processor family you are using. At least 4 counters are available.<br />
<br />
[[File:X86-64_pmu.PNG]]</div>Dantruonghttps://perf.wiki.kernel.org/index.php/File:X86-64_pmu.PNGFile:X86-64 pmu.PNG2009-08-31T19:59:11Z<p>Dantruong: x86-64 PMU registers</p>
<hr />
<div>x86-64 PMU registers</div>Dantruonghttps://perf.wiki.kernel.org/index.php/NehalemNehalem2009-08-31T19:58:25Z<p>Dantruong: Created page with ' == Intel x86-64 PMU (Nehalem) == The x86-64 PMUs are specific to each underlying architecture. The Netburst architecture of the Pentium-4 has given way to the Core architecture...'</p>
<hr />
<div><br />
== Intel x86-64 PMU (Nehalem) ==<br />
<br />
The x86-64 PMUs are specific to each underlying architecture. The Netburst architecture of the Pentium-4 has given way to the Core architecture, used in Nehalem (i7) processors.<br />
<br />
The recent PMUs appear to be structured similarly across Core and Atom CPU models.<br />
<br />
<br />
== PMU overview ==<br />
<br />
The diagram of PMU registers shows the registers available depending on which processor family you are using.</div>Dantruonghttps://perf.wiki.kernel.org/index.php/MontecitoMontecito2009-08-28T02:34:02Z<p>Dantruong: /* Overview */</p>
<hr />
<div>== Intel® Itanium® 2 Montecito® processor PMU ==<br />
<br />
<br />
=== Introduction ===<br />
<br />
Intel defines the architected PMU in the 'Intel IA-64 Architecture Software Developer's Manual'. However, the architected PMU is a bare-bones version of what is actually implemented. It is noteworthy that the Itanium 2<br />
PMU has not changed much, so supporting the whole family does not require rewriting the tools for each member<br />
of the family. Readers interested in the capabilities of the Itanium PMU should therefore go directly to the updated model-specific documentation of a processor's functionality.<br />
<br />
The Itanium-1 (Merced) is obsolete, so two PMU implementations are available: the McKinley class PMU (with 4 counters) and the Montecito class PMU (with 12 counters). The Montecito processor supports hardware threads<br />
(hyperthreading), so the 4 architected counters can monitor the activity of either the whole core or the hardware thread,<br />
while the 8 new counters can only monitor thread activity.<br />
<br />
== Overview ==<br />
<br />
Itanium defines configuration and data registers: PMCs and PMDs. Counting registers are paired PMC/PMD: you program what the counter will count in the PMC and read the actual count in the PMD. Other PMCs and PMDs that are not counters may not be paired up.<br />
<br />
The Itanium PMU is a synchronized subsystem with a global state, where all counters can be started or stopped synchronously. When a counter wraps around, it can trigger a PMU overflow interrupt and freeze the whole PMU, which allows for consistent measurement of multiple events. The state of the PMU can be read in the overflow PMCs. Counters are physically 47 bits wide but are presented as a 64-bit sign-extended entity.<br />
<br />
The PMU can filter events through multiple mechanisms to focus on specific areas. It can filter by privilege level with a counter's plm bitmask, by instruction op-codes, and by virtual address range using the debug registers.<br />
<br />
The PMU also has '''EARs''', event address registers, which are special PMCs & PMDs that collect address traces and memory latencies; and the ETB, the event trace buffer, which collects branch history.<br />
<br />
[[File:Montecito pmu list.PNG]]<br />
<br />
== External links ==<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Itanium | Intel Itanium ]]<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Montecito_(processor) | Montecito processor ]]<br />
* Intel [[ http://www.intel.com/design/itanium/manuals/iiasdmanual.htm | Intel® Itanium® Architecture ]]<br />
* Intel [[ http://cache-www.intel.com/cd/00/00/21/93/219348_software_optimization.pdf | Introduction to Microarchitectural Optimization for Itanium2 Processors ]]<br />
* Intel [[ http://download.intel.com/design/Itanium2/manuals/30806501.pdf | Dual-Core Update to the Intel® Itanium® 2 Processor Reference Manual]] (PDF 2.29MB)<br />
<br />
----<br />
Last edit by: --[[User:Dantruong|Dantruong]] 01:48, 28 August 2009 (UTC)</div>Dantruonghttps://perf.wiki.kernel.org/index.php/File:Montecito_pmu_list.PNGFile:Montecito pmu list.PNG2009-08-28T02:33:09Z<p>Dantruong: List of Montecito PMC and PMD registers</p>
<hr />
<div>List of Montecito PMC and PMD registers</div>Dantruonghttps://perf.wiki.kernel.org/index.php/MontecitoMontecito2009-08-28T02:21:07Z<p>Dantruong: </p>
<hr />
<div>== Intel® Itanium® 2 Montecito® processor PMU ==<br />
<br />
<br />
=== Introduction ===<br />
<br />
Intel defines the architected PMU in the 'Intel IA-64 Architecture Software Developer's Manual'. However, the architected PMU is a bare-bones version of what is actually implemented. It is noteworthy that the Itanium 2<br />
PMU has not changed much, so supporting the whole family does not require rewriting the tools for each member<br />
of the family. Readers interested in the capabilities of the Itanium PMU should therefore go directly to the updated model-specific documentation of a processor's functionality.<br />
<br />
The Itanium-1 (Merced) is obsolete, so two PMU implementations are available: the McKinley class PMU (with 4 counters) and the Montecito class PMU (with 12 counters). The Montecito processor supports hardware threads<br />
(hyperthreading), so the 4 architected counters can monitor the activity of either the whole core or the hardware thread,<br />
while the 8 new counters can only monitor thread activity.<br />
<br />
== Overview ==<br />
<br />
Itanium defines configuration and data registers: PMCs and PMDs. Counting registers are paired PMC/PMD: you program what the counter will count in the PMC and read the actual count in the PMD. Other PMCs and PMDs that are not counters may not be paired up.<br />
<br />
The Itanium PMU is a synchronized subsystem with a global state, where all counters can be started or stopped synchronously. When a counter wraps around, it can trigger a PMU overflow interrupt and freeze the whole PMU, which allows for consistent measurement of multiple events. The state of the PMU can be read in the overflow PMCs. Counters are physically 47 bits wide but are presented as a 64-bit sign-extended entity.<br />
<br />
The PMU can filter events through multiple mechanisms to focus on specific areas. It can filter by privilege level with a counter's plm bitmask, by instruction op-codes, and by virtual address range using the debug registers.<br />
<br />
The PMU also has '''EARs''', event address registers, which are special PMCs & PMDs that collect address traces and memory latencies; and the ETB, the event trace buffer, which collects branch history.<br />
<br />
== External links ==<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Itanium | Intel Itanium ]]<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Montecito_(processor) | Montecito processor ]]<br />
* Intel [[ http://www.intel.com/design/itanium/manuals/iiasdmanual.htm | Intel® Itanium® Architecture ]]<br />
* Intel [[ http://cache-www.intel.com/cd/00/00/21/93/219348_software_optimization.pdf | Introduction to Microarchitectural Optimization for Itanium2 Processors ]]<br />
* Intel [[ http://download.intel.com/design/Itanium2/manuals/30806501.pdf | Dual-Core Update to the Intel® Itanium® 2 Processor Reference Manual]] (PDF 2.29MB)<br />
<br />
----<br />
Last edit by: --[[User:Dantruong|Dantruong]] 01:48, 28 August 2009 (UTC)</div>Dantruonghttps://perf.wiki.kernel.org/index.php/MontecitoMontecito2009-08-28T01:48:46Z<p>Dantruong: Creation of the Intel Itanium-2 PMU overview</p>
<hr />
<div>== Intel Itanium 2 Montecito processor PMU ==<br />
<br />
<br />
=== Introduction ===<br />
<br />
Intel defines the architected PMU in the 'Intel IA-64 Architecture Software Developer's Manual'. However, the architected PMU is a bare-bones version of what is actually implemented. It is noteworthy that the Itanium 2<br />
PMU has not changed much, so supporting the whole family does not require rewriting the tools for each member<br />
of the family. Readers interested in the capabilities of the Itanium PMU should therefore go directly to the specific documentation of a processor's functionality.<br />
<br />
The Itanium-1 (Merced) is obsolete, so two PMU implementations are available: the McKinley class PMU (with 4 counters) and the Montecito class PMU (with 12 counters). The Montecito processor supports hardware threads<br />
(hyperthreading), so the 4 architected counters can monitor the activity of either the whole core or the hardware thread,<br />
while the 8 new counters can only monitor thread activity.<br />
<br />
<br />
<br />
== External links ==<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Itanium | Intel Itanium ]]<br />
* Wikipedia [[ http://en.wikipedia.org/wiki/Montecito_(processor) | Montecito processor ]]<br />
* Intel [[ http://www.intel.com/design/itanium/manuals/iiasdmanual.htm | Intel® Itanium® Architecture ]]<br />
<br />
----<br />
Last edit by: --[[User:Dantruong|Dantruong]] 01:48, 28 August 2009 (UTC)</div>Dantruonghttps://perf.wiki.kernel.org/index.php/Main_PageMain Page2009-08-28T01:15:16Z<p>Dantruong: Added Internals section</p>
<hr />
<div><b><big><i><center>...More than just counters...</center></i></big></b><br />
<br />
<br />
<big>'''Performance Counters for Linux Wiki'''</big><br />
<br />
This is the wiki page for the perfcounters subsystem in Linux.<br />
<br />
Performance counters are special hardware registers available on most modern <br />
CPUs. These registers count the number of certain types of hardware events, such <br />
as instructions executed, cache misses suffered, or branches mispredicted, <br />
without slowing down the kernel or applications. These registers can also <br />
trigger interrupts when a threshold number of events has passed, and can <br />
thus be used to profile the code that runs on that CPU. <br />
<br />
The Linux Performance Counter subsystem provides rich abstractions over these <br />
hardware capabilities. It provides per-task, per-CPU and per-workload counters,<br />
counter groups, and sampling capabilities on top of those, and more.<br />
<br />
It also provides abstraction for 'software events' - such as minor/major page faults, task migrations, task context-switches and tracepoints.<br />
<br />
There is a new tool ('perf') that makes full use of this new kernel subsystem. It can be used to optimize, validate and measure applications, workloads or the full system.<br />
<br />
'perf' is hosted in the upstream kernel repository and can be found under: tools/perf/<br />
<br />
== Getting Started ==<br />
<br />
Once you have installed 'perf' on your system, the simplest way to start profiling a userspace program is to use the "perf record" and "perf report" commands as follows:<br />
<br />
$ <b>perf record -f -- git gc</b><br />
&nbsp;<br />
Counting objects: 1283571, done.<br />
Compressing objects: 100% (206724/206724), done.<br />
Writing objects: 100% (1283571/1283571), done.<br />
Total 1283571 (delta 1070675), reused 1281443 (delta 1068566)<br />
[ perf record: Captured and wrote 31.054 MB perf.data (~1356768 samples) ]<br />
&nbsp;<br />
$ <b>perf report --sort comm,dso,symbol</b> | head -10<br />
# Samples: 1355726<br />
#<br />
# Overhead Command Shared Object Symbol<br />
# ........ ............... ....................................... ......<br />
#<br />
31.53% git /usr/bin/git [.] 0x0000000009804f<br />
13.41% git-prune /usr/bin/git-prune [.] 0x000000000ad06d<br />
10.05% git /lib/tls/i686/cmov/libc-2.8.90.so [.] _nl_make_l10nflist<br />
5.36% git-prune /usr/lib/libz.so.1.2.3.3 [.] 0x00000000009d51<br />
4.48% git /lib/tls/i686/cmov/libc-2.8.90.so [.] memcpy<br />
<br />
For more examples of how 'perf' can be used see [[perf examples]].<br />
<br />
== TODO list ==<br />
<br />
=== Perf tools ===<br />
<br />
* Factorize the multidimensional sorting between perf report and annotate (will be used by perf trace)<br />
* Implement a perf cmp (profile comparison between two perf.data)<br />
* Implement a perf view (GUI)<br />
* Enhance perf trace:<br />
** Handle the cpu field<br />
** Handle the timestamp<br />
** Use the in-perf ip -> symbol resolving<br />
** Use the in-perf pid -> cmdline resolving<br />
** Implement multidimensional sorting by field name<br />
<br />
== Internals ==<br />
<br />
* Performance Monitoring Units (PMUs)<br />
** [[Nehalem | Intel(TM) x86 Nehalem PMU]]<br />
** [[Montecito | Intel(TM) Itanium(TM) 2 PMU]]<br />
* Performance Counters for Linux<br />
** [[PCL internals | PCL core kernel internals]]<br />
** [[perf internals | perf tool internals]]</div>Dantruong