Pentium Performance monitoring counters (pmc) device driver for Linux

What are the Pentium performance monitoring counters?

The Intel Pentium Pro and Pentium II CPUs have hardware support for counting any two of several dozen low-level hardware events. For example, I-cache misses can be precisely measured with the counters. By default, the counters cannot be read or programmed from user mode. If the PCE bit in control register 4 is set, then the counters can be read from user mode, but they still must be set from kernel mode. The pmc device driver module sets this PCE bit at module load time to allow subsequent user mode programs to read the values of the counters with the rdpmc instruction.
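As a rough illustration of what a user-mode read looks like once the PCE bit is set, here is a minimal sketch in C. The helper names pmc_combine and pmc_read are hypothetical and are not part of the pmc package. The sketch assumes a GCC-compatible compiler on x86 and relies on the P6-family counters being 40 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* P6-family performance counters are 40 bits wide. */
#define PMC_WIDTH_BITS 40
#define PMC_MASK ((UINT64_C(1) << PMC_WIDTH_BITS) - 1)

/* Combine the EDX:EAX halves returned by rdpmc into one 40-bit value.
 * (Hypothetical helper, for illustration only.) */
static inline uint64_t pmc_combine(uint32_t lo, uint32_t hi)
{
    return (((uint64_t)hi << 32) | lo) & PMC_MASK;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Read counter 0 or 1 with the rdpmc instruction.
 * This faults in user mode unless CR4.PCE is set, for example by
 * loading the pmc module first. (Hypothetical helper.) */
static inline uint64_t pmc_read(unsigned counter)
{
    uint32_t lo, hi;

    assert(counter < 2); /* these CPUs have only two counters */
    __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
    return pmc_combine(lo, hi);
}
#endif
```

A typical measurement would call pmc_read(0) before and after the code of interest and subtract the two values.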

Note that the Pentium and Pentium with MMX CPUs are very different under the hood from the Pentium Pro and Pentium II CPUs. The Pentium and Pentium with MMX CPUs have different performance counters, which are accessed in different ways. The pmc device driver supports neither the Pentium nor the Pentium with MMX. It looks easy enough to add this support, but I don't have one of these CPUs to test on. Let me know if you'd like to donate appropriate code, or the use of such a CPU to develop the code on. The 486 and earlier are right out.

The pmc driver provides ioctls to program each of the two counters to count any of the available events.
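The driver's ioctl numbers and argument layout are not reproduced here. As a hedged sketch of what programming a counter amounts to at the hardware level, the C fragment below builds a value for a P6 event select register. The field positions and the two event codes follow Intel's documentation for these CPUs. The function name evtsel_encode and the macro names are hypothetical and do not come from the pmc package.

```c
#include <stdint.h>

/* Event codes for two of the events (from Intel's P6 documentation). */
#define PMC_EVT_CPU_CLK_UNHALTED 0x79u
#define PMC_EVT_INST_RETIRED     0xC0u

/* Event select register fields. */
#define EVTSEL_USR (UINT32_C(1) << 16) /* count while in user mode */
#define EVTSEL_OS  (UINT32_C(1) << 17) /* count while in kernel mode */
#define EVTSEL_EN  (UINT32_C(1) << 22) /* enable; on P6 this bit exists only
                                          in the first event select register */

/* Build an event select value: event code in bits 0-7, unit mask in
 * bits 8-15, plus the privilege and enable flags.
 * (Hypothetical helper, for illustration only.) */
static inline uint32_t evtsel_encode(uint8_t event, uint8_t umask,
                                     int count_user, int count_kernel)
{
    uint32_t value = (uint32_t)event | ((uint32_t)umask << 8);

    if (count_user)
        value |= EVTSEL_USR;
    if (count_kernel)
        value |= EVTSEL_OS;
    return value | EVTSEL_EN;
}
```

For example, evtsel_encode(PMC_EVT_INST_RETIRED, 0, 1, 0) describes counting retired instructions in user mode only.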

Also in this distribution is the pmcTime program, which provides an easy way to measure these events from the command line.

The Intel Architecture Developer's Manual, volume 3, available from Intel, describes the counters in more detail.

Getting pmc

Get the pmc package here.

Installing pmc

What about AMD, Cyrix and other x86 clones?

I doubt that this code will run on these processors. If you try though, please let me know.

What events can I measure?

The following table lists the mnemonic and a brief description of each event.
DATA_MEM_REFS                  All memory references, both cacheable and noncacheable.
DCU_LINES_IN                   Total lines allocated in the DCU.
DCU_M_LINES_IN                 Number of M state lines allocated in the DCU.
DCU_M_LINES_OUT                Number of M state lines evicted from the DCU.
DCU_MISS_OUTSTANDING           Weighted number of cycles while a DCU miss is outstanding.
IFU_IFETCH                     Number of instruction fetches, both cacheable and noncacheable.
IFU_IFETCH_MISS                Number of instruction fetch misses.
ITLB_MISS                      Number of ITLB misses.
IFU_MEM_STALL                  Number of cycles that the instruction fetch pipe stage is stalled, including cache misses, ITLB misses, ITLB faults, and victim cache evictions.
ILD_STALL                      Number of cycles that the instruction length decoder is stalled.
L2_IFETCH                      Number of L2 instruction fetches.
L2_LD                          Number of L2 data loads.
L2_ST                          Number of L2 data stores.
L2_LINES_IN                    Number of lines allocated in the L2.
L2_LINES_OUT                   Number of lines removed from the L2 for any reason.
L2_M_LINES_INM                 Number of modified lines allocated in the L2.
L2_M_LINES_OUTM                Number of modified lines removed from the L2 for any reason.
L2_RQSTS                       Number of L2 requests.
L2_ADS                         Number of L2 address strobes.
L2_DBUS_BUSY                   Number of cycles during which the data bus was busy.
L2_DBUS_BUSY_RD                Number of cycles during which the data bus was busy transferring data from L2 to the processor.
BUS_DRDY_CLOCKS_SELF           Number of clocks during which DRDY is asserted by CPU.
BUS_DRDY_CLOCKS_ANY            Number of clocks during which DRDY is asserted by any agent.
BUS_LOCK_CLOCKS_SELF           Number of clocks during which LOCK is asserted.
BUS_LOCK_CLOCKS_ANY            Number of clocks during which LOCK is asserted.
BUS_REQ_OUTSTANDING_SELF       Number of bus requests outstanding.
BUS_REQ_OUTSTANDING_ANY        Number of bus requests outstanding.
BUS_TRAN_BRD_SELF              Number of burst read transactions.
BUS_TRAN_BRD_ANY               Number of burst read transactions.
BUS_TRAN_RFO                   Number of read for ownership transactions.
BUS_TRANS_WB                   Number of write back transactions.
BUS_TRAN_IFETCH                Number of instruction fetch transactions.
BUS_TRAN_INVAL                 Number of invalidate transactions.
BUS_TRAN_PWR                   Number of partial write transactions.
BUS_TRANS_P                    Number of partial transactions.
BUS_TRANS_IO                   Number of I/O transactions.
BUS_TRAN_DEF                   Number of deferred transactions.
BUS_TRAN_BURST                 Number of burst transactions.
BUS_TRAN_ANY                   Number of all transactions.
BUS_TRAN_MEM                   Number of memory transactions.
BUS_DATA_RCV                   Number of bus clock cycles during which this processor is receiving data.
BUS_BNR_DRV                    Number of bus clock cycles during which this processor is driving the BNR pin.
BUS_HIT_DRV                    Number of bus clock cycles during which this processor is driving the HIT pin.
BUS_HITM_DRV                   Number of bus clock cycles during which this processor is driving the HITM pin.
BUS_SNOOP_STALL                Number of clock cycles during which the bus is snoop stalled.
FLOPS                          Number of computational floating-point operations retired.
FP_COMP_OPS_EXE                Number of computational floating-point operations executed.
FP_ASSIST                      Number of floating-point exception cases handled by microcode.
MUL                            Number of multiplies.
DIV                            Number of divides.
CYCLES_DIV_BUSY                Number of cycles during which the divider is busy.
LD_BLOCKS                      Number of store buffer blocks.
SB_DRAINS                      Number of store buffer drain cycles.
MISALIGN_MEM_REF               Number of misaligned data memory references.
INST_RETIRED                   Number of instructions retired.
UOPS_RETIRED                   Number of UOPs retired.
INST_DECODER                   Number of instructions decoded.
HW_INT_RX                      Number of hardware interrupts received.
CYCLES_INT_MASKED              Number of processor cycles for which interrupts are disabled.
CYCLES_INT_PENDING_AND_MASKED  Number of processor cycles for which interrupts are disabled and interrupts are pending.
BR_INST_RETIRED                Number of branch instructions retired.
BR_MISS_PRED_RETIRED           Number of mispredicted branches retired.
BR_TAKEN_RETIRED               Number of taken branches retired.
BR_MISS_PRED_NRET              Number of taken mispredicted branches retired.
BR_INST_DECODED                Number of branch instructions decoded.
BTB_MISSES                     Number of branches that miss the BTB.
BR_BOGUS                       Number of bogus branches.
BACLEARS                       Number of times BACLEAR is asserted.
RESOURCE_STALLS                Number of cycles during which there are resource related stalls.
PARTIAL_RAT_STALLS             Number of cycles or events for partial stalls.
SEGMENT_REG_LOADS              Number of segment register loads.
CPU_CLK_UNHALTED               Number of cycles during which the processor is not halted.
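
Because only two events can be counted at once, a common use is to pair two related events and report their ratio, for example IFU_IFETCH_MISS against IFU_IFETCH for an instruction fetch miss ratio. The C sketch below shows one way to turn raw before and after readings into such a ratio. The function names pmc_delta and pmc_ratio are hypothetical and not part of the pmc package. The sketch assumes the 40-bit counter width of the P6 family and that a counter wraps at most once between the two readings.

```c
#include <stdint.h>

/* P6-family counters are 40 bits wide, so differences wrap modulo 2^40. */
#define PMC_COUNTER_MASK ((UINT64_C(1) << 40) - 1)

/* Events counted between two readings of the same counter.
 * Correct as long as the counter wrapped at most once in between.
 * (Hypothetical helper, for illustration only.) */
static inline uint64_t pmc_delta(uint64_t start, uint64_t end)
{
    return (end - start) & PMC_COUNTER_MASK;
}

/* Ratio of two event counts, e.g. fetch misses per fetch.
 * Returns 0.0 when the denominator is zero. (Hypothetical helper.) */
static inline double pmc_ratio(uint64_t numer, uint64_t denom)
{
    return denom ? (double)numer / (double)denom : 0.0;
}
```

With counter 0 programmed for IFU_IFETCH_MISS and counter 1 for IFU_IFETCH, pmc_ratio(pmc_delta(miss0, miss1), pmc_delta(fetch0, fetch1)) gives the fraction of instruction fetches that missed.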