Re: The sixty second pmc howto

Robert Watson writes:
> On Thu, 23 Feb 2006, Andrew Gallatin wrote:
>
>> Robert Watson writes:
>>
>>> (2) Run "pmcstat -S instructions -O /tmp/sample.out" to start sampling of
>>> instruction retirement events, saving the results to /tmp/sample.out.
>>
>> Dumb question, but what does "instructions" really mean? The number of
>> instructions, the time spent executing them, ?
>
> pmcstat magically translates 'instructions' into 'p4-instr-retired', which
> might well refer to what happens to an instruction when it is believed to have
> successfully executed. Presumably this happens once you know it hasn't been
> mispredicted, etc, but I'm sure someone can give a better and more detailed

Let's say it takes 1000 cycles to issue a memory load because of a
cache miss, and 1 cycle to increment something already in a register.
Let's also say that your program does each operation the same number
of times.

Does the 'instructions' count each operation identically so both
operations appear to cost the same, or is it sampled from some clock
interrupt, so that the memory load (correctly) shows up 1000 times
more often?

For what it's worth, I tend to use k8-bu-cpu-clk-unhalted because
my gut feeling is that it probably gives the latter behaviour.
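
The two possibilities in the question above can be compared with a toy
model. The Python sketch below is only an illustration under stated
assumptions, not a description of what hwpmc or the hardware actually
does: it assumes a sample is taken each time a counter has advanced by a
fixed period, and that the sample is charged to the operation running at
that moment. The costs (1000 cycles and 1 cycle) come from the thought
experiment; the function name, the period of 97 and the iteration count
are hypothetical choices made so that the sampling period does not alias
with the two-operation loop.

```python
from collections import Counter

# Costs from the thought experiment: a load that misses the cache
# versus an increment of a value already in a register.
COST_CYCLES = {"load": 1000, "incr": 1}


def sample_profile(iterations, period, mode):
    """Toy model of counter-overflow sampling (not the real hwpmc logic).

    mode == "instructions": the counter advances by 1 per retired operation.
    mode == "cycles":       the counter advances by the operation's cycle cost.
    Each time the counter reaches `period`, one sample is charged to the
    operation that was running.
    """
    if mode not in ("instructions", "cycles"):
        raise ValueError("mode must be 'instructions' or 'cycles'")
    samples = Counter()
    counter = 0
    for _ in range(iterations):
        for op in ("load", "incr"):
            counter += 1 if mode == "instructions" else COST_CYCLES[op]
            # A long operation can overflow the counter several times.
            while counter >= period:
                counter -= period
                samples[op] += 1
    return samples


if __name__ == "__main__":
    for mode in ("instructions", "cycles"):
        print(mode, dict(sample_profile(9700, 97, mode)))
```

In this model the instruction-driven run charges 100 samples to each
operation, so both look equally expensive, while the cycle-driven run
charges 100000 samples to the load and 100 to the increment, matching
the 1000:1 cost ratio. Real hardware adds effects this sketch ignores,
such as sample skid and out-of-order execution.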
