This document pertains to how C++ samples are captured, and how they are processed for reporting in perf.html. Specifically the profiler samples capture memory addresses for frames in C++ functions, and there can be many different frame addresses per single function.
0x01 int main() {
0x02 for (int i = 0; i < 10; i++) {
0x03 doSomething(i);
0x04 }
0x05
0x06 someInterlude();
0x07
0x08 for (int i = 10; i < 20; i++) {
0x09 doSomething(i);
0x0A }
0x0B
0x0C return 0;
0x0D }
0x0E
0x0F void doSomething(int i) {
0x11 // do something
0x12 }
0x13
0x14 void someInterlude() {
0x15 // do something else
0x16 }
We'll use line numbers and code addresses interchangably here in order to simplify things. In reality, each line in the code above will probably generate multiple instructions, and each of those instructions will take up more than a single byte.
We'll have a frame for every line of the script above that was on the stack during sampling.
We have three funcs in the code above, each of them has a start address
main()
has address0x01
doSomething(int)
has address0x0F
someInterlude()
has address0x14
Each frame is assigned to one of these funcs. (We pick the func with the highest address among funcs with start address <= frame address.)
In the first line of the main()
function, there's only one frame on the stack: 0x02
. So we'll have a stack object with:
- frame:
0x02
- prefix: null
When the CPU is executing doSomething()
during the first loop, the stack will have these frames on it:
0x03
0x11
So we'll have a stack object with
- frame:
0x11
- prefix: a stack with
- frame:
0x03
- prefix: null
- frame:
If the CPU is executing doSomething()
from the second loop, the stack will be
0x09
0x11
so we'll have a different leaf stack object whose prefix is a stack object with frame 0x09
.
If we just used stacks to create the call tree, with one tree node per stack, the tree would look like this:
- main@0x02
- main@0x03
- doSomething@0x11
- main@0x06
- someInterlude@0x15
- main@0x08
- main@0x09
- doSomething@0x11
This can be very confusing and is usually not what you want. What you want instead is this:
- main
- doSomething
- someInterlude
In other words, we want to combine many different stack
object into the same tree node, reflecting which frames have been combined into the same func.
We call the node of such a combined tree a CallNode
- it's like a stack
, but instead of being based on frames, it's based on funcs.
We want to keep around enough information to be able to show per-line profiling information for a given CallNode. For example, for the CallNode for main
, we want to be able to highlight the expensive lines / inside main()
. So we need to know how many samples were executing a given line of code. With the information we have, we can do that by finding all samples whose stacks are in the selected CallNode, and then look at the leaf frames of those stacks.
Glad you asked.
When we display a profile without any symbolication information, all we have is stacks and frames. At that point, we don't know which frames are from the same function. Only once the symbol information comes in we can actually start combining multiple frames into the same func, and multiple stacks into the same CallNode.
The takeaway here is: The symbolication process doesn't only give us strings, it can also completely change the shape of the call tree.
In perf.html, we want to be fully interactive before any symbol information has arrived, and stay interactive while it's arriving. That means that we need to handle collapsing of funcs gracefully without destroying any state, as far as that can be done. Here's how this works:
- When loading a Gecko profile, we naively create a func for every single frame.
- When a symbol table arrives:
- Every address that has a symbol is treated as the start of a function.
- For each symbol, we pick one func object, and combine the func objects that we created for all frames in that address range into that one func.
- We have a map,
oldFuncToNewFuncMap
, that records these collapsed funcs and the func they were collapsed into. - We have a few things in the redux state which refer to funcs, most notably the set of expanded tree nodes (expandedCallNodePaths) and the currently selected tree node (selectedCallNodePath). If any of those points to a func that was collapsed away, we replace that pointer with the func that it was collapsed into.
- This means that selected row in the tree can jump around. But it won't jump to an unrelated function; it will still be selecting the stack of the functions that you were looking at.