Use xctrace on macOS #246
Yeah, what you'd want to do is implement a "collapser" for the xctrace output format (see the existing collapsers). Once that's in place, you should be able to take an output file from xctrace and feed it through |
Progress so far
The raw file is undocumented, and as of Xcode >= 11 there is no OSS that can parse it. However it Here are the different schemas that can be accessed after
# test.xml
<table schema="tick" frequency="10"/>
<table schema="life-cycle-period" target-pid="SINGLE"/>
<table schema="tick" frequency="1"/>
<table schema="device-thermal-state-intervals"/>
<table schema="os-log" category="PointsOfInterest"/>
<table subsystem="&quot;com.apple.ConditionInducer.LowSeverity&quot;" schema="os-signpost" category="InduceCondition"/>
<table schema="os-signpost" category="PointsOfInterest"/>
<table target-pid="SINGLE" exclude-os-logs="0" schema="global-poi-layout" signpost-code-map="" colorize-by-arg4="0"/>
<table target-pid="SINGLE" context-switch-sampling="0" high-frequency-sampling="0" schema="time-profile" needs-kernel-callstack="0" record-waiting-threads="0"/>
<table target-pid="SINGLE" schema="kdebug-signpost" signpost-code-map=""/>
<table codes="&quot;0x1f,0x05&quot; &quot;0x1f,0x07&quot; &quot;0x1f,0x08&quot;" schema="kdebug-strings"/>
<table schema="dyld-library-load" target-pid="SINGLE"/>
<table schema="os-log-arg" category="PointsOfInterest"/>
<table codes="&quot;0x2b,0xdc&quot;" schema="kdebug" target="SINGLE"/>
<table target-pid="SINGLE" kdebug-match-rule="0" exclude-os-logs="0" schema="region-of-interest" signpost-code-map=""/>
<table codes="&quot;0x1,0x25&quot;" schema="kdebug" target="SINGLE"/>
<table target="SINGLE" schema="kdebug" codes="&quot;0x2b,0x87&quot;"/>
<table codes="&quot;0x2b,0x65&quot;" schema="kdebug"/>
<table codes="&quot;0x1f,0x7&quot;" schema="kdebug" target="SINGLE"/>
<table target="SINGLE" schema="kdebug" codes="&quot;0x2b,0xd8&quot;"/>
<table codes="&quot;0x2d,*&quot;" schema="kdebug" target="SINGLE" callstack="user"/>
<table schema="developer-thread-name-update"/>
<table target="SINGLE" schema="kdebug" codes="&quot;0x31,0xca&quot;"/>
<table codes="&quot;46,2&quot;" schema="kdebug" target="SINGLE" callstack="user"/>
<table schema="app-spin" target-pid="SINGLE"/>
<table codes="&quot;0x1,0xa&quot;" schema="kdebug" target="SINGLE" callstack="user"/>
<table codes="&quot;0x07,0x00&quot; &quot;0x1f,0x05&quot; &quot;0x1f,0x07&quot; &quot;0x1f,0x08&quot;" schema="kdebug" target="SINGLE" callstack="user"/>
<table target-pid="SINGLE" kdebug-match-rule="0" schema="global-roi-layout" signpost-code-map="" colorize-by-arg4="0"/>
<table codes="&quot;0x21,0xa&quot;" schema="kdebug" target="SINGLE" callstack="user"/>
<table schema="gcd-perf-event" target-pid="SINGLE"/>
<table schema="thread-name" target-pid="SINGLE"/>
<table target-pid="SINGLE" kdebug-match-rule="0" exclude-os-logs="0" schema="roi-metadata" signpost-code-map=""/>
<table sample-rate-micro-seconds="1000" callstack="user" schema="time-sample" target="SINGLE" all-thread-states="NO"/>
<table schema="os-signpost-arg" category="PointsOfInterest"/>
I think we'd be interested in the
<row><sample-time id="22" fmt="00:00.039.455">39455541</sample-time><thread ref="2"/><process ref="4"/><core ref="14"/><thread-state ref="8"/><weight ref="9"/><backtrace id="23" fmt="arrayvec::arrayvec::ArrayVec$LT$T$C$_$GT$::retain::h202491b79ad35b30 ← (11 other frames)"><process ref="4"/><text-addresses id="24" fmt="frag 1732">4331406713 4331406716</text-addresses><process ref="4"/><text-addresses id="25" fmt="frag 1733">4331403404 4331396176 4331396516 4331396516 4331396992 4331396636 4331396660 4331486836 4331398140 4336799884</text-addresses></backtrace></row>
<row><sample-time id="26" fmt="00:00.040.455">40455541</sample-time><thread ref="2"/><process ref="4"/><core ref="14"/><thread-state ref="8"/><weight ref="9"/><backtrace id="27" fmt="arrayvec::arrayvec::ArrayVec$LT$T$C$_$GT$::retain::h202491b79ad35b30 ← (11 other frames)"><process ref="4"/><text-addresses id="28" fmt="frag 1735">4331406669 4331406716</text-addresses><process ref="4"/><text-addresses ref="25"/></backtrace></row>
<row><sample-time id="29" fmt="00:00.041.455">41455541</sample-time><thread ref="2"/><process ref="4"/><core ref="14"/><thread-state ref="8"/><weight ref="9"/><backtrace id="30" fmt="shakmaty::attacks::attacks::ha1bbcd54d01602de ← (13 other frames)"><process ref="4"/><text-addresses id="31" fmt="frag 1737">4331408097 4331404772</text-addresses><process ref="4"/><text-addresses id="32" fmt="frag 1738">4331404284 4331406716 4331403404 4331396176 4331396516 4331396516 4331396992 4331396636 4331396660 4331486836 4331398140 4336799884</text-addresses></backtrace></row>
<row><sample-time id="33" fmt="00:00.042.455">42455541</sample-time><thread ref="2"/><process ref="4"/><core ref="14"/><thread-state ref="8"/><weight ref="9"/><backtrace id="34" fmt="retroboard::retroboard::RetroBoard::pseudo_legal_unmoves::h01fcdf6bff18987a ← (11 other frames)"><process ref="4"/><text-addresses id="35" fmt="frag 1740">4331400581 4331400208</text-addresses><process ref="4"/><text-addresses ref="21"/></backtrace></row>
This is where I'm not sure how to continue parsing it and reconstructing the tree, as I'm not very familiar with low-level CPU records and there is little documentation. I think that |
Hmm, interesting. It's not clear that there's quite enough information in there to construct a flamegraph, since it chops off the other frames in the backtrace (or at least it looks that way). Maybe there's enough context elsewhere to reconstruct the full call stack the way flamegraph requires, though. In general, the goal is to be able to output the total number of samples spent in each call stack leaf (so ~= function, but taking where that function was called from into account), not including its children, and to print that out along with the call stack that led to that function call. So, for example, if I'm afraid I know basically nothing about the |
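To make the target format concrete: a collapser ultimately emits one line per unique call stack, with frames joined by `;` (root first) followed by the number of samples whose leaf was that stack. Below is a minimal Python sketch of that folding step, assuming the stacks have already been extracted and symbolicated. `fold_stacks` and the sample data are hypothetical names made up for illustration; this is not inferno's implementation.

```python
from collections import Counter


def fold_stacks(samples):
    """Fold per-sample call stacks into 'root;...;leaf count' lines.

    `samples` is an iterable of stacks, each ordered root first, leaf last.
    A sample only counts toward the exact stack it was taken in, so the
    count for a stack excludes time spent in its children.
    """
    counts = Counter(";".join(stack) for stack in samples)
    # Sort so the output is deterministic.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples: two taken while `first` was running, one in `second`.
samples = [
    ["main", "second", "first"],
    ["main", "second", "first"],
    ["main", "second"],
]

for line in fold_stacks(samples):
    print(line)
```

An xctrace collapser would therefore need to produce, for every sample row, the full ordered list of frame names; the counting itself is the easy part.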
For what it's worth, I spent some time looking at the exported xml files. They make heavy use of references to nodes in the XML document. Some samples reference complete backtraces. Other samples split the backtrace into two parts, a unique portion (I think) and a shared portion. Here's an example of that.
<backtrace id="182" fmt="first ← (3 other frames)">
<process ref="141"/>
<text-addresses id="183" fmt="frag 1862">4309499693 4309499756</text-addresses>
<process ref="141"/>
<text-addresses ref="180"/>
</backtrace>
Note the
<backtrace id="178" fmt="first ← (3 other frames)">
<process ref="141"/>
<text-addresses id="179" fmt="frag 1859">4309499697 4309499756</text-addresses>
<process ref="141"/>
<text-addresses id="180" fmt="frag 1860">4309499756 4310323340</text-addresses>
</backtrace>
So I think the backtrace consists of these addresses: I don't know why there's a duplicated address (in bold above). In my experiments, there is usually, but not always, a duplicated address. Note that these addresses are subject to address space layout randomization. I wasn't able to find the dynamic image base addresses in the exported files. I considered using function name ( For example, this test binary has the following symbols.
$ nm a.out
0000000100000000 T __mh_execute_header
0000000100008000 S _dummy
0000000100003ed0 T _first
0000000100003d48 T _fourth
0000000100003f58 T _main
U _puts
0000000100003e48 T _second
0000000100003dc0 T _third
The difference between the first text address and the address of We can subtract that value from each of the addresses in the backtrace.
The second and third are duplicates as mentioned above and the fourth is wrong because it's not actually from
The You might notice that the first address is odd. I have no idea what to make of that!
Maybe it refers to address 0x100003f2c. I'm not sure. Here's the initial disassembly of
The address 0x100003f6c makes sense. It's the address of the instruction following the I stopped looking at this when I first saw the odd address a week or so ago, so this is the end of my analysis. Open questions:
|
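The analysis in the comment above can be sketched in Python: resolve the `ref` attributes on `text-addresses` to rebuild the full address list of a backtrace, subtract the ASLR slide, and look each address up in the `nm` listing. The XML is the pair of backtraces quoted in that comment, wrapped in a made-up `<rows>` root. The slide value is an assumption on my part, inferred from the comment pairing the raw address 4309499756 with 0x100003f6c; `backtrace_addresses`, `symbolize`, `SLIDE` and `SYMBOLS` are hypothetical names.

```python
import bisect
import xml.etree.ElementTree as ET

XML = """<rows>
<backtrace id="178" fmt="first ← (3 other frames)">
  <process ref="141"/>
  <text-addresses id="179" fmt="frag 1859">4309499697 4309499756</text-addresses>
  <process ref="141"/>
  <text-addresses id="180" fmt="frag 1860">4309499756 4310323340</text-addresses>
</backtrace>
<backtrace id="182" fmt="first ← (3 other frames)">
  <process ref="141"/>
  <text-addresses id="183" fmt="frag 1862">4309499693 4309499756</text-addresses>
  <process ref="141"/>
  <text-addresses ref="180"/>
</backtrace>
</rows>"""
root = ET.fromstring(XML)


def backtrace_addresses(root, backtrace_id):
    """Return the concatenated addresses of one backtrace, following refs."""
    # First pass: index every <text-addresses> fragment that defines an id.
    frags = {
        el.get("id"): [int(a) for a in el.text.split()]
        for el in root.iter("text-addresses")
        if el.get("id") is not None
    }
    # Second pass: join the fragments of the requested backtrace in order.
    bt = next(b for b in root.iter("backtrace") if b.get("id") == backtrace_id)
    addrs = []
    for el in bt.findall("text-addresses"):
        addrs.extend(frags[el.get("ref") or el.get("id")])
    return addrs


# Assumption: slide inferred from 4309499756 corresponding to 0x100003f6c.
SLIDE = 4309499756 - 0x100003F6C

# Text symbols from the `nm a.out` listing, sorted by address.
SYMBOLS = [
    (0x100003D48, "_fourth"),
    (0x100003DC0, "_third"),
    (0x100003E48, "_second"),
    (0x100003ED0, "_first"),
    (0x100003F58, "_main"),
]
STARTS = [start for start, _ in SYMBOLS]


def symbolize(addr):
    """Map a raw sample address to (symbol, offset), or None if below all symbols."""
    unslid = addr - SLIDE
    i = bisect.bisect_right(STARTS, unslid) - 1
    if i < 0:
        return None
    return SYMBOLS[i][1], unslid - STARTS[i]


for addr in backtrace_addresses(root, "182"):
    print(hex(addr - SLIDE), symbolize(addr))
```

This lookup has no upper bound for the image, so an address that lies outside the binary (such as the last one in this backtrace) is wrongly attributed to the last symbol. A real implementation would need per-image load addresses and sizes. It also does nothing about the duplicated or odd addresses raised as open questions.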
Hey, thanks for your investigation. On my end, I also tried to reconstruct the backtrace from the addresses but failed. I was under the assumption that several text-addresses elements meant multiple threads.
A month ago I created a post about it on the Apple forums, where I was instructed to file a feedback report, and I haven’t gotten any update since :| Bumping the thread may be of interest to those who have an Apple developer account. |
There's some existing parsing code for this format in speedscope and in a Firefox Profiler PR, but neither works with the format from recent Xcode versions. Anyone interested in implementing this may want to consult firefox-devtools/profiler#3684 (comment) and follow various links from there. |
Unfortunately, all links there are outdated. I feel upvoting the question on the Apple forums is our best bet. Maybe asking on SO could be beneficial too. |
I have created an SO question hoping for someone more knowledgeable about |
I just stumbled upon this thread. I’ve built a tool that can convert Instruments traces into the Gecko format for Firefox Profiler. It can read the samples from XCTrace, extract the load address for all loaded libraries from the KDebug tracepoints, and desymbolicate using that. I’m going to see if I can open source this; then you could use it as a frame of reference (it's written in Kotlin). My use case was that I wanted interactive flame graphs and, more importantly, a time-ordered sample view. In the meantime, here is a rundown of how this works...
Background
First, a quick primer on the XCTrace output format. XCTrace tries to avoid redundancies in the output by using reference nodes. Each node with a given tag, set of attributes, and value is given a unique ID for that tag. Future nodes that use the same tag, attributes, and values will include a
<kdebug-class id="1" fmt="0x1">1</kdebug-class>
<!-- Equivalent to <kdebug-class id="1" fmt="0x1">1</kdebug-class> -->
<kdebug-class ref="1"/>
Documentation on the set of tables available and their formats can be found by opening up a trace file in Instruments and then going to Secondly, the XCTrace file contains For example, we can find the launch executable kdebug event, which contains the loadAddress of the main application
launchTraceID = dyld3::kdebug_trace_dyld_duration_start(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, (uint64_t)state.config.process.mainExecutable, 0, 0);
This event corresponds to
Note: As shown below, we'll actually want to use a different tracepoint, but this one is simpler to understand.
Summary
At a high level, we need to first extract the backtraces from the XCTrace format and then find the load addresses for every library loaded by dyld using the kdebug event info.
1. Extract backtraces
Find
Note, the backtrace node
3. Find load addresses
We can get the library name and its load address using
Find the relevant From there, get the relevant arguments from each. The 2nd arg of Given that string ID, go and find the following entry
e.g.
4. De-symbolicate
Given the backtraces and addresses, de-symbolicate using a tool like
# Replace `16.1` with your device / arch
# Library name comes from kdebug strings above
~/Library/Developer/Xcode/iOS\ DeviceSupport/16.1\ \(20B82\)\ arm64e/Symbols/<libraryName>
# Run the list of addresses through `atos` for every dSYM file
# Unfortunately, this is really slow. Maybe there is a better way. It takes ~1m30s for ~800 dSYMs on my M1 MacBook even when parallelized
$ atos -arch arm64e -o <dSymFile> -l <loadAddress> -f <file with all addresses>
Running this for each dSYM will give you a list of partially de-obfuscated symbols. Merge the lists together (preserve the lines that don't start with With that, you can take the backtraces and the mapping and transform into some other format. |
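As a small illustration of step 4, the `atos` invocation quoted above can be generated once per library from a mapping of symbol file to load address. This Python sketch only builds and prints the command lines; it does not run `atos`. The function name `atos_command`, the example library path, and its load address are all hypothetical, and the flags are exactly those in the quoted command.

```python
import shlex


def atos_command(symbol_file, load_address, address_file, arch="arm64e"):
    """Build the argv for one atos run, mirroring the command quoted above."""
    return [
        "atos",
        "-arch", arch,
        "-o", symbol_file,
        "-l", hex(load_address),
        "-f", address_file,
    ]


# Hypothetical mapping, standing in for what the kdebug tables would yield.
load_addresses = {
    "Symbols/usr/lib/libexample.dylib": 0x1A2B3C000,
}

for symbol_file, load_address in load_addresses.items():
    cmd = atos_command(symbol_file, load_address, "addresses.txt")
    print(shlex.join(cmd))
```

On macOS each list could be passed to `subprocess.run(cmd, capture_output=True, text=True)`, and the per-library outputs merged afterwards as described above.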
This sounds excellent. It would be amazing if you could open source your code. I'd love to integrate a Rust version of it into samply, which can perform fast symbolication and also allows the source view in the Firefox profiler to work. |
Here is a repo with the logic to convert Instruments to Gecko: https://github.com/benjaminRomano/instruments-to-gecko |
Thank you very much for sharing your findings and code @benjaminRomano! Quick question: I see that you parse the backtraces and then de-symbolicate. I think that for inferno it would make more sense to take de-symbolicated backtraces as input. The preprocessing would be left to the user. Do you think that would cause issues? Can we just de-symbolicate beforehand with atos and replace the addresses in the backtrace file without corrupting it? |
So the If we can figure that out, it greatly simplifies the conversion and also avoids one of the issues I ran into, where the KDebug tracepoints with the load addresses only get emitted on process creation. |
@kraktus XCode 14.3 Beta updated |
Very nice! Apple recently got back to me on my Feedback Assistant request with the same answer, but I wasn’t able to install the beta version because I lacked developer status at the time. |
This is a crosspost from the flamegraph repo: flamegraph-rs/flamegraph#192
Although I'm not familiar with the codebase, I'd be ready to give it a go, but would it fall within the scope of inferno, or do you not want to support an additional profiler?