[REP-2014] Benchmarking performance in ROS 2 #364
Conversation
Signed-off-by: Víctor Mayoral Vilches <[email protected]>
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2022-9-15/27394/2
Signed-off-by: Víctor Mayoral Vilches <[email protected]>
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/robotic-processing-unit-and-robotcore/27562/21
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/rep-2014-rfc-benchmarking-performance-in-ros-2/27770/1
@christophebedard, @iluetkeb, @anup-pem, @dejanpan, @kylemarcey (and surely many others), this work is heavily inspired by your prior work. As pointed out here, I'd love to have your views on this and wouldn't mind co-authoring it with you.
rep-2014.rst (Outdated)
- `benchmarking`: a method of comparing the performance of various systems by running a common test.

From these definitions, one can determine that benchmarking and tracing are inherently connected, in the sense that the test/benchmark will use a series of measurements for comparison. These measurements will come from tracing probes. In other words, tracing will collect data that will then be fed into a benchmark program for comparison.
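For illustration only (not part of the diff): a minimal sketch of that relationship, where per-run measurements collected by tracing probes are fed into a simple benchmark comparison. The file names, metric and threshold below are hypothetical.

```python
import json
import statistics

def load_latencies_ms(path):
    # Hypothetical format: a JSON list of per-message latencies (ms),
    # previously extracted from the trace data of one benchmark run.
    with open(path) as f:
        return json.load(f)

baseline = load_latencies_ms("baseline_run.json")
candidate = load_latencies_ms("candidate_run.json")

# The "benchmark program for comparison": compare summary statistics of the two runs.
for name, data in (("baseline", baseline), ("candidate", candidate)):
    print(f"{name}: mean={statistics.mean(data):.3f} ms "
          f"p95={sorted(data)[int(0.95 * (len(data) - 1))]:.3f} ms")

regression = statistics.mean(candidate) > 1.05 * statistics.mean(baseline)
print("regression detected" if regression else "within 5% of baseline")
```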
Measurements don't have to come from tracing, right? The "black box performance testing" section above seems to say that.
The definition of tracing given above is so broad that it encompasses any kind of procedure "used to understand what goes on in a running software system". I haven't seen such a broad definition before; usually, tracing is defined as logging (partial) execution information while the system is running. Common usage is that these probes try to be very low overhead and log only information available directly at the tracepoint. This narrower definition of tracing would not encompass all kinds of performance data capture I could envision (for example, another common approach is to compute internal statistics and make them available in various ways).
Unless it is really intended to restrict data capture to tracing, which would need a justification, I would suggest using the more common, narrow definition of the term in the definition section, and then edit this paragraph to encompass diverse methods of capturing data.
rep-2014.rst (Outdated)

Methodology for benchmarking performance in ROS 2
=================================================

In this REP, we **recommend adopting a grey-box and non-functional benchmarking approach** to measure performance and allow evaluating individual ROS 2 nodes as well as complete computational graphs. To realize it in an architecture-neutral, representative, and reproducible manner, we also recommend using the Linux Tracing Toolkit next generation (`LTTng <https://lttng.org/>`_) through the `ros2_tracing` project, which leverages probes already inserted in the ROS 2 core layers and tools to facilitate benchmarking ROS 2 abstractions.
This section seems to be the most important part of the REP. I'd recommend moving this to the top. I also recommend removing any definitions not used here.
I agree about the importance, but I struggle to see how this could help a person inexperienced in the topic. This paragraph relies on prior definitions, including what benchmarking is and what grey-box and non-functional approaches to it are. Without properly defining these things, it might be very hard to understand it properly.
We bumped into such a scenario at ros-navigation/navigation2#2788, where after a few interactions it became somewhat clear to me that, even for experienced ROS developers, the value of benchmarking things with this approach wasn't being properly understood (due to a lack of familiarity with it). Lots of unnecessary demands were made because the approach was simply not understood, and a much less rigorous approach to benchmarking the nav stack was taken later. In my experience the topics introduced here are hard to digest and require proper context.
Maybe a compromise would be to grab these bits and rewrite them without introducing too much terminology in the abstract of the REP? @sloretz feel free to make a suggestion.
@vmayoral really like this effort here - great work!
Co-authored-by: Ingo Lütkebohle <[email protected]> Co-authored-by: Shane Loretz <[email protected]> Signed-off-by: Víctor Mayoral Vilches <[email protected]>
Co-authored-by: Shane Loretz <[email protected]> Signed-off-by: Víctor Mayoral Vilches <[email protected]>
Co-authored-by: Ingo Lütkebohle <[email protected]>
Co-authored-by: Evan Flynn <[email protected]>
Signed-off-by: Víctor Mayoral Vilches <[email protected]>
A quantitative approach
-----------------------
Performance must be studied with real examples and measurements on real robotic computations, rather than simply as a collection of definitions, designs and/or marketing actions. When creating benchmarks, prefer to use realistic data and situations rather than data designed to test one edge case. Follow the quantitative approach [1]_ when designing your architecture.
Could you elaborate on this comparison? It makes sense to study performance with real examples and measurements. Could you clarify what this is recommending against? What does it mean to study performance as a collection of definitions, designs, and/or marketing actions?
1. instrument both the target ROS 2 abstraction/application using `LTTng <https://lttng.org/>`_. Refer to `ros2_tracing <https://github.com/ros2/ros2_tracing>`_ for tools, documentation and ROS 2 core layers tracepoints;
Suggested change:
- 1. instrument both the target ROS 2 abstraction/application using `LTTng <https://lttng.org/>`_. Refer to `ros2_tracing <https://github.com/ros2/ros2_tracing>`_ for tools, documentation and ROS 2 core layers tracepoints;
+ 1. instrument the target ROS 2 application using `LTTng <https://lttng.org/>`_. Refer to `ros2_tracing <https://github.com/ros2/ros2_tracing>`_ for tools, documentation and ROS 2 core layers tracepoints;
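For readers less familiar with the tooling, here is a minimal, illustrative sketch of what "instrument with LTTng and inspect the result" can look like: after recording a session (e.g. with `ros2 trace`), the babeltrace2 Python bindings can list which ROS 2 tracepoints actually fired. The trace path is hypothetical, and the exact set of `ros2:*` events depends on the ROS 2 version and the instrumentation that is enabled.

```python
import collections

import bt2  # babeltrace2 Python bindings, used to read LTTng (CTF) traces

# Hypothetical output directory of a previously recorded tracing session.
trace_path = "/tmp/my-ros2-trace/ust"

# Count how many times each ROS 2 tracepoint fired during the run.
counts = collections.Counter()
for msg in bt2.TraceCollectionMessageIterator(trace_path):
    if type(msg) is bt2._EventMessageConst and msg.event.name.startswith("ros2:"):
        counts[msg.event.name] += 1

for name, count in counts.most_common():
    print(f"{name}: {count}")
```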
Methodology for benchmarking performance in ROS 2
=================================================

In this REP, we **recommend adopting a grey-box and non-functional benchmarking approach** to measure performance and allow evaluating individual ROS 2 nodes as well as complete computational graphs. To realize it in an architecture-neutral, representative, and reproducible manner, we also recommend using the Linux Tracing Toolkit next generation (`LTTng <https://lttng.org/>`_) through the `ros2_tracing` project, which leverages probes already inserted in the ROS 2 core layers and tools to facilitate benchmarking ROS 2 abstractions.
... benchmarking ROS 2 abstractions.
What does it mean to benchmark a ROS 2 abstraction? Should this instead be "applications"?
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-06-15/32038/1
First off, I'm sorry to be harsh, but I think you'd rather have me be honest and concise than verbose while communicating the same thoughts in 10x as many words of fluff. The first 178 lines of the REP are just background context, which is genuinely useful but doesn't actually culminate in any prescriptive testing/evaluation procedures or test-stand setup standardization. There is no standard within this proposed standard. The best I see are 2 non-tangible recommendations without any specificity. The proposed REP points to a package, but a package is not a standard, nor is it a proxy for a written set of instructions, explicit aims and processes, and requirements which might have been used to create a particular implementation. This is a real problem if you're proposing a standard to fit an implementation rather than writing a standard based on requirements and needs and then proposing an implementation alongside it that meets the standard. Personally, I would not approve this since the document contains no standard. Separately, I think more or less this sentence included in the REP explains why this probably shouldn't be a REP:
If there is not even generally understood standards or benchmarking outlines for robotics, it seems like we're just making
This feels better suited as a paper or something in literature. I applaud the effort; something like this to standardize how we take performance metrics of ROS-based systems has great utility, but I think for a standard it needs to have more detail with respect to the process and fulfilled requirements, and be based on agreed-upon best practices.
Thanks for the input @SteveMacenski. Processing this now. Let me take your feedback, try iterating on the content, and throw it back to you and others for review before the next meeting.
I disagree with this input @SteveMacenski. As discussed during the last TSC meeting, it's relevant to understand that there are different types of REPs (see REP-0001). What you refer to are Standards Track REPs. This is not one of them, and it was never meant to be. This has always been an Informational REP, and its content/structure follows various other existing/approved ones. Let me quote that for you:
Having remarked all of this, as pointed out above, I think there's value in your input. Thanks for that. Let me see what I can do about it to improve this REP PR.
Ah I see, that is my mistake then. I was not aware of the different REP tracks 😄 Given that context, I think that this is sufficient, but some additional high-level description of what ros2_tracing is measuring/doing, in particular, would add value for a reader. But that's more of a request for some useful context than the standards-level formality I was thinking of previously. I retract my previous hard-line position!
The reader is referred to `ros2_tracing <https://github.com/ros2/ros2_tracing>`_ and `LTTng <https://lttng.org/>`_ for the tools that enable a grey-boxed performance benchmarking approach in line with what the ROS 2 stack has been using (ROS 2 common packages come instrumented with LTTng probes). In addition, [3]_ and [4]_ present comprehensive descriptions of the `ros2_tracing <https://github.com/ros2/ros2_tracing>`_ tools and the `LTTng <https://lttng.org/>`_ infrastructure.
Reference implementations complying with the recommendations of this REP can be found in literature for applications like perception and mapping [5]_, hardware acceleration [6]_ [7]_ or self-driving mobility [8]_. A particular example of interest for the reader is the instrumentation of the `image_pipeline <https://github.com/ros-perception/image_pipeline/tree/humble/>`_ ROS 2 package [12]_, which is a set of nodes for processing image data in ROS 2. The `image_pipeline <https://github.com/ros-perception/image_pipeline/tree/humble/>`_ package has been instrumented with LTTng probes available in the ROS 2 `Humble` release, which results in various perception Components (e.g. the `RectifyNode <https://github.com/ros-perception/image_pipeline/blob/ros2/image_proc/src/rectify.cpp#L82/>`_ *Component*) leveraging instrumentation which, if enabled, can help trace the computational graph information flow of a ROS 2 application using such a Component. The results of benchmarking the performance of `image_pipeline <https://github.com/ros-perception/image_pipeline/tree/humble/>`_ are available in [13]_, and launch scripts to both trace and analyze perception graphs are available in [14]_.
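As a rough illustration of what such tracing enables (a sketch, not code from those references): with the instrumentation enabled and a trace recorded, per-callback execution times can be derived by pairing `ros2:callback_start`/`ros2:callback_end` events. The trace path below is hypothetical, and a real analysis (for example with tracetools_analysis) would additionally correlate callback handles with the node and callback they belong to, which this sketch does not do.

```python
import bt2  # babeltrace2 Python bindings (reads LTTng/CTF traces)

# Hypothetical trace path recorded while the perception graph was running.
trace_path = "/tmp/rectify-benchmark/ust"

starts = {}      # callback handle -> start timestamp (ns)
durations = []   # per-invocation callback durations (ms)

for msg in bt2.TraceCollectionMessageIterator(trace_path):
    if type(msg) is not bt2._EventMessageConst:
        continue
    event = msg.event
    ts = msg.default_clock_snapshot.ns_from_origin
    if event.name == "ros2:callback_start":
        starts[int(event["callback"])] = ts
    elif event.name == "ros2:callback_end":
        start = starts.pop(int(event["callback"]), None)
        if start is not None:
            durations.append((ts - start) / 1e6)

if durations:
    durations.sort()
    print(f"n={len(durations)} "
          f"min={durations[0]:.3f} ms "
          f"p50={durations[len(durations) // 2]:.3f} ms "
          f"max={durations[-1]:.3f} ms")
```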
I suggest we make this section more explicit via a bulleted list. E.g. "Here are reference implementations for particular application domains. Developers in these domains are encouraged to format their results in a similar style."
Methodology for benchmarking performance in ROS 2
=================================================

In this REP, we **recommend adopting a grey-box and non-functional benchmarking approach** to measure performance and allow evaluating individual ROS 2 nodes as well as complete computational graphs. To realize it in an architecture-neutral, representative, and reproducible manner, we also recommend using the Linux Tracing Toolkit next generation (`LTTng <https://lttng.org/>`_) through the `ros2_tracing` project, which leverages probes already inserted in the ROS 2 core layers and tools to facilitate benchmarking ROS 2 abstractions.
There is a lot of lead-up to this section, which I think is important, but this is the "meat" of the REP. I think it might be helpful to make this section more self-contained and explicitly state what is meant by gray-box and non-functional.
Personally, I don't find these definitions particularly clear:
"Grey-box performance testing: More application-specific measure which is capable of watching internal states of the system and can measure (probe) certain points in the system, thus generate the measurement data with minimal interference. Requires to instrument the complete application."
Just to clarify, gray box here means the code is instrumented at compile time with an external tracer, as opposed to, say, just creating a time object and measuring the elapsed time?
Non-functional performance testing: The measurable performance of those aspects that don't belong to the system's functions. In the motion planning example, the latency of the planner, the memory consumption, the CPU usage, etc.
Just to clarify this section, do we mean that tests should be "end-to-end", as in measuring from the time a "function" is called until it returns? This would be in contrast to, say, measuring every sub-routine of a "function" and only evaluating the "meaty" part of the system.
Since this is a recommendation to the community, many of whom are not experts in this sub-domain, we need to be clear, plain-spoken, and direct with our recommendations. A smart undergrad should be able to read this REP and clearly understand how to approach profiling.
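To make the distinction concrete, here is an illustrative sketch (not from the REP): black-box timing wraps the call from the outside, while grey-box probes are emitted from inside the system and recorded by an external tracer. The function and event names below are hypothetical, and the `probe` stand-in would be an LTTng tracepoint in the recommended approach.

```python
import time

def plan_path(start, goal):
    # Hypothetical workload standing in for a planner.
    time.sleep(0.01)
    return [start, goal]

# Black-box: measure the call from the outside only; no visibility into internals.
t0 = time.perf_counter()
plan_path((0, 0), (5, 5))
print(f"black-box latency: {(time.perf_counter() - t0) * 1e3:.2f} ms")

# Grey-box: the system itself carries probes at points of interest, which an
# external tracer timestamps and stores with minimal interference.
def probe(event_name):
    # Stand-in only: in the grey-box approach this would be an LTTng tracepoint,
    # not a print statement.
    print(f"[probe] {event_name} @ {time.monotonic_ns()} ns")

def plan_path_instrumented(start, goal):
    probe("planner:plan_start")
    result = plan_path(start, goal)
    probe("planner:plan_end")
    return result

plan_path_instrumented((0, 0), (5, 5))
```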
I did a quick re-read before the TSC meeting. I left some feedback related to document clarity.
Abstract
========

This REP describes some principles and guidelines for benchmarking performance in ROS 2.
Can you make this "This Informational REP" to make it clear this is not a Standard REP?
Also, can you add "This REP then provides recommendations for tools to use when benchmarking ROS 2, and reference material for prior work done using those tools."
It's clearly specified at the top. See https://github.com/ros-infrastructure/rep/pull/364/files#diff-26ca1241a786a8a285a3dc326474f69a9b971c06eab2db73c0ddd53f8a74ae1cR5. Past work is detailed below.
[ASCII diagram from the diff: two side-by-side boxes, one labeled "Transparent" and one labeled "Opaque"]
You don't describe the difference between transparent and opaque. You should either add that or remove the diagrams.
- `performance_test <https://gitlab.com/ApexAI/performance_test/>`_: Tool designed to measure inter- and intra-process communications. Runs at least one publisher and at least one subscriber, each in an independent thread or process, and records different performance metrics. It also provides a way to generate a report with the results through a different package.
- `reference_system <https://github.com/ros-realtime/reference-system/>`_: Tool designed to provide a framework for creating reference systems that can represent real-world distributed systems, in order to more fairly compare various configurations of each system (e.g. measuring the performance of different ROS 2 executors). It also provides a way to generate reports.
- `ros2-performance <https://github.com/irobot-ros/ros2-performance/>`_: Another framework to evaluate ROS communications, inspired by `performance_test`. There's a decent rationale in the form of a proposal, a good evaluation of prior work and a well-documented set of experiments.
- `system_metrics_collector <https://github.com/ros-tooling/system_metrics_collector/>`_: A lightweight and *real-time* metrics collector for ROS 2. Automatically collects and aggregates *CPU* % and *memory* % performance metrics used by both system and ROS 2 processes. Data is aggregated in order to provide constant-time average, min, max, sample count, and standard deviation values for each collected metric. *Deprecated*.
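For context, a rough illustration (not the package's actual code) of the kind of constant-time aggregation described for `system_metrics_collector`: a running, Welford-style update keeps average, min, max, sample count and standard deviation without buffering samples.

```python
import math

class RunningStats:
    """Constant-time, constant-memory aggregation of a metric stream:
    average, min, max, sample count and standard deviation."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, sample: float) -> None:
        # Welford's online update: O(1) per sample, no sample buffer kept.
        self.count += 1
        delta = sample - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (sample - self.mean)
        self.minimum = min(self.minimum, sample)
        self.maximum = max(self.maximum, sample)

    @property
    def stddev(self) -> float:
        return math.sqrt(self._m2 / self.count) if self.count else 0.0

# Example: aggregate hypothetical CPU-usage samples (percent).
stats = RunningStats()
for cpu_percent in [12.0, 18.5, 9.75, 22.0, 15.25]:
    stats.add(cpu_percent)
print(stats.count, stats.mean, stats.minimum, stats.maximum, stats.stddev)
```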
I would put the Deprecated comment at the beginning of the description to make it more visible.
Methodology for benchmarking performance in ROS 2
=================================================

In this REP, we **recommend adopting a grey-box and non-functional benchmarking approach** to measure performance and allow evaluating individual ROS 2 nodes as well as complete computational graphs. To realize it in an architecture-neutral, representative, and reproducible manner, we also recommend using the Linux Tracing Toolkit next generation (`LTTng <https://lttng.org/>`_) through the `ros2_tracing` project, which leverages probes already inserted in the ROS 2 core layers and tools to facilitate benchmarking ROS 2 abstractions.
You state the recommendation without giving a summary statement of "why" you recommend it. For example: "We recommend using grey-box testing because of reasons 1, 2 and 3."
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-07-20/32525/1
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/project-proposal-ros2-tool-to-collect-topic-metrics/32790/2
Hi Everyone, this REP was reviewed and voted on by the ROS 2 TSC in our August 2023 meeting. The vote was in favor of rejecting REP-2014. We plan to merge REP-2014 with status "rejected" so the information is still available. The anonymous feedback from the TSC members was that the REP contains valuable information, but does not constitute a community standard. Some TSC members suggested REP-2014 would better serve the community as part of the ROS 2 Docs or as a stand-alone blog post.
Signed-off-by: Chris Lalancette <[email protected]>
Based on the TSC decision, I've marked this REP as "Rejected", and will merge it as such.
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/4
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/6
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/10
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/11
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/12
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/16
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/17
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/25
This pull request has been mentioned on ROS Discourse. There might be relevant details there:
Signed-off-by: Víctor Mayoral Vilches <[email protected]>