CPU spikes of existing nodes when starting new node #741

Open
tonynajjar opened this issue Feb 16, 2024 · 18 comments

tonynajjar commented Feb 16, 2024

Bug report

Required Info:

  • Operating System:

    • Ubuntu 22.04
  • Installation type:

    • binaries
  • Version or commit hash:

    • Humble
    • ros-humble-fastrtps/now 2.6.7-1jammy.20240125.204216 amd64 [installed,local]
    • ros-humble-rmw-fastrtps-cpp/now 6.2.6-1jammy.20240125.215950 amd64 [installed,local]
  • DDS implementation:

    • FastDDS

Steps to reproduce issue

  1. Use the default XML configuration (FASTRTPS_DEFAULT_PROFILES_FILE not set)
  2. Have the robot bringup running with about 70 nodes, consuming a decent amount of CPU (across several Docker containers sharing network and IPC with the host, if that is relevant)
  3. Launch some node on the side (e.g. teleop, ros2 topic echo, etc.)
  4. Witness a CPU spike

Expected behavior

No considerable CPU spike for the existing nodes

Actual behavior

CPU spikes for a few seconds for all the nodes, to about double their consumption! I'm guessing it has to do with discovery?

Additional information

I quickly tried Cyclone DDS and did not see the CPU spike, but I would like to fix this with Fast DDS if possible (otherwise I will have to switch)

@tonynajjar changed the title from "CPU spikes when starting new node" to "CPU spikes of existing nodes when starting new node" on Feb 16, 2024
tonynajjar (Author) commented Feb 16, 2024

With some experimentation I also noticed that the more existing nodes there are, the higher the CPU rise when an extra node is added to the network.

@fujitatomoya (Collaborator)

@tonynajjar thanks for creating the issue. we have been running into a similar situation...

a couple of things,

  • Initial Announcements can be related to the CPU usage spike during the discovery process. Depending on network resources and reliability, and on your discovery-latency requirements, tuning this could mitigate the CPU spike during the initial discovery phase. (i believe this setting is also applied to Endpoint Discovery.)
  • Do you happen to use ROS 2 Security Enclaves? Enabling security adds more work, such as handshaking, during the discovery process.

I am not sure whether the ROS 2 Fast-DDS Discovery Server is an option for you, since it changes the architecture; but if it is acceptable, it will reduce the discovery cost significantly.
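
For reference, a minimal sketch of trying the Discovery Server on Humble (the server id, address, and port below are the documented defaults; adjust as needed):

# Terminal 1: start a discovery server (listens on port 11811 by default)
fastdds discovery --server-id 0

# Every other terminal/container: point participants at the server and
# restart the ROS 2 daemon so the CLI tools pick up the new setting
export ROS_DISCOVERY_SERVER=127.0.0.1:11811
ros2 daemon stop
ros2 run demo_nodes_cpp talker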

@fujitatomoya (Collaborator)

CC: @MiguelCompany @EduPonz

tonynajjar (Author) commented Feb 17, 2024

Thanks for your answer @fujitatomoya.

  • We do not use security enclaves.

  • I did quickly try to make the Discovery Server work, to confirm this is a discovery issue, but failed for some reason; maybe because of my Docker setup, I am not sure. In any case, in the long run I'd like to avoid the Discovery Server (no strong reason, but it feels like going back to the centralized ROS 1 approach, which was criticized and changed in ROS 2).

  • Regarding the Initial Announcements, are you proposing testing out something with the config? I'm really not a DDS configuration expert (like most roboticists), so you'll have to spell it out for me 😅

tonynajjar (Author) commented Feb 17, 2024

Depending on network resources and reliability, and on your discovery-latency requirements

Your comment reminded me to clarify that all the nodes are running on one machine, so I guess the issue can't be caused by a suboptimal network.

@fujitatomoya (Collaborator)

Regarding the Initial Announcements, are you proposing testing out something with the config?

i think you can create DEFAULT_FASTRTPS_PROFILES.xml in the running directory where you issue ros2 run xxx, and it should be loaded by Fast-DDS. (below, the initial announcement count is changed from 5 to 1 and the period from 100 msec to 500 msec.)

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <participant profile_name="participant_profile_simple_discovery" is_default_profile="true">
        <rtps>
            <builtin>
                <discovery_config>
                    <initialAnnouncements>
                        <count>1</count>
                        <period>
                            <nanosec>500000000</nanosec>
                        </period>
                    </initialAnnouncements>
                </discovery_config>
            </builtin>
        </rtps>
    </participant>
</profiles>
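
If the running-directory approach is inconvenient, pointing the FASTRTPS_DEFAULT_PROFILES_FILE environment variable (mentioned in the report above) at the file should also work; the path below is a placeholder:

export FASTRTPS_DEFAULT_PROFILES_FILE=/path/to/DEFAULT_FASTRTPS_PROFILES.xml
ros2 run demo_nodes_cpp talker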

my expectation here is,

  • with the existing 70 ROS 2 contexts (70 participants), a new ROS 2 node (context) sends its initial discovery packet 5 times at 100 msec intervals by default, and each of the 70 receivers answers every packet with its own participant information, i.e. roughly 5 x 70 = 350 exchanges within half a second. this could generate the CPU usage spike. (reliable, low-latency discovery, but expensive?)
  • if we have all nodes on localhost, the network is reliable enough, so we could send just a single-shot initial announcement for each participant?

anyway, i would like to have eProsima's opinion on this.
hopefully this helps,

EduPonz commented Feb 18, 2024

Thanks @fujitatomoya, this is indeed what I would have suggested to try out as well. Please @tonynajjar do let us know how it goes.

@tonynajjar (Author)

Thank you for your recommendation. Unfortunately it did not work; all the nodes on my localhost network are running with this configuration:

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <participant profile_name="participant_profile_simple_discovery" is_default_profile="true">
        <rtps>
            <builtin>
                <discovery_config>
                    <initialAnnouncements>
                        <count>1</count>
                        <period>
                            <nanosec>500000000</nanosec>
                        </period>
                    </initialAnnouncements>
                </discovery_config>
            </builtin>
        </rtps>
    </participant>
</profiles>

I still get the CPU spike

@fujitatomoya (Collaborator)

@tonynajjar i am curious, what command did you use for this verification? e.g. ros2 topic xxx without the daemon running?

@tonynajjar (Author)

@tonynajjar i am curious, what command did you use for this verification? e.g. ros2 topic xxx without the daemon running?

I just started some custom teleop node, but I think ros2 topic echo xxx would also cause the spike; it has in the past.

tonynajjar (Author) commented Feb 26, 2024

Any alternative solutions I could try? Could one of the maintainers try to reproduce this, so that we at least know for sure it is not a local/configuration issue? If we can confirm it, I think this bug deserves some high-priority attention: for applications already reaching the limits of CPU consumption, it would be a deal breaker for using Fast DDS.

@fujitatomoya (Collaborator)

@tonynajjar
CC: @EduPonz

I still get the CPU spike

i think there is still a spike after the configuration is applied, but i would expect the spike period to be mitigated and CPU consumption to come down quicker than before. if you are seeing no difference at all, maybe the configuration is not applied; make sure that DEFAULT_FASTRTPS_PROFILES.xml is in the running directory where you issue ros2 run xxx.

something else i would try is disabling the shared memory transport.
our experience tells us that shared memory transport provides good performance and latency, but uses more CPU resources in the application; if it is disabled, the load shifts to the network interface instead.

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <transport_descriptor>
            <transport_id>udp_transport</transport_id>
            <type>UDPv4</type>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="UDPParticipant" is_default_profile="true">
        <rtps>
            <userTransports>
                <transport_id>udp_transport</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>
</profiles>

if nothing above works, that is out of my league...

@tonynajjar
Copy link
Author

Thank you for your answer. I'm pretty sure the configuration was applied; I made sure by introducing a typo and seeing errors when launching the nodes.
I didn't see much difference. Maybe I didn't look in great detail, but even if the spike goes away quicker than before, having it in the first place is not acceptable for my application.

Regarding disabling shared memory, I think I already tried that, but I can't remember for sure; I'll give it another shot in the next few days.

I'd appreciate it if someone could try reproducing this. I'll try to create a minimal reproducible launch file, e.g. launching 40 talkers and 40 listeners.

tonynajjar (Author) commented Mar 6, 2024

from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    # Initialize an empty list to hold all the nodes
    nodes = []

    # Define the number of talkers and listeners
    num = 40

    # Create talker nodes
    for i in range(num):
        talker_node = Node(
            package='demo_nodes_cpp',
            executable='talker',
            namespace='talker_' + str(i),  # Use namespace to avoid conflicts
            name='talker_' + str(i)
        )
        nodes.append(talker_node)

    # Create listener nodes
    for i in range(num):
        listener_node = Node(
            package='demo_nodes_cpp',
            executable='listener',
            namespace='listener_' + str(i),  # Use namespace to avoid conflicts
            name='listener_' + str(i),
            remappings=[
                (f"/listener_{str(i)}/chatter", f"/talker_{str(i)}/chatter"),
            ],
        )
        nodes.append(listener_node)

    # Create the launch description with all the nodes
    return LaunchDescription(nodes)

Here is a launch file for you to reproduce the issue. After launching it, run ros2 run demo_nodes_cpp listener in another terminal and watch in htop as the CPU usage of all nodes gets multiplied by 2-3; a monitoring one-liner is sketched below.
Because the initial CPU usage of these nodes is small, the jump is not that noticeable here, but from what I tested earlier it scales when the initial CPU usage is already high.
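
One way to watch the per-process jump, as a sketch using standard Linux tools (assuming the demo binaries are the only processes matching the demo_nodes_cpp pattern):

# refresh a per-process CPU table once per second
watch -n 1 'ps -o pid,pcpu,comm -p "$(pgrep -d, -f demo_nodes_cpp)"'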

tonynajjar (Author) commented Apr 22, 2024

@fujitatomoya or @EduPonz, were you able to reproduce the issue with the example I provided? It would already be useful if I could confirm whether this is a bug or a suboptimal configuration on my side.

@fujitatomoya (Collaborator)

@tonynajjar sorry for taking so long to get back to you. we know about this situation. i did not use your example, but having more than 100 nodes generates a CPU spike for a few seconds; as we already know, this is because of participant discovery.

i am not sure any other configuration would mitigate this transient CPU load...

bochen87 commented Oct 2, 2024

We have the same issue. We use SHM since we have components in a container exchanging large point cloud data, and it seems to perform better and more efficiently that way. However, if we launch other nodes later on (for example debug tools, UI, etc.), it causes huge CPU spikes: timings go off, heartbeats die, and the software goes into an error state as a result. It would be good to have a solution for this.

Mario-DL (Contributor) commented Oct 8, 2024

Hi @tonynajjar,

We were wondering if the CPU usage spike could be related to synchronously waiting on sockets while sending the data buffers. Would it be possible for you to test with the following configuration?
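
(The configuration itself is not shown above; presumably it concerns Fast DDS's asynchronous publish mode, which moves the socket send off the user thread. The profile below is a sketch of that idea, an assumption rather than the actual suggestion:)

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <!-- assumption: asynchronous publishing hands the send off to a
         background thread instead of blocking the user thread on the socket -->
    <publisher profile_name="async_publisher_profile" is_default_profile="true">
        <qos>
            <publishMode>
                <kind>ASYNCHRONOUS</kind>
            </publishMode>
        </qos>
    </publisher>
</profiles>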

Thanks in advance
