-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC REP-155: Conventions, Topics, Interfaces for Perception in HRI #338
RFC REP-155: Conventions, Topics, Interfaces for Perception in HRI #338
Conversation
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/new-rep-proposal-human-robot-interaction-in-ros-ros4hri/23776/6 |
0202ddc
to
aaf510b
Compare
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-news-for-the-week-of-1-21-2022/23940/1 |
Suggestions by Andres Ramirez-Duque received by email on the 24-Jan-2022:
In your idea, what would be the frame of reference for this pose?
Thanks for pointing that out! Initially, a 'gaze' TF frame was also specified, but for some reason, it seems it disappeared. I'll add it back. My idea was to expose a If you think a pose is better, can you quickly explain why?
Yes. In earlier version, more messages were timestamped. But I eventually removed the headers, as I was unsure what the timestamp would add. Initially, the purpose of the timestamp was to be able to match eg a face detection to its source image (image with the same timestamp), but it is not a great (!) way of tracking the source, so I finally decided to remove them. Happy to reconsider, though! |
Thanks for posting my message,
The frame of reference could be /world or /local_sensor_frame and could be useful in applications that use multiple fields of view (onboard and global) or for applications that use multiple sensors (e.x. IMU) and to apply fusion algorithms before publishing the TF
Totally agree
For the same reason as above, to apply fusion before publishing the TF
A possible reason would be to monitor the quality of communication between the main topics (e.x. /humans/persons) when working on multiple workstations |
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-news-for-the-week-of-january-23rd-2022/24073/1 |
I think it's worth erring on the side of 'make use of headers': time plays an important part of interaction, and it's better to have the ability to leverage that (even if some applications won't use it). |
Yes... I guess I'm a bit reluctant to create a bunch of new messages that are simply standard messages + headers. For instance, I initially had a
Any thoughts, @wjwwood @tfoote or others? But happy to add headers to the new types. Would you recommend suffixing them all with |
I don't have a strong preference, but my general guidance is this:
I'm sorry I don't have time to dig into your proposal in more detail right now, but hopefully my recommendations help a bit. |
a89a219
to
befbf07
Compare
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2022-04-21/25293/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One general comment, we prefer one sentence per line of text, and usually avoid manually wrapping lines by length, though that is allowed. Either way, new sentences should start on a new line. See:
Though that doc is for ROS 2 specifically, the reasons behind the requested style still stand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some random comments while reading.
rep-0155.rst
Outdated
*Anonymous persons* are treated like regular persons. They however publish a | ||
latched ``true`` boolean on their ``/anonymous`` subtopic, and their ID is not | ||
guaranteed to be permanent (it can in fact change/be removed at any point). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should be a subtopic or instead should be part of one of the other messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what other messages do you have in mind? (the rationale for using a subtopic here was to ensure that any node dealing with persons would transparently see 'anonymous' persons as well. But I guess this could be modeled differently...)
Based on suggestion by Andres Ramirez-Duque.
(it is assigned by a face/body/voice recogniser)
…s and face/body/voices
Anonymous persons are persons which are not yet identified, yet exist (because eg their face is detected -- but not yet recognised)
thanks, formatting fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a number of fairly minor comments inline.
Overall, this is a well-written document. However, the concern with it as a REP is that it seems to describe a particular implementation of an HRI system. It's unclear whether this is a good generic representation of HRI systems in general, and we (Open Robotics) aren't really in a position to make that determination.
To move this forward, our recommendation is to socialize this REP proposal with others in the HRI community. We know you've started this with https://discourse.ros.org/t/new-rep-proposal-human-robot-interaction-in-ros-ros4hri/23776/6 and this REP, but there hasn't been a lot of feedback to either. Encourage others in the HRI community to comment on, approve, or otherwise interact with this REP so we can gauge its wider applicability. If we get enough engagement from the wider HRI community, then we can move forward here (including eventually calling for a vote on it).
rep-0155.rst
Outdated
person, by publishing the ID of the other person on a special subtopic named | ||
``alias``. See section `Topics structure`_ for details. | ||
|
||
.. Note:: The reverse operation (spliting a person into two) can be realised |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. Note:: The reverse operation (spliting a person into two) can be realised | |
.. Note:: The reverse operation (splitting a person into two) can be realised |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi - I am not very familiar with the ROS standardization process, but I went over the document linked here: REP-155-20220713.pdf and I do not see what @clalancette sees, namely that this is a particular HRI implementation. These seem to be useful and extremely generic conventions for any person-interactive robotic system.
On a more general note, I think that having conventions and data structures that HRI ROS folks agree on would greatly enhance adoption of standard libraries and tools that could save HRI researchers time and accelerate research.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fairness to clancette and the OSRF folks, it is a particular HRI implementation (here)... the question is "is this a general enough toolset that should be made standard across all of ROS, or is it a standalone tool"? Generally standards like this would be more "here's what a message should look like", this standard is focused on what functionality will be available to a system: it's more of an attempt at standardizing an architecture.
Personally, while I do feel that it describes a specific implementation, I also feel that that implementation prioritizes being general and is therefore useful for a wide array of use-cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said, I might not be looking in the right place, but the REP PDF linked above only (mostly?) describes what the messages should be, and not how they are implemented. Does the REP also include a specific reference to the implementation you linked to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cst0 as per ROS process, REP must have a reference implementation -- that is what you found under https://github.com/ros4hri/
As pointed by @guyhoffman , the REP itself is design to be agnostic of a particular implementation -- of course, one might always object that a specification always shape implementations to some extend...
In any case, I'd be happy to try to address specific issue with the spec that over-constrain potential implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(as a side note, I'm already aware of a 2nd implementation of the spec that only reuse one or two nodes from the reference implementation)
When generated, the URDF models of the humans should be loaded on the ROS | ||
parameter server under ``/human_description_<bodyID>``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specific to ROS 1, so I'd change this to be a little more generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small changes from a partial review:
1. human representation, as a combination of a permanent identity (*person*) | ||
and transient parts that are intermittently detected (e.g. *face*, | ||
*skeleton*, *voice*); | ||
2. topic naming conventions under the ``/humans/`` topic namespace; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this to be global (e.g., starting with leading /
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say yes -- not sure what would be the use-case for 'different' humans in different parts of the system, but happy to discuss that point/change it
I've been sort-of lurking on this thread and keeping an eye on how it goes (since I'm in the HRI research space). @clalancette made the point that it would be good to get more members of the HRI community involved: I have some thoughts I can share. My apologies in advance for a long comment. The short version of my positive notes would be that:
My implementation concerns:
I think I see why there's a unique subtopic per person: as the REP observes there needs to be a unique identifier per person. Each identifier is a likely a unique node, which again makes sense to me (different data sources, different goals). So those two things combined force you to split the identification results to unique topics which are grouped by person; a "Human.msg" that encapsulates all that data isn't ideal because that would require some centralized aggregator node. That's not terrible, but it's not ideal and would restrict more distributed systems, and so subtopics it is. There's a few problems the multi-subtopic approach presents, in my opinion. First, when is it acceptable to introduce or remove a new topic? Identification processes will have some uncertainty, so it's not uncommon for a recognizer to "flicker". Surely we don't want to bring up and take down a topic with every image message we mis-identify someone with, but then we don't want to leave them up indefinitely either. With each human, the RFC discusses as many as 9 persons subtopics, 4 audio subtopics, 6 bodies subtopics, and 8 faces subtopics, with room for more if more identifiers are presented. So are we potentially having implementations where we're dynamically constructing 25+ new topics per person? I don't love that in settings where I know there are only a handful of people being recognized, and I predict it being prohibitively difficult to interact with in (for example) the mall setting where a single time step may have 30+ people. Each of those is another publisher and subscriber to spin up, and will also present developer-side difficulties in introspecting on node/graph behavior. Developer-side difficulty presents a fairly major hurdle to adoption in my view. The good news is, I think both these concerns could be resolved by introducing a "human header" message and grouping topics into streams of per-person messages. As a result, I think the current approach could be adapted to have a topic per identifier, where the topic is responsible for publishing a stream of observations that are "tagged" by person. For example, the current approach of
could become:
Per @wjwwood's earlier advice on the Header message, If an identifier can't provide information about what is happening due to time, uncertainty, or it being out of view, it doesn't provide it. An identifier can also be multiple nodes from multiple sources publishing on the one topic. It's then up to the subscriber to do what they want with that observation (or that a Bob hasn't been seen for a while). This approach does have the problem that a node might only care about Alice, and now it's getting a firehose of data about Alice, Bob, and whoever else... But we do see this come up with TF (where I might only care about map->base_link but to get that I receive all of Anyway, those are my main thoughts. My apologies again for a long comment, but hopefully that's helpful. |
Co-authored-by: Audrow Nash <[email protected]>
Thanks @cst0 for the insightful comment. I like your suggestion of a 'human header' and I will definitely give it an in-depth look. Some additional comments below (and a first general terminology point: the REP calls identifier the unique IDs associated to a person/voice/face/body/etc -- each of these IDs correspond to namespace with its own subtopics. Those subtopics could be called attributes of the identifiers, but not identifiers themselves).
Yes, this summarizes well the rationale for identifiers' namespaces with subtopics.
This is a question that is in general highly dependent on the application domain: for instance, if your application is tracking people in a mall, you might only care about the tf frame of the detected faces and probably do not need eg facial landmarks or face identification. That's why the REP only specify very few mandatory subtopics for each identifiers. The other ones are primarily listed in the REP in order to set the topic naming convention (and that list could easily be extended based on what additional social signals the community want to use). So to answer your question, it is always acceptable to introduce or remove subtopics (as long as the few mandatory ones are published)
The (multi-modal) identification process is specified to be handled in a distinct way in the REP: probabilistic 'candidate' match between faces/bodies/voices and persons are published on the /humans/candidate_matches topic by dedicated identification nodes, and up to a 'person fusion' node to effectively perform the association, accounting for uncertainty. See https://github.com/ros4hri/hri_person_manager for an example implementation.
I was not a big fan of it either at first. But in practice, it works better than what we feared. Especially because libhri (https://github.com/ros4hri/libhri) abstracts/hides away all the handling of the topics (similar to libtf with the /tf topic). The main downside is the output of
I like a lot the idea of a human header. One major difficulty, though: we do not always have access to the ID of the person (either because the application does not need person identification, or because a face is detected, but not yet identified). So the semantic of the 'person' field of the header would need to be carefully designed. |
@clalancette thanks a lot for the thorough proof-reading. Suggestions all accepted. I need to read more about ROS2 parameters to better understand how to change them. I'll work on that next. |
Co-authored-by: Chris Lalancette <[email protected]>
I would like to thank @severin-lemaignan for introducing this very well written, detailed document that represents the ROS4HRI system very well. I am one of the authors of the ROS for human robot interaction paper and co-developer of the system. to address @cst0 and @clalancette point of involving more people from the HRI community. In the HRI lab here at KTH sweden, more and more people are discussing the integration of the ROS4HRI pipeline in their systems. As it has been a challenge to robustly create systems that react to social situations in an appropriate manner. The system usage will be to react to human frustration. In addition, many other colleagues are expressing their interest in using an already standardized system. So, the system is gaining a bit of traction in the HRI community. |
thank you for this initiative, |
Such a nice work! We definitely need a HRI standard in ROS!🦾 |
Great work! it would be very useful have a HRI standard in ROS! |
Well done, we need a standard for HRI in ROS!!! |
Thank you for this work. I think the HRI standard will be really helpful for a lot of people in our niche. Thanks! |
@cst0 : I've explored a bit more the idea of the 'HRI header', and I think it can indeed work well. An amended version of the REP is here: severin-lemaignan@c7e4134 It relies on new/additional hri_msgs that can be found here: https://github.com/ros4hri/hri_msgs/tree/hri_headers/msg The main difficulties I can think of are:
But overall, a big improvement/simplification. So I'm in favour of changing the design to follow your idea. |
In the current standard groups are published under the /humans/interactions/groups topic with Group ID + list of person IDs. But we might need more information about groups. Thus, I suggest to use for groups the same structure as for persons, bodies, faces, and voices. Groups: The list of currently detected groups (list of group IDs) is published under /humans/groups/tracked (as a hri_msgs/IdsList message).
|
@ChrisReinke sounds like a good idea. Thanks for the contribution! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed this REP both internally in Open Robotics and in the ROS 2 TSC.
The consensus is that this is a well-written REP, and though we have some performance concerns about having so many topics in the system, there are also benefits to that approach (mainly in introspection).
With that said, this REP is approved as a Draft, and will be merged into the repository. @severin-lemaignan the next steps here are to get further consensus from the HRI community to eventually attempt to make this an Accepted REP.
Thanks for a well-written article and bearing with the long time delay here.
Great news! Thanks @clalancette :-) I've created a new PR for the proposed 'HRI headers' idea by @cst0 : @clalancette, would that address some of your concerns regarding performance? |
This pull request has been mentioned on ROS Discourse. There might be relevant details there: https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-8-18-2022/27050/1 |
Following announcement on discourse, I open a PR for a new REP 155, aiming at standardizing perception for human-robot interaction in ROS.
The REP is relatively large by ROS standards. It is based on the IROS paper 'ROS for Human-Robot Interaction' (non-paywalled version) with several significant updates based on experience learned during initial implementations.
PDF printouts of the REP:
Feedback welcome!