-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inclusion of a broadcast option to notify other processes of RAS events #24
base: master
Are you sure you want to change the base?
Inclusion of a broadcast option to notify other processes of RAS events #24
Conversation
A simple server implementation using UNIX domain sockets. The server is initialized on start-up upon a call to ras_server_start(). The initialization spawns a new thread to manage socket connections. The main thread broadcasts RAS events upon detection to client processes using ras_server_broadcast(). The broadcast uses a mutex to synchronize with the server thread and avoid reading wrong file descriptors. The Server can be closed with ras_server_stop(), which cancels the server thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the comments for some parts of the code, it would be nice to also have at least an example code about how to use the report server feature. Perhaps you could create a "contrib/" dir and place a simple client to receive the messages there.
AC_DEFINE(HAVE_BROADCAST,1,"broadcast RAS events") | ||
AC_SUBST([WITH_BROADCAST]) | ||
]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this feature ("broadcast") doesn't look nice, as this term is more used for TCP/IP special messages that are sent to a range of machines.
The way this feature is mapped, it is actually adding an Unix socket for communication with other process(es).
@@ -0,0 +1,363 @@ | |||
#include <poll.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a license at the file.
While here, please also add a Signed-off-by to this patch.
(see https://www.kernel.org/doc/html/latest/process/submitting-patches.html?highlight=certificate%20origin#sign-your-work-the-developer-s-certificate-of-origin)
int ras_server_start(void) { | ||
struct sockaddr_un addr; | ||
|
||
server.socketfd = socket(AF_UNIX, SOCK_STREAM, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this could be made a little more generic, also allowing reporting issues to other machines via TCP/IP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mchehab, a remote report was one of my motivations for this functionality.
However, I chose to keep this out of RASdaemon as I thought it would not fit RASdaemon scope.
I envision that a local server feature is enough to enable a wide range of agents for handling the RAS event stream.
These agents can perform event pre-processing, reliability introspective analysis, and others without being tightly-coupled with RASdaemon.
In my opinion, the remote report feature is best suited for other software pieces that can determine their own protocol and/or the preferred system bus to do so.
What do you think?
if(server.socketfd == -1) { | ||
log(ALL, LOG_WARNING, "Can't create local socket for broadcasting\n"); | ||
goto create_err; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Rasdaemon more or less follow Kernel coding style. We use:
foo(bar)
for functions, and:
if (foo)
(with a whitespace) for instructions like if, for, while, switch,...
ev->rwbs, | ||
ev->cmd); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little bit worried about maintainership with the above. As newer messages and types are added at rasdaemon, the above will become outdated with time. IMO, the better here would be to have those feature-specific functions inside the code where they are added, in a way that, if something changes there, such change will be reflected at the broadcasting code. Granted, ras-report.c has the same issue, but it is at least a single place where we have those things. Now that a new report daemon is added, we should unify, in order to make easier to maintain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a couple of functions to create strings from events be enough?
Something in the lines of:
// Serialize a RAS event field by field, separated by 'sep'
void ras_to_str(int type, void* ev, char sep, char* buf);
// Get the name of a RAS event type
const char* ras_event_name(int type);
Is an error message on the default clause of a switch statement enough to warn developers of the need to serialize new event types?
980e51e
to
4e1fffb
Compare
This patch includes the addition of a RAS server using connection-oriented local sockets for inter-process communication.
The broadcast feature allows RASdaemon to assist other applications to be fault-aware while retaining its original purpose.
This initial implementation does not define an intricate protocol for communication.
It is simple enough so that user-level applications can receive RAS events just by issuing a socket connection.
The RAS server is initialized through a new function, namely ras_server_start, if both conditions apply:
The initialization spawns a new thread to manage socket connections using poll.
Whenever RASdaemon detects a new event, a call to ras_server_broadcast is issued in the main thread.
This function will broadcast the events to the connected client sockets.
A mutex is used to synchronize both threads when writing to the file descriptors.
The server is closed with a call to ras_server_stop.
This will cancel the server thread and de-allocate the server structures.
A new call to ras_server_start should get the server back up again.
Signed-off-by: Alexandre de Limas Santana [email protected]