Fix zombie socket and random dropped logs when closing transport (TLS) #140
While debugging why my logs were not being delivered to the syslog server, I noticed two things:
First, the socket keeps re-opening even after `logger.end()`. The `setupEvents()` function attaches a `'close'` listener to the socket that unconditionally re-opens it. This was breaking my AWS Lambda handlers: AWS waits for the event loop to be empty before completing an invocation, and the event loop is never empty while a socket is open, so I need all my sockets closed (https://stackoverflow.com/questions/41621776/why-does-aws-lambda-function-always-time-out).
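A minimal sketch of the problem and the fix, assuming an illustrative transport shape; `ending`, `connect()`, and the exact structure of `setupEvents()` are placeholders, not the actual winston-syslog source:

```js
// Sketch only: names and structure are illustrative.
class SyslogTransport {
  setupEvents(socket) {
    socket.on('close', () => {
      // Previously the reconnect ran unconditionally, so the socket
      // re-opened even after logger.end() -- the "zombie" socket that
      // keeps the event loop (and the Lambda invocation) alive.
      if (!this.ending) {
        this.connect();
      }
    });
  }

  close() {
    this.ending = true; // mark the shutdown as intentional
    if (this.socket) {
      this.socket.end(); // half-close; see the second sketch below
    }
  }
}
```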
Second, messages were sometimes dropped after a call to `logger.end()`. When using TLS, a message can still be in the process of being written when `socket.destroy()` is invoked, and that message is then lost. In my environment, `socket.end()` preserved messages that `socket.destroy()` discarded, so I replaced `destroy()` with `end()`; after further testing I saw no more lost log messages. I believe this is because `end()` half-closes the socket, so in-flight writes can still complete (https://nodejs.org/api/net.html#net_socket_end_data_encoding_callback). I had trouble reproducing this in a vacuum; the server seems to need to be under stress to observe the behavior.
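The difference is easy to see in isolation. A sketch, with placeholder host, port, and payload:

```js
const tls = require('tls');

// Placeholder endpoint; any TLS syslog server would do.
const socket = tls.connect({ host: 'logs.example.com', port: 6514 }, () => {
  socket.write('<14>last log line\n');

  // socket.destroy() tears the connection down immediately; data still
  // queued in the outgoing buffer can be discarded.
  // socket.destroy();

  // socket.end() flushes queued writes and then sends a FIN (a
  // half-close), so the in-flight message can still reach the server.
  socket.end();
});
```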
I also found a semi-related Stack Overflow question where the answer from Janith indicates that preferring `.end()` over `.destroy()` is a best practice, and explains why (https://stackoverflow.com/questions/9191587/how-to-disconnect-from-tcp-socket-in-nodejs).
How I stressed my server:
I am sending my logs to Papertrail. My setup used Netlify Functions, registering a listener via `logger.on('finish')` and then calling `logger.end()`. This meant I was creating and `end()`ing the logger within a 500ms window while also making multiple concurrent `fetch()` calls for my business logic, expecting all messages to be sent by the `'finish'` event. That seemed to gum up the network enough to observe the issue; the handler looked roughly like the sketch below.
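A rough sketch of that pattern, under assumptions: the Papertrail transport options are omitted, and the URLs and handler name are placeholders, not my actual setup:

```js
const winston = require('winston');
require('winston-syslog'); // registers winston.transports.Syslog

exports.handler = async () => {
  const logger = winston.createLogger({
    transports: [
      new winston.transports.Syslog({ /* Papertrail TLS options */ }),
    ],
  });

  // 'finish' fires once logger.end() has flushed all pending messages.
  const flushed = new Promise((resolve) => logger.on('finish', resolve));

  // Business logic: several concurrent fetch() calls (placeholder URLs).
  const urls = ['https://example.com/a', 'https://example.com/b'];
  await Promise.all(urls.map((u) => fetch(u)));
  logger.info('done');

  logger.end();
  await flushed; // every message should have been sent by this point
};
```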
If the tests fail, please forgive me: I don't have access to a Unix machine, and the tests appear to require one, so I was not able to run them before submitting this PR.