Fix zombie socket and random dropped logs when closing transport (TLS) #140

Open: wants to merge 5 commits into master

askrzypczak commented Jun 5, 2020

While debugging why some of my logs were not reaching syslog, I noticed two things:

  1. The socket keeps reopening even after logger.end(). This is because setupEvents() attaches a 'close' listener to the socket that always reopens it. This was breaking my AWS Lambda handlers: AWS waits for the event loop to be empty, and the event loop is never empty while a socket is open, so I need every socket to be closed. (https://stackoverflow.com/questions/41621776/why-does-aws-lambda-function-always-time-out)

  2. Sometimes messages were randomly dropped after a call to logger.end(). With TLS, it seems possible for a message to still be in the process of being written when socket.destroy() is invoked, and that message is then lost. In my environment, socket.end() preserved messages that socket.destroy() discarded, so I replaced destroy() with end(); after further testing I saw no more lost log messages. I believe this is because end() half-closes the socket, so writes already in progress can still complete. (https://nodejs.org/api/net.html#net_socket_end_data_encoding_callback) I had trouble reproducing this in a vacuum; the server seems to need to be under stress for the behavior to show up.
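The reopening problem in item 1 comes from a 'close' handler that reconnects unconditionally. A minimal sketch of one way to guard it, assuming a hypothetical `ReconnectingTransport` class and a fake socket (neither is this library's actual code): the handler reconnects only when the close was not requested by the caller.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical minimal socket shape: emits 'close' and can be ended.
// A real transport would hold a net.Socket or tls.TLSSocket here.
interface ClosableSocket extends EventEmitter {
  end(): void;
}

// Hypothetical transport: reconnects after an unexpected 'close',
// but stays closed once the caller has asked it to shut down.
class ReconnectingTransport {
  private closing = false;
  socket: ClosableSocket | null = null;
  connectCount = 0;

  constructor(private readonly connect: () => ClosableSocket) {}

  open(): void {
    this.connectCount += 1;
    const socket = this.connect();
    this.socket = socket;
    socket.on("close", () => {
      if (this.socket === socket) this.socket = null;
      // Reconnect only if the close was not requested via close().
      if (!this.closing) this.open();
    });
  }

  close(): void {
    this.closing = true;
    this.socket?.end();
  }
}

// Fake socket for the demo: end() immediately reports 'close'.
class FakeSocket extends EventEmitter implements ClosableSocket {
  end(): void {
    this.emit("close");
  }
}

const transport = new ReconnectingTransport(() => new FakeSocket());
transport.open();
transport.socket?.emit("close"); // unexpected drop: reconnects
transport.close();               // requested shutdown: no reconnect
console.log(transport.connectCount, transport.socket); // 2 null
```

The flag is set before end() is called, so the 'close' event that end() eventually triggers sees `closing === true` and leaves the socket closed, which lets the event loop drain.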

I found this semi-related Stack Overflow question, and the answer from Janith seems to indicate that .end() over .destroy() is best practice, and illustrates why. (https://stackoverflow.com/questions/9191587/how-to-disconnect-from-tcp-socket-in-nodejs)
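A sketch of the end()-first approach, under stated assumptions: `closeGracefully` and `EndableSocket` are hypothetical names, and the timeout value is left to the caller. It calls end() so queued writes are flushed before FIN, and falls back to destroy() only if the socket has not closed in time. The fallback matters because, after end(), the socket can stay half-open if the server never closes its side, which would again keep the event loop alive.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical socket shape matching the two net.Socket methods discussed above.
interface EndableSocket extends EventEmitter {
  end(): void;     // half-close: flush queued writes, then send FIN
  destroy(): void; // hard close: anything still queued is discarded
}

// Prefer end() so in-flight log lines are flushed; fall back to destroy()
// only if the socket has not closed within timeoutMs.
function closeGracefully(
  socket: EndableSocket,
  timeoutMs: number,
): Promise<"ended" | "destroyed"> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      // Resolve first: a destroyed socket may emit 'close' synchronously.
      resolve("destroyed");
      socket.destroy();
    }, timeoutMs);
    socket.once("close", () => {
      clearTimeout(timer);
      resolve("ended");
    });
    socket.end();
  });
}
```

In a handler, the idea would be to `await closeGracefully(socket, someTimeout)` before returning, so that no socket is left open and no pending write is discarded unless the timeout expires.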

How I stressed my server:
I am sending my logs to Papertrail. My setup used Netlify Functions and called logger.end() after registering a listener with logger.on('finish'). This means I was creating and end()ing the logger within a 500 ms window while also making multiple async fetch() calls for my business logic, and expecting all the messages to have been sent by the time 'finish' fired. This seemed to congest the network enough to reproduce the issue.
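The "register 'finish', then end()" pattern can be wrapped in a promise so a handler can await it. This is a sketch assuming the logger behaves like a Node.js writable stream (as `logger.end()` and the `'finish'` event suggest); `endAndWait` is a hypothetical helper, and the demo uses a plain `Writable` with artificially slow writes instead of a real logger. Whether 'finish' on the logger also guarantees that the syslog transport has flushed its socket depends on how the transport closes that socket, which is what item 2 is about.

```typescript
import { Writable } from "node:stream";

// End a writable stream and resolve once it emits 'finish',
// i.e. after every queued write has completed.
function endAndWait(stream: Writable): Promise<void> {
  return new Promise((resolve, reject) => {
    stream.once("finish", () => resolve());
    stream.once("error", reject);
    stream.end();
  });
}

// Demo: a sink whose writes complete asynchronously, like a slow network socket.
const received: string[] = [];
const sink = new Writable({
  write(chunk, _encoding, callback) {
    setTimeout(() => {
      received.push(chunk.toString());
      callback();
    }, 10);
  },
});

sink.write("first");
sink.write("second");
endAndWait(sink).then(() => console.log(received)); // [ 'first', 'second' ]
```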

If the tests fail, please forgive me: I don't have access to a Unix machine and the tests seem to require one, so I was not able to run them before submitting the PR.
