sync: fix connection error on macOS
With a large number of sync workers, the sync process may fail on
macOS due to connection errors. The root cause is that multiple
workers may attempt to connect to the multiprocessing manager server
at the same time when handling the first job. This can lead to
connection failures if there are too many pending connections, exceeding
the socket listening backlog.

Bug: 377538810
Change-Id: I1924d318d076ca3be61d75daa37bfa8d7dc23ed7
Reviewed-on: https://gerrit-review.googlesource.com/c/git-repo/+/441541
Tested-by: Josip Sokcevic <[email protected]>
Commit-Queue: Josip Sokcevic <[email protected]>
Reviewed-by: Josip Sokcevic <[email protected]>
kcwu authored and LUCI committed Nov 6, 2024
1 parent aada468 commit ab2d321
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions subcmds/sync.py
@@ -821,6 +821,16 @@ def _GetSyncProgressMessage(self):
        jobs = jobs_str(len(items))
        return f"{jobs} | {elapsed_str(elapsed)} {earliest_proj}"

    @classmethod
    def InitWorker(cls):
        # Force connect to the manager server now.
        # This is good because workers are initialized one by one. Without this,
        # multiple workers may connect to the manager when handling the first
        # job at the same time. Then the connection may fail if too many
        # connections are pending and exceed the socket listening backlog,
        # especially on macOS.
        len(cls.get_parallel_context()["sync_dict"])

    def _Fetch(self, projects, opt, err_event, ssh_proxy, errors):
        ret = True

@@ -913,6 +923,7 @@ def _ProcessResults(pool, pm, results_sets):
# idle while other workers still have more than one job in
# their chunk queue.
chunksize=1,
initializer=self.InitWorker,
)
finally:
sync_event.set()
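
The pattern used by this change can be sketched outside of repo as follows. This is a minimal illustration, not the project's code: it assumes a plain `multiprocessing.Pool` and a `Manager().dict()` proxy, and the names `init_worker`, `work`, `run`, and `_shared` are hypothetical. The initializer makes a cheap call on the proxy so that each worker opens its connection to the manager server during startup rather than when the first job arrives.

```python
import multiprocessing

# Per-worker handle to the manager-backed dict (hypothetical name).
_shared = None


def init_worker(shared):
    """Pool initializer: touch the proxy so the connection is opened now."""
    global _shared
    _shared = shared
    # Any method call on the proxy connects this process to the manager
    # server; len() is cheap and has no side effects.
    len(_shared)


def work(n):
    # By the time a job runs, this worker is already connected.
    _shared[n] = n * n
    return n


def run(num_workers=4, num_jobs=8):
    with multiprocessing.Manager() as manager:
        shared = manager.dict()
        with multiprocessing.Pool(
            num_workers, initializer=init_worker, initargs=(shared,)
        ) as pool:
            done = sorted(
                pool.imap_unordered(work, range(num_jobs), chunksize=1)
            )
        # Copy the contents out of the proxy before the manager shuts down.
        return done, dict(shared)


if __name__ == "__main__":
    done, squares = run()
    print(done)
    print(squares)
```

Whether this avoids the failure in a given setup depends on worker startup being spread out in time, as the comment in the diff describes; the sketch only shows where the early connection is made.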