
Implement a new BigQuery sink #549

Open
tneymanov opened this issue Feb 12, 2020 · 1 comment
@tneymanov
Collaborator

Issue #499 is currently on hold, since:

  • the v2 schema requires splitting the writing stage (for the human genome) into 23 steps, and
  • the new sink introduces 22 nodes per write stage.

This causes the pipeline graph to exceed the current limitation of 10MB, and the pipeline errors out.

After a conversation with the Beam team, we need to wait for several prerequisites before we can proceed:

  • Apache Beam needs to land the [BEAM-9291] upload graph option in Dataflow's Python SDK (apache/beam#10829), which could extend the limit to 100MB.
  • We then need to use --experiments=upload_graph to upload the graph through the GCS staging location (which means that a staging directory will be a prerequisite henceforth); see the sketch after this list.
  • We need to wait for at least Apache Beam release 2.21.0, since the new BQ sink is broken in releases 2.19.0 and 2.20.0.
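For reference, here is a minimal sketch of how the upload_graph experiment would be passed to a Dataflow pipeline once the PR above lands. The project ID, region, bucket, and pipeline contents below are hypothetical placeholders, not values from this issue:

```python
# Minimal sketch (assumed setup, not from this repo): pass the upload_graph
# experiment alongside a staging location so Dataflow ships the job graph
# via GCS instead of inlining it in the job submission request.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-gcp-project',                    # hypothetical project ID
    '--region=us-central1',                        # hypothetical region
    '--temp_location=gs://my-bucket/temp',         # hypothetical bucket
    '--staging_location=gs://my-bucket/staging',   # required: graph is uploaded here
    '--experiments=upload_graph',                  # enable the large-graph workaround
])

with beam.Pipeline(options=options) as p:
    # Stand-in for the actual per-chromosome write stages discussed above.
    _ = p | beam.Create(['placeholder'])
```

With upload_graph enabled, the runner stages the serialized job graph in the staging location and submits only a reference to it, which is why a staging directory becomes mandatory.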
@tneymanov tneymanov self-assigned this Feb 12, 2020
@samanvp
Member

samanvp commented Feb 12, 2020

PR #499 will be reconsidered once all of these issues are resolved.

Thanks @tneymanov for documenting this.
