
Implement a new BigQuery sink #549

Open
tneymanov opened this issue Feb 12, 2020 · 1 comment
@tneymanov
Collaborator

Issue #499 is currently on hold, since:

  • the v2 schema requires splitting the writing stage (for the human genome) into 23 steps, and
  • the new sink introduces 22 nodes per write stage.

This causes the pipeline graph to exceed the current limitation of 10MB, and the pipeline errors out.

After a conversation with the Beam team, we need to wait for several prerequisites before we can proceed:

  • Apache Beam needs to land the [BEAM-9291] upload graph option in Dataflow's Python SDK (apache/beam#10829), which could extend the limit to 100MB.
  • We then need to use --experiments=upload_graph to upload the graph through the GCS staging location (which means that a staging directory will be a prerequisite henceforth); see the sketch after this list.
  • We need to wait for at least Apache Beam release 2.21.0, since the new BQ sink is broken in releases 2.19.0 and 2.20.0.
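For reference, here is a minimal sketch of how the upload_graph experiment would be passed to a Dataflow pipeline once the PR above lands. The project ID, region, bucket, and pipeline contents below are hypothetical placeholders, not values from this issue:

```python
# Minimal sketch (assumed setup, not from this repo): pass the upload_graph
# experiment alongside a staging location so Dataflow ships the job graph
# via GCS instead of inlining it in the job submission request.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-gcp-project',                    # hypothetical project ID
    '--region=us-central1',                        # hypothetical region
    '--temp_location=gs://my-bucket/temp',         # hypothetical bucket
    '--staging_location=gs://my-bucket/staging',   # required: graph is uploaded here
    '--experiments=upload_graph',                  # enable the large-graph workaround
])

with beam.Pipeline(options=options) as p:
    # Stand-in for the actual per-chromosome write stages discussed above.
    _ = p | beam.Create(['placeholder'])
```

With upload_graph enabled, the runner stages the serialized job graph in the staging location and submits only a reference to it, which is why a staging directory becomes mandatory.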
@tneymanov tneymanov self-assigned this Feb 12, 2020
@samanvp
Member

samanvp commented Feb 12, 2020

PR #499 will be reconsidered once all of these issues are resolved.

Thanks @tneymanov for documenting this.
