
Dataflow documentation incorrect #78

Open
Mythicalian opened this issue Jul 7, 2022 · 0 comments

I followed the streaming documentation successfully, then went through the dataflow documentation, but found that it is both out of date and doesn't work for me.

#1 For the bigquery command, this file isn't in the repo:
src/main/resources/errors-schema.json
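As a workaround while the file is missing from the repo, a placeholder schema file can be supplied locally. The field names below are hypothetical (the real errors-schema.json in the project may differ) and only illustrate the BigQuery JSON schema-file format:

```shell
# Hypothetical minimal schema; the project's real errors-schema.json may differ.
cat > errors-schema.json <<'EOF'
[
  {"name": "item_id", "type": "STRING", "mode": "NULLABLE"},
  {"name": "timestamp", "type": "TIMESTAMP", "mode": "NULLABLE"},
  {"name": "error", "type": "STRING", "mode": "NULLABLE"}
]
EOF

# Validate that the file is well-formed JSON before handing it to bq.
python3 -m json.tool errors-schema.json > /dev/null && echo "schema OK"

# Then (assuming gcloud/bq are configured) a table could be created from it:
# bq mk --table "${PROJECT}:${DATASET}.errors" ./errors-schema.json
```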

#2 This appears in the logs, but no tables are ever created in BigQuery:
"Executing operation polygonBlocksReadFromPubSub/PubsubUnboundedSource+polygonBlocksReadFromPubSub/MapElements/Map+polygonBlocksConvertToTableRows+polygonBlocksWriteToBigQuery/PrepareWrite/ParDo(Anonymous)+polygonBlocksWriteToBigQuery/StreamingInserts/CreateTables/ParDo(CreateTables)+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/ShardTableWrites+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/TagWithUniqueIds+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/Reshuffle/Window.Into()/Window.Assign+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/Reshuffle/GroupByKey/WriteStream"

#3 In the polygonBlocksWriteToBigQuery stage there are a number of warnings. I'm not sure whether they're relevant, but no further logs appear after them. I've mostly focused on getting the blocks from my local polygon node into BigQuery:
2022-07-07T18:55:58.630Z Operation ongoing in step polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/StreamingWrite for at least 02h10m00s without outputting or completing in state finish
  at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
  at [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
  at [email protected]/java.util.concurrent.FutureTask.awaitDone(FutureTask.java:447)
  at [email protected]/java.util.concurrent.FutureTask.get(FutureTask.java:190)
  at app//org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:817)
  at app//org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:882)
  at app//org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:143)
  at app//org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:115)
  at app//org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
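For what it's worth, a StreamingWrite/CreateTables stage that hangs without creating tables can indicate that the target dataset doesn't exist (streaming inserts can create tables but not datasets) or that the Dataflow worker service account lacks BigQuery write permissions. A sketch of checks, with placeholder project/dataset names:

```shell
# Placeholder names; substitute your own project and dataset.
PROJECT="my-project"
DATASET="crypto_polygon"

# Does the target dataset exist? StreamingInserts can create tables,
# but the dataset itself must already be there.
bq ls --project_id "${PROJECT}" "${DATASET}"

# Which principals hold a BigQuery role (e.g. roles/bigquery.dataEditor)?
# The Dataflow worker service account should appear here with write access.
gcloud projects get-iam-policy "${PROJECT}" \
  --flatten="bindings[].members" \
  --filter="bindings.role:roles/bigquery" \
  --format="table(bindings.role, bindings.members)"
```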

I would really like to get this ETL job running properly, so any advice would be greatly appreciated.
