Different counts for parallel scan and sequential scan of a DynamoDB table when acquiring all records. #3643
I would like to retrieve all data from a DynamoDB table, process it, and then upload it to S3. The table currently holds 40,000 items, about 50 MB in total. A sequential scan retrieves all 40,000 items successfully, but a parallel scan returns a different count. I have confirmed that when the table has only about 100 items and is about 3 MB in size, a parallel scan also retrieves every item. What is the problem? Here is the code:

import boto3
import botocore
import os
import threading

os.environ['AWS_REGION'] = "ap-northeast-1"
dynamodb = boto3.resource("dynamodb", region_name=os.environ['AWS_REGION'], config=botocore.client.Config(max_pool_connections=5000))
table = dynamodb.Table('test_table')


def get_dynamo_data(segment, total_segments):
    try:
        response = table.scan(
            TotalSegments=total_segments,
            Segment=segment
        )
        print(f"ScannedCount: {response['ScannedCount']}. segment {segment}")
        return response.get("Items", [])
    except Exception as e:
        raise Exception('DynamoDB Access Error:' + str(e))


def main():
    threads: list[threading.Thread] = []
    total_segments = 50
    for segment in range(0, total_segments):
        th = threading.Thread(target=get_dynamo_data, args=(segment, total_segments))
        th.start()
        threads.append(th)
    for th in threads:
        th.join()


main()
Replies: 1 comment
I think this could be an issue with eventual consistency in DynamoDB. For more info you can refer to this documentation: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html.
Have you tried setting ConsistentRead=True in your scan command? Here is the scan documentation for reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb/client/scan.html.
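
For illustration, here is a minimal sketch of a parallel scan that passes ConsistentRead=True. It also covers a second possibility worth ruling out: a single Scan call returns at most 1 MB of data and sets LastEvaluatedKey when more remains, so each segment may need several calls. That might explain why a 3 MB table split into 50 segments comes back complete while a 50 MB table does not. The helper names `scan_segment` and `parallel_scan`, and the `max_workers` default, are hypothetical and not taken from the original post.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(table, segment, total_segments, consistent_read=True):
    """Collect every item in one segment, following LastEvaluatedKey across pages."""
    kwargs = {
        "TotalSegments": total_segments,
        "Segment": segment,
        "ConsistentRead": consistent_read,
    }
    items = []
    while True:
        response = table.scan(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            # No LastEvaluatedKey means this segment has been read to the end.
            return items
        # Resume the same segment where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(table, total_segments, max_workers=10):
    """Scan all segments concurrently and return the combined list of items."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(scan_segment, table, segment, total_segments)
            for segment in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]


# Usage, assuming the table and region from the question:
#   import boto3
#   table = boto3.resource("dynamodb", region_name="ap-northeast-1").Table("test_table")
#   items = parallel_scan(table, total_segments=50)
#   print(len(items))
```

Like the code in the question, this shares one Table object across threads. The boto3 documentation says resource instances are not thread safe, so creating a separate resource per thread may be the safer choice.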