I am seeing an issue when iterating over DSS search results with post_search.iterate(). I believe this is a corner case that occurs when the number of results returned is exactly the same as the page size; the error happens when the iterator tries to return the second page.
Here is the setup: I start by creating a DSS client, and I write an ElasticSearch query that returns exactly 10 results (the page size of the returned results, when metadata is included). Here is the code to do that:
executing query with post_search
Now if we execute this query with a call to post_search(), we can see that there are exactly 10 results returned:
search_results = client.post_search(
es_query=query, replica='aws', output_format='raw')
print("post_search() found %d results"%(search_results['total_hits']))
print("post_search() returned %d results"%(len(search_results['results'])))
which results in
post_search() found 10 results
post_search() returned 10 results
executing query with post_search.iterate
Now if we want to iterate over all results returned by the query, we should use post_search.iterate() instead of post_search(). Swapping out the call:
results_generator = client.post_search.iterate(es_query=query, replica='aws', output_format='raw')
for bundle in results_generator:
    print(f"Now processing bundle {bundle['bundle_fqid']}")
which prints all 10 bundles and then raises the following exception:
Now processing bundle fd7a46db-1e90-4bfd-8e70-a77baa01faa5.2019-09-23T173116.106310Z
Now processing bundle fbda9910-5076-47a6-83d6-cfff39d17606.2019-09-26T051748.268160Z
Now processing bundle fb2ae8b7-06b0-4881-ad9f-1f37255b91b6.2019-09-23T173116.107225Z
Now processing bundle c65efd23-bbc4-459a-ac60-d3cde705193d.2019-09-23T173116.107641Z
Now processing bundle c59a8de8-d4f3-424b-b716-06b7152b980a.2019-09-23T173116.106782Z
Now processing bundle be9f2d04-77ee-4f59-a0f7-f0b58034cf8c.2019-09-23T173116.105576Z
Now processing bundle 82164816-64d4-4975-a248-b66c4fdad6f8.2019-09-26T054646.254919Z
Now processing bundle 56cce395-634e-4c53-976c-931727d22dfa.2019-09-26T074801.713933Z
Now processing bundle 3a7af639-ac18-49a7-aef9-2eb4b1ecf598.2019-09-26T072342.935554Z
Now processing bundle 2f62f508-6503-4c2e-a714-8298f55bdaa2.2019-09-26T064659.900169Z
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-45-5afaae8a16d1> in <module>
1 results_generator = client.post_search.iterate(es_query=query, replica='aws', output_format='raw')
2
----> 3 for bundle in results_generator:
4 print(f"Now processing bundle {bundle['bundle_fqid']}")
~/codes/data-consumer-vignettes/vp/lib/python3.6/site-packages/hca/util/__init__.py in iterate(self, **kwargs)
235 yield file
236 else:
--> 237 for collection in page.json().get('collections'):
238 yield collection
239
TypeError: 'NoneType' object is not iterable
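The last frame of the traceback can be reproduced in isolation. This is only a minimal sketch: the page body below is hypothetical (not captured from DSS), and `iterate_collections` is an illustrative stand-in for the failing line, not the library's actual function. It assumes the empty final page carries no 'collections' key.

```python
# Hypothetical body of an empty final page; the real DSS payload may differ.
empty_page = {"results": [], "total_hits": 10}


def iterate_collections(page_json):
    # Mirrors the failing line in the traceback: dict.get() returns None
    # for a missing key, and iterating over None raises TypeError.
    for collection in page_json.get("collections"):
        yield collection


try:
    list(iterate_collections(empty_page))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If the assumption holds, the same TypeError appears without any network call, which suggests the problem lies in how the empty page is handled rather than in the query itself.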
If I modify the query to search for a different organ type, the number of results returned is no longer a multiple of 10, and the bug does not occur. As far as I can tell, the bug occurs only when the number of results is an exact multiple of the page size (here, exactly equal to it). The error occurs because iterate() does not handle the case of the final page being completely empty, which only happens when the number of results is an exact multiple of the page size.
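One way an iterator could tolerate an empty final page is sketched below. This is not the hca implementation: `iterate_results`, the `keys` tuple, and the sample pages are all assumptions made for illustration. The only point is that a missing or empty key is treated as an empty list.

```python
def iterate_results(pages, keys=("results", "collections")):
    """Yield every item from an iterable of decoded page bodies.

    `pages` is any iterable of dicts; `keys` lists the (assumed) names
    under which a page may carry its items.
    """
    for page_json in pages:
        for key in keys:
            # `or []` turns a missing key (None) into an empty list,
            # so an empty final page simply yields nothing.
            for item in page_json.get(key) or []:
                yield item


# Hypothetical pages: one full page of 10 results, then an empty page.
pages = [
    {"results": [{"bundle_fqid": f"bundle-{i}"} for i in range(10)]},
    {"results": []},
]

bundles = list(iterate_results(pages))
print(len(bundles))
```

With this guard the generator ends cleanly after the tenth bundle instead of raising on the empty second page.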
chmreid changed the title to "post_search.iterate() iterator does not terminate cleanly" on Dec 23, 2019