This repository has been archived by the owner on May 9, 2024. It is now read-only.
This example compares HDK and pandas performance.
The example can be scaled up by increasing N (originally 10).
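A minimal, stdlib-only timing harness of the kind such a comparison might use is sketched below. It assumes each engine's query is wrapped in a zero-argument callable. The name `time_query` and the parameter `n_runs` are hypothetical, and the lambda workloads only stand in for the real HDK and pandas calls, which are not shown in the original.

```python
import statistics
import time
from typing import Callable


def time_query(run_query: Callable[[], object], n_runs: int = 10) -> float:
    """Run `run_query` n_runs times and return the median wall-clock time in ms.

    Hypothetical helper: `run_query` is assumed to wrap a single HDK or pandas
    query; increasing n_runs (the N of the original example) reduces noise.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    samples_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_query()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to one-off outliers,
    # such as a slow first run.
    return statistics.median(samples_ms)


if __name__ == "__main__":
    # Stand-in workloads; replace them with real HDK and pandas query calls.
    fast = time_query(lambda: sum(range(1_000)), n_runs=10)
    slow = time_query(lambda: sorted(range(100_000), reverse=True), n_runs=10)
    print(f"fast: {fast:.3f} ms, slow: {slow:.3f} ms")
```

Reporting the median rather than the mean keeps a single unusually slow run from skewing the comparison.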
See also: #567
Signed-off-by: Dmitrii Makarenko <[email protected]>
It was decided to test how HDK behaves as the dataset size grows. The starting size of 15_000 rows may be too small for a typical HDK use case.
I ran the benchmark with the following row counts: [15_000, 300_000, 500_000, 700_000, 800_000, 1_000_000, 3_000_000, 6_000_000, 9_000_000, 15_000_000]
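One way to read such a sweep is to fit the measured times to a simple linear model, t(rows) ≈ fixed + per_row · rows. This separates a constant per-query cost (such as compilation) from a cost that grows with the data (such as fetching chunks). The sketch below uses ordinary least squares with only the standard library. The linear model and the name `fit_overhead` are assumptions for illustration, and the example timings are synthetic, not measurements from this issue.

```python
from typing import Sequence, Tuple


def fit_overhead(rows: Sequence[int], times_ms: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of times_ms ≈ fixed_ms + per_row_ms * rows.

    Returns (fixed_ms, per_row_ms). Assumes a linear cost model, which is
    only an approximation of real query-engine behavior.
    """
    n = len(rows)
    if n != len(times_ms) or n < 2:
        raise ValueError("need at least two (rows, time) pairs of equal length")
    mean_x = sum(rows) / n
    mean_y = sum(times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in rows)
    if sxx == 0:
        raise ValueError("row counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rows, times_ms))
    per_row = sxy / sxx
    fixed = mean_y - per_row * mean_x
    return fixed, per_row


if __name__ == "__main__":
    # Row counts from the benchmark above; the timings are synthetic placeholders.
    sizes = [15_000, 300_000, 500_000, 700_000, 800_000,
             1_000_000, 3_000_000, 6_000_000, 9_000_000, 15_000_000]
    fake_times = [200.0 + 0.00002 * r for r in sizes]
    fixed_ms, per_row_ms = fit_overhead(sizes, fake_times)
    print(f"fixed ≈ {fixed_ms:.1f} ms, per row ≈ {per_row_ms:.2e} ms")
```

If the fitted fixed cost is large compared with the per-row cost, small inputs such as 15_000 rows mostly measure overhead rather than data processing.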
Join queries in several reference requests have similar issues.
For big tables, the main performance drop comes from the fetchChunks stage.
In the current case, AFAIU, the slowest stage is compileWorkUnit: 60% of the total execution time was spent there.
For a single query, pandas takes ~5 ms versus ~240 ms for HDK.
It looks like the most significant part of compileWorkUnit is:
There are 2 subqueries, so launchCPU, fetchChunks and compileWU are each executed twice. The first subquery collects metadata:
select count(*) ...
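The general idea behind a count-first subquery can be illustrated with a toy hash join: a first pass counts the matching rows so the output can be allocated once at its exact size, and a second pass fills it. This is a hypothetical, pure-Python sketch of the two-phase pattern, not HDK's implementation; `build_hash_table`, `count_matches` and `hash_join` are illustrative names. Note that both passes scan the left input, which mirrors why the stages above run twice.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_hash_table(right_keys: Sequence[int]) -> Dict[int, List[int]]:
    """Map each join key on the right side to the row indices holding it."""
    table: Dict[int, List[int]] = defaultdict(list)
    for idx, key in enumerate(right_keys):
        table[key].append(idx)
    return table


def count_matches(left_keys: Sequence[int], table: Dict[int, List[int]]) -> int:
    """Phase 1 (analogous to a count(*) pre-pass): number of join output rows."""
    return sum(len(table.get(key, ())) for key in left_keys)


def hash_join(left_keys: Sequence[int], right_keys: Sequence[int]) -> List[Tuple[int, int]]:
    """Inner join on keys, returning (left_row, right_row) index pairs."""
    table = build_hash_table(right_keys)
    n_out = count_matches(left_keys, table)
    # Phase 2: allocate the output once at its exact size, then fill it.
    out: List[Tuple[int, int]] = [(-1, -1)] * n_out
    pos = 0
    for left_idx, key in enumerate(left_keys):
        for right_idx in table.get(key, ()):
            out[pos] = (left_idx, right_idx)
            pos += 1
    return out


if __name__ == "__main__":
    print(hash_join([1, 2, 2, 3], [2, 3, 3, 4]))  # [(1, 0), (2, 0), (3, 1), (3, 2)]
```

The trade-off is an extra scan in exchange for a single exact-size allocation; when the data is large, that extra scan is not free.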
Still, there are several blind spots in the total compileWorkUnit execution time: of the 119 ms total, 90 ms is spent in reify and 15 ms in initHashTable, leaving 14 ms unaccounted for.
convertToArrowTable also took a fairly long time (16 ms), but the test itself may be flawed.
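Finding blind spots like the one in compileWorkUnit comes down to subtracting the attributed sub-timers from the stage total. The hypothetical helpers below make that check repeatable for other timer logs. The timer names and values come from this issue, and the helpers assume the sub-timers do not overlap.

```python
from typing import Dict


def unattributed_ms(total_ms: float, parts_ms: Dict[str, float]) -> float:
    """Time in total_ms not covered by the (assumed non-overlapping) sub-timers."""
    remainder = total_ms - sum(parts_ms.values())
    if remainder < 0:
        raise ValueError("sub-timers exceed the total; they may overlap")
    return remainder


def shares_percent(total_ms: float, parts_ms: Dict[str, float]) -> Dict[str, float]:
    """Each sub-timer's share of the total in percent, plus the unattributed part."""
    shares = {name: 100.0 * ms / total_ms for name, ms in parts_ms.items()}
    shares["<unattributed>"] = 100.0 * unattributed_ms(total_ms, parts_ms) / total_ms
    return shares


if __name__ == "__main__":
    # compileWorkUnit numbers reported above.
    parts = {"reify": 90.0, "initHashTable": 15.0}
    print(unattributed_ms(119.0, parts))
    for name, pct in shares_percent(119.0, parts).items():
        print(f"{name}: {pct:.1f}%")
```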
Total debug timers log: