Right now, since 1 partition means 1 repository, we know joins (by repository) can only produce matches within the same partition.
Despite that, we currently iterate over all of the partitions and try to join their rows all together.
Imagine we have 3 partitions.
These are the rows returned by each partition in the left side of a join.
P1: 45
P2: 5
P3: 50
These are the rows returned by each partition in the right side of a join.
P1: 55
P2: 15
P3: 30
We are joining 100 rows with 100 rows, which produces 10,000 candidate rows that are then filtered by the join conditions (but we still perform those 10,000 iterations).
Instead, if we did this per partition, these would be the rows produced (again before filtering by the conditions):
P1: 45 × 55 = 2475
P2: 5 × 15 = 75
P3: 50 × 30 = 1500
The total is 4050 rows, which is roughly 40% of the rows generated before. The gap grows enormously as the number of partitions and rows grows.
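In general terms, writing $L_i$ and $R_i$ for the left and right row counts of partition $i$ (notation introduced here only for this comparison), the global join performs $(\sum_i L_i)\cdot(\sum_i R_i)$ iterations, while the per-partition version performs only $\sum_i L_i R_i$, which is why the difference keeps growing as partitions and rows grow.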
What could we do?
A rule that runs at the end of the analysis and transforms joins (the ones left after the squash) into something like the sketch below.
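A rough sketch of the intended plan shape (the node names and exact structure are only illustrative, with one such join evaluated per partition p):

```
Concat
 └─ InnerJoin(condition)
     ├─ PartitionTable(left table, partition p)
     └─ PartitionTable(right table, partition p)
```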
PartitionTable is a table that will only return the rows for one partition.
Concat is a node that will iterate over all partitions and, for each one, transform all of its Table children into PartitionTable nodes. Then all the rows of each partition will be put together and returned to the user. This will also happen in parallel.
Essentially, Concat is like an Exchange. The only difference is that it can handle binary nodes, not just unary ones. That is something that cannot be done in go-mysql-server itself but can be done here, since we know for certain that the partitions are the same for every table.
I called it Concat, but the name is pretty lame, so we should think of a better one, like PartitionExchange, BinaryExchange or something along those lines.
This should make non-squashed joins (and in real-life applications you will have many of them, because the leaves will be subqueries) much, much faster.
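To make the behaviour concrete, here is a minimal, self-contained sketch of the idea in plain Go. It is not the actual go-mysql-server/gitbase node implementation; the Row/Partition types, the helper names and the join key are invented purely for illustration:

```go
package main

import "fmt"

// Row is a simplified row: a repository id plus some payload.
type Row struct {
	Repo    string
	Payload int
}

// Partition holds the rows of one repository, mirroring gitbase's
// "one partition == one repository" property.
type Partition struct {
	Repo  string
	Left  []Row
	Right []Row
}

// joinAll joins every left row with every right row across all partitions,
// which is what happens today: len(left) * len(right) iterations, most of
// them discarded by the join condition.
func joinAll(left, right []Row) (out [][2]Row) {
	for _, l := range left {
		for _, r := range right {
			if l.Repo == r.Repo { // join condition on repository
				out = append(out, [2]Row{l, r})
			}
		}
	}
	return
}

// joinPerPartition joins each partition's rows only with rows of the same
// partition, which is roughly what the proposed Concat/PartitionTable plan
// would do. The repository equality condition is implicit: we never leave
// the partition.
func joinPerPartition(parts []Partition) (out [][2]Row) {
	for _, p := range parts {
		for _, l := range p.Left {
			for _, r := range p.Right {
				out = append(out, [2]Row{l, r})
			}
		}
	}
	return
}

func makeRows(repo string, n int) []Row {
	rows := make([]Row, n)
	for i := range rows {
		rows[i] = Row{Repo: repo, Payload: i}
	}
	return rows
}

func main() {
	// Row counts from the example above: P1 45/55, P2 5/15, P3 50/30.
	parts := []Partition{
		{Repo: "p1", Left: makeRows("p1", 45), Right: makeRows("p1", 55)},
		{Repo: "p2", Left: makeRows("p2", 5), Right: makeRows("p2", 15)},
		{Repo: "p3", Left: makeRows("p3", 50), Right: makeRows("p3", 30)},
	}

	var left, right []Row
	for _, p := range parts {
		left = append(left, p.Left...)
		right = append(right, p.Right...)
	}

	fmt.Println("global join iterations:   ", len(left)*len(right))         // 10000
	fmt.Println("per-partition iterations: ", len(joinPerPartition(parts))) // 4050
	fmt.Println("matching rows either way: ", len(joinAll(left, right)))    // 4050
}
```

Note that the matching rows are identical in both cases; only the amount of useless work differs (and, as described above, the per-partition joins could also run in parallel).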
This is tailored to the joins that remain after squashing. There are cases in which you will have such joins for sure, for example when you join the results of two subqueries, which is a pretty common thing to have in any real-world application.
Consider the following query:
```sql
SELECT uast_extract(
           uast(blob_content, 'PHP', "//uast:String"),
           '@pos') AS positions,
       repository_id,
       file_path
FROM (
    SELECT f.repository_id,
           f.file_path,
           b.blob_content
    FROM (
        SELECT * FROM refs r
        NATURAL JOIN commit_blobs cb
        NATURAL JOIN blobs
        WHERE r.ref_name = 'HEAD' AND NOT IS_BINARY(blob_content)
    ) b
    INNER JOIN (
        SELECT repository_id, file_path, blob_hash
        FROM refs r
        NATURAL JOIN commit_files cf
        WHERE r.ref_name = 'HEAD'
    ) f
    ON b.blob_hash = f.blob_hash AND b.repository_id = f.repository_id
    WHERE language(f.file_path, b.blob_content) = 'PHP'
) t
WHERE positions IS NOT NULL
```
This cannot be squashed. Because of that, you're performing a massive cross join over a lot of data if you run it on a big dataset. If you do the joins per partition, you generate far less intermediate data, making the query much faster.
It also has another advantage: since we know only rows from the same partition can match, in-memory joins become more feasible. You don't need to keep all the rows of one side in memory, only the rows of one partition; then you move on to the next partition, and so on.
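Continuing the sketch from above (same invented Row/Partition types; Payload stands in for a real join key such as blob_hash), a per-partition hash join only ever materializes the right-side rows of the current partition:

```go
// joinPartitionInMemory hash-joins one partition at a time, so at any moment
// only the right-side rows of a single partition are held in memory, instead
// of the whole right side of the join.
func joinPartitionInMemory(parts []Partition) (out [][2]Row) {
	for _, p := range parts {
		// Build the hash table for this partition only.
		byKey := make(map[int][]Row, len(p.Right))
		for _, r := range p.Right {
			byKey[r.Payload] = append(byKey[r.Payload], r)
		}
		// Probe with this partition's left rows; the table is dropped
		// before moving on to the next partition.
		for _, l := range p.Left {
			for _, r := range byKey[l.Payload] {
				out = append(out, [2]Row{l, r})
			}
		}
	}
	return
}
```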