update docs on super command for SQL/OLAP audience #5481

Merged
mccanne merged 4 commits into main from super-doc-updates
Nov 15, 2024

Collaborator

@mccanne mccanne commented Nov 14, 2024

No description provided.

@mccanne mccanne assigned philrz and unassigned philrz Nov 14, 2024
@mccanne mccanne requested review from philrz and a team November 14, 2024 23:43
super -f arrows file1.json file2.parquet file3.csv > file-combined.arrows
```
When `super` is run with a query that has no "from" operator and no input arguments,
the SuperSQL query is fed a single `null` value analagous to SQL's default
Contributor

Suggested change
the SuperSQL query is fed a single `null` value analagous to SQL's default
the SuperSQL query is fed a single `null` value analogous to SQL's default

select value 1+1
```
To learn more about shortcuts, refer to the SuperSQL
[documenation on shortcuts](../language/pipeline-model.md#implied-operators).
Contributor

Suggested change
[documenation on shortcuts](../language/pipeline-model.md#implied-operators).
[documentation on shortcuts](../language/pipeline-model.md#implied-operators).

`super` supports a number of [input](#input-formats) and [output](#output-formats) formats, but the super formats
([Super Binary](../formats/bsup.md),
[Super Columnar](../formats/csup.md),
and [Super JSON](../formats/jsup.md)) tend to the most versatile and
Contributor

Suggested change
and [Super JSON](../formats/jsup.md)) tend to the most versatile and
and [Super JSON](../formats/jsup.md)) tend to be the most versatile and

...
wget https://data.gharchive.org/2023-02-08-23.json.gz
```
We downloadied these files into a directory called `gharchive_gz`
Contributor

Suggested change
We downloadied these files into a directory called `gharchive_gz`
We downloaded these files into a directory called `gharchive_gz`

`super` with Super Binary is substantially faster than the relational systems for
the search use cases and performs on par with the others for traditional OLAP queries,
except for the union query, where the super-structured data model trounces the relational
model (by over 100X!) for stiching together disparate data types for analysis in an aggregation.
Contributor

Suggested change
model (by over 100X!) for stiching together disparate data types for analysis in an aggregation.
model (by over 100X!) for stitching together disparate data types for analysis in an aggregation.


We used the Bash `time` command to measure elapsed time.
For our tests, We diverged a bit from the methodology in the DuckDB blog and wanted
Contributor

Suggested change
For our tests, We diverged a bit from the methodology in the DuckDB blog and wanted
For our tests, we diverged a bit from the methodology in the DuckDB blog and wanted

```
duckdb gha.db -c "CREATE TABLE gha AS FROM read_json('gharchive_gz/*.json.gz', union_by_name=true)"
```
We now have the `duckdb` database file for out GitHub Archive data called `gha.db`
Contributor

Suggested change
We now have the `duckdb` database file for out GitHub Archive data called `gha.db`
We now have the `duckdb` database file for our GitHub Archive data called `gha.db`

Contributor

@philrz philrz left a comment

I put up suggestions to fix some obvious typos and such. There are more changes I'd propose, but I'm fine with seeing this merged, and I could put up my proposals in a follow-on PR.

@mccanne mccanne merged commit 5f18349 into main Nov 15, 2024
4 checks passed
@mccanne mccanne deleted the super-doc-updates branch November 15, 2024 19:38