Minor edits to super command doc #5487

Merged 1 commit on Nov 18, 2024
48 changes: 26 additions & 22 deletions docs/commands/super.md
@@ -33,7 +33,7 @@ check out the [`super db`](super-db.md) set of commands.
By invoking the `-c` option, a query expressed in the [SuperSQL language](../language/README.md)
may be specified and applied to the input stream.

Super's data model is based on super-structured data, meaning that all data
The [super data model](../formats/zed.md) is based on [super-structured data](../formats/README.md#2-a-super-structured-pattern), meaning that all data
is both strongly _and_ dynamically typed and need not conform to a homogeneous
schema. The type structure is self-describing so it's easy to daisy-chain
queries and inspect data at any point in a complex query or data pipeline.
@@ -52,27 +52,31 @@ do not haphazardly change when input data changes in subtle ways.

Each `input` argument to `super` must be a file path, an HTTP or HTTPS URL,
an S3 URL, or standard input specified with `-`.
These input arguments are treated as if a SQL "from" operator precedes
These input arguments are treated as if a SQL `FROM` operator precedes
the provided query, e.g.,
```
super -c "from example.json | select typeof(this)"
super -c "FROM example.json | SELECT typeof(this)"
```
is equivalent to
```
super -c "select typeof(this)" example.json
super -c "SELECT typeof(this)" example.json
```
and both are equivalent to the classic SQL
```
super -c "SELECT typeof(this) FROM example.json"
Comment on lines +64 to +66
Contributor Author


I'm still getting the feel for when and how much to reassure readers that standard SQL is still in there. I know we've talked about covering this topic in more detail when we get to overhauling the Language docs. Regardless, my instincts are to make some nods to classic SQL early in individual docs pages since we can't always be sure where users will first land and we can't be confident they'll always click on links when we hope they will. While we hope that users will quickly catch on to SuperSQL's pipe extensions so they can do things with super-structured data that classic SQL can't, I also want to make it as clear as possible to new users that learning new elements is not a prerequisite to getting started, hence my addition of this new text.

Also related to this is when to use ALL CAPS. Once again, it feels worthwhile to use all caps in contexts where we're showing pure SQL that could drop directly into some other classic relational system. Likewise, if we're showing an example that leans mostly/exclusively on SuperSQL shortcuts, it seems reasonable to opt for lowercase, even if the example shares a few keywords common in classic SQL. Then there seems to be a gray area when it's a mix. On the whole though, my instincts are to start from all caps when possible to maximize approachability to the SQL-centric audience.

```
Output is written to one or more files or to standard output in the format specified.
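
One way to sanity-check the claimed equivalence of the three query forms is to run each against the same file and compare the results byte for byte. The following is only a sketch: it assumes a `super` binary is on the `PATH`, and the scratch directory, sample records, and `.out` file names are made up for illustration.

```shell
# Hypothetical scratch directory and sample data (not from the docs).
demo_dir=$(mktemp -d)
printf '{"a":1}\n{"b":"two"}\n' > "$demo_dir/example.json"

if command -v super >/dev/null 2>&1; then
  (
    cd "$demo_dir" || exit 1
    # Pipe form, input-argument form, and classic SQL form from the text above.
    super -c "FROM example.json | SELECT typeof(this)" > pipe.out
    super -c "SELECT typeof(this)" example.json > arg.out
    super -c "SELECT typeof(this) FROM example.json" > classic.out
    # Compare the three outputs without assuming what they contain.
    if cmp -s pipe.out arg.out && cmp -s arg.out classic.out; then
      echo "all three forms produced identical output"
    else
      echo "outputs differ"
    fi
  )
else
  echo "super not found on PATH; skipping comparison"
fi
```

Comparing with `cmp` avoids depending on the output format, which matters because the output here is redirected to files rather than a terminal.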

When multiple input files are specified, they are processed in the order given as
if the data were provided by a single, concatenated "from" clause.
if the data were provided by a single, concatenated `FROM` clause.

If no query is specified with `-c`, the inputs are scanned without modification
and output in the desired format as [described below](#input-formats),
providing a convenient means to convert files from one format to another, e.g.,
```
super -f arrows file1.json file2.parquet file3.csv > file-combined.arrows
```
When `super` is run with a query that has no "from" operator and no input arguments,
When `super` is run with a query that has no `FROM` operator and no input arguments,
the SuperSQL query is fed a single `null` value analogous to SQL's default
input of a single empty row of an unnamed table.
This provides a convenient means to explore examples or run in a
@@ -85,13 +89,13 @@ emits
2
```
Note that SuperSQL has syntactic shortcuts for interactive data exploration and
an expression that stands alone is a shortcut for `select value`, e.g., the query text
an expression that stands alone is a shortcut for `SELECT VALUE`, e.g., the query text
```
1+1
```
is equivalent to
```
select value 1+1
SELECT VALUE 1+1
```
To learn more about shortcuts, refer to the SuperSQL
[documentation on shortcuts](../language/pipeline-model.md#implied-operators).
@@ -139,14 +143,14 @@ The input format is typically [detected automatically](#auto-detection) and the
"Auto" is "yes" in the table above support _auto-detection_.
Formats without auto-detection require the `-i` option.

### Hard-wired Input Format
#### Hard-wired Input Format
Contributor Author


In #5481, a new "Data Formats" section was created and the "Input Formats" and "Output Formats" sub-sections were moved below that, which all makes sense. However, the sub-sections under "Input Formats" and "Output Formats" were left at their prior level, so the hierarchical order was disrupted. This and other ####-style changes in this PR are to correct that.


The input format is specified with the `-i` flag.

When `-i` is specified, all of the inputs on the command-line must be
in the indicated format.

### Auto-detection
#### Auto-detection

When using _auto-detection_, each input's format is independently determined
so it is possible to easily blend different input formats into a unified
@@ -173,11 +177,11 @@ would produce this output in the default Super JSON format
{a:3,b:"baz"}
```

### JSON Auto-detection: Super vs. Plain
#### JSON Auto-detection: Super vs. Plain

Since [Super JSON](../formats/jsup.md) is a superset of plain JSON, `super` must be careful how it distinguishes the two cases when performing auto-inference.
While you can always clarify your intent
with the `-i jsup` or `-i json`, `super` attempts to "just do the right thing"
via `-i jsup` or `-i json`, `super` attempts to "just do the right thing"
when you run it with Super JSON vs. plain JSON.
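
As a rough illustration of stating intent explicitly, the same file can be read once as plain JSON and once as Super JSON. This is a sketch under assumptions: `super` is assumed to be installed, and the temporary directory and `sample.json` contents are invented for the example. No particular output is claimed here.

```shell
# Hypothetical sample input; a plain JSON object is also valid Super JSON.
json_dir=$(mktemp -d)
printf '{"a":1,"b":"foo"}\n' > "$json_dir/sample.json"

if command -v super >/dev/null 2>&1; then
  # Force the plain JSON reader, then the Super JSON reader, on the same file.
  super -i json -z "$json_dir/sample.json"
  super -i jsup -z "$json_dir/sample.json"
else
  echo "super not found on PATH; skipping"
fi
```

Running both readers side by side makes any difference in how values are typed visible without relying on auto-detection.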

While `super` can parse any JSON using its built-in Super JSON parser this is typically
@@ -231,7 +235,7 @@ Since Super JSON is a common format choice, the `-z` flag is a shortcut for
And since plain JSON is another common format choice, the `-j` flag is a shortcut for
`-f json` and `-J` is a shortcut for pretty printing JSON.

### Output Format Selection
#### Output Format Selection

When the format is not specified with `-f`, it defaults to Super JSON if the output
is a terminal and to Super Binary otherwise.
@@ -250,7 +254,7 @@ binary output to their terminal when forgetting to type `-f jsup`.
In practice, we have found that the output defaults
"just do the right thing" almost all of the time.

### Pretty Printing
#### Pretty Printing

Super JSON and plain JSON text may be "pretty printed" with the `-pretty` option, which takes
the number of spaces to use for indentation. As this is a common option,
@@ -295,7 +299,7 @@ produces
When pretty printing, colorization is enabled by default when writing to a terminal,
and can be disabled with `-color false`.

### Pipeline-friendly Super Binary
#### Pipeline-friendly Super Binary

Though it's a compressed format, Super Binary data is self-describing and stream-oriented
and thus is pipeline friendly.
@@ -330,7 +334,7 @@ produces
00000012
```

### Schema-rigid Outputs
#### Schema-rigid Outputs

Certain data formats like [Arrow](https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format)
and [Parquet](https://github.com/apache/parquet-format) are "schema rigid" in the sense that
@@ -351,7 +355,7 @@ causes this error
parquetio: encountered multiple types (consider 'fuse'): {x:int64} and {s:string}
```
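
The error message points at `fuse`. To see what fusing does before involving Parquet at all, the two record shapes named in the error can be piped through `fuse` and viewed as Super JSON. This is a minimal sketch: it assumes `super` is installed, and the sample values are chosen only to match the types `{x:int64}` and `{s:string}` in the error.

```shell
# Two records whose types match those named in the error message (values are illustrative).
sample='{x:1}{s:"hello"}'

if command -v super >/dev/null 2>&1; then
  # Apply fuse and print Super JSON so the blended record type is visible.
  echo "$sample" | super -z -c 'fuse' -
else
  echo "super not found on PATH; skipping"
fi
```

The expectation is that both records come out with the same fields, with nulls filling the gaps, which is the shape a schema-rigid format can accept.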

#### Fusing Schemas
##### Fusing Schemas

As suggested by the error above, the [`fuse` operator](../language/operators/fuse.md) can merge different record
types into a blended type, e.g., here we create the file and read it back:
@@ -365,7 +369,7 @@ but the data was necessarily changed (by inserting nulls):
{x:null(int64),s:"hello"}
```

#### Splitting Schemas
##### Splitting Schemas

Another common approach to dealing with the schema-rigid limitation of Arrow and
Parquet is to create a separate file for each schema.
@@ -393,7 +397,7 @@ produces the original data
While the `-split` option is most useful for schema-rigid formats, it can
be used with any output format.

### Simplified Text Outputs
#### Simplified Text Outputs

The `text` and `table` formats simplify data to fit within the
limitations of text-based output. Because they do not capture all the
@@ -461,7 +465,7 @@ one 1 -
hello - greeting
```

### SuperDB Data Lake Metadata Output
#### SuperDB Data Lake Metadata Output

The `lake` format is used to pretty-print lake metadata, such as in
[`super db` sub-command](super-db.md) outputs. Because it's `super db`'s default output format,
@@ -582,7 +586,7 @@ have many examples, but here are a few more simple `super` use cases.

_Hello, world_
```mdtest-command
super -z -c "select value 'hello, world'"
super -z -c "SELECT VALUE 'hello, world'"
```
produces this Super JSON output
```mdtest-output
@@ -602,7 +606,7 @@ produces
```
_The types of various data_
```mdtest-command
echo '1 1.5 [1,"foo"] |["apple","banana"]|' | super -z -c 'select value typeof(this)' -
echo '1 1.5 [1,"foo"] |["apple","banana"]|' | super -z -c 'SELECT VALUE typeof(this)' -
```
produces
```mdtest-output