Add option for specifying case for identifiers #663

chrjorgensen · 2023-11-11T22:15:41Z

Add an option identifierCase to specify what case identifiers should be converted to, like the keyword case. Possible options are preserve, upper and lower.

nene

It's a good start, but needs some more work and thought.

docs/identifierCase.md

nene · 2023-11-11T22:52:00Z

src/formatter/ExpressionFormatter.ts

@@ -506,4 +506,19 @@ export default class ExpressionFormatter {
        return node.text.toLowerCase();
    }
  }
+
+  private showIdentifier(node: IdentifierNode): string {
+    if (/['"\\`]/.test(node.text[0]) || node.text.startsWith(`U&`)) {


This is a problematic way to tell the quoted and unquoted identifiers apart. When new types of quoted identifiers are added, one would need to also remember to update this code, which is way too easy to forget. For example, this code will already fail with Transact-SQL [bracket-quoted] identifiers.

The lexer already has two tokens: IDENTIFIER and QUOTED_IDENTIFIER. Currently the parser simply throws this info away; we could instead store info about this inside the IdentifierNode object.

I agree - I had some trouble identifying quoted identifiers.
Your mention of IDENTIFIER and QUOTED_IDENTIFIER from the lexer was the key!
I followed your suggestion and changed the code to store the token type in the node for identifiers, and now only nodes with token type IDENTIFIER will be converted.
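A minimal sketch of the approach agreed on above, assuming the parser copies the lexer's token type onto `IdentifierNode` as a `tokenType` field. The string-literal token types mirror the names mentioned in the review; the standalone `identifierCase` parameter and the trimmed node shape are simplifications for illustration, not the merged code.

```typescript
// Hypothetical subset of the lexer's token types mentioned in the review.
type TokenType = 'IDENTIFIER' | 'QUOTED_IDENTIFIER';

type IdentifierCase = 'preserve' | 'upper' | 'lower';

interface IdentifierNode {
  type: 'identifier';
  // Token type copied from the lexer instead of being discarded by the parser.
  tokenType: TokenType;
  text: string;
}

function showIdentifier(node: IdentifierNode, identifierCase: IdentifierCase): string {
  // Only plain IDENTIFIER tokens are converted. Assumes every quoting style
  // ("x", `x`, [x], U&"x") is lexed as QUOTED_IDENTIFIER, so adding a new
  // quoting style needs no change here.
  if (node.tokenType !== 'IDENTIFIER') {
    return node.text;
  }
  switch (identifierCase) {
    case 'upper':
      return node.text.toUpperCase();
    case 'lower':
      return node.text.toLowerCase();
    default:
      return node.text;
  }
}

const plain: IdentifierNode = { type: 'identifier', tokenType: 'IDENTIFIER', text: 'my_Table' };
const quoted: IdentifierNode = { type: 'identifier', tokenType: 'QUOTED_IDENTIFIER', text: '[my_Table]' };

console.log(showIdentifier(plain, 'upper')); // MY_TABLE
console.log(showIdentifier(quoted, 'upper')); // [my_Table]
```

The point of the design is that the decision rests on information the lexer already produces, rather than on re-inspecting the first character of the text.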

nene · 2023-11-11T23:02:18Z

src/formatter/ExpressionFormatter.ts

@@ -286,7 +286,7 @@ export default class ExpressionFormatter {
  }

  private formatIdentifier(node: IdentifierNode) {
-    this.layout.add(node.text, WS.SPACE);
+    this.layout.add(this.showIdentifier(node), WS.SPACE);


In addition to plain identifiers we also have several identifier-like things:

variables

parameters

These usually behave just like identifiers; the only thing distinguishing them from normal identifiers is a prefix, as in @myvar or :myparam.

As the parser currently treats variables as identifiers, these too end up upper/lowercased. But parameters are kept separate by the parser, so their case doesn't change. Should parameters also change together with identifiers?

My change is only for unquoted identifiers. I think variables and parameters are outside its scope and should be handled in another PR.
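To make the distinction concrete, a sketch assuming (as described above) that the parser emits a prefixed variable such as `@myvar` as an ordinary identifier node but `:myparam` as a separate parameter node. The `SqlNode` shapes and the `formatNode` dispatcher are hypothetical; they only show why one changes case and the other does not.

```typescript
type IdentifierCase = 'preserve' | 'upper' | 'lower';

// Hypothetical node shapes: the parser turns @myvar into an identifier node,
// while :myparam gets a node type of its own.
type SqlNode =
  | { type: 'identifier'; text: string }
  | { type: 'parameter'; text: string };

function applyCase(text: string, identifierCase: IdentifierCase): string {
  if (identifierCase === 'upper') return text.toUpperCase();
  if (identifierCase === 'lower') return text.toLowerCase();
  return text;
}

function formatNode(node: SqlNode, identifierCase: IdentifierCase): string {
  if (node.type === 'parameter') {
    // Parameters never reach the identifier code path, so they print verbatim.
    return node.text;
  }
  // Variables take the identifier path, so they are converted like column names.
  return applyCase(node.text, identifierCase);
}

console.log(formatNode({ type: 'identifier', text: '@myVar' }, 'upper')); // @MYVAR
console.log(formatNode({ type: 'parameter', text: ':myParam' }, 'upper')); // :myParam
```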

nene · 2023-11-11T23:09:37Z

test/options/identifierCase.ts

+
+import { FormatFn } from '../../src/sqlFormatter.js';
+
+export default function supportsIdentifierCase(format: FormatFn) {


More tests are likely needed here to cover:

that all identifiers in an expression like foo.bar.baz get uppercased,

variables

parameters

array names (treated differently by lexer)

various types of identifier quoting styles

As most of these things depend on the dialect, it might be better to just add additional tests to places like supportsIdentifiers().

I added a test for multi-part identifiers in commit 558efa3

nene · 2023-11-11T23:13:17Z

test/options/identifierCase.ts

+    `);
+  });
+
+  it('does not uppercase identifiers inside strings', () => {


nit: I wouldn't really call it an identifier inside a string - it's just a string.

I personally wouldn't add a separate test for this, I'd just include some strings inside the general "converts identifiers to lowercase/uppercase" test cases. But that's really a personal preference. Doesn't matter much either way.

String test is now included in identifier case tests and removed as separate test.

nene · 2023-11-11T23:15:40Z

docs/identifierCase.md

@@ -0,0 +1,54 @@
+# identifierCase
+
+Converts identifiers to upper or lowercase.


This might be a good place to specify exactly what classifies as an identifier and what doesn't.

I added a note in commit afe5b48 - is it okay?

nene

This is much better now.

Should definitely fix the handling of array[index].

Dealing with the casing of variables and parameters is, I think, out of scope for this PR. While I believe that this identifierCase option should also cover these, further work is needed in the lexer and parser to make it possible to distinguish between quoted and unquoted variants of both.

I'll also have to think about how this feature should interact with another feature I'd like to have: specifying the casing of functions. Function names are also identifiers, but I'd like to be able to say functionCase: "upper", identifierCase: "lower".
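One possible shape for that interaction, purely as a hypothetical sketch: if function names were parsed into their own kind of name, each kind could look up exactly one option. `functionCase` is not part of this PR, and the `NameKind` values and `showName` helper below are assumptions made for illustration.

```typescript
type CaseOption = 'preserve' | 'upper' | 'lower';

// Hypothetical: functionCase is the option wished for above, not part of this PR.
interface CaseOptions {
  identifierCase: CaseOption;
  functionCase: CaseOption;
}

// Assumes the parser can tell a function name apart from other identifiers.
type NameKind = 'identifier' | 'function_name';

function applyCase(text: string, c: CaseOption): string {
  if (c === 'upper') return text.toUpperCase();
  if (c === 'lower') return text.toLowerCase();
  return text;
}

function showName(text: string, kind: NameKind, options: CaseOptions): string {
  // Each kind of name is governed by exactly one option, so the two never conflict.
  const c = kind === 'function_name' ? options.functionCase : options.identifierCase;
  return applyCase(text, c);
}

const options: CaseOptions = { functionCase: 'upper', identifierCase: 'lower' };
console.log(showName('count', 'function_name', options)); // COUNT
console.log(showName('Total', 'identifier', options)); // total
```

The open question from the comment is mainly on the parser side: whether a name can reliably be classified as a function name before formatting.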

nene · 2023-11-13T09:11:01Z

src/parser/grammar.ne

@@ -202,7 +208,7 @@ atomic_expression ->
 array_subscript -> %ARRAY_IDENTIFIER _ square_brackets {%
  ([arrayToken, _, brackets]) => ({
    type: NodeType.array_subscript,
-    array: addComments({ type: NodeType.identifier, text: arrayToken.text}, { trailing: _ }),
+    array: addComments({ type: NodeType.identifier, tokenType: TokenType.ARRAY_IDENTIFIER, text: arrayToken.text}, { trailing: _ }),


So, currently ARRAY_IDENTIFIER tokens are treated differently from normal identifiers. This results in the following SQL:

select foo, foo[1] from tbl

being formatted as:

select FOO, foo[1] from TBL

That is, when the foo column is used normally, it gets uppercased, but when used together with the array-accessor operator it is not uppercased.

This difference between normal identifiers and array identifiers is really just an internal quirk of the SQL Formatter implementation. We should treat both as identifiers and change their case.

Array identifiers are now being converted together with ordinary identifiers - see commit 0bb7d21
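The fix can be modelled as widening the set of token types that identifierCase applies to. The self-contained sketch below works over hand-written tokens (the token split, especially treating `[1]` as one token, and the spacing rule in `render` are assumptions for illustration, not the real lexer or layout code) and reproduces both the reported output and the fixed one.

```typescript
type IdentifierCase = 'preserve' | 'upper' | 'lower';
type TokenType = 'KEYWORD' | 'IDENTIFIER' | 'ARRAY_IDENTIFIER' | 'PUNCTUATION';

interface SqlToken {
  type: TokenType;
  text: string;
}

// Before the fix only IDENTIFIER was converted; after it, ARRAY_IDENTIFIER is too.
const BEFORE_FIX: ReadonlySet<TokenType> = new Set<TokenType>(['IDENTIFIER']);
const AFTER_FIX: ReadonlySet<TokenType> = new Set<TokenType>(['IDENTIFIER', 'ARRAY_IDENTIFIER']);

function showToken(token: SqlToken, identifierCase: IdentifierCase, convertible: ReadonlySet<TokenType>): string {
  if (!convertible.has(token.type) || identifierCase === 'preserve') {
    return token.text;
  }
  return identifierCase === 'upper' ? token.text.toUpperCase() : token.text.toLowerCase();
}

function render(tokens: SqlToken[], identifierCase: IdentifierCase, convertible: ReadonlySet<TokenType>): string {
  let out = '';
  for (const token of tokens) {
    const text = showToken(token, identifierCase, convertible);
    // Commas and the [..] subscript attach to the previous token without a space.
    const attach = out === '' || text === ',' || text.startsWith('[');
    out += (attach ? '' : ' ') + text;
  }
  return out;
}

// Hand-written tokens for: select foo, foo[1] from tbl
const tokens: SqlToken[] = [
  { type: 'KEYWORD', text: 'select' },
  { type: 'IDENTIFIER', text: 'foo' },
  { type: 'PUNCTUATION', text: ',' },
  { type: 'ARRAY_IDENTIFIER', text: 'foo' },
  { type: 'PUNCTUATION', text: '[1]' },
  { type: 'KEYWORD', text: 'from' },
  { type: 'IDENTIFIER', text: 'tbl' },
];

console.log(render(tokens, 'upper', BEFORE_FIX)); // select FOO, foo[1] from TBL
console.log(render(tokens, 'upper', AFTER_FIX)); // select FOO, FOO[1] from TBL
```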

nene · 2023-11-13T09:25:32Z

docs/identifierCase.md

+
+Note: An identifier is a name of a SQL object.
+There are two types of SQL identifiers: ordinary identifiers and quoted identifiers.
+Only ordinary identifiers are subject to be converted.


This is better, but according to this description I would expect the case of variables and parameters to also be converted, because they too are names of SQL objects.

In general I see all these as different kinds of identifiers:

schema, table and column names

variable names

function names

parameter names

Variables are a particularly tricky case. Some dialects like MySQL have variables with a prefix like @foo, and therefore we can easily distinguish them. Other dialects like PostgreSQL also have variables, but there is no special prefix, so an identifier foo could refer to a variable, or it could refer to a table or column.

Ah, I think I mixed up variables and parameters... by variables, do you mean SQL variables defined by CREATE VARIABLE (in DB2)? If yes, then I agree that they should also be considered an identifier and converted.

Yes, variables like that.

nene

Thanks!

karlhorky · 2023-11-26T18:32:16Z

Should this case transformation be applied after the generic keywordCase transform?

If it were applied after keywordCase, then it would also resolve this issue:

Upper case option affect table fields names #156

Consider the following input query:

CREATE TABLE
  actors (
    ID INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    NAME VARCHAR(80) NOT NULL
  );

Currently, by using the configuration identifierCase: 'lower', it will result in the following (actors.id is lowercase, but actors.NAME is still uppercase, because NAME is also a keyword):

CREATE TABLE
  actors (
    id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    NAME VARCHAR(80) NOT NULL
  );

If the identifier transform was applied after the keywordCase transform, then any identifiers which are also keywords would also be lowercased (actors.name would be lowercase):

CREATE TABLE
  actors (
    id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    name VARCHAR(80) NOT NULL
  );

Alternatives

Alternatives from #156 (comment) would be:

accepting an object for the option keywordCase, for granular configuration:

const options = {
  language: 'postgresql',
  keywordCase: {
    upper: true, // true: default case
    lower: ['name', 'varchar', 'integer'],
    preserve: ['in'],
  },
}

an option keywordCaseIgnore (array), which would still allow flexible customization, but not the same configurability as option 1 (which I prefer):

const options = {
  language: 'postgresql',
  keywordCase: 'upper',
  keywordCaseIgnore: [
    // Avoid changing case of `name` fields in tables
    'name',
    // Avoid changing case of data types
    'varchar',
    'integer',
    // ...
  ],
}
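To make the two proposals concrete, a sketch of how a single keyword's case might be resolved under each. This is only one interpretation of the comment (explicit list entries win over the `true` default, and matching is case-insensitive); neither the object form of keywordCase nor keywordCaseIgnore exists in sql-formatter, and `resolveKeywordCase` / `resolveWithIgnore` are hypothetical helpers.

```typescript
type CaseOption = 'preserve' | 'upper' | 'lower';

// Hypothetical shape of option 1: `true` marks the default case, lists override it.
interface GranularKeywordCase {
  upper?: true | string[];
  lower?: true | string[];
  preserve?: true | string[];
}

const CASES: CaseOption[] = ['upper', 'lower', 'preserve'];

function resolveKeywordCase(word: string, config: GranularKeywordCase): CaseOption {
  const w = word.toLowerCase();
  // An explicit list entry wins over the default.
  for (const c of CASES) {
    const v = config[c];
    if (Array.isArray(v) && v.some((k) => k.toLowerCase() === w)) return c;
  }
  // Otherwise fall back to whichever case is marked `true`.
  for (const c of CASES) {
    if (config[c] === true) return c;
  }
  return 'preserve';
}

// Hypothetical option 2: a flat keywordCase plus keywords to leave untouched.
function resolveWithIgnore(word: string, keywordCase: CaseOption, ignore: string[]): CaseOption {
  const w = word.toLowerCase();
  return ignore.some((k) => k.toLowerCase() === w) ? 'preserve' : keywordCase;
}

const config: GranularKeywordCase = {
  upper: true,
  lower: ['name', 'varchar', 'integer'],
  preserve: ['in'],
};

console.log(resolveKeywordCase('NAME', config)); // lower
console.log(resolveKeywordCase('select', config)); // upper
console.log(resolveKeywordCase('IN', config)); // preserve
console.log(resolveWithIgnore('name', 'upper', ['name', 'varchar', 'integer'])); // preserve
```

The sketch also shows the difference in expressiveness: option 2 can only exempt a keyword, while option 1 can force it into a specific case.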

nene · 2023-11-27T15:40:39Z

@karlhorky there is no before or after here. The formatter does not do multiple passes of applying the format. It just detects some words as keywords and some as identifiers. And then these get converted to upper/lower case if configured so. It will never be the case that a word is considered both identifier and keyword by the formatter.
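A rough model of what this means, using karlhorky's CREATE TABLE example: each word is classified once, and only the case option for its class is ever applied. The `KEYWORDS` set and `classify` function are hypothetical stand-ins for the real, dialect-specific lexer.

```typescript
type CaseOption = 'preserve' | 'upper' | 'lower';

// Toy keyword list; the real one is dialect-specific and much larger.
const KEYWORDS = new Set(['CREATE', 'TABLE', 'INTEGER', 'NAME', 'VARCHAR', 'NOT', 'NULL']);

function classify(word: string): 'keyword' | 'identifier' {
  return KEYWORDS.has(word.toUpperCase()) ? 'keyword' : 'identifier';
}

function applyCase(text: string, c: CaseOption): string {
  if (c === 'upper') return text.toUpperCase();
  if (c === 'lower') return text.toLowerCase();
  return text;
}

function formatWord(word: string, keywordCase: CaseOption, identifierCase: CaseOption): string {
  // One classification, one conversion: a word is never both keyword and identifier.
  return classify(word) === 'keyword' ? applyCase(word, keywordCase) : applyCase(word, identifierCase);
}

const words = ['CREATE', 'TABLE', 'actors', 'ID', 'INTEGER', 'NAME', 'VARCHAR'];
const out = words.map((w) => formatWord(w, 'preserve', 'lower'));
console.log(out); // [ 'CREATE', 'TABLE', 'actors', 'id', 'INTEGER', 'NAME', 'VARCHAR' ]
```

Because NAME lands in the keyword class, identifierCase never sees it; changing the order of the two options would make no difference.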

karlhorky · 2023-11-27T15:57:32Z

Ok understood, thanks for the clarification.

Add option for specifying case for identifiers

f3f935a

nene requested changes Nov 11, 2023


chrjorgensen added 6 commits November 12, 2023 16:01

Describe identifiers and what's subject to conversion

afe5b48

Fix example of identifier uppercase conversion

0c74c02

Export type 'IdentifierCase'

d6664d1

Store identifier type in node

5354887

Only change casing of ordinary identifiers

bb2d3cc

Enhance tests for identifier case conversions

558efa3

chrjorgensen force-pushed the feature/add-identifier-case branch from 88ec87c to 558efa3 on November 12, 2023 22:49

nene requested changes Nov 13, 2023


chrjorgensen added 2 commits November 13, 2023 17:00

Add conversion of array identifiers

0bb7d21

Add additional tests for array identifier conversions

ca3f3d9

nene approved these changes Nov 13, 2023


nene merged commit 47a30d0 into sql-formatter-org:master Nov 13, 2023
2 checks passed

chrjorgensen deleted the feature/add-identifier-case branch November 13, 2023 16:22

This was referenced Nov 26, 2023

Upper case option affect table fields names #156

Closed

prettier-plugin-sql: Update to sql-formatter@14? un-ts/prettier#313

Closed
