Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: Add string formatting operator #85

Open
jonesmartins opened this issue Nov 18, 2022 · 11 comments
Open

Request: Add string formatting operator #85

jonesmartins opened this issue Nov 18, 2022 · 11 comments

Comments

@jonesmartins
Copy link
Contributor

String formatting overloads would make debugging easier when used in conjunction with PrintT(). I'm aware VSCode has a LLVM debugger, but the Toolbox does not... There might be other uses of which I'm not aware.

For example, we could have a TLCExt!Format(sequence) operator that converts any non-string element of a sequence to string and concatenates all of them. That might be possible in TLA⁺, but slow because the language lacks typing.

Any thoughts?

@jonesmartins
Copy link
Contributor Author

Also, TLA+ doesn't allow indexing strings to check if it starts and ends with \", or checking whether STRING contains a non-character element.

@lemmy
Copy link
Member

lemmy commented Nov 18, 2022

Strings are sequences in TLA+. Do any of the operators below help with your use case?

IndexFirstSubSeq(s, t) ==
(**************************************************************************)
(* The (1-based) index of the beginning of the subsequence s of the *)
(* sequence t . If s appears in t multiple times, this equals the *)
(* lowest index. *)
(* For example: IndexFirstSubSeq(<<1>>, <<1,1,1>>) = 1 *)
(**************************************************************************)
LET last == CHOOSE i \in 0..Len(t) :
/\ s \in SubSeqs(SubSeq(t, 1, i))
/\ \A j \in 0..i-1 : s \notin SubSeqs(SubSeq(t, 1, j))
IN last - (Len(s) - 1)
ReplaceSubSeqAt(i, r, s, t) ==
(**************************************************************************)
(* The sequence t with its subsequence s at position i replaced by *)
(* the sequence r . *)
(**************************************************************************)
LET prefix == SubSeq(t, 1, i - 1)
suffix == SubSeq(t, i + Len(s), Len(t))
IN prefix \o r \o suffix
ReplaceFirstSubSeq(r, s, t) ==
(**************************************************************************)
(* The sequence t with its subsequence s replaced by the sequence r *)
(**************************************************************************)
IF s \in SubSeqs(t)
THEN ReplaceSubSeqAt(IndexFirstSubSeq(s, t), r, s, t)
ELSE t
ReplaceAllSubSeqs(r, s, t) ==
(**************************************************************************)
(* The sequence t with all subsequences s replaced by the sequence r *)
(* Overlapping replacements are disambiguated by choosing the occurrence *)
(* closer to the beginning of the sequence. *)
(* *)
(* Examples: *)
(* *)
(* ReplaceAllSubSeqs(<<>>,<<>>,<<>>) = <<>> *)
(* ReplaceAllSubSeqs(<<4>>,<<>>,<<>>) = <<4>> *)
(* ReplaceAllSubSeqs(<<2>>,<<3>>,<<1,3>>) = <<1,2>> *)
(* ReplaceAllSubSeqs(<<2,2>>,<<1,1>>,<<1,1,1>>) = <<2,2,1>> *)
(**************************************************************************)
CASE s = t -> r
[] r = s -> t \* TLC optimization
[] s # t /\ Len(s) = 0 ->
LET z == Interleave([i \in 1..Len(t) |-> r], [i \in 1..Len(t) |-> <<t[i]>>])
IN FlattenSeq(FlattenSeq(z))
[] s # t /\ Len(s) > 0 /\ s \in SubSeqs(t) ->
\* Not defined recursively to avoid infinite loops.
LET match(f) == { i \in 1..Len(f) : s = f[i] }
comp(p, q) == \A i \in 1..Len(p) : p[i] <= q[i]
\* TODO: Replace with Seq(Seq(Range(t))) once *total* Java module
\* override in place. The current override handles only the
\* case where the parameters are strings (hence Range("abc")
\* not a problem with TLC).
R == BoundedSeq(BoundedSeq(Range(t), Len(t)), Len(t))
\* A) Matches the input t.
S == { f \in R : FlattenSeq(f) = t }
\* B) Has the max number of matches...
T == { f \in S : \A g \in S :
Cardinality(match(g)) <= Cardinality(match(f)) }
\* C) ...of min (leftmost) matches.
u == CHOOSE f \in T :
\A g \in T : comp(
SetToSortSeq(match(f), <), SetToSortSeq(match(g), <))
IN FlattenSeq([i \in 1..Len(u) |-> IF s = u[i] THEN r ELSE u[i]])
[] OTHER -> t

By the way, does the TLA+ debugger (VSCode) address your underlying debugging need?

@jonesmartins
Copy link
Contributor Author

jonesmartins commented Nov 19, 2022

Strings are sequences in TLA+. Do any of the operators below help with your use case?

They don't, unfortunately.


TLC isn't treating strings as sequences here:

\* ToString, PrintT from TLC module
\* Head from Sequences module
\* flatten and Last from SequencesExt module
 
Str(s) ==
  LET Result == ToString(s)
  IN IF Head(Result) = "\"" /\ Last(Result) = "\""
     THEN s
     ELSE Result

Format(seq) ==
  IF Len(seq) = 0 THEN seq
  ELSE
    LET flatten[i \in 1..Len(seq)] ==
        IF i = 1
        THEN Str(seq[i])
        ELSE flatten[i-1] \o Str(seq[i])
    IN flatten[Len(seq)]

...

ASSUME PrintT(Format(<<"Hello", 2>>))

It's interesting that the flatten function works fine with indexes, but Str does not

The example above generated the error: The argument of Head should be a sequence, but instead it is: "2"

Also, this doesn't work either:

Str(s) ==
  IF s \notin STRING
  THEN ToString(s)
  ELSE s

Generates Attempted to check if the value: 2 is an element of STRING.

Although I'm not so sure about the latter Str operator.


By the way, does the TLA+ debugger (VSCode) address your underlying debugging need?

It did! :)

@lemmy
Copy link
Member

lemmy commented Nov 19, 2022

You've discovered a shortcoming of TLC, not of TLA+: TLC does not have characters, which is why, e.g., Head("abc") doesn't work (compare: tlaplus/tlaplus#512 (comment)).

@jonesmartins
Copy link
Contributor Author

jonesmartins commented Nov 19, 2022

Given than many operators in the CommunityModules are overriden by Java implementations, would it be possible or, rather, useful to have a Format operator (or anything of the sort) anyway, since one can't be specified in TLA+?

@lemmy
Copy link
Member

lemmy commented Nov 19, 2022

What would the definition of TLCExt!Format look like?

@jonesmartins
Copy link
Contributor Author

My TLA+ definition of TLCExt!Format above failed, so I don't really know. But operator-wise, we have two options:

  1. Format(formatString, parameters) like Format("[{}, {}, {}]", <<1, 2, 3>>);
  2. Format(sequence) like Format(<<"[", 1, ",", 2, ",", 3, "]">>), which I tried to specify;

The first one reminds me of a few dynamically typed languages. It's easier to read and harder to implement. Also, it's depends on some arbitrary placeholder. In this case, "{}".

The second one is a little harder to read, but it only depends on string conversion and concatenation.

@lemmy
Copy link
Member

lemmy commented Nov 19, 2022

Let's just expose Java's String#format as Format(formatString, sequenceOfParameters).

It is also consistent with:

CSVWrite(template, val, file) ==
(*
CSVWrite("%1$s#%2$s#%3$s",
<<"abc", 42, {"x", "y"} >>, "/tmp/out.csv")
*)
TRUE
?

@jonesmartins
Copy link
Contributor Author

So I assume the placeholders would be %1, %2, and so on? Looks good to me!

@lemmy
Copy link
Member

lemmy commented Nov 19, 2022

@jonesmartins
Copy link
Contributor Author

Perfect. I get it now. Thanks, Markus

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants