Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attach leading and trailing comments to AST nodes #39

Open
KvanTTT opened this issue Sep 23, 2018 · 7 comments
Open

Attach leading and trailing comments to AST nodes #39

KvanTTT opened this issue Sep 23, 2018 · 7 comments

Comments

@KvanTTT
Copy link

KvanTTT commented Sep 23, 2018

Input

Parse the following code:

// leading
a // trailing

with turned on Comment = true option in ParserOptions:

var program = parser.ParseProgram(source, new ParserOptions(source) { Comment = true; });

After that, serialize the result AST with JsonConvert.

Expected

Leading and trailing should be attached to AST nodes. Check it on Esprima.org.

{
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "Identifier",
                "name": "a",
                "trailingComments": [
                    {
                        "type": "Line",
                        "value": " trailing",
                        "range": [
                            13,
                            24
                        ]
                    }
                ]
            },
            "leadingComments": [
                {
                    "type": "Line",
                    "value": " leading",
                    "range": [
                        0,
                        10
                    ]
                }
            ]
        }
    ],
    "sourceType": "script"
}

Actual

Comments are completely skipped.

{
  "Type": "Program",
  "Body": [
    {
      "Type": "ExpressionStatement",
      "Expression": {
        "Type": "Identifier",
        "Name": "a",
        "Loc": {}
      },
      "Loc": {}
    }
  ],
  "SourceType": "script",
  "Loc": {}
}
@adams85
Copy link
Collaborator

adams85 commented Aug 21, 2022

A few thoughts for those who would make an attempt on implementing this: now there's everything in place to port CommentHandler from Esprima JS. But be aware that it'll be capable of an approximation at best. There's no way to precisely represent comments in the current AST model because it's simply not detailed enough: adding LeadingComments/TrailingComments to AST nodes is not enough to describe pieces of syntax like: async /*some comment*/ function f() { ... }. To handle such cases, AST nodes should also include tokens as child nodes. (I suggest checking out how the problem is solved in Roslyn.)

However, it's unlikely that proper comment handling will ever be possible in this lib because its main consumer is Jint, which wouldn't want to pay the price of a more detailed AST model. The best we can have is what CommentHandler does but it will be a half-assed solution. Which, of course, still could be useful in some use cases.

@CharlieEriksen
Copy link
Contributor

I worked on this problem last week and concluded that this is fools' errand most likely, and shouldn't be a first-class citizen in the public API.

Consider a system where you want to tag javascript code, and have these markings show up in the AST. Here's the code with markings:

true; 
/* mark */ 
var /* mark */ x = /* mark */ "test"; /* mark */ 

The solution I came up with was:

public class TestMarkerVisitor : AstVisitor
{
    public static object Comment = "Comment";
    private readonly List<SyntaxComment> _comments;

    private TestMarkerVisitor(IReadOnlyList<SyntaxComment> comments)
    {
        _comments = comments;
    }

    public static void Parse(Node node, IReadOnlyList<SyntaxComment> comments)
    {
        if (comments.Count == 0) return;
        var visitor = new TestMarkerVisitor(comments);
        visitor.Visit(node);
    }

    private Node? previousNode = null;

    public override object? Visit(Node node)
    {
        // Check if we went past the start
        if (previousNode != null)
        {
           var commentsMatching = _comments.Where(currentComment => 
               currentComment.Location.Start.Compare(previousNode.Location.End) >= 0 // Previous node ends before the comment
            && currentComment.Location.End.Compare(node.Location.Start) <= 0        // Next node starts after the end of the comment
               ).ToList();
           if (commentsMatching.Count > 0)
           {
               node.SetAdditionalData(Comment, commentsMatching);
               _comments.RemoveAll(x=> commentsMatching.Contains(x));
           }
           

        }
        previousNode = node;
        return base.Visit(node);
    }


}

But I really don't think this is a great solution either. And esprima doesn't produce the correct result IMO:
Example.

It makes the following mistakes, IMO:

  • mark1 is attached as trailing to the true literal. That makes no sense. In this case, that makes it more of a statement in the tree to me.
  • mark1 is also attached as leading to the VariableDeclaration. This is more correct in my opinion.
  • mark2 is attached as leading to the VariableDeclaration. It should be leading to the Identifier

I kinda think it's a can of worms and super use-case-dependent.

@adams85
Copy link
Collaborator

adams85 commented Aug 22, 2022

You're right and this is what I was talking about in my previous comment. To be able to implement this correctly, we'd also need keywords and punctuators included in AST nodes. Which will probably not happen because of Jint. Maybe in a fork if someone desperately need this.

@scgm0
Copy link

scgm0 commented Sep 3, 2023

@adams85
I'm trying to implement "The toString() of 'class' and 'function' return SourceText" in jint to comply with ECMAScript specification. But currently the source code returned by Esprima .NET from the AST does not contain comments, and I was wondering if I could update this feature?

@adams85
Copy link
Collaborator

adams85 commented Sep 3, 2023

Esprima doesn't really return original source code but rather synthesizes JS code from a given AST. Comments are not included because the current AST is simply not detailed enough to store all the information necessary for properly representing comments. (See also this discussion).

However, for what you want to achieve, you may not need a full fidelity AST at all. Esprima provides the location of each node in the parsed script (see Node.Range). So, using this information, you can extract the original source text of a function or class, given that you keep the parsed script around.

@scgm0
Copy link

scgm0 commented Sep 4, 2023

Esprima doesn't really return original source code but rather synthesizes JS code from a given AST. Comments are not included because the current AST is simply not detailed enough to store all the information necessary for properly representing comments. (See also this discussion).

However, for what you want to achieve, you may not need a full fidelity AST at all. Esprima provides the location of each node in the parsed script (see Node.Range). So, using this information, you can extract the original source text of a function or class, given that you keep the parsed script around.

In the test of test262, it does not return the real source code, what is needed is to synthesize JS code containing comments from AST.

@adams85
Copy link
Collaborator

adams85 commented Sep 4, 2023

what is needed is to synthesize JS code containing comments from AST.

Once again, this is impossible with Esprima's current, ESTree-based AST.

However, you don't need to synthesize anything to implement toString for functions or classes. Please study the following code:

var input =
@"console.log(function f() { return /* comment 
*/ 'hello world' }.toString())";

var ast = new JavaScriptParser().ParseScript(input);
var func = ast.DescendantNodes().OfType<FunctionExpression>().First();

/*** This is how you can extract the original source text of an AST node ***/
var funcSourceText = input.Substring(func.Range.Start, func.Range.Length);

Console.WriteLine(funcSourceText);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants