code generator reparses same input files again and again #1060
Comments
How did you know that?
I'll look into it this w-e. Thanks for the heads up.
I profiled it with
Is there no way to share the parsed AST across code-generators? I have two code-generators that need the AST. Sounds like if someone uses both, all their files will be parsed twice.
I am not sure. @natebosch @jakemac53 would know. I am a bit puzzled why there does not seem to be any caching/sharing - maybe due to mutability of the underlying AST? Then I would at least expect some documentation highlighting the cost of these methods. But again - today was the first time I even looked at this code. I think as long as each code-generator parses input files a small number of times, performance will be acceptable, but calling these methods repeatedly for each element is a no-no which leads to really bad performance.
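The repeated-reparse cost described above can be cut with a simple per-build memo. A minimal, self-contained Dart sketch of the idea (the `ParsedLibraryCache` class and `_expensiveParse` helper are hypothetical illustrations, not analyzer or build_runner APIs):

```dart
// Hypothetical sketch: memoize the parsed result per library URI so that
// asking for AST data of many elements in the same library parses it once.
class ParsedLibraryCache {
  final Map<Uri, String> _cache = {};
  int parseCount = 0; // observe how often the expensive parse actually runs

  // Stand-in for a real parse (e.g. what getParsedLibraryByElement triggers);
  // here it just returns a placeholder string.
  String _expensiveParse(Uri uri) {
    parseCount++;
    return 'AST($uri)';
  }

  String parsedLibrary(Uri uri) =>
      _cache.putIfAbsent(uri, () => _expensiveParse(uri));
}

void main() {
  final cache = ParsedLibraryCache();
  final lib = Uri.parse('package:example/model.dart');
  // Simulate a generator visiting many parameters of the same library.
  for (var i = 0; i < 1000; i++) {
    cache.parsedLibrary(lib);
  }
  print(cache.parseCount); // 1: the library is parsed once, not 1000 times
}
```

The real analyzer AST may be invalidated between build steps, so a cache like this would have to live within a single generator run, not across the whole build.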
Afaik it's a question of "the AST is too complex" and "the AST costs a lot of memory". I asked for some improvements there before (as I need the AST in all of my generators) and this was the TL;DR from what I remember
We don't cache at the build system level because it leads to problems like dart-lang/build#3657
This is the answer for why we don't preemptively parse ASTs for all libraries so they can be read synchronously. The reason we don't cache is the behavior of the analyzer when we reuse objects after adding more files to the analysis context. |
build_runner does have the Resource API, which can be used to cache things across build steps. But I don't think it would work well for caching analyzer objects that way, because they tend to get invalidated often (which can make them unusable). The resolved AST might not have this problem, but I really don't know. It is better to cache some result which was derived from the analyzer result. But in general, yes, it is known that going from the element model to an AST node causes a full re-parse; AST nodes are not cached. You could instead ask for the resolved AST of a library once, and then work from that, instead of the element model? Ultimately, @scheglov is probably the best one to weigh in here.
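The "ask for the resolved AST once and work from that" suggestion amounts to restructuring the generator from per-element lookups into a single pass over the library. A self-contained Dart sketch of that restructuring (the `parseLibraryOnce` function and its colon-separated "source" format are purely illustrative, not analyzer APIs):

```dart
// Sketch: derive all per-parameter data from one parse of the library,
// instead of re-parsing the library for every parameter lookup.
int parseCount = 0;

// Stand-in for one real parse of a library; extracts a name -> doc map
// from a toy "name: doc" per-line format.
Map<String, String> parseLibraryOnce(String source) {
  parseCount++;
  final docs = <String, String>{};
  for (final line in source.split('\n')) {
    final i = line.indexOf(':');
    if (i > 0) docs[line.substring(0, i).trim()] = line.substring(i + 1).trim();
  }
  return docs;
}

void main() {
  const source = 'name: the user name\nage: the user age';
  final docs = parseLibraryOnce(source); // one parse for the whole library
  // Every per-parameter lookup now hits the in-memory result, not the parser.
  for (final param in ['name', 'age', 'name', 'age']) {
    final doc = docs[param];
    print('$param -> $doc');
  }
  print(parseCount); // 1: lookups no longer trigger re-parses
}
```

The same shape applies to a real generator: resolve the library once at the top of the build step, collect what each constructor and parameter needs into plain data, then generate from that.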
I've pushed a fix to this specific issue. Although it's still not ideal. To be fair, genq's comparison is quite misleading. It's faster because it only relies on the AST. |
Thanks! I can confirm that it is much faster now - 10s vs the 100s it used to take before on my machine. I think a whopping 50% of the time is spent formatting generated code. When I comment out
If I 10x the number of input files the time goes to 18s, which to me indicates that a large portion of the ~5s is JIT warmup overhead and the actual cost of code generation is just half of that. I tried AOT-compiling the build_runner build script - but the naive attempt fails because build_runner seems to use mirrors somewhere. Anyway, I might poke more at this some time later - but TBH this reveals that we have some really low-hanging fruit here and there.
Yeah the fact that 50% of the time is spent on formatting is a known issue. It's shared with all code-generators. |
I agree that we have a bunch of low-hanging fruit. IMO another big issue is how There are probably a few possible APIs to help here.
I became interested in `freezed`/`build_runner` performance when I saw this reddit thread. I took @jankuss original benchmark https://github.com/jankuss/genq/tree/00b0ea50143a1e813d7a1010dc172ccc09483cd1/benchmarks/freezed_benchmark and profiled the build_runner invocation. To my surprise, most of the time was spent in parsing. So I added a `print('parsing $uri at ${StackTrace.current}')` in `FileState.parse` in the analyzer. That revealed that we are just continuously parsing the same files again and again.

I then looked at the stack traces: they revealed things like `documentationOfParameter` and `tryGetAstNodeForElement`. It seems that for each parameter it calls `getParsedLibraryByElement`, which causes the input file to be reparsed. This seems suboptimal and will certainly result in rather poor performance.

I am by no means a `build` expert, but it seems that `getParsedLibraryByElement` should not be used like that, or it should do more caching and not reparse the input file again and again. /cc @natebosch @jakemac53

I think currently each input file is reparsed for each constructor and each parameter in a constructor? That's a lot of reparsing. So you can probably make freezed run two orders of magnitude faster on this benchmark.