code generator reparses same input files again and again #1060
Comments
How did you know that?
I'll look into it this w-e. Thanks for the heads up.
I profiled it with
Is there no way to share the parsed AST across code-generators? I have two code-generators that need the AST. Sounds like if someone uses both, all their files will be parsed twice.
I am not sure. @natebosch @jakemac53 would know. I am a bit puzzled why there does not seem to be any caching/sharing - maybe due to mutability of the underlying AST? Then I would at least expect some documentation highlighting the cost of these methods. But again - today was the first time I even looked at this code. I think as long as each code-generator parses input files a small number of times, performance will be acceptable, but calling these methods repeatedly for each element is a no-no which leads to really bad performance.
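The repeated-reparse cost described above can be cut with a simple per-build memo. A minimal, self-contained Dart sketch of the idea (the `ParsedLibraryCache` class and `_expensiveParse` helper are hypothetical illustrations, not analyzer or build_runner APIs):

```dart
// Hypothetical sketch: memoize the parsed result per library URI so that
// asking for AST data of many elements in the same library parses it once.
class ParsedLibraryCache {
  final Map<Uri, String> _cache = {};
  int parseCount = 0; // observe how often the expensive parse actually runs

  // Stand-in for a real parse (e.g. what getParsedLibraryByElement triggers);
  // here it just returns a placeholder string.
  String _expensiveParse(Uri uri) {
    parseCount++;
    return 'AST($uri)';
  }

  String parsedLibrary(Uri uri) =>
      _cache.putIfAbsent(uri, () => _expensiveParse(uri));
}

void main() {
  final cache = ParsedLibraryCache();
  final lib = Uri.parse('package:example/model.dart');
  // Simulate a generator visiting many parameters of the same library.
  for (var i = 0; i < 1000; i++) {
    cache.parsedLibrary(lib);
  }
  print(cache.parseCount); // 1: the library is parsed once, not 1000 times
}
```

The real analyzer AST may be invalidated between build steps, so a cache like this would have to live within a single generator run, not across the whole build.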
Afaik it's a question of "the AST is too complex" and "the AST costs a lot of memory". I asked for some improvements there before (as I need the AST in all of my generators) and this was the TL;DR from what I remember
We don't cache at the build system level because it leads to problems like dart-lang/build#3657
This is the answer for why we don't preemptively parse ASTs for all libraries so they can be read synchronously. The reason we don't cache is the behavior of the analyzer when we reuse objects after adding more files to the analysis context. |
build_runner does have the Resource API, which can be used to cache things across build steps. But I don't think it would work well for caching analyzer objects that way, because they tend to get invalidated often (which can make them unusable). The resolved AST might not have this problem, but I really don't know. It is better to cache some result which was derived from the analyzer result. But in general, yes, it is known that going from the element model to an AST node causes a full re-parse; AST nodes are not cached. You could instead ask for the resolved AST of a library once, and then work from that, instead of the element model? Ultimately, @scheglov is probably the best one to weigh in here.
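The "ask for the resolved AST once and work from that" suggestion amounts to restructuring the generator from per-element lookups into a single pass over the library. A self-contained Dart sketch of that restructuring (the `parseLibraryOnce` function and its colon-separated "source" format are purely illustrative, not analyzer APIs):

```dart
// Sketch: derive all per-parameter data from one parse of the library,
// instead of re-parsing the library for every parameter lookup.
int parseCount = 0;

// Stand-in for one real parse of a library; extracts a name -> doc map
// from a toy "name: doc" per-line format.
Map<String, String> parseLibraryOnce(String source) {
  parseCount++;
  final docs = <String, String>{};
  for (final line in source.split('\n')) {
    final i = line.indexOf(':');
    if (i > 0) docs[line.substring(0, i).trim()] = line.substring(i + 1).trim();
  }
  return docs;
}

void main() {
  const source = 'name: the user name\nage: the user age';
  final docs = parseLibraryOnce(source); // one parse for the whole library
  // Every per-parameter lookup now hits the in-memory result, not the parser.
  for (final param in ['name', 'age', 'name', 'age']) {
    final doc = docs[param];
    print('$param -> $doc');
  }
  print(parseCount); // 1: lookups no longer trigger re-parses
}
```

The same shape applies to a real generator: resolve the library once at the top of the build step, collect what each constructor and parameter needs into plain data, then generate from that.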
I've pushed a fix to this specific issue. Although it's still not ideal. To be fair, genq's comparison is quite misleading. It's faster because it only relies on the AST. |
Thanks! I can confirm that it is much faster now - 10s vs the 100s it used to take before on my machine. I think a whopping 50% of the time is spent formatting generated code. When I comment out
If I 10x the number of input files the time goes to 18s, which to me indicates that a large portion of the ~5s is JIT warmup overhead and the actual cost of code generation is just half of that. I tried AOT-compiling the build_runner build script - but the naive attempt fails because build_runner seems to use mirrors somewhere. Anyway, I might poke more at this some time later - but TBH this reveals that we have some really low-hanging fruit here and there.
Yeah the fact that 50% of the time is spent on formatting is a known issue. It's shared with all code-generators. |
I agree that we have a bunch of low-hanging fruit. IMO another big issue is how There are probably a few possible APIs to help here.
I became interested in `freezed`/`build_runner` performance when I saw this reddit thread. I took @jankuss original benchmark https://github.com/jankuss/genq/tree/00b0ea50143a1e813d7a1010dc172ccc09483cd1/benchmarks/freezed_benchmark and profiled the build_runner invocation. To my surprise, most of the time was spent in parsing. So I added a `print('parsing $uri at ${StackTrace.current}')` in `FileState.parse` in the analyzer. That revealed that we are just continuously parsing the same files again and again.

I then looked at the stack traces: they revealed things like `documentationOfParameter` and `tryGetAstNodeForElement`. It seems that for each parameter it calls `getParsedLibraryByElement`, which causes the input file to be reparsed. This seems suboptimal and will certainly result in rather poor performance.

I am by no means a `build` expert, but it seems that `getParsedLibraryByElement` should not be used like that, or it should do more caching and not reparse the input file again and again. /cc @natebosch @jakemac53

I think currently each input file is reparsed for each constructor and each parameter in a constructor? That's a lot of reparsing. So you can probably make freezed run two orders of magnitude faster on this benchmark.