This repository has been archived by the owner on Nov 11, 2018. It is now read-only.

Parsing is slow relative to LLVM #6

Closed
pwaller opened this issue Oct 8, 2018 · 18 comments

Comments

@pwaller

pwaller commented Oct 8, 2018

I have a 44kloc input ir/testdata/coreutils/vdir.ll, and github.com/mewmew/l/asm.Parse(input) is taking 8 CPU-seconds (4s wall). opt -verify < ir/testdata/coreutils/vdir.ll takes 250ms (280ms wall).

The relative slowdown is ~30x in CPU time.

Is performance a goal of this project? I'm concerned about what this will look like when I feed it larger inputs.

@pwaller
Author

pwaller commented Oct 8, 2018

Interesting: turning the garbage collector off with GOGC=off reduces CPU time by half, and wall-clock time by 12%.

70% of CPU time is spent in mapassign; as best I can tell, it is simply executing this line:

https://github.com/mewmew/l/blob/2f2fc8d9956b16f583893cb6bf27b02311d285b2/ir/irutil/walk.go#L95
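
For context, a plausible sketch of the walker pattern that would make mapassign this hot, assuming the walker records visited nodes in a map to break cycles (names here are illustrative, not the actual irutil API):

package irutil

// walk visits root and every node reachable from it, recording each node in
// the visited map to avoid infinite recursion on cyclic structures (e.g.
// recursive type definitions).
func walk(root interface{}, visited map[interface{}]bool, visit func(interface{})) {
    if visited[root] {
        return
    }
    // The hot mapassign: one map insert per node, repeated every time the
    // module is re-walked during resolution.
    visited[root] = true
    visit(root)
    // ... recurse into the children of root.
}

If resolution re-walks the module once per identifier, those map inserts dominate the profile.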

@pwaller
Author

pwaller commented Oct 8, 2018

If I turn module resolution off (the bulk of the code in Translate()):

real    0m0.898s
user    0m1.623s
sys     0m0.509s

If I turn module resolution off and set GOGC=off:

real    0m0.790s
user    0m0.512s
sys     0m0.127s

@mewmew
Member

mewmew commented Oct 8, 2018

Hi @pwaller,

Thanks for the feedback! It's great that you've started to profile and locate some of the performance bottlenecks. Yes, map access is a common concern, and it can often be addressed rather easily. A similar case was identified in a hand-written lexer that was part of llir/llvm before we switched to the gocc-generated one. @quarnster submitted llir/llvm#11, which improved performance quite substantially by using binary search instead of a naive search.
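
For reference, the general technique there is a binary search over a sorted keyword table; a minimal sketch (the actual keyword set and lexer integration in llir/llvm#11 differ):

package lexer

import "sort"

// keywords must be kept sorted for binary search to work.
var keywords = []string{"add", "br", "call", "define", "icmp", "ret"}

// isKeyword reports whether s is a keyword, using sort.SearchStrings (binary
// search) instead of a linear scan over the table.
func isKeyword(s string) bool {
    i := sort.SearchStrings(keywords, s)
    return i < len(keywords) && keywords[i] == s
}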

I think there is also other low-hanging fruit for improving parsing performance. The main item is the parser generated by gocc, as it currently contains a huge state space. One proposed solution to this problem is to implement Pager's practical general method for constructing LR(k) parsers. This is tracked by goccmack/gocc#34.

Is performance a goal of this project? I'm concerned about what this will look like when I feed it larger inputs.

The goal of this project is to be roughly on par with the performance of the official LLVM project. Perhaps a ~2x slowdown is OK, but definitely not a >= 10x slowdown.

That being said, the immediate goal is to complete llir/llvm#29 and merge mewmew/l back into llir/llvm. Then once llir/llvm supports read/write of the entire LLVM IR language, we will switch focus to performance optimizations.

Cheers,
Robin

@mewmew
Member

mewmew commented Oct 12, 2018

I'm currently evaluating Textmapper instead of Gocc for generating the lexer and parser; hopefully this will help improve performance.

In inspirer/textmapper#6 (comment) @inspirer mentions the following:

On real-world languages, generated parsers in Go gave me ~100-230MB/sec of lexing throughput and 20-60MB/sec of parsing throughput. It gets slightly better with each Go release, mostly because of improved register allocation within the Go compiler.

The evaluation is tracked in this repo: https://github.com/mewmew/l-tm

@mewmew
Member

mewmew commented Oct 13, 2018

The grammar has now been rewritten in Textmapper (no semantic actions yet, though), and the initial performance results are great!

As a benchmark I tried parsing the LLVM IR assembly produced when compiling Coreutils with Clang. The LLVM IR files are located at https://github.com/decomp/testdata/tree/master/coreutils/testdata

And the benchmark results are reported at https://github.com/mewmew/l-tm/blob/master/BENCH.md

Parsing 1,733,842 lines and 135 MB of LLVM IR assembly, as contained in the 107 source files at decomp/testdata, took ~3 seconds; that is, ~30 ms per file, or ~45 MB/s.

Note, these results are from shift/reduce parsing only; no semantic actions are yet used to produce the AST, resolve identifiers, do type checking, etc. That is yet to be done. But as far as parsing goes, I think Textmapper is a good choice for performance.

@mewmew
Member

mewmew commented Oct 15, 2018

@pwaller as of mewspring/l-tm#1 we are now able to construct an AST for the entire LLVM IR language as defined by LLVM 7.0 (except for modules with SummaryEntry, which I have yet to see in the wild; if you have one, please send it my way!).

The step to translate the AST to the IR representation is yet to be done.

However, this gives us an indication of the performance we can expect.

Official LLVM:

u@x1 ~/D/g/s/g/m/l/c/l-tm> time opt -verify < testdata/vdir.ll > /dev/null
real 0.25
user 0.23
sys 0.01

llir using parser generated by Textmapper:

u@x1 ~/D/g/s/g/m/l/c/l-tm> time l-tm testdata/vdir.ll > /dev/null
real 0.21
user 0.19
sys 0.06

To replicate these results, do as follows:

Download and install Textmapper. The tool is currently written in Java and generates parsers in Go (similar to ANTLR). @inspirer is working on porting the Textmapper tool to Go (see inspirer/textmapper#6).

$ go get -v -u github.com/inspirer/textmapper
$ cd $GOPATH/src/github.com/inspirer/textmapper/tm-tool
$ ant deploy

go get mewmew/l-tm and generate the LLVM IR parser from the grammar using Textmapper.

$ go get -v -u github.com/mewmew/l-tm
$ cd $GOPATH/src/github.com/mewmew/l-tm/asm/ll
$ make
lalr: 0.286s, text: 0.541s, parser: 2163 states, 186KB
$ go get -v -u github.com/mewmew/l-tm/cmd/l-tm

Now, use l-tm to parse LLVM IR assembly files produced by the LLVM 7.0 toolchain.

$ time l-tm coreutils/vdir.ll 
=== [ coreutils/vdir.ll ] =======================

took: 184.894959ms

entity *ast.SourceFilename: source_filename = "llvm-link"
   name: "llvm-link"
entity *ast.TargetDataLayout: target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
   datalayout: "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
...

real	0m0.727s
user	0m0.194s
sys	0m0.190s

And without printouts:

$ time l-tm coreutils/vdir.ll > /dev/null

real	0m0.207s
user	0m0.175s
sys	0m0.075s

@pwaller
Author

pwaller commented Oct 15, 2018

Impressive, great work! I watch with interest. I assume this is without any kind of resolution?

Do you have a plan for attacking the symbol resolution problem?

I have this sense that the current code must be descending into nodes far more often than necessary. Also, because of the use of interfaces it must be doing a lot of work. I'd be interested to hack together some generated code which does this as a side project; I might have some time for that on Friday the 19th, perhaps.

@mewmew
Member

mewmew commented Oct 15, 2018

Impressive, great work! I watch with interest. I assume this is without any kind of resolution?

Indeed, this is before resolution.

Do you have a plan for attacking the symbol resolution problem?

I'm open to suggestions :) How would you attack the problem?

Basically, I learn as I go. I've learned a great deal about LR, LALR, Pager's LR(k), etc. these last few weeks. Textmapper is more or less precisely what I was hoping for: basically ANTLR, but using LR instead of LL and Go instead of Java.

I have this sense that the current code must be descending into nodes far more often than necessary. Also, because of the use of interfaces it must be doing a lot of work. I'd be interested to hack together some generated code which does this as a side project; I might have some time for that on Friday the 19th, perhaps.

Definitely. I'd love to hack on this with you. Friday the 19th, it's a date!

@mewmew
Member

mewmew commented Oct 15, 2018

@pwaller We now have type resolution of type definitions (in the translate branch).

I've added a -types flag to enable/disable type resolution of type definitions.

Note: I'm translating to the IR of llir/l which is basically a merge between llir/llvm/ir and mewmew/l/ir. The performance should be similar regardless of which of these three we translate to.

The preliminary results are as follows:

Type resolution of type definitions enabled

$ time l-tm -types=true testdata/vdir.ll
=== [ testdata/vdir.ll ] =======================

parsing into AST took: 172.208673ms

type resolution of type definitions took: 4.56632ms

total time for file "testdata/vdir.ll": 176.821143ms
real 0.18
user 0.17
sys 0.05

$ time l-tm -types=true testdata/vdir.ll
=== [ testdata/vdir.ll ] =======================

parsing into AST took: 162.403211ms

type resolution of type definitions took: 4.881672ms

total time for file "testdata/vdir.ll": 167.355645ms
real 0.17
user 0.17
sys 0.03

$ time l-tm -types=true testdata/vdir.ll
=== [ testdata/vdir.ll ] =======================

parsing into AST took: 183.710061ms

type resolution of type definitions took: 5.144666ms

total time for file "testdata/vdir.ll": 188.905968ms
real 0.19
user 0.18
sys 0.05

Type resolution of type definitions disabled

$ time l-tm -types=false testdata/vdir.ll
=== [ testdata/vdir.ll ] =======================

parsing into AST took: 187.108077ms

total time for file "testdata/vdir.ll": 187.143403ms
real 0.19
user 0.18
sys 0.05

$ time l-tm -types=false testdata/vdir.ll
=== [ testdata/vdir.ll ] =======================

parsing into AST took: 186.322634ms

total time for file "testdata/vdir.ll": 186.357051ms
real 0.19
user 0.19
sys 0.05

$ time l-tm -types=false testdata/vdir.ll
=== [ testdata/vdir.ll ] =======================

parsing into AST took: 182.948167ms

total time for file "testdata/vdir.ll": 182.98329ms
real 0.19
user 0.19
sys 0.03

Running on all 107 LLVM IR files of Coreutils

Type resolution of type definitions enabled

$ time l-tm -types=true testdata/*.ll > /dev/null
real 3.81
user 4.09
sys 0.15

$ time l-tm -types=true testdata/*.ll > /dev/null
real 3.85
user 4.18
sys 0.12

$ time l-tm -types=true testdata/*.ll > /dev/null
real 3.85
user 4.12
sys 0.17

Type resolution of type definitions disabled

$ time l-tm -types=false testdata/*.ll > /dev/null
real 3.74
user 4.05
sys 0.12

$ time l-tm -types=false testdata/*.ll > /dev/null
real 3.72
user 4.05
sys 0.09

$ time l-tm -types=false testdata/*.ll > /dev/null
real 3.72
user 4.05
sys 0.11

@pwaller
Author

pwaller commented Oct 16, 2018

😮 nice! 💯

Is this now doing more or less equivalent work as when I made the original report, or are there any other major bits missing?

@mewmew
Member

mewmew commented Oct 16, 2018

Is this now doing more or less equivalent work as when I made the original report, or are there any other major bits missing?

Oh no, there is much work left to do. It's a start though :) Roughly, this is the list of things to do to translate the AST to IR:

For entire module

Identifier resolution and translation in module scope.

  • identifier resolution

    • type name resolution (local identifiers used in module scope; e.g. %t)
    • global variable and function name resolution (global identifiers used in module scope; e.g. @foo, @42)
    • function attribute group ID resolution (attribute group IDs used in module scope; e.g. #42)
    • Comdat name resolution (Comdat identifiers used in module scope; e.g. $foo)
    • named metadata resolution (metadata names used in module scope; e.g. !foo)
    • metadata ID resolution (metadata IDs used in module scope; e.g. !42)
  • translation from AST to IR representation

    • translation of type definitions
    • translation of global declarations and definitions
    • translation of constants
      • note to self: the blockaddress constant must be translated after basic blocks have been added to functions, and local variables have been assigned names. However, since blockaddress may be used within any function, this may create a dependency relationship that will be a bit tricky to solve in a simple fashion when translating from AST to IR. Will have to give it some thought.
      • a potential solution:
        1. index global identifiers (global variables and functions)
        2. translate function bodies (handle blockaddress with care, as detailed below)
          • translate blockaddress to &ir.ConstBlockAddress{Func: f, Block: &ir.BasicBlock{LocalName: local(old.Name())}}, the point being that we create a dummy IR basic block, that will record the basic block name used in the blockaddress constant. Then, once function bodies have been translated, run f.AssignIDs() to assign local IDs to unnamed instruction results, basic blocks and function parameters. As far as I can tell, this would solve the dependency relationship mentioned above (see the sketch after this list).
        3. translate global init constants
    • translation of constant expressions
    • translation of function headers
    • translation of named metadata definitions
    • translation of metadata definitions
    • translation of specialized metadata nodes (note: the final IR representation of this data (e.g. DWARF metadata) is not yet known. We would like to find an IR representation that is easy to work with. Any suggestions are warmly appreciated)
    • translation of indirect symbols (alias definitions and IFuncs)
    • translation of source filename, target definitions, module-level inline assembly, Comdat definitions, attribute group definitions, use-list order directives, inline assembly values (note: all of these are small and may be completed very quickly)

For each function

Identifier resolution and translation in function scope.

  • identifier resolution

    • function parameter, result of instruction and basic block resolution (local identifiers used in function scope; e.g. %foo, %42)
  • translation from AST to IR representation

    • translation of basic blocks
    • translation of instructions
    • translation of terminators
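
Going back to the blockaddress item above, a minimal sketch of the dummy basic block idea, using simplified stand-in types (the real llir types, the local helper and AssignIDs differ; the fix-up-by-name pass at the end is my assumption about how the dummy eventually gets resolved):

package translate

// BasicBlock, Func and ConstBlockAddress are simplified stand-ins for the IR
// types discussed above; the real llir types carry more fields.
type BasicBlock struct {
    LocalName string
}

type Func struct {
    Blocks []*BasicBlock
}

type ConstBlockAddress struct {
    Func  *Func
    Block *BasicBlock // dummy at first; only carries the referenced block name
}

// translateBlockAddress records the referenced basic block by name only, via a
// dummy block, so function bodies can be translated before global init
// constants.
func translateBlockAddress(f *Func, blockName string) *ConstBlockAddress {
    return &ConstBlockAddress{Func: f, Block: &BasicBlock{LocalName: blockName}}
}

// fixupBlockAddress runs after function bodies have been translated and local
// IDs assigned: it swaps the dummy for the real basic block of the enclosing
// function, matched by local name.
func fixupBlockAddress(c *ConstBlockAddress) {
    for _, b := range c.Func.Blocks {
        if b.LocalName == c.Block.LocalName {
            c.Block = b
            return
        }
    }
}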

As you can see, there are a lot of steps, and any of them may become the bottleneck if implemented poorly. I'm fairly certain that was the case last time around. Given your profiling, it seems identifier resolution killed the performance by walking the AST too many times (and the AST walker tracked visited nodes to avoid infinite recursion).

So, any ideas on how to implement this better are also warmly welcome.
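
As for avoiding repeated walks during identifier resolution, a minimal sketch of step 1 of the potential solution above (index global identifiers in a single pass), using a hypothetical stand-in type; later passes would then resolve global names by map lookup instead of another walk:

package translate

import "fmt"

// astGlobal is a simplified stand-in for an AST-level global variable
// declaration or function definition.
type astGlobal struct {
    Name string
}

// indexGlobals visits the top-level entities of a module exactly once and
// records each global identifier, so later passes can resolve global names
// with a map lookup instead of re-walking the AST.
func indexGlobals(globals []*astGlobal) (map[string]*astGlobal, error) {
    index := make(map[string]*astGlobal, len(globals))
    for _, g := range globals {
        if _, ok := index[g.Name]; ok {
            return nil, fmt.Errorf("global identifier %q defined twice", g.Name)
        }
        index[g.Name] = g
    }
    return index, nil
}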

@mewmew
Member

mewmew commented Oct 16, 2018

@pwaller would you like to participate in the future development of llir/llvm more closely? I can see from your previous involvement that you are quite persistent, which is required for a project of this size and ambition.

I would be glad to invite you to the llir GitHub organization, and we can bounce some ideas on how to move things forward, and more importantly what the end goals of the library should be. I'd still be very interested to hear your big ideas :)

Also, do note that I don't expect you to work on this more than you want to, and I understand that you have commitments in personal life, work, time away from the screen, etc. We all do :) So, please consider this a hobby project with a serious long-term ambition. I would be very glad to invite you on board, should you choose to join!

Cheerful regards,
Robin

Edit: the big list above is not meant to scare you. I'll go ahead and implement most items in the weeks/months to come (so the invitation above is not to be interpreted as: hey, I've got this huge list, would you like to do it for me?). I have a few projects in the works that would love to have this well implemented, so I'll have to iterate a few times.

@pwaller
Author

pwaller commented Oct 16, 2018

Thanks for the very nicely written message. I am uncertain how much I'll be able to contribute, other than by doing as I am already doing. Let's have a chat on Friday! :)

@mewmew
Member

mewmew commented Oct 16, 2018

Let's have a chat on Friday! :)

Sounds good! When may I catch you on Friday?

@pwaller
Author

pwaller commented Oct 17, 2018

I sent you a message about it on the 11th around 0900 UTC and 15th around 1700 UTC. Might they have gotten stuck in a spam filter?

@mewmew
Member

mewmew commented Oct 17, 2018

I sent you a message about it on the 11th around 0900 UTC and 15th around 1700 UTC. Might they have gotten stuck in a spam filter?

Sent a reply :)

mewmew added a commit to mewspring/l-tm that referenced this issue Oct 18, 2018
ref: mewspring/mewmew-l#6 (comment)

Translate blockaddress to `&ir.ConstBlockAddress{Func: f,
Block: &ir.BasicBlock{LocalName: local(old.Name())}}`, the
point being that we create a dummy IR basic block, that
will record the basic block name used in the blockaddress
constant. Then, once function bodies have been translated,
run `f.AssignIDs()` to assign local IDs to unnamed
instruction results, basic blocks and function parameters.
As far as I can tell, this would solve the dependency
relationship mentioned above.

Updates #6.
@mewmew
Member

mewmew commented Nov 9, 2018

Note: this issue has been superseded by llir/llvm#34.

@mewmew mewmew closed this as completed Nov 9, 2018
@mewmew
Member

mewmew commented Nov 10, 2018

We managed to achieve a ~1.6x slowdown compared to official LLVM. This is before trying to profile and further optimize llir/llvm. The main performance gains come from using Textmapper instead of Gocc, and of course from not relying on the IR walker (and its expensive map access) to handle translation from AST to IR.

Relatedly, the dev branch of llir/llvm has now surpassed the capabilities of llir/l, mewmew/l and the old llir/llvm master branch. Therefore, and to avoid confusion in future development, all repos except llir/llvm have been deprecated (as was the intention all along: to keep them experimental until we had gained enough knowledge to integrate back into llir/llvm). As such, these repos have been marked as archived (and moved to mewspring).
