
[Neo VM] Towards an Execution Time Deterministic VM: Dynamic OpCode Price #3600

Open · Jim8y opened this issue Nov 26, 2024 · 12 comments
Labels: Discussion (Initial issue state - proposed but not yet accepted)

@Jim8y (Contributor) commented Nov 26, 2024:

Summary or problem description
As explained in #3535, Neo VM execution time is not deterministic for the same amount of GAS: for one GAS, execution can range from a few milliseconds to a few minutes. This issue follows #3535 and proposes a potential solution to address that problem.

Do you have any solution you want to propose?

For OpCode operations that involve multiple subitems, we use the subitem count as a factor in the OpCode price.

The solution depends on PR #3581 for light GC and PR #3528 for the benchmark results.

Benchmark env:

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.4460/23H2/2023Update/SunValley3)
Intel Core i9-14900HX, 1 CPU, 32 logical and 24 physical cores
.NET SDK 8.0.403
  [Host]     : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  Job-EMDFFJ : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2

For different opcodes, we construct corresponding benchmark scripts that contain infinite loops. During test execution, the benchmark automatically stops when the 1 GAS system fee is exhausted. Since DoS attacks also need to construct loops, we believe it is reasonable to test OpCodes within loops.

The baseline one-GAS execution time is based on OPCODE.PUSHX, which runs for around 400 ms.

Solution Overview

Based on a large amount of benchmark data, we designed the following formula to calculate the final price of an opcode during runtime:

If the number of elements exceeds the threshold: Final Price = Initial Price * Number of Subitems * Price Coefficient

Otherwise: Final Price = Initial Price

We set this threshold because when the number of elements is below it, execution time remains basically stable; once the number of elements exceeds the threshold, execution time increases dramatically. Different opcodes will have different thresholds and price coefficients, which need to be determined from extensive benchmark data (a minimal sketch of this rule follows the key points below).

The key points are:

  • The pricing model adjusts based on the number of subitems
  • There's a critical threshold beyond which computational complexity changes significantly
  • Each opcode has a unique threshold and pricing factor
  • The parameters are empirically derived from benchmark measurements
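
A minimal sketch of the pricing rule described above (the helper name and integration point are illustrative only, not the actual NeoVM API):

    // Hedged sketch: how a per-opcode dynamic price could be computed at runtime.
    static long GetDynamicPrice(long initialPrice, int subItemCount, int threshold, double priceCoefficient)
    {
        // At or below the threshold the static price applies unchanged,
        // so ordinary contracts with few subitems are unaffected.
        if (subItemCount <= threshold)
            return initialPrice;

        // Above the threshold the price scales with the number of subitems.
        return (long)(initialPrice * subItemCount * priceCoefficient);
    }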

REVERSEN ( Initial Price: 1<<4)

Benchmark script:

    protected override byte[] CreateOneGASScript()
    {
        var builder = new InstructionBuilder();
        var initBegin = new JumpTarget();

        // One local slot, used as the counter for the initialization loop.
        builder.AddInstruction(new Instruction { _opCode = VM.OpCode.INITSLOT, _operand = [1, 0] });
        builder.Push(ItemCount);
        builder.AddInstruction(VM.OpCode.STLOC0);

        // Initialization loop: push ItemCount large buffers onto the evaluation stack.
        initBegin._instruction = builder.AddInstruction(VM.OpCode.NOP);
        builder.Push(ushort.MaxValue * 2);
        builder.AddInstruction(VM.OpCode.NEWBUFFER);
        builder.AddInstruction(VM.OpCode.LDLOC0);
        builder.AddInstruction(VM.OpCode.DEC);
        builder.AddInstruction(VM.OpCode.STLOC0);
        builder.AddInstruction(VM.OpCode.LDLOC0);
        builder.Jump(VM.OpCode.JMPIF, initBegin);

        // Benchmark loop: REVERSEN over the top ItemCount stack items, repeated until 1 GAS is exhausted.
        var loopBegin = new JumpTarget { _instruction = builder.AddInstruction(VM.OpCode.NOP) };
        builder.Push(ItemCount);
        builder.AddInstruction(VM.OpCode.REVERSEN);
        builder.Jump(VM.OpCode.JMP, loopBegin);

        return builder.ToArray();
    }

Original benchmark:

| Method | ItemCount | Mean | Error | StdDev |
|---|---|---|---|---|
| Bench_OneGAS | 4 | 174.9 ms | 1.24 ms | 1.10 ms |
| Bench_OneGAS | 8 | 182.5 ms | 3.28 ms | 2.91 ms |
| Bench_OneGAS | 16 | 202.4 ms | 2.36 ms | 2.09 ms |
| Bench_OneGAS | 32 | 244.1 ms | 3.49 ms | 2.92 ms |
| Bench_OneGAS | 64 | 302.2 ms | 5.39 ms | 5.05 ms |
| Bench_OneGAS | 128 | 417.8 ms | 4.31 ms | 3.82 ms |
| Bench_OneGAS | 256 | 644.1 ms | 8.12 ms | 7.59 ms |
| Bench_OneGAS | 512 | 1,119.7 ms | 8.87 ms | 8.29 ms |
| Bench_OneGAS | 1024 | 2,057.8 ms | 11.39 ms | 10.10 ms |
| Bench_OneGAS | 2040 | 4,009.0 ms | 12.07 ms | 10.08 ms |

Dynamic price (Initial Price: 1<<2):

SubItem Threshold: 64

Dynamic Factor: 0.02
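
As a worked example of the formula above (assuming the coefficient is applied exactly as written): with Initial Price = 1<<2 = 4, a threshold of 64 and a factor of 0.02, a single REVERSEN over 2040 subitems is priced at roughly 4 * 2040 * 0.02 ≈ 163, while anything at or below 64 subitems still pays the flat 4.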

| Method | ItemCount | Mean | Error | StdDev |
|---|---|---|---|---|
| Bench_OneGAS | 4 | 447.6 ms | 7.13 ms | 6.67 ms |
| Bench_OneGAS | 8 | 459.8 ms | 9.08 ms | 8.49 ms |
| Bench_OneGAS | 16 | 521.0 ms | 7.90 ms | 7.39 ms |
| Bench_OneGAS | 32 | 628.7 ms | 12.46 ms | 13.85 ms |
| Bench_OneGAS | 64 | 511.1 ms | 9.64 ms | 9.02 ms |
| Bench_OneGAS | 128 | 523.2 ms | 6.63 ms | 6.21 ms |
| Bench_OneGAS | 256 | 469.7 ms | 6.49 ms | 6.08 ms |
| Bench_OneGAS | 512 | 473.5 ms | 3.18 ms | 2.98 ms |
| Bench_OneGAS | 1024 | 469.4 ms | 8.33 ms | 7.80 ms |
| Bench_OneGAS | 2040 | 465.1 ms | 6.58 ms | 6.15 ms |

I will keep updating this issue.

@Jim8y (Contributor, Author) commented Nov 26, 2024:

The reason I am not benchmarking single OpCodes is because: 1. a DoS attack needs to run in a loop; 2. a single OpCode's execution time is related to its price, yet what we want is a dynamic price for it, so setting up a benchmark environment for a single OpCode is far more complex than running with a fixed amount of GAS.

@roman-khimov (Contributor) commented:

#3510, #3535, #3600, I'm counting...

> The baseline of one GAS execution time is based on OPCODE.PUSHX

Why not NOP? I've talked about this many times (the last iteration: #3552 (comment)).

> not benchmark on single Opcode

The script has NOP/PUSH/JMP per each REVERSEN; this affects executions with a low number of elements, and your "threshold" might as well be just that. You can have more accurate numbers (like in https://github.com/nspcc-dev/neo-go/blob/e89f9fe2a417c917d40d933c1b6a5f92804d6e2d/pkg/vm/opcodebench_test.go#L13), and accurate numbers are what we need here to make any decisions.

Also, the model doesn't need to be perfect; I'd opt for a simpler model that reflects the behavior reasonably (static + num_of_elements * per_element_cost, or even static * (num_of_elements >> 2)). "Each opcode has a unique threshold and pricing factor" sounds too complex.
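
For reference, a quick integer-only sketch of those two shapes (hypothetical helpers; num_of_elements is assumed to be the subitem count read at execution time):

    // Shape 1: a static base plus a linear per-element cost.
    static long LinearPrice(long staticPrice, int numOfElements, long perElementCost)
        => staticPrice + numOfElements * perElementCost;

    // Shape 2: the static price scaled by the element count in per-4 steps.
    // Note: as written this evaluates to 0 for fewer than 4 elements, so a real
    // implementation would presumably clamp it to at least the static price.
    static long SteppedPrice(long staticPrice, int numOfElements)
        => staticPrice * (numOfElements >> 2);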

@Jim8y (Contributor, Author) commented Nov 27, 2024:

> Why not NOP? I've talked about this many times (the last iteration: #3552 (comment)).

NOP is basically the same; it's 500 ms.

> not benchmark on single Opcode

> The script has NOP/PUSH/JMP per each REVERSEN; this affects executions with a low number of elements, and your "threshold" might as well be just that. You can have more accurate numbers (like in https://github.com/nspcc-dev/neo-go/blob/e89f9fe2a417c917d40d933c1b6a5f92804d6e2d/pkg/vm/opcodebench_test.go#L13), and accurate numbers are what we need here to make any decisions.

Yes, I have the same concern. The problem is that what we want is to defend against DoS attacks and have a worst-case execution time expectation. Basically, all DoS attacks require a loop.

Aside from that, benchmarking a single OpCode is far more complex than using a fixed GAS fee: we cannot get the execution time until the benchmark is done, which means we need extra processing of the benchmark results to know whether the price is fairly set.

I have tried that; it takes many days to collect enough data to derive a proper equation and reasonable parameters for a single OpCode. That is not the only problem: if you benchmark the existing OpCodes, you will find that even OpCodes with the same price have very different execution times, so you cannot really find a baseline. With a fixed amount of GAS, the loop itself takes up most of the time, but that does not conflict with our goal here.

And as I said above, even if we benchmarked single OpCodes, it would basically give us the same result as benchmarking with a fixed GAS fee.

> Also, the model doesn't need to be perfect; I'd opt for a simpler model that reflects the behavior reasonably (static + num_of_elements * per_element_cost, or even static * (num_of_elements >> 2)). "Each opcode has a unique threshold and pricing factor" sounds too complex.

Having both a threshold and a pricing factor is because I wish to introduce minimal influence on our existing price model; ordinary contracts will still behave the same, or even become cheaper. If we only have one equation without a price factor, the final price can be too cheap or too expensive for different OpCodes. For example, UNPACK and PACK need factors of 1 and 0.5, while REVERSEN needs a factor of 0.002, to limit their execution time to around 400-500 ms; otherwise it can be 2 s or 30 ms.

But this can be updated accordingly, I am open to any suggestions.

@roman-khimov (Contributor) commented:

> NOP is basically the same; it's 500 ms.

Yeah, they're close; still, this is the basic unit to measure everything against, at least if we're talking about price model adjustments and not a complete rework. It's more or less balanced for a lot of opcodes; the only problematic ones are those that deal with a lot of elements.

> Basically, all DoS attacks require a loop.

That I agree with.

Maybe we first need to agree on a list of opcodes that we will not change, since they're considered to be perfectly aligned with execution time already. Then the assumption that "1 GAS executes in ~500 ms" can be a good enough approximation. I still think that isolated numbers from the NeoGo tests are better, but at the same time I know what you're talking about: the Go benchmarks above have to do a lot of iterations to get any reasonable numbers.

Also, NeoGo test is easy to extend/reproduce.

> Having both a threshold and a pricing factor is because I wish to introduce minimal influence on our existing price model

That's exactly the point. Take UNPACK: it's 2048 now. Make it 4 * ((num_of_elements + 4) >> 2) and you get a reasonable 4 for 0-3 elements and 8 for 4-7, which covers the majority of the real cases. Wanna play a real game with 2048 elements? Pay 2052, which is about the same as now. Specific numbers can be adjusted, but you get the idea: the basic price pays for 4 elements and then grows linearly with the same per-4 step. But I'd really want this formula to be the same for every opcode; different "thresholds" and coefficients seriously complicate the game.
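
A quick sketch of that arithmetic (the helper name is illustrative only):

    // Flat linear pricing: the base price of 4 covers the first four elements,
    // then the price grows by 4 for every additional four elements.
    static long UnpackPrice(int numOfElements) => 4L * ((numOfElements + 4) >> 2);

    // UnpackPrice(0)    == 4
    // UnpackPrice(7)    == 8
    // UnpackPrice(2048) == 2052   // about the same as the current flat 2048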

> 0.5 ... 0.002

Because these are all wrong numbers, you know it; we need something that can be counted easily in integers.

@Jim8y (Contributor, Author) commented Nov 27, 2024:

> That's exactly the point. Take UNPACK: it's 2048 now. Make it 4 * ((num_of_elements + 4) >> 2) and you get a reasonable 4 for 0-3 elements and 8 for 4-7, which covers the majority of the real cases. Wanna play a real game with 2048 elements? Pay 2052, which is about the same as now. Specific numbers can be adjusted, but you get the idea: the basic price pays for 4 elements and then grows linearly with the same per-4 step. But I'd really want this formula to be the same for every opcode; different "thresholds" and coefficients seriously complicate the game.

> 0.5 ... 0.002

> Because these are all wrong numbers, you know it; we need something that can be counted easily in integers.

I get all your points above. Let me try your method. But different coefficients for different OpCodes might be unavoidable, because the rate at which execution time increases can differ between OpCodes; 0.5 and 0.002 are for that purpose. Some opcodes' execution time increases sharply as the subitem count grows, while others grow more gently.

@roman-khimov (Contributor) commented:

Sure, this of course depends on real benchmark data.

@Jim8y (Contributor, Author) commented Nov 30, 2024:

@roman-khimov this is the benchmark with 4 * ((num_of_elements + 4) >> 2). The problem without a threshold is that when the subitem count is low, the price is high compared to the execution time. I think the majority of contracts won't have large subitem counts, but rather something like 10, 20, or 30, and an expensive price is not good for normal use.

| Method | ItemCount | Mean | Error | StdDev |
|---|---|---|---|---|
| Bench_OneGAS | 4 | 154.5 ms | 3.01 ms | 3.46 ms |
| Bench_OneGAS | 8 | 178.3 ms | 3.56 ms | 10.04 ms |
| Bench_OneGAS | 16 | 246.7 ms | 5.25 ms | 15.33 ms |
| Bench_OneGAS | 32 | 471.0 ms | 19.13 ms | 56.40 ms |
| Bench_OneGAS | 64 | 546.3 ms | 31.88 ms | 94.01 ms |
| Bench_OneGAS | 128 | 520.6 ms | 33.42 ms | 98.53 ms |
| Bench_OneGAS | 256 | 485.4 ms | 31.65 ms | 93.33 ms |
| Bench_OneGAS | 512 | 435.2 ms | 28.57 ms | 84.25 ms |
| Bench_OneGAS | 1024 | 448.8 ms | 24.81 ms | 73.16 ms |
| Bench_OneGAS | 2040 | 294.5 ms | 4.25 ms | 3.97 ms |

With a threshold of 64, the result is:

| Method | ItemCount | Mean | Error | StdDev | Median |
|---|---|---|---|---|---|
| Bench_OneGAS | 4 | 515.7 ms | 7.35 ms | 6.88 ms | 516.3 ms |
| Bench_OneGAS | 8 | 667.3 ms | 13.17 ms | 32.30 ms | 666.0 ms |
| Bench_OneGAS | 16 | 238.5 ms | 4.77 ms | 13.76 ms | 240.2 ms |
| Bench_OneGAS | 32 | 462.8 ms | 17.02 ms | 50.19 ms | 454.7 ms |
| Bench_OneGAS | 64 | 557.6 ms | 31.91 ms | 94.08 ms | 553.0 ms |
| Bench_OneGAS | 128 | 557.7 ms | 35.62 ms | 105.03 ms | 547.8 ms |
| Bench_OneGAS | 256 | 501.2 ms | 34.20 ms | 100.83 ms | 467.3 ms |
| Bench_OneGAS | 512 | 438.8 ms | 28.37 ms | 83.21 ms | 432.6 ms |
| Bench_OneGAS | 1024 | 440.4 ms | 22.68 ms | 66.51 ms | 426.8 ms |
| Bench_OneGAS | 2040 | 293.9 ms | 4.60 ms | 4.30 ms | 292.9 ms |

@roman-khimov (Contributor) commented:

> when the subitem count is low, the price is high compared to the execution time

I see your formula is a bit more accurate. But likely you're overoptimizing it for a particular CPU. These "thresholds" are very likely cache/pipeline effects, and those are very different for different CPUs: you're using an Intel Core i9-14900HX; now take some Ryzen and see how it works there. Then some newest Xeon, then an M1/M2/M3/whatever it is now, and you're likely to see different numbers again. You can't make it perfect, so I'd set a goal of making it good enough. And from your numbers it looks OK, the difference is ~3-4× at most. Notice that the "threshold" case also has some ~3× different cases; it's just that they're structured a bit differently.

Maybe we can build this ladder with a step of 8 or 16, btw; it can be tried, but I'd expect it to be good enough in the end.
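
For illustration, a step-8 variant of that ladder might look like the following (an assumption about the intended shape, not a formula stated above):

    // Coarser ladder: the base price covers the first 8 elements,
    // then grows by the base price for every additional 8 elements.
    static long SteppedPrice8(long basePrice, int numOfElements)
        => basePrice * ((numOfElements + 8) >> 3);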

> expensive price

And "expensive" or not should first of all be compared to the current price. And there I think we'd get a huge improvement anyway. Like paying 8 instead of 2048 is a much bigger difference than worrying that it takes 150ms per 1 GAS instead of 500ms.

@roman-khimov (Contributor) commented:

And I'm not even talking about different implementations and potential future optimizations, btw.

@cschuchardt88 (Member) commented Nov 30, 2024:

These will also differ between dotnet and golang, because dotnet has its own assembly language that runs in a VM. Also take JIT optimization into account. If NeoVM were written for x86_64 in C/C++, then your benchmark for the CPU would be good. You're only benchmarking the runtime.

@Jim8y (Contributor, Author) commented Nov 30, 2024:

This is PACK without a threshold and factor:

| Method | ItemCount | Mean | Error | StdDev | Median |
|---|---|---|---|---|---|
| Bench_OneGAS | 4 | 98.87 ms | 1.864 ms | 1.831 ms | 99.10 ms |
| Bench_OneGAS | 8 | 117.30 ms | 2.247 ms | 5.297 ms | 116.81 ms |
| Bench_OneGAS | 16 | 150.08 ms | 4.001 ms | 11.607 ms | 152.22 ms |
| Bench_OneGAS | 32 | 311.90 ms | 11.550 ms | 34.054 ms | 322.22 ms |
| Bench_OneGAS | 64 | 364.28 ms | 20.553 ms | 60.602 ms | 368.60 ms |
| Bench_OneGAS | 128 | 367.39 ms | 25.840 ms | 76.189 ms | 368.45 ms |
| Bench_OneGAS | 256 | 338.10 ms | 22.533 ms | 66.438 ms | 344.35 ms |
| Bench_OneGAS | 512 | 292.99 ms | 16.181 ms | 47.711 ms | 283.96 ms |
| Bench_OneGAS | 1024 | 269.50 ms | 9.724 ms | 28.671 ms | 266.70 ms |
| Bench_OneGAS | 2040 | 209.94 ms | 2.680 ms | 2.376 ms | 209.42 ms |

With a threshold of 16 and a factor of 0.7:

| Method | ItemCount | Mean | Error | StdDev |
|---|---|---|---|---|
| Bench_OneGAS | 4 | 558.2 ms | 8.46 ms | 7.92 ms |
| Bench_OneGAS | 8 | 705.4 ms | 13.89 ms | 12.31 ms |
| Bench_OneGAS | 16 | 207.3 ms | 4.13 ms | 10.52 ms |
| Bench_OneGAS | 32 | 413.7 ms | 14.87 ms | 43.83 ms |
| Bench_OneGAS | 64 | 528.4 ms | 26.96 ms | 79.49 ms |
| Bench_OneGAS | 128 | 491.6 ms | 30.98 ms | 91.35 ms |
| Bench_OneGAS | 256 | 423.3 ms | 25.79 ms | 76.03 ms |
| Bench_OneGAS | 512 | 371.7 ms | 21.84 ms | 64.41 ms |
| Bench_OneGAS | 1024 | 362.7 ms | 24.49 ms | 72.20 ms |
| Bench_OneGAS | 2040 | 257.0 ms | 2.12 ms | 1.88 ms |

@Jim8y (Contributor, Author) commented Nov 30, 2024:

> when the subitem count is low, the price is high compared to the execution time

> I see your formula is a bit more accurate. But likely you're overoptimizing it for a particular CPU. These "thresholds" are very likely cache/pipeline effects, and those are very different for different CPUs: you're using an Intel Core i9-14900HX; now take some Ryzen and see how it works there. Then some newest Xeon, then an M1/M2/M3/whatever it is now, and you're likely to see different numbers again. You can't make it perfect, so I'd set a goal of making it good enough. And from your numbers it looks OK, the difference is ~3-4× at most. Notice that the "threshold" case also has some ~3× different cases; it's just that they're structured a bit differently.

> Maybe we can build this ladder with a step of 8 or 16, btw; it can be tried, but I'd expect it to be good enough in the end.

> expensive price

> And "expensive" or not should first of all be compared to the current price. And there I think we'd get a huge improvement anyway. Paying 8 instead of 2048 is a much bigger difference than worrying that it takes 150 ms per 1 GAS instead of 500 ms.

As you said, different platforms can have different performance, but I don't really have that many machines to test with, and even if we did, we would still need to set a target for the benchmark... Anyway, I am open to all options here; if you think it is OK, I would happily adopt your suggestion directly. As to the comment from @cschuchardt88: same as above, we still need one final proposal that works for all languages. There are differences between languages, yes, but I don't think I can take that into consideration. I think Anna or Roman will provide the corresponding golang benchmark results and suggest a proper update to the equation.
