
[Neo VM] Towards an Execution Time Deterministic VM: Dynamic OpCode Price #3600

Open · Jim8y opened this issue Nov 26, 2024 · 12 comments
Labels: Discussion (Initial issue state - proposed but not yet accepted)

@Jim8y (Contributor) commented Nov 26, 2024:

Summary or problem description
As explained in #3535, Neo VM execution time is not deterministic for the same amount of GAS: for one GAS, execution can range from a few milliseconds to a few minutes. This issue follows #3535 and proposes a potential solution to address that problem.

Do you have any solution you want to propose?

For OpCode operations that involve multiple subitems, we use the subitem count as a factor in the OpCode price.

The solution depends on PR #3581 for light GC and PR #3528 for the benchmark results.

Benchmark env:

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.4460/23H2/2023Update/SunValley3)
Intel Core i9-14900HX, 1 CPU, 32 logical and 24 physical cores
.NET SDK 8.0.403
  [Host]     : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2
  Job-EMDFFJ : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2

For different opcodes, we construct corresponding benchmark scripts that contain infinite loops. During test execution, the benchmark automatically stops when the 1 GAS system fee is exhausted. Since DoS attacks also need to construct loops, we believe it is reasonable to test OpCodes within loops.

The baseline one-GAS execution time is based on OPCODE.PUSHX, which runs for around 400 ms.

Solution Overview

Based on a large amount of benchmark data, we designed the following formula to calculate the final price of an opcode during runtime:

If the number of elements exceeds the threshold: Final Price = Initial Price * Number of Subitems * Price Coefficient

Otherwise: Final Price = Initial Price

We set this threshold because when the number of elements is below it, execution time remains basically stable; once the number of elements exceeds the threshold, execution time increases dramatically. Different opcodes will have different thresholds and price coefficients, which need to be determined from extensive benchmark data (a minimal sketch of this rule follows the key points below).

The key points are:

  • The pricing model adjusts based on the number of subitems
  • There's a critical threshold beyond which computational complexity changes significantly
  • Each opcode has a unique threshold and pricing factor
  • The parameters are empirically derived from benchmark measurements
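
A minimal sketch of the pricing rule described above (the helper name and integration point are illustrative only, not the actual NeoVM API):

    // Hedged sketch: how a per-opcode dynamic price could be computed at runtime.
    static long GetDynamicPrice(long initialPrice, int subItemCount, int threshold, double priceCoefficient)
    {
        // At or below the threshold the static price applies unchanged,
        // so ordinary contracts with few subitems are unaffected.
        if (subItemCount <= threshold)
            return initialPrice;

        // Above the threshold the price scales with the number of subitems.
        return (long)(initialPrice * subItemCount * priceCoefficient);
    }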

REVERSEN ( Initial Price: 1<<4)

Benchmark script:

    protected override byte[] CreateOneGASScript()
    {
        var builder = new InstructionBuilder();
        var initBegin = new JumpTarget();

        // One local slot, used as the counter for the initialization loop.
        builder.AddInstruction(new Instruction { _opCode = VM.OpCode.INITSLOT, _operand = [1, 0] });
        builder.Push(ItemCount);
        builder.AddInstruction(VM.OpCode.STLOC0);

        // Initialization loop: push ItemCount large buffers onto the evaluation stack.
        initBegin._instruction = builder.AddInstruction(VM.OpCode.NOP);
        builder.Push(ushort.MaxValue * 2);
        builder.AddInstruction(VM.OpCode.NEWBUFFER);
        builder.AddInstruction(VM.OpCode.LDLOC0);
        builder.AddInstruction(VM.OpCode.DEC);
        builder.AddInstruction(VM.OpCode.STLOC0);
        builder.AddInstruction(VM.OpCode.LDLOC0);
        builder.Jump(VM.OpCode.JMPIF, initBegin);

        // Benchmark loop: REVERSEN over the top ItemCount stack items, repeated until 1 GAS is exhausted.
        var loopBegin = new JumpTarget { _instruction = builder.AddInstruction(VM.OpCode.NOP) };
        builder.Push(ItemCount);
        builder.AddInstruction(VM.OpCode.REVERSEN);
        builder.Jump(VM.OpCode.JMP, loopBegin);

        return builder.ToArray();
    }

Original benchmark:

| Method | ItemCount | Mean | Error | StdDev |
|---|---|---|---|---|
| Bench_OneGAS | 4 | 174.9 ms | 1.24 ms | 1.10 ms |
| Bench_OneGAS | 8 | 182.5 ms | 3.28 ms | 2.91 ms |
| Bench_OneGAS | 16 | 202.4 ms | 2.36 ms | 2.09 ms |
| Bench_OneGAS | 32 | 244.1 ms | 3.49 ms | 2.92 ms |
| Bench_OneGAS | 64 | 302.2 ms | 5.39 ms | 5.05 ms |
| Bench_OneGAS | 128 | 417.8 ms | 4.31 ms | 3.82 ms |
| Bench_OneGAS | 256 | 644.1 ms | 8.12 ms | 7.59 ms |
| Bench_OneGAS | 512 | 1,119.7 ms | 8.87 ms | 8.29 ms |
| Bench_OneGAS | 1024 | 2,057.8 ms | 11.39 ms | 10.10 ms |
| Bench_OneGAS | 2040 | 4,009.0 ms | 12.07 ms | 10.08 ms |

Dynamic price (Initial Price: 1<<2):

SubItem Threshold: 64

Dynamic Factor: 0.02
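
As a worked example of the formula above (assuming the coefficient is applied exactly as written): with Initial Price = 1<<2 = 4, a threshold of 64 and a factor of 0.02, a single REVERSEN over 2040 subitems is priced at roughly 4 * 2040 * 0.02 ≈ 163, while anything at or below 64 subitems still pays the flat 4.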

| Method | ItemCount | Mean | Error | StdDev |
|---|---|---|---|---|
| Bench_OneGAS | 4 | 447.6 ms | 7.13 ms | 6.67 ms |
| Bench_OneGAS | 8 | 459.8 ms | 9.08 ms | 8.49 ms |
| Bench_OneGAS | 16 | 521.0 ms | 7.90 ms | 7.39 ms |
| Bench_OneGAS | 32 | 628.7 ms | 12.46 ms | 13.85 ms |
| Bench_OneGAS | 64 | 511.1 ms | 9.64 ms | 9.02 ms |
| Bench_OneGAS | 128 | 523.2 ms | 6.63 ms | 6.21 ms |
| Bench_OneGAS | 256 | 469.7 ms | 6.49 ms | 6.08 ms |
| Bench_OneGAS | 512 | 473.5 ms | 3.18 ms | 2.98 ms |
| Bench_OneGAS | 1024 | 469.4 ms | 8.33 ms | 7.80 ms |
| Bench_OneGAS | 2040 | 465.1 ms | 6.58 ms | 6.15 ms |

I will keep updating this issue.

@Jim8y (Contributor, Author) commented Nov 26, 2024:

The reason I am not benchmarking single OpCodes is because: 1. a DoS attack needs to run in a loop; 2. a single OpCode's execution time is related to its price, yet what we want is a dynamic price for it, so setting up a benchmark environment for a single OpCode is far more complex than running with a fixed amount of GAS.

@roman-khimov (Contributor) commented:

#3510, #3535, #3600, I'm counting...

> The baseline of one GAS execution time is based on OPCODE.PUSHX

Why not NOP? I've talked about this many times (the last iteration: #3552 (comment)).

> not benchmark on single Opcode

The script has NOP/PUSH/JMP per each REVERSEN; this affects executions with a low number of elements, and your "threshold" might as well be just that. You can have more accurate numbers (like in https://github.com/nspcc-dev/neo-go/blob/e89f9fe2a417c917d40d933c1b6a5f92804d6e2d/pkg/vm/opcodebench_test.go#L13), and accurate numbers are what we need here to make any decisions.

Also, the model doesn't need to be perfect; I'd opt for a simpler model that reflects the behavior reasonably (static + num_of_elements * per_element_cost, or even static * (num_of_elements >> 2)). "Each opcode has a unique threshold and pricing factor" sounds too complex.
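
For reference, a quick integer-only sketch of those two shapes (hypothetical helpers; num_of_elements is assumed to be the subitem count read at execution time):

    // Shape 1: a static base plus a linear per-element cost.
    static long LinearPrice(long staticPrice, int numOfElements, long perElementCost)
        => staticPrice + numOfElements * perElementCost;

    // Shape 2: the static price scaled by the element count in per-4 steps.
    // Note: as written this evaluates to 0 for fewer than 4 elements, so a real
    // implementation would presumably clamp it to at least the static price.
    static long SteppedPrice(long staticPrice, int numOfElements)
        => staticPrice * (numOfElements >> 2);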

@Jim8y (Contributor, Author) commented Nov 27, 2024:

> Why not NOP? I've talked about this many times (the last iteration: #3552 (comment)).

NOP is basically the same; it's 500 ms.

> not benchmark on single Opcode

> The script has NOP/PUSH/JMP per each REVERSEN; this affects executions with a low number of elements, and your "threshold" might as well be just that. You can have more accurate numbers (like in https://github.com/nspcc-dev/neo-go/blob/e89f9fe2a417c917d40d933c1b6a5f92804d6e2d/pkg/vm/opcodebench_test.go#L13), and accurate numbers are what we need here to make any decisions.

Yes, I have the same concern. The problem is that what we want is to defend against DoS attacks and have a worst-case execution time expectation. Basically, all DoS attacks require a loop.

Aside from that, benchmarking a single OpCode is far more complex than using a fixed GAS fee: we cannot get the execution time until the benchmark is done, which means we need extra processing of the benchmark results to know whether the price is fairly set.

I have tried that; it takes many days to collect enough data to derive a proper equation and reasonable parameters for a single OpCode. That is not the only problem: if you benchmark the existing OpCodes, you will find that even OpCodes with the same price have very different execution times, so you cannot really find a baseline. With a fixed amount of GAS, the loop itself takes up most of the time, but that does not conflict with our goal here.

And as I said above, even if we benchmarked single OpCodes, it would basically give us the same result as benchmarking with a fixed GAS fee.

> Also, the model doesn't need to be perfect; I'd opt for a simpler model that reflects the behavior reasonably (static + num_of_elements * per_element_cost, or even static * (num_of_elements >> 2)). "Each opcode has a unique threshold and pricing factor" sounds too complex.

Having both a threshold and a pricing factor is because I wish to introduce minimal influence on our existing price model; ordinary contracts will still behave the same, or even become cheaper. If we only have one equation without a price factor, the final price can be too cheap or too expensive for different OpCodes. For example, UNPACK and PACK need factors of 1 and 0.5, while REVERSEN needs a factor of 0.002, to limit their execution time to around 400-500 ms; otherwise it can be 2 s or 30 ms.

But this can be updated accordingly, I am open to any suggestions.

@roman-khimov (Contributor) commented:

> NOP is basically the same; it's 500 ms.

Yeah, they're close; still, this is the basic unit to measure everything against, at least if we're talking about price model adjustments and not a complete rework. It's more or less balanced for a lot of opcodes; the only problematic ones are those that deal with a lot of elements.

> Basically, all DoS attacks require a loop.

That I agree with.

Maybe we first need to agree on a list of opcodes that we will not change, since they're considered to be perfectly aligned with execution time already. Then the assumption that "1 GAS executes in ~500 ms" can be a good enough approximation. I still think that isolated numbers from the NeoGo tests are better, but at the same time I know what you're talking about: the Go benchmarks above have to do a lot of iterations to get any reasonable numbers.

Also, NeoGo test is easy to extend/reproduce.

> Having both a threshold and a pricing factor is because I wish to introduce minimal influence on our existing price model

That's exactly the point. Take UNPACK: it's 2048 now. Make it 4 * ((num_of_elements + 4) >> 2) and you get a reasonable 4 for 0-3 elements and 8 for 4-7, which covers the majority of the real cases. Wanna play a real game with 2048 elements? Pay 2052, which is about the same as now. Specific numbers can be adjusted, but you get the idea: the basic price pays for 4 elements and then grows linearly with the same per-4 step. But I'd really want this formula to be the same for every opcode; different "thresholds" and coefficients seriously complicate the game.
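
A quick sketch of that arithmetic (the helper name is illustrative only):

    // Flat linear pricing: the base price of 4 covers the first four elements,
    // then the price grows by 4 for every additional four elements.
    static long UnpackPrice(int numOfElements) => 4L * ((numOfElements + 4) >> 2);

    // UnpackPrice(0)    == 4
    // UnpackPrice(7)    == 8
    // UnpackPrice(2048) == 2052   // about the same as the current flat 2048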

> 0.5 ... 0.002

Because these are all wrong numbers, you know it; we need something that can be counted easily in integers.

@Jim8y (Contributor, Author) commented Nov 27, 2024:

> That's exactly the point. Take UNPACK: it's 2048 now. Make it 4 * ((num_of_elements + 4) >> 2) and you get a reasonable 4 for 0-3 elements and 8 for 4-7, which covers the majority of the real cases. Wanna play a real game with 2048 elements? Pay 2052, which is about the same as now. Specific numbers can be adjusted, but you get the idea: the basic price pays for 4 elements and then grows linearly with the same per-4 step. But I'd really want this formula to be the same for every opcode; different "thresholds" and coefficients seriously complicate the game.

> 0.5 ... 0.002

> Because these are all wrong numbers, you know it; we need something that can be counted easily in integers.

I get all your points above. Let me try your method. But different coefficients for different OpCodes might be unavoidable, because the rate at which execution time increases can differ between OpCodes; 0.5 and 0.002 are for that purpose. Some opcodes' execution time increases sharply as the subitem count grows, while others grow more gently.

@roman-khimov (Contributor) commented:

Sure, this of course depends on real benchmark data.

@Jim8y (Contributor, Author) commented Nov 30, 2024:

@roman-khimov this is the benchmark with 4 * ((num_of_elements + 4) >> 2). The problem without a threshold is that when the subitem count is low, the price is high compared to the execution time. I think the majority of contracts won't have large subitem counts, but rather something like 10, 20, or 30, and an expensive price is not good for normal use.

| Method | ItemCount | Mean | Error | StdDev |
|---|---|---|---|---|
| Bench_OneGAS | 4 | 154.5 ms | 3.01 ms | 3.46 ms |
| Bench_OneGAS | 8 | 178.3 ms | 3.56 ms | 10.04 ms |
| Bench_OneGAS | 16 | 246.7 ms | 5.25 ms | 15.33 ms |
| Bench_OneGAS | 32 | 471.0 ms | 19.13 ms | 56.40 ms |
| Bench_OneGAS | 64 | 546.3 ms | 31.88 ms | 94.01 ms |
| Bench_OneGAS | 128 | 520.6 ms | 33.42 ms | 98.53 ms |
| Bench_OneGAS | 256 | 485.4 ms | 31.65 ms | 93.33 ms |
| Bench_OneGAS | 512 | 435.2 ms | 28.57 ms | 84.25 ms |
| Bench_OneGAS | 1024 | 448.8 ms | 24.81 ms | 73.16 ms |
| Bench_OneGAS | 2040 | 294.5 ms | 4.25 ms | 3.97 ms |

With a threshold of 64, the result is:

| Method | ItemCount | Mean | Error | StdDev | Median |
|---|---|---|---|---|---|
| Bench_OneGAS | 4 | 515.7 ms | 7.35 ms | 6.88 ms | 516.3 ms |
| Bench_OneGAS | 8 | 667.3 ms | 13.17 ms | 32.30 ms | 666.0 ms |
| Bench_OneGAS | 16 | 238.5 ms | 4.77 ms | 13.76 ms | 240.2 ms |
| Bench_OneGAS | 32 | 462.8 ms | 17.02 ms | 50.19 ms | 454.7 ms |
| Bench_OneGAS | 64 | 557.6 ms | 31.91 ms | 94.08 ms | 553.0 ms |
| Bench_OneGAS | 128 | 557.7 ms | 35.62 ms | 105.03 ms | 547.8 ms |
| Bench_OneGAS | 256 | 501.2 ms | 34.20 ms | 100.83 ms | 467.3 ms |
| Bench_OneGAS | 512 | 438.8 ms | 28.37 ms | 83.21 ms | 432.6 ms |
| Bench_OneGAS | 1024 | 440.4 ms | 22.68 ms | 66.51 ms | 426.8 ms |
| Bench_OneGAS | 2040 | 293.9 ms | 4.60 ms | 4.30 ms | 292.9 ms |

@roman-khimov (Contributor) commented:

> when the subitem count is low, the price is high compared to the execution time

I see your formula is a bit more accurate. But likely you're overoptimizing it for a particular CPU. These "thresholds" are very likely cache/pipeline effects, and those are very different for different CPUs: you're using an Intel Core i9-14900HX; now take some Ryzen and see how it works there. Then some newest Xeon, then an M1/M2/M3/whatever it is now, and you're likely to see different numbers again. You can't make it perfect, so I'd set a goal of making it good enough. And from your numbers it looks OK, the difference is ~3-4× at most. Notice that the "threshold" case also has some ~3× different cases; it's just that they're structured a bit differently.

Maybe we can build this ladder with a step of 8 or 16, btw; it can be tried, but I'd expect it to be good enough in the end.
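
For illustration, a step-8 variant of that ladder might look like the following (an assumption about the intended shape, not a formula stated above):

    // Coarser ladder: the base price covers the first 8 elements,
    // then grows by the base price for every additional 8 elements.
    static long SteppedPrice8(long basePrice, int numOfElements)
        => basePrice * ((numOfElements + 8) >> 3);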

> expensive price

And "expensive" or not should first of all be compared to the current price. And there I think we'd get a huge improvement anyway. Like paying 8 instead of 2048 is a much bigger difference than worrying that it takes 150ms per 1 GAS instead of 500ms.

@roman-khimov (Contributor) commented:

And I'm not even talking about different implementations and potential future optimizations, btw.

@cschuchardt88 (Member) commented Nov 30, 2024:

These will also differ between dotnet and golang, because dotnet has its own assembly language that runs in a VM. Also take JIT optimization into account. If NeoVM were written for x86_64 in C/C++, then your benchmark for the CPU would be good. You're only benchmarking the runtime.

@Jim8y (Contributor, Author) commented Nov 30, 2024:

This is PACK without a threshold and factor:

| Method | ItemCount | Mean | Error | StdDev | Median |
|---|---|---|---|---|---|
| Bench_OneGAS | 4 | 98.87 ms | 1.864 ms | 1.831 ms | 99.10 ms |
| Bench_OneGAS | 8 | 117.30 ms | 2.247 ms | 5.297 ms | 116.81 ms |
| Bench_OneGAS | 16 | 150.08 ms | 4.001 ms | 11.607 ms | 152.22 ms |
| Bench_OneGAS | 32 | 311.90 ms | 11.550 ms | 34.054 ms | 322.22 ms |
| Bench_OneGAS | 64 | 364.28 ms | 20.553 ms | 60.602 ms | 368.60 ms |
| Bench_OneGAS | 128 | 367.39 ms | 25.840 ms | 76.189 ms | 368.45 ms |
| Bench_OneGAS | 256 | 338.10 ms | 22.533 ms | 66.438 ms | 344.35 ms |
| Bench_OneGAS | 512 | 292.99 ms | 16.181 ms | 47.711 ms | 283.96 ms |
| Bench_OneGAS | 1024 | 269.50 ms | 9.724 ms | 28.671 ms | 266.70 ms |
| Bench_OneGAS | 2040 | 209.94 ms | 2.680 ms | 2.376 ms | 209.42 ms |

With a threshold of 16 and a factor of 0.7:

| Method | ItemCount | Mean | Error | StdDev |
|---|---|---|---|---|
| Bench_OneGAS | 4 | 558.2 ms | 8.46 ms | 7.92 ms |
| Bench_OneGAS | 8 | 705.4 ms | 13.89 ms | 12.31 ms |
| Bench_OneGAS | 16 | 207.3 ms | 4.13 ms | 10.52 ms |
| Bench_OneGAS | 32 | 413.7 ms | 14.87 ms | 43.83 ms |
| Bench_OneGAS | 64 | 528.4 ms | 26.96 ms | 79.49 ms |
| Bench_OneGAS | 128 | 491.6 ms | 30.98 ms | 91.35 ms |
| Bench_OneGAS | 256 | 423.3 ms | 25.79 ms | 76.03 ms |
| Bench_OneGAS | 512 | 371.7 ms | 21.84 ms | 64.41 ms |
| Bench_OneGAS | 1024 | 362.7 ms | 24.49 ms | 72.20 ms |
| Bench_OneGAS | 2040 | 257.0 ms | 2.12 ms | 1.88 ms |

@Jim8y (Contributor, Author) commented Nov 30, 2024:

> when the subitem count is low, the price is high compared to the execution time

> I see your formula is a bit more accurate. But likely you're overoptimizing it for a particular CPU. These "thresholds" are very likely cache/pipeline effects, and those are very different for different CPUs: you're using an Intel Core i9-14900HX; now take some Ryzen and see how it works there. Then some newest Xeon, then an M1/M2/M3/whatever it is now, and you're likely to see different numbers again. You can't make it perfect, so I'd set a goal of making it good enough. And from your numbers it looks OK, the difference is ~3-4× at most. Notice that the "threshold" case also has some ~3× different cases; it's just that they're structured a bit differently.

> Maybe we can build this ladder with a step of 8 or 16, btw; it can be tried, but I'd expect it to be good enough in the end.

> expensive price

> And "expensive" or not should first of all be compared to the current price. And there I think we'd get a huge improvement anyway. Paying 8 instead of 2048 is a much bigger difference than worrying that it takes 150 ms per 1 GAS instead of 500 ms.

As you said, different platforms can have different performance, but I don't really have that many machines to test with, and even if we did, we would still need to set a target for the benchmark... Anyway, I am open to all options here; if you think it is OK, I would happily adopt your suggestion directly. As to the comment from @cschuchardt88: same as above, we still need one final proposal that works for all languages. There are differences between languages, yes, but I don't think I can take that into consideration. I think Anna or Roman will provide the corresponding golang benchmark results and suggest a proper update to the equation.
