Different performance from test WU run on FAH and on openMM #35
These are my benchmark results (based on the PantherX nVidia charts); unfortunately, the charts from the first runs have disappeared:
@peastman Peter, sorry for bothering you, but I'm out of ideas as to why the same system, executed on a local build of openMM, is in all but one case about 10% slower than when executed through F@H (I'm concluding that from the ns/day information in the F@H benchmark stats compared to the output of the benchmark generation script from @jchodera; I hope those figures are comparable). The only exception is a system with a very low atom count, where the AMD Radeon VII performs extremely badly anyway. I've looked through the parameters in CMakeCache to check whether there are any optimization or debug settings which could make a difference, but didn't find any. Do you have an idea what could cause the difference and where I should search further? On the local build the system is executed through a Python script (probably in contrast to F@H core22), but since most steps should be executed within openMM, that probably should not create the difference, or am I wrong in that assumption?
So far as I know, they ought to be completely comparable. Your script pretty much looks fine. I would make just a few minor changes. I would add
before you begin timing. As you currently have it, a lot of initialization (including compiling kernels) happens after you start timing. This will make sure all initialization is finished before then. I would also add
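The exact snippets suggested in this comment are not reproduced here. As a minimal sketch of what excluding initialization from the timed region typically looks like in an OpenMM script (the `simulation` object, the warm-up count, and the step count are assumptions, not the original suggestion):

```python
import time

def timed_steps(simulation, nsteps=50000, warmup=10):
    """Time nsteps of dynamics, excluding kernel compilation and other lazy initialization."""
    simulation.step(warmup)                        # a few untimed steps trigger kernel compilation
    simulation.context.getState(getEnergy=True)    # block until the GPU has finished

    start = time.time()
    simulation.step(nsteps)                        # the timed production steps
    simulation.context.getState(getEnergy=True)    # synchronize before stopping the clock
    return time.time() - start
```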
Thanks for your suggestion, Peter! After the suggested improvements, and after taking the F@H results from the logs on my machine, the difference becomes much smaller:
Two systematic errors contributed to the larger difference in the first analysis: the one pointed out by Peter, and the fact that I had taken the figures from the F@H stats rather than from the reports of the F@H runs in the logs on my machine (the stats can be higher, since the maximum is taken there and that maximum may come from other machines with a different setup). Some systematic error remains, as I have been running a fixed 50000 steps, while the runs on F@H used different step counts based on the estimated runtime on a 2080 Ti. So some difference is left, about 2.5% to 3% for larger systems and somewhat more for smaller ones. The irregularity on RUN9 (F@H results lower than local openMM) did not show up in the data from my logs. In any case, the remaining differences now seem so small, with the larger ones in small, quick projects where the step count could contribute, that from my side no further analysis is required. Unless you are interested in further analysis, I think the issue can be closed. Sorry for bothering you and thanks for the help.
Over the last few days I have been experimenting with the AMD HIP port of openMM on FAH test WUs from the 17102 test project on my Radeon VIIs. I have compared the ns/day results from the FAH benchmark with the ns/day values obtained by running these systems locally on openmm master (7.5) with the HIP platform and on openmm 7.4.2, corresponding to the branch run in FAH core22.
That the results with the HIP platform differ from those with the openCL platform is expected. However, I have also seen a performance difference (of about 10%) between the FAH-reported ns/day values and the ns/day values from a local run of the same system in openMM.
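For context, a minimal sketch of how a specific platform is selected in an OpenMM 7.4/7.5-era script (assuming the HIP plugin is built and on OpenMM's plugin path; the created `platform` object is then passed to the `Simulation` constructor):

```python
from simtk.openmm import Platform

# Choose the platform explicitly so HIP and OpenCL runs are directly comparable.
platform = Platform.getPlatformByName('OpenCL')    # or 'HIP' for the AMD HIP port
print(platform.getName(), Platform.getOpenMMVersion())
```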
For example, RUN10:
Or, for example, RUN13:
So it seems the results on openMM openCL are 10%-20% lower than those on FAH, which I don't understand. I would expect the opposite, since the runs on FAH also include checkpointing.
It would be good to understand the differences and achieve similar results when running on openCL in FAH and on openCL in local openMM. As long as there are significant differences, meaningful benchmarks are not possible until a new approach has been integrated into a new FAH core, which is a big effort. Being able to run benchmarks in advance directly in openMM would be helpful for analysing the performance effects of different changes.
This is the script I used, derived from the script used to generate the 17101 (and probably 17102) test WUs:
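The script itself is not reproduced here. As a minimal sketch, assuming the WU is distributed as serialized system/state/integrator XML files plus a PDB for the topology (the file names, platform choice, and the fixed 50000-step count are assumptions), it might look like:

```python
# Hypothetical benchmark sketch: load a serialized F@H-style work unit and report ns/day.
import time
from simtk import unit
from simtk.openmm import Platform, XmlSerializer
from simtk.openmm.app import PDBFile, Simulation

# Deserialize the work-unit components (file names are assumptions).
with open('system.xml') as f:
    system = XmlSerializer.deserialize(f.read())
with open('integrator.xml') as f:
    integrator = XmlSerializer.deserialize(f.read())
with open('state.xml') as f:
    state = XmlSerializer.deserialize(f.read())

pdb = PDBFile('initial.pdb')                       # topology for the Simulation object
platform = Platform.getPlatformByName('OpenCL')    # or 'HIP'

simulation = Simulation(pdb.topology, system, integrator, platform)
simulation.context.setState(state)

# Untimed warm-up so kernel compilation is excluded, per the suggestion earlier in the thread.
simulation.step(10)
simulation.context.getState(getEnergy=True)

nsteps = 50000
start = time.time()
simulation.step(nsteps)
simulation.context.getState(getEnergy=True)        # synchronize before stopping the clock
elapsed = time.time() - start

dt_ns = integrator.getStepSize().value_in_unit(unit.nanoseconds)
print('ns/day: %.2f' % (nsteps * dt_ns / elapsed * 86400.0))
```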