GPU Parallel Monte Carlo #308
The parallel Monte Carlo simulation tooling is built around running […]. Until we have that, yes, our GPU parallelism is at the vector level. If I'm not mistaken, that's the same as the VexCL and ODEINT strategy, which GPU-parallelizes through GPU-based vectors? If not, then I'd like to hear some details about what they're doing.

Given that, you'd parallelize a Monte Carlo simulation by making a giant matrix with different columns corresponding to runs with different parameters. Of course, this would run into issues with events if the event handling depended on the solution values, since running a bunch of concurrent simulations would trigger a lot of events and make the time step very small. In that case you'd want our first solution. But yes, parallelizing lots of runs with small ODEs is on our mind and we need a good solution. I may end up coding specialized versions of the ODE solver that compile to the GPU more easily, specifically for this case, if I cannot get the more general compilation to work soon after v0.7 is released.
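As a concrete illustration of this "giant matrix" strategy, here is a CPU/NumPy sketch standing in for the GPU (the test ODE, parameters, and function names are all invented for illustration, not DiffEq code): each column of the state matrix is one Monte Carlo run with its own parameter, and a single vectorized RK4 step advances every run at once.

```python
import numpy as np

def f(u, p):
    # Toy EOM for illustration: du/dt = -p*u, one rate constant per run.
    # Broadcasting applies each column's parameter p[j] to that column.
    return -p * u

def rk4_step(f, u, p, dt):
    # One vectorized RK4 step: advances every column (every run) at once,
    # which is the "vector-level" parallelism described above.
    k1 = f(u, p)
    k2 = f(u + dt / 2 * k1, p)
    k3 = f(u + dt / 2 * k2, p)
    k4 = f(u + dt * k3, p)
    return u + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

N = 1000                          # number of Monte Carlo runs
p = np.linspace(0.5, 2.0, N)      # a different parameter for each run
u = np.ones((1, N))               # rows = states, columns = runs
dt, t_end = 0.01, 1.0
for _ in range(int(round(t_end / dt))):
    u = rk4_step(f, u, p, dt)
# Column j now approximates the solution for parameter p[j] at t_end.
```

On a GPU the state matrix would live in device memory and each RK4 stage would be one fused kernel over all columns; the control flow (and the single shared `dt`) stays on the host.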
Bummer. Thanks for the info though. Other than this one "killer feature" I would say that DifferentialEquations.jl is the tool I have been looking for for years. Hopefully, some of my experience using VexCL + ODEINT will be of some help.
Yes and no. VexCL certainly does parallelism at the vector level, but I think the vectors are defined differently. DifferentialEquations.jl is parallelizing across the state vector; VexCL is parallelizing across an ensemble vector. For example, let's say you want to run […].

For my application the dynamics are quite slow, so I've been able to get away with a fixed time-step RK4 integrator... for now. It does sound like adaptive time stepping should work with VexCL; you just need to keep track of an array of time steps, one for each thread. Each kernel call simply does a single-step update for the whole ensemble. This gets wrapped in a host […].

This was all done using VexCL's symbolic types. For more details on how I generated the kernels for my problem, see VexCL issue #202. I also discussed event detection in VexCL issue #203.
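The "array of time steps, one for each thread" idea could be sketched like this (again NumPy standing in for the GPU, with an invented test ODE; a simple Euler/Heun embedded pair is used here in place of RK4 so the per-thread error estimate is visible). Each call to the step function plays the role of one kernel launch: every trajectory carries its own `t` and `dt`, and finished members are masked out but still flow through the same vectorized code, as they would on a GPU.

```python
import numpy as np

def f(u, p):
    return -p * u  # toy EOM, one parameter per trajectory

def ensemble_adaptive_step(u, t, dt, p, t_end, rtol=1e-6):
    """One 'kernel call': a single adaptive Euler/Heun step per member,
    each member using its own time t[i] and step size dt[i]."""
    active = t < t_end
    h = np.minimum(dt, t_end - t)            # don't step past t_end
    k1 = f(u, p)
    k2 = f(u + h * k1, p)
    u_high = u + h / 2 * (k1 + k2)           # Heun (2nd order)
    # Error estimate = Heun minus Euler, scaled by the relative tolerance
    err = np.abs(h / 2 * (k2 - k1)) / (rtol * np.maximum(np.abs(u), 1e-10))
    accept = (err <= 1.0) & active
    u = np.where(accept, u_high, u)
    t = np.where(accept, t + h, t)
    # Standard step-size controller, applied per trajectory
    new_dt = np.clip(0.9 * h * np.maximum(err, 1e-10) ** -0.5, 1e-6, 0.1)
    dt = np.where(active, new_dt, dt)
    return u, t, dt

p = np.array([0.5, 1.0, 1.5, 2.0])   # one parameter per trajectory
u = np.ones_like(p)
t = np.zeros_like(p)
dt = np.full_like(p, 1e-3)
t_end = 1.0
while np.any(t < t_end):             # host loop: repeat the "kernel" until done
    u, t, dt = ensemble_adaptive_step(u, t, dt, p, t_end)
```

Note the host only checks a single reduction (`np.any`) per iteration; all per-trajectory branching is expressed as masks, which is what makes this pattern map onto a GPU kernel.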
I see. Yes, you can do the VexCL version in DiffEq by using a GPUArray of static arrays and then broadcast-calling your derivative kernel on this GPUArray{SArray}. But that will never scale to event handling, since then the timesteps are global, i.e. all integrations are advancing with the same `dt`.

But I have a solution in mind. It will need Julia v1.0, and since that will be out in under two months, let's wait until then. Ping me after Julia v1.0 comes out and I'll get an adaptive integrator that auto-compiles to the GPU with event handling.
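A toy demonstration of why the shared global timestep is the bottleneck (assumed problem and numbers, not DiffEq code): with one adaptive `dt` for the whole ensemble, every step is accepted or rejected based on the *worst* member, so an easy trajectory pays for the hardest one. Comparing the work for a lone easy run against the same run bundled with one stiff member makes the cost visible.

```python
import numpy as np

def f(u, p):
    return -p * u  # toy EOM

def solve_shared_dt(u, p, t_end=1.0, rtol=1e-6):
    """Adaptive Euler/Heun pair with ONE dt for the whole ensemble:
    a step is accepted only if the worst member's error passes."""
    t, dt, attempts = 0.0, 1e-3, 0
    while t < t_end:
        h = min(dt, t_end - t)
        k1 = f(u, p)
        k2 = f(u + h * k1, p)
        err = np.max(np.abs(h / 2 * (k2 - k1)) /
                     (rtol * np.maximum(np.abs(u), 1e-10)))
        if err <= 1.0:                 # everyone advances in lock step
            u = u + h / 2 * (k1 + k2)
            t += h
        dt = 0.9 * h * max(err, 1e-10) ** -0.5
        attempts += 1
    return u, attempts

p_easy = np.array([0.1])
p_mixed = np.array([0.1, 200.0])       # same easy run plus one "hard" member
_, work_easy = solve_shared_dt(np.ones(1), p_easy)
_, work_mixed = solve_shared_dt(np.ones(2), p_mixed)
# work_mixed is far larger: the easy run is dragged to the hard run's dt.
```

Events behave like the hard member here: each event localization forces small global steps, so with many concurrent simulations the shared `dt` collapses, which is the scaling problem described above.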
@ChrisRackauckas Awesome! Thanks for all your great work.
@ChrisRackauckas Thank you for your amazing package; I'm already using it heavily for my simulations. Julia 1.0 is now out: is there any news about the adaptive integrator that auto-compiles to the GPU with event handling? Thanks in advance.
What I have planned here for the short term won't do event handling. Doing event handling would require getting the internals of OrdinaryDiffEq.jl to compile to the GPU, which needs Cassette.jl to be released (so we can overdub the array allocations).
Found some precedent: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5710898
@ChrisRackauckas Good find. I actually stumbled on that paper a few years ago when I started using VexCL+ODEINT but completely forgot about it. FYI, the author has source code here: LibBi. My apologies for my radio silence on this; I somehow missed your previous posts. What do you have in mind for your short-term plan?
My short-term plan is a direct implementation of Tsit5 into SimpleDiffEq.jl using only static arrays and only allowing […]. @YingboMa could probably help with getting some of it done. We have applications and funders who are interested, so that will drive it pretty soon.
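For contrast with the lock-step matrix approach earlier in this thread, the per-thread model behind this plan might look like the following CPU sketch (plain Python with RK4 standing in for Tsit5; `solve_one` is an illustrative name, not the SimpleDiffEq.jl API). Each "thread" runs a complete solve of one small ODE on fixed-size state with no array allocations, which is the property that makes the static-array Julia version compilable to a GPU kernel.

```python
import math

def solve_one(u0, p, dt=0.001, t_end=1.0):
    """One 'thread': a complete fixed-step RK4 solve on scalar state.
    No arrays are allocated inside the loop; everything lives in
    registers, the way a static-array GPU kernel would work."""
    f = lambda u: -p * u          # toy EOM, per-thread parameter p
    u, t = u0, 0.0
    n = round(t_end / dt)
    for _ in range(n):
        k1 = f(u)
        k2 = f(u + dt / 2 * k1)
        k3 = f(u + dt / 2 * k2)
        k4 = f(u + dt * k3)
        u += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return u

# "Launch" one thread per Monte Carlo run; the map is the analogue of
# a kernel launch over the ensemble of parameters.
params = [0.5, 1.0, 2.0]
results = [solve_one(1.0, p) for p in params]
```

Because each thread owns its entire integration, trajectories never synchronize on a shared timestep, which is exactly what the small-ODE Monte Carlo use case in this issue needs.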
The SimpleTsit5 and SimpleATsit5 implementations should be GPU-compatible, given discussions with the CUDANative.jl devs. The PR which added the adaptive one can be found here: […]. Essentially, that […].
This is now being handled at https://github.com/JuliaDiffEq/DiffEqGPU.jl
I have a need to run some large scale Monte Carlo simulations w/ random initial conditions and parameters for small scale ODEs (<10 states). Is it possible to do parallel Monte Carlo simulation using the GPU?
I see that you support the GPU, but all the examples look like the parallelization happens within the EOMs and requires very large systems, rather than each GPU thread solving the EOMs independently on different data.
I have implemented large scale Monte Carlo simulations with VexCL and Boost ODEINT, but event detection there is essentially nonexistent. I will admit that I am a Julia novice, but my interest in learning more is primarily driven by this awesome library.
Thanks.