
Optimal number of parallel EnergyPlus simulations on multi-core machine

I'm using Python multiprocessing and subprocess.run() to run multiple EnergyPlus simulations at the same time. Suppose I have a dual-CPU machine where each CPU has 4 cores and 8 logical processors. As a rule of thumb, should the optimal number of parallel runs/subprocesses be the number of CPUs (2), cores (8), or logical processors (16)?

The EnergyPlus documentation, Run EnergyPlus in Parallel, says "To be time efficient, the number of parallel EnergyPlus runs should not be more than the number of CPUs on a computer", but I'm not sure how this applies to multi-core CPUs, nor can I find a clear answer elsewhere. Any suggestions are more than welcome.
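For context, my setup looks roughly like this sketch. The model names are placeholders, and a no-op Python call stands in for the actual energyplus invocation (which in practice would be something like `["energyplus", "-w", weather_file, "-d", out_dir, idf_path]`):

```python
import multiprocessing
import subprocess
import sys

def run_simulation(idf_path):
    """Launch one simulation as a separate OS process.

    A no-op Python call stands in for the real EnergyPlus CLI here,
    so the sketch runs without an EnergyPlus install.
    """
    result = subprocess.run(
        [sys.executable, "-c", "pass"],  # placeholder for the energyplus call
        capture_output=True,
        text=True,
    )
    return idf_path, result.returncode

def run_all(idf_files, n_workers):
    """Run the given models with at most n_workers concurrent subprocesses."""
    with multiprocessing.Pool(processes=n_workers) as pool:
        return dict(pool.imap_unordered(run_simulation, idf_files))

if __name__ == "__main__":
    models = [f"model_{i}.idf" for i in range(8)]  # hypothetical model names
    results = run_all(models, n_workers=4)  # 4 = the number in question
    print(results)
```

The question is what value of `n_workers` to pick: 2, 8, or 16 on the machine described above.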


EDIT: Following the advice from @Jason Glazer and @Julien Marrec, I did a simple test with two models, 24 runs each, on another laptop with a 4-core CPU, 8 logical processors, and 32GB RAM. No other tasks were performed on the laptop during the test. Customized pre- and post-processing is performed separately from the parallel runs in my workflow, so it doesn't consume CPU during the runs.

Here are the results (number of processes - total time / average time per EnergyPlus run):

Model 1:

  • 3 - 163.84s/20.48s
  • 4 - 142.31s/23.71s
  • 6 - 133.41s/33.35s
  • 8 - 123.81s/41.27s

Model 2:

  • 3 - 190.89s/23.86s
  • 4 - 162.97s/27.16s
  • 6 - 144.02s/36.00s
  • 8 - 136.09s/45.36s

So it looks like it's OK to use the maximum number of threads, but the gain is very small.
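The diminishing returns show up clearly as throughput (runs per second), computed from the Model 1 numbers above:

```python
# Throughput from the Model 1 timings above: 24 runs at each process count.
runs = 24
timings = {3: 163.84, 4: 142.31, 6: 133.41, 8: 123.81}  # processes -> total seconds

throughput = {n: runs / total for n, total in timings.items()}
for n, rate in throughput.items():
    print(f"{n} processes: {rate:.3f} runs/s")

# Doubling from 4 to 8 processes improves throughput by only ~15%.
gain = throughput[8] / throughput[4] - 1
print(f"4 -> 8 processes: +{gain:.0%}")
```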

According to this answer on StackOverflow, using the number of cores (optionally minus 1) sounds like a safer bet for my workflow, especially when the number of runs reaches into the thousands. I'm not sure whether this still holds for a pure EnergyPlus workflow, i.e. using EP-Launch's group simulation functionality.

Thank you again for your advice.

liq2519
asked 2020-01-17 12:25:22 -0500, updated 2020-01-21 15:15:00 -0500

2 Answers


My usual starting point for this kind of thing is nproc - 2, i.e. the number of threads minus 2; in your case that'd be 14.

In Python, that's generally multiprocessing.cpu_count() - 2, but do make sure that's actually the case: it returns the number of online processors, which, depending on your hardware and power-management options, can sometimes be fewer than what's actually available.
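As a sketch of that check (os.sched_getaffinity is Linux-only, hence the fallback; the "reserve 2" margin is the rule of thumb described here, not a fixed requirement):

```python
import multiprocessing
import os

logical = multiprocessing.cpu_count()  # logical processors reported by the OS
try:
    # On Linux, this reflects the CPUs this process may actually run on,
    # which can differ from cpu_count() under affinity restrictions.
    available = len(os.sched_getaffinity(0))
except AttributeError:
    # sched_getaffinity is not available on all platforms (e.g. macOS, Windows)
    available = logical

n_workers = max(1, available - 2)  # reserve ~2 threads for the orchestrator
print(logical, available, n_workers)
```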

I usually reserve 2 threads for the managing part (and to avoid freezing the system while I do small things on the side, like browsing): after all, if Python is going to be orchestrating your runs and potentially doing some post-processing, it must have resources available to do that.

If your Python code is going to do heavy lifting in pre- and/or post-processing, then I would reserve more resources for the non-EnergyPlus work.

As usual, your mileage will vary greatly, so try it out for yourself on a subset of your analysis if it's large. Benchmark, and adjust.

Julien Marrec
answered 2020-01-21 08:39:25 -0500, updated 2020-01-22 09:45:56 -0500

I don't consider myself an expert on this topic, although I do have some experience. When I was performing a large parametric study a few years ago, I found that allocating the work as one simulation job per core was the fastest way to get the study simulations done. For your computer, that would be 8 parallel runs.

One caveat is that this makes sense for a computer not being used for any other function. If you are still trying to use your computer while the simulations are being performed, you need to reserve some cores for that function. It is also important to shut down as many background processes as possible.

Overall, I would recommend some experimentation for your specific computer.

JasonGlazer
answered 2020-01-21 08:02:42 -0500