Windows-only air-gapped network grid computing solution?

rowet

Joined: 14 Jun 17
Posts: 1
United States
Message 79021 - Posted: 14 Jun 2017, 15:58:48 UTC
Last modified: 14 Jun 2017, 16:03:08 UTC

I'm trying to evaluate BOINC as a potential solution for my problem just by perusing the docs with no prior exposure, and I'm not doing so great figuring it out. Hopefully it isn't rude to lay out the situation and ask for advice.

I have several Windows-only networks (no chance of a Linux server in the mix, no way to run a virtual machine, no Cygwin). These networks are air-gapped.

I have a Monte Carlo simulation executable (C++) that takes a seed and an input file ranging in size from kilobytes to hundreds of megabytes. The executable also has about 20 megabytes of DLL dependencies and some other static files. The simulation run times range from seconds to days. The simulation writes out a results file.

So I need a grid computing solution that runs entirely on Windows. It needs to take the specification of the simulation executable, the simulation input file, and a seed list, perform the replications, and collect the output files somewhere. There are about five perfectly adequate and mature solutions for this problem in Linux-land, but this whole thing needs to be Windows only.

An additional problem is I need low latency. For the cases when single runs of the simulation only take a few seconds, dozens of seconds of wait time when distributing the work and starting it up is not acceptable. After hitting run, all the cores should be hot inside ten seconds. Executables and DLLs and input files should be cached on the job nodes to avoid re-copying.

HTCondor in theory works on Windows but it failed me on the latency constraint and I had other problems with it. There was no reliable way to make it snappy for short jobs on Windows. In my experiments I also hit tons of odd behavior and bugs to do with Windows credentials. It would even randomly lock out user accounts.

Is there any chance that BOINC makes more sense for my case than writing something from scratch or paying for Xoreax Grid Engine, or maybe some commercial Hadoop distribution? My assessment is that BOINC is not suitable for my case, but I figured it wouldn't hurt to ask.
Juha
Volunteer developer
Volunteer tester
Help desk expert

Joined: 20 Nov 12
Posts: 801
Finland
Message 79122 - Posted: 19 Jun 2017, 17:54:00 UTC - in response to Message 79021.  

Well, this forum is titled "Questions and problems", so it's quite OK to have questions here :)

It needs to take the specification of the simulation executable, the simulation input file, and a seed list, perform the replications, and collect the output files somewhere.


This is pretty much what BOINC was designed to do. You'd need to write a program or script that takes the seed list and generates tasks from it, though. That should be easy enough to do.
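As a rough illustration of such a script, here is a minimal Python sketch that turns a seed list into one task-submission command per seed. It only builds the commands and prints them; it does not run anything. The application name `mcsim`, the input file name `scenario.dat`, and the simulation's `--seed` argument are made up for the example. The `create_work` options used (`--appname`, `--wu_name`, `--command_line`) should be checked against the server documentation for your BOINC version, and the input file would also have to be staged on the server before tasks refer to it.

```python
import shlex


def read_seeds(text):
    """Parse one integer seed per line, skipping blank lines and '#' comments."""
    seeds = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            seeds.append(int(line))
    return seeds


def build_task_commands(app_name, input_file, seeds):
    """Return one argument list per seed for the server's create_work tool.

    Assumption: the flag names below match your BOINC server version.
    The '--seed N' command line is a hypothetical argument of the simulation.
    """
    commands = []
    for seed in seeds:
        commands.append([
            "bin/create_work",
            "--appname", app_name,
            "--wu_name", f"{app_name}_seed_{seed}",  # task names must be unique
            "--command_line", f"--seed {seed}",
            input_file,
        ])
    return commands


if __name__ == "__main__":
    demo_seeds = read_seeds("# demo seed list\n101\n102\n\n103\n")
    for cmd in build_task_commands("mcsim", "scenario.dat", demo_seeds):
        print(shlex.join(cmd))
```

Keeping command generation separate from execution makes it easy to inspect the task list before anything is submitted.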

An additional problem is I need low latency.


The BOINC client can be told to contact the project server every, say, ten seconds. Depending on the number of compute nodes, you may need a beefy server for this.
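To get a feel for the server load, a back-of-the-envelope estimate may help. The sketch below assumes every node makes exactly one scheduler request per polling interval and that requests are spread evenly over time; real traffic is burstier, and file transfers come on top of this. The node counts are illustrative only.

```python
def scheduler_requests_per_second(num_nodes, poll_interval_s):
    """Average scheduler request rate if each node polls once per interval."""
    if poll_interval_s <= 0:
        raise ValueError("poll interval must be positive")
    return num_nodes / poll_interval_s


if __name__ == "__main__":
    interval_s = 10  # the ten-second example interval
    for nodes in (10, 100, 1000):
        rate = scheduler_requests_per_second(nodes, interval_s)
        print(f"{nodes:>5} nodes polling every {interval_s} s: {rate:.1f} requests/s")
```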

Executables and DLLs and input files should be cached on the job nodes to avoid re-copying.


Application files are cached by default and input files can be marked to be cached as well.
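As far as I know, marking an input file to be cached is done with a `<sticky/>` flag on its `<file_info>` entry in the job's input template. The Python sketch below generates such a template; treat the exact element layout as an assumption to verify against the job template documentation for your server version (older versions may also expect a `<number>` element, for example). The logical file name `scenario.dat` is made up.

```python
import xml.etree.ElementTree as ET


def build_input_template(open_names, sticky=True):
    """Build an input template with one file_info/file_ref pair per input file.

    Assumption: <sticky/> inside <file_info> asks the client to keep the
    file after the task finishes; confirm against your BOINC version's docs.
    """
    root = ET.Element("input_template")
    for _ in open_names:
        info = ET.SubElement(root, "file_info")
        if sticky:
            ET.SubElement(info, "sticky")
    workunit = ET.SubElement(root, "workunit")
    for name in open_names:
        ref = ET.SubElement(workunit, "file_ref")
        ET.SubElement(ref, "open_name").text = name
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_input_template(["scenario.dat"]))
```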

The biggest obstacle could be the Windows-only requirement. While the BOINC client runs on Windows just fine, I don't think anyone has ever tried to run the server daemons on Windows. I can't think of anything in them that would be inherently Unix-only, but since they have been developed to run on Linux, I would expect that some changes would have to be made to even compile them on Windows. If you do try to compile the BOINC server parts on Windows, I would recommend using MSYS(2)+MinGW for somewhat better compatibility.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.