FAQ

What if I want to run on a cluster that cannot be set up with the current installers?

Runaway uses profiles (YAML files stored in your ~/.orchestra folder) to connect to remote hosts and execute jobs. Most resources are either shared machines accessed without a scheduler or clusters managed with Slurm, so the current installers should cover most cases. If you have a particular need they cannot meet, writing your own profile should not be much of a hassle: read through the execution model page of the documentation and start building from there.

Why use Runaway instead of launching experiments by hand?

Depending on your needs, Runaway can help you in a few different ways:

Seamless scaling: If you are a newcomer who just wants to scale up your experiments quickly without thinking about it, Runaway is a good way to do it. There is almost nothing to set up, and you can be up and running in minutes.
Get the most from restrictive platforms: Some platforms only let you queue a small number of jobs at a time. In this case, Runaway stands in for the scheduler and lets you use those platforms as simply as any other, with good performance.
Instant platform switch: By changing a single string in your command, you can start working on a new cluster (assuming you already have its profile). Most of the burden of switching to a new cluster disappears.
Get rid of boring cluster tasks: Running things on clusters is tedious and can be quite time-consuming: synchronizing your code, submitting jobs, checking for failures, getting data back, and so on. You have more interesting things to do in your research. Our goal is for Runaway to let you think less about the hassle of running things and more about your actual research.
Run more experiments: Scheduling policies vary from one platform to another. The current template design adapts fairly easily to these different situations, with one constant: once you get a node, Runaway takes care of feeding it as much work as possible.
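The general idea behind the "restrictive platforms" and "run more experiments" points can be shown with a short sketch. The program keeps a fixed number of jobs in flight, such as a platform's queue limit or the number of slots on an acquired node, and starts the next job as soon as one finishes. This is only a minimal illustration written with Python's asyncio. It is not Runaway's actual implementation. `run_job`, `feed` and `max_in_flight` are hypothetical names, and remote execution is simulated with `asyncio.sleep`.

```python
import asyncio


async def run_job(params: dict) -> dict:
    """Hypothetical stand-in for one remote execution of the script.

    A real tool would go through ssh or the platform scheduler here;
    this sketch only sleeps to simulate work.
    """
    await asyncio.sleep(0.01)
    return {"params": params, "status": "done"}


async def feed(jobs: list[dict], max_in_flight: int) -> tuple[list[dict], int]:
    """Run every job, never letting more than `max_in_flight` run at once.

    Returns the results (in submission order) and the peak concurrency seen.
    """
    slots = asyncio.Semaphore(max_in_flight)
    running = 0
    peak = 0

    async def guarded(params: dict) -> dict:
        nonlocal running, peak
        async with slots:  # wait for a free slot before "submitting"
            running += 1
            peak = max(peak, running)
            try:
                return await run_job(params)
            finally:
                running -= 1  # slot freed: a pending job can now start

    results = await asyncio.gather(*(guarded(p) for p in jobs))
    return list(results), peak


if __name__ == "__main__":
    jobs = [{"seed": s} for s in range(20)]
    results, peak = asyncio.run(feed(jobs, max_in_flight=4))
    print(len(results), peak)
```

The semaphore is what replaces "queue everything at once": however many jobs the campaign contains, at most `max_in_flight` of them occupy the platform at any moment, and freed slots are refilled immediately.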

Why use Runaway instead of your own tools?

Now, why use it if you already have your own super-cool tools?

Developing tools to automate experiments is old hat in many research teams. Every now and then, PhD students and postdocs set out on their own to build such tools. It can be a fun engineering project, but depending on your background it can take a lot of time, and it adds no value to your research. Moreover, as a researcher you will probably never have the time and motivation to turn it into a tool others can use. All in all, the fact that people are still writing their own scripts and tools demonstrates the need well enough.

In some cases, doing things on your own may still be your best option:

Gaining cluster expertise: If you want to learn about clusters to make better use of a particular platform, practicing directly with the scheduler is the best way. You will learn to run leaner jobs and get them to start faster.
Running jobs on detailed allocations: If you have very precise hardware needs, working directly with the scheduler can be simpler (though you can go pretty far with the current profiles).
Running complex pipelines of jobs: Some people want to run DAGs of jobs, for example to run an analysis once a batch of experiments has finished. Runaway will never allow you to do that: it assumes the simplest experimental scheme, i.e. the repeated execution of a single script under varying parameters.
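The "single script under varying parameters" scheme can be made concrete with a minimal Python sketch. It expands a parameter grid into one command line per combination. The sketch does not reflect how Runaway actually describes parameters. `expand_grid`, `to_command`, `train.py` and the `--key=value` argument style are all hypothetical.

```python
import itertools
import shlex


def expand_grid(grid: dict[str, list]) -> list[dict]:
    """Turn {"lr": [0.1, 0.01], "seed": [0, 1]} into one dict per combination."""
    keys = list(grid)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(grid[k] for k in keys))
    ]


def to_command(script: str, params: dict) -> str:
    """Build one shell command line (hypothetical --key=value argument style)."""
    args = [f"--{k}={v}" for k, v in params.items()]
    return shlex.join(["python", script, *args])


if __name__ == "__main__":
    grid = {"lr": [0.1, 0.01], "seed": [0, 1, 2]}
    for params in expand_grid(grid):
        print(to_command("train.py", params))
```

Every command runs the same script, and only the arguments differ. This flat list of independent executions is exactly what such a tool can distribute. A DAG, where one job must wait for the outputs of others, does not fit this shape.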

Is your profile system future-proof?

While designing this program, we discussed it with administrators of the PlaFRIM and Curta platforms. Those discussions shaped the design, which so far appears flexible enough to fit most platforms. Time will tell whether this was a good design decision, but based on those administrators’ experience, it should work on most platforms we are likely to use.

What if I want to start some experiments and go on holiday?

Since Runaway schedules jobs on top of the platform’s scheduler, it has to keep running somewhere. The standard way to use it is to run it from your laptop and leave the command going. The problem is that if the program loses its connection or is paused (for instance when your laptop hibernates), it cannot keep scheduling!

Fortunately, there is a simple solution: you can install Runaway directly on a cluster and derive a working profile by just changing the ssh target to localhost. Then you only have to log on to the cluster, start a runaway command inside a tmux session, detach, and log out. When you are back, you just check whether the command is done. Another option is to do the same on another lab computer that stays up while you are gone. Either way, you keep most of Runaway’s benefits while letting the jobs run in your absence.

Also note that if you just disconnect for a few hours, Runaway will be able to resume your work (roughly as long as the ssh session is kept alive on the server side, which is usually a matter of hours).

What about performance?

Thanks to its asynchronous core, Runaway scales fairly well to large experimental campaigns. Still, the program is in its infancy and, as with any software, you may run into a few bugs. Don’t hesitate to file an issue!