caliban stop¶
This command allows you to stop running jobs submitted using caliban.
For example, suppose you submit a group of experiments to GKE using an experiment config file like the following:
$ caliban cluster job submit --xgroup my-xgroup ... --experiment_config exp.json cpu.py --
After a bit, you realize that you made a coding error, so you’d like to stop
these jobs so that you can fix your error without wasting cloud resources (and
money). The caliban stop
command makes this relatively simple:
$ caliban stop --xgroup my-xgroup
the following jobs would be stopped:
cpu.py --foo 3 --sleep -1
job 61 RUNNING GKE 2020-05-28 11:55:04 container: gcr.io/totoro-project/0f6d8a3ddbee:latest name: job-stop-test-57pr9
cpu.py --foo 3 --sleep 2
job 62 RUNNING GKE 2020-05-28 11:55:04 container: gcr.io/totoro-project/0f6d8a3ddbee:latest name: job-stop-test-s67jt
cpu.py --foo 3 --sleep 600
job 63 RUNNING GKE 2020-05-28 11:55:04 container: gcr.io/totoro-project/0f6d8a3ddbee:latest name: job-stop-test-gg9zm
do you wish to stop these 3 jobs? [yN]: y
stopping job: 61 RUNNING GKE 2020-05-28 11:55:04 container: gcr.io/totoro-project/0f6d8a3ddbee:latest name: job-stop-test-57pr9
stopping job: 62 RUNNING GKE 2020-05-28 11:55:04 container: gcr.io/totoro-project/0f6d8a3ddbee:latest name: job-stop-test-s67jt
stopping job: 63 RUNNING GKE 2020-05-28 11:55:04 container: gcr.io/totoro-project/0f6d8a3ddbee:latest name: job-stop-test-gg9zm
requested job cancellation, please be patient as it may take a short while for this status change to be reflected in the gcp dashboard or from the `caliban status` command.
This command will stop all jobs that are in a RUNNING
or SUBMITTED
state,
and checks with you to make sure this is what you really intend, as
accidentally stopping a job that has been running for days is a particularly
painful experience if your checkpointing is less than perfect. Similar to other
caliban commands, you can use the --dry_run
flag to just print what jobs would
be stopped.
This command supports the following arguments:
$ caliban stop --help
usage: caliban stop [-h] [--helpfull] [--xgroup XGROUP] [--dry_run]
optional arguments:
-h, --help show this help message and exit
--helpfull show full help message and exit
--xgroup XGROUP experiment group
--dry_run Don't actually submit; log everything that's going to
happen.