Using a Single GPU¶
By default, docker run
will make all GPUs on your workstation available
inside of the container. This means that in caliban shell
, caliban
notebook
or caliban run
, any jobs executed on your workstation will
attempt to use:
your huge GPU, custom-built and installed for ML Supremacy
the dinky GPU that exists solely to power your monitor, NOT to help train models
The second GPU will slow down everything.
To stop this from happening you need to set the CUDA_VISIBLE_DEVICES
environment variable equal to 0
, as described on this
nvidia blog
about the issue.
You can set the environment variable inside your container by passing
--docker_run_args
to caliban, like this:
caliban run --docker_run_args "--env CUDA_VISIBLE_DEVICES=0" trainer.train
Note
you may have noticed that this problem doesn’t happen when you run a
job inside caliban shell
. Your local environment may have
CUDA_VISIBLE_DEVICES
set. caliban shell
and caliban notebook
mount your home directory by default, which loads all of your local
environment variables into the container and, if you’ve set this environment
variable, modifies this setting inside your container. This doesn’t happen
with caliban run
or caliban cloud
. You will always need to use this
trick with those modes.
There are two other ways to solve this problem using the
custom ``docker run` arguments detailed here <https://docs.docker.com/engine/reference/commandline/run/>`_.
You can directly limit the GPUs that mount into the container using the --gpus
argument:
caliban run --docker_run_args "--gpus device=0" trainer.train
If you run nvidia-smi
in the container after passing this argument you won’t
see more than 1 GPU. This is useful if you know that some library you’re using
doesn’t respect the CUDA_VISIBLE_DEVICES
environment variable for any reason.
You could also pass this and other environment variables using an env file.
Given some file, say, myvars.env
, whose contents look like this:
CUDA_VISIBLE_DEVICES=0
IS_THIS_A_VARIABLE=yes
The --env-file
argument will load all of the referenced variables into the
docker environment:
caliban run --docker_run_args "--env-file myvars.env" trainer.train
Check out Custom Docker Run Arguments for more information.