TPUs on AI Platform¶
This documentation is currently quite sparse; expect a tutorial soon.
As of December 2019, TPUs on AI Platform support only TensorFlow versions 1.13 and 1.14: no JAX, no PyTorch.
Caliban currently has TensorFlow version 2.1 hardcoded internally. Once the range of possible values expands, we'll make this customizable.
See AI Platform’s runtime version list for more detail.
If you supply the
--tpu_spec NUM_TPUSxTPU_TYPE argument to your
caliban cloud job, AI Platform will configure a worker node with that number of TPUs
and attach it to the master node where your code runs.
--tpu_spec is compatible with
--gpu_spec; the latter configures the master
node where your code lives, while the former sets up a separate worker instance.
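The spec string has the shape NUM_TPUSxTPU_TYPE, e.g. a count, a literal "x", and a TPU type. A minimal sketch of how such a spec might be split apart (this parse_tpu_spec helper is hypothetical, for illustrating the format only; it is not Caliban's actual implementation):

```python
import re

def parse_tpu_spec(spec):
    # Split a string like "8xV2" into (count=8, tpu_type="V2").
    # Hypothetical helper illustrating the NUM_TPUSxTPU_TYPE format.
    m = re.match(r"^(\d+)x([A-Za-z0-9]+)$", spec)
    if m is None:
        raise ValueError(f"Invalid TPU spec: {spec!r}")
    return int(m.group(1)), m.group(2)

print(parse_tpu_spec("8xV2"))  # (8, 'V2')
```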
CPU Mode by Default¶
Normally, all jobs default to GPU mode unless you supply
--nogpu explicitly.
This default flips when you supply a
--tpu_spec and no explicit
--gpu_spec. In that case,
caliban cloud will NOT attach a default GPU to your master
instance. You have to ask for one explicitly.
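For example, to get both a GPU on the master node and a TPU worker, you can pass both flags at once. A sketch of such an invocation (the script name and the particular spec values here are illustrative, not prescribed):

```shell
# Hypothetical entrypoint trainer.py; attaches one V100 GPU to the
# master node and a separate worker with 8 V2 TPU cores.
caliban cloud --gpu_spec 1xV100 --tpu_spec 8xV2 trainer.py
```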
A CPU mode default also means that by default Caliban will try to install the
'cpu' extra dependency set in your
setup.py, as described in the
Declaring Requirements guide.
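Concretely, this means Caliban looks for an extras_require entry named 'cpu' in your setup.py. A minimal sketch, assuming the extras naming from the Declaring Requirements guide (the package name and pinned dependencies below are illustrative):

```python
# setup.py sketch: declares 'cpu' and 'gpu' extras so Caliban can pick
# the right dependency set for the execution mode. Package contents
# here are hypothetical examples, not required values.
from setuptools import setup, find_packages

setup(
    name="trainer",
    packages=find_packages(),
    install_requires=["absl-py"],
    extras_require={
        "cpu": ["tensorflow==1.14.0"],
        "gpu": ["tensorflow-gpu==1.14.0"],
    },
)
```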
Next, you'll need to get the repository of TPU examples onto your machine:
mkdir tpu-demos && cd tpu-demos
curl https://codeload.github.com/tensorflow/tpu/tar.gz/r1.14 -o r1.14.tar.gz
tar -xzvf r1.14.tar.gz && rm r1.14.tar.gz
Check out the AI Platform TPU tutorial for the next steps, and check back for more detail about how to use that tutorial with Caliban.