Arena is a command-line interface that lets data scientists run and monitor machine learning training jobs and check their results easily. It currently supports solo and distributed TensorFlow training. The backend is built on Kubernetes, Helm, and Kubeflow, but data scientists need very little knowledge of Kubernetes to use it.
End users also need GPU resource and node management, so Arena provides a top command to check the available GPU resources in the Kubernetes cluster.
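As a rough illustration of the top command, the sketch below wraps it in a guard so that it fails cleanly when Arena is not installed. The subcommand form `arena top node` is taken from the upstream Arena CLI documentation rather than from this text, and `show_gpu_nodes` is a hypothetical helper name.

```shell
# Hypothetical wrapper around the GPU overview command.
# Assumption: the subcommand is `arena top node`, per the upstream CLI docs.
show_gpu_nodes() {
  if command -v arena >/dev/null 2>&1; then
    # Requires a working kubeconfig that points at the target cluster.
    arena top node
  else
    echo "arena is not installed or not on PATH" >&2
    return 127
  fi
}

# Do not abort the calling script if arena is missing.
show_gpu_nodes || true
```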
In short, Arena's goal is to make data scientists feel as if they are working on a single machine while actually using the power of a GPU cluster.
For the Chinese version, please refer to 中文文档
To install Arena, follow the Installation guide.
Arena is a command-line interface for running and monitoring machine learning training jobs and checking their results. It currently supports solo and distributed training.
- 1. Run a training job with source code from Git
- 2. Run a training job with TensorBoard
- 3. Run a distributed training job
- 4. Run a distributed training job with external data
- 5. Run a distributed training job based on MPI
- 6. Run a distributed TensorFlow training job with gang scheduler
- 7. Run TensorFlow Serving
- 8. Run TensorFlow Estimator
- 9. Monitor GPUs of the training job
- Go >= 1.8
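
The Go prerequisite can be checked before building. This is a minimal sketch, assuming a POSIX shell and a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS versions); `version_at_least` is a hypothetical helper, not part of Arena or Go, and the parsing only handles release versions such as `go1.21.0`.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
version_at_least() {
  # sort -V orders version strings; the required minimum must sort first (or be equal).
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v go >/dev/null 2>&1; then
  # `go version` prints e.g. "go version go1.21.0 linux/amd64"; keep the third field.
  go_ver="$(go version | awk '{print $3}' | sed 's/^go//')"
  if version_at_least "$go_ver" 1.8; then
    echo "Go $go_ver satisfies the >= 1.8 requirement"
  else
    echo "Go $go_ver is older than 1.8"
  fi
else
  echo "go not found on PATH"
fi
```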
mkdir -p $GOPATH/src/github.com/kubeflow
cd $GOPATH/src/github.com/kubeflow
git clone https://github.com/kubeflow/arena.git
cd arena
make
The arena binary is located in the arena/bin directory. You may want to add that directory to your $PATH.
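
One way to make the binary reachable from any directory is sketched below. It assumes the repository was cloned under `$GOPATH` as in the build steps, and falls back to `$HOME/go` (the default GOPATH since Go 1.8) when the variable is unset; `ARENA_BIN` is an illustrative variable name.

```shell
# Assumption: the clone lives under $GOPATH/src/github.com/kubeflow/arena.
ARENA_BIN="${GOPATH:-$HOME/go}/src/github.com/kubeflow/arena/bin"

# Append the build output directory to the search path for this shell session.
export PATH="$PATH:$ARENA_BIN"

# Report where the shell finds arena, or warn if the build has not produced it yet.
command -v arena || echo "arena not found on PATH; check that make succeeded"
```

To keep the change across sessions, the `export` line can be added to the shell's startup file (for example `~/.bashrc`).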
Please refer to arena.md