Installation

Let us start with the install of Katib.

cd $HOME
git clone https://github.com/tfworldkatib/tutorial.git
cd $HOME/tutorial/setup/katib-install
./deploy.sh
Sample Output ``` + kubectl apply -f v1alpha3 namespace/kubeflow created + kubectl apply -f v1alpha3/katib-controller customresourcedefinition.apiextensions.k8s.io/experiments.kubeflow.org created customresourcedefinition.apiextensions.k8s.io/suggestions.kubeflow.org created customresourcedefinition.apiextensions.k8s.io/trials.kubeflow.org created configmap/katib-config created deployment.apps/katib-controller created clusterrole.rbac.authorization.k8s.io/katib-controller created serviceaccount/katib-controller created clusterrolebinding.rbac.authorization.k8s.io/katib-controller created secret/katib-controller created service/katib-controller created configmap/trial-template created + kubectl apply -f v1alpha3/manager deployment.extensions/katib-manager created service/katib-manager created + kubectl apply -f v1alpha3/pv persistentvolume/katib-mysql created persistentvolumeclaim/katib-mysql created + kubectl apply -f v1alpha3/db deployment.extensions/katib-db created secret/katib-db-secrets created service/katib-db created + kubectl apply -f v1alpha3/ui deployment.extensions/katib-ui created clusterrole.rbac.authorization.k8s.io/katib-ui created serviceaccount/katib-ui created clusterrolebinding.rbac.authorization.k8s.io/katib-ui created service/katib-ui created + kubectl apply -f tf-job customresourcedefinition.apiextensions.k8s.io/tfjobs.kubeflow.org created serviceaccount/tf-job-dashboard created serviceaccount/tf-job-operator created clusterrole.rbac.authorization.k8s.io/kubeflow-tfjobs-admin created clusterrole.rbac.authorization.k8s.io/kubeflow-tfjobs-edit created clusterrole.rbac.authorization.k8s.io/kubeflow-tfjobs-view created clusterrole.rbac.authorization.k8s.io/tf-job-operator created clusterrolebinding.rbac.authorization.k8s.io/tf-job-operator created service/tf-job-operator created deployment.apps/tf-job-operator created + kubectl apply -f pytorch customresourcedefinition.apiextensions.k8s.io/pytorchjobs.kubeflow.org created serviceaccount/pytorch-operator created clusterrole.rbac.authorization.k8s.io/kubeflow-pytorchjobs-admin created clusterrole.rbac.authorization.k8s.io/kubeflow-pytorchjobs-edit created clusterrole.rbac.authorization.k8s.io/kubeflow-pytorchjobs-view created clusterrole.rbac.authorization.k8s.io/pytorch-operator created clusterrolebinding.rbac.authorization.k8s.io/pytorch-operator created service/pytorch-operator created deployment.apps/pytorch-operator created ```

Check that the Katib core components are installed and ready.

 kubectl -n kubeflow get pods
Sample Output
NAME                                READY   STATUS    RESTARTS   AGE
katib-controller-7665868558-nfghw   1/1     Running   1          80s
katib-db-594756f779-dxttq           1/1     Running   0          81s
katib-manager-769b7bcbfb-7vvgx      1/1     Running   0          81s
katib-ui-854969c97-tl4wg            1/1     Running   0          79s
pytorch-operator-794899d49b-ww59g   1/1     Running   0          79s
tf-job-operator-7b589f5f5f-fpr2p    1/1     Running   0          80s

katib-controller, katib-manager, katib-db and katib-ui are the core components of Katib. We have also installed a tf-job-operator and pytorch-operator to be able to run TensorFlow Jobs and PyTorch Jobs.

You can access Katib UI here. If you are running on a non-Vagrant Kubernetes Cluster, you may need to use the Node IP for your VM or change the katib-ui service to use a LoadBalancer.