Introduction to the Kubeflow MNIST Example

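Official Git repository: kubeflow mnist

This example uses a simple MNIST model to show how training and serving work on Kubeflow. Kubeflow uses an Argo workflow to script the whole process, so the entire AI training and serving pipeline is driven by commands. In the training stage, the model training code is packaged into a Docker image and a TFJob is created to train the model. In the serving stage, the trained model is exposed as a prediction service with the help of seldon-core.

Training flow

- Download the training code from git and package it into a training image.
- Push the packaged image to the Docker registry.
- Run the image with a TFJob to produce the model.
- Export the model to a volume.

Serving flow

- Download the serving code from git and package it into a serving image.
- Push the packaged image to the Docker registry.
- Mount the newly trained model as a volume on the seldon-core image.
- Start the prediction mechanism by defining a Seldon graph.

Testing

You can use the test tool that ships with seldon-core to check that prediction serving is running correctly:

    $ seldon-core-api-tester --ambassador-path /seldon/kubeflow/mnist-classifier -p contract.json <host ip> <host port>

Debug

Error message:

    error when creating "/tmp/manifest.yaml": tfjobs.kubeflow.org is forbidden: User "system:serviceaccount:kubeflow:default" cannot create tfjobs.kubeflow.org in the namespace "kubeflow"

Fix: grant the cluster-admin role to the default service account in the kubeflow namespace:

    kubectl create clusterrolebinding sa-admin --clusterrole=cluster-admin --serviceaccount=kubeflow:default

Error message:

    Failed to submit workflow: workflows.argoproj.io is forbidden: User "system:serviceaccount:kubeflow:jupyter-notebook" cannot create workflows.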
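Both error messages above are RBAC denials: a service account is not allowed to create a given resource. One way to confirm which permission is missing, or to verify a role binding after creating it, is `kubectl auth can-i` with service account impersonation. The sketch below is only an illustration: the helper names `can_i_command` and `can_i` are hypothetical, it assumes `kubectl` is on the PATH and that the current user is allowed to impersonate service accounts, and it assumes the service account lives in the same namespace it is trying to create resources in (as in both errors above).

```python
import subprocess
from typing import List


def can_i_command(verb: str, resource: str, namespace: str,
                  service_account: str) -> List[str]:
    """Build a `kubectl auth can-i` command that impersonates a service account.

    Assumption: the service account lives in `namespace`, the same namespace
    in which the action is checked.
    """
    return [
        "kubectl", "auth", "can-i", verb, resource,
        "--namespace", namespace,
        "--as", f"system:serviceaccount:{namespace}:{service_account}",
    ]


def can_i(verb: str, resource: str, namespace: str,
          service_account: str) -> bool:
    """Run the check; `kubectl auth can-i` prints "yes" or "no" on stdout."""
    cmd = can_i_command(verb, resource, namespace, service_account)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip() == "yes"


# Example calls against a live cluster (not executed here):
#   can_i("create", "tfjobs.kubeflow.org", "kubeflow", "default")
#   can_i("create", "workflows.argoproj.io", "kubeflow", "jupyter-notebook")
```

If a check returns False before a binding is created and True afterwards, the binding took effect for that service account.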