Kubeflow 0.2 Offers New Components and Simplified Setup

Aug 06, 2018

Since Last We Met

It has been 6 months since Google announced Kubeflow at KubeCon Austin, and just 3 months since the Kubeflow community delivered 0.1 at KubeCon EU. We have been both humbled and delighted by the amazing progress since then. The community has grown even faster than we could have imagined, with nearly 100 contributors from 20 different organizations and more than 4,000 GitHub stars. But what really shocked us was how many supporting projects are collaborating with the Kubeflow community to extend and expand what Kubeflow can do. A brief summary includes:

To everyone who has contributed so far, we would like to offer a huge thank you! We are getting ever closer to realizing our vision: letting data scientists and software engineers focus on the things they do well by giving them an easy-to-use, portable and scalable ML stack.

Introducing Kubeflow 0.2

We’re proud to announce the availability of Kubeflow 0.2. The new release offers significant performance upgrades, a simplified getting started experience, and alpha support for advanced features. Let’s walk through some of the highlights:

Improved Getting Started Experience

We know that many data scientists and ML engineers are new to Kubernetes and Kubernetes concepts. We’ve made a number of investments to minimize the amount of work they have to do to get started on the platform. These include:

  • A new deployment script that makes getting started on an existing cluster a single command. To get a default deployment running on any Kubernetes cluster, just execute:

    export KUBEFLOW_VERSION=0.2.2
    curl https://raw.githubusercontent.com/kubeflow/kubeflow/v${KUBEFLOW_VERSION}/scripts/deploy.sh | bash
    
  • We also offer cloud-specific versions of these deployment scripts so that you can automatically create a cluster if you don’t have one available. For example:

    export KUBEFLOW_VERSION=0.2.2
    curl https://raw.githubusercontent.com/kubeflow/kubeflow/v${KUBEFLOW_VERSION}/scripts/gke/deploy.sh | bash
    
  • A central UI that gives you visibility into all the components running in your cluster and makes it easier to navigate among them

    (Screenshot: Central UI Dashboard)

  • A declarative deployment process that first creates cloud resources, including your Kubernetes cluster, and then deploys Kubeflow. We have an example for GCP using Deployment Manager, and plan to add support for other clouds using tools like Terraform.
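
If you would rather review a deployment script before running it, the one-line commands above can be split into a download step and a run step. The sketch below is only an illustration: the helper names `kubeflow_script_url` and `fetch_kubeflow_script` are hypothetical and not part of Kubeflow, and the URL layout is simply the one shown in the commands above, with release tags written as `v` plus the version number.

```shell
# Hypothetical helper (not part of Kubeflow): print the raw GitHub URL
# for a script at a given release. $1 is the version, e.g. 0.2.2;
# $2 is the path inside the repository (defaults to scripts/deploy.sh).
kubeflow_script_url() {
  printf 'https://raw.githubusercontent.com/kubeflow/kubeflow/v%s/%s\n' \
    "$1" "${2:-scripts/deploy.sh}"
}

# Hypothetical helper: save the script to a local file ($3) instead of
# piping it straight into bash. curl -f fails on HTTP errors rather
# than saving an error page.
fetch_kubeflow_script() {
  curl -fsSL "$(kubeflow_script_url "$1" "$2")" -o "$3"
}

# Example usage (needs network access and a configured cluster):
#   fetch_kubeflow_script 0.2.2 scripts/deploy.sh deploy.sh
#   less deploy.sh     # read what the script will do
#   bash deploy.sh     # then run it
```

Saving the script first also leaves a local copy of exactly what was executed, which can help when debugging a deployment later.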

New and Improved Components Available

Kubeflow is fundamentally about extending the project with new components and making the existing components more feature-rich. Some examples of the improvements we made in 0.2 include:

  • Added the TFX components TensorFlow Model Analysis (TFMA) and TensorFlow Transform (TFT)
  • Several TFJob improvements:
    • An event-driven implementation that makes jobs run even faster
    • Preservation of logs after a job finishes
    • A master/chief is no longer required, which expands the number of TF programs that just run
    • Simplified ksonnet prototypes for TFJob, which make advanced customization easier
  • Alpha support for advanced tooling including:
    • Katib for Hyperparameter search
    • PyTorch operator
    • MPI operator
    • Horovod / MPI integration

Leveraging Kubernetes for deeper platform integrations

One of our goals with Kubeflow is to enable platform extensions so that users can customize their deployments based on their needs. These include:

  • Simplified networking integration, including auto-provisioning of endpoints and Identity-Aware Proxy setup
  • Using Kubernetes Persistent Volumes to simplify persistent storage (both locally and in the cloud), for example Persistent Volume Claims (PVCs) for Jupyter notebooks
  • Kubernetes-native monitoring that works with both Prometheus and cloud-based monitoring (such as Stackdriver)

And many more. For a more comprehensive list, please see the issues closed in 0.2.0 on GitHub.
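
To give a rough idea of what the Persistent Volume item above involves, here is a generic Kubernetes PersistentVolumeClaim of the kind a notebook could mount for its working directory. This is a sketch under stated assumptions, not the manifest Kubeflow itself generates: the claim name `jupyter-workspace`, the 10Gi size, the access mode, and the file name `jupyter-pvc.yaml` are all made up for illustration.

```shell
# Write a generic PersistentVolumeClaim manifest to a local file.
# The name, size, and access mode below are illustrative assumptions.
cat > jupyter-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyter-workspace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF

# To create the claim on a cluster you would then run:
#   kubectl apply -f jupyter-pvc.yaml
```

Because the claim only states how much storage is needed and how it is accessed, the same manifest can be satisfied by local disks or by a cloud provider's volumes, depending on how the cluster is configured.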

Learning More

If you’d like to try out Kubeflow, we have a number of options for you:

What’s Next

Our next major release will be 0.3 coming this fall. In it, we expect to land the following new features:

  • Hyperparameter tuning jobs can be submitted without writing any code
  • Job operators - Consistent APIs across supported frameworks
  • Getting Started - Click to Deploy

Come Help!

As always, we’re listening! Please tell us the feature (or features) you’d really like to see that aren’t there yet. Some options for making your voice heard include:

Thank you for all your support so far!

Jeremy Lewi & David Aronchick, Google