Introduction

A scalable Machine Learning (ML) workflow involves several steps and complex computations. These steps include data preparation and pre-processing, training and evaluating models, deploying those models, and more. While prototyping an ML model may seem simple and straightforward, it ultimately becomes tough to keep track of each and every process.

To streamline the development of ML models, Google has launched the beta version of Cloud AI Platform Pipelines, which helps deploy robust, repeatable ML pipelines along with monitoring, auditing, version tracking, and reproducibility. It promises an enterprise-ready, easy-to-install, and secure execution environment for ML workflows.

Cloud AI Platform

The AI Platform in Google Cloud is a code-based data science development environment that enables ML developers, data engineers, and data scientists to deploy ML models quickly and cost-effectively.

The core tech stack of AI Platform Pipelines supports two SDKs for authoring ML pipelines: the TensorFlow Extended (TFX) Pipeline SDK and the Kubeflow Pipelines (KFP) SDK. The KFP SDK is the lower-level option, allowing direct control of Kubernetes resources and simple sharing of containerised components, while the TFX SDK offers a higher-level abstraction with opinionated components that can still be customised. With AI Platform Pipelines, one can therefore build a pipeline directly with the KFP SDK, or by customising a TFX pipeline template with the TFX SDK. A minimal KFP example is sketched below.
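For illustration, here is a minimal, hedged sketch of a pipeline authored with the KFP SDK (v1 style). The component, parameter names, and output file name are placeholders introduced for this example rather than anything prescribed by AI Platform Pipelines.

```python
# Minimal KFP SDK sketch: a one-step pipeline built from a lightweight
# Python component. Names and logic are illustrative placeholders.
from kfp import dsl, compiler
from kfp.components import create_component_from_func


def train_model(learning_rate: float) -> str:
    """Placeholder training step; a real component would fit and export a model."""
    return "trained with lr={}".format(learning_rate)


# Wrap the function as a containerised pipeline component.
train_op = create_component_from_func(train_model, base_image="python:3.7")


@dsl.pipeline(name="example-pipeline", description="Illustrative single-step pipeline")
def example_pipeline(learning_rate: float = 0.01):
    train_op(learning_rate)


if __name__ == "__main__":
    # Compile to a package that can be uploaded through the Pipelines UI or client.
    compiler.Compiler().compile(example_pipeline, "example_pipeline.yaml")
```

The compiled package can then be uploaded and run from the Pipelines UI, or submitted programmatically with the KFP client.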

There are two main benefits of using AI Platform Pipelines:

  1. Easy Management and Installation: AI Platform Pipelines can be accessed easily via the AI Platform panel in the Google Cloud Console.
  2. Easy Authenticated Access: AI Platform Pipelines offers secure, authenticated access to the Pipelines UI through the Cloud AI Platform UI, without the need to set up port-forwarding, as sketched below.
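As a hedged sketch of what that authenticated access looks like from the SDK side, the KFP client can connect to a running Pipelines instance through its hosted endpoint, shown in the AI Platform panel, without any port-forwarding. The host URL below is a placeholder for a deployment's own endpoint.

```python
import kfp

# Replace with the hosted endpoint shown for your Pipelines instance in the
# AI Platform panel; no kubectl port-forwarding is needed.
client = kfp.Client(host="https://<your-instance>.pipelines.googleusercontent.com")

# A simple authenticated call to confirm the connection works.
print(client.list_experiments())
```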

AI Platform Pipelines Beta

AI Platform Pipelines comprises enterprise features for running ML workloads, including pipeline versioning, automatic metadata tracking of executions and artefacts, visualization tools, Cloud Logging, and more. It offers seamless integration with Google Cloud managed services, such as BigQuery, Dataflow, AI Platform Training and Serving, Cloud Functions, and others.
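As one hedged illustration of that integration, the KFP SDK can load prebuilt components for Google Cloud services and wire them into a pipeline. The component URL and its input names below are assumptions based on the kubeflow/pipelines repository layout at the time and may have changed.

```python
from kfp import components, dsl

# Load a prebuilt BigQuery component from the kubeflow/pipelines repository.
# Assumption: this path matched the repo layout at the time and may have moved.
bigquery_query_op = components.load_component_from_url(
    "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
    "components/gcp/bigquery/query/component.yaml"
)


@dsl.pipeline(name="bq-example", description="Run a BigQuery query as one pipeline step")
def bq_pipeline(project_id: str, query: str = "SELECT 1"):
    # Input names (`query`, `project_id`) are assumed from the component's spec.
    bigquery_query_op(query=query, project_id=project_id)
```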

AI Platform Pipelines comprises two major parts, which are:

  • The enterprise-ready infrastructure for deploying and running structured ML workflows that are integrated with GCP services.
  • The pipeline tools for debugging, building, and sharing components and pipelines.

The beta launch of AI Platform Pipelines includes several new features, such as support for template-based pipeline construction, pipeline versioning, and automatic artefact and lineage tracking.

The following are the key features:

  • Build ML Pipelines with TFX Templates: To make it easier for developers to get started with ML pipeline code, the TFX SDK provides templates, along with step-by-step guidance on building a production ML pipeline for their own data. With this feature, one can incrementally add components to the pipeline and iterate on them.
  • Pipeline Versioning: This feature lets developers upload multiple versions of the same pipeline and group them in the UI, so that semantically related workflows can be managed together; a brief example follows this list.
  • Artefact and Lineage Tracking: AI Platform Pipelines enables automatic artefact and lineage tracking powered by ML Metadata, so the artefacts produced by a pipeline run are tracked automatically. Lineage tracking shows the history and versions of ML models, data, and more.
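Below is a brief, hedged sketch of that versioning workflow using the KFP SDK client calls `upload_pipeline` and `upload_pipeline_version`. The host URL, package file names, and pipeline names are placeholders.

```python
import kfp

# Placeholder host; use the endpoint shown for your Pipelines deployment.
client = kfp.Client(host="https://<your-instance>.pipelines.googleusercontent.com")

# Upload a compiled package as a new pipeline ...
pipeline = client.upload_pipeline(
    pipeline_package_path="example_pipeline.yaml",
    pipeline_name="example-pipeline",
)

# ... and later upload a revised package as a new version of the same pipeline,
# so semantically related workflows stay grouped together in the UI.
client.upload_pipeline_version(
    pipeline_package_path="example_pipeline_v2.yaml",
    pipeline_version_name="v2",
    pipeline_id=pipeline.id,
)
```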

Conclusion

Google states that multi-user isolation will also become part of AI Platform Pipelines in the future. Other upcoming features include workload identity for transparent access to Google Cloud services, UI-based setup of off-cluster storage of backend data, and more.