FMCG

Migrating data pipelines and database structures from Cloudera to GCP services for a global leader in the consumer packaged goods industry

Client

Global FMCG / CPG Company

Date

Services

Data Migration

Technologies

Google Cloud products: Dataproc, BigQuery, Google Kubernetes Engine

Challenge

Multiple data sources held a variety of semi-structured data types and suffered from data quality problems. The goal was to make linear TV campaign planning and purchasing more cost-efficient by building pipelines on Kubeflow. The new pipelines were intended to improve overall system performance, make data transformations more reliable, and optimize the Python-based advertising procedures.

Our approach

We migrated the existing data pipelines to Dataproc, GCS, and Composer. To improve scalability, we containerized the Python ad optimization code so that it runs as scalable jobs on Kubeflow hosted in GKE. Kubeflow Pipelines and GKE node pools let us manage job resources according to the hardware needs of each scenario, which improves resource utilization and matches each workload to suitable hardware.
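As a rough illustration of matching workloads to node pools, the sketch below maps a scenario to a node pool and resource requests and builds the corresponding Kubernetes pod spec fragment. It is a minimal sketch, not the client's configuration: the scenario names, node pool names, resource sizes, and image name are all hypothetical. The only fixed detail is the `cloud.google.com/gke-nodepool` label, which GKE applies to the nodes of each pool.

```python
from dataclasses import dataclass

# GKE labels every node with the name of its node pool under this key.
GKE_NODE_POOL_LABEL = "cloud.google.com/gke-nodepool"


@dataclass(frozen=True)
class ResourceProfile:
    node_pool: str  # hypothetical node pool name
    cpu: str        # Kubernetes CPU quantity, e.g. "4"
    memory: str     # Kubernetes memory quantity, e.g. "16Gi"


# Hypothetical scenarios and sizes, chosen only for illustration.
PROFILES = {
    "small": ResourceProfile("pool-standard", "2", "8Gi"),
    "large": ResourceProfile("pool-highmem", "16", "128Gi"),
}


def pod_spec_for(scenario: str, image: str) -> dict:
    """Build a pod spec fragment that pins a job to the node pool for its scenario."""
    profile = PROFILES[scenario]
    resources = {"cpu": profile.cpu, "memory": profile.memory}
    return {
        # Schedule the pod only on nodes that belong to the chosen pool.
        "nodeSelector": {GKE_NODE_POOL_LABEL: profile.node_pool},
        "containers": [
            {
                "name": "ad-optimizer",  # hypothetical container name
                "image": image,
                "resources": {
                    "requests": dict(resources),
                    "limits": dict(resources),
                },
            }
        ],
    }


spec = pod_spec_for("large", "example-registry/ad-optimizer:1.0.0")
print(spec["nodeSelector"])
```

In a Kubeflow pipeline, the same idea would typically be expressed by attaching a node selector and resource requests to each pipeline step, so the mapping table stays the single place where scenarios are sized.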

The outcome

The Cloudera data pipelines were successfully migrated to GCP. The new data pipelines are more cost-effective and easier to maintain. The BigQuery cache keeps response times fast. With GKE, Kubeflow, and Docker images, jobs can run on different code versions and hardware resources. Cloud Functions streamline the process of starting optimization jobs.
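To illustrate how a trigger can select a code version for a job, the sketch below validates a request payload and turns it into the arguments for one optimization run. It is an assumption-laden example rather than the deployed function: the payload fields (`scenario`, `image_tag`), the allowed scenarios, the version tag format, and the registry path are all hypothetical. The function is kept free of any cloud SDK so that it could be called from a Cloud Function entry point.

```python
import json
import re

# Hypothetical allow-list of scenarios; the real set is not given in this case study.
ALLOWED_SCENARIOS = {"small", "large"}
# Accept simple version tags such as "1.4.2"; the tagging scheme is an assumption.
IMAGE_TAG_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def build_run_request(raw_body: str) -> dict:
    """Validate a trigger payload and return the arguments for one optimization run."""
    payload = json.loads(raw_body)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")

    scenario = payload.get("scenario")
    image_tag = payload.get("image_tag")
    if not isinstance(scenario, str) or scenario not in ALLOWED_SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    if not isinstance(image_tag, str) or not IMAGE_TAG_PATTERN.fullmatch(image_tag):
        raise ValueError(f"invalid image tag: {image_tag!r}")

    return {
        # The Docker image tag selects which code version the job runs.
        "image": f"example-registry/ad-optimizer:{image_tag}",
        "scenario": scenario,
    }


request = build_run_request('{"scenario": "large", "image_tag": "1.4.2"}')
print(request)
```

Rejecting unknown scenarios and malformed tags at the trigger keeps invalid requests from reaching the cluster, where a failed job would be more expensive to diagnose.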