In my tenure at TELUS, I was assigned a prominent project requiring substantial technical expertise: the development of a telemetry analytics platform that could analyze data in real-time from over 100,000 set-top boxes (STBs) deployed throughout Canada. The objective was not just about scale; it aimed to assist teams to make quicker operational decisions and enhance the experience for millions of customers. Initially, I recognized the outdated data infrastructure as a bottleneck, obstructing the data from reaching the teams who required it the most. This article portrays the methodologies we employed to modernize our infrastructure using Google Cloud Platform (GCP), Apache Airflow, and Infrastructure-as-Code tools to surmount the obstacles and deliver a future-proof solution.
The Predicament: Ancient Bottlenecks and Unseen Black Spots
Prior to this revamp, we predominantly relied on segregated and batch-oriented data pipelines incapable of supporting real-time diagnostics. Key concerns encompassed: