Chief Software Architect | Einride
Scalable forecasting in Google Cloud
A practical, data-engineering-oriented talk on how to stand up a scalable, fully automated pipeline for time series forecasting using Facebook’s Prophet library and the standard big data and machine learning tools in Google Cloud.
This will be done in the context of Einride’s data platform, which is all about creating actionable insights that drive customers toward sustainable transport. As a real-world case-study, I will show how we break down and understand transport demand in multiple dimensions – from a customer’s total demand down to thousands of individual sites and shipping lanes.
I will walk you through the full pipeline: extracting a production database dump, applying multiple tiers of data cleaning and transformation, using PySpark and Dataproc to parallelize model training and forecast generation, and orchestrating it all with Apache Airflow.
The key takeaway (and what’s really exciting) is how accessible and easy to use big data and machine learning tools have become – for tech giants and fledgling startups alike!
In 2018, Oscar left a cushy backend and data engineering job at Spotify to seek the thrill of a New Game+ experience in Gothenburg’s startup scene. Cue Einride, a crack team of technologists out to disrupt an outdated industry with sustainable transport solutions. Oscar has recently been building up Einride’s data platform capabilities and will share practical advice on building scalable data pipelines from scratch in Google Cloud.