Introduction
This case study describes the design of a data stream processing pipeline for a client in the moving and storage industry. The objective was to extract, transform, and load data into data warehouses in real time, replacing inefficient batch-based data jobs that relied heavily on outdated legacy systems. Those systems had been in place for 25 years and suffered from limited speed, poor scalability, and tight coupling between components. By implementing a modern stream processing pipeline, the project aimed to gain robustness, fault tolerance, and distributed processing.