
Boosting Data Processing for a "Big Three" Credit Rating Company
Near real-time synchronization of vast datasets, ensuring quick access to the most current information.
Client
A global credit research and ratings leader that provides in-depth analysis and data to investors and companies worldwide.
Client Need
The client’s system manages vast economic datasets, with over 60 million records in MySQL and millions more added daily. To keep searches fast and efficient, MongoDB ran alongside MySQL, which meant the two databases had to be kept continuously in sync. However, the existing daily sync job was slow, error-prone, and resource-intensive, leaving users with outdated data.
The client needed back-end web development services to provide near real-time synchronization between the databases, ensuring fast and reliable access to up-to-date information.
Solution
Recognizing the complexity of the challenge, the client entrusted our team with significant autonomy to design and implement a solution. Our specialists began with proof-of-concept development and tool evaluations, then built a system tailored to the client’s unique requirements.
We proposed replacing the existing data synchronization mechanism with a real-time Change Data Capture (CDC) process. To monitor MySQL tables for changes, we employed AWS Database Migration Service (AWS DMS). Now, every insertion, update, or deletion in MySQL produces an event that is published to a Kafka topic.
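To illustrate what such a change event might look like to downstream code, here is a minimal sketch that parses a JSON message into a typed event. The envelope layout (a "data" object plus a "metadata" object carrying "record-type", "operation", and "table-name") is an assumption modeled on the JSON format AWS DMS can write to Kafka. The names ChangeEvent and parse_change_event are hypothetical and not taken from the project.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Operations this sketch treats as row-level changes (assumed values).
ROW_OPERATIONS = {"insert", "update", "delete"}


@dataclass(frozen=True)
class ChangeEvent:
    """A single row-level change captured from the source database."""
    table: str
    operation: str            # "insert", "update" or "delete"
    row: Dict[str, Any]       # column name -> value


def parse_change_event(raw: str) -> Optional[ChangeEvent]:
    """Turn one JSON message into a ChangeEvent.

    Returns None for messages that are not row-level changes
    (for example control records), so callers can skip them.
    """
    message = json.loads(raw)
    metadata = message.get("metadata", {})
    if metadata.get("record-type") != "data":
        return None
    operation = metadata.get("operation")
    if operation not in ROW_OPERATIONS:
        return None
    return ChangeEvent(
        table=metadata["table-name"],
        operation=operation,
        row=message.get("data", {}),
    )
```

In this sketch, anything other than an insert, update, or delete is ignored. A real consumer would also need to decide how to treat other record types, such as rows emitted during an initial full load.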
To manage these events, we built a custom back-end consumer service that continuously monitors Kafka topics, processes the incoming changes, and immediately applies them to the corresponding MongoDB documents.
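The core of a consumer like this can be sketched in a few lines. The example below is an illustration under stated assumptions, not the production service: DocumentStore is a hypothetical in-memory stand-in for a MongoDB collection, events are plain (operation, row) pairs rather than Kafka records, and every row is assumed to carry its primary key in an "id" field. Inserts and updates are both treated as upserts, so replaying the same events leaves the store in the same state.

```python
from typing import Any, Dict, Iterable, Tuple

Event = Tuple[str, Dict[str, Any]]  # (operation, row)


class DocumentStore:
    """In-memory stand-in for a document collection keyed by primary key."""

    def __init__(self) -> None:
        self.docs: Dict[Any, Dict[str, Any]] = {}

    def upsert(self, key: Any, doc: Dict[str, Any]) -> None:
        # Replace the whole document; creates it if it does not exist yet.
        self.docs[key] = dict(doc)

    def delete(self, key: Any) -> None:
        # Deleting a missing key is a no-op, which keeps replays safe.
        self.docs.pop(key, None)


def apply_event(store: DocumentStore, operation: str,
                row: Dict[str, Any], key_field: str = "id") -> None:
    """Apply one change event to the target store."""
    key = row[key_field]
    if operation in ("insert", "update"):
        store.upsert(key, row)
    elif operation == "delete":
        store.delete(key)
    else:
        raise ValueError(f"unsupported operation: {operation!r}")


def consume(store: DocumentStore, events: Iterable[Event]) -> int:
    """Apply events in order and return how many were processed."""
    applied = 0
    for operation, row in events:
        apply_event(store, operation, row)
        applied += 1
    return applied
```

Treating inserts as upserts and deletes as no-ops when the document is already gone is one common way to tolerate redelivered messages. The sketch assumes that events for the same key arrive in order.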
To make the system even snappier, we introduced Redis as a caching layer, which speeds up data fetching and improves the user experience on the web portal. To keep things running smoothly, we also built robust error-handling mechanisms that detect and recover from failures automatically, reducing downtime and preserving data consistency.
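Both ideas can be shown in a compact, simplified form. The sketch below is an assumption-laden illustration rather than the actual implementation: CachedReader is a hypothetical in-process cache standing in for Redis, using a cache-aside pattern with a time-to-live and explicit invalidation, and with_retry is a hypothetical helper that retries a failing operation with exponential backoff.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CachedReader:
    """Cache-aside reads: serve from the cache, fall back to the loader."""

    def __init__(self, load: Callable[[Any], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load                  # e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Any, Tuple[Any, float]] = {}  # key -> (value, expiry)

    def get(self, key: Any) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                # fresh cache hit
        value = self._load(key)            # miss or expired: go to the source
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: Any) -> None:
        # Called when a change event touches this key.
        self._entries.pop(key, None)


def with_retry(action: Callable[[], Any], attempts: int = 3,
               base_delay: float = 0.1,
               sleep: Callable[[float], None] = time.sleep) -> Any:
    """Run action, retrying on failure with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise                      # out of attempts: surface the error
            sleep(base_delay * 2 ** (attempt - 1))
```

In a setup like the one described, a consumer could call invalidate for each key it updates, so the next read fetches fresh data. Whether the real system invalidates entries, rewrites them, or relies on expiry alone is not stated here.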
This transformation not only optimized the synchronization process but also reduced resource consumption. By transitioning to real-time updates and eliminating the storage of extensive changelogs on Amazon S3, the system became leaner, faster, and more cost-effective.
Technologies
Spring Boot, Kafka Streams, Spring Data, Kubernetes, Redis, AWS DMS
