Odds Importer — Building a Real-Time Odds and Data Ingestion Platform

Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Backend Framework | Ruby on Rails | Data processing and API development |
| Job Processing | Resque | Background job queue and worker management |
| Message Queue | NServiceBus → SQS → Kafka | Async message processing and streaming |
| Search Engine | Elasticsearch | Data resolution and fuzzy matching |
| Database | PostgreSQL | Primary data storage and querying |
| Cache Layer | Redis | High-speed data caching and sessions |
| Cloud Platform | AWS (EC2, RDS, SQS, S3) | Infrastructure and managed services |
| Container Platform | Docker + ECS | Application containerization and orchestration |
| Load Balancing | HAProxy + ELB/ALB | Traffic distribution and high availability |
| Service Discovery | Consul | Microservice registration and discovery |
| Monitoring | Grafana + InfluxDB | Real-time dashboards and metrics visualization |
| Metrics Collection | Telegraf | System and application metrics gathering |
| Data Processing | Custom ETL Pipeline | Real-time data transformation |
| API Framework | RESTful Rails APIs | External data integrations |
| Configuration | JSON Map Files | Dynamic feed parsing without code changes |
| Legacy Systems | C# Scripts | Initial hard-coded importers (replaced) |
Table of Contents
- From Hard-Coded Scripts to Configurable Transforms
- The NServiceBus Era: Lessons in Pain
- Rebuilding in Rails and AWS
- Public API Evolution and Decoupling
- System Performance and Scale
- Legacy and Impact
- Technical Innovations
When I first joined the company, the odds importer was a mess of hard-coded C# scripts. Each sportsbook had its own parsing code, and every time a feed changed, someone had to dive into SQL to fix broken mappings like "NE" → "New England Patriots." It was fragile, manual, and slow. Adding a new book could take weeks.
My goal was simple but ambitious: make data ingestion modular, configurable, and maintainable — without developer intervention.
💡 Vision: Transform weeks of custom development into hours of configuration by creating a universal data normalization system.
From Hard-Coded Scripts to Configurable Transforms
To solve the chaos, I started by studying every feed format the company had. I literally printed them out — XML, JSON, even CSVs — spread them across my desk, and started highlighting patterns. That's when I realized that even though every sportsbook used a different format, they were all describing the same types of entities: leagues, teams, events, odds, statistics.
The Map File Architecture
Instead of writing a separate parser for each feed, I built a system where each entity type had its own "map file." Each map file defined the structure of that entity. An event's map file, for example, defined fields like start time, participants, league, and venue.
For each specific feed or provider, we created a configuration file that told the system how to map that provider's fields to our standardized schema. The two files — the map file (entity definition) and the configuration file (feed definition) — worked together to generate an XML transformation that normalized the feed into our internal format.
All data, regardless of input type (XML, JSON, or CSV), went through a preprocessing step that converted it into XML, then ran through the generated transformation. The output was a consistent, canonical data structure that could flow directly into our database.
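A minimal sketch of the map-file idea, with hypothetical file contents and field names (the real system generated XML transformations; here a plain hash mapping stands in to show how the two files combine):

```ruby
require "json"

# Hypothetical map file: the canonical fields for an "event" entity.
EVENT_MAP = JSON.parse(<<~JSON)
  { "entity": "event",
    "fields": ["start_time", "home_team", "away_team", "league"] }
JSON

# Hypothetical provider config: how one feed's field names map to ours.
PROVIDER_CONFIG = {
  "start_time" => "GameDate",
  "home_team"  => "HomeAbbr",
  "away_team"  => "AwayAbbr",
  "league"     => "LeagueCode"
}

# Normalize one raw feed record into the canonical schema by walking
# the map file's field list and looking up each provider field.
def normalize(raw, map_file, config)
  map_file["fields"].each_with_object({}) do |field, out|
    out[field] = raw[config[field]]
  end
end

raw = { "GameDate"   => "2017-02-05T23:30:00Z",
        "HomeAbbr"   => "NE", "AwayAbbr" => "ATL",
        "LeagueCode" => "NFL" }

normalize(raw, EVENT_MAP, PROVIDER_CONFIG)
# => {"start_time"=>"2017-02-05T23:30:00Z", "home_team"=>"NE",
#     "away_team"=>"ATL", "league"=>"NFL"}
```

Adding a new sportsbook then means writing a new `PROVIDER_CONFIG`, not new parsing code.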
I also built an admin interface so that unmapped or ambiguous items — like a new team alias — could be resolved by non-developers through a UI. That one system took something that used to take weeks of developer effort and reduced it to hours of configuration.
The NServiceBus Era: Lessons in Pain
My next step was to make the system distributed. Back then, Docker and serverless didn't exist, so I implemented NServiceBus — a message queue–based service system running as Windows services. Each service handled a stage in the ingestion process and passed messages via queues.
It worked — until it didn't. Under heavy load, queues would pile up. I'd wake up at 4 a.m. to fix stalled jobs, not even sure which service was hung. We had zero observability — no metrics, no dashboards, nothing.
That period taught me one of the most important lessons in engineering:
"If you can't see it, you can't scale it."
Rebuilding in Rails and AWS
Around 2013, I rebuilt everything from scratch, this time in Ruby on Rails running on AWS. We replaced the brittle Windows services with a cloud-based, horizontally scalable system and introduced InfluxDB and Grafana for full observability.
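To make the observability piece concrete: Telegraf and InfluxDB both speak InfluxDB line protocol, so an ingestion worker only needs to format a metric point in that wire format. A minimal sketch, with a hypothetical measurement name and feed tag:

```ruby
# Format one counter in InfluxDB line protocol:
#   measurement,tag=value field=valuei timestamp
# The "i" suffix marks an integer field. Names here are illustrative.
def line_protocol(measurement, tags, fields, timestamp_ns)
  tag_str   = tags.map   { |k, v| "#{k}=#{v}" }.join(",")
  field_str = fields.map { |k, v| "#{k}=#{v}i" }.join(",")
  "#{measurement},#{tag_str} #{field_str} #{timestamp_ns}"
end

line_protocol("odds_ingested", { feed: "example_book" }, { count: 120 },
              1_500_000_000_000_000_000)
# => "odds_ingested,feed=example_book count=120i 1500000000000000000"
```

Once every worker emitted points like this, a Grafana dashboard could show exactly which stage of ingestion was falling behind, which is what ended the 4 a.m. guesswork.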
Intelligent Data Resolution
For the data normalization challenge, I implemented an Elasticsearch-based resolution system. It used a directed graph of resolution steps — first checking the league, then the teams, then event time — with ordered edges and conditional logic.
The system took everything it knew (league, date, potential home/away aliases) and ran searches to find the most likely match. It even handled edge cases like MLB double-headers by narrowing time windows.
When the graph couldn't resolve a record automatically, it sent it back to the admin queue for human verification.
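The resolution flow above can be sketched as an ordered list of narrowing steps, with the admin queue as the fallback. The step order and the one-hour window here are illustrative, not the production values:

```ruby
# Ordered resolution steps: each lambda narrows the candidate set,
# mirroring the league -> teams -> event-time order described above.
STEPS = [
  ->(cands, rec) { cands.select { |c| c[:league] == rec[:league] } },
  ->(cands, rec) { cands.select { |c| rec[:aliases].include?(c[:home]) } },
  # Narrow the time window to disambiguate e.g. MLB double-headers.
  ->(cands, rec) { cands.select { |c| (c[:start] - rec[:start]).abs < 3600 } }
]

# Run every step; anything still ambiguous goes to human review.
def resolve(record, candidates)
  result = STEPS.reduce(candidates) { |cands, step| step.call(cands, record) }
  result.size == 1 ? result.first : :admin_queue
end

candidates = [
  { league: "MLB", home: "NYY", start: 1_000 },   # game one
  { league: "MLB", home: "NYY", start: 20_000 },  # game two, same day
  { league: "NFL", home: "NE",  start: 1_000 }
]
record = { league: "MLB", aliases: ["NYY", "Yankees"], start: 1_200 }
resolve(record, candidates)  # matches game one by time window
```

In the real system these edges also carried conditional logic (skip a step when a field is missing, for instance), but the shape is the same: a directed path from raw record to either a confident match or a human.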
That pairing of observability with intelligent resolution was a game changer. We went from manual firefighting to reliable automation, and the platform began scaling to millions of odds updates per day.
Public API Evolution and Decoupling
From the start, I believed the ingestion database should be completely isolated from public traffic. No reads, no shared load, no coupling. So I built a separate, real-time export layer that synced normalized data to a public API datastore.
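The decoupling pattern is one-directional: ingestion only ever pushes normalized records outward, and public traffic only ever reads its own datastore. A minimal sketch, using an in-memory `Queue` as a stand-in for the real job queue (Resque in the stack above):

```ruby
# Stand-in for the real export queue; names here are illustrative.
EXPORT_QUEUE = Queue.new

# Ingestion side: push a frozen copy outward. The ingestion DB is never
# read by public traffic, so its write path stays unaffected by load.
def publish(record)
  EXPORT_QUEUE << record.dup.freeze
end

# Public side: drain queued records into the API's own datastore.
def drain_to_public_store(store)
  store << EXPORT_QUEUE.pop until EXPORT_QUEUE.empty?
  store
end

publish({ event_id: 42, odds: -110 })
public_store = drain_to_public_store([])
# public_store now holds the synced record; ingestion never saw a read.
```

The same one-way flow held across every generation of the API; only the transport (direct sync, Cassandra writes, Kafka topics) changed underneath it.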
That API evolved through three generations:
1. C# API (early version)
The first consolidated endpoint layer, still coupled to old schemas.
2. Python API with Cassandra backend
I switched to Python and Cassandra to handle massive read volumes without ever touching the ingestion DB. Data synced in real time, providing fast, high-availability reads.
3. Microservices + Postgres (containerization era)
Once Docker arrived, I rebuilt the stack again into a fully containerized Node.js + GraphQL + Kafka architecture. Each microservice handled a single domain — teams, leagues, events, odds, markets — all reporting metrics to Grafana.
System Performance and Scale
By the final generation, the entire system was observable, distributed, and resilient. Even on Super Bowl Sunday, the ingestion layer ran at full speed while millions of users hit the APIs without any degradation.
🏆 Battle-Tested: The system proved its resilience during Super Bowl Sunday — the highest traffic event of the year — with zero downtime and full performance.
Legacy and Impact
By 2017, the Odds Importer was mature, stable, and scalable — and it's still running today, serving as the data backbone for OddsTrader, SportsbookReview, and BookmakersReview. It processes tens of millions of updates per day, including odds, scores, player stats, and live event data.
Key Metrics and Achievements
- 99.99% uptime during peak events like Super Bowl Sunday
- Tens of millions of updates per day processed reliably
- Hours instead of weeks to add new sportsbook feeds
- Zero 4 a.m. wake-up calls after implementing observability
- Complete decoupling of ingestion and public API layers
Technical Innovations
This project introduced several key innovations that shaped how I approach engineering leadership:
- Configuration-Driven Architecture: Replaced hard-coded parsers with configurable map files and transformations
- Intelligent Data Resolution: Built an Elasticsearch-based graph resolution system for ambiguous data matching
- Observability-First Design: Implemented comprehensive metrics and monitoring before scaling
- Progressive Decoupling: Evolved from monolithic to microservices architecture while maintaining system stability
- Real-time Sync Patterns: Developed reliable data synchronization between ingestion and public API layers
More than anything, the project taught me that real scalability isn't just about distributed systems; it's about visibility, resilience, and clarity. When I started, I was waking up at 4 a.m. to fix invisible queues. Now, the system runs itself with full transparency, metrics, and trust.
⚠️ Key Lesson: You can't scale what you can't see. Observability isn't optional — it's the foundation of reliable systems.
It remains the most defining technical project of my career — the moment I truly learned how to turn chaos into systems.
Brian Wight
Technical leader and entrepreneur focused on building scalable systems and high-performing teams. Passionate about ownership culture, data-driven decision making, and turning complex problems into simple solutions.