Odds Importer — Building a Real-Time Odds and Data Ingestion Platform

Technology Stack

Component          | Technology               | Purpose
Backend Framework  | Ruby on Rails            | Data processing and API development
Job Processing     | Resque                   | Background job queue and worker management
Message Queue      | NServiceBus → SQS → Kafka | Async message processing and streaming
Search Engine      | Elasticsearch            | Data resolution and fuzzy matching
Database           | PostgreSQL               | Primary data storage and querying
Cache Layer        | Redis                    | High-speed data caching and sessions
Cloud Platform     | AWS (EC2, RDS, SQS, S3)  | Infrastructure and managed services
Container Platform | Docker + ECS             | Application containerization and orchestration
Load Balancing     | HAProxy + ELB/ALB        | Traffic distribution and high availability
Service Discovery  | Consul                   | Microservice registration and discovery
Monitoring         | Grafana + InfluxDB       | Real-time dashboards and metrics visualization
Metrics Collection | Telegraf                 | System and application metrics gathering
Data Processing    | Custom ETL Pipeline      | Real-time data transformation
API Framework      | RESTful Rails APIs       | External data integrations
Configuration      | JSON Map Files           | Dynamic feed parsing without code changes
Legacy Systems     | C# Scripts               | Initial hard-coded importers (replaced)

Table of Contents

  1. From Hard-Coded Scripts to Configurable Transforms
  2. The NServiceBus Era: Lessons in Pain
  3. Rebuilding in Rails and AWS
  4. Public API Evolution and Decoupling
  5. System Performance and Scale
  6. Legacy and Impact
  7. Technical Innovations

When I first joined the company, the odds importer was a mess of hard-coded C# scripts. Each sportsbook had its own parsing code, and every time a feed changed, someone had to dive into SQL to fix broken mappings like "NE" → "New England Patriots." It was fragile, manual, and slow. Adding a new book could take weeks.

My goal was simple but ambitious: make data ingestion modular, configurable, and maintainable — without developer intervention.

💡 Vision: Transform weeks of custom development into hours of configuration by creating a universal data normalization system.

From Hard-Coded Scripts to Configurable Transforms

To solve the chaos, I started by studying every feed format the company had. I literally printed them out — XML, JSON, even CSVs — spread them across my desk, and started highlighting patterns. That's when I realized that even though every sportsbook used a different format, they were all describing the same types of entities: leagues, teams, events, odds, statistics.

The Map File Architecture

Instead of writing a separate parser for each feed, I built a system where each entity type had its own "map file." Each map file defined the structure of that entity — for example, what an event looked like: fields like start time, participants, league, venue, etc.
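
To make that concrete, here's a minimal sketch of what an entity map file might have looked like. The field names, JSON layout, and loader below are illustrative assumptions, not the production schema:

```ruby
require "json"

# Hypothetical map file for the "event" entity: it declares the canonical
# fields every normalized event must have, their types, and which are required.
EVENT_MAP = JSON.parse(<<~JSON)
  {
    "entity": "event",
    "fields": {
      "league":     { "type": "string",   "required": true  },
      "start_time": { "type": "datetime", "required": true  },
      "home_team":  { "type": "string",   "required": true  },
      "away_team":  { "type": "string",   "required": true  },
      "venue":      { "type": "string",   "required": false }
    }
  }
JSON

# Quick sanity check a loader might run before accepting a normalized record.
def valid_event?(record, map = EVENT_MAP)
  map["fields"].all? do |name, rules|
    !rules["required"] || !record[name].to_s.empty?
  end
end

puts valid_event?("league" => "NFL", "start_time" => "2016-09-11T17:00:00Z",
                  "home_team" => "New England Patriots", "away_team" => "Miami Dolphins")
# => true
```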

For each specific feed or provider, we created a configuration file that told the system how to map that provider's fields to our standardized schema. The two files — the map file (entity definition) and the configuration file (feed definition) — worked together to generate an XML transformation that normalized the feed into our internal format.

All data, regardless of input type (XML, JSON, or CSV), went through a preprocessing step that converted it into XML, then ran through the generated transformation. The output was a consistent, canonical data structure that could flow directly into our database.
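
As a rough sketch of that pipeline, assuming Nokogiri for the XML step: the real system generated an XML transformation from the map and configuration files, which I approximate here by applying a hypothetical feed configuration directly with XPath.

```ruby
require "json"
require "nokogiri"

# Hypothetical provider configuration: provider field paths -> canonical fields.
FEED_CONFIG = {
  "league"     => "fixture/competition",
  "start_time" => "fixture/kickoff",
  "home_team"  => "fixture/home",
  "away_team"  => "fixture/away"
}

# Step 1: preprocess any input (JSON shown here) into a generic XML document.
def json_to_xml(raw)
  data = JSON.parse(raw)
  Nokogiri::XML::Builder.new do |xml|
    xml.feed do
      data.each { |k, v| build_node(xml, k, v) }
    end
  end.doc
end

def build_node(xml, key, value)
  if value.is_a?(Hash)
    xml.send(key) { value.each { |k, v| build_node(xml, k, v) } }
  else
    xml.send(key, value.to_s)
  end
end

# Step 2: apply the feed configuration to pull canonical fields out of the XML.
def normalize(xml_doc, config = FEED_CONFIG)
  config.each_with_object({}) do |(canonical, path), out|
    out[canonical] = xml_doc.at_xpath("/feed/#{path}")&.text
  end
end

raw = '{"fixture":{"competition":"NFL","kickoff":"2016-09-11T17:00:00Z","home":"NE","away":"MIA"}}'
p normalize(json_to_xml(raw))
# => {"league"=>"NFL", "start_time"=>"2016-09-11T17:00:00Z", "home_team"=>"NE", "away_team"=>"MIA"}
```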

I also built an admin interface so that unmapped or ambiguous items — like a new team alias — could be resolved by non-developers through a UI. That one system reduced what used to take weeks of developer effort to hours of configuration.
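
The shape of that review flow, in simplified form (plain Ruby standing in for what was actually a Rails admin UI backed by the database; names are illustrative):

```ruby
# Minimal sketch of the alias-review flow: anything the importer can't map
# automatically is parked for a human, and resolving it teaches the system.
class AliasReviewQueue
  def initialize(known_aliases)
    @aliases = known_aliases          # e.g. {"NE" => "New England Patriots"}
    @pending = []                     # unresolved aliases awaiting a human
  end

  def resolve(raw_name)
    return @aliases[raw_name] if @aliases.key?(raw_name)
    @pending << raw_name unless @pending.include?(raw_name)
    nil                               # caller holds the record until it's mapped
  end

  # Called from the admin UI once a non-developer picks the right team.
  def approve(raw_name, canonical_name)
    @aliases[raw_name] = canonical_name
    @pending.delete(raw_name)
  end

  attr_reader :pending
end

queue = AliasReviewQueue.new("NE" => "New England Patriots")
queue.resolve("N.Y. Jets")            # => nil, now waiting in the admin queue
queue.approve("N.Y. Jets", "New York Jets")
queue.resolve("N.Y. Jets")            # => "New York Jets"
```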

The NServiceBus Era: Lessons in Pain

My next step was to make the system distributed. Back then, Docker and serverless didn't exist, so I built the pipeline on NServiceBus, with each stage running as a Windows service. Each service handled one step in the ingestion process and passed messages to the next via queues.

It worked — until it didn't. Under heavy load, queues would pile up. I'd wake up at 4 a.m. to fix stalled jobs, not even sure which service was hung. We had zero observability — no metrics, no dashboards, nothing.

That period taught me one of the most important lessons in engineering:

"If you can't see it, you can't scale it."

Rebuilding in Rails and AWS

Around 2013, I rebuilt everything from scratch, this time in Ruby on Rails running on AWS. We replaced the brittle Windows services with a cloud-based, horizontally scalable system and introduced InfluxDB and Grafana for full observability.
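
A hedged sketch of what one of those background workers could look like. The job name, payload, and StatsD-over-UDP metrics (picked up by Telegraf and forwarded to InfluxDB) are illustrative assumptions, not the production code:

```ruby
require "resque"
require "socket"

# Tiny StatsD-style emitter; Telegraf's statsd input can forward these
# counters and timers into InfluxDB for the Grafana dashboards.
module Metrics
  SOCKET = UDPSocket.new

  def self.increment(name)
    SOCKET.send("#{name}:1|c", 0, "127.0.0.1", 8125)
  end

  def self.timing(name, ms)
    SOCKET.send("#{name}:#{ms.round}|ms", 0, "127.0.0.1", 8125)
  end
end

# A Resque background job: one unit of ingestion work pulled off the queue,
# drained by workers started with `QUEUE=odds_import rake resque:work`.
class ImportOddsJob
  @queue = :odds_import

  def self.perform(provider, payload)
    started = Time.now
    # ... parse, normalize, and persist the payload here ...
    Metrics.increment("odds_importer.records.#{provider}")
  rescue StandardError => e
    Metrics.increment("odds_importer.errors.#{provider}")
    raise e                           # let Resque record the failure for inspection
  ensure
    Metrics.timing("odds_importer.duration", (Time.now - started) * 1000)
  end
end

# Producers simply enqueue work; horizontally scaled workers do the rest.
Resque.enqueue(ImportOddsJob, "some_provider", '{"event_id": 123, "moneyline": -110}')
```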

Intelligent Data Resolution

For the data normalization challenge, I implemented an Elasticsearch-based resolution system. It used a directed graph of resolution steps — first checking the league, then the teams, then event time — with ordered edges and conditional logic.

The system took everything it knew (league, date, potential home/away aliases) and ran searches to find the most likely match. It even handled edge cases like MLB double-headers by narrowing time windows.

When the graph couldn't resolve a record automatically, it sent it back to the admin queue for human verification.
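
In spirit, the resolution flow looked something like the sketch below, using the elasticsearch-ruby client. The index name, field names, and exact step ordering are assumptions for illustration:

```ruby
require "time"
require "elasticsearch"

# Each step in the resolution graph narrows the candidate set; steps are
# ordered, and some apply conditionally (e.g. the tight time window used
# to split MLB double-headers).
RESOLUTION_STEPS = [
  ->(ctx) { { term:  { "league.keyword" => ctx[:league] } } },
  ->(ctx) { { match: { "home_aliases"   => ctx[:home] } } },
  ->(ctx) { { match: { "away_aliases"   => ctx[:away] } } },
  ->(ctx) {
    window = ctx[:double_header] ? 1 : 6   # hours either side of the feed's time
    { range: { "start_time" => { gte: (ctx[:start_time] - window * 3600).iso8601,
                                 lte: (ctx[:start_time] + window * 3600).iso8601 } } }
  }
]

def resolve_event(client, ctx)
  filters = RESOLUTION_STEPS.map { |step| step.call(ctx) }
  result  = client.search(
    index: "events",
    body:  { query: { bool: { must: filters } }, size: 2 }
  )
  hits = result.dig("hits", "hits")
  return hits.first["_source"] if hits && hits.size == 1
  :needs_review                     # ambiguous or no match: back to the admin queue
end

client = Elasticsearch::Client.new(url: ENV.fetch("ES_URL", "http://localhost:9200"))
resolve_event(client,
              league: "MLB", home: "NYY", away: "BOS",
              start_time: Time.utc(2016, 7, 17, 17, 0), double_header: true)
```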

That system — pairing observability with intelligent resolution — was a game changer. We went from manual firefighting to reliable automation, and the platform started scaling into the millions of odds per day range.

  • Millions of odds processed daily
  • 99.99% uptime achieved
  • Zero 4 a.m. wake-up calls

Public API Evolution and Decoupling

From the start, I believed the ingestion database should be completely isolated from public traffic. No reads, no shared load, no coupling. So I built a separate, real-time export layer that synced normalized data to a public API datastore.
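
The core of that sync layer, reduced to its simplest possible form: a polling sketch using the pg gem with hypothetical table and column names. The production system pushed changes in real time rather than polling, but the shape of the work is the same.

```ruby
require "pg"

# Copies newly normalized rows from the ingestion database into the separate
# public API datastore, so public reads never touch the ingestion DB.
def sync_once(ingestion, public_store, since)
  rows = ingestion.exec_params(
    "SELECT id, league, home_team, away_team, start_time, updated_at
       FROM normalized_events
      WHERE updated_at > $1
      ORDER BY updated_at",
    [since]
  )

  rows.each do |row|
    public_store.exec_params(
      "INSERT INTO public_events (id, league, home_team, away_team, start_time)
       VALUES ($1, $2, $3, $4, $5)
       ON CONFLICT (id) DO UPDATE
         SET league = $2, home_team = $3, away_team = $4, start_time = $5",
      row.values_at("id", "league", "home_team", "away_team", "start_time")
    )
  end

  rows.ntuples.zero? ? since : rows[rows.ntuples - 1]["updated_at"]
end

ingestion    = PG.connect(dbname: "odds_ingestion")
public_store = PG.connect(dbname: "public_api")
watermark    = "1970-01-01"
loop do
  watermark = sync_once(ingestion, public_store, watermark)
  sleep 1
end
```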

That API evolved through three generations:

1. C# API (early version)

The first consolidated endpoint layer, still coupled to old schemas.

2. Python API with Cassandra backend

I switched to Python and Cassandra to handle massive read volumes without ever touching the ingestion DB. Data synced in real time, providing fast, high-availability reads.

3. Microservices + Postgres (containerization era)

Once Docker arrived, I rebuilt the stack again into a fully containerized Node.js + GraphQL + Kafka architecture. Each microservice handled a single domain — teams, leagues, events, odds, markets — all reporting metrics to Grafana.

System Performance and Scale

By the final generation, the entire system was observable, distributed, and resilient. Even on Super Bowl Sunday, the ingestion layer ran at full speed while millions of users hit the APIs without any degradation.

🏆 Battle-Tested: The system proved its resilience during Super Bowl Sunday — the highest traffic event of the year — with zero downtime and full performance.

Legacy and Impact

By 2017, the Odds Importer was mature, stable, and scalable — and it's still running today, serving as the data backbone for OddsTrader, SportsbookReview, and BookmakersReview. It processes tens of millions of updates per day, including odds, scores, player stats, and live event data.


Key Metrics and Achievements

  • 99.99% uptime during peak events like Super Bowl Sunday
  • Tens of millions of updates per day processed reliably
  • Hours instead of weeks to add new sportsbook feeds
  • Zero 4 AM wake-up calls after implementing observability
  • Complete decoupling of ingestion and public API layers

Technical Innovations

This project introduced several key innovations that shaped how I approach engineering leadership:

  1. Configuration-Driven Architecture: Replaced hard-coded parsers with configurable map files and transformations
  2. Intelligent Data Resolution: Built an Elasticsearch-based graph resolution system for ambiguous data matching
  3. Observability-First Design: Implemented comprehensive metrics and monitoring before scaling
  4. Progressive Decoupling: Evolved from monolithic to microservices architecture while maintaining system stability
  5. Real-time Sync Patterns: Developed reliable data synchronization between ingestion and public API layers

That project shaped how I think about engineering leadership. It taught me that real scalability isn't just about distributed systems — it's about visibility, resilience, and clarity. When I started, I was waking up at 4 a.m. fixing invisible queues. Now, the system runs itself with full transparency, metrics, and trust.

⚠️ Key Lesson: You can't scale what you can't see. Observability isn't optional — it's the foundation of reliable systems.

It remains the most defining technical project of my career — the moment I truly learned how to turn chaos into systems.

Brian Wight

Technical leader and entrepreneur focused on building scalable systems and high-performing teams. Passionate about ownership culture, data-driven decision making, and turning complex problems into simple solutions.
