When Scalability Breaks Reality: Lessons from Building the Odds Importer

Technology Stack

Component | Technology | Purpose
--- | --- | ---
Backend | Ruby on Rails | Core data processing framework
Message Queue | AWS SQS | Distributed task processing
Database | PostgreSQL | Primary data storage
Cache | Redis | Fast data access and temporary storage
Load Balancer | AWS ELB | Traffic distribution across servers
Monitoring | AWS CloudWatch | System metrics and alerting
Data Format | JSON/XML | Input feed processing
Infrastructure | AWS EC2 Auto Scaling | Dynamic server provisioning
API Layer | RESTful Rails | Frontend data delivery
Processing Pipeline | Custom ETL | Real-time odds transformation

I used to think scalability was the ultimate goal: that if a system could scale horizontally, it could handle anything.

I was wrong.

When we built the Odds Importer for OddsTrader, scalability was the north star. We designed it so we could spin up 10, 20, even 30 servers to process odds feeds in parallel. It worked beautifully, in theory. But what I didn't anticipate was how data context could quietly destroy that perfect design.

The Architecture That Looked Perfect

We started with two core principles:

  1. Horizontal Scalability:
    Each server could take a chunk of data and process it independently.
  2. Sub-Second Updates:
    Once the feed was downloaded, updates needed to propagate to the frontend in under a second.

The problem? Some XML feeds were massive: 30 MB or more. So we decided to split them into small JSON "lines" that could be processed independently across multiple servers.

Example: The Input Feed

{
  "sport": "NFL",
  "gameId": "NE-NYJ-2025-11-01",
  "matchup": {
    "home": "NE",
    "away": "NYJ",
    "startTime": "2025-11-01T18:00:00Z"
  },
  "markets": [
    {
      "market": "Point Spread",
      "odds": [
        { "team": "NE", "spread": -5, "price": -110 },
        { "team": "NYJ", "spread": +5, "price": -110 }
      ]
    },
    {
      "market": "Total Points",
      "odds": [
        { "type": "Over", "line": 42.5, "price": -110 },
        { "type": "Under", "line": 42.5, "price": -110 }
      ]
    }
  ]
}

We'd split this into multiple independent chunks:

[
  {
    "gameId": "NE-NYJ-2025-11-01",
    "market": "Point Spread",
    "team": "NE",
    "spread": -5,
    "price": -110
  },
  {
    "gameId": "NE-NYJ-2025-11-01",
    "market": "Point Spread",
    "team": "NYJ",
    "spread": +5,
    "price": -110
  },
  {
    "gameId": "NE-NYJ-2025-11-01",
    "market": "Total Points",
    "type": "Over",
    "line": 42.5,
    "price": -110
  },
  {
    "gameId": "NE-NYJ-2025-11-01",
    "market": "Total Points",
    "type": "Under",
    "line": 42.5,
    "price": -110
  }
]
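To make the fan-out concrete, here is a minimal sketch of that splitting step, assuming an already-parsed feed hash and the aws-sdk-sqs gem; the class name, queue URL, and method names are illustrative rather than our production code.

require "json"
require "aws-sdk-sqs"

# Illustrative splitter: flattens one game feed into independent per-outcome
# records and enqueues each one to SQS so any worker can pick it up.
class FeedSplitter
  def initialize(queue_url:, sqs: Aws::SQS::Client.new)
    @queue_url = queue_url
    @sqs = sqs
  end

  # feed is the parsed JSON shown above (sport, gameId, matchup, markets)
  def split(feed)
    feed["markets"].flat_map do |market|
      market["odds"].map do |outcome|
        { "gameId" => feed["gameId"], "market" => market["market"] }.merge(outcome)
      end
    end
  end

  def enqueue(feed)
    split(feed).each do |record|
      @sqs.send_message(queue_url: @queue_url, message_body: record.to_json)
    end
  end
end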

That denormalization made scaling simple. Each record was atomic, or so we thought.

💡 The Scalability Trap: Making data artificially atomic can destroy the very relationships that give it meaning.

The Real-World Failure: Futures and Missing Context

The design assumed every record could live on its own. But in sports data, context is everything.

For typical two-outcome markets, like point spreads, this was fine: both sides update nearly simultaneously. But in futures markets, the data behaves differently.

Here's an example:

{
  "market": "Super Bowl Winner",
  "odds": [
    { "team": "NE", "price": 300 },
    { "team": "KC", "price": 500 },
    { "team": "NYJ", "price": 50000 }
  ]
}

When the Jets are eliminated, their line disappears. So the next update might look like this:

{
  "market": "Super Bowl Winner",
  "odds": [
    { "team": "NE", "price": 250 },
    { "team": "KC", "price": 450 }
  ]
}

If you process this line-by-line across 20 servers, you'll never know the Jets were removed, only that they didn't update.

The system can't tell the difference between "missing data" and "removed market."
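Detecting the removal is trivial when a single process can see the whole market plus what was stored before it; our per-line workers never could. A minimal sketch of the comparison that is impossible to run per chunk, with the previously stored team list passed in by hand for illustration:

# A removal only shows up as a set difference against the previous snapshot.
# A worker that only ever sees one chunk at a time can never run this check.
def removed_teams(latest_odds, previously_stored_teams)
  current = latest_odds.map { |outcome| outcome["team"] }
  previously_stored_teams - current
end

latest = [
  { "team" => "NE", "price" => 250 },
  { "team" => "KC", "price" => 450 }
]

removed_teams(latest, %w[NE KC NYJ])
# => ["NYJ"]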

"Horizontal scaling amplifies inconsistency. When your architecture assumes atomic updates, but your data is contextual, scaling only makes the wrong behavior happen faster." β€” Hard-learned lesson from the trenches

This wasn't a theory problem; it was a reality that hit us in production with millions of users watching.

We had feeds that handled this gracefully by marking removed teams explicitly:

{ "team": "NYJ", "status": "off", "market": "Super Bowl Winner" }

But most didn't. Our "perfectly scalable" importer suddenly required manual intervention. We ended up building an internal admin tool so operators could mark eliminated teams manually: the opposite of scalable.
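For the feeds that did send explicit removals, even an isolated per-record worker could cope, because the removal arrives as data instead of as an absence. A small sketch of that handling, with an in-memory hash standing in for the real store:

# An explicit "off" record carries its own context, so a per-line worker can
# retire the outcome without ever seeing the rest of the market.
STORE = {}  # { [market, team] => record } stand-in for the real database

def apply_record(record)
  key = [record["market"], record["team"]]
  if record["status"] == "off"
    STORE.delete(key)     # explicit removal, no snapshot comparison needed
  else
    STORE[key] = record   # normal upsert
  end
end

apply_record({ "team" => "NYJ", "status" => "off", "market" => "Super Bowl Winner" })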

When Data Arrives Out of Sync

The second problem was even trickier: live statistics.

We wanted to show real-time play-by-play (score, time, down, distance), all updating automatically. But each stat came through a different feed, updated at different times.

Example Feed Sequence

// Update 1
{ "quarter": 1, "time": "10:25", "down": 1, "distance": 10, "yardLine": "NE 25" }

// Update 2 (time only)
{ "quarter": 1, "time": "10:15" }

// Update 3 (yards only)
{ "yardLine": "NE 30" }

// Update 4 (final state)
{ "quarter": 1, "time": "10:15", "down": 1, "distance": 5, "yardLine": "NE 30" }

Each update looked valid, but the frontend would render the in-between states: the time changed but the yard line didn't, or the yard line changed but the down didn't. It looked like the game was breaking physics.

We couldn't rewrite the importer without breaking everything else. So we grouped related stats and forced the frontend to wait until all fields in that group had been updated before showing changes.

"updateGroups": {
  "playState": ["quarter", "time", "down", "distance", "yardLine"]
}
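A rough sketch of the gating that config drove, with an in-memory buffer per game; the class and method names are illustrative, not the production code:

# Buffer partial stat updates and only publish once every field in the group
# has arrived since the last publish, so the frontend never renders a
# half-updated play state.
class UpdateGroupGate
  FIELDS = %w[quarter time down distance yardLine].freeze

  def initialize
    @pending = {}
  end

  # Returns the merged play state when the group is complete, otherwise nil.
  def apply(partial_update)
    @pending.merge!(partial_update)
    return nil unless FIELDS.all? { |field| @pending.key?(field) }

    complete = @pending
    @pending = {}
    complete
  end
end

gate = UpdateGroupGate.new
gate.apply({ "quarter" => 1, "time" => "10:15" })                    # => nil, still waiting
gate.apply({ "down" => 1, "distance" => 5, "yardLine" => "NE 30" })  # => full play state

Holding updates until the whole group is fresh is exactly where the 300ms+ latency penalty in the table below came from.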

It was hacky, but it worked, mostly. Performance tanked, and we paid that cost for years.

Problem | Our "Solution" | Real Cost
--- | --- | ---
Missing context in futures | Manual admin tool | Hours of daily operator work
Out-of-sync live stats | Grouped updates | 300ms+ latency penalty
Scale complexity | More monitoring | 4 AM debugging sessions

⚠️ Technical Debt Reality: What works "for now" usually becomes tomorrow's bottleneck. These quick fixes compounded into architectural constraints that lasted years.

The Broader Lesson

On paper, horizontal scalability solves throughput. In reality, it amplifies inconsistency.

When your architecture assumes atomic updates, but your data is contextual, scaling only makes the wrong behavior happen faster.

If I Were Rebuilding It Today

If I rebuilt the importer now, I'd treat related odds and stats as transactional groups: processed together, versioned together, and retired together.

Something like:

{
  "batchId": "sb2025-001",
  "market": "Super Bowl Winner",
  "timestamp": "2025-11-01T20:00:00Z",
  "records": [
    { "team": "NE", "price": 250 },
    { "team": "KC", "price": 450 },
    { "team": "NYJ", "status": "off" }
  ]
}

That single record can be diffed, versioned, and replayed without losing context or flooding the system.
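To make "diffed" concrete, here is a minimal sketch that compares two versions of such a batch and reports what was dropped or explicitly turned off; the function and its output shape are assumptions layered on the batch format above:

# Because a batch carries the whole market, a removal is an explicit fact
# rather than a silent absence.
def diff_batches(previous_batch, current_batch)
  previous_teams  = previous_batch["records"].map { |r| r["team"] }
  current_records = current_batch["records"]

  {
    "removed" => previous_teams - current_records.map { |r| r["team"] },
    "off"     => current_records.select { |r| r["status"] == "off" }.map { |r| r["team"] },
    "updated" => current_records.reject { |r| r["status"] == "off" }
  }
end

Fed the batch above plus its previous version, it would report NYJ as explicitly off while NE and KC remain plain price updates.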

Modern Architecture Approach

Instead of splitting everything into atomic pieces, I'd use:

  1. Event Streaming with Kafka for ordered processing (see the sketch after this list)
  2. Batch Processing for contextual groups
  3. State Snapshots for consistency verification
  4. Explicit Deletes instead of implicit removals
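For the ordered-processing piece, the core idea is keying every message by its market (or game) so one consumer sees that stream in order. A minimal sketch assuming the ruby-kafka gem; the broker address and topic name are placeholders:

require "json"
require "kafka"  # ruby-kafka gem

# Keying by market keeps every update for that market on one partition,
# so a single consumer applies them in order and diffs stay meaningful.
kafka = Kafka.new(["localhost:9092"])

batch = {
  "batchId"   => "sb2025-001",
  "market"    => "Super Bowl Winner",
  "timestamp" => "2025-11-01T20:00:00Z",
  "records"   => [{ "team" => "NE", "price" => 250 }]
}

kafka.deliver_message(batch.to_json, topic: "odds-batches", partition_key: batch["market"])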

✅ Context-Aware Scaling: Sometimes, the fastest way to get things right is to process related data together, not split it apart.

What This Taught Me

1. Scale exposes weak assumptions

If your system depends on context, scaling horizontally multiplies confusion, not throughput.

2. Real-time ≠ real order

Concurrency and truth aren't the same thing; you have to design for state consistency, not speed.

3. Hacks create invisible debts

What works "for now" usually becomes tomorrow's bottleneck.

4. Context beats parallelism

Sometimes, the fastest way to get things right is to process them together.


"A system can only move as fast as its context stays intact." β€” The most expensive lesson we learned

No amount of horizontal scaling can fix fundamentally broken data relationships. You have to solve for correctness first, then scale.

The Real Success Metrics

In the end, the odds importer worked; it powered millions of updates per day. But the metrics that mattered weren't just throughput:

  • Accuracy: 99.9% data consistency after the fixes
  • Operational overhead: Reduced from 4 hours/day to 30 minutes/day
  • Developer sanity: No more 4 AM debugging sessions
  • User experience: Sub-second updates with correct context

The system taught me something more valuable than scale: context is not optional. You can't scale away fundamental design problems; you can only make them happen faster and more expensively.

When building distributed systems, always ask: "What relationships am I breaking by splitting this data?" Because once you lose context, no amount of horizontal scaling will get it back.

Brian Wight

Technical leader and entrepreneur focused on building scalable systems and high-performing teams. Passionate about ownership culture, data-driven decision making, and turning complex problems into simple solutions.
