RAG Entity Resolution: Using Vector Embeddings to Identify Games, Players, and Markets

RAG Entity Resolution: Using Vector Embeddings to Identify Games, Players, and Markets

RAG Entity Resolution: Using Vector Embeddings to Identify Games, Players, and Markets

Table of Contents

Technology Stack

ComponentTechnologyPurpose
EmbeddingsHuggingFace Transformers (BAAI/bge-small-en-v1.5)Convert text to 384-dimensional vectors
Vector DatabasePostgreSQL + pgvectorStore and search embeddings efficiently
LLMOpenAI GPT-4Entity extraction from natural language
BackendPython FastAPIAPI orchestration and processing
ORMSQLAlchemyDatabase operations and entity management
Search PipelineLlamaIndexRAG coordination and retrieval
Data ProcessingPandas + NumPyEntity preprocessing and normalization

The Entity Grounding Challenge

When we set out to build a global search and conversational assistant for OddsTrader, the goal was straightforward: let users ask natural questions about sports — "Who's favored in the Ravens game?", "Show me Mahomes' passing yards props," "Any injuries for tonight's Bengals matchup?"

But that immediately exposed a foundational challenge: to answer questions, we first had to understand what entity the user was talking about.

That meant resolving a wide range of entity types:

  • Games ("Ravens vs Bengals")
  • Teams ("New England" → "Patriots")
  • Players ("Mahomes", "Burrow", etc.)
  • Sportsbooks ("DK" → "DraftKings")
  • Markets ("passing yards", "moneyline")
  • Stats ("completion percentage", "turnovers")

Each one required mapping unstructured language to a structured identifier — an event_id, team_id, or player_id.

Traditional keyword search or fuzzy text matching couldn't handle the aliasing, abbreviations, or context shifts that users naturally produced.

The solution became a semantic entity resolution system built on top of a Retrieval-Augmented Generation (RAG) pipeline — allowing the chatbot to interpret natural queries, retrieve the right structured entities, and then generate informed responses.

System Role within the RAG Pipeline

In the overall architecture, this embedding layer acted as the retriever — the step responsible for grounding LLM queries in real data before generation.

User Query
   ↓
Entity Extraction (LLM)
   ↓
Embedding Generation → PostgreSQL Vector Search (chat_find_entities_v2)
   ↓
Resolved Entity Metadata (game_id, player_id, etc.)
   ↓
Augmented Context → LLM Response Generation

This design allowed the chatbot to retrieve relevant structured data (games, stats, odds) from the database before generating a final answer — the "retrieval" part of the RAG cycle.

Design Goals

GoalDescription
Entity GroundingMap natural-language references to canonical entities across multiple domains
Unified SchemaRepresent teams, games, players, and sportsbooks with a shared embedding model
Database-Local Similarity SearchPerform cosine similarity directly inside PostgreSQL for latency and simplicity
LLM-Compatible OutputReturn resolved entity metadata as JSON to feed back into downstream prompt construction

Architecture Overview

Core Components

  • Embedding Model: BAAI/bge-small-en-v1.5 via llama-index.embeddings.huggingface
  • Schema: Pydantic-based EntityData model (handles embedding serialization)
  • Vector Store: PostgreSQL using stored function chat_find_entities_v2
  • Integration Layer: SQLAlchemy with JSON serialization for RAG pipeline input/output

Entity Coverage

CategoryExamples
Teams"NE", "New England Patriots", "Pats"
Players"Mahomes", "Patrick Mahomes", "QB1 Chiefs"
Games"Ravens vs Bengals", "Baltimore tonight"
Sportsbooks"FD", "FanDuel", "DK", "DraftKings"
Markets"passing yards", "player props", "over/under"

Implementation Details

Embedding Model

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

Entity Schema

Pydantic model automatically transforms relevant text fields into embeddings.

class EntityData(BaseModel):
    condensed_chat_question: str
    sportsbook: Optional[str] = None
    league: Optional[str] = None
    teams: List[str] = None
    requires_injury_data: bool = False
    requires_weather_data: bool = False

    @field_serializer('sportsbook', 'league', 'teams')
    def embed_field(self, value, field):
        if isinstance(value, str):
            return model.get_query_embedding(value)
        elif isinstance(value, list):
            return [model.get_query_embedding(item) for item in value]

Query Execution

Embeddings are serialized to JSON and passed into a PostgreSQL similarity function.

query = text("""
    SELECT public.chat_find_entities_v2(:data, :distance)
""")
with pg_engine.begin() as connection:
    result = connection.execute(query, {
        "data": json.dumps(data.model_dump(exclude_none=True)),
        "distance": 0.350
    })

Database Function (Conceptual)

SELECT entity_id, entity_type
FROM entities
WHERE cosine_distance(entities.embedding, input.embedding) < :distance
ORDER BY cosine_distance ASC
LIMIT 1;

All entities are pre-embedded on ingestion, so similarity search requires no runtime embedding computation.

Integration with RAG

The entity resolution layer feeds structured identifiers back into the LLM context window.

Example Flow:

User: "Show me Mahomes' passing yards prop for tonight."

System:

  1. Extracts "Mahomes" (Player), "passing yards" (Market), "tonight" (Date)
  2. Embeds and resolves Player → player_id=12345
  3. Fetches current game and props

LLM:

  1. Receives structured JSON context:
{
  "player_id": 12345,
  "market": "passing_yards",
  "event_id": 6789,
  "sportsbook": "fanduel"
}
  1. Generates grounded natural-language response

This ensures every generated answer is data-backed, not hallucinated — core to RAG design.

Performance Results

MetricKeyword SearchEmbedding Search
Match Accuracy (Top-1)72%93%
Latency (avg)180 ms210 ms
Alias HandlingManualAutomatic
MaintenanceHighMinimal
RAG IntegrationNoneSeamless (structured context)

Key Design Decisions

1. PostgreSQL as Vector Database

Instead of using dedicated vector databases like Pinecone or FAISS, we implemented cosine similarity directly in PostgreSQL.

Benefits:

  • Single database: No additional infrastructure to manage
  • ACID transactions: Entity resolution participates in database transactions
  • Simplified deployment: One fewer service to deploy and monitor
  • Cost efficiency: No separate vector database licensing

Trade-offs:

  • Scale limitations: PostgreSQL vector operations don't scale as well as dedicated solutions
  • Performance ceiling: Cosine similarity in SQL is slower than optimized vector databases
  • Limited vector operations: Fewer advanced vector search features available

2. Unified Embedding Space

All entity types (teams, players, games, markets) share the same embedding model and vector space.

Benefits:

  • Cross-entity relationships: "Ravens game" can match both team and game entities
  • Consistent similarity metrics: All distances are comparable across entity types
  • Simplified model management: One model to train, deploy, and version

Trade-offs:

  • Diluted specialization: Single model may not be optimal for all entity types
  • Interference: Player names might interfere with team names in embedding space
  • Hard to tune: Threshold tuning affects all entity types simultaneously

3. Pre-computed Embeddings

All entities are embedded during data ingestion, not at query time.

# During data ingestion
def embed_entity(entity_text: str, entity_type: str, entity_id: str):
    embedding = model.get_query_embedding(entity_text)
    
    db.execute("""
        INSERT INTO entity_embeddings (entity_id, entity_type, embedding_vector, text)
        VALUES (:id, :type, :embedding, :text)
    """, {
        "id": entity_id,
        "type": entity_type, 
        "embedding": embedding,
        "text": entity_text
    })

Benefits:

  • Fast queries: No embedding computation during user queries
  • Deterministic: Same text always produces same embedding
  • Batch processing: Can embed all entities efficiently during off-peak hours

Trade-offs:

  • Storage overhead: Every entity requires vector storage (1536 dimensions × 4 bytes)
  • Staleness: Embeddings don't update until next ingestion cycle
  • Cold start: New entities unavailable until next embedding batch

Lessons Learned

1. Threshold Tuning is Domain-Specific

The distance threshold (0.35-0.4) varied significantly by entity type:

  • Games: Required tight bounds (0.3) to avoid false matches
  • Players: Could use looser bounds (0.4) due to unique names
  • Teams: Medium bounds (0.35) to handle abbreviations

Solution: We implemented entity-type-specific thresholds:

DISTANCE_THRESHOLDS = {
    'game': 0.30,
    'player': 0.40, 
    'team': 0.35,
    'sportsbook': 0.25,
    'market': 0.45
}

2. Embedding Model Choice Matters

We tested several embedding models before settling on BAAI/bge-small-en-v1.5:

ModelAccuracyLatencySize
all-MiniLM-L6-v285%50ms90MB
bge-small-en-v1.593%80ms130MB
text-embedding-ada-00296%200msAPI

Decision: bge-small-en-v1.5 provided the best balance of accuracy and self-hosted performance.

3. Context Windows Matter

Early implementations embedded just entity names. Adding context dramatically improved accuracy:

# Before: Just the name
embedding = model.get_query_embedding("Mahomes")

# After: Name + context
embedding = model.get_query_embedding("Patrick Mahomes Chiefs quarterback NFL")

Results:

  • Name-only: 78% accuracy
  • Name + context: 93% accuracy
  • Name + context + aliases: 95% accuracy

4. RAG Requires Deterministic Retrieval

The biggest lesson was that RAG pipelines benefit most when retrieval is deterministic — i.e., always resolves the same entities from the same phrasing.

Problem: Early versions would sometimes return different entities for identical queries due to:

  • Floating-point precision differences
  • Database query plan variations
  • Tie-breaking inconsistencies

Solution: We added deterministic tie-breaking:

SELECT entity_id, entity_type, cosine_distance
FROM entities
WHERE cosine_distance(entities.embedding, :input_embedding) < :threshold
ORDER BY 
    cosine_distance ASC,
    entity_id ASC  -- Deterministic tie-breaker
LIMIT 1;

This ensured the LLM always received consistent context for identical user queries.

Production Challenges

1. Cold Start Problem

New entities (players, games) weren't available until the next embedding batch, creating a lag between data ingestion and entity resolution.

Solution: Implemented incremental embedding updates:

def embed_new_entities():
    """Run every 5 minutes to embed newly added entities"""
    unembedded = db.execute("""
        SELECT entity_id, entity_text, entity_type
        FROM entities e
        LEFT JOIN entity_embeddings ee ON e.entity_id = ee.entity_id
        WHERE ee.entity_id IS NULL
    """).fetchall()
    
    for entity in unembedded:
        embedding = model.get_query_embedding(entity.entity_text)
        db.execute("""
            INSERT INTO entity_embeddings (entity_id, entity_type, embedding_vector)
            VALUES (:id, :type, :embedding)
        """, {
            "id": entity.entity_id,
            "type": entity.entity_type,
            "embedding": embedding
        })

2. Memory Usage

Loading embeddings for all entities (500K+ players, teams, games) consumed significant memory.

Solution: Implemented lazy loading with LRU cache:

from functools import lru_cache

@lru_cache(maxsize=10000)
def get_entity_embedding(entity_id: str) -> List[float]:
    result = db.execute("""
        SELECT embedding_vector 
        FROM entity_embeddings 
        WHERE entity_id = :id
    """, {"id": entity_id}).fetchone()
    
    return result.embedding_vector if result else None

3. Embedding Drift

Model updates would change embeddings for the same text, breaking existing similarity relationships.

Solution: Version-controlled embeddings with migration scripts:

class EmbeddingVersion(BaseModel):
    version: str
    model_name: str
    created_at: datetime
    
def migrate_embeddings(old_version: str, new_version: str):
    """Migrate all embeddings to new model version"""
    entities = get_all_entities()
    
    for entity in entities:
        new_embedding = new_model.get_query_embedding(entity.text)
        
        db.execute("""
            UPDATE entity_embeddings 
            SET embedding_vector = :embedding, version = :version
            WHERE entity_id = :id
        """, {
            "embedding": new_embedding,
            "version": new_version,
            "id": entity.entity_id
        })

Outcome

The embedding-based entity resolution system transformed the chatbot from a text parser into a semantic interface for structured data. Every user query now maps to specific games, players, and markets, grounding the LLM's responses in live data.

Key Metrics:

  • 93% entity resolution accuracy (up from 72% with keyword search)
  • 210ms average latency (acceptable for conversational AI)
  • 500K+ entities resolved across teams, players, games, and markets
  • 95% reduction in manual entity mapping maintenance

Business Impact:

  • Improved user experience: Users can ask questions naturally without learning specific syntax
  • Reduced support burden: Fewer "I can't find..." support tickets
  • Enhanced engagement: Users ask 3x more questions when entity resolution works reliably
  • Data accuracy: Every response grounded in real data, no hallucinations

This implementation established the retrieval backbone for OddsTrader's RAG architecture — enabling natural-language access to real-time odds, stats, and player information with precision and context awareness.

The system continues to evolve, but the core insight remains: successful RAG depends on reliable entity grounding. When users ask about "Mahomes," they need to know the system understands exactly which player, which game, and which statistics they care about. Vector embeddings made that understanding possible at scale.

Brian Wight

Brian Wight

Technical leader and entrepreneur focused on building scalable systems and high-performing teams. Passionate about ownership culture, data-driven decision making, and turning complex problems into simple solutions.

RAG Entity Resolution: Using Vector Embeddings to Identify Games, Players, and Markets - Brian Wight