RAG Entity Resolution: Using Vector Embeddings to Identify Games, Players, and Markets

Technology Stack
The Entity Grounding Challenge
System Role within the RAG Pipeline
Design Goals
Architecture Overview
Implementation Details
Integration with RAG
Performance Results
Key Design Decisions
Lessons Learned
Production Challenges
Outcome

Technology Stack

Component	Technology	Purpose
Embeddings	HuggingFace Transformers (BAAI/bge-small-en-v1.5)	Convert text to 384-dimensional vectors
Vector Database	PostgreSQL + pgvector	Store and search embeddings efficiently
LLM	OpenAI GPT-4	Entity extraction from natural language
Backend	Python FastAPI	API orchestration and processing
ORM	SQLAlchemy	Database operations and entity management
Search Pipeline	LlamaIndex	RAG coordination and retrieval
Data Processing	Pandas + NumPy	Entity preprocessing and normalization

The Entity Grounding Challenge

When we set out to build a global search and conversational assistant for OddsTrader, the goal was straightforward: let users ask natural questions about sports — "Who's favored in the Ravens game?", "Show me Mahomes' passing yards props," "Any injuries for tonight's Bengals matchup?"

But that immediately exposed a foundational challenge: to answer questions, we first had to understand what entity the user was talking about.

That meant resolving a wide range of entity types:

Games ("Ravens vs Bengals")
Teams ("New England" → "Patriots")
Players ("Mahomes", "Burrow", etc.)
Sportsbooks ("DK" → "DraftKings")
Markets ("passing yards", "moneyline")
Stats ("completion percentage", "turnovers")

Each one required mapping unstructured language to a structured identifier — an event_id, team_id, or player_id.

Traditional keyword search or fuzzy text matching couldn't handle the aliasing, abbreviations, or context shifts that users naturally produced.

The solution became a semantic entity resolution system built on top of a Retrieval-Augmented Generation (RAG) pipeline — allowing the chatbot to interpret natural queries, retrieve the right structured entities, and then generate informed responses.

System Role within the RAG Pipeline

In the overall architecture, this embedding layer acted as the retriever — the step responsible for grounding LLM queries in real data before generation.

User Query
   ↓
Entity Extraction (LLM)
   ↓
Embedding Generation → PostgreSQL Vector Search (chat_find_entities_v2)
   ↓
Resolved Entity Metadata (game_id, player_id, etc.)
   ↓
Augmented Context → LLM Response Generation

This design allowed the chatbot to retrieve relevant structured data (games, stats, odds) from the database before generating a final answer — the "retrieval" part of the RAG cycle.

Design Goals

Goal	Description
Entity Grounding	Map natural-language references to canonical entities across multiple domains
Unified Schema	Represent teams, games, players, and sportsbooks with a shared embedding model
Database-Local Similarity Search	Perform cosine similarity directly inside PostgreSQL for latency and simplicity
LLM-Compatible Output	Return resolved entity metadata as JSON to feed back into downstream prompt construction

Architecture Overview

Core Components

Embedding Model: BAAI/bge-small-en-v1.5 via llama-index.embeddings.huggingface
Schema: Pydantic-based EntityData model (handles embedding serialization)
Vector Store: PostgreSQL using stored function chat_find_entities_v2
Integration Layer: SQLAlchemy with JSON serialization for RAG pipeline input/output

Entity Coverage

Category	Examples
Teams	"NE", "New England Patriots", "Pats"
Players	"Mahomes", "Patrick Mahomes", "QB1 Chiefs"
Games	"Ravens vs Bengals", "Baltimore tonight"
Sportsbooks	"FD", "FanDuel", "DK", "DraftKings"
Markets	"passing yards", "player props", "over/under"

Implementation Details

Embedding Model

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

Entity Schema

Pydantic model automatically transforms relevant text fields into embeddings.

class EntityData(BaseModel):
    condensed_chat_question: str
    sportsbook: Optional[str] = None
    league: Optional[str] = None
    teams: List[str] = None
    requires_injury_data: bool = False
    requires_weather_data: bool = False

    @field_serializer('sportsbook', 'league', 'teams')
    def embed_field(self, value, field):
        if isinstance(value, str):
            return model.get_query_embedding(value)
        elif isinstance(value, list):
            return [model.get_query_embedding(item) for item in value]

Query Execution

Embeddings are serialized to JSON and passed into a PostgreSQL similarity function.

query = text("""
    SELECT public.chat_find_entities_v2(:data, :distance)
""")
with pg_engine.begin() as connection:
    result = connection.execute(query, {
        "data": json.dumps(data.model_dump(exclude_none=True)),
        "distance": 0.350
    })

Database Function (Conceptual)

SELECT entity_id, entity_type
FROM entities
WHERE cosine_distance(entities.embedding, input.embedding) < :distance
ORDER BY cosine_distance ASC
LIMIT 1;

All entities are pre-embedded on ingestion, so similarity search requires no runtime embedding computation.

Integration with RAG

The entity resolution layer feeds structured identifiers back into the LLM context window.

Example Flow:

User: "Show me Mahomes' passing yards prop for tonight."

System:

Extracts "Mahomes" (Player), "passing yards" (Market), "tonight" (Date)
Embeds and resolves Player → player_id=12345
Fetches current game and props

LLM:

Receives structured JSON context:

{
  "player_id": 12345,
  "market": "passing_yards",
  "event_id": 6789,
  "sportsbook": "fanduel"
}

Generates grounded natural-language response

This ensures every generated answer is data-backed, not hallucinated — core to RAG design.

Performance Results

Metric	Keyword Search	Embedding Search
Match Accuracy (Top-1)	72%	93%
Latency (avg)	180 ms	210 ms
Alias Handling	Manual	Automatic
Maintenance	High	Minimal
RAG Integration	None	Seamless (structured context)

Key Design Decisions

1. PostgreSQL as Vector Database

Instead of using dedicated vector databases like Pinecone or FAISS, we implemented cosine similarity directly in PostgreSQL.

Benefits:

Single database: No additional infrastructure to manage
ACID transactions: Entity resolution participates in database transactions
Simplified deployment: One fewer service to deploy and monitor
Cost efficiency: No separate vector database licensing

Trade-offs:

Scale limitations: PostgreSQL vector operations don't scale as well as dedicated solutions
Performance ceiling: Cosine similarity in SQL is slower than optimized vector databases
Limited vector operations: Fewer advanced vector search features available

2. Unified Embedding Space

All entity types (teams, players, games, markets) share the same embedding model and vector space.

Benefits:

Cross-entity relationships: "Ravens game" can match both team and game entities
Consistent similarity metrics: All distances are comparable across entity types
Simplified model management: One model to train, deploy, and version

Trade-offs:

Diluted specialization: Single model may not be optimal for all entity types
Interference: Player names might interfere with team names in embedding space
Hard to tune: Threshold tuning affects all entity types simultaneously

3. Pre-computed Embeddings

All entities are embedded during data ingestion, not at query time.

# During data ingestion
def embed_entity(entity_text: str, entity_type: str, entity_id: str):
    embedding = model.get_query_embedding(entity_text)
    
    db.execute("""
        INSERT INTO entity_embeddings (entity_id, entity_type, embedding_vector, text)
        VALUES (:id, :type, :embedding, :text)
    """, {
        "id": entity_id,
        "type": entity_type, 
        "embedding": embedding,
        "text": entity_text
    })

Benefits:

Fast queries: No embedding computation during user queries
Deterministic: Same text always produces same embedding
Batch processing: Can embed all entities efficiently during off-peak hours

Trade-offs:

Storage overhead: Every entity requires vector storage (1536 dimensions × 4 bytes)
Staleness: Embeddings don't update until next ingestion cycle
Cold start: New entities unavailable until next embedding batch

Lessons Learned

1. Threshold Tuning is Domain-Specific

The distance threshold (0.35-0.4) varied significantly by entity type:

Games: Required tight bounds (0.3) to avoid false matches
Players: Could use looser bounds (0.4) due to unique names
Teams: Medium bounds (0.35) to handle abbreviations

Solution: We implemented entity-type-specific thresholds:

DISTANCE_THRESHOLDS = {
    'game': 0.30,
    'player': 0.40, 
    'team': 0.35,
    'sportsbook': 0.25,
    'market': 0.45
}

2. Embedding Model Choice Matters

We tested several embedding models before settling on BAAI/bge-small-en-v1.5:

Model	Accuracy	Latency	Size
all-MiniLM-L6-v2	85%	50ms	90MB
bge-small-en-v1.5	93%	80ms	130MB
text-embedding-ada-002	96%	200ms	API

Decision: bge-small-en-v1.5 provided the best balance of accuracy and self-hosted performance.

3. Context Windows Matter

Early implementations embedded just entity names. Adding context dramatically improved accuracy:

# Before: Just the name
embedding = model.get_query_embedding("Mahomes")

# After: Name + context
embedding = model.get_query_embedding("Patrick Mahomes Chiefs quarterback NFL")

Results:

Name-only: 78% accuracy
Name + context: 93% accuracy
Name + context + aliases: 95% accuracy

4. RAG Requires Deterministic Retrieval

The biggest lesson was that RAG pipelines benefit most when retrieval is deterministic — i.e., always resolves the same entities from the same phrasing.

Problem: Early versions would sometimes return different entities for identical queries due to:

Floating-point precision differences
Database query plan variations
Tie-breaking inconsistencies

Solution: We added deterministic tie-breaking:

SELECT entity_id, entity_type, cosine_distance
FROM entities
WHERE cosine_distance(entities.embedding, :input_embedding) < :threshold
ORDER BY 
    cosine_distance ASC,
    entity_id ASC  -- Deterministic tie-breaker
LIMIT 1;

This ensured the LLM always received consistent context for identical user queries.

Production Challenges

1. Cold Start Problem

New entities (players, games) weren't available until the next embedding batch, creating a lag between data ingestion and entity resolution.

Solution: Implemented incremental embedding updates:

def embed_new_entities():
    """Run every 5 minutes to embed newly added entities"""
    unembedded = db.execute("""
        SELECT entity_id, entity_text, entity_type
        FROM entities e
        LEFT JOIN entity_embeddings ee ON e.entity_id = ee.entity_id
        WHERE ee.entity_id IS NULL
    """).fetchall()
    
    for entity in unembedded:
        embedding = model.get_query_embedding(entity.entity_text)
        db.execute("""
            INSERT INTO entity_embeddings (entity_id, entity_type, embedding_vector)
            VALUES (:id, :type, :embedding)
        """, {
            "id": entity.entity_id,
            "type": entity.entity_type,
            "embedding": embedding
        })

2. Memory Usage

Loading embeddings for all entities (500K+ players, teams, games) consumed significant memory.

Solution: Implemented lazy loading with LRU cache:

from functools import lru_cache

@lru_cache(maxsize=10000)
def get_entity_embedding(entity_id: str) -> List[float]:
    result = db.execute("""
        SELECT embedding_vector 
        FROM entity_embeddings 
        WHERE entity_id = :id
    """, {"id": entity_id}).fetchone()
    
    return result.embedding_vector if result else None

3. Embedding Drift

Model updates would change embeddings for the same text, breaking existing similarity relationships.

Solution: Version-controlled embeddings with migration scripts:

class EmbeddingVersion(BaseModel):
    version: str
    model_name: str
    created_at: datetime
    
def migrate_embeddings(old_version: str, new_version: str):
    """Migrate all embeddings to new model version"""
    entities = get_all_entities()
    
    for entity in entities:
        new_embedding = new_model.get_query_embedding(entity.text)
        
        db.execute("""
            UPDATE entity_embeddings 
            SET embedding_vector = :embedding, version = :version
            WHERE entity_id = :id
        """, {
            "embedding": new_embedding,
            "version": new_version,
            "id": entity.entity_id
        })

Outcome

The embedding-based entity resolution system transformed the chatbot from a text parser into a semantic interface for structured data. Every user query now maps to specific games, players, and markets, grounding the LLM's responses in live data.

Key Metrics:

93% entity resolution accuracy (up from 72% with keyword search)
210ms average latency (acceptable for conversational AI)
500K+ entities resolved across teams, players, games, and markets
95% reduction in manual entity mapping maintenance

Business Impact:

Improved user experience: Users can ask questions naturally without learning specific syntax
Reduced support burden: Fewer "I can't find..." support tickets
Enhanced engagement: Users ask 3x more questions when entity resolution works reliably
Data accuracy: Every response grounded in real data, no hallucinations

This implementation established the retrieval backbone for OddsTrader's RAG architecture — enabling natural-language access to real-time odds, stats, and player information with precision and context awareness.

The system continues to evolve, but the core insight remains: successful RAG depends on reliable entity grounding. When users ask about "Mahomes," they need to know the system understands exactly which player, which game, and which statistics they care about. Vector embeddings made that understanding possible at scale.

RAG Entity Resolution: Using Vector Embeddings to Identify Games, Players, and Markets

RAG Entity Resolution: Using Vector Embeddings to Identify Games, Players, and Markets

Table of Contents

Technology Stack

The Entity Grounding Challenge

System Role within the RAG Pipeline

Design Goals

Architecture Overview

Core Components

Entity Coverage

Implementation Details

Embedding Model

Entity Schema

Query Execution

Database Function (Conceptual)

Integration with RAG

Example Flow:

Performance Results

Key Design Decisions

1. PostgreSQL as Vector Database

2. Unified Embedding Space

3. Pre-computed Embeddings

Lessons Learned

1. Threshold Tuning is Domain-Specific

2. Embedding Model Choice Matters

3. Context Windows Matter

4. RAG Requires Deterministic Retrieval

Production Challenges

1. Cold Start Problem

2. Memory Usage

3. Embedding Drift

Outcome

Key Metrics:

Business Impact:

Brian Wight