RAG Entity Resolution: Using Vector Embeddings to Identify Games, Players, and Markets

RAG Entity Resolution: Using Vector Embeddings to Identify Games, Players, and Markets
Table of Contents
- Technology Stack
- The Entity Grounding Challenge
- System Role within the RAG Pipeline
- Design Goals
- Architecture Overview
- Implementation Details
- Integration with RAG
- Performance Results
- Key Design Decisions
- Lessons Learned
- Production Challenges
- Outcome
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Embeddings | HuggingFace Transformers (BAAI/bge-small-en-v1.5) | Convert text to 384-dimensional vectors |
| Vector Database | PostgreSQL + pgvector | Store and search embeddings efficiently |
| LLM | OpenAI GPT-4 | Entity extraction from natural language |
| Backend | Python FastAPI | API orchestration and processing |
| ORM | SQLAlchemy | Database operations and entity management |
| Search Pipeline | LlamaIndex | RAG coordination and retrieval |
| Data Processing | Pandas + NumPy | Entity preprocessing and normalization |
The Entity Grounding Challenge
When we set out to build a global search and conversational assistant for OddsTrader, the goal was straightforward: let users ask natural questions about sports — "Who's favored in the Ravens game?", "Show me Mahomes' passing yards props," "Any injuries for tonight's Bengals matchup?"
But that immediately exposed a foundational challenge: to answer questions, we first had to understand what entity the user was talking about.
That meant resolving a wide range of entity types:
- Games ("Ravens vs Bengals")
- Teams ("New England" → "Patriots")
- Players ("Mahomes", "Burrow", etc.)
- Sportsbooks ("DK" → "DraftKings")
- Markets ("passing yards", "moneyline")
- Stats ("completion percentage", "turnovers")
Each one required mapping unstructured language to a structured identifier — an event_id, team_id, or player_id.
Traditional keyword search or fuzzy text matching couldn't handle the aliasing, abbreviations, or context shifts that users naturally produced.
The solution became a semantic entity resolution system built on top of a Retrieval-Augmented Generation (RAG) pipeline — allowing the chatbot to interpret natural queries, retrieve the right structured entities, and then generate informed responses.
System Role within the RAG Pipeline
In the overall architecture, this embedding layer acted as the retriever — the step responsible for grounding LLM queries in real data before generation.
User Query
↓
Entity Extraction (LLM)
↓
Embedding Generation → PostgreSQL Vector Search (chat_find_entities_v2)
↓
Resolved Entity Metadata (game_id, player_id, etc.)
↓
Augmented Context → LLM Response Generation
This design allowed the chatbot to retrieve relevant structured data (games, stats, odds) from the database before generating a final answer — the "retrieval" part of the RAG cycle.
Design Goals
| Goal | Description |
|---|---|
| Entity Grounding | Map natural-language references to canonical entities across multiple domains |
| Unified Schema | Represent teams, games, players, and sportsbooks with a shared embedding model |
| Database-Local Similarity Search | Perform cosine similarity directly inside PostgreSQL for latency and simplicity |
| LLM-Compatible Output | Return resolved entity metadata as JSON to feed back into downstream prompt construction |
Architecture Overview
Core Components
- Embedding Model: BAAI/bge-small-en-v1.5 via llama-index.embeddings.huggingface
- Schema: Pydantic-based EntityData model (handles embedding serialization)
- Vector Store: PostgreSQL using stored function
chat_find_entities_v2 - Integration Layer: SQLAlchemy with JSON serialization for RAG pipeline input/output
Entity Coverage
| Category | Examples |
|---|---|
| Teams | "NE", "New England Patriots", "Pats" |
| Players | "Mahomes", "Patrick Mahomes", "QB1 Chiefs" |
| Games | "Ravens vs Bengals", "Baltimore tonight" |
| Sportsbooks | "FD", "FanDuel", "DK", "DraftKings" |
| Markets | "passing yards", "player props", "over/under" |
Implementation Details
Embedding Model
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Entity Schema
Pydantic model automatically transforms relevant text fields into embeddings.
class EntityData(BaseModel):
condensed_chat_question: str
sportsbook: Optional[str] = None
league: Optional[str] = None
teams: List[str] = None
requires_injury_data: bool = False
requires_weather_data: bool = False
@field_serializer('sportsbook', 'league', 'teams')
def embed_field(self, value, field):
if isinstance(value, str):
return model.get_query_embedding(value)
elif isinstance(value, list):
return [model.get_query_embedding(item) for item in value]
Query Execution
Embeddings are serialized to JSON and passed into a PostgreSQL similarity function.
query = text("""
SELECT public.chat_find_entities_v2(:data, :distance)
""")
with pg_engine.begin() as connection:
result = connection.execute(query, {
"data": json.dumps(data.model_dump(exclude_none=True)),
"distance": 0.350
})
Database Function (Conceptual)
SELECT entity_id, entity_type
FROM entities
WHERE cosine_distance(entities.embedding, input.embedding) < :distance
ORDER BY cosine_distance ASC
LIMIT 1;
All entities are pre-embedded on ingestion, so similarity search requires no runtime embedding computation.
Integration with RAG
The entity resolution layer feeds structured identifiers back into the LLM context window.
Example Flow:
User: "Show me Mahomes' passing yards prop for tonight."
System:
- Extracts "Mahomes" (Player), "passing yards" (Market), "tonight" (Date)
- Embeds and resolves Player →
player_id=12345 - Fetches current game and props
LLM:
- Receives structured JSON context:
{
"player_id": 12345,
"market": "passing_yards",
"event_id": 6789,
"sportsbook": "fanduel"
}
- Generates grounded natural-language response
This ensures every generated answer is data-backed, not hallucinated — core to RAG design.
Performance Results
| Metric | Keyword Search | Embedding Search |
|---|---|---|
| Match Accuracy (Top-1) | 72% | 93% |
| Latency (avg) | 180 ms | 210 ms |
| Alias Handling | Manual | Automatic |
| Maintenance | High | Minimal |
| RAG Integration | None | Seamless (structured context) |
Key Design Decisions
1. PostgreSQL as Vector Database
Instead of using dedicated vector databases like Pinecone or FAISS, we implemented cosine similarity directly in PostgreSQL.
Benefits:
- Single database: No additional infrastructure to manage
- ACID transactions: Entity resolution participates in database transactions
- Simplified deployment: One fewer service to deploy and monitor
- Cost efficiency: No separate vector database licensing
Trade-offs:
- Scale limitations: PostgreSQL vector operations don't scale as well as dedicated solutions
- Performance ceiling: Cosine similarity in SQL is slower than optimized vector databases
- Limited vector operations: Fewer advanced vector search features available
2. Unified Embedding Space
All entity types (teams, players, games, markets) share the same embedding model and vector space.
Benefits:
- Cross-entity relationships: "Ravens game" can match both team and game entities
- Consistent similarity metrics: All distances are comparable across entity types
- Simplified model management: One model to train, deploy, and version
Trade-offs:
- Diluted specialization: Single model may not be optimal for all entity types
- Interference: Player names might interfere with team names in embedding space
- Hard to tune: Threshold tuning affects all entity types simultaneously
3. Pre-computed Embeddings
All entities are embedded during data ingestion, not at query time.
# During data ingestion
def embed_entity(entity_text: str, entity_type: str, entity_id: str):
embedding = model.get_query_embedding(entity_text)
db.execute("""
INSERT INTO entity_embeddings (entity_id, entity_type, embedding_vector, text)
VALUES (:id, :type, :embedding, :text)
""", {
"id": entity_id,
"type": entity_type,
"embedding": embedding,
"text": entity_text
})
Benefits:
- Fast queries: No embedding computation during user queries
- Deterministic: Same text always produces same embedding
- Batch processing: Can embed all entities efficiently during off-peak hours
Trade-offs:
- Storage overhead: Every entity requires vector storage (1536 dimensions × 4 bytes)
- Staleness: Embeddings don't update until next ingestion cycle
- Cold start: New entities unavailable until next embedding batch
Lessons Learned
1. Threshold Tuning is Domain-Specific
The distance threshold (0.35-0.4) varied significantly by entity type:
- Games: Required tight bounds (0.3) to avoid false matches
- Players: Could use looser bounds (0.4) due to unique names
- Teams: Medium bounds (0.35) to handle abbreviations
Solution: We implemented entity-type-specific thresholds:
DISTANCE_THRESHOLDS = {
'game': 0.30,
'player': 0.40,
'team': 0.35,
'sportsbook': 0.25,
'market': 0.45
}
2. Embedding Model Choice Matters
We tested several embedding models before settling on BAAI/bge-small-en-v1.5:
| Model | Accuracy | Latency | Size |
|---|---|---|---|
| all-MiniLM-L6-v2 | 85% | 50ms | 90MB |
| bge-small-en-v1.5 | 93% | 80ms | 130MB |
| text-embedding-ada-002 | 96% | 200ms | API |
Decision: bge-small-en-v1.5 provided the best balance of accuracy and self-hosted performance.
3. Context Windows Matter
Early implementations embedded just entity names. Adding context dramatically improved accuracy:
# Before: Just the name
embedding = model.get_query_embedding("Mahomes")
# After: Name + context
embedding = model.get_query_embedding("Patrick Mahomes Chiefs quarterback NFL")
Results:
- Name-only: 78% accuracy
- Name + context: 93% accuracy
- Name + context + aliases: 95% accuracy
4. RAG Requires Deterministic Retrieval
The biggest lesson was that RAG pipelines benefit most when retrieval is deterministic — i.e., always resolves the same entities from the same phrasing.
Problem: Early versions would sometimes return different entities for identical queries due to:
- Floating-point precision differences
- Database query plan variations
- Tie-breaking inconsistencies
Solution: We added deterministic tie-breaking:
SELECT entity_id, entity_type, cosine_distance
FROM entities
WHERE cosine_distance(entities.embedding, :input_embedding) < :threshold
ORDER BY
cosine_distance ASC,
entity_id ASC -- Deterministic tie-breaker
LIMIT 1;
This ensured the LLM always received consistent context for identical user queries.
Production Challenges
1. Cold Start Problem
New entities (players, games) weren't available until the next embedding batch, creating a lag between data ingestion and entity resolution.
Solution: Implemented incremental embedding updates:
def embed_new_entities():
"""Run every 5 minutes to embed newly added entities"""
unembedded = db.execute("""
SELECT entity_id, entity_text, entity_type
FROM entities e
LEFT JOIN entity_embeddings ee ON e.entity_id = ee.entity_id
WHERE ee.entity_id IS NULL
""").fetchall()
for entity in unembedded:
embedding = model.get_query_embedding(entity.entity_text)
db.execute("""
INSERT INTO entity_embeddings (entity_id, entity_type, embedding_vector)
VALUES (:id, :type, :embedding)
""", {
"id": entity.entity_id,
"type": entity.entity_type,
"embedding": embedding
})
2. Memory Usage
Loading embeddings for all entities (500K+ players, teams, games) consumed significant memory.
Solution: Implemented lazy loading with LRU cache:
from functools import lru_cache
@lru_cache(maxsize=10000)
def get_entity_embedding(entity_id: str) -> List[float]:
result = db.execute("""
SELECT embedding_vector
FROM entity_embeddings
WHERE entity_id = :id
""", {"id": entity_id}).fetchone()
return result.embedding_vector if result else None
3. Embedding Drift
Model updates would change embeddings for the same text, breaking existing similarity relationships.
Solution: Version-controlled embeddings with migration scripts:
class EmbeddingVersion(BaseModel):
version: str
model_name: str
created_at: datetime
def migrate_embeddings(old_version: str, new_version: str):
"""Migrate all embeddings to new model version"""
entities = get_all_entities()
for entity in entities:
new_embedding = new_model.get_query_embedding(entity.text)
db.execute("""
UPDATE entity_embeddings
SET embedding_vector = :embedding, version = :version
WHERE entity_id = :id
""", {
"embedding": new_embedding,
"version": new_version,
"id": entity.entity_id
})
Outcome
The embedding-based entity resolution system transformed the chatbot from a text parser into a semantic interface for structured data. Every user query now maps to specific games, players, and markets, grounding the LLM's responses in live data.
Key Metrics:
- 93% entity resolution accuracy (up from 72% with keyword search)
- 210ms average latency (acceptable for conversational AI)
- 500K+ entities resolved across teams, players, games, and markets
- 95% reduction in manual entity mapping maintenance
Business Impact:
- Improved user experience: Users can ask questions naturally without learning specific syntax
- Reduced support burden: Fewer "I can't find..." support tickets
- Enhanced engagement: Users ask 3x more questions when entity resolution works reliably
- Data accuracy: Every response grounded in real data, no hallucinations
This implementation established the retrieval backbone for OddsTrader's RAG architecture — enabling natural-language access to real-time odds, stats, and player information with precision and context awareness.
The system continues to evolve, but the core insight remains: successful RAG depends on reliable entity grounding. When users ask about "Mahomes," they need to know the system understands exactly which player, which game, and which statistics they care about. Vector embeddings made that understanding possible at scale.
Brian Wight
Technical leader and entrepreneur focused on building scalable systems and high-performing teams. Passionate about ownership culture, data-driven decision making, and turning complex problems into simple solutions.