Entity-Anchored RAG
The Problem: Structured vs. Unstructured Is a False Divide
Traditional RAG systems treat documents as the atomic unit — upload PDFs, chunk them, embed them, search them. The application’s structured data (database fields, relationships, status changes) lives in a completely separate world. An AI agent answering “what are the risks in this project?” has to somehow bridge the gap between a Risk entity in the database and a paragraph in a site inspection PDF.
The platform already knows both sides. It has the entity model (entities, properties, relations, security policies) and it has file attachments. The RAG layer unifies them: every entity instance casts a semantic shadow — a set of vector-embedded text chunks that capture both its structured state and its unstructured attachments, stored in a single table, searchable in a single query, governed by a single security model.
The design principles:
- Entity-anchored — every piece of context is owned by a specific entity instance, not floating in a global index
- Transactional consistency — vector data lives in the same PostgreSQL database (via pgvector) as the entity data it describes, sharing the same backup/restore boundary
- Schema-agnostic — the platform doesn’t know what a “Project” or “Contract” is; the RAG layer works with any entity type defined in the application metadata model
- Composable retrieval — structured properties, unstructured file chunks, and metadata snapshots are all searchable in the same vector space
The Context Shadow
Every entity instance casts a semantic shadow — a collection of vector-embedded text chunks stored in osy.EntityContext, a platform-level entity provisioned in every application’s data schema. The key insight is that context comes in different types, each with its own lifecycle and generation strategy.
Context types
| Type | Source | Lifecycle |
|---|---|---|
| MetadataSnapshot | Entity properties serialised via semantic template | Replaced on every property change |
| FileChunk | Chunked text from attached PDF/DOCX/XLSX files | Replaced when file is re-uploaded or deleted |
| PropertyChange | Individual property mutation log | Append-only, trimmed by retention policy |
| FileDistillation | LLM-generated summary of all chunks from a single file | Replaced when file is re-uploaded |
| ChatMessage | Individual message in a conversation entity | Append-only, one row per message |
| RollingSummary | Periodic condensed summary of N child entities | Replaced every N children (configurable) |
| Relationship | Graph-aware context: entity relationships and neighbour summaries | Replaced when relationships change |
The EntityContext entity
choice ContextType:
MetadataSnapshot
FileChunk
PropertyChange
FileDistillation
ChatMessage
RollingSummary
Relationship
ensure entity osy.EntityContext:
description = "Semantic shadow of entity instances — unified vector store
for structured and unstructured context"
EntityType: osy.EntityMetadata required
Entity: dynamic reference(EntityType) required
ContextType: @ContextType required
Content: String required
Embedding: Vector
EmbeddingModel: String(100)
Every context row is anchored to a specific entity instance via a dynamic reference — the EntityType property (an EntityRef to osy.EntityMetadata) tells the runtime which entity type the Entity property points to. No physical FK constraint is needed, but the metadata model maintains a proper typed relationship. This means a single table stores context for all entity types without per-type schema changes.
EmbeddingModel records which model produced the vector, enabling background re-embedding when the provider changes.
File chunks: separating extraction from embedding
File chunks are the text-extraction artefact — a structural fact about a file’s content, independent of the embedding. Separating chunks from context rows means you can re-embed without re-chunking (when switching embedding models) or re-chunk without re-embedding (when tuning chunk size).
ensure entity osy.FileChunk:
description = "Text chunk extracted from a file asset for RAG ingestion"
FileAsset: osy.FileAsset required
ChunkIndex: Int32 required
Content: String required
PageNumber: Int32
SectionHeader: String
ChunkStrategy: String(50)
ChunkParams: Json
ChunkStrategy and ChunkParams make chunking reproducible and versionable. When you change the splitting algorithm or tune parameters, a query like ChunkStrategy != 'recursive-v2' finds all chunks that need re-splitting.
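As a rough illustration of that versioning check, the sketch below filters an in-memory list of chunk records. The FileChunk dataclass and the needs_resplit helper are hypothetical stand-ins for the real osy.FileChunk rows and whatever query the platform actually runs; the size and overlap keys inside ChunkParams are assumed for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FileChunk:
    """Hypothetical in-memory mirror of an osy.FileChunk row."""
    file_asset_id: str
    chunk_index: int
    content: str
    chunk_strategy: str
    chunk_params: dict = field(default_factory=dict)


def needs_resplit(chunks, strategy, params):
    """Return the file asset IDs whose chunks were not produced by the
    given strategy and parameters."""
    stale = {
        c.file_asset_id
        for c in chunks
        if c.chunk_strategy != strategy or c.chunk_params != params
    }
    return sorted(stale)


chunks = [
    FileChunk("file-1", 0, "chunk text", "recursive-v1", {"size": 512, "overlap": 15}),
    FileChunk("file-2", 0, "chunk text", "recursive-v2", {"size": 512, "overlap": 15}),
    FileChunk("file-3", 0, "chunk text", "recursive-v2", {"size": 256, "overlap": 15}),
]
stale_files = needs_resplit(chunks, "recursive-v2", {"size": 512, "overlap": 15})
```

Comparing ChunkParams as well as ChunkStrategy also catches parameter tuning that keeps the same algorithm name.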
When a FileChunk is embedded, the resulting osy.EntityContext row links back to the chunk’s parent entity (not the FileChunk itself) — the chunk is an intermediate artefact, the context row is anchored to the entity that owns the file.
File asset ownership
Files use the same dynamic-reference pattern for polymorphic ownership — any entity type can own files:
entity Project:
Name: String required
Documents: collection(osy.FileAsset, Owner)
entity Milestone:
Name: String required
Project: Project required
Documents: collection(osy.FileAsset, Owner)
A file belongs to exactly one entity. Uploading the same PDF to both a Project and a Milestone creates two FileAsset instances, each with its own semantic shadow scoped to its owner. This preserves the entity-anchored principle — each entity’s context is self-contained.
The RAG Block
Every entity type can opt into RAG by declaring a rag: block inside the entity definition. This is where the application builder controls what gets embedded, how context is generated, and how the Pulse (the AI-generated summary) behaves. Entities without a rag: block have no semantic shadow — no context rows, no Pulse, no vector search.
Syntax
The rag: block lives inside the entity definition, alongside the properties: and index: blocks:
entity Project:
description "Project with full RAG support"
semantic "Project {{Name}}: {{Phase}} phase, budget {{Budget}}"
properties:
Name: String required
Budget: Decimal
Phase: @ProjectPhase
Documents: collection(osy.FileAsset, Owner)
Tasks: collection(Task, Project)
Risks: collection(Risk, Project)
rag:
context = auto
files = auto
relationships = auto
pulse = auto
distillation = auto
coalesce = 30s
The minimal form — equivalent to “turn on the semantic shadow with defaults”:
entity Contract:
properties:
Name: String required
rag:
context = auto
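Declaring context = auto enables the entity’s semantic shadow. Every other feature switch (files, distillation, relationships, pulse, children) defaults to off until explicitly enabled. Tuning settings keep the defaults listed in the settings table; coalesce, for example, defaults to 30s whenever any RAG feature is active.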
Settings
| Setting | Values | Default | Description |
|---|---|---|---|
| context | auto / off | required | Core switch. auto embeds the entity’s properties as a MetadataSnapshot context row using the semantic template. If no semantic directive exists, the platform generates a default template from all non-system properties. |
| files | auto / off | off | Chunk and embed attached files (via osy.FileAsset ownership). Creates FileChunk rows and FileChunk context rows anchored to this entity. |
| distillation | auto / off | off | Generate LLM summaries of attached files. Creates FileDistillation context rows. Requires files = auto. |
| relationships | auto / off / CSV | off | Generate Relationship context from the entity’s collection properties. auto = all collections. CSV = explicit list of collection property paths. |
| pulse | auto / off | off | Enable Pulse synthesis. Adds PulseContent, PulseGeneratedAt, PulseStaleSince properties to the entity. |
| children | auto / off | off | Auto-embed child entities as ChatMessage context rows on the parent. Designed for conversation/message patterns where each child is a discrete unit of content. |
| summaryInterval | integer | 50 | When children = auto, generate a RollingSummary every N child entities. |
| coalesce | duration | 30s | Staleness coalescing window. Changes within this window produce at most one regeneration. |
| propagation | none / parent / integer | parent | Staleness propagation depth. parent = depth 1 (direct parent only). Integer = explicit depth. |
| pulseModel | string | platform default | Which LLM to use for Pulse generation. References an llm block by name. |
| pulsePrompt | string | auto-generated | Entity-specific Pulse synthesis prompt. |
| pulseLength | brief / standard / detailed / integer | standard | Target Pulse length. brief = 1-2 sentences, standard = 50-100 words, detailed = 150-250 words. Integer = explicit word count. |
| pulseTrackedProperties | CSV | all | Which property changes trigger Pulse staleness. Default: any change. Use to ignore noise like LastViewedAt. |
| statusProperty | CSV | none | Which properties to aggregate for Pulse context enrichment. Produces a distribution summary (e.g., “Draft: 5, Review: 12”). |
| embeddingModel | string | platform default | Which embedding provider to use. References an embedding block by name. |
| chunkSize | integer | 512 | Target token count per file chunk. Legal/dense docs: 256. Long-form articles: 1000-1500. |
| chunkOverlap | integer | 15 | Overlap percentage between adjacent chunks. |
| retrievalLimit | integer | 10 | Top-K results from vector search for context assembly. |
| recencyWindow | integer | 10 | Last N chat messages included verbatim (not via vector search). |
| distillationPrompt | string | auto-generated | Prompt for LLM file summarisation. Override for domain-specific extraction. |
| chatReferences | bool | false | Entity chat agent cites its sources via osy.Reference rows. |
| pulseReferences | bool | false | Pulse cites its sources via osy.Reference rows. |
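The coalesce setting is easiest to picture as a debounce. The sketch below is one possible reading, assuming the window opens at the first unprocessed change and that a periodic worker asks which entities are due. The Coalescer class and its method names are illustrative and not part of the platform.

```python
class Coalescer:
    """Collapses bursts of change events into one regeneration per window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.stale_since = {}  # entity id -> time of first unprocessed change

    def mark_changed(self, entity_id, now):
        # Only the first change in a burst opens the window.
        self.stale_since.setdefault(entity_id, now)

    def due(self, now):
        """Return entities whose window has elapsed and clear their stale marker."""
        ready = [e for e, t in self.stale_since.items() if now - t >= self.window]
        for e in ready:
            del self.stale_since[e]
        return ready


coalescer = Coalescer(window_seconds=30)
for t in (0, 5, 12):  # three edits inside one 30s window
    coalescer.mark_changed("project-1", now=t)

early = coalescer.due(now=20)  # window still open
late = coalescer.due(now=30)   # window elapsed: one regeneration for three edits
again = coalescer.due(now=60)  # nothing pending until the next change
```

A trailing-edge debounce (restarting the window on every change) would be an equally plausible interpretation; the source text does not say which one the platform uses.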
Examples
Project management — full-featured RAG with tuned settings:
entity Project:
description = "Project with document-backed RAG and Pulse"
semantic = "Project {{Name}}: {{Phase}} phase, budget {{Budget}}"
properties:
Name: type = string length = 200 required = true
Budget: type = decimal
Phase: choice ProjectPhase
LastViewedAt: type = datetime
Documents: type = collection entityType = osy.FileAsset foreignKey = Owner
Tasks: type = collection entityType = Task foreignKey = Project
Risks: type = collection entityType = Risk foreignKey = Project
rag:
context = auto
files = auto
distillation = auto
distillationPrompt = "Extract key deliverables, deadlines, budget figures,
risk factors, and team responsibilities."
relationships = "Tasks, Risks, SubProjects, SubProjects.Findings"
pulse = auto
pulsePrompt = "Summarise project health: budget vs actual, timeline risks,
blocked tasks, and team capacity."
pulseTrackedProperties = "Budget, Phase, Status"
statusProperty = Status
chunkSize = 800
retrievalLimit = 15
coalesce = 30s
Chat/conversation — auto-embed child messages with rolling summaries:
entity Conversation:
semantic "{{Title}}"
properties:
Title: String required
Messages: collection(Message, Conversation)
rag:
context = auto
children = auto
summaryInterval = 25
pulse = auto
coalesce = 60s
entity Message:
properties:
Conversation: Conversation required
Content: String required
Author: String required
Simple document entity — just files, no relationships or Pulse:
entity Contract:
semantic "Contract: {{Name}}"
properties:
Name: String required
Documents: collection(osy.FileAsset, Owner)
rag:
context = auto
files = auto
distillation = auto
High-churn entity — long coalescing window, cheap Pulse model:
entity ActivityLog:
properties:
Action: String required
Timestamp: DateTime required
rag:
context = auto
coalesce = 120s
pulse = auto
pulseModel = "haiku"
Metadata model
The rag: block emits to osy.RagConfiguration, a platform entity with one row per entity type that has a rag: block:
ensure entity osy.RagConfiguration:
description = "RAG configuration for an entity type"
Entity: osy.EntityMetadata required unique
ContextMode: @RagMode required
FilesMode: @RagMode required
DistillationMode: @RagMode required
RelationshipsMode: @RagMode required
PulseMode: @RagMode required
ChildrenMode: @RagMode required
SummaryInterval: Int32
CoalesceSeconds: Int32
PropagationDepth: Int32
PulseModel: String(100)
StatusProperty: String(200)
choice RagMode:
Auto
Off
The emitter creates/updates this row during compilation. The runtime reads osy.RagConfiguration to decide which ingestion pipelines to activate for each entity type. Entities without a configuration row are inert — no context rows, no background processing.
Interaction with the semantic directive
The existing semantic directive on an entity defines the template used for MetadataSnapshot content. The rag: block’s context = auto activates the embedding of that template output. These are intentionally separate concerns:
- semantic defines what the natural-language rendering looks like (template authoring)
- rag: context = auto decides whether that rendering gets embedded into the vector store
An entity can have a semantic template without a rag: block — the template is still useful for display purposes (entity cards, search result previews). Conversely, context = auto without an explicit semantic template generates a default rendering from all non-system properties.
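To make the template step concrete, here is a minimal sketch of {{Property}} substitution and of a fallback template built from non-system properties. The function names, the shape of the fallback text, and the rule that system properties start with a double underscore are all assumptions made for illustration; the platform's real renderer may differ.

```python
import re

SYSTEM_PREFIX = "__"  # assumption: system properties start with a double underscore


def default_template(entity_type, properties):
    """Build a fallback template that lists every non-system property."""
    fields = [p for p in properties if not p.startswith(SYSTEM_PREFIX)]
    return entity_type + ": " + ", ".join(p + " {{" + p + "}}" for p in fields)


def render(template, values):
    """Replace each {{Property}} token with the entity's current value."""
    def substitute(match):
        value = values.get(match.group(1))
        return "" if value is None else str(value)
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


project = {
    "Name": "Apollo",
    "Phase": "Planning",
    "Budget": "2.4M",
    "__created_at": "2026-01-01",
}
explicit = render("Project {{Name}}: {{Phase}} phase, budget {{Budget}}", project)
fallback = render(default_template("Project", project), project)
```

Because rendering is deterministic, the same property values always yield the same text, so an unchanged entity never needs re-embedding.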
Pulse property injection
When pulse = auto is set, the emitter automatically adds three properties to the entity during compilation:
PulseContent: String -- AI-generated summary of entity state
PulseGeneratedAt: DateTime -- When the Pulse was last generated
PulseStaleSince: DateTime -- When the Pulse became stale (null = fresh)
These properties are system-managed — user code can read but not write them directly. The decompiler does not emit these properties in the properties: block; they’re implicit from pulse = auto. When an existing entity gains pulse = auto, the schema evolution engine generates the ALTER TABLE DDL for the new columns automatically.
Embedding provider block
Embeddings need their own provider configuration, parallel to the existing llm: block:
embedding PlatformEmbedding:
provider = OpenAI
model = "text-embedding-3-small"
apiKey = @secret EmbeddingApiKey
dimensions = 1536
embedding GeminiEmbedding:
provider = Google
model = "text-embedding-004"
apiKey = @secret GcpApiKey
dimensions = 768
The rag: block references these by name:
entity Project:
rag:
context = auto
embeddingModel = PlatformEmbedding
pulseModel = CheapLlm
Application-level defaults in the application: block:
application MyApp:
rag:
defaultEmbedding = PlatformEmbedding
defaultPulseModel = CheapLlm
defaultCoalesce = 30s
Per-entity settings override the application defaults. If neither is set, the platform uses a built-in default (the platform operator’s configured embedding provider).
Why a separate embedding: block rather than reusing llm:? Embedding models and chat/completion models are different services with different pricing, rate limits, and capabilities. An app might use OpenAI for embeddings but Anthropic for Pulse generation. The API key for embeddings might differ from the LLM key. And the embedding: block has embedding-specific settings (dimensions) that don’t apply to LLMs.
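The override order described above (per-entity setting, then application default, then built-in default) can be sketched as a small lookup. The resolve function, the BUILT_IN values, and the "platform-default" placeholder are hypothetical; only the setting names come from the blocks shown earlier.

```python
# Assumed built-in fallbacks; "platform-default" is a placeholder name.
BUILT_IN = {
    "embeddingModel": "platform-default",
    "pulseModel": "platform-default",
    "coalesce": "30s",
}

# Maps each application-level key to the per-entity setting it defaults.
APP_KEYS = {
    "defaultEmbedding": "embeddingModel",
    "defaultPulseModel": "pulseModel",
    "defaultCoalesce": "coalesce",
}


def resolve(entity_rag, app_rag):
    """Layer built-in, application, and entity settings, most specific last."""
    resolved = dict(BUILT_IN)
    for app_key, setting in APP_KEYS.items():
        if app_key in app_rag:
            resolved[setting] = app_rag[app_key]
    resolved.update({k: v for k, v in entity_rag.items() if k in BUILT_IN})
    return resolved


app = {"defaultEmbedding": "PlatformEmbedding", "defaultPulseModel": "CheapLlm"}
entity = {"context": "auto", "pulseModel": "haiku"}
settings = resolve(entity, app)
```

Here the entity overrides only pulseModel, inherits the embedding provider from the application, and falls back to the built-in coalescing window.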
The Ingestion Pipeline
File normalisation
All file assets are converted to Markdown before chunking. Markdown preserves the structural hierarchy (headers, tables, lists) that LLMs rely on for reasoning.
| Source format | Conversion strategy |
|---|---|
| PDF | Extract text with layout preservation. Use LLM-based extraction for complex layouts. |
| DOCX | Parse XML structure to Markdown. Headings, tables, and lists map directly. |
| XLSX | Each sheet becomes a Markdown table. Sheet name becomes a heading. |
| Images | OCR to Markdown (for scanned documents) or vision model description. |
| Plain text / CSV | Minimal transformation — wrap in Markdown structure. |
Recursive three-tier splitting
Three-tier splitting ensures chunks are semantically coherent:
- Tier 1, header split: Split by Markdown headers (#, ##, ###). Each section becomes a candidate chunk.
- Tier 2, paragraph split: If a section exceeds the target size, split by paragraph boundaries (\n\n).
- Tier 3, sentence split: If a paragraph still exceeds the target, split by sentence boundaries.
Target chunk size: 512-800 tokens with 15% overlap to ensure no concept is cut at a boundary.
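The three tiers can be sketched in a few lines of Python. This is an illustrative approximation, not the platform's splitter: it counts whitespace-separated words instead of model tokens, greedily re-packs small pieces up to the target, and leaves out the overlap step. The function names are invented for the example.

```python
import re


def count_tokens(text):
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def pack(pieces, target, sep):
    """Greedily merge adjacent pieces while the merged text stays within target."""
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if current and count_tokens(candidate) > target:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def split_markdown(markdown, target=512):
    chunks = []
    # Tier 1: split before every #, ## or ### header line.
    sections = re.split(r"(?m)^(?=#{1,3} )", markdown)
    for section in (s.strip() for s in sections if s.strip()):
        if count_tokens(section) <= target:
            chunks.append(section)
            continue
        # Tier 2: split the oversized section on blank lines.
        for para in pack(re.split(r"\n\s*\n", section), target, "\n\n"):
            if count_tokens(para) <= target:
                chunks.append(para)
                continue
            # Tier 3: split the oversized paragraph on sentence boundaries.
            sentences = re.split(r"(?<=[.!?])\s+", para)
            chunks.extend(pack(sentences, target, " "))
    return chunks


doc = "# Intro\nShort section.\n\n## Risks\n" + "\n\n".join(
    f"Paragraph {i} has exactly six words." for i in range(4)
)
chunks = split_markdown(doc, target=14)
```

A single sentence longer than the target is kept whole in this sketch; a production splitter would need a further fallback, such as a hard token cut.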
Metadata prepending
Every chunk is injected with its identity before embedding. This gives the vector search implicit filtering power — a query like “risks in Apollo” naturally scores higher on chunks that contain “Apollo” in their identity prefix.
[Project: Apollo] [File: Site_Report.pdf] [Section: 3.2 Risks]
>> "Foundation crack detected in sector 7. Remediation estimated at $45K.
Structural engineer recommends immediate shoring before Phase 2 excavation."
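A prefix like the one above could be assembled as follows. The with_identity helper and its parameters are hypothetical; the bracketed tag layout simply mirrors the example.

```python
def with_identity(chunk_text, entity_type, entity_name, file_name=None, section=None):
    """Prefix a chunk with bracketed identity tags before it is embedded."""
    tags = [f"[{entity_type}: {entity_name}]"]
    if file_name:
        tags.append(f"[File: {file_name}]")
    if section:
        tags.append(f"[Section: {section}]")
    return " ".join(tags) + '\n>> "' + chunk_text + '"'


text = with_identity(
    "Foundation crack detected in sector 7.",
    "Project",
    "Apollo",
    file_name="Site_Report.pdf",
    section="3.2 Risks",
)
```

The prefixed string, not the raw chunk, is what gets sent to the embedding provider, so the entity and file names become part of the vector.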
Structured property snapshots
When entity properties change, the platform serialises the entity’s current state using its semantic template and embeds it as a MetadataSnapshot context row:
[Project: Apollo] [Properties]
>> "Project Apollo is in Planning phase. Budget: $2.4M. Timeline: Q3 2026.
Owner: Sarah Chen. Priority: High. 3 open risks, 12 completed tasks."
This snapshot replaces the previous one (upsert by entity + context type), keeping the vector index current without unbounded growth.
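The upsert behaviour can be illustrated with an in-memory stand-in for the context table. ContextStore and its field names are hypothetical; the sketch also assumes that replacing the content clears the stored vector so that the row is picked up for re-embedding.

```python
class ContextStore:
    """Hypothetical in-memory stand-in for the osy.EntityContext table."""

    def __init__(self):
        self.rows = []

    def upsert_snapshot(self, entity_type, entity_id, content):
        """Keep exactly one MetadataSnapshot row per entity instance."""
        key = (entity_type, entity_id, "MetadataSnapshot")
        for row in self.rows:
            if (row["entity_type"], row["entity_id"], row["context_type"]) == key:
                row["content"] = content
                row["embedding"] = None  # stale vector: must be re-embedded
                return row
        row = {
            "entity_type": entity_type,
            "entity_id": entity_id,
            "context_type": "MetadataSnapshot",
            "content": content,
            "embedding": None,
        }
        self.rows.append(row)
        return row


store = ContextStore()
store.upsert_snapshot("Project", "apollo", "Project Apollo is in Planning phase.")
store.rows[0]["embedding"] = [0.1, 0.2]  # pretend the embedding job has run
store.upsert_snapshot("Project", "apollo", "Project Apollo is in Execution phase.")
store.upsert_snapshot("Project", "zeus", "Project Zeus is in Planning phase.")
```

However many times Apollo changes, it contributes one snapshot row, which is what keeps the index from growing without bound.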
Ingestion trigger points
| Event | Action |
|---|---|
| File uploaded to entity | Normalise, chunk, embed, insert FileChunk context rows |
| File deleted | Delete FileChunk rows for that file |
| File re-uploaded | Delete old chunks, re-ingest |
| Entity property changed | Re-generate MetadataSnapshot from semantic template |
| Entity deleted | Delete all EntityContext rows for that entity |
All ingestion runs asynchronously via the platform’s background task queue. The entity remains immediately usable; the semantic shadow catches up within seconds.
Dual-layer ingestion: chunks for precision, distillation for comprehension
Every file is processed in two passes.
Pass 1 — Chunking (no LLM, fast): Split the normalised Markdown into 512-800 token chunks, prepend metadata identity, embed each chunk, store as FileChunk context rows. Pure text processing plus embedding API calls. Cost: embedding only (~$0.0001 per chunk).
Pass 2 — Distillation (cheap LLM, async): Send the full normalised Markdown (or the top chunks if the file exceeds the model’s window) to a small model with a structured prompt:
Summarise this document in 200-300 words. Include:
- What the document is (type, purpose)
- Key facts, numbers, and decisions
- Risks, blockers, or action items if any
- Who is mentioned and their roles
The response is embedded and stored as a single FileDistillation context row.
How retrieval uses both layers:
User query: "what are the risks in the construction project?"
Vector search returns:
1. FileChunk (0.91): "Foundation crack detected in sector 7..."
2. FileDistillation (0.85): "Site inspection report covering structural risks,
soil conditions, and remediation costs..."
3. FileChunk (0.83): "Permit delay expected: 3 additional weeks..."
4. FileDistillation (0.79): "Email thread between PM and county office regarding
permit approval timeline..."
The agent receives precise excerpts (#1, #3) for citation AND document-level context (#2, #4) for reasoning. It can say “the Site Report covers several structural risks” without having retrieved every chunk from that file.
The same pattern applies to MetadataSnapshot. The snapshot is the “distillation” of the entity’s structured state — a natural-language rendering of its properties. No LLM needed for this (the semantic template is deterministic), but the result is embedded in the same vector space, making structured data searchable alongside file content.
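Since every context type shares one vector space, a single ranking pass can mix them. The sketch below uses plain Python cosine similarity over toy two-dimensional vectors as a stand-in for the pgvector query; the row layout and the retrieve function are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(rows, query_vector, entity_id, limit=10):
    """Rank every embedded context row of one entity, regardless of type."""
    scored = [
        (cosine(query_vector, r["embedding"]), r)
        for r in rows
        if r["entity_id"] == entity_id and r["embedding"] is not None
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(s, 2), r["context_type"], r["content"]) for s, r in scored[:limit]]


rows = [
    {"entity_id": "apollo", "context_type": "FileChunk",
     "content": "Foundation crack detected in sector 7", "embedding": [1.0, 0.0]},
    {"entity_id": "apollo", "context_type": "FileDistillation",
     "content": "Site inspection report covering structural risks", "embedding": [0.8, 0.6]},
    {"entity_id": "apollo", "context_type": "MetadataSnapshot",
     "content": "Project Apollo is in Planning phase", "embedding": [0.0, 1.0]},
    {"entity_id": "apollo", "context_type": "FileChunk",
     "content": "Not embedded yet", "embedding": None},
    {"entity_id": "zeus", "context_type": "FileChunk",
     "content": "Belongs to another entity", "embedding": [1.0, 0.0]},
]
top = retrieve(rows, [1.0, 0.0], "apollo", limit=2)
```

Rows that have not been embedded yet are skipped rather than ranked, matching the idea that the semantic shadow catches up asynchronously.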
Cost model
| Entity scale | Files | Chunks | Distillations | Embedding cost | LLM cost | Total |
|---|---|---|---|---|---|---|
| Small (3 files, 50 pages) | 3 | ~150 | 3 | ~$0.015 | ~$0.003 | ~$0.02 |
| Medium (10 files, 200 pages) | 10 | ~600 | 10 | ~$0.06 | ~$0.01 | ~$0.07 |
| Large (50 files, 1000 pages) | 50 | ~3000 | 50 | ~$0.30 | ~$0.05 | ~$0.35 |
These are one-time ingestion costs. Retrieval adds no per-search charge: the vector search itself is a PostgreSQL query.
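The table rows follow from two unit prices. The sketch below reproduces them, using the ~$0.0001 per chunk figure from Pass 1 and a per-distillation price of ~$0.001, which is an assumption inferred from the table (about $0.003 for three files) and not a quoted rate.

```python
EMBED_COST_PER_CHUNK = 0.0001      # from Pass 1: ~$0.0001 per chunk
LLM_COST_PER_DISTILLATION = 0.001  # assumption: implied by the table rows


def ingestion_cost(chunks, distillations):
    """Return (embedding cost, LLM cost, total) in dollars, rounded to 4 places."""
    embedding = chunks * EMBED_COST_PER_CHUNK
    llm = distillations * LLM_COST_PER_DISTILLATION
    return round(embedding, 4), round(llm, 4), round(embedding + llm, 4)


small = ingestion_cost(chunks=150, distillations=3)
medium = ingestion_cost(chunks=600, distillations=10)
large = ingestion_cost(chunks=3000, distillations=50)
```

Embedding dominates at every scale in this model, so chunk count is the main cost lever.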
Background embedding job
A persistent background job handles all vector embedding work. It is not specific to any one feature — it is the single worker for all embedding operations across all applications.
What it processes:
- osy.EntityContext rows where Embedding IS NULL (new rows, model migration, re-chunked content)
- osy.EntityContext rows where EmbeddingModel != currentModel (incremental model upgrade if dimensions match)
How it works:
- Polls for unembedded rows: SELECT * FROM osy.entity_context WHERE embedding IS NULL ORDER BY __created_at LIMIT batch_size
- Batches content for the embedding API (most providers accept batch requests)
- Writes embedding + model name back to each row
- Respects rate limits per provider (configurable in the embedding: block)
Triggered by:
- Entity property change producing a new/updated MetadataSnapshot row with NULL embedding
- File upload producing new FileChunk context rows with NULL embeddings
- File distillation producing a new FileDistillation row with NULL embedding
- Embedding provider change setting all rows to NULL
- Manual re-embed command setting targeted rows to NULL
The job is a platform-level background task, not per-app — it processes all apps’ embedding queues. Rate limiting and batching are per-provider to respect API quotas.
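One polling pass of such a job might look like the sketch below. It works on an in-memory list instead of the database, fake_embed stands in for a provider's batch endpoint, and rate limiting is left out; all names are illustrative.

```python
def fake_embed(texts):
    """Stand-in for a provider's batch endpoint: one tiny vector per input."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


def run_embedding_pass(rows, embed_batch, model_name, batch_size=2):
    """Embed one batch of pending rows, oldest first. Returns rows embedded."""
    pending = sorted(
        (r for r in rows
         if r["embedding"] is None or r["embedding_model"] != model_name),
        key=lambda r: r["created_at"],
    )[:batch_size]
    if not pending:
        return 0
    vectors = embed_batch([r["content"] for r in pending])
    for row, vector in zip(pending, vectors):
        row["embedding"] = vector
        row["embedding_model"] = model_name  # record which model produced it
    return len(pending)


rows = [
    {"content": "a b", "created_at": 3, "embedding": None, "embedding_model": None},
    {"content": "c", "created_at": 1, "embedding": None, "embedding_model": None},
    {"content": "d e f", "created_at": 2, "embedding": [0.0, 0.0],
     "embedding_model": "old-model"},
]
passes = 0
while run_embedding_pass(rows, fake_embed, "new-model"):
    passes += 1
```

Because the job only looks at row state (missing vector or outdated model), every trigger listed above reduces to the same operation: leave a row in a pending state and let the next pass pick it up.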
Vector Dimensions and Model Changes
The Embedding column on osy.EntityContext is created with a fixed dimension matching the app’s configured embedding provider (e.g., vector(1536) for OpenAI text-embedding-3-small). The dimension is derived from the embedding: block’s dimensions setting at schema creation time.
All embedding providers used in an app must share the same dimension. This is a compile-time check — if entity A uses embedding: OpenAI (1536) and entity B uses embedding: Google (768), the compiler reports an error. One app = one vector dimension in osy.EntityContext.
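A compile-time check of this kind could be as simple as grouping entities by the dimension of the embedding block they reference. The check_dimensions function and the dictionary shapes below are assumptions; the real compiler's data structures and error wording are not shown in this article.

```python
def check_dimensions(embedding_blocks, entity_models):
    """Return an error message if referenced embedding blocks disagree on
    dimensions or are undefined; return None when the app is consistent."""
    used = {}
    for entity, block_name in entity_models.items():
        if block_name not in embedding_blocks:
            return f"{entity}: unknown embedding block '{block_name}'"
        dim = embedding_blocks[block_name]["dimensions"]
        used.setdefault(dim, []).append(entity)
    if len(used) > 1:
        detail = "; ".join(
            f"{dim}: {', '.join(ents)}" for dim, ents in sorted(used.items())
        )
        return f"mixed vector dimensions ({detail})"
    return None


blocks = {
    "PlatformEmbedding": {"dimensions": 1536},
    "GeminiEmbedding": {"dimensions": 768},
}
error = check_dimensions(
    blocks, {"Project": "PlatformEmbedding", "Contract": "GeminiEmbedding"}
)
ok = check_dimensions(
    blocks, {"Project": "PlatformEmbedding", "Contract": "PlatformEmbedding"}
)
```

Reporting which entities sit on each dimension tells the application builder exactly which embeddingModel reference to change.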
Switching embedding providers is a destructive migration — different models produce vectors in incompatible spaces, so old vectors are useless regardless of dimension:
- The platform detects that the embedding: block references a different model (or different dimensions)
- The schema evolution engine generates ALTER TABLE to resize the vector column
- All Embedding and EmbeddingModel values are set to NULL
- The vector index (HNSW/ivfflat) is dropped and recreated with the new dimension
- The background embedding job picks up all rows with NULL embeddings and re-embeds using the new provider
This is the correct trade-off: switching models is rare and intentional. The cost is a full re-embed (minutes to hours depending on row count), but the process is automatic and non-blocking — the app works during re-embedding, just without vector search results for rows that haven’t been processed yet.
Evolution: From Inline Vectors to Consolidated Architecture
The starting point: per-entity-table vectors
The platform’s first approach to semantic search embedded vectors inline on every entity table via three system columns:
- __semantic_vector (Vector): the embedding
- __semantic_cache (String): the rendered semantic template text
- __semantic_manifest (String/Json): dependency tracking for staleness detection
These columns were added to every entity automatically. The vector was generated synchronously during commit — the semantic manifest builder would render the template, then the embedding provider would generate the embedding, all in the same transaction.
The embedding provider infrastructure was already properly abstracted with an IEmbeddingProvider interface supporting both synchronous (local/fast) and asynchronous (remote, deferred) modes. The semantic manifest builder resolved {{Property}} tokens in templates and tracked dependency hashes for staleness.
What needed to change
Inline per-entity vectors had fundamental limitations:
- No file content — only entity properties were embedded. Documents attached to entities were invisible to vector search.
- No multi-type context — each entity had exactly one vector. You couldn’t search across property snapshots, file chunks, and chat messages in one query.
- No relationship awareness — the vector captured the entity’s own state but knew nothing about its children or parent.
- Scaling — adding a 1536-dimension vector column to every entity table, including those that didn’t need semantic search, added storage and index overhead everywhere.
The consolidation
The unified osy.EntityContext table replaced the per-entity columns. The migration:
- Remove __semantic_vector, __semantic_cache, __semantic_manifest from entity table provisioning
- Remove synchronous embedding from the commit path
- Keep the IEmbeddingProvider interface, reused unchanged for the background embedding job
- Keep the semantic manifest builder’s template resolution: the text rendering logic was correct; only the storage destination changed (entity column to osy.EntityContext row)
The existing embedding provider abstraction and template rendering were design assets. The new architecture reused them wholesale, just pointing the output at a different table.
The RAG compiler pipeline
The rag: block follows the same four-stage compiler pipeline as every other DSL block:
- Parser: parses the rag: block syntax into a RagBlockNode AST node with directives as key-value pairs
- Resolver: validates directive values (property references exist, types are correct, summaryInterval is a positive integer, statusProperty references a String or Int32 property)
- Emitter: creates/updates osy.RagConfiguration rows in the metadata database, injects Pulse properties when pulse = auto
- Decompiler: reads osy.RagConfiguration rows and reconstructs the rag: block syntax (round-trip support)
The same pattern applies to the embedding: block. Compilation validates that all embeddingModel references resolve to defined embedding: blocks, and that all embedding: blocks within an application share the same dimension.
The implementation was refined in phases. The first consolidated the data model and metadata (the entities, choices, and compiler pipeline). Later phases layered on the classification pipeline (paragraph-level LLM classification, classification-aware chunking with a four-pass pipeline, windowed processing for large documents), the citation system (stable chunk IDs, prompt injection, post-processing extraction, osy.Reference creation), reference graph traversal (BFS across osy.Reference links for cross-entity knowledge), status property aggregation for Pulse context enrichment, and rolling summaries for long conversations.
Part 2 covers Pulse synthesis and agent interactions. Part 3 covers security, citations, and knowledge graphs.