Entity-Anchored RAG

Architecture draft, updated May 19, 2026

The Problem: Structured vs. Unstructured Is a False Divide

Traditional RAG systems treat documents as the atomic unit — upload PDFs, chunk them, embed them, search them. The application’s structured data (database fields, relationships, status changes) lives in a completely separate world. An AI agent answering “what are the risks in this project?” has to somehow bridge the gap between a Risk entity in the database and a paragraph in a site inspection PDF.

The platform already knows both sides. It has the entity model (entities, properties, relations, security policies) and it has file attachments. The RAG layer unifies them: every entity instance casts a semantic shadow — a set of vector-embedded text chunks that capture both its structured state and its unstructured attachments, stored in a single table, searchable in a single query, governed by a single security model.

The design principles:

Entity-anchored — every piece of context is owned by a specific entity instance, not floating in a global index
Transactional consistency — vector data lives in the same PostgreSQL database (via pgvector) as the entity data it describes, sharing the same backup/restore boundary
Schema-agnostic — the platform doesn’t know what a “Project” or “Contract” is; the RAG layer works with any entity type defined in the application metadata model
Composable retrieval — structured properties, unstructured file chunks, and metadata snapshots are all searchable in the same vector space

The Context Shadow

Every entity instance casts a semantic shadow — a collection of vector-embedded text chunks stored in osy.EntityContext, a platform-level entity provisioned in every application’s data schema. The key insight is that context comes in different types, each with its own lifecycle and generation strategy.

Context types

Type	Source	Lifecycle
`MetadataSnapshot`	Entity properties serialised via semantic template	Replaced on every property change
`FileChunk`	Chunked text from attached PDF/DOCX/XLSX files	Replaced when file is re-uploaded or deleted
`PropertyChange`	Individual property mutation log	Append-only, trimmed by retention policy
`FileDistillation`	LLM-generated summary of all chunks from a single file	Replaced when file is re-uploaded
`ChatMessage`	Individual message in a conversation entity	Append-only, one row per message
`RollingSummary`	Periodic condensed summary of N child entities	Replaced every N children (configurable)
`Relationship`	Graph-aware context: entity relationships and neighbour summaries	Replaced when relationships change

The EntityContext entity

choice ContextType:
  MetadataSnapshot
  FileChunk
  PropertyChange
  FileDistillation
  ChatMessage
  RollingSummary
  Relationship

ensure entity osy.EntityContext:
  description = "Semantic shadow of entity instances — unified vector store
                 for structured and unstructured context"
  EntityType: osy.EntityMetadata required
  Entity: dynamic reference(EntityType) required
  ContextType: @ContextType required
  Content: String required
  Embedding: Vector
  EmbeddingModel: String(100)

Every context row is anchored to a specific entity instance via a dynamic reference — the EntityType property (an EntityRef to osy.EntityMetadata) tells the runtime which entity type the Entity property points to. No physical FK constraint is needed, but the metadata model maintains a proper typed relationship. This means a single table stores context for all entity types without per-type schema changes.

EmbeddingModel records which model produced the vector, enabling background re-embedding when the provider changes.

File chunks: separating extraction from embedding

File chunks are the text-extraction artefact — a structural fact about a file’s content, independent of the embedding. Separating chunks from context rows means you can re-embed without re-chunking (when switching embedding models) or re-chunk without re-embedding (when tuning chunk size).

ensure entity osy.FileChunk:
  description = "Text chunk extracted from a file asset for RAG ingestion"
  FileAsset: osy.FileAsset required
  ChunkIndex: Int32 required
  Content: String required
  PageNumber: Int32
  SectionHeader: String
  ChunkStrategy: String(50)
  ChunkParams: Json

ChunkStrategy and ChunkParams make chunking reproducible and versionable. When you change the splitting algorithm or tune parameters, a query like ChunkStrategy != 'recursive-v2' finds all chunks that need re-splitting.

When a FileChunk is embedded, the resulting osy.EntityContext row is anchored to the entity that owns the chunk’s file, not to the FileChunk itself. The chunk is an intermediate artefact; the context row belongs to the entity.

File asset ownership

Files use the same dynamic-reference pattern for polymorphic ownership — any entity type can own files:

entity Project:
  Name: String required
  Documents: collection(osy.FileAsset, Owner)

entity Milestone:
  Name: String required
  Project: Project required
  Documents: collection(osy.FileAsset, Owner)

A file belongs to exactly one entity. Uploading the same PDF to both a Project and a Milestone creates two FileAsset instances, each with its own semantic shadow scoped to its owner. This preserves the entity-anchored principle — each entity’s context is self-contained.

The RAG Block

Every entity type can opt into RAG by declaring a rag: block inside the entity definition. This is where the application builder controls what gets embedded, how context is generated, and how the Pulse (the AI-generated summary) behaves. Entities without a rag: block have no semantic shadow — no context rows, no Pulse, no vector search.

Syntax

The rag: block lives inside the entity definition, alongside the properties: and index: blocks:

entity Project:
  description "Project with full RAG support"
  semantic "Project {{Name}}: {{Phase}} phase, budget {{Budget}}"

  properties:
    Name: String required
    Budget: Decimal
    Phase: @ProjectPhase
    Documents: collection(osy.FileAsset, Owner)
    Tasks: collection(Task, Project)
    Risks: collection(Risk, Project)

  rag:
    context = auto
    files = auto
    relationships = auto
    pulse = auto
    distillation = auto
    coalesce = 30s

The minimal form — equivalent to “turn on the semantic shadow with defaults”:

entity Contract:
  properties:
    Name: String required

  rag:
    context = auto

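Declaring context = auto enables the entity’s semantic shadow. The other feature switches (files, distillation, relationships, pulse, children) default to off unless explicitly enabled. Tuning settings take the defaults listed in the table below; in particular, coalesce defaults to 30s whenever any RAG feature is active.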

Settings

Setting	Values	Default	Description
`context`	`auto` / `off`	required	Core switch. `auto` embeds the entity’s properties as a `MetadataSnapshot` context row using the `semantic` template. If no `semantic` directive exists, the platform generates a default template from all non-system properties.
`files`	`auto` / `off`	`off`	Chunk and embed attached files (via `osy.FileAsset` ownership). Creates `osy.FileChunk` rows plus `EntityContext` rows of type `FileChunk`, anchored to this entity.
`distillation`	`auto` / `off`	`off`	Generate LLM summaries of attached files. Creates `FileDistillation` context rows. Requires `files = auto`.
`relationships`	`auto` / `off` / CSV	`off`	Generate `Relationship` context from the entity’s collection properties. `auto` = all collections. CSV = explicit list of collection property paths.
`pulse`	`auto` / `off`	`off`	Enable Pulse synthesis. Adds `PulseContent`, `PulseGeneratedAt`, `PulseStaleSince` properties to the entity.
`children`	`auto` / `off`	`off`	Auto-embed child entities as `ChatMessage` context rows on the parent. Designed for conversation/message patterns where each child is a discrete unit of content.
`summaryInterval`	integer	`50`	When `children = auto`, generate a `RollingSummary` every N child entities.
`coalesce`	duration	`30s`	Staleness coalescing window. Changes within this window produce at most one regeneration.
`propagation`	`none` / `parent` / integer	`parent`	Staleness propagation depth. `parent` = depth 1 (direct parent only). Integer = explicit depth.
`pulseModel`	string	platform default	Which LLM to use for Pulse generation. References an `llm` block by name.
`pulsePrompt`	string	auto-generated	Entity-specific Pulse synthesis prompt.
`pulseLength`	`brief` / `standard` / `detailed` / integer	`standard`	Target Pulse length. `brief` = 1-2 sentences, `standard` = 50-100 words, `detailed` = 150-250 words. Integer = explicit word count.
`pulseTrackedProperties`	CSV	all	Which property changes trigger Pulse staleness. Default: any change. Use to ignore noise like `LastViewedAt`.
`statusProperty`	CSV	none	Which properties to aggregate for Pulse context enrichment. Produces a distribution summary (e.g., “Draft: 5, Review: 12”).
`embeddingModel`	string	platform default	Which embedding provider to use. References an `embedding` block by name.
`chunkSize`	integer	`512`	Target token count per file chunk. Legal/dense docs: 256. Long-form articles: 1000-1500.
`chunkOverlap`	integer	`15`	Overlap percentage between adjacent chunks.
`retrievalLimit`	integer	`10`	Top-K results from vector search for context assembly.
`recencyWindow`	integer	`10`	Last N chat messages included verbatim (not via vector search).
`distillationPrompt`	string	auto-generated	Prompt for LLM file summarisation. Override for domain-specific extraction.
`chatReferences`	bool	`false`	Entity chat agent cites its sources via `osy.Reference` rows.
`pulseReferences`	bool	`false`	Pulse cites its sources via `osy.Reference` rows.
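The coalesce and propagation settings can be illustrated with a small sketch. It assumes a trailing-window reading of coalescing: the first change opens a window of `coalesce` seconds, and one regeneration runs when the window closes, absorbing every change inside it. The entity hierarchy is modelled as a hypothetical child-to-parent map. The platform’s actual scheduler may differ.

```python
def coalesce_regenerations(change_times: list[float], window: float) -> list[float]:
    """Return the times at which regenerations run.

    The first change opens a window of `window` seconds; every change that
    arrives before the window closes is absorbed into one regeneration that
    runs at the window's end.
    """
    runs: list[float] = []
    window_end: float | None = None
    for t in sorted(change_times):
        if window_end is not None and t < window_end:
            continue  # absorbed into the pending regeneration
        window_end = t + window
        runs.append(window_end)
    return runs


def stale_ancestors(entity: str, parent_of: dict[str, str], depth: int) -> list[str]:
    """Walk up at most `depth` parent links (propagation = parent means depth 1)."""
    marked: list[str] = []
    current = entity
    for _ in range(depth):
        parent = parent_of.get(current)
        if parent is None:
            break  # reached the root before exhausting the depth
        marked.append(parent)
        current = parent
    return marked
```

With a 30-second window, five changes spread over 100 seconds produce three regenerations instead of five.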

Examples

Project management — full-featured RAG with tuned settings:

entity Project:
  description = "Project with document-backed RAG and Pulse"
  semantic = "Project {{Name}}: {{Phase}} phase, budget {{Budget}}"

  properties:
    Name: String(200) required
    Budget: Decimal
    Phase: @ProjectPhase
    Status: String(50)
    LastViewedAt: DateTime
    ParentProject: Project
    Documents: collection(osy.FileAsset, Owner)
    Tasks: collection(Task, Project)
    Risks: collection(Risk, Project)
    Findings: collection(Finding, Project)
    SubProjects: collection(Project, ParentProject)

  rag:
    context = auto
    files = auto
    distillation = auto
    distillationPrompt = "Extract key deliverables, deadlines, budget figures,
                          risk factors, and team responsibilities."
    relationships = "Tasks, Risks, SubProjects, SubProjects.Findings"
    pulse = auto
    pulsePrompt = "Summarise project health: budget vs actual, timeline risks,
                   blocked tasks, and team capacity."
    pulseTrackedProperties = "Budget, Phase, Status"
    statusProperty = Status
    chunkSize = 800
    retrievalLimit = 15
    coalesce = 30s

Chat/conversation — auto-embed child messages with rolling summaries:

entity Conversation:
  semantic "{{Title}}"

  properties:
    Title: String required
    Messages: collection(Message, Conversation)

  rag:
    context = auto
    children = auto
    summaryInterval = 25
    pulse = auto
    coalesce = 60s

entity Message:
  properties:
    Conversation: Conversation required
    Content: String required
    Author: String required

Simple document entity — just files, no relationships or Pulse:

entity Contract:
  semantic "Contract: {{Name}}"

  properties:
    Name: String required
    Documents: collection(osy.FileAsset, Owner)

  rag:
    context = auto
    files = auto
    distillation = auto

High-churn entity — long coalescing window, cheap Pulse model:

entity ActivityLog:
  properties:
    Action: String required
    Timestamp: DateTime required

  rag:
    context = auto
    coalesce = 120s
    pulse = auto
    pulseModel = "haiku"

Metadata model

The rag: block emits to osy.RagConfiguration, a platform entity with one row per entity type that has a rag: block:

ensure entity osy.RagConfiguration:
  description = "RAG configuration for an entity type"
  Entity: osy.EntityMetadata required unique
  ContextMode: @RagMode required
  FilesMode: @RagMode required
  DistillationMode: @RagMode required
  RelationshipsMode: @RagMode required
  PulseMode: @RagMode required
  ChildrenMode: @RagMode required
  SummaryInterval: Int32
  CoalesceSeconds: Int32
  PropagationDepth: Int32
  PulseModel: String(100)
  StatusProperty: String(200)

choice RagMode:
  Auto
  Off

The emitter creates/updates this row during compilation. The runtime reads osy.RagConfiguration to decide which ingestion pipelines to activate for each entity type. Entities without a configuration row are inert — no context rows, no background processing.
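A sketch of how an emitter might translate parsed rag: directives into an osy.RagConfiguration row. Only the column names come from the entity above. The directive dictionary, the `parse_duration` helper, mapping propagation = none to depth 0, and treating a CSV relationships list as Auto are all assumptions made for illustration.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> int:
    """Convert a DSL duration such as '30s' or '2m' to whole seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"invalid duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit]


def parse_propagation(value: str) -> int:
    """'none' -> 0, 'parent' -> 1, integer string -> explicit depth."""
    if value == "none":
        return 0
    if value == "parent":
        return 1
    depth = int(value)
    if depth < 0:
        raise ValueError("propagation depth must be non-negative")
    return depth


def mode(directives: dict[str, str], key: str) -> str:
    """Feature switches default to Off; any value other than 'off' maps to Auto."""
    return "Off" if directives.get(key, "off") == "off" else "Auto"


def emit_rag_configuration(entity: str, directives: dict[str, str]) -> dict[str, object]:
    """Build one osy.RagConfiguration row (as a dict) for an entity type."""
    if "context" not in directives:
        raise ValueError("rag: block requires a context directive")
    children_on = directives.get("children", "off") != "off"
    return {
        "Entity": entity,
        "ContextMode": mode(directives, "context"),
        "FilesMode": mode(directives, "files"),
        "DistillationMode": mode(directives, "distillation"),
        "RelationshipsMode": mode(directives, "relationships"),
        "PulseMode": mode(directives, "pulse"),
        "ChildrenMode": mode(directives, "children"),
        # SummaryInterval only matters when children = auto (assumption).
        "SummaryInterval": int(directives.get("summaryInterval", "50")) if children_on else None,
        "CoalesceSeconds": parse_duration(directives.get("coalesce", "30s")),
        "PropagationDepth": parse_propagation(directives.get("propagation", "parent")),
        "PulseModel": directives.get("pulseModel"),
        "StatusProperty": directives.get("statusProperty"),
    }
```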

Interaction with the `semantic` directive

The existing semantic directive on an entity defines the template used for MetadataSnapshot content. The rag: block’s context = auto activates the embedding of that template output. These are intentionally separate concerns:

The semantic directive defines what the natural-language rendering looks like (template authoring).
The rag: block’s context = auto decides whether that rendering gets embedded into the vector store.

An entity can have a semantic template without a rag: block — the template is still useful for display purposes (entity cards, search result previews). Conversely, context = auto without an explicit semantic template generates a default rendering from all non-system properties.
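A minimal sketch of {{Property}} token resolution and of the default-template fallback. It assumes properties arrive as a dict and that system properties can be recognised either by a leading `__` (as with the older `__semantic_*` columns) or by membership in a hypothetical `SYSTEM_PROPERTIES` set. The platform’s manifest builder also tracks dependency hashes, which this sketch omits.

```python
import re

TOKEN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

# Hypothetical set of system-managed properties excluded from default templates.
SYSTEM_PROPERTIES = {"PulseContent", "PulseGeneratedAt", "PulseStaleSince"}


def render(template: str, props: dict[str, object]) -> str:
    """Replace each {{Name}} token with the property value; missing values render empty."""
    def substitute(match: re.Match[str]) -> str:
        value = props.get(match.group(1))
        return "" if value is None else str(value)
    return TOKEN.sub(substitute, template)


def default_template(entity_name: str, props: dict[str, object]) -> str:
    """Build 'Entity: A {{A}}, B {{B}}' from every non-system property."""
    names = [
        n for n in props
        if not n.startswith("__") and n not in SYSTEM_PROPERTIES
    ]
    fields = ", ".join(f"{n} {{{{{n}}}}}" for n in names)
    return f"{entity_name}: {fields}"
```

Keeping template generation and rendering as separate steps mirrors the separation above: the template exists for display whether or not context = auto embeds its output.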

Pulse property injection

When pulse = auto is set, the emitter automatically adds three properties to the entity during compilation:

PulseContent: String      -- AI-generated summary of entity state
PulseGeneratedAt: DateTime -- When the Pulse was last generated
PulseStaleSince: DateTime  -- When the Pulse became stale (null = fresh)

These properties are system-managed — user code can read but not write them directly. The decompiler does not emit these properties in the properties: block; they’re implicit from pulse = auto. When an existing entity gains pulse = auto, the schema evolution engine generates the ALTER TABLE DDL for the new columns automatically.

Embedding provider block

Embeddings need their own provider configuration, parallel to the existing llm: block:

embedding PlatformEmbedding:
  provider = OpenAI
  model = "text-embedding-3-small"
  apiKey = @secret EmbeddingApiKey
  dimensions = 1536

embedding GeminiEmbedding:
  provider = Google
  model = "text-embedding-004"
  apiKey = @secret GcpApiKey
  dimensions = 768

The rag: block references these by name:

entity Project:
  rag:
    context = auto
    embeddingModel = PlatformEmbedding
    pulseModel = CheapLlm

Application-level defaults in the application: block:

application MyApp:
  rag:
    defaultEmbedding = PlatformEmbedding
    defaultPulseModel = CheapLlm
    defaultCoalesce = 30s

Per-entity settings override the application defaults. If neither is set, the platform uses a built-in default (the platform operator’s configured embedding provider).
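The override order can be expressed as a simple lookup chain. In this sketch the setting names mirror the rag: and application: blocks above. The `PLATFORM_DEFAULTS` values (`platform-embedding`, `platform-llm`) are placeholders for whatever the platform operator configures.

```python
# Maps per-entity setting names to their application-level default names.
APP_DEFAULT_KEYS = {
    "embeddingModel": "defaultEmbedding",
    "pulseModel": "defaultPulseModel",
    "coalesce": "defaultCoalesce",
}

# Placeholder platform defaults; real values come from the platform operator.
PLATFORM_DEFAULTS = {
    "embeddingModel": "platform-embedding",
    "pulseModel": "platform-llm",
    "coalesce": "30s",
}


def effective_setting(name: str, entity_rag: dict[str, str], app_rag: dict[str, str]) -> str:
    """Entity setting wins, then the application default, then the platform default."""
    if name in entity_rag:
        return entity_rag[name]
    app_key = APP_DEFAULT_KEYS.get(name)
    if app_key is not None and app_key in app_rag:
        return app_rag[app_key]
    return PLATFORM_DEFAULTS[name]
```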

Why a separate embedding: block rather than reusing llm:? Embedding models and chat/completion models are different services with different pricing, rate limits, and capabilities. An app might use OpenAI for embeddings but Anthropic for Pulse generation. The API key for embeddings might differ from the LLM key. And the embedding: block has embedding-specific settings (dimensions) that don’t apply to LLMs.

The Ingestion Pipeline

File normalisation

All file assets are converted to Markdown before chunking. Markdown preserves the structural hierarchy (headers, tables, lists) that LLMs rely on for reasoning.

Source format	Conversion strategy
PDF	Extract text with layout preservation. Use LLM-based extraction for complex layouts.
DOCX	Parse XML structure to Markdown. Headings, tables, and lists map directly.
XLSX	Each sheet becomes a Markdown table. Sheet name becomes a heading.
Images	OCR to Markdown (for scanned documents) or vision model description.
Plain text / CSV	Minimal transformation — wrap in Markdown structure.
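The XLSX rule is simple enough to sketch directly. This version assumes the workbook has already been read into a mapping of sheet name to rows of cell values (the reading step is out of scope), treats the first row as the header, uses a level-2 heading for the sheet name (an arbitrary choice), and escapes pipe characters so cell text cannot break the table.

```python
def sheet_to_markdown(sheet_name: str, rows: list[list[object]]) -> str:
    """Render one sheet as a '## <sheet name>' heading followed by a Markdown table."""
    if not rows:
        return f"## {sheet_name}\n"

    def cell(value: object) -> str:
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    # Pad short rows so every table row has the same number of columns.
    width = max(len(r) for r in rows)
    padded = [list(r) + [None] * (width - len(r)) for r in rows]
    header, *body = padded
    lines = [
        f"## {sheet_name}",
        "",
        "| " + " | ".join(cell(v) for v in header) + " |",
        "|" + "---|" * width,
    ]
    lines += ["| " + " | ".join(cell(v) for v in r) + " |" for r in body]
    return "\n".join(lines) + "\n"


def workbook_to_markdown(sheets: dict[str, list[list[object]]]) -> str:
    """Each sheet becomes its own section, in workbook order."""
    return "\n".join(sheet_to_markdown(name, rows) for name, rows in sheets.items())
```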

Recursive three-tier splitting

Three-tier splitting ensures chunks are semantically coherent:

Tier 1 — Header split: Split by Markdown headers (#, ##, ###). Each section becomes a candidate chunk.
Tier 2 — Paragraph split: If a section exceeds the target size, split by paragraph boundaries (\n\n).
Tier 3 — Sentence split: If a paragraph still exceeds the target, split by sentence boundaries.

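Target chunk size: 512 tokens by default (the chunkSize setting; 512-800 is the typical range), with 15% overlap so that text near a boundary appears in both adjacent chunks and a concept is less likely to be cut in half.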
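The three tiers can be sketched compactly. This sketch assumes token counts are approximated by whitespace-separated words (a real implementation would use the embedding model’s tokenizer) and applies overlap by carrying the tail of the previous chunk forward. The function names and the sentence regex are illustrative, not the platform’s splitter. A single sentence longer than the target is kept whole.

```python
import re


def n_tokens(text: str) -> int:
    return len(text.split())  # word count as a stand-in for tokens


def split_headers(md: str) -> list[str]:
    # Tier 1: split before lines starting with #, ## or ###.
    return [p for p in re.split(r"(?m)^(?=#{1,3} )", md) if p.strip()]


def split_paragraphs(text: str) -> list[str]:
    # Tier 2: blank-line paragraph boundaries.
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(text: str) -> list[str]:
    # Tier 3: naive sentence boundaries after . ! or ?
    return re.split(r"(?<=[.!?])\s+", text.strip())


def pack(pieces: list[str], target: int, sep: str) -> list[str]:
    """Greedily join consecutive pieces while the result stays within target."""
    out: list[str] = []
    current: list[str] = []
    for piece in pieces:
        if current and n_tokens(sep.join(current + [piece])) > target:
            out.append(sep.join(current))
            current = []
        current.append(piece)
    if current:
        out.append(sep.join(current))
    return out


def split_section(section: str, target: int) -> list[str]:
    if n_tokens(section) <= target:
        return [section.strip()]
    units: list[str] = []
    for para in split_paragraphs(section):
        if n_tokens(para) <= target:
            units.append(para.strip())
        else:
            units.extend(pack(split_sentences(para), target, " "))
    return pack(units, target, "\n\n")


def add_overlap(chunks: list[str], overlap_pct: int) -> list[str]:
    """Prefix each chunk with the last overlap_pct% of the previous chunk's words."""
    result = chunks[:1]
    for prev, cur in zip(chunks, chunks[1:]):
        words = prev.split()
        k = len(words) * overlap_pct // 100
        result.append(" ".join(words[-k:]) + " " + cur if k else cur)
    return result


def chunk_markdown(md: str, target: int = 512, overlap_pct: int = 15) -> list[str]:
    chunks: list[str] = []
    for section in split_headers(md):
        chunks.extend(split_section(section, target))
    return add_overlap(chunks, overlap_pct)
```

Note that the overlap prefix pushes a chunk slightly past the target; the target applies to the chunk’s own content.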

Metadata prepending

Every chunk is prefixed with its identity (owning entity, file, section) before embedding. This gives vector search implicit filtering power: a query like “risks in Apollo” naturally scores higher on chunks that contain “Apollo” in their identity prefix.

[Project: Apollo] [File: Site_Report.pdf] [Section: 3.2 Risks]
>> "Foundation crack detected in sector 7. Remediation estimated at $45K.
    Structural engineer recommends immediate shoring before Phase 2 excavation."

Structured property snapshots

When entity properties change, the platform serialises the entity’s current state using its semantic template and embeds it as a MetadataSnapshot context row:

[Project: Apollo] [Properties]
>> "Project Apollo is in Planning phase. Budget: $2.4M. Timeline: Q3 2026.
    Owner: Sarah Chen. Priority: High. 3 open risks, 12 completed tasks."

This snapshot replaces the previous one (upsert by entity + context type), keeping the vector index current without unbounded growth.
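The replace-versus-append distinction from the context types table comes down to the key used on write. A minimal in-memory sketch, assuming single-row types are keyed by (entity type, entity id, context type). Per-file types such as FileChunk and FileDistillation and retention trimming are left out. A PostgreSQL version would typically express the single-row rule as INSERT ... ON CONFLICT against a unique index, which is not shown here.

```python
from collections import defaultdict

# Context types holding exactly one row per entity, replaced on every write.
SINGLE_ROW_TYPES = {"MetadataSnapshot", "Relationship"}
# Context types that grow over time (retention trimming is not shown).
APPEND_TYPES = {"PropertyChange", "ChatMessage"}

EntityKey = tuple[str, int]  # (entity type, entity id)


class ContextStore:
    def __init__(self) -> None:
        self.rows: dict[EntityKey, dict[str, list[dict]]] = defaultdict(lambda: defaultdict(list))

    def write(self, entity: EntityKey, context_type: str, content: str) -> None:
        row = {"content": content, "embedding": None}  # None = queued for embedding
        if context_type in SINGLE_ROW_TYPES:
            self.rows[entity][context_type] = [row]  # upsert: replace the previous row
        elif context_type in APPEND_TYPES:
            self.rows[entity][context_type].append(row)
        else:
            raise ValueError(f"context type not covered by this sketch: {context_type}")

    def delete_entity(self, entity: EntityKey) -> None:
        """Entity deleted: drop its whole semantic shadow."""
        self.rows.pop(entity, None)
```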

Ingestion trigger points

Event	Action
File uploaded to entity	Normalise, chunk, embed, insert `FileChunk` context rows
File deleted	Delete the `FileChunk` rows (and any `FileDistillation` row) for that file
File re-uploaded	Delete old chunks, re-ingest
Entity property changed	Re-generate `MetadataSnapshot` from semantic template
Entity deleted	Delete all `EntityContext` rows for that entity

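All ingestion runs asynchronously via the platform’s background task queue. The entity remains immediately usable; the semantic shadow catches up once the queued work runs, typically within seconds, though the coalescing window and embedding-provider rate limits can add delay.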

Dual-layer ingestion: chunks for precision, distillation for comprehension

Every file is processed in two passes.

Pass 1 — Chunking (no LLM, fast): Split the normalised Markdown into 512-800 token chunks, prepend metadata identity, embed each chunk, store as FileChunk context rows. Pure text processing plus embedding API calls. Cost: embedding only (~$0.0001 per chunk).

Pass 2 — Distillation (cheap LLM, async): Send the full normalised Markdown (or the top chunks if the file exceeds the model’s window) to a small model with a structured prompt:

Summarise this document in 200-300 words. Include:
- What the document is (type, purpose)
- Key facts, numbers, and decisions
- Risks, blockers, or action items if any
- Who is mentioned and their roles

The response is embedded and stored as a single FileDistillation context row.
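A sketch of how the distillation request might be assembled. It approximates tokens by word count and reads “top chunks” as the leading chunks in document order, which is only one possible interpretation; a ranking step could be substituted. `max_input_tokens` is a hypothetical per-model limit, and the `prompt` parameter stands in for a distillationPrompt override.

```python
DISTILLATION_PROMPT = """Summarise this document in 200-300 words. Include:
- What the document is (type, purpose)
- Key facts, numbers, and decisions
- Risks, blockers, or action items if any
- Who is mentioned and their roles"""


def distillation_input(markdown: str, chunks: list[str], max_input_tokens: int,
                       prompt: str = DISTILLATION_PROMPT) -> str:
    """Send the whole document if it fits, otherwise as many leading chunks as fit."""
    budget = max_input_tokens - len(prompt.split())
    if len(markdown.split()) <= budget:
        body = markdown
    else:
        selected: list[str] = []
        used = 0
        for chunk in chunks:
            size = len(chunk.split())
            if used + size > budget:
                break
            selected.append(chunk)
            used += size
        body = "\n\n".join(selected)
    return f"{prompt}\n\n---\n\n{body}"
```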

How retrieval uses both layers:

User query: "what are the risks in the construction project?"

Vector search returns:
  1. FileChunk       (0.91): "Foundation crack detected in sector 7..."
  2. FileDistillation (0.85): "Site inspection report covering structural risks,
                               soil conditions, and remediation costs..."
  3. FileChunk       (0.83): "Permit delay expected: 3 additional weeks..."
  4. FileDistillation (0.79): "Email thread between PM and county office regarding
                               permit approval timeline..."

The agent receives precise excerpts (#1, #3) for citation AND document-level context (#2, #4) for reasoning. It can say “the Site Report covers several structural risks” without having retrieved every chunk from that file.

The same pattern applies to MetadataSnapshot. The snapshot is the “distillation” of the entity’s structured state — a natural-language rendering of its properties. No LLM needed for this (the semantic template is deterministic), but the result is embedded in the same vector space, making structured data searchable alongside file content.

Cost model

Entity scale	Files	Chunks	Distillations	Embedding cost	LLM cost	Total
Small (3 files, 50 pages)	3	~150	3	~$0.015	~$0.003	~$0.02
Medium (10 files, 200 pages)	10	~600	10	~$0.06	~$0.01	~$0.07
Large (50 files, 1000 pages)	50	~3000	50	~$0.30	~$0.05	~$0.35

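These are one-time ingestion costs, re-incurred only when files are re-uploaded, re-chunked, or re-embedded after a model change. Retrieval adds one embedding call per query (for the query text); the vector search itself is a PostgreSQL query with no per-call API cost.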
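The table’s arithmetic can be reproduced from the unit figures implied by its rows: about 3 chunks per page, about $0.0001 per embedded chunk, and about $0.001 per distillation call. These are the table’s planning estimates, not quoted provider prices, so substitute current rates for the configured models before relying on the totals. As in the table, the single embedding per distillation is ignored.

```python
EMBED_COST_PER_CHUNK = 0.0001  # planning estimate implied by the table
LLM_COST_PER_FILE = 0.001      # one distillation call per file
CHUNKS_PER_PAGE = 3            # ~150 chunks for 50 pages


def ingestion_cost(files: int, pages: int) -> dict[str, float]:
    """One-time ingestion cost for an entity with the given files and pages."""
    chunks = pages * CHUNKS_PER_PAGE
    embedding = chunks * EMBED_COST_PER_CHUNK
    llm = files * LLM_COST_PER_FILE
    return {"chunks": chunks, "embedding": embedding, "llm": llm, "total": embedding + llm}
```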

Background embedding job

A persistent background job handles all vector embedding work. It is not specific to any one feature — it is the single worker for all embedding operations across all applications.

What it processes:

osy.EntityContext rows where Embedding IS NULL (new rows, model migration, re-chunked content)
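osy.EntityContext rows where EmbeddingModel != currentModel (incremental model upgrade when dimensions match; until re-embedded, such rows should be excluded from search like NULL rows, because vectors from different models are not comparable)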

How it works:

Polls for unembedded rows: SELECT * FROM osy.entity_context WHERE embedding IS NULL ORDER BY __created_at LIMIT batch_size
Batches content for the embedding API (most providers accept batch requests)
Writes embedding + model name back to each row
Respects rate limits per provider (configurable in the embedding: block)

Triggered by:

Entity property change producing a new/updated MetadataSnapshot row with NULL embedding
File upload producing new FileChunk context rows with NULL embeddings
File distillation producing a new FileDistillation row with NULL embedding
Embedding provider change setting all rows to NULL
Manual re-embed command setting targeted rows to NULL

The job is a platform-level background task, not per-app — it processes all apps’ embedding queues. Rate limiting and batching are per-provider to respect API quotas.

Vector Dimensions and Model Changes

The Embedding column on osy.EntityContext is created with a fixed dimension matching the app’s configured embedding provider (e.g., vector(1536) for OpenAI text-embedding-3-small). The dimension is derived from the embedding: block’s dimensions setting at schema creation time.

All embedding providers used in an app must share the same dimension. This is a compile-time check: if entity A uses embeddingModel = PlatformEmbedding (1536) and entity B uses embeddingModel = GeminiEmbedding (768), the compiler reports an error. One app = one vector dimension in osy.EntityContext.

Switching embedding providers is a destructive migration — different models produce vectors in incompatible spaces, so old vectors are useless regardless of dimension:

The platform detects that the embedding: block references a different model (or different dimensions)
The vector index (HNSW/ivfflat) is dropped
All Embedding and EmbeddingModel values are set to NULL
If the dimension changed, the schema evolution engine generates ALTER TABLE to resize the vector column (clearing values first means no old vector has to be cast to the new dimension)
The vector index is recreated with the new dimension
The background embedding job picks up all rows with NULL embeddings and re-embeds them using the new provider
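The SQL for such a migration could look like the plan generated below. Table and column names follow the `osy.entity_context` / `embedding` naming used in the polling query; the `embedding_model` column name, the index name, and the choice of an HNSW index with cosine distance are assumptions. Values are cleared before the column is resized because an ALTER on a column still holding 1536-dimension values would fail pgvector’s dimension check for the new type.

```python
def migration_plan(new_dims: int, index_name: str = "entity_context_embedding_hnsw") -> list[str]:
    """Ordered SQL for switching the embedding column to a new model/dimension.

    index_name is a hypothetical name; HNSW with cosine distance is illustrative.
    """
    if new_dims <= 0:
        raise ValueError("dimension must be positive")
    return [
        f"DROP INDEX IF EXISTS osy.{index_name};",
        # Clear vectors first so the resize never has to cast old values.
        "UPDATE osy.entity_context SET embedding = NULL, embedding_model = NULL;",
        f"ALTER TABLE osy.entity_context ALTER COLUMN embedding TYPE vector({new_dims});",
        f"CREATE INDEX {index_name} ON osy.entity_context "
        "USING hnsw (embedding vector_cosine_ops);",
    ]
```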

This is the correct trade-off: switching models is rare and intentional. The cost is a full re-embed (minutes to hours depending on row count), but the process is automatic and non-blocking — the app works during re-embedding, just without vector search results for rows that haven’t been processed yet.

Evolution: From Inline Vectors to Consolidated Architecture

The starting point: per-entity-table vectors

The platform’s first approach to semantic search embedded vectors inline on every entity table via three system columns:

__semantic_vector (Vector) — the embedding
__semantic_cache (String) — the rendered semantic template text
__semantic_manifest (String/Json) — dependency tracking for staleness detection

These columns were added to every entity automatically. The vector was generated synchronously during commit — the semantic manifest builder would render the template, then the embedding provider would generate the embedding, all in the same transaction.

The embedding provider infrastructure was already properly abstracted with an IEmbeddingProvider interface supporting both synchronous (local/fast) and asynchronous (remote, deferred) modes. The semantic manifest builder resolved {{Property}} tokens in templates and tracked dependency hashes for staleness.

What needed to change

Inline per-entity vectors had fundamental limitations:

No file content — only entity properties were embedded. Documents attached to entities were invisible to vector search.
No multi-type context — each entity had exactly one vector. You couldn’t search across property snapshots, file chunks, and chat messages in one query.
No relationship awareness — the vector captured the entity’s own state but knew nothing about its children or parent.
Scaling — adding a 1536-dimension vector column to every entity table, including those that didn’t need semantic search, added storage and index overhead everywhere.

The consolidation

The unified osy.EntityContext table replaced the per-entity columns. The migration:

Remove __semantic_vector, __semantic_cache, __semantic_manifest from entity table provisioning
Remove synchronous embedding from the commit path
Keep the IEmbeddingProvider interface — reused unchanged for the background embedding job
Keep the semantic manifest builder’s template resolution — the text rendering logic was correct; only the storage destination changed (entity column to osy.EntityContext row)

The existing embedding provider abstraction and template rendering were design assets. The new architecture reused them wholesale, just pointing the output at a different table.

The RAG compiler pipeline

The rag: block follows the same four-stage compiler pipeline as every other DSL block:

Parser — rag: block syntax parsed into a RagBlockNode AST node with directives as key-value pairs
Resolver — validates directive values (property references exist, types are correct, summaryInterval is a positive integer, statusProperty references a String or Int32 property)
Emitter — creates/updates osy.RagConfiguration rows in the metadata database, injects Pulse properties when pulse = auto
Decompiler — reads osy.RagConfiguration rows and reconstructs the rag: block syntax (round-trip support)

The same pattern applies to the embedding: block. Compilation validates that all embeddingModel references resolve to defined embedding: blocks, and that all embedding: blocks referenced within an application share the same dimension.

The implementation was refined in several phases. First, the data model and metadata were consolidated (the entities, choices, and compiler pipeline). Next came the classification pipeline (paragraph-level LLM classification, classification-aware chunking with a four-pass pipeline, windowed processing for large documents), followed by the citation system (stable chunk IDs, prompt injection, post-processing extraction, osy.Reference creation), reference graph traversal (BFS across osy.Reference links for cross-entity knowledge), status property aggregation for Pulse context enrichment, and rolling summaries for long conversations.

Part 2 covers Pulse synthesis and agent interactions. Part 3 covers security, citations, and knowledge graphs.