RAG Part 2

architecture draft Updated May 19, 2026

The Pulse

The Pulse is the primary UX feature — a state-aware, auto-refreshing synthesis that tells the user “here’s what’s going on with this entity right now.” It combines structured data, unstructured documents, and recent changes into a single narrative. Zero cost at view time — it’s pre-generated and cached on the entity.

The synthesis loop

┌─────────────┐     ┌──────────────┐     ┌────────────────┐     ┌──────────┐
│  Trigger     │────>│  Retrieval   │────>│  Prompt        │────>│  Cache   │
│  (stale?)    │     │  (vectors)   │     │  Assembly      │     │  (store) │
└─────────────┘     └──────────────┘     └────────────────┘     └──────────┘

Step 1 — Trigger. An event (property change, file upload, manual refresh) flags the entity’s Pulse as stale. The staleness flag is a lightweight field on the entity (PulseStaleSince: DateTime?).

Step 2 — Retrieval. The system performs a constrained vector search — query osy.EntityContext filtered to the target entity, ordered by cosine similarity to the query embedding (derived from the semantic template + “summarise the current state”), limited to the top-K results.

Step 3 — Prompt assembly. The prompt is compositional, built from layers:

Layer	Source	Example
System instructions	Platform base persona	”You are an assistant for the Osyrin platform. Be concise, cite sources.”
Semantic template	Entity-type-specific “what to look for"	"Identify budget risks, timeline blockers, and recent activity.”
Structured context	Serialised JSON of entity properties	`{ "Name": "Apollo", "Budget": 2400000, "Phase": "Planning" }`
Unstructured context	Top-K relevant chunks from vector search	File excerpts, previous summaries, change logs
User question (optional)	If Pulse is triggered by a user query	”What are the risks?”

Step 4 — Generation. The LLM generates the Pulse with citations:

**Project Apollo — Pulse**

The project is in **Planning** phase with a $2.4M budget. Three risks
are currently open:

1. **Foundation crack in sector 7** — remediation estimated at $45K.
   Structural engineer recommends immediate shoring before Phase 2.
   [Site_Report.pdf, p.12]

2. **Permit delay** — county review expected to take 3 additional weeks.
   [Email_Thread_Permits.pdf, p.2]

3. **Subcontractor availability** — electrical crew not confirmed for Q3.
   [Vendor_Status.xlsx, Sheet: Electrical]

*Last updated: 2 minutes ago*

Step 5 — Cache. The generated Pulse is stored on the entity (PulseContent, PulseGeneratedAt). Subsequent reads serve the cached version until the staleness flag is set again.

Staleness strategy

Event	Behaviour
Property change on entity	Immediate: set `PulseStaleSince = now()`
File uploaded/deleted	Deferred: set stale after ingestion pipeline completes
Manual refresh (user clicks “Refresh”)	Immediate: bypass cache, regenerate
Time-based	Optional: auto-refresh if Pulse is older than configurable threshold (e.g., 24h)
Dependent entity change	Optional: if a child entity changes, mark parent stale (e.g., Task closed → Project Pulse stale)

Cost control

Four mechanisms keep Pulse generation economically viable:

Token budget. Cap the unstructured context window at ~4K tokens, leaving room for structured context and the system prompt within an 8K model window.

Debounce and coalescing. Multiple property changes within the coalescing window produce exactly one regeneration. The coalescing window is configurable per entity type — high-churn entities might use 120 seconds, low-churn entities might use 10 seconds. The default of 30 seconds balances responsiveness with cost.

Tiered models. Use a fast/cheap model (Haiku/Flash) for routine Pulse updates, reserve Sonnet/Pro for user-initiated deep queries:

entity ActivityLog:
  properties:
    Action: String required
    Timestamp: DateTime required

  rag:
    context = auto
    coalesce = 120s
    pulse = auto
    pulseModel: "haiku"

Skip if unchanged (retrieval-set fingerprinting). Before calling the LLM, hash the actual retrieval result set and compare it to the hash stored with the cached Pulse. If the hashes match, the Pulse is still valid — serve it without regeneration.

The key subtlety: the hash must be computed on the retrieval result set (the specific context row IDs and their content), not on a derived input like “structured properties + top-K query.” The top-K result set can change without any ingestion event on this entity — for example, a sibling entity’s chunk gets re-embedded and now ranks higher in the similarity search, displacing a previously-included chunk. Hashing the query inputs would miss this shift and serve a stale Pulse. Hashing the actual result set catches it. The cost of the extra vector query is negligible compared to the LLM call it potentially skips.

Per-entity Pulse prompts

The default Pulse prompt is generic: “Summarise the current state of this entity, including risks and recent changes.” This works for project management but is wrong for a social recipe app or a customer support queue.

The pulsePrompt setting lets the app builder define what the Pulse should focus on per entity type:

entity Project:
  rag:
    context = auto
    pulse = auto
    pulsePrompt = "Summarise project health: budget vs actual, timeline risks,
                   blocked tasks, and team capacity. Highlight anything that
                   needs executive attention."

entity Recipe:
  rag:
    context = auto
    pulse = auto
    pulsePrompt = "Create an engaging summary of this recipe: key ingredients,
                   difficulty level, what makes it special, and recent community
                   comments or tips."

entity SupportTicket:
  rag:
    context = auto
    pulse = auto
    pulsePrompt = "Summarise ticket status: customer sentiment, resolution
                   attempts so far, escalation risk, and suggested next action."

If omitted, the platform generates a generic prompt from the entity’s semantic template and property types.

Pulse property injection

When pulse = auto is set, the compiler automatically adds three properties to the entity:

PulseContent: String     — AI-generated summary of entity state
PulseGeneratedAt: DateTime — When the Pulse was last generated
PulseStaleSince: DateTime  — When the Pulse became stale (null = fresh)

These are system-managed — user code can read but not write them directly. The decompiler does NOT emit these properties in the properties: block — they’re implicit from pulse = auto. This keeps the source clean and the properties clearly platform-managed.

When an existing entity gains pulse = auto (e.g., upgrading from a version without Pulse), the schema evolution engine generates ALTER TABLE ADD COLUMN DDL for the three new columns. No migration script needed — it’s the same path as adding any new property.

The `statusProperty` directive

When Pulse generates a summary, it often needs to produce an aggregate breakdown like “5 orders in Draft, 12 in Review, 3 in Shipped.” This requires knowing which property represents the entity’s lifecycle state.

The statusProperty directive declares which property to aggregate:

entity Order:
  Name: String(200)
  Status: String(50) choice = OrderStatus
  Total: Decimal
  rag:
    context = auto
    pulse = auto
    statusProperty = Status

When set, the Pulse context assembler runs a GROUP BY aggregate on the status property and prepends the result to the context before calling the LLM:

## Status Distribution
Draft: 5
Pending Review: 12
Approved: 8
Shipped: 3
Cancelled: 1

This gives the LLM concrete numbers to work with instead of guessing from individual entity snapshots.

Multiple status properties are supported as a CSV — some entities have more than one dimension worth aggregating:

  rag:
    statusProperty = "Status, Priority"

Each produces its own distribution section. The statusProperty (context enrichment) is complementary to pulseTrackedProperties (staleness trigger) — a property can be in both lists.

Entity Chat

Every entity with a rag: block gets a chat surface. One question about a contract’s termination clauses is the start of a conversation — the user might follow up with “how does this relate to the draft we’re working on?” and the history is preserved. Chat is the universal interaction pattern.

Chat sessions and messages

The platform provisions two entities for entity chat:

# Platform-provisioned, not developer-authored:

ensure entity osy.ChatSession:
  description = "A conversation session on an entity — like a chat thread"
  EntityType: osy.EntityMetadata required
  EntityId: Guid required
  User: osy.User required
  Agent: String(200)              # agent block name (e.g., "LegalAssistant")
  Title: String(200)              # auto-generated from first message, editable
  IsPinned: Bool default = false
  IsArchived: Bool default = false
  LastMessageAt: DateTime

ensure entity osy.ChatMessage:
  description = "A single message in a chat session"
  Session: osy.ChatSession required
  Content: String required
  Role: String(20)                # User, Assistant
  Timestamp: DateTime default = "NOW"

Session lifecycle

The UX is familiar — like ChatGPT’s sidebar:

Open entity — chat panel shows your sessions for this entity. Most recent active session auto-selected.
New conversation — creates a fresh osy.ChatSession. Clean context — the agent doesn’t see previous session messages in its recency window. Previous sessions’ messages are still in the vector index, so genuinely relevant context can still surface via similarity search.
Continue — select any previous session, pick up where you left off. Full history loaded.
Pin — keeps important sessions at the top. “The one where we found the liability issue” stays accessible.
Archive — hides the session from the default list. Still searchable, still in the vector index, just decluttered.
Session title — auto-generated from the first message (“Termination clause analysis”), editable by the user.

Visibility modes

Each user sees only their own sessions by default. Configurable per entity type:

entity Contract:
  rag:
    context = auto
    chatVisibility = private      # private (default), shared, role-gated

private — each user has their own sessions per entity. “What are the risks?” stays between you and the AI.
shared — all users with entity access see the same sessions. Good for team collaboration: “look at what the AI found in session ‘Risk Analysis’.”
role-gated — visible to users with a specific role (e.g., Manager can see all team members’ sessions for oversight).

The context assembly algorithm

Every RAG architecture document (including the internal ones) shows a diagram like “system prompt + entity context + retrieved chunks + history -> LLM.” The hard part isn’t the diagram — it’s the concrete decisions inside it: how much of each source, what happens when the budget overflows, how to keep the system prompt influential across a 30-turn conversation, and what to do when the entity changes mid-conversation.

Every turn is a fresh assembly

The LLM API is stateless. Every turn, the full context is rebuilt from scratch. There is no accumulated state carried over from the previous turn — only the message history (stored in osy.ChatMessage) and the entity’s current state. This matters because:

The Pulse may have regenerated since turn 1 (a colleague edited the entity, a file was uploaded)
The entity properties may have changed (the agent itself may have modified them via tool calls)
The vector index may have new content (the user’s previous messages are now embedded and searchable)
The security context is always the current user’s current roles (not cached from turn 1)

Token budget allocation

Target budget: 12,000 input tokens for Sonnet. The allocation is priority-ordered — higher-priority layers are filled first, lower-priority layers get what’s left.

Total budget: 12,000 tokens
├── [1] System prompt + agent instructions         500 tokens (fixed)
├── [2] Entity Pulse (cached summary)              200-400 tokens (fixed)
├── [3] Structured properties (MetadataSnapshot)   200-500 tokens (fixed)
├── [4] Session history (last N messages)           3,000-5,000 tokens (variable)
├── [5] Retrieved context (vector search)           3,000-5,000 tokens (variable)
└── [6] Cross-session / cross-entity retrieval      1,000-2,000 tokens (remainder)

Priority [1-3] are non-negotiable — they’re always included in full. Together they consume ~1,000-1,400 tokens. This is the “briefing” that anchors every turn: who the agent is, what the entity looks like, and what the Pulse summarises.

Priority [4] session history is the conversation itself. It grows with each turn:

Session messages	Strategy
10 or fewer	Include all verbatim
11-20 messages	Include last 10 verbatim, summarise earlier messages into a ~500-token recap
More than 20	Include last 10 verbatim, rolling summary of older messages, vector search for relevant older messages

The summary is generated by a cheap model (Haiku/Flash) when the session crosses the threshold — not on every turn. It’s cached and refreshed every N messages.

Priority [5] retrieved context — the per-turn vector search. The user’s current message is embedded and used to search osy.EntityContext for this entity. This is where file chunks, relationship context, and property change history surface. Budget: whatever remains after [1-4], minimum 2,000 tokens.

The retrieved chunks are deduplicated against the Pulse — if a chunk’s content is substantially covered by the Pulse summary, it’s deprioritised. The Pulse already told the agent about it; the raw chunk adds precision only if the user is asking about that specific detail.

Priority [6] cross-session/cross-entity — searched only when budget allows and the user’s question suggests it (e.g., “compare this to…” or “what did we discuss about…”). This uses the remaining budget after [1-5].

Two-path retrieval: static context and dynamic search

The context assembly uses two distinct retrieval paths:

Path 1 — Pre-call static context (entity-scoped, cached). Before calling the LLM, the system assembles the entity’s “briefing” — its MetadataSnapshot, recent changes, file summaries, relationship context. This search uses the entity’s own embedding, not the user’s query. It’s the same for every turn on this entity, which means it benefits from prompt caching.

Path 2 — Tool-based dynamic search (query-driven, on-demand). When the LLM needs more specific information, it calls search_knowledge with a query derived from the user’s question. This is where the actual answer comes from — the embedding of “liability clause indemnification cap” finds the specific chunks.

Why two paths instead of pre-embedding the user’s query? This was a deliberate design decision. The alternative — embedding the user’s message and pre-fetching relevant chunks into the prompt assembly — was considered and rejected:

Breaks prompt caching. The RAG context segment (4-8K tokens) is cached across turns at 1/10th cost. Making it query-dependent means it changes every turn — roughly 4.5x more expensive for the RAG segment over a 20-turn conversation.
The LLM crafts better queries. When the LLM decides to call search_knowledge, it formulates a targeted query (“liability clause indemnification cap”) rather than using the raw user message (“what about that thing we discussed”). The raw message is often a poor embedding query.
The latency cost is acceptable. The tool call adds ~1-2 seconds but only when the LLM decides it needs more context. Many turns don’t need a search at all — the static entity context plus conversation history is sufficient.
The tool path handles the exact case where pre-fetch would help. When the user’s question is specific and the static context has nothing relevant, the LLM calls search_knowledge anyway.

The two-path design (cached static context + on-demand tool search) is the right trade-off for cost, quality, and latency.

System prompt reinforcement

The system prompt is always position [1] — first in the context, every turn. But LLM attention degrades over long contexts. In a 30-turn conversation with 10K tokens of history, the system prompt’s behavioural instructions can lose influence.

Reinforcement strategy:

System prompt is always first. Non-negotiable. Sent as the API’s system parameter.
Behavioural reminders injected every N turns. Every 5 turns (configurable), a short reinforcement message is injected:

[System reminder — turn 15]
Remember: cite sources using [ref:N] format. Use tools when the user asks you to
take action. Stay grounded in the retrieved context — don't fabricate information
not present in the provided chunks.

This is ~50 tokens. Cheap insurance against prompt drift.

Entity context refresh. The Pulse and MetadataSnapshot are re-read on every turn (not cached from turn 1). If the entity changed mid-conversation, the agent sees the current state.
Tool instructions re-injected with tool definitions. Tool-specific behavioural instructions travel with the tool definitions, not in the system prompt. This ensures they’re always “close” to the tool call in the context.

Rolling summary budget reallocation

When the conversation crosses the rolling summary threshold, older messages are summarized and the raw messages are dropped from the prompt. This frees budget that flows to retrieved chunks.

Before summary (turn 15, 15 messages in history):

System prompt (cached)                  500 tokens
RAG context: Pulse + snapshot (cached)  800 tokens
Tools (cached)                          1,500 tokens
History: messages 1-15 verbatim         4,200 tokens   <-- raw history
Retrieved chunks                        3,000 tokens   <-- constrained
Current user message                    200 tokens
                                        ─────────────
                                        10,200 tokens

After summary (turn 16, summary replaces messages 1-10):

System prompt (cached)                  500 tokens
RAG context: Pulse + snapshot (cached)  800 tokens
Tools (cached)                          1,500 tokens
Summary of messages 1-10                500 tokens     <-- compressed
History: messages 11-16 verbatim        1,800 tokens
Retrieved chunks                        5,100 tokens   <-- budget freed!
Current user message                    200 tokens
                                        ─────────────
                                        10,400 tokens

The freed ~2,400 tokens flow to retrieved chunks — which matters because by turn 16 the user is asking specific questions (“what about the liability clause?”) that need targeted retrieval, not verbatim recall of message 3 where they said “hi, let’s look at this deal.”

The literal flow: “What about the liability clause?”

Here’s exactly what happens when a user is on turn 8 of a chat with the LegalReviewer agent on an InvestmentOpportunity entity, and they type “what about the liability clause?”:

User types: "what about the liability clause?"
                    │
                    ▼
┌─────────────────────────────────────────────────────────────┐
│ 1. STORE MESSAGE                                            │
│    Create ChatMessage(session, "what about the liability    │
│    clause?", role=User)                                     │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. ASSEMBLE STATIC CONTEXT (pre-call, entity-scoped)        │
│    Read from osy.EntityContext:                              │
│    ├── Tier 1: MetadataSnapshot (entity's semantic template) │
│    ├── Tier 2: Delta (changes since last Pulse)              │
│    ├── Tier 3: File distillations (document summaries)       │
│    ├── Tier 4: Relationship context                          │
│    ├── Tier 5: Relevant chunks (vector search using the      │
│    │           entity's OWN embedding, not the user's query) │
│    └── Tier 6: Previous Pulse content                        │
│                                                              │
│    This is the ENTITY's context — what the entity looks like │
│    right now. It doesn't know about "liability" yet.         │
│    → appended to system prompt as "## Entity Context (RAG)"  │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. ASSEMBLE HISTORY                                         │
│    Load ChatMessages for this session, ordered by timestamp  │
│    ├── ≤10 messages: include all verbatim                    │
│    ├── 11-20: summary of older + last 10 verbatim            │
│    └── >20: summary + last 10 + vector search for relevant   │
│                                                              │
│    Turn 8 → all 8 messages included verbatim (under limit)   │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. BUILD PROMPT (with cache stability segments)              │
│    ┌──────────────────────────────────────┐                  │
│    │ Segment 1 (AgentLevel, cached):      │                  │
│    │ "You are a legal analyst. Focus on   │                  │
│    │  obligations, liabilities..."        │                  │
│    ├──────────────────────────────────────┤                  │
│    │ Segment 2 (EntityLevel, cached):     │                  │
│    │ "## Entity Context (RAG)             │                  │
│    │  Project Apollo... 40 documents...   │                  │
│    │  Key risks: revenue inconsistency..."│                  │
│    ├──────────────────────────────────────┤                  │
│    │ Tools (cached after last tool):      │                  │
│    │  search_knowledge, CreateFinding,    │                  │
│    │  UpdateFindingStatus, add_reference  │                  │
│    ├──────────────────────────────────────┤                  │
│    │ Messages (not cached — grows):       │                  │
│    │  [user: "start review of contracts"] │                  │
│    │  [assistant: "I'll begin with..."]   │                  │
│    │  ... turns 1-7 ...                   │                  │
│    │  [user: "what about the liability    │                  │
│    │          clause?"]                   │                  │
│    └──────────────────────────────────────┘                  │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ 5. LLM CALL (with prompt caching)                            │
│    Cache hit on segments 1+2+tools from turn 7 (~80% cached) │
│    LLM sees the full prompt and decides:                     │
│                                                              │
│    "The static RAG context mentions contract review but      │
│     doesn't have specific liability clause details.          │
│     I need to search for that."                              │
│                                                              │
│    → LLM returns: tool_use: search_knowledge({              │
│        query: "liability clause indemnification cap",        │
│        entity_type: "Document",                              │
│        entity_id: "<opportunity-id>"                         │
│      })                                                      │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ 6. TOOL EXECUTION: search_knowledge                          │
│    a. Embed the query "liability clause indemnification cap"  │
│    b. Hybrid search across osy.EntityContext:                │
│       - Vector branch: cosine similarity against embeddings  │
│       - Keyword branch: tsvector @@ 'liability & clause'    │
│       - RRF merge (k=60)                                     │
│       - Scoped to the opportunity's entity graph             │
│       - Classification filter applied                        │
│    c. Returns top-10 chunks ranked by relevance:             │
│       [0.91] "§8.1 Liability Cap: aggregate liability        │
│               shall not exceed 2x annual fees..." (p.12)     │
│       [0.87] "Indemnification: mutual indemnification for    │
│               third-party IP claims..." (p.14)               │
│       [0.83] "§12 Limitation of Liability: neither party     │
│               shall be liable for indirect damages..." (p.8) │
│                                                              │
│    → Tool result injected into conversation as tool_result   │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ 7. LLM CONTINUES (with tool result in context)               │
│    LLM now has the specific liability chunks and generates:  │
│                                                              │
│    "The Master Services Agreement has a liability cap in     │
│     §8.1 — aggregate liability is capped at 2x annual fees, │
│     which at current $2M/year means $4M maximum exposure.    │
│     There's also a mutual indemnification clause for IP      │
│     claims.                                                  │
│                                                              │
│     Notable: §12 excludes indirect/consequential damages,    │
│     which is standard, but the carve-out for data breach     │
│     liability is missing — this is unusual for a SaaS        │
│     agreement. I'll file this as a finding."                 │
│                                                              │
│    → tool_use: CreateFinding({                               │
│        kind: "Risk",                                         │
│        title: "Missing data breach liability carve-out",     │
│        body: "§12 excludes consequential damages but does    │
│               not carve out data breach liability...",        │
│        severity: "Medium"                                    │
│      })                                                      │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ 8. POST-PROCESSING                                           │
│    a. Store assistant message as ChatMessage                 │
│    b. Extract citations → validate against context manifest  │
│       → create osy.Reference rows                            │
│    c. Queue embedding of user + assistant messages            │
│       (async, coalesced — available for future turns)        │
│    d. Finding created → triggers Pulse staleness on          │
│       the entity (new finding = Pulse needs refresh)         │
└─────────────────────────────────────────────────────────────┘

Embedding pipeline consistency

New messages are embedded asynchronously after the turn completes. This creates a consistency gap:

Turn 1: User asks about "liability" → message stored → queued for embedding
Turn 2: User asks about "risks" → vector search for "risks"
         → Turn 1's "liability" message is NOT in the vector index yet
         → But it IS in the recency window (last 10 messages, loaded directly)

The recency window (loading last N messages directly from osy.ChatMessage, not via vector search) covers this gap. Messages within the window are always available regardless of embedding pipeline latency.

The gap only matters for messages older than the recency window that haven’t been embedded yet — which only happens if the embedding pipeline is severely backlogged (minutes behind). Under normal operation (embedding latency < 30 seconds), the gap is invisible because the user is unlikely to have 10+ turns in 30 seconds.

Cost per conversation

With prompt caching, ~60-65% of input tokens are cached at 1/10th cost on turns 2+:

Scenario	Turns	Avg input/turn	Tool calls	Cost without caching	Cost with caching	Savings
Quick Q&A	3	8K	0	~$0.09	~$0.04	55%
Document review	10	12K	3	~$0.45	~$0.18	60%
Deep analysis with actions	20	14K	8	~$1.10	~$0.40	64%
Long investigation	40	16K	12	~$2.50	~$0.85	66%

The 40-turn investigation at $0.85 replaces hours of manual document review. The deeper the conversation, the better the cache hit rate — rolling summaries compress old history while the stable entity context stays cached.

What makes this hard (honest assessment)

Token budget tuning. The fixed allocations above are a starting point. Real conversations will reveal that some entity types need more chunk budget (document-heavy), some need more history (multi-step analysis), and some need almost no retrieval (simple data entities). The budget allocation should eventually be tunable per entity type in the rag: block.
Retrieval quality. Vector search is only as good as the embeddings and the query. A user asking “what about that thing we discussed last week” is a terrible vector search query — it has no semantic content. The system might need to reformulate vague queries using conversation context before searching.
Rolling summary quality. The summary of older messages is lossy by design. If the summary misses a detail and the user asks about it, the vector search is the safety net — but the query needs to be specific enough to surface it.
Tool call cascades. An agent that calls 5 tools per turn is expensive and slow. The system prompt should guide the agent toward fewer, more targeted tool calls. But this is prompt engineering, not infrastructure.
Citation accuracy. The LLM might hallucinate a citation reference or cite the wrong chunk. Post-processing can validate that a citation maps to an actual context chunk that was in the prompt, but it can’t verify that the citation accurately represents the chunk’s content.

Multi-Agent Pages

A page declares which agents are available. The platform renders them as selectable personas — the user picks which expert to talk to.

page ContractDetail:
  type = Instance
  entity = Contract
  view = ContractDetailView
  agents:
    LegalAssistant
    ComplianceReviewer
    CommercialAnalyst

Each agent has identity properties for display:

agent LegalAssistant using MainLlm:
  name = "Legal Advisor"
  icon = "Scale"
  color = "blue"
  purpose = "Legal contract analysis"
  systemPrompt = "You are a legal analyst. Focus on obligations, liabilities,
                  termination clauses, and compliance risks. Always cite the
                  specific clause and page number."
  tools:
    function = search_knowledge
    function = CreateRisk
    function = FlagClause

agent ComplianceReviewer using MainLlm:
  name = "Compliance Check"
  icon = "ShieldCheck"
  color = "green"
  purpose = "Regulatory compliance review"
  systemPrompt = "You are a compliance officer. Check for regulatory requirements,
                  data protection obligations, and industry-specific compliance gaps."
  tools:
    function = search_knowledge
    function = CreateComplianceFinding

agent CommercialAnalyst using CheapLlm:
  name = "Commercial Advisor"
  icon = "TrendingUp"
  color = "orange"
  purpose = "Commercial terms analysis"
  systemPrompt = "You are a commercial analyst. Evaluate pricing terms, payment
                  conditions, volume commitments, and competitive positioning."
  tools:
    function = search_knowledge

Behaviour:

Each agent gets its own chat sessions on the entity — switching from Legal to Compliance starts a separate thread.
All agents share the same entity RAG context (same documents, same properties, same relationships).
Session list shows which agent each session belongs to (icon + color badge).
A page with no agents: block uses the entity’s chatAgent (from rag: block) or the app’s default agent.

Agent identity directives:

Directive	Type	Description
`name`	string	Display name for the agent (“Legal Advisor”). Falls back to block name if omitted.
`icon`	string	Icon identifier from the platform icon set
`color`	string	Theme color for avatar/badge
`runAs`	`user` / `agent`	Security context. `user` (default) = runs as the calling user. `agent` = runs autonomously with its own role.
`role`	string	Role the agent runs as when `runAs = agent`. Required when `runAs = agent`, compile error otherwise.

Extracting Insights

The most powerful pattern: the user chats about a contract, the agent identifies risks, and the user says “add those to the risk log.” The agent uses a tool to create new entities with references back to the source.

User: "What are the risks in this contract?"
Agent: "I found 3 key risks:
  1. Termination clause allows 30-day notice without cause [MasterAgreement.pdf, §12.3]
  2. Liability cap is below industry standard [MasterAgreement.pdf, §8.1]
  3. IP assignment clause is ambiguous [MasterAgreement.pdf, §15.2]"

User: "Add those to the risk log with the references"
Agent: [calls CreateRisk 3 times]
  → Risk "Early termination exposure" (source: MasterAgreement.pdf §12.3, Contract: Acme MSA)
  → Risk "Below-standard liability cap" (source: MasterAgreement.pdf §8.1, Contract: Acme MSA)
  → Risk "Ambiguous IP assignment" (source: MasterAgreement.pdf §15.2, Contract: Acme MSA)
"Done — 3 risks added to the risk log, each linked to this contract and the specific clause."

The tools are regular Osyrin functions that the agent can call:

entity Risk:
  properties:
    Title: String(200) required
    Description: String
    Severity: choice RiskSeverity
    Contract: Contract
  rag:
    context = auto
    pulse = auto
    pulsePrompt = "Summarise this risk: what it is, where it comes from,
                   current mitigation status, and recommended action."

function CreateRisk:
  input = RiskInput
  rootType = Risk
  logic:
    risk = create Risk:
      Title = input.Title
      Description = input.Description
      Contract = input.Contract
      Severity = input.Severity
    commit

The insight extraction loop

Entity (with RAG)
    → Chat
        → Agent identifies insight
            → Tool creates new entity (with RAG)
                → New entity is searchable and has its own Pulse

Each extracted entity carries the chain: which entity it came from, which document, which clause. The RAG layer on the new entity makes it discoverable in future searches. Knowledge compounds.

The created Risk entities get their own RAG context — they’re searchable, they have a Pulse, and they reference back to the source contract via osy.Reference. When someone later asks about risks across all contracts, these show up in cross-entity search with full provenance — including the exact clause and page the risk was extracted from.

Cross-Entity Search

Beyond per-entity Pulse and chat, the same vector infrastructure supports cross-entity search — “find everything related to permit delays across all projects.” A single vector query on osy.EntityContext, optionally filtered by entity type, returns results ranked by cosine similarity across all entities and context types.

This powers:

Global search: Natural language queries across all entities and documents
Agent tool: search_knowledge(query, entity_type?, entity_id?) as a tool
Related entities: “Show me entities related to this one” via vector proximity
Duplicate detection: Find near-duplicate records by comparing metadata snapshot embeddings

The display challenge

Retrieval across entity types is trivial — one query, one table, one index. The hard part is presenting results meaningfully when each row could be a Project, an Invoice, a Chat message, or a PDF chunk.

Each result carries entity type and entity ID, which gives us:

Entity metadata — from the application metadata model we know the entity’s properties, display name format, icon, and color
Context type — tells us whether this is a property snapshot, file chunk, chat message, or relationship summary
The content itself — the human-readable text that matched

A search result card adapts based on what it found:

┌─────────────────────────────────────────────────────┐
│ Project: Apollo                            0.92 match│
│ [metadata_snapshot]                                  │
│ "Planning phase, $2.4M budget, 3 open risks..."    │
│                                    → Open Project    │
├─────────────────────────────────────────────────────┤
│ Site_Report.pdf → Project: Apollo          0.88 match│
│ [file_chunk, p.12]                                   │
│ "Foundation crack detected in sector 7..."          │
│                              → Open PDF at page 12   │
├─────────────────────────────────────────────────────┤
│ Ticket: TK-4021                           0.81 match │
│ [metadata_snapshot]                                  │
│ "Customer reports incorrect tax rate on invoice..." │
│                                    → Open Ticket     │
├─────────────────────────────────────────────────────┤
│ Conversation: API Design Review            0.78 match│
│ [chat_message]                                       │
│ "We decided to use the new tax calculation engine..." │
│                              → Open at message       │
└─────────────────────────────────────────────────────┘

Rendering strategy

Component	Source
Icon + colour	EntityMetadata (from application metadata model)
Entity display name	Semantic template rendered for this entity
Context type badge	Context type field on the result
Content preview	Content field, truncated to ~2 lines
Source attribution	Metadata JSON (file name, page, section header)
Action link	Entity type → default view page, optionally deep-linked to file/page/message

The client doesn’t need per-entity-type rendering code. It reads the entity metadata from the model (icon, colour, display name template), renders the content preview from the context row, and links to the entity’s default view. The platform’s existing view resolution handles the rest — the search result is just a navigation entry point.

For the agent, cross-entity results are even simpler — it receives the content and metadata as text and reasons over it directly. No rendering needed.

Long-Running Chat Memory

A Conversation is just an entity. Messages are child entities. The RAG layer gives every conversation infinite memory without the app builder configuring anything.

How it works

Each message gets embedded as a ChatMessage context row on the Conversation entity. The agent’s context window holds the last N messages directly (recency window), while older messages are retrieved via vector similarity.

┌─────────────────────────────────────────────────────┐
│ Agent context window                                │
│                                                     │
│  System prompt                                      │
│  Conversation Pulse (rolling summary)               │
│  Top-K RAG results from older messages              │
│  ─── recency boundary ───                           │
│  Last 10 messages (verbatim)                        │
│  Current user message                               │
└─────────────────────────────────────────────────────┘

Rolling summaries as navigational anchors

Every N messages (configurable per entity type, default 50), the platform generates a condensed summary and embeds it as a RollingSummary context row. These summaries act as navigational anchors — the agent can find “we discussed the budget in messages 50-100” without retrieving 50 individual chunks.

entity Conversation:
  semantic "{{Title}} with {{Participants}}"

  properties:
    Title: String required
    Messages: collection(Message, Conversation)

  rag:
    context = auto
    children = auto
    summaryInterval: 50
    pulse = auto
    coalesce = 60s

The children = auto setting tells the platform to automatically embed child entities (Messages) as ChatMessage context rows on the Conversation. The summaryInterval controls how often rolling summaries are generated.

Summaries are generated incrementally — when a new summary is needed, the system reads the existing summary content and passes it to the LLM with an “update this summary” prompt, so older conversation context is preserved cumulatively rather than lost.

Memory lifecycle

Event	Action
New message created	Embed as `ChatMessage` → append to entity context
Every N messages	Generate `RollingSummary` → embed and store
Agent turn	Retrieve: last N messages (verbatim) + top-K older messages (by similarity) + latest rolling summary
Conversation archived	Context rows retained (searchable) but excluded from Pulse generation
Conversation deleted	All entity context rows deleted with the entity

Why this beats sliding windows

Traditional chat tools truncate history (sliding window) or naively stuff everything (context overflow). The semantic shadow approach gives the agent:

Precision: retrieves the 5 most relevant prior messages, not the last 50 irrelevant ones
Infinite history: a 10,000-message conversation works exactly like a 10-message one
Cross-conversation search: “what did we decide about the API design?” searches across all conversations
Zero configuration: the app builder defines a Conversation entity with messages — the platform handles the rest

Workflow Agent Action

An agent in a workflow wait state processes an entity using RAG context to make a decision. No human chat involved — this is automated.

workflow OrderApproval for Order:
  type = lifecycle
  property = Status

  waitstate Review:
    for:
      aiRiskCheck: agent
        completes: AiRiskReview
      managerApproval: manual

The platform invokes the agent with the entity’s RAG context. The agent returns a decision which becomes the claim’s payload — visible in the workflow history as “AI assessed risk as low: customer has 12 on-time payments, order value within normal range.”

This pattern separates chat (human-agent interaction with history) from automated processing (agent makes a decision and moves on). The same RAG infrastructure powers both — the agent sees the entity’s semantic shadow either way.

Pattern Summary

Pattern	Human interaction	Chat history	Entity-specific	Trigger
Chat	Conversational	Yes (per user per entity)	Yes	User sends message
Pulse	None (passive read)	No	Yes	Background job
Workflow	None (automated)	No	Yes	Workflow for-item
Search	Query → results	No	Cross-entity	User searches

Chat is the workhorse — it covers Q&A, document analysis, follow-ups, report generation, and insight extraction. Everything a user does with an agent goes through chat.