Nexus Frame.work

Privacy-first AI
that thinks for itself

A self-hosted multi-agent AI framework that runs on hardware you own. Four specialist agents. 103 skills. 14 security layers. mTLS between nodes. Memory poisoning detection with quarantine. Voice in, voice out. AutoResearch self-improvement. Your data never leaves the room.

v2.0 — 28,000+ lines · 73 files · 69 endpoints

Chapter One

Privacy enforced by physics,
not policy

Most AI assistants promise privacy in their terms of service. Nexus enforces it with ethernet cables. The computers that run your AI models sit on an air-gapped network with no default gateway. They physically cannot reach the internet. Not because software blocks them — because the wire doesn't go there.

Three VLANs, three purposes. The internal hub handles orchestration. The inference network runs AI models in isolation. The sandbox handles web browsing in a quarantined DMZ. No network can see the others without passing through security gates.

When you mention blood pressure, salary, or a custody agreement, Nexus detects the sensitivity in milliseconds and forces the entire conversation onto the air-gapped network. This is not a policy decision but a physics decision: the data has nowhere else to go.
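
A minimal sketch of that routing decision, assuming a simple pattern-based detector. The pattern list, the `route` function, and the labels `airgapped` and `cloud_eligible` are illustrative assumptions, not the actual Nexus detector.

```python
import re

# Hypothetical sensitive-topic patterns; the real detector and its
# categories are not shown in the source.
SENSITIVE_PATTERNS = [
    re.compile(r"\bblood pressure\b", re.IGNORECASE),
    re.compile(r"\bsalary\b", re.IGNORECASE),
    re.compile(r"\bcustody agreement\b", re.IGNORECASE),
]


def route(message: str, conversation_is_sensitive: bool = False) -> str:
    """Return the network a message may be processed on.

    The flag is sticky: once a conversation is marked sensitive,
    every later message stays on the air-gapped network.
    """
    if conversation_is_sensitive:
        return "airgapped"
    if any(pattern.search(message) for pattern in SENSITIVE_PATTERNS):
        return "airgapped"
    return "cloud_eligible"
```

The caller would store the result and pass `conversation_is_sensitive=True` for the rest of the session, which is what makes the whole conversation, not only one message, stay local.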

How it works

PII is automatically detected (8 entity types) and replaced with anonymous tokens before any cloud processing. Differential privacy noise (ε=1.0) is added to all stored embeddings. Per-user JWT tokens enforce database-level isolation in Qdrant. Cloud budgets are per-user with role-based limits — and local AI is always unlimited.
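
As a rough illustration of the tokenization step, the sketch below replaces matches with anonymous tokens and keeps a local map to restore them later. Only two entity types are shown, and the patterns, token format, and function names are assumptions; the source mentions eight entity types but does not list them.

```python
import re

# Two illustrative entity types with deliberately simple patterns.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace each PII match with a token and return the token-to-value map."""
    mapping: dict[str, str] = {}
    for label, pattern in PII_PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            token = f"[{label}_{len(mapping) + 1}]"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back once the cloud response has returned."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

The mapping never leaves the local machine in this sketch; only the tokenized text would be sent onward.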

Chapter Two

Four minds,
one voice

Nexus isn't one model answering questions. It's four specialist agents that collaborate, each with its own personality, tools, and reasoning style. A supervisor routes every conversation to the right specialist, detects domain drift, and re-delegates if quality is low.

For complex questions, agents work in parallel — researching simultaneously, then debating their findings through a consensus vote. You see one response. Behind it, a team deliberated. Every agent carries an anti-misalignment safeguard and an escalation tool to defer to you when judgment calls arise.
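
One way to picture the parallel step is a fan-out followed by a majority vote. This sketch assumes each agent is a plain callable that returns a short answer string; the debate phase is not modeled, and `consensus` is a hypothetical name.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def consensus(
    question: str, agents: dict[str, Callable[[str], str]]
) -> tuple[str, float]:
    """Run every agent in parallel; return the majority answer and its vote share.

    Requires at least one agent. Ties go to the answer that appears
    first in the agents' insertion order.
    """
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(fn, question) for name, fn in agents.items()}
        answers = {name: future.result() for name, future in futures.items()}
    winner, votes = Counter(answers.values()).most_common(1)[0]
    return winner, votes / len(answers)
```

A low vote share could be the signal to escalate to the user instead of delivering the winning answer.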

Nyx

The primary mind. Warm, thoughtful, adaptive. Handles conversation, reasoning, planning, and task management. Remembers your preferences. Uses prefix-cached prompts for 40-60% faster responses.

Scout

The researcher. Searches the web, reads articles, cross-references sources. Every piece of scraped content passes through a nine-stage sanitizer. Responses include source citations with numbered footnotes.

Kai

The code specialist. Writes, reviews, and tests code in a sandboxed environment on a DMZ network. Integrates with Gitea. Can create, test, and promote dynamic tools. Always requires human approval for execution.

Critic

The quality reviewer. Has no tools — only judgment. Reviews important responses for accuracy, bias, and safety. Scores confidence. Catches hallucinations. Performs periodic alignment audits on other agents.

The supervisor doesn't just route — it detects domain drift, tracks progress, and re-delegates if quality is low. Up to four hops until the answer is right.
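
The re-delegation loop might look roughly like this. The four-hop cap comes from the text; the quality threshold, the `pick` and `score` callables, and the function name are assumptions made for illustration.

```python
from typing import Callable, Optional

MAX_HOPS = 4  # from the text: up to four hops


def supervise(
    question: str,
    specialists: dict[str, Callable[[str], str]],
    pick: Callable[[str, list[str]], Optional[str]],
    score: Callable[[str, str], float],
    threshold: float = 0.8,  # assumed quality bar
) -> tuple[Optional[str], list[str]]:
    """Delegate to specialists until one answer clears the threshold.

    Returns the best answer seen and the list of specialists tried.
    """
    tried: list[str] = []
    best_answer: Optional[str] = None
    best_score = float("-inf")
    for _ in range(MAX_HOPS):
        name = pick(question, tried)
        if name is None:  # nobody left to ask
            break
        tried.append(name)
        answer = specialists[name](question)
        quality = score(question, answer)
        if quality > best_score:
            best_answer, best_score = answer, quality
        if quality >= threshold:
            break
    return best_answer, tried
```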

Chapter Three

An AI that dreams

Every night at 3 AM, Nexus runs a memory consolidation cycle called Auto Dream. Like a sleeping brain replaying the day, it reviews conversations, strengthens important memories, compresses redundant ones, and lets trivial ones fade. Alongside it, the AutoResearch loop optimizes agent prompts — keeping improvements, discarding regressions.

Memory is scoped per user. Personal memories live in an isolated namespace — a family group chat never searches your private health notes. The system detects personal topics automatically and routes storage to the right namespace. Each user's data is isolated at the database level via signed JWTs.

Session memory

What you talked about today. Loaded when you start a conversation, saved when you leave. Carries forward naturally across turns.

Episodic memory

Important events indexed by time and importance. Temporal markers enable "what did we discuss last Tuesday?" queries. Weekly consolidation merges similar memories.

Entity memory

Structured knowledge about people, projects, companies. Each fact carries a confidence score that decays over time. Health facts decay faster than general knowledge.
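
Confidence decay of this kind is often modeled as exponential decay with a per-category half-life. The half-life values and the function below are assumptions for illustration; the source does not state the decay curve or its rates.

```python
# Assumed half-lives in days; health facts decay faster than general ones.
HALF_LIFE_DAYS = {"health": 30.0, "general": 180.0}


def decayed_confidence(
    initial: float, age_days: float, category: str = "general"
) -> float:
    """Halve the confidence once per half-life of the fact's category."""
    half_life = HALF_LIFE_DAYS.get(category, HALF_LIFE_DAYS["general"])
    return initial * 0.5 ** (age_days / half_life)
```

With these assumed values, a health fact stored at 0.8 confidence drops to 0.4 after 30 days, while a general fact of the same age is still close to 0.71.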

Behavioral memory

Learned communication patterns — preferred response length, formality level, time-of-day habits. Never explicitly stated. Quietly observed. Used to personalize every response.

Memory protection

Differential privacy noise (calibrated Gaussian or Laplace, ε=1.0) on all stored embeddings limits how much a stored vector can reveal about the original data. Namespace isolation prevents cross-user memory leaks. Personal topics (health, finance, salary, therapy) are automatically detected and stored in private namespaces even in group chats.
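
For readers unfamiliar with the mechanism, here is a small sketch of the Laplace variant: clip the vector so its sensitivity is bounded, then add noise scaled to sensitivity divided by ε. The clipping norm, the sensitivity bound, and the function names are assumptions; the source only states the noise families and ε=1.0.

```python
import random
from typing import Optional


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(
    embedding: list[float],
    epsilon: float = 1.0,
    clip: float = 1.0,
    seed: Optional[int] = None,
) -> list[float]:
    """Clip the L1 norm to `clip`, then add Laplace noise per coordinate.

    After clipping, any two vectors differ by at most 2 * clip in L1,
    so the noise scale is 2 * clip / epsilon.
    """
    rng = random.Random(seed)
    norm = sum(abs(x) for x in embedding)
    factor = min(1.0, clip / norm) if norm > 0 else 1.0
    scale = 2.0 * clip / epsilon
    return [x * factor + laplace(scale, rng) for x in embedding]
```

Smaller ε means larger noise and stronger privacy, at the cost of less accurate similarity search over the stored vectors.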

Chapter Four

103 skills
across 31 domains

Nexus doesn't just chat — it has deep expertise across 103 skills spanning family life, corporate departments, wellness, education, and technical domains. Skills are matched progressively: the system finds the right skill for your question and injects domain-specific guidance into the agent's reasoning.

20 skills are built-in (zero-config). 83 more are defined in YAML — add your own by dropping a file in the skills folder. Each skill includes evaluation criteria so AutoResearch can measure and improve quality over time.

Finance 8 · Family 6 · Executive 5 · Sales 5 · Marketing 5 · HR 5 · Legal 5 · Product 5 · Operations 5 · Customer Success 5 · Education 4 · Wellness 4 · Life Events 4 · Career 3 · Content 3 · Business 3 · Productivity 3 · Health 2 · Technical 2 · Entertainment 2 · Travel 1 · Security 1 · Prediction 1
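
To make the skill-matching idea concrete, the sketch below scores each skill by trigger-word overlap and injects only the best match's guidance into the prompt. The skill records stand in for parsed YAML files; their field names (`name`, `domain`, `triggers`, `guidance`) and contents are hypothetical, since the source does not show the schema.

```python
import re
from typing import Optional

# Hypothetical skill records, as they might look after parsing YAML files.
SKILLS = [
    {
        "name": "budget-planning",
        "domain": "Finance",
        "triggers": ["budget", "savings", "expenses"],
        "guidance": "Ask for income and fixed costs before proposing a plan.",
    },
    {
        "name": "meal-planning",
        "domain": "Family",
        "triggers": ["dinner", "recipe", "groceries"],
        "guidance": "Check dietary restrictions before suggesting meals.",
    },
]


def match_skill(question: str, skills: list[dict]) -> Optional[dict]:
    """Return the skill with the most trigger words in the question, if any."""
    words = set(re.findall(r"[a-z']+", question.lower()))
    best, best_hits = None, 0
    for skill in skills:
        hits = len(words & set(skill["triggers"]))
        if hits > best_hits:
            best, best_hits = skill, hits
    return best


def build_prompt(question: str, skills: list[dict]) -> str:
    """Prepend the matched skill's guidance; leave the question as-is otherwise."""
    skill = match_skill(question, skills)
    if skill is None:
        return question
    return f"[Skill: {skill['name']}]\n{skill['guidance']}\n\n{question}"
```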

Chapter Five

The guardian at the gate

Nexus's Guardian is not an AI. It is a deterministic gate — pattern matching on YAML rules that cannot be prompt-injected, hallucinated past, or socially engineered. It inspects every message between every agent. It has no personality, no context window, no ability to be convinced.

Fourteen security layers protect every interaction — not as a checklist, but as concentric rings. A prompt injection has to pass through jailbreak detection, PII scanning, canary leak detection, T0 enforcement, content classification, tool validation, output verification, anti-sycophancy, and six more gates before it can reach you. Most attacks fail at ring one.

Jailbreak detection

30+ compiled regex patterns

PII redaction

8 entity types stripped before cloud

Canary tokens

Unique tokens detect prompt leakage

T0 enforcement

Sensitive topics forced to local HW

Injection sanitizer

9 stages on all web content

Anti-misalignment

No self-preservation, escalation tools

Differential privacy

ε=1.0 noise on all embeddings

JWT isolation

Per-user database access tokens

Tool validation

Schema check + blocked patterns

Agent drift detection

Monitors out-of-domain behavior

Hallucination check

Grounding against provided context

Anti-sycophancy

Prevents instant capitulation

HTML encoding

Output sanitized for web channels

Audit logging

Langfuse + Prometheus + OTel traces

10/10

OWASP LLM 2025

10/10

OWASP Agentic 2026

10/10

ISO 42001

The Guardian's Laws

Security as code

Every message passes through this logic. It's not an AI — it's deterministic code that cannot be convinced, confused, or bypassed.

def guard(message, user, action, request, context, response, fact, confidence):

    # Law 0 — When uncertain, ask a human.
    if not authorized(user):       return reject()
    if confidence < 0.95:          return ask_human()
    if irreversible(action):       return require_approval()

    # Law 1 — Do no harm.
    if injection(message):         return block()        # deterministic, not LLM
    if harmful(response):          return block()
    if leaks_prompt(response):     return redact()
    if medical_advice(response):   return add_disclaimer()
    if poisoned_memory(fact):      return quarantine()

    # Law 2 — Protect their data.
    if contains_pii(message):      return redact_before_processing()
    if sending_to_cloud(context):  return force_local()  # enforced by physics
    if other_users_data(response): return strip()

    # Law 3 — Obey, with limits.
    if spoofed_identity(request):  return strip_identity()
    if over_budget(user):          return switch_to_local()
    if tool_is_admin(action):      return require_approval()  # always

    # Law 4 — Learn, but verify first.
    if storing_fact(fact):         return check_provenance()   # 6-layer gate
    if session_too_long():         return compact()

    # Law 5 — Never go silent.
    if agent_failed():             return graceful_fallback()
    if service_down():             return trip_breaker()

    return deliver(response)
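
The listing above is schematic. As a runnable illustration of the same idea, a deterministic gate can be an ordered list of compiled patterns with fixed actions. The two rules, the `Rule` type, and the `gate` function are assumptions for illustration; the actual YAML rules are not shown in the source.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    action: str  # "block" or "redact"


# Hypothetical rules, standing in for rules loaded from YAML.
RULES = [
    Rule(
        "ignore-instructions",
        re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
        "block",
    ),
    Rule("email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "redact"),
]


def gate(message: str) -> tuple[str, str]:
    """Return (verdict, text). A block rule ends evaluation; redactions accumulate."""
    verdict = "allow"
    for rule in RULES:
        if rule.pattern.search(message):
            if rule.action == "block":
                return "block", ""
            message = rule.pattern.sub("[REDACTED]", message)
            verdict = "redact"
    return verdict, message
```

Because the outcome depends only on the message and the rule list, the same input always yields the same verdict, which is the property the chapter relies on.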

Chapter Six

An AI that
improves itself

Inspired by Karpathy's AutoResearch pattern: modify, measure, keep or revert, repeat. Every night alongside Auto Dream, Nexus tests modifications to agent prompts. If quality improves, the change is kept. If not, it's reverted. The ratchet only moves forward.

Quality is tracked continuously. Every response gets a confidence score recorded in Langfuse. The system detects whether quality is improving, stable, or declining. If declining, AutoResearch increases its optimization intensity. If stable, it explores. If improving, it stays the course.
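
A simple way to classify the trend is to compare the mean of the most recent scores with the mean of the window before it. The window size and tolerance below are assumed values, and `quality_trend` is a hypothetical name.

```python
from statistics import mean


def quality_trend(
    scores: list[float], window: int = 5, tolerance: float = 0.02
) -> str:
    """Classify recent confidence scores as improving, stable, or declining."""
    if len(scores) < 2 * window:
        return "stable"  # not enough history to judge
    previous = mean(scores[-2 * window:-window])
    recent = mean(scores[-window:])
    delta = recent - previous
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "declining"
    return "stable"
```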

The ratchet loop

Propose a modification → Apply it → Measure quality → Keep if improved, revert if not → Record in experiment history → Repeat. Each kept improvement builds on all previous ones. Git-like memory tracks every experiment. LoRA fine-tuning follows the same pattern: train → A/B test → keep or discard adapter.
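
The loop can be sketched as a greedy keep-or-revert search. Here `propose` and `evaluate` are placeholders for whatever mutates a prompt and measures its quality; both, along with the history record format, are assumptions.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def ratchet(
    initial: T,
    propose: Callable[[T, random.Random], T],
    evaluate: Callable[[T], float],
    rounds: int,
    seed: int = 0,
) -> tuple[T, float, list[dict]]:
    """Keep a candidate only if it scores strictly better than the current best."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    history: list[dict] = []
    for _ in range(rounds):
        candidate = propose(best, rng)  # each proposal builds on the kept best
        score = evaluate(candidate)
        kept = score > best_score
        history.append({"candidate": candidate, "score": score, "kept": kept})
        if kept:
            best, best_score = candidate, score
    return best, best_score, history
```

Rejected candidates stay in the history, so a later run can avoid repeating experiments that already failed.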

Chapter Seven

Cloud costs capped,
local AI unlimited

Local models run on your hardware — unlimited, always available, zero cost. Cloud is only used when local models genuinely can't handle the task, and it's controlled at every level: per-request ($0.50), per-day ($5-$50), and per-month ($50-$500) budgets that vary by user role.

When a user exceeds their cloud budget, they're not blocked — cloud escalation is simply disabled and local AI handles everything. Eight roles (admin, executive, manager, member, HR, contractor, child, guest) each have appropriate cloud limits. Alerts fire at 50%, 80%, and 95% of budget.
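
A sketch of the per-user budget check, assuming a single daily limit. The $0.50 per-request cap and the 50%, 80%, and 95% alert thresholds come from the text; the class, its field names, and the fallback label `local` are illustrative.

```python
from dataclasses import dataclass

ALERT_THRESHOLDS = (0.50, 0.80, 0.95)  # from the text


@dataclass
class CloudBudget:
    daily_limit: float               # varies by role, e.g. 5.0 to 50.0
    per_request_limit: float = 0.50  # from the text
    spent_today: float = 0.0

    def route(self, estimated_cost: float) -> str:
        """Fall back to local AI instead of blocking the user."""
        if estimated_cost > self.per_request_limit:
            return "local"
        if self.spent_today + estimated_cost > self.daily_limit:
            return "local"
        return "cloud"

    def record(self, cost: float) -> list[float]:
        """Add spend and return the alert thresholds this charge crossed."""
        before = self.spent_today / self.daily_limit
        self.spent_today += cost
        after = self.spent_today / self.daily_limit
        return [t for t in ALERT_THRESHOLDS if before < t <= after]
```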

10 layers of cost control

Model routing (right-size per task) → semantic response cache (skip LLM for repeated queries) → prompt prefix caching (40-60% savings) → context budget management → cloud 6-gate escalation → rate limiting → queue depth limits → per-request token budgets → daily/monthly caps → cost dashboard with per-agent tracking.

Chapter Eight

Built to grow
with you

Drop a Python file in the plugins folder. Nexus discovers it at startup. Drop a YAML file in the skills folder. It's available immediately. Connect an MCP server — its tools appear in agent toolkits with write allowlists and authentication.
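
Startup discovery of this kind can be done with the standard library's `importlib`. The sketch assumes a convention where each plugin module exposes a callable named `register`; that convention and the function name `discover_plugins` are hypothetical, not taken from Nexus.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def discover_plugins(folder: str) -> dict[str, ModuleType]:
    """Import every public .py file in `folder` that defines `register`."""
    plugins: dict[str, ModuleType] = {}
    for path in sorted(Path(folder).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip private modules such as __init__.py
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the plugin's top-level code
        if callable(getattr(module, "register", None)):
            plugins[path.stem] = module
    return plugins
```

Importing a file executes it, so a real deployment would want the plugins folder to be writable only by trusted users.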

Thirteen scheduled tasks run automatically — morning briefings, evening summaries, calendar sync, meeting prep, security monitoring, email digest, bias evaluation, memory consolidation, AutoResearch optimization, backups, and more. Each is configurable. The system adapts to your rhythm.

Future-ready: scaffolding for MiroFish swarm prediction is already in place. When deployed, it simulates hundreds of AI agents to explore "what would happen if..." scenarios — market reactions, career decisions, social dynamics — all with PII-redacted seed data and HITL cost approval.

The best technology disappears. You stop thinking about the tool and start thinking about what you're trying to do.

The numbers

What's under the surface

28,000
Lines of Code

73
Python Files

103
Skills

69
API Endpoints

14
Security Layers

7
Homelab Nodes

AI Models

101
Tests Passing

Built with Python 3.14, LangGraph, FastAPI, Qdrant, Redis, Prometheus, Langfuse, and OpenTelemetry. Voice pipeline (Whisper STT + Piper TTS) runs air-gapped on local hardware. Self-hosted on 7 nodes across 3 VLANs. Every container runs as non-root with dropped capabilities and memory limits. OWASP LLM 10/10 · OWASP Agentic 10/10 · ISO 42001 10/10.

Self-hosted.
Open. Yours.