Nexus Framework

Privacy-first AI
that thinks for itself

A self-hosted multi-agent AI framework that runs on hardware you own. Four specialist agents. 103 skills. 14 security layers. mTLS between nodes. Voice in, voice out. AutoResearch self-improvement. Your data never leaves the room.

v2.0 — 26,500+ lines · 69 files · 62 endpoints
Chapter One

Privacy enforced by physics,
not policy

Most AI assistants promise privacy in their terms of service. Nexus enforces it with ethernet cables. The computers that run your AI models sit on an air-gapped network with no default gateway. They physically cannot reach the internet. Not because software blocks them — because the wire doesn't go there.

Three VLANs, three purposes. The internal hub handles orchestration. The inference network runs AI models in isolation. The sandbox handles web browsing in a quarantined DMZ. No network can see the others without passing through security gates.

When you mention blood pressure, salary, or custody agreement, Nexus detects the sensitivity in milliseconds and forces the entire conversation onto the air-gapped network. Not a policy decision — a physics decision. The data has nowhere else to go.

How it works

PII is automatically detected (8 entity types) and replaced with anonymous tokens before any cloud processing. Differential privacy noise (ε=1.0) is added to all stored embeddings. Per-user JWT tokens enforce database-level isolation in Qdrant. Cloud budgets are per-user with role-based limits — and local AI is always unlimited.

Chapter Two

Four minds,
one voice

Nexus isn't one model answering questions. It's four specialist agents that collaborate — each with their own personality, tools, and reasoning style. A supervisor routes every conversation to the right specialist, detects domain drift, and re-delegates if quality is low.

For complex questions, agents work in parallel — researching simultaneously, then debating their findings through a consensus vote. You see one response. Behind it, a team deliberated. Every agent carries an anti-misalignment safeguard and an escalation tool to defer to you when judgment calls arise.

Nyx
The primary mind. Warm, thoughtful, adaptive. Handles conversation, reasoning, planning, and task management. Remembers your preferences. Uses prefix-cached prompts for 40-60% faster responses.
Scout
The researcher. Searches the web, reads articles, cross-references sources. Every piece of scraped content passes through a nine-stage sanitizer. Responses include source citations with numbered footnotes.
Kai
The code specialist. Writes, reviews, and tests code in a sandboxed environment on a DMZ network. Integrates with Gitea. Can create, test, and promote dynamic tools. Always requires human approval for execution.
Critic
The quality reviewer. Has no tools — only judgment. Reviews important responses for accuracy, bias, and safety. Scores confidence. Catches hallucinations. Performs periodic alignment audits on other agents.
The supervisor doesn't just route — it detects domain drift, tracks progress, and re-delegates if quality is low. Up to four hops until the answer is right.
Chapter Three

An AI that dreams

Every night at 3 AM, Nexus runs a memory consolidation cycle called Auto Dream. Like a sleeping brain replaying the day, it reviews conversations, strengthens important memories, compresses redundant ones, and lets trivial ones fade. Alongside it, the AutoResearch loop optimizes agent prompts — keeping improvements, discarding regressions.

Memory is scoped per user. Personal memories live in an isolated namespace — a family group chat never searches your private health notes. The system detects personal topics automatically and routes storage to the right namespace. Each user's data is isolated at the database level via signed JWTs.

I
Session memory
What you talked about today. Loaded when you start a conversation, saved when you leave. Carries forward naturally across turns.
II
Episodic memory
Important events indexed by time and importance. Temporal markers enable "what did we discuss last Tuesday?" queries. Weekly consolidation merges similar memories.
III
Entity memory
Structured knowledge about people, projects, companies. Each fact carries a confidence score that decays over time. Health facts decay faster than general knowledge.
IV
Behavioral memory
Learned communication patterns — preferred response length, formality level, time-of-day habits. Never explicitly stated. Quietly observed. Used to personalize every response.

Memory protection

Differential privacy noise (calibrated Gaussian/Laplace, ε=1.0) on all stored embeddings makes it mathematically impossible to reconstruct original data. Namespace isolation prevents cross-user memory leaks. Personal topics (health, finance, salary, therapy) are automatically detected and stored in private namespaces even in group chats.

Chapter Four

103 skills
across 31 domains

Nexus doesn't just chat — it has deep expertise across 103 skills spanning family life, corporate departments, wellness, education, and technical domains. Skills are matched progressively: the system finds the right skill for your question and injects domain-specific guidance into the agent's reasoning.

20 skills are built-in (zero-config). 83 more are defined in YAML — add your own by dropping a file in the skills folder. Each skill includes evaluation criteria so AutoResearch can measure and improve quality over time.

8Finance
6Family
5Executive
5Sales
5Marketing
5HR
5Legal
5Product
5Operations
5Cust. Success
4Education
4Wellness
4Life Events
3Career
3Content
3Business
3Productivity
2Health
2Technical
2Entertainment
1Travel
1Security
1Prediction
Chapter Five

The guardian at the gate

Nexus's Guardian is not an AI. It is a deterministic gate — pattern matching on YAML rules that cannot be prompt-injected, hallucinated past, or socially engineered. It inspects every message between every agent. It has no personality, no context window, no ability to be convinced.

Fourteen security layers protect every interaction — not as a checklist, but as concentric rings. A prompt injection has to pass through jailbreak detection, PII scanning, canary leak detection, T0 enforcement, content classification, tool validation, output verification, anti-sycophancy, and six more gates before it can reach you. Most attacks fail at ring one.

Jailbreak detection
30+ compiled regex patterns
PII redaction
8 entity types stripped before cloud
Canary tokens
Unique tokens detect prompt leakage
T0 enforcement
Sensitive topics forced to local HW
Injection sanitizer
9 stages on all web content
Anti-misalignment
No self-preservation, escalation tools
Differential privacy
ε=1.0 noise on all embeddings
JWT isolation
Per-user database access tokens
Tool validation
Schema check + blocked patterns
Agent drift detection
Monitors out-of-domain behavior
Hallucination check
Grounding against provided context
Anti-sycophancy
Prevents instant capitulation
HTML encoding
Output sanitized for web channels
Audit logging
Langfuse + Prometheus + OTel traces
10/10
OWASP LLM 2025
10/10
OWASP Agentic 2026
9/10
ISO 42001
Chapter Six

An AI that
improves itself

Inspired by Karpathy's AutoResearch pattern: modify, measure, keep or revert, repeat. Every night alongside Auto Dream, Nexus tests modifications to agent prompts. If quality improves, the change is kept. If not, it's reverted. The ratchet only moves forward.

Quality is tracked continuously. Every response gets a confidence score recorded in Langfuse. The system detects whether quality is improving, stable, or declining. If declining, AutoResearch increases its optimization intensity. If stable, it explores. If improving, it stays the course.

The ratchet loop

Propose a modification → Apply it → Measure quality → Keep if improved, revert if not → Record in experiment history → Repeat. Each kept improvement builds on all previous ones. Git-like memory tracks every experiment. LoRA fine-tuning follows the same pattern: train → A/B test → keep or discard adapter.

Chapter Seven

Cloud costs capped,
local AI unlimited

Local models run on your hardware — unlimited, always available, zero cost. Cloud is only used when local models genuinely can't handle the task, and it's controlled at every level: per-request ($0.50), per-day ($5-$50), and per-month ($50-$500) budgets that vary by user role.

When a user exceeds their cloud budget, they're not blocked — cloud escalation is simply disabled and local AI handles everything. Eight roles (admin, executive, manager, member, HR, contractor, child, guest) each have appropriate cloud limits. Alerts fire at 50%, 80%, and 95% of budget.

10 layers of cost control

Model routing (right-size per task) → semantic response cache (skip LLM for repeated queries) → prompt prefix caching (40-60% savings) → context budget management → cloud 6-gate escalation → rate limiting → queue depth limits → per-request token budgets → daily/monthly caps → cost dashboard with per-agent tracking.

Chapter Eight

Built to grow
with you

Drop a Python file in the plugins folder. Nexus discovers it at startup. Drop a YAML file in the skills folder. It's available immediately. Connect an MCP server — its tools appear in agent toolkits with write allowlists and authentication.

Thirteen scheduled tasks run automatically — morning briefings, evening summaries, calendar sync, meeting prep, security monitoring, email digest, bias evaluation, memory consolidation, AutoResearch optimization, backups, and more. Each is configurable. The system adapts to your rhythm.

Future-ready: scaffolding for MiroFish swarm prediction is already in place. When deployed, it simulates hundreds of AI agents to explore "what would happen if..." scenarios — market reactions, career decisions, social dynamics — all with PII-redacted seed data and HITL cost approval.

The best technology disappears. You stop thinking about the tool and start thinking about what you're trying to do.
The numbers

What's under the surface

26,525
Lines of Code
69
Python Files
103
Skills
62
API Endpoints
14
Security Layers
7
Homelab Nodes
4
AI Models
101
Tests Passing

Built with Python 3.14, LangGraph, FastAPI, Qdrant, Redis, Prometheus, Langfuse, and OpenTelemetry. Voice pipeline (Whisper STT + Piper TTS) runs air-gapped on local hardware. Self-hosted on 7 nodes across 3 VLANs. Every container runs as non-root with dropped capabilities and memory limits. OWASP LLM 10/10 · OWASP Agentic 10/10 · ISO 42001 9/10.

Self-hosted.
Open. Yours.

Nexus is a framework for people who believe their data belongs to them — not to a cloud provider, not to an ad network, not to anyone else. It runs on hardware you can hold in your hands.

Get in touch