A self-hosted multi-agent AI framework that runs on hardware you own. Four specialist agents. 103 skills. 14 security layers. mTLS between nodes. Voice in, voice out. AutoResearch self-improvement. Your data never leaves the room.
Most AI assistants promise privacy in their terms of service. Nexus enforces it with ethernet cables. The computers that run your AI models sit on an air-gapped network with no default gateway. They physically cannot reach the internet. Not because software blocks them — because the wire doesn't go there.
Three VLANs, three purposes. The internal hub handles orchestration. The inference network runs AI models in isolation. The sandbox handles web browsing in a quarantined DMZ. No network can see the others without passing through security gates.
When you mention blood pressure, salary, or custody agreement, Nexus detects the sensitivity in milliseconds and forces the entire conversation onto the air-gapped network. Not a policy decision — a physics decision. The data has nowhere else to go.
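A minimal sketch of that routing decision, assuming a plain keyword matcher. The term list, the network labels, and the function name `route_network` are hypothetical; the actual detector is not described here and is likely broader.

```python
import re

# Hypothetical sensitive terms, taken from the examples in the text.
SENSITIVE_TERMS = ("blood pressure", "salary", "custody agreement")

# One compiled pattern so each message is scanned in a single pass.
_SENSITIVE_RE = re.compile(
    "|".join(re.escape(term) for term in SENSITIVE_TERMS),
    re.IGNORECASE,
)


def route_network(message: str, already_sensitive: bool = False) -> str:
    """Return the network label a conversation should use.

    Once a conversation has been marked sensitive it stays on the
    air-gapped network, even if later messages look harmless.
    """
    if already_sensitive or _SENSITIVE_RE.search(message):
        return "airgapped"
    return "default"
```

The `already_sensitive` flag is what makes the decision apply to the entire conversation rather than to a single message.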
PII is automatically detected (8 entity types) and replaced with anonymous tokens before any cloud processing. Differential privacy noise (ε=1.0) is added to all stored embeddings. Per-user JWT tokens enforce database-level isolation in Qdrant. Cloud budgets are per-user with role-based limits — and local AI is always unlimited.
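To illustrate the token replacement step, here is a small sketch. It covers only two entity types (email and phone) with deliberately simple regular expressions; the patterns, the token format, and the function names `redact` and `restore` are assumptions, not the detector Nexus ships.

```python
import re

# Two illustrative patterns only; the text above mentions 8 entity types.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected PII with tokens and return the token-to-value map."""
    mapping: dict[str, str] = {}
    for label, pattern in PII_PATTERNS.items():
        def _swap(match: re.Match, label: str = label) -> str:
            # Tokens are numbered in the order they are created.
            token = f"<{label}_{len(mapping) + 1}>"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(_swap, text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into a response, on the local side."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

Only the redacted text would be sent to a cloud model; the mapping stays local and is used to restore values in the reply.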
Nexus isn't one model answering questions. It's four specialist agents that collaborate, each with its own personality, tools, and reasoning style. A supervisor routes every conversation to the right specialist, detects domain drift, and re-delegates if quality is low.
For complex questions, agents work in parallel — researching simultaneously, then debating their findings through a consensus vote. You see one response. Behind it, a team deliberated. Every agent carries an anti-misalignment safeguard and an escalation tool to defer to you when judgment calls arise.
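One simple way such a vote could work is a strict-majority rule, sketched below. The agent names, the `quorum` parameter, and the choice to return `None` when no majority exists are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional


def consensus(votes: dict[str, str], quorum: float = 0.5) -> Optional[str]:
    """Return the answer backed by more than `quorum` of the agents.

    Returns None when no answer clears the bar, which a caller could
    treat as the signal to escalate to the user.
    """
    if not votes:
        return None
    answer, count = Counter(votes.values()).most_common(1)[0]
    return answer if count / len(votes) > quorum else None
```

For example, `consensus({"a": "yes", "b": "yes", "c": "no", "d": "yes"})` returns `"yes"`, while a two-against-two split returns `None`.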
Every night at 3 AM, Nexus runs a memory consolidation cycle called Auto Dream. Like a sleeping brain replaying the day, it reviews conversations, strengthens important memories, compresses redundant ones, and lets trivial ones fade. Alongside it, the AutoResearch loop optimizes agent prompts — keeping improvements, discarding regressions.
Memory is scoped per user. Personal memories live in an isolated namespace — a family group chat never searches your private health notes. The system detects personal topics automatically and routes storage to the right namespace. Each user's data is isolated at the database level via signed JWTs.
Differential privacy noise (calibrated Gaussian or Laplace, ε=1.0) is added to all stored embeddings, which limits how much a stored embedding can reveal about the original data and makes reconstruction far harder. Namespace isolation prevents cross-user memory leaks. Personal topics (health, finance, salary, therapy) are automatically detected and stored in private namespaces even in group chats.
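As a sketch of the Laplace variant: the standard Laplace mechanism adds noise with scale equal to the L1 sensitivity divided by ε to each coordinate. The sensitivity bound of 1.0 used here is an assumption (it would require clipping or normalizing embeddings first), and the function names are hypothetical.

```python
import random
from typing import Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def privatize(
    embedding: Sequence[float],
    epsilon: float = 1.0,
    l1_sensitivity: float = 1.0,  # assumed bound, not from the source
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Return a noisy copy of `embedding` using the Laplace mechanism."""
    rng = rng or random.Random()
    scale = l1_sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in embedding]
```

A smaller ε means a larger noise scale: stronger privacy at the cost of less accurate similarity search.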
Nexus doesn't just chat — it has deep expertise across 103 skills spanning family life, corporate departments, wellness, education, and technical domains. Skills are matched progressively: the system finds the right skill for your question and injects domain-specific guidance into the agent's reasoning.
Twenty skills are built in (zero-config). Another 83 are defined in YAML; add your own by dropping a file in the skills folder. Each skill includes evaluation criteria so AutoResearch can measure and improve quality over time.
Nexus's Guardian is not an AI. It is a deterministic gate — pattern matching on YAML rules that cannot be prompt-injected, hallucinated past, or socially engineered. It inspects every message between every agent. It has no personality, no context window, no ability to be convinced.
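A deterministic gate of this kind can be sketched as an ordered list of regular-expression rules. The two rules below are hypothetical examples defined inline; in the system described above they would be loaded from YAML, and the real rule format is not shown here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern


# Hypothetical block rules, for illustration only.
RULES = (
    Rule("ignore-instructions",
         re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)),
    Rule("system-prompt-leak",
         re.compile(r"reveal .*system prompt", re.IGNORECASE)),
)


def check_message(message: str, rules=RULES) -> tuple[bool, Optional[str]]:
    """Return (allowed, name_of_blocking_rule).

    The same input always yields the same verdict: there is no model
    call, no randomness, and no state carried between messages.
    """
    for rule in rules:
        if rule.pattern.search(message):
            return False, rule.name
    return True, None
```

Because the verdict depends only on the message text and the rule list, the gate can be unit-tested exhaustively against known attack strings.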
Fourteen security layers protect every interaction — not as a checklist, but as concentric rings. A prompt injection has to pass through jailbreak detection, PII scanning, canary leak detection, T0 enforcement, content classification, tool validation, output verification, anti-sycophancy, and six more gates before it can reach you. Most attacks fail at ring one.
Inspired by Karpathy's AutoResearch pattern: modify, measure, keep or revert, repeat. Every night alongside Auto Dream, Nexus tests modifications to agent prompts. If quality improves, the change is kept. If not, it's reverted. The ratchet only moves forward.
Quality is tracked continuously. Every response gets a confidence score recorded in Langfuse. The system detects whether quality is improving, stable, or declining. If declining, AutoResearch increases its optimization intensity. If stable, it explores. If improving, it stays the course.
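A rough sketch of how the three trends could be told apart: compare the mean confidence of the most recent window with the window before it. The window size, the tolerance, and the strategy labels are assumptions chosen for illustration.

```python
from statistics import mean
from typing import Sequence

# Maps each trend to the behavior described in the text above.
STRATEGY = {
    "declining": "intensify",
    "stable": "explore",
    "improving": "stay_course",
}


def quality_trend(
    scores: Sequence[float],
    window: int = 5,          # assumed window size
    tolerance: float = 0.02,  # assumed dead band around "no change"
) -> str:
    """Classify recent confidence scores as improving, stable, or declining."""
    if len(scores) < 2 * window:
        return "stable"  # not enough history to judge
    recent = mean(scores[-window:])
    previous = mean(scores[-2 * window:-window])
    delta = recent - previous
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "declining"
    return "stable"
```

The tolerance band keeps small fluctuations from flipping the optimizer between strategies on every run.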
Propose a modification → Apply it → Measure quality → Keep if improved, revert if not → Record in experiment history → Repeat. Each kept improvement builds on all previous ones. Git-like memory tracks every experiment. LoRA fine-tuning follows the same pattern: train → A/B test → keep or discard adapter.
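The loop above can be sketched in a few lines. The `propose` and `score` callables are placeholders supplied by the caller; how Nexus actually generates modifications or measures quality is not specified here.

```python
from typing import Callable


def ratchet(
    baseline: str,
    propose: Callable[[str], str],
    score: Callable[[str], float],
    rounds: int,
) -> tuple[str, float, list[dict]]:
    """Run a keep-or-revert loop and return (best, best_score, history)."""
    best, best_score = baseline, score(baseline)
    history: list[dict] = []
    for _ in range(rounds):
        candidate = propose(best)           # propose a modification
        candidate_score = score(candidate)  # measure quality
        kept = candidate_score > best_score
        history.append(
            {"candidate": candidate, "score": candidate_score, "kept": kept}
        )
        if kept:
            # Keep: later proposals build on this improvement.
            best, best_score = candidate, candidate_score
        # Otherwise revert implicitly: `best` is left unchanged.
    return best, best_score, history
```

Because a candidate replaces the current best only when its score is strictly higher, the best score can never decrease, and every attempt, kept or not, lands in the history.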
Local models run on your hardware — unlimited, always available, zero cost. Cloud is only used when local models genuinely can't handle the task, and it's controlled at every level: per-request ($0.50), per-day ($5-$50), and per-month ($50-$500) budgets that vary by user role.
When a user exceeds their cloud budget, they're not blocked — cloud escalation is simply disabled and local AI handles everything. Eight roles (admin, executive, manager, member, HR, contractor, child, guest) each have appropriate cloud limits. Alerts fire at 50%, 80%, and 95% of budget.
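A minimal sketch of the fallback behavior, assuming the lowest limits from the ranges above as defaults. The class and function names are hypothetical, and alerts are computed on the daily budget only to keep the example short.

```python
from dataclasses import dataclass

ALERT_THRESHOLDS = (0.50, 0.80, 0.95)  # fractions of budget, from the text


@dataclass
class CloudBudget:
    per_request: float = 0.50  # dollars
    per_day: float = 5.00      # assumed: low end of the $5-$50 range
    per_month: float = 50.00   # assumed: low end of the $50-$500 range
    spent_today: float = 0.0
    spent_month: float = 0.0

    def allows(self, estimated_cost: float) -> bool:
        """True if the request fits every budget level."""
        return (
            estimated_cost <= self.per_request
            and self.spent_today + estimated_cost <= self.per_day
            and self.spent_month + estimated_cost <= self.per_month
        )

    def alerts(self) -> list[float]:
        """Thresholds already crossed for the daily budget."""
        used = self.spent_today / self.per_day
        return [t for t in ALERT_THRESHOLDS if used >= t]


def choose_backend(budget: CloudBudget, estimated_cost: float,
                   local_can_handle: bool) -> str:
    """Never block the user: fall back to local when cloud is off-limits."""
    if local_can_handle or not budget.allows(estimated_cost):
        return "local"
    return "cloud"
```

Note that `choose_backend` has no "denied" outcome; exhausting a budget changes where the request runs, not whether it runs.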
Model routing (right-size per task) → semantic response cache (skip LLM for repeated queries) → prompt prefix caching (40-60% savings) → context budget management → cloud 6-gate escalation → rate limiting → queue depth limits → per-request token budgets → daily/monthly caps → cost dashboard with per-agent tracking.
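To make the semantic response cache step concrete, here is a small in-memory sketch using cosine similarity. It assumes the caller supplies query embeddings; the 0.95 threshold and the class name `SemanticCache` are illustrative choices, and a production cache would use a vector index rather than a linear scan.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Return a stored response when a new query is close enough to an old one."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self._entries: list[tuple[list[float], str]] = []

    def put(self, embedding: Sequence[float], response: str) -> None:
        self._entries.append((list(embedding), response))

    def get(self, embedding: Sequence[float]) -> Optional[str]:
        best_score, best_response = 0.0, None
        for stored, response in self._entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        # A miss (None) means the query must go to a model.
        return best_response if best_score >= self.threshold else None
```

On a hit the model call is skipped entirely, which is why this step sits near the front of the chain.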
Drop a Python file in the plugins folder. Nexus discovers it at startup. Drop a YAML file in the skills folder. It's available immediately. Connect an MCP server — its tools appear in agent toolkits with write allowlists and authentication.
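Startup discovery of this kind can be done with the standard library alone. The sketch below loads every `.py` file in a folder; the convention of skipping files that start with an underscore and the function name `discover_plugins` are assumptions, and it performs no validation or sandboxing of what it loads.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def discover_plugins(folder: Path) -> dict[str, ModuleType]:
    """Import every .py file in `folder` and return modules keyed by file stem."""
    plugins: dict[str, ModuleType] = {}
    for path in sorted(folder.glob("*.py")):
        if path.name.startswith("_"):
            continue  # assumed convention: underscore files are private
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            continue  # not loadable as a module
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the plugin's top-level code
        plugins[path.stem] = module
    return plugins
```

Since `exec_module` executes the file, a real loader would pair this with the allowlists and authentication mentioned above.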
Thirteen scheduled tasks run automatically — morning briefings, evening summaries, calendar sync, meeting prep, security monitoring, email digest, bias evaluation, memory consolidation, AutoResearch optimization, backups, and more. Each is configurable. The system adapts to your rhythm.
Future-ready: scaffolding for MiroFish swarm prediction is already in place. When deployed, it simulates hundreds of AI agents to explore "what would happen if..." scenarios (market reactions, career decisions, social dynamics), all with PII-redacted seed data and human-in-the-loop (HITL) cost approval.
Built with Python 3.14, LangGraph, FastAPI, Qdrant, Redis, Prometheus, Langfuse, and OpenTelemetry. Voice pipeline (Whisper STT + Piper TTS) runs air-gapped on local hardware. Self-hosted on 7 nodes across 3 VLANs. Every container runs as non-root with dropped capabilities and memory limits. OWASP LLM 10/10 · OWASP Agentic 10/10 · ISO 42001 9/10.
Nexus is a framework for people who believe their data belongs to them — not to a cloud provider, not to an ad network, not to anyone else. It runs on hardware you can hold in your hands.