The Sovereign Stack — Local AI Toolkit

S LLM Runner

Ollama ↗

Local LLM runner (CLI + API)

The standard for running open models on your own machine.

AgenticLow MCPPlugin RAM8GB (small) · 32GB+ for 70B class PlatformMac · Linux · Windows

Single-command install, pulls models like Docker images. Runs Llama, Qwen, DeepSeek, Gemma, Mistral, etc. on Mac / Linux / Windows. Has a REST API so other tools can hit it.

Free (open source)

Redundancy check — Overlaps LM Studio (CLI vs GUI). Most captains pick one.

Best for: technical captains comfortable with the terminal; the foundation other local tools build on.

Start here. Most-used local runner for a reason — simple, stable, fast.

A LLM Runner

LM Studio ↗

Local LLM runner (GUI app)

The captain-friendly way to run local models.

AgenticLow MCPYes RAM8GB · 32GB+ for 70B class PlatformMac · Linux · Windows

Desktop app for Mac / Linux / Windows. Browse, download, run open-weight models from a clean GUI. Includes a chat interface and an OpenAI-compatible API. No terminal required.

Free

Redundancy check — Overlaps Ollama. LM Studio has a nicer UI; Ollama has a smaller resource footprint.

Best for: captains who want point-and-click local AI without learning a terminal.

Best entry point if you don't already love the terminal. Free, fast, works.

B LLM Runner

MLX ↗

Apple Silicon ML framework

The fastest way to run local models on Mac.

AgenticN/A MCPN/A RAMSame as model: 16GB for 7B, 64GB+ for 70B Q4 PlatformApple Silicon only (M1/M2/M3/M4)

Apple's native ML framework for M-series chips. Models converted to MLX format run faster and with lower memory than llama.cpp on Apple Silicon. Pair with mlx-lm or mlx-examples to use it.

Free (open source)

Redundancy check — Different layer than Ollama — Ollama can use MLX as a backend on Macs.

Best for: Mac captains who want maximum performance from their unified memory.

If you have a Mac Studio M3 Ultra, this is what makes 70B+ models feel snappy.

C LLM Runner

llama.cpp ↗

Local LLM inference engine

The C++ engine under most local LLM tools.

AgenticN/A MCPN/A RAMPer model PlatformMac · Linux · Windows

Powers Ollama, LM Studio, and many others under the hood. Direct command-line use is technical; most captains use it via Ollama or LM Studio. The GGUF model format originated here.

Free (open source)

Redundancy check — Indirectly used by Ollama / LM Studio.

Best for: captains who want fine-grained control over inference parameters and model formats.

You don't need this directly unless you're benchmarking. Trust Ollama / LM Studio to handle it.

A LLM Runner

Open WebUI ↗

Local web UI for any LLM backend

ChatGPT-style UI for your local models.

AgenticMedium MCPYes RAMServer: 4GB · plus per-model RAM PlatformDocker (Mac · Linux · Windows)

Runs as a Docker container, connects to Ollama / LM Studio / OpenAI-compatible backends. Adds chat history, RAG over uploaded files, multi-user, prompt library — feels like ChatGPT, runs local.

Free (open source)

Redundancy check — Different from Ollama — Ollama is the engine, Open WebUI is the GUI on top.

Best for: captains who want a polished web interface for their local models.

Best 'looks like ChatGPT, runs on my hardware' choice. Pair with Ollama; you're set.

B Frontier Model

Llama 4 (Meta) ↗

Open-weight frontier model (Meta's last open flagship)

Meta's open frontier — Scout & Maverick.

AgenticHigh MCPN/A RAMScout Q4: ~64GB · Maverick: larger PlatformAny platform via Ollama / MLX

The Scout and Maverick variants, Scout's long-context window unmatched for big documents; quality competitive with top closed models on many tasks. Note: in April 2026 Meta's newest flagship (Muse Spark) went closed-weight, so Llama 4 is — for now — the last open Meta model. Qwen and DeepSeek are the open frontier going forward.

Free (Meta license; not pure OSS but generous)

Redundancy check — Overlaps Qwen 3.6 / DeepSeek — which are now the more actively-advancing open models.

Best for: captains who want a proven, strong open-weight English-language model.

Proven and solid. But for a fresh local stack in 2026, reach for Qwen 3.6 or DeepSeek V4 first — Meta's open line has paused.

A Frontier Model

Qwen 3.5 / 3.6 (Alibaba)

Open-weight frontier family (multilingual, MoE + dense)

Alibaba's open frontier. Best-in-class multilingual + strong coding.

AgenticVery High MCPYes RAM27B Q4: ~22GB · 235B-A22B Q4: 96GB+ PlatformAny via Ollama / MLX / vLLM

Qwen 3.6 27B is the best dense coding model you can run locally (~77% SWE-bench, ~22GB). Qwen 3.5 covers 200+ languages and scales to a 235B-A22B MoE that activates ~22B params, so it runs at 22B speed on 128GB+ unified memory. Apache 2.0.

Free (Apache 2.0)

Redundancy check — Overlaps Llama 4 / DeepSeek for English; wins on multilingual + dense coding.

Best for: captains doing multilingual ministry, coding, or who want the most capable open MoE.

The 27B is the everyday workhorse on 32-64GB; pull the 235B only if you've got 128GB+.

C Frontier Model

DeepSeek V4

Open-weight frontier reasoning model (MoE, 1M context)

The DeepSeek that tops the open leaderboards — on your own machine.

AgenticHigh MCPN/A RAMQ4: ~96-110GB · Q2: ~64GB PlatformMac (slow) · Linux + GPU (fast)

DeepSeek V4 (early 2026, MIT license) leads open models on raw capability — ~80% SWE-bench and a 1M-token context, with R1-style reasoning built in. Large MoE; quantized to int4 it fits on a 128GB+ Mac (with patience) or runs fast on a Linux GPU box.

Free (MIT)

Redundancy check — Overlaps Qwen 3.5 / Llama 4 for top-tier reasoning; V4 leads on raw benchmarks.

Best for: captains who want the strongest open reasoning model, no API key, full privacy.

Frontier-class for free — but heavy. Worth it only if you've got 128GB+ or a GPU box.

S Frontier Model

Gemma 4 (Google)

Open-weight Google model family (Apache 2.0)

Google's open-weight family — now the on-device champion.

AgenticMedium MCPN/A RAME4B: 3GB · 26B-A4B Q4: ~18GB · 31B Q4: ~20GB PlatformAny via Ollama / MLX / LM Studio

Released April 2026 under Apache 2.0. Four sizes: E2B / E4B (edge — E4B runs in ~3GB with multimodal audio), 26B-A4B (Mixture-of-Experts, ~3.8B active — the practical local pick), and 31B dense (maximum quality). The 26B MoE reaches ~97% of the 31B’s quality at a fraction of the compute.

Free (Apache 2.0 — unrestricted commercial use)

Redundancy check — Overlaps Phi-4 / Qwen 3.6 for the 'capable small model' slot.

Best for: captains who want the most capable model that still runs on modest hardware — laptop to Mac Studio.

Pull Gemma 4 first. The 26B MoE runs great on 32GB; the E4B even runs on phone-class hardware.

A Frontier Model

Phi-4 / Phi-4 Mini (Microsoft)

Small but capable Microsoft models

Punches above its weight — and Mini runs on almost anything.

AgenticMedium MCPN/A RAMMini: 4GB · Phi-4 Q4: 9GB PlatformAny via Ollama / MLX

Phi-4 (dense 14B) scores higher than its size suggests on reasoning. Phi-4 Mini is the best pick for 4-8GB machines. Fast on any modern Mac; great for lighter hardware.

Free (MIT)

Redundancy check — Overlaps Gemma 4 for the 'capable small model' slot.

Best for: captains on lighter hardware (8-32GB) who still want solid reasoning.

If your Mac is 8-32GB, Phi-4 (or Mini) + Gemma 4 are your workhorses. Tiny, fast, smart.

S Voice

Whisper.cpp ↗

Local speech-to-text (Whisper port)

Whisper running locally — no API calls.

AgenticN/A MCPN/A RAM2-8GB depending on model size PlatformMac · Linux · Windows

C++ port of OpenAI's Whisper, optimized for CPU and Apple Silicon. Real-time transcription on a Mac. Much faster than the original Python implementation.

Free (MIT)

Redundancy check — Different from cloud Whisper API (this runs locally, no upload).

Best for: captains transcribing sensitive content (counseling notes, sermon prep) who don't want audio leaving the machine.

Use this over the API when content is sensitive. Free and fast.

A Voice

WhisperX ↗

Whisper + speaker diarization

Whisper with speaker labels and word-level timestamps.

AgenticN/A MCPN/A RAM8-16GB PlatformMac · Linux · Windows

Builds on Whisper to add speaker diarization (who said what) and word-level alignment. Useful for transcribing conversations, panel recordings, sermons with multiple voices.

Free (BSD)

Redundancy check — Adds to Whisper.cpp; not a replacement.

Best for: captains transcribing multi-speaker recordings (conversations, panels, group prayer, board meetings).

Use when you need to know who said what. Free.

C Voice

Kokoro / OpenVoice

Local text-to-speech / voice cloning

Local TTS — your voice, on your machine.

AgenticN/A MCPN/A RAMKokoro: 2-4GB (CPU ok) · OpenVoice: 8-16GB PlatformMac · Linux · Windows

Kokoro (2026) is an 82M-param TTS model that sounds better than models 20x its size and runs on plain CPU — the new default for local narration. OpenVoice still leads for voice cloning from short samples. Both keep audio off the cloud.

Free (open source)

Redundancy check — Different from ElevenLabs (cloud, polished). Kokoro closes much of the quality gap, for free + private.

Best for: captains who want natural narration (devotionals, audio overviews) or private voice cloning without uploading samples.

Kokoro is shockingly good for its size and runs on CPU. Start there; reach for OpenVoice only if you need cloning.

S Coding

Continue.dev ↗

VS Code extension w/ local model support

Cursor-like AI coding with your local LLM.

AgenticHigh MCPYes RAMPer model used PlatformMac · Linux · Windows

Open-source VS Code (and JetBrains) extension. Connect to Ollama / LM Studio for autocomplete and chat using local models. Free, private.

Free (open source)

Redundancy check — Overlaps Cursor for AI coding; Continue is free + uses your local models.

Best for: captains who want Cursor-style coding without the subscription, using their local models.

Best free Cursor alternative for the local-first captain.

A Coding

Cline (formerly Claude Dev) ↗

Open-source agentic VS Code extension

Open-source autonomous coding agent.

AgenticVery High MCPYes RAMPer model used PlatformMac · Linux · Windows

VS Code extension that operates like a Claude Code clone — autonomous agent that reads/writes files, runs commands, plans multi-step changes. Works with Anthropic, OpenAI, or local models via Ollama.

Free (open source); pay for whichever model API you use

Redundancy check — Closest local-friendly alternative to Claude Code.

Best for: captains who want an autonomous coding agent that can run on local models.

Excellent. Pairs with Llama 4 70B locally for a free Claude Code substitute.

C Coding

Tabby ↗

Self-hosted code autocomplete

GitHub Copilot-style autocomplete, on your own server.

AgenticMedium MCPNo RAM8-24GB depending on model PlatformDocker (Mac · Linux · Windows)

Self-hosted alternative to Copilot. Runs as a Docker container, supports VS Code, JetBrains, Vim. Uses local code models like StarCoder.

Free (open source); paid Tabby Pro tier exists

Redundancy check — Overlaps GitHub Copilot. Tabby is self-hosted; Copilot is cloud.

Best for: captains in regulated industries who can't send code to a cloud autocomplete service.

Niche. Use only if compliance forbids cloud Copilot.

S Knowledge / RAG

AnythingLLM ↗

Local RAG over your documents

ChatGPT-style chat over your own files. 100% local.

AgenticMedium MCPPlugin RAM8-16GB + model RAM PlatformMac · Linux · Windows

Drop in PDFs, Word docs, websites — AnythingLLM ingests them, stores them in a local vector database, lets you chat with the corpus using a local LLM. Free, open-source, polished UI.

Free (open source); paid cloud tier exists

Redundancy check — Overlaps Khoj / Open WebUI for RAG.

Best for: captains who want NotebookLM but local — chat over your own books, sermons, family records.

Best NotebookLM alternative that runs entirely on your machine.

A Knowledge / RAG

Khoj ↗

Personal AI search engine

AI search across your notes, emails, files.

AgenticMedium MCPPlugin RAM8-16GB + model RAM PlatformMac · Linux · Windows

Open-source 'Perplexity for your own stuff.' Indexes Obsidian, Notion, GitHub, email, and runs AI search + chat over them. Self-hosted; free.

Free (self-hosted)

Redundancy check — Overlaps AnythingLLM; Khoj is more search-focused.

Best for: captains with a deep personal corpus (Obsidian + email + files) who want AI search over all of it.

Killer pairing with Obsidian. Free, fast, private.

C Knowledge / RAG

Obsidian + Smart Connections ↗

Local-first knowledge graph + AI search plugin

Obsidian + AI plugins, fully local.

AgenticMedium MCPPlugin RAMPer model used PlatformMac · Linux · Windows

Same Obsidian as the cloud Compass entry — but with the Smart Connections plugin pointed at a local LLM, every note in your vault becomes AI-searchable without sending data to a cloud.

Obsidian free; Smart Connections plugin free; some plugins paid

Redundancy check — Same Obsidian as the cloud entry, just configured locally.

Best for: captains already on Obsidian who want a privacy-first AI layer over their notes.

If Obsidian is your second brain, add Smart Connections + a local model and you have private AI search over everything.

A Automation

n8n self-hosted ↗

Self-hosted automation platform

Same n8n from the Compass — running on your machine.

AgenticHigh MCPNative RAM4-8GB for n8n + per model PlatformDocker (Mac · Linux · Windows)

Self-host via Docker. All AI nodes work with local Ollama / LM Studio. The captain's choice for full data sovereignty + automation. MCP-native.

Free (self-hosted; Docker)

Redundancy check — Same n8n as the cloud entry.

Best for: captains who want enterprise-grade automation with zero data leaving their network.

If you have a homelab, n8n + Ollama is the local automation stack.

S Automation

MCP Servers ↗

Local Model Context Protocol servers

Run MCP servers on your machine to wire local tools into Cowork.

AgenticHigh MCPNative RAMMinimal (per server) PlatformMac · Linux · Windows

Anthropic publishes reference MCP servers (filesystem, git, sqlite, etc.) and the community has hundreds more. Run them locally and Cowork or Claude Code can use them as tools — purely local.

Free (open source)

Redundancy check — Different layer — these are the building blocks for agentic systems.

Best for: captains extending Cowork with local capabilities (custom databases, internal APIs, file systems).

Future-proof. As MCP grows, more captain-relevant servers will exist.

C Automation

Goose (local backend) ↗

Open-source agentic desktop tool

Same Goose from the Frontier — running purely local.

AgenticVery High MCPNative RAMPer model + 1-2GB for Goose PlatformMac · Linux · Windows

Block's open-source MCP-native desktop agent. Point it at a local Ollama backend and you have a fully local autonomous agent.

Free, open source

Redundancy check — Same Goose as the Frontier listing; this is the local-first config.

Best for: captains who want Cowork-class agentic capability without the Anthropic subscription.

Best free agent. Pair with Llama 4 70B locally for a serious autonomous setup.

Local AI Toolkit

Hardware Sizing Guide

Where local AI starts being useful

70B territory — frontier class

DeepSeek V4 / Qwen 235B territory

The Sovereign Stack — overview.

Frontier AI with the Sovereign Stack · audio

The 22 local tools

Ollama ↗

LM Studio ↗

MLX ↗

llama.cpp ↗

Open WebUI ↗

Llama 4 (Meta) ↗

Qwen 3.5 / 3.6 (Alibaba)

DeepSeek V4

Gemma 4 (Google)

Phi-4 / Phi-4 Mini (Microsoft)

Whisper.cpp ↗

WhisperX ↗

Kokoro / OpenVoice

Continue.dev ↗

Cline (formerly Claude Dev) ↗

Tabby ↗

AnythingLLM ↗

Khoj ↗

Obsidian + Smart Connections ↗

n8n self-hosted ↗

MCP Servers ↗

Goose (local backend) ↗