THE FAMILY CAPTAIN
← Family Captain Boot Camp The Sovereign Stack · USMC Ministries

Local AI Toolkit

If you've already invested $6k+ into the iron — a Mac Studio with 128GB+ unified memory, or a similar Linux workstation with a 24GB+ GPU — this is the captain's local toolkit. No subscriptions. No data leaving your machine. Total privacy. The agent runs on your hardware.

Most captains don't need this page. Cowork + ChatGPT Plus + Otter for $60/mo will outperform a local stack until you specifically need privacy, sovereignty, or unlimited usage at zero marginal cost. This page is for the captains who've already crossed that line.

22 local tools1 hardware tier guide
All tools listed are open-source or have free local-use tiers. Models checked: 2026-05-23. Local AI is moving even faster than cloud AI — new Qwen / Gemma / Llama releases churn rankings month over month, so expect this list to shift at every 6-month refresh.
Watch · Listen

The Sovereign Stack — overview.

Two formats for the local-AI toolkit briefing — a short visual explainer or a longer audio deep dive.

↓ Download video (MP4)

Chapters (5)

Frontier AI with the Sovereign Stack · audio

Where local AI fits, when to choose it, and the captain’s redlines that make it necessary.

↓ Download audio (M4A)

Chapters (18)

The 22 local tools

Filter by category, search by name, or re-sort — the controls compose with the hardware tiers above. Best first ranks by tier (S→C).

Sort
S LLM Runner

Ollama

Local LLM runner (CLI + API)

The standard for running open models on your own machine.

AgenticLow MCPPlugin RAM8GB (small) · 32GB+ for 70B class PlatformMac · Linux · Windows
Single-command install, pulls models like Docker images. Runs Llama, Qwen, DeepSeek, Gemma, Mistral, etc. on Mac / Linux / Windows. Has a REST API so other tools can hit it.
Free (open source)
Redundancy check — Overlaps LM Studio (CLI vs GUI). Most captains pick one.
Best for: technical captains comfortable with the terminal; the foundation other local tools build on.
Start here. Most-used local runner for a reason — simple, stable, fast.
A LLM Runner

LM Studio

Local LLM runner (GUI app)

The captain-friendly way to run local models.

AgenticLow MCPYes RAM8GB · 32GB+ for 70B class PlatformMac · Linux · Windows
Desktop app for Mac / Linux / Windows. Browse, download, run open-weight models from a clean GUI. Includes a chat interface and an OpenAI-compatible API. No terminal required.
Free
Redundancy check — Overlaps Ollama. LM Studio has a nicer UI; Ollama has a smaller resource footprint.
Best for: captains who want point-and-click local AI without learning a terminal.
Best entry point if you don't already love the terminal. Free, fast, works.
B LLM Runner

MLX

Apple Silicon ML framework

The fastest way to run local models on Mac.

AgenticN/A MCPN/A RAMSame as model: 16GB for 7B, 64GB+ for 70B Q4 PlatformApple Silicon only (M1/M2/M3/M4)
Apple's native ML framework for M-series chips. Models converted to MLX format run faster and with lower memory than llama.cpp on Apple Silicon. Pair with mlx-lm or mlx-examples to use it.
Free (open source)
Redundancy check — Different layer than Ollama — Ollama can use MLX as a backend on Macs.
Best for: Mac captains who want maximum performance from their unified memory.
If you have a Mac Studio M3 Ultra, this is what makes 70B+ models feel snappy.
C LLM Runner

llama.cpp

Local LLM inference engine

The C++ engine under most local LLM tools.

AgenticN/A MCPN/A RAMPer model PlatformMac · Linux · Windows
Powers Ollama, LM Studio, and many others under the hood. Direct command-line use is technical; most captains use it via Ollama or LM Studio. The GGUF model format originated here.
Free (open source)
Redundancy check — Indirectly used by Ollama / LM Studio.
Best for: captains who want fine-grained control over inference parameters and model formats.
You don't need this directly unless you're benchmarking. Trust Ollama / LM Studio to handle it.
A LLM Runner

Open WebUI

Local web UI for any LLM backend

ChatGPT-style UI for your local models.

AgenticMedium MCPYes RAMServer: 4GB · plus per-model RAM PlatformDocker (Mac · Linux · Windows)
Runs as a Docker container, connects to Ollama / LM Studio / OpenAI-compatible backends. Adds chat history, RAG over uploaded files, multi-user, prompt library — feels like ChatGPT, runs local.
Free (open source)
Redundancy check — Different from Ollama — Ollama is the engine, Open WebUI is the GUI on top.
Best for: captains who want a polished web interface for their local models.
Best 'looks like ChatGPT, runs on my hardware' choice. Pair with Ollama; you're set.
B Frontier Model

Llama 4 (Meta)

Open-weight frontier model (Meta's last open flagship)

Meta's open frontier — Scout & Maverick.

AgenticHigh MCPN/A RAMScout Q4: ~64GB · Maverick: larger PlatformAny platform via Ollama / MLX
The Scout and Maverick variants, Scout's long-context window unmatched for big documents; quality competitive with top closed models on many tasks. Note: in April 2026 Meta's newest flagship (Muse Spark) went closed-weight, so Llama 4 is — for now — the last open Meta model. Qwen and DeepSeek are the open frontier going forward.
Free (Meta license; not pure OSS but generous)
Redundancy check — Overlaps Qwen 3.6 / DeepSeek — which are now the more actively-advancing open models.
Best for: captains who want a proven, strong open-weight English-language model.
Proven and solid. But for a fresh local stack in 2026, reach for Qwen 3.6 or DeepSeek V4 first — Meta's open line has paused.
A Frontier Model

Qwen 3.5 / 3.6 (Alibaba)

Open-weight frontier family (multilingual, MoE + dense)

Alibaba's open frontier. Best-in-class multilingual + strong coding.

AgenticVery High MCPYes RAM27B Q4: ~22GB · 235B-A22B Q4: 96GB+ PlatformAny via Ollama / MLX / vLLM
Qwen 3.6 27B is the best dense coding model you can run locally (~77% SWE-bench, ~22GB). Qwen 3.5 covers 200+ languages and scales to a 235B-A22B MoE that activates ~22B params, so it runs at 22B speed on 128GB+ unified memory. Apache 2.0.
Free (Apache 2.0)
Redundancy check — Overlaps Llama 4 / DeepSeek for English; wins on multilingual + dense coding.
Best for: captains doing multilingual ministry, coding, or who want the most capable open MoE.
The 27B is the everyday workhorse on 32-64GB; pull the 235B only if you've got 128GB+.
C Frontier Model

DeepSeek V4

Open-weight frontier reasoning model (MoE, 1M context)

The DeepSeek that tops the open leaderboards — on your own machine.

AgenticHigh MCPN/A RAMQ4: ~96-110GB · Q2: ~64GB PlatformMac (slow) · Linux + GPU (fast)
DeepSeek V4 (early 2026, MIT license) leads open models on raw capability — ~80% SWE-bench and a 1M-token context, with R1-style reasoning built in. Large MoE; quantized to int4 it fits on a 128GB+ Mac (with patience) or runs fast on a Linux GPU box.
Free (MIT)
Redundancy check — Overlaps Qwen 3.5 / Llama 4 for top-tier reasoning; V4 leads on raw benchmarks.
Best for: captains who want the strongest open reasoning model, no API key, full privacy.
Frontier-class for free — but heavy. Worth it only if you've got 128GB+ or a GPU box.
S Frontier Model

Gemma 4 (Google)

Open-weight Google model family (Apache 2.0)

Google's open-weight family — now the on-device champion.

AgenticMedium MCPN/A RAME4B: 3GB · 26B-A4B Q4: ~18GB · 31B Q4: ~20GB PlatformAny via Ollama / MLX / LM Studio
Released April 2026 under Apache 2.0. Four sizes: E2B / E4B (edge — E4B runs in ~3GB with multimodal audio), 26B-A4B (Mixture-of-Experts, ~3.8B active — the practical local pick), and 31B dense (maximum quality). The 26B MoE reaches ~97% of the 31B’s quality at a fraction of the compute.
Free (Apache 2.0 — unrestricted commercial use)
Redundancy check — Overlaps Phi-4 / Qwen 3.6 for the 'capable small model' slot.
Best for: captains who want the most capable model that still runs on modest hardware — laptop to Mac Studio.
Pull Gemma 4 first. The 26B MoE runs great on 32GB; the E4B even runs on phone-class hardware.
A Frontier Model

Phi-4 / Phi-4 Mini (Microsoft)

Small but capable Microsoft models

Punches above its weight — and Mini runs on almost anything.

AgenticMedium MCPN/A RAMMini: 4GB · Phi-4 Q4: 9GB PlatformAny via Ollama / MLX
Phi-4 (dense 14B) scores higher than its size suggests on reasoning. Phi-4 Mini is the best pick for 4-8GB machines. Fast on any modern Mac; great for lighter hardware.
Free (MIT)
Redundancy check — Overlaps Gemma 4 for the 'capable small model' slot.
Best for: captains on lighter hardware (8-32GB) who still want solid reasoning.
If your Mac is 8-32GB, Phi-4 (or Mini) + Gemma 4 are your workhorses. Tiny, fast, smart.
S Voice

Whisper.cpp

Local speech-to-text (Whisper port)

Whisper running locally — no API calls.

AgenticN/A MCPN/A RAM2-8GB depending on model size PlatformMac · Linux · Windows
C++ port of OpenAI's Whisper, optimized for CPU and Apple Silicon. Real-time transcription on a Mac. Much faster than the original Python implementation.
Free (MIT)
Redundancy check — Different from cloud Whisper API (this runs locally, no upload).
Best for: captains transcribing sensitive content (counseling notes, sermon prep) who don't want audio leaving the machine.
Use this over the API when content is sensitive. Free and fast.
A Voice

WhisperX

Whisper + speaker diarization

Whisper with speaker labels and word-level timestamps.

AgenticN/A MCPN/A RAM8-16GB PlatformMac · Linux · Windows
Builds on Whisper to add speaker diarization (who said what) and word-level alignment. Useful for transcribing conversations, panel recordings, sermons with multiple voices.
Free (BSD)
Redundancy check — Adds to Whisper.cpp; not a replacement.
Best for: captains transcribing multi-speaker recordings (conversations, panels, group prayer, board meetings).
Use when you need to know who said what. Free.
C Voice

Kokoro / OpenVoice

Local text-to-speech / voice cloning

Local TTS — your voice, on your machine.

AgenticN/A MCPN/A RAMKokoro: 2-4GB (CPU ok) · OpenVoice: 8-16GB PlatformMac · Linux · Windows
Kokoro (2026) is an 82M-param TTS model that sounds better than models 20x its size and runs on plain CPU — the new default for local narration. OpenVoice still leads for voice cloning from short samples. Both keep audio off the cloud.
Free (open source)
Redundancy check — Different from ElevenLabs (cloud, polished). Kokoro closes much of the quality gap, for free + private.
Best for: captains who want natural narration (devotionals, audio overviews) or private voice cloning without uploading samples.
Kokoro is shockingly good for its size and runs on CPU. Start there; reach for OpenVoice only if you need cloning.
S Coding

Continue.dev

VS Code extension w/ local model support

Cursor-like AI coding with your local LLM.

AgenticHigh MCPYes RAMPer model used PlatformMac · Linux · Windows
Open-source VS Code (and JetBrains) extension. Connect to Ollama / LM Studio for autocomplete and chat using local models. Free, private.
Free (open source)
Redundancy check — Overlaps Cursor for AI coding; Continue is free + uses your local models.
Best for: captains who want Cursor-style coding without the subscription, using their local models.
Best free Cursor alternative for the local-first captain.
A Coding

Cline (formerly Claude Dev)

Open-source agentic VS Code extension

Open-source autonomous coding agent.

AgenticVery High MCPYes RAMPer model used PlatformMac · Linux · Windows
VS Code extension that operates like a Claude Code clone — autonomous agent that reads/writes files, runs commands, plans multi-step changes. Works with Anthropic, OpenAI, or local models via Ollama.
Free (open source); pay for whichever model API you use
Redundancy check — Closest local-friendly alternative to Claude Code.
Best for: captains who want an autonomous coding agent that can run on local models.
Excellent. Pairs with Llama 4 70B locally for a free Claude Code substitute.
C Coding

Tabby

Self-hosted code autocomplete

GitHub Copilot-style autocomplete, on your own server.

AgenticMedium MCPNo RAM8-24GB depending on model PlatformDocker (Mac · Linux · Windows)
Self-hosted alternative to Copilot. Runs as a Docker container, supports VS Code, JetBrains, Vim. Uses local code models like StarCoder.
Free (open source); paid Tabby Pro tier exists
Redundancy check — Overlaps GitHub Copilot. Tabby is self-hosted; Copilot is cloud.
Best for: captains in regulated industries who can't send code to a cloud autocomplete service.
Niche. Use only if compliance forbids cloud Copilot.
S Knowledge / RAG

AnythingLLM

Local RAG over your documents

ChatGPT-style chat over your own files. 100% local.

AgenticMedium MCPPlugin RAM8-16GB + model RAM PlatformMac · Linux · Windows
Drop in PDFs, Word docs, websites — AnythingLLM ingests them, stores them in a local vector database, lets you chat with the corpus using a local LLM. Free, open-source, polished UI.
Free (open source); paid cloud tier exists
Redundancy check — Overlaps Khoj / Open WebUI for RAG.
Best for: captains who want NotebookLM but local — chat over your own books, sermons, family records.
Best NotebookLM alternative that runs entirely on your machine.
A Knowledge / RAG

Khoj

Personal AI search engine

AI search across your notes, emails, files.

AgenticMedium MCPPlugin RAM8-16GB + model RAM PlatformMac · Linux · Windows
Open-source 'Perplexity for your own stuff.' Indexes Obsidian, Notion, GitHub, email, and runs AI search + chat over them. Self-hosted; free.
Free (self-hosted)
Redundancy check — Overlaps AnythingLLM; Khoj is more search-focused.
Best for: captains with a deep personal corpus (Obsidian + email + files) who want AI search over all of it.
Killer pairing with Obsidian. Free, fast, private.
C Knowledge / RAG

Obsidian + Smart Connections

Local-first knowledge graph + AI search plugin

Obsidian + AI plugins, fully local.

AgenticMedium MCPPlugin RAMPer model used PlatformMac · Linux · Windows
Same Obsidian as the cloud Compass entry — but with the Smart Connections plugin pointed at a local LLM, every note in your vault becomes AI-searchable without sending data to a cloud.
Obsidian free; Smart Connections plugin free; some plugins paid
Redundancy check — Same Obsidian as the cloud entry, just configured locally.
Best for: captains already on Obsidian who want a privacy-first AI layer over their notes.
If Obsidian is your second brain, add Smart Connections + a local model and you have private AI search over everything.
A Automation

n8n self-hosted

Self-hosted automation platform

Same n8n from the Compass — running on your machine.

AgenticHigh MCPNative RAM4-8GB for n8n + per model PlatformDocker (Mac · Linux · Windows)
Self-host via Docker. All AI nodes work with local Ollama / LM Studio. The captain's choice for full data sovereignty + automation. MCP-native.
Free (self-hosted; Docker)
Redundancy check — Same n8n as the cloud entry.
Best for: captains who want enterprise-grade automation with zero data leaving their network.
If you have a homelab, n8n + Ollama is the local automation stack.
S Automation

MCP Servers

Local Model Context Protocol servers

Run MCP servers on your machine to wire local tools into Cowork.

AgenticHigh MCPNative RAMMinimal (per server) PlatformMac · Linux · Windows
Anthropic publishes reference MCP servers (filesystem, git, sqlite, etc.) and the community has hundreds more. Run them locally and Cowork or Claude Code can use them as tools — purely local.
Free (open source)
Redundancy check — Different layer — these are the building blocks for agentic systems.
Best for: captains extending Cowork with local capabilities (custom databases, internal APIs, file systems).
Future-proof. As MCP grows, more captain-relevant servers will exist.
C Automation

Goose (local backend)

Open-source agentic desktop tool

Same Goose from the Frontier — running purely local.

AgenticVery High MCPNative RAMPer model + 1-2GB for Goose PlatformMac · Linux · Windows
Block's open-source MCP-native desktop agent. Point it at a local Ollama backend and you have a fully local autonomous agent.
Free, open source
Redundancy check — Same Goose as the Frontier listing; this is the local-first config.
Best for: captains who want Cowork-class agentic capability without the Anthropic subscription.
Best free agent. Pair with Llama 4 70B locally for a serious autonomous setup.
No tools in this filter.