Filter by category, search by name, or re-sort — the controls compose with the hardware tiers above. Best first ranks by tier (S→C).
S
LLM Runner
Local LLM runner (CLI + API)
The standard for running open models on your own machine.
AgenticLow
MCPPlugin
RAM8GB (small) · 32GB+ for 70B class
PlatformMac · Linux · Windows
Single-command install, pulls models like Docker images. Runs Llama, Qwen, DeepSeek, Gemma, Mistral, etc. on Mac / Linux / Windows. Has a REST API so other tools can hit it.
Free (open source)
Redundancy check — Overlaps LM Studio (CLI vs GUI). Most captains pick one.
Best for: technical captains comfortable with the terminal; the foundation other local tools build on.
Start here. Most-used local runner for a reason — simple, stable, fast.
A
LLM Runner
Local LLM runner (GUI app)
The captain-friendly way to run local models.
AgenticLow
MCPYes
RAM8GB · 32GB+ for 70B class
PlatformMac · Linux · Windows
Desktop app for Mac / Linux / Windows. Browse, download, run open-weight models from a clean GUI. Includes a chat interface and an OpenAI-compatible API. No terminal required.
Free
Redundancy check — Overlaps Ollama. LM Studio has a nicer UI; Ollama has a smaller resource footprint.
Best for: captains who want point-and-click local AI without learning a terminal.
Best entry point if you don't already love the terminal. Free, fast, works.
B
LLM Runner
Apple Silicon ML framework
The fastest way to run local models on Mac.
AgenticN/A
MCPN/A
RAMSame as model: 16GB for 7B, 64GB+ for 70B Q4
PlatformApple Silicon only (M1/M2/M3/M4)
Apple's native ML framework for M-series chips. Models converted to MLX format run faster and with lower memory than llama.cpp on Apple Silicon. Pair with mlx-lm or mlx-examples to use it.
Free (open source)
Redundancy check — Different layer than Ollama — Ollama can use MLX as a backend on Macs.
Best for: Mac captains who want maximum performance from their unified memory.
If you have a Mac Studio M3 Ultra, this is what makes 70B+ models feel snappy.
C
LLM Runner
Local LLM inference engine
The C++ engine under most local LLM tools.
AgenticN/A
MCPN/A
RAMPer model
PlatformMac · Linux · Windows
Powers Ollama, LM Studio, and many others under the hood. Direct command-line use is technical; most captains use it via Ollama or LM Studio. The GGUF model format originated here.
Free (open source)
Redundancy check — Indirectly used by Ollama / LM Studio.
Best for: captains who want fine-grained control over inference parameters and model formats.
You don't need this directly unless you're benchmarking. Trust Ollama / LM Studio to handle it.
A
LLM Runner
Local web UI for any LLM backend
ChatGPT-style UI for your local models.
AgenticMedium
MCPYes
RAMServer: 4GB · plus per-model RAM
PlatformDocker (Mac · Linux · Windows)
Runs as a Docker container, connects to Ollama / LM Studio / OpenAI-compatible backends. Adds chat history, RAG over uploaded files, multi-user, prompt library — feels like ChatGPT, runs local.
Free (open source)
Redundancy check — Different from Ollama — Ollama is the engine, Open WebUI is the GUI on top.
Best for: captains who want a polished web interface for their local models.
Best 'looks like ChatGPT, runs on my hardware' choice. Pair with Ollama; you're set.
B
Frontier Model
Open-weight frontier model (Meta's last open flagship)
Meta's open frontier — Scout & Maverick.
AgenticHigh
MCPN/A
RAMScout Q4: ~64GB · Maverick: larger
PlatformAny platform via Ollama / MLX
The Scout and Maverick variants, Scout's long-context window unmatched for big documents; quality competitive with top closed models on many tasks. Note: in April 2026 Meta's newest flagship (Muse Spark) went closed-weight, so Llama 4 is — for now — the last open Meta model. Qwen and DeepSeek are the open frontier going forward.
Free (Meta license; not pure OSS but generous)
Redundancy check — Overlaps Qwen 3.6 / DeepSeek — which are now the more actively-advancing open models.
Best for: captains who want a proven, strong open-weight English-language model.
Proven and solid. But for a fresh local stack in 2026, reach for Qwen 3.6 or DeepSeek V4 first — Meta's open line has paused.
A
Frontier Model
Qwen 3.5 / 3.6 (Alibaba)
Open-weight frontier family (multilingual, MoE + dense)
Alibaba's open frontier. Best-in-class multilingual + strong coding.
AgenticVery High
MCPYes
RAM27B Q4: ~22GB · 235B-A22B Q4: 96GB+
PlatformAny via Ollama / MLX / vLLM
Qwen 3.6 27B is the best dense coding model you can run locally (~77% SWE-bench, ~22GB). Qwen 3.5 covers 200+ languages and scales to a 235B-A22B MoE that activates ~22B params, so it runs at 22B speed on 128GB+ unified memory. Apache 2.0.
Free (Apache 2.0)
Redundancy check — Overlaps Llama 4 / DeepSeek for English; wins on multilingual + dense coding.
Best for: captains doing multilingual ministry, coding, or who want the most capable open MoE.
The 27B is the everyday workhorse on 32-64GB; pull the 235B only if you've got 128GB+.
C
Frontier Model
DeepSeek V4
Open-weight frontier reasoning model (MoE, 1M context)
The DeepSeek that tops the open leaderboards — on your own machine.
AgenticHigh
MCPN/A
RAMQ4: ~96-110GB · Q2: ~64GB
PlatformMac (slow) · Linux + GPU (fast)
DeepSeek V4 (early 2026, MIT license) leads open models on raw capability — ~80% SWE-bench and a 1M-token context, with R1-style reasoning built in. Large MoE; quantized to int4 it fits on a 128GB+ Mac (with patience) or runs fast on a Linux GPU box.
Free (MIT)
Redundancy check — Overlaps Qwen 3.5 / Llama 4 for top-tier reasoning; V4 leads on raw benchmarks.
Best for: captains who want the strongest open reasoning model, no API key, full privacy.
Frontier-class for free — but heavy. Worth it only if you've got 128GB+ or a GPU box.
S
Frontier Model
Gemma 4 (Google)
Open-weight Google model family (Apache 2.0)
Google's open-weight family — now the on-device champion.
AgenticMedium
MCPN/A
RAME4B: 3GB · 26B-A4B Q4: ~18GB · 31B Q4: ~20GB
PlatformAny via Ollama / MLX / LM Studio
Released April 2026 under Apache 2.0. Four sizes: E2B / E4B (edge — E4B runs in ~3GB with multimodal audio), 26B-A4B (Mixture-of-Experts, ~3.8B active — the practical local pick), and 31B dense (maximum quality). The 26B MoE reaches ~97% of the 31B’s quality at a fraction of the compute.
Free (Apache 2.0 — unrestricted commercial use)
Redundancy check — Overlaps Phi-4 / Qwen 3.6 for the 'capable small model' slot.
Best for: captains who want the most capable model that still runs on modest hardware — laptop to Mac Studio.
Pull Gemma 4 first. The 26B MoE runs great on 32GB; the E4B even runs on phone-class hardware.
A
Frontier Model
Phi-4 / Phi-4 Mini (Microsoft)
Small but capable Microsoft models
Punches above its weight — and Mini runs on almost anything.
AgenticMedium
MCPN/A
RAMMini: 4GB · Phi-4 Q4: 9GB
PlatformAny via Ollama / MLX
Phi-4 (dense 14B) scores higher than its size suggests on reasoning. Phi-4 Mini is the best pick for 4-8GB machines. Fast on any modern Mac; great for lighter hardware.
Free (MIT)
Redundancy check — Overlaps Gemma 4 for the 'capable small model' slot.
Best for: captains on lighter hardware (8-32GB) who still want solid reasoning.
If your Mac is 8-32GB, Phi-4 (or Mini) + Gemma 4 are your workhorses. Tiny, fast, smart.
S
Voice
Local speech-to-text (Whisper port)
Whisper running locally — no API calls.
AgenticN/A
MCPN/A
RAM2-8GB depending on model size
PlatformMac · Linux · Windows
C++ port of OpenAI's Whisper, optimized for CPU and Apple Silicon. Real-time transcription on a Mac. Much faster than the original Python implementation.
Free (MIT)
Redundancy check — Different from cloud Whisper API (this runs locally, no upload).
Best for: captains transcribing sensitive content (counseling notes, sermon prep) who don't want audio leaving the machine.
Use this over the API when content is sensitive. Free and fast.
A
Voice
Whisper + speaker diarization
Whisper with speaker labels and word-level timestamps.
AgenticN/A
MCPN/A
RAM8-16GB
PlatformMac · Linux · Windows
Builds on Whisper to add speaker diarization (who said what) and word-level alignment. Useful for transcribing conversations, panel recordings, sermons with multiple voices.
Free (BSD)
Redundancy check — Adds to Whisper.cpp; not a replacement.
Best for: captains transcribing multi-speaker recordings (conversations, panels, group prayer, board meetings).
Use when you need to know who said what. Free.
C
Voice
Kokoro / OpenVoice
Local text-to-speech / voice cloning
Local TTS — your voice, on your machine.
AgenticN/A
MCPN/A
RAMKokoro: 2-4GB (CPU ok) · OpenVoice: 8-16GB
PlatformMac · Linux · Windows
Kokoro (2026) is an 82M-param TTS model that sounds better than models 20x its size and runs on plain CPU — the new default for local narration. OpenVoice still leads for voice cloning from short samples. Both keep audio off the cloud.
Free (open source)
Redundancy check — Different from ElevenLabs (cloud, polished). Kokoro closes much of the quality gap, for free + private.
Best for: captains who want natural narration (devotionals, audio overviews) or private voice cloning without uploading samples.
Kokoro is shockingly good for its size and runs on CPU. Start there; reach for OpenVoice only if you need cloning.
S
Coding
VS Code extension w/ local model support
Cursor-like AI coding with your local LLM.
AgenticHigh
MCPYes
RAMPer model used
PlatformMac · Linux · Windows
Open-source VS Code (and JetBrains) extension. Connect to Ollama / LM Studio for autocomplete and chat using local models. Free, private.
Free (open source)
Redundancy check — Overlaps Cursor for AI coding; Continue is free + uses your local models.
Best for: captains who want Cursor-style coding without the subscription, using their local models.
Best free Cursor alternative for the local-first captain.
A
Coding
Open-source agentic VS Code extension
Open-source autonomous coding agent.
AgenticVery High
MCPYes
RAMPer model used
PlatformMac · Linux · Windows
VS Code extension that operates like a Claude Code clone — autonomous agent that reads/writes files, runs commands, plans multi-step changes. Works with Anthropic, OpenAI, or local models via Ollama.
Free (open source); pay for whichever model API you use
Redundancy check — Closest local-friendly alternative to Claude Code.
Best for: captains who want an autonomous coding agent that can run on local models.
Excellent. Pairs with Llama 4 70B locally for a free Claude Code substitute.
C
Coding
Self-hosted code autocomplete
GitHub Copilot-style autocomplete, on your own server.
AgenticMedium
MCPNo
RAM8-24GB depending on model
PlatformDocker (Mac · Linux · Windows)
Self-hosted alternative to Copilot. Runs as a Docker container, supports VS Code, JetBrains, Vim. Uses local code models like StarCoder.
Free (open source); paid Tabby Pro tier exists
Redundancy check — Overlaps GitHub Copilot. Tabby is self-hosted; Copilot is cloud.
Best for: captains in regulated industries who can't send code to a cloud autocomplete service.
Niche. Use only if compliance forbids cloud Copilot.
S
Knowledge / RAG
Local RAG over your documents
ChatGPT-style chat over your own files. 100% local.
AgenticMedium
MCPPlugin
RAM8-16GB + model RAM
PlatformMac · Linux · Windows
Drop in PDFs, Word docs, websites — AnythingLLM ingests them, stores them in a local vector database, lets you chat with the corpus using a local LLM. Free, open-source, polished UI.
Free (open source); paid cloud tier exists
Redundancy check — Overlaps Khoj / Open WebUI for RAG.
Best for: captains who want NotebookLM but local — chat over your own books, sermons, family records.
Best NotebookLM alternative that runs entirely on your machine.
A
Knowledge / RAG
Personal AI search engine
AI search across your notes, emails, files.
AgenticMedium
MCPPlugin
RAM8-16GB + model RAM
PlatformMac · Linux · Windows
Open-source 'Perplexity for your own stuff.' Indexes Obsidian, Notion, GitHub, email, and runs AI search + chat over them. Self-hosted; free.
Free (self-hosted)
Redundancy check — Overlaps AnythingLLM; Khoj is more search-focused.
Best for: captains with a deep personal corpus (Obsidian + email + files) who want AI search over all of it.
Killer pairing with Obsidian. Free, fast, private.
C
Knowledge / RAG
Local-first knowledge graph + AI search plugin
Obsidian + AI plugins, fully local.
AgenticMedium
MCPPlugin
RAMPer model used
PlatformMac · Linux · Windows
Same Obsidian as the cloud Compass entry — but with the Smart Connections plugin pointed at a local LLM, every note in your vault becomes AI-searchable without sending data to a cloud.
Obsidian free; Smart Connections plugin free; some plugins paid
Redundancy check — Same Obsidian as the cloud entry, just configured locally.
Best for: captains already on Obsidian who want a privacy-first AI layer over their notes.
If Obsidian is your second brain, add Smart Connections + a local model and you have private AI search over everything.
A
Automation
Self-hosted automation platform
Same n8n from the Compass — running on your machine.
AgenticHigh
MCPNative
RAM4-8GB for n8n + per model
PlatformDocker (Mac · Linux · Windows)
Self-host via Docker. All AI nodes work with local Ollama / LM Studio. The captain's choice for full data sovereignty + automation. MCP-native.
Free (self-hosted; Docker)
Redundancy check — Same n8n as the cloud entry.
Best for: captains who want enterprise-grade automation with zero data leaving their network.
If you have a homelab, n8n + Ollama is the local automation stack.
S
Automation
Local Model Context Protocol servers
Run MCP servers on your machine to wire local tools into Cowork.
AgenticHigh
MCPNative
RAMMinimal (per server)
PlatformMac · Linux · Windows
Anthropic publishes reference MCP servers (filesystem, git, sqlite, etc.) and the community has hundreds more. Run them locally and Cowork or Claude Code can use them as tools — purely local.
Free (open source)
Redundancy check — Different layer — these are the building blocks for agentic systems.
Best for: captains extending Cowork with local capabilities (custom databases, internal APIs, file systems).
Future-proof. As MCP grows, more captain-relevant servers will exist.
C
Automation
Open-source agentic desktop tool
Same Goose from the Frontier — running purely local.
AgenticVery High
MCPNative
RAMPer model + 1-2GB for Goose
PlatformMac · Linux · Windows
Block's open-source MCP-native desktop agent. Point it at a local Ollama backend and you have a fully local autonomous agent.
Free, open source
Redundancy check — Same Goose as the Frontier listing; this is the local-first config.
Best for: captains who want Cowork-class agentic capability without the Anthropic subscription.
Best free agent. Pair with Llama 4 70B locally for a serious autonomous setup.
No tools in this filter.