Filter by category, search by name, or re-sort — the controls compose with the hardware tiers above. Best first ranks by tier (S→C).
S
LLM Runner
Local LLM runner (CLI + API)
The standard for running open models on your own machine.
AgenticLow
MCPPlugin
RAM8GB (small) · 32GB+ for 70B class
PlatformMac · Linux · Windows
Single-command install, pulls models like Docker images. Runs Llama, Qwen, DeepSeek, Gemma, Mistral, etc. on Mac / Linux / Windows. Has a REST API so other tools can hit it.
Free (open source)
Redundancy check — Overlaps LM Studio (CLI vs GUI). Most watchmen pick one.
Best for: technical watchmen comfortable with the terminal; the foundation other local tools build on.
Start here. Most-used local runner for a reason — simple, stable, fast.
A
LLM Runner
Local LLM runner (GUI app)
The watchman-friendly way to run local models.
AgenticLow
MCPYes
RAM8GB · 32GB+ for 70B class
PlatformMac · Linux · Windows
Desktop app for Mac / Linux / Windows. Browse, download, run open-weight models from a clean GUI. Includes a chat interface and an OpenAI-compatible API. No terminal required.
Free
Redundancy check — Overlaps Ollama. LM Studio has a nicer UI; Ollama has a smaller resource footprint.
Best for: watchmen who want point-and-click local AI without learning a terminal.
Best entry point if you don't already love the terminal. Free, fast, works.
B
LLM Runner
Apple Silicon ML framework
The fastest way to run local models on Mac.
AgenticN/A
MCPN/A
RAMSame as model: 16GB for 7B, 64GB+ for 70B Q4
PlatformApple Silicon only (M1/M2/M3/M4)
Apple's native ML framework for M-series chips. Models converted to MLX format run faster and with lower memory than llama.cpp on Apple Silicon. Pair with mlx-lm or mlx-examples to use it.
Free (open source)
Redundancy check — Different layer than Ollama — Ollama can use MLX as a backend on Macs.
Best for: Mac watchmen who want maximum performance from their unified memory.
If you have a Mac Studio M3 Ultra, this is what makes 70B+ models feel snappy.
C
LLM Runner
Local LLM inference engine
The C++ engine under most local LLM tools.
AgenticN/A
MCPN/A
RAMPer model
PlatformMac · Linux · Windows
Powers Ollama, LM Studio, and many others under the hood. Direct command-line use is technical; most watchmen use it via Ollama or LM Studio. The GGUF model format originated here.
Free (open source)
Redundancy check — Indirectly used by Ollama / LM Studio.
Best for: watchmen who want fine-grained control over inference parameters and model formats.
You don't need this directly unless you're benchmarking. Trust Ollama / LM Studio to handle it.
A
LLM Runner
Local web UI for any LLM backend
ChatGPT-style UI for your local models.
AgenticMedium
MCPYes
RAMServer: 4GB · plus per-model RAM
PlatformDocker (Mac · Linux · Windows)
Runs as a Docker container, connects to Ollama / LM Studio / OpenAI-compatible backends. Adds chat history, RAG over uploaded files, multi-user, prompt library — feels like ChatGPT, runs local.
Free (open source)
Redundancy check — Different from Ollama — Ollama is the engine, Open WebUI is the GUI on top.
Best for: watchmen who want a polished web interface for their local models.
Best 'looks like ChatGPT, runs on my hardware' choice. Pair with Ollama; you're set.
B
Frontier Model
Open-weight frontier model (Meta's last open flagship)
Meta's open frontier — Scout & Maverick.
AgenticHigh
MCPN/A
RAMScout Q4: ~64GB · Maverick: larger
PlatformAny platform via Ollama / MLX
The Scout and Maverick variants, Scout's long-context window unmatched for big documents; quality competitive with top closed models on many tasks. Note: in April 2026 Meta's newest flagship (Muse Spark) went closed-weight, so Llama 4 is — for now — the last open Meta model. Qwen and DeepSeek are the open frontier going forward.
Free (Meta license; not pure OSS but generous)
Redundancy check — Overlaps Qwen 3.6 / DeepSeek — which are now the more actively-advancing open models.
Best for: watchmen who want a proven, strong open-weight English-language model.
Proven and solid. But for a fresh local stack in 2026, reach for Qwen 3.6 or DeepSeek V4 first — Meta's open line has paused.
A
Frontier Model
Qwen 3.5 / 3.6 (Alibaba)
Open-weight frontier family (multilingual, MoE + dense)
Alibaba's open frontier. Best-in-class multilingual + strong coding.
AgenticVery High
MCPYes
RAM27B Q4: ~22GB · 235B-A22B Q4: 96GB+
PlatformAny via Ollama / MLX / vLLM
Qwen 3.6 27B is the best dense coding model you can run locally (~77% SWE-bench, ~22GB). Qwen 3.5 covers 200+ languages and scales to a 235B-A22B MoE that activates ~22B params, so it runs at 22B speed on 128GB+ unified memory. Apache 2.0.
Free (Apache 2.0)
Redundancy check — Overlaps Llama 4 / DeepSeek for English; wins on multilingual + dense coding.
Best for: watchmen doing multilingual ministry, coding, or who want the most capable open MoE.
The 27B is the everyday workhorse on 32-64GB; pull the 235B only if you've got 128GB+.
C
Frontier Model
DeepSeek V4
Open-weight frontier reasoning model (MoE, 1M context)
The DeepSeek that tops the open leaderboards — on your own machine.
AgenticHigh
MCPN/A
RAMQ4: ~96-110GB · Q2: ~64GB
PlatformMac (slow) · Linux + GPU (fast)
DeepSeek V4 (early 2026, MIT license) leads open models on raw capability — ~80% SWE-bench and a 1M-token context, with R1-style reasoning built in. Large MoE; quantized to int4 it fits on a 128GB+ Mac (with patience) or runs fast on a Linux GPU box.
Free (MIT)
Redundancy check — Overlaps Qwen 3.5 / Llama 4 for top-tier reasoning; V4 leads on raw benchmarks.
Best for: watchmen who want the strongest open reasoning model, no API key, full privacy.
Frontier-class for free — but heavy. Worth it only if you've got 128GB+ or a GPU box.
B
Frontier Model
Mistral Large 3 / Medium 3.5
European open-weight frontier model
France's open-weight contender — strong code, EU-friendly.
AgenticHigh
MCPN/A
RAMMedium 3.5: ~24GB · Large 3 Q4: 64GB+
PlatformAny via Ollama
Mistral Large 3 (dense flagship) and the lighter Medium 3.5 (~77% SWE-bench, the EU coding pick). Strong on European languages and technical work. Medium 3.5 fits ~32GB; Large 3 wants 64GB+ at int4.
Free weights (commercial license for business use)
Redundancy check — Overlaps Llama 4 / Qwen; Mistral often wins on French / Spanish / German.
Best for: watchmen in European-language ministries or technical work where Mistral's tuning shines.
Worth a side-by-side vs Llama / Qwen if your work touches European languages.
S
Frontier Model
Gemma 4 (Google)
Open-weight Google model family (Apache 2.0)
Google's open-weight family — now the on-device champion.
AgenticMedium
MCPN/A
RAME4B: 3GB · 26B-A4B Q4: ~18GB · 31B Q4: ~20GB
PlatformAny via Ollama / MLX / LM Studio
Released April 2026 under Apache 2.0. Four sizes: E2B / E4B (edge — E4B runs in ~3GB with multimodal audio), 26B-A4B (Mixture-of-Experts, ~3.8B active — the practical local pick), and 31B dense (maximum quality). The 26B MoE reaches ~97% of the 31B’s quality at a fraction of the compute.
Free (Apache 2.0 — unrestricted commercial use)
Redundancy check — Overlaps Phi-4 / Qwen 3.6 for the 'capable small model' slot.
Best for: watchmen who want the most capable model that still runs on modest hardware — laptop to Mac Studio.
Pull Gemma 4 first. The 26B MoE runs great on 32GB; the E4B even runs on phone-class hardware.
A
Frontier Model
Phi-4 / Phi-4 Mini (Microsoft)
Small but capable Microsoft models
Punches above its weight — and Mini runs on almost anything.
AgenticMedium
MCPN/A
RAMMini: 4GB · Phi-4 Q4: 9GB
PlatformAny via Ollama / MLX
Phi-4 (dense 14B) scores higher than its size suggests on reasoning. Phi-4 Mini is the best pick for 4-8GB machines. Fast on any modern Mac; great for lighter hardware.
Free (MIT)
Redundancy check — Overlaps Gemma 4 for the 'capable small model' slot.
Best for: watchmen on lighter hardware (8-32GB) who still want solid reasoning.
If your Mac is 8-32GB, Phi-4 (or Mini) + Gemma 4 are your workhorses. Tiny, fast, smart.
S
Image
Open-weight image generation
The current open-weight image champion.
AgenticLow
MCPN/A
RAM24GB+ unified or VRAM (full) · 12GB+ quantized
PlatformMac · Linux + GPU · Windows + GPU
Released by Black Forest Labs (the team behind Stable Diffusion). FLUX.1 [dev] is the open-weight version that runs locally with 24GB+ VRAM (or quantized on Apple Silicon). Quality rivals Midjourney for many use cases.
Free (non-commercial license; FLUX [pro] is commercial via API)
Redundancy check — Overlaps Stable Diffusion 3.5 (FLUX has better quality in 2026).
Best for: watchmen who want Midjourney-quality output running on their own hardware.
Run via ComfyUI for full control. Quantized FLUX runs on 24GB+ Mac unified memory.
C
Image
Open-weight image generation
Stability AI's flagship open model.
AgenticLow
MCPN/A
RAM16GB+ unified or VRAM
PlatformMac · Linux + GPU · Windows + GPU
Stable Diffusion 3.5 Large (8B params) is the latest. Free for non-commercial, commercial license available. Less SOTA than FLUX in 2026 but huge ecosystem of fine-tunes and LoRAs.
Free (Stability community license)
Redundancy check — Overlaps FLUX.1 — SD has more community fine-tunes; FLUX has better default quality.
Best for: watchmen who want a vast library of fine-tuned styles and LoRAs.
Use FLUX as your default; pull SD 3.5 for community fine-tunes and specific styles.
A
Image
Visual workflow editor for image / video models
Node-based UI for FLUX / SD / video models.
AgenticLow
MCPPlugin
RAMPer loaded model
PlatformMac · Linux · Windows
Web app that runs locally. Build node graphs to chain image gen, upscale, ControlNet, etc. The standard for serious local image work. Steeper learning curve than DrawThings or Fooocus.
Free (open source)
Redundancy check — Different layer — ComfyUI runs models like FLUX / SD via workflows.
Best for: watchmen making serious volume of visual content who want a reproducible pipeline.
Pair with FLUX. Steeper curve, but once you have a workflow saved, it's repeatable.
S
Voice
Local speech-to-text (Whisper port)
Whisper running locally — no API calls.
AgenticN/A
MCPN/A
RAM2-8GB depending on model size
PlatformMac · Linux · Windows
C++ port of OpenAI's Whisper, optimized for CPU and Apple Silicon. Real-time transcription on a Mac. Much faster than the original Python implementation.
Free (MIT)
Redundancy check — Different from cloud Whisper API (this runs locally, no upload).
Best for: watchmen transcribing sensitive content (counseling notes, sermon prep) who don't want audio leaving the machine.
Use this over the API when content is sensitive. Free and fast.
A
Voice
Whisper + speaker diarization
Whisper with speaker labels and word-level timestamps.
AgenticN/A
MCPN/A
RAM8-16GB
PlatformMac · Linux · Windows
Builds on Whisper to add speaker diarization (who said what) and word-level alignment. Useful for transcribing conversations, panel recordings, sermons with multiple voices.
Free (BSD)
Redundancy check — Adds to Whisper.cpp; not a replacement.
Best for: watchmen transcribing multi-speaker recordings (conversations, panels, group prayer, board meetings).
Use when you need to know who said what. Free.
C
Voice
Kokoro / OpenVoice
Local text-to-speech / voice cloning
Local TTS — your voice, on your machine.
AgenticN/A
MCPN/A
RAMKokoro: 2-4GB (CPU ok) · OpenVoice: 8-16GB
PlatformMac · Linux · Windows
Kokoro (2026) is an 82M-param TTS model that sounds better than models 20x its size and runs on plain CPU — the new default for local narration. OpenVoice still leads for voice cloning from short samples. Both keep audio off the cloud.
Free (open source)
Redundancy check — Different from ElevenLabs (cloud, polished). Kokoro closes much of the quality gap, for free + private.
Best for: watchmen who want natural narration (devotionals, audio overviews) or private voice cloning without uploading samples.
Kokoro is shockingly good for its size and runs on CPU. Start there; reach for OpenVoice only if you need cloning.
S
Coding
VS Code extension w/ local model support
Cursor-like AI coding with your local LLM.
AgenticHigh
MCPYes
RAMPer model used
PlatformMac · Linux · Windows
Open-source VS Code (and JetBrains) extension. Connect to Ollama / LM Studio for autocomplete and chat using local models. Free, private.
Free (open source)
Redundancy check — Overlaps Cursor for AI coding; Continue is free + uses your local models.
Best for: watchmen who want Cursor-style coding without the subscription, using their local models.
Best free Cursor alternative for the local-first watchman.
A
Coding
Open-source agentic VS Code extension
Open-source autonomous coding agent.
AgenticVery High
MCPYes
RAMPer model used
PlatformMac · Linux · Windows
VS Code extension that operates like a Claude Code clone — autonomous agent that reads/writes files, runs commands, plans multi-step changes. Works with Anthropic, OpenAI, or local models via Ollama.
Free (open source); pay for whichever model API you use
Redundancy check — Closest local-friendly alternative to Claude Code.
Best for: watchmen who want an autonomous coding agent that can run on local models.
Excellent. Pairs with Llama 4 70B locally for a free Claude Code substitute.
B
Coding
Terminal AI pair programmer (local-friendly)
Same Aider from the Toolkit — running on your local LLM.
AgenticHigh
MCPYes
RAMPer model used
PlatformMac · Linux · Windows
Aider supports any OpenAI-API-compatible backend. Point it at Ollama or LM Studio and you have a fully local Aider session. No API charges.
Free
Redundancy check — Same Aider as in the cloud; this is the same tool with a different model.
Best for: terminal-loving watchmen who want Claude Code's shape with their own local model.
If you already use Aider with Claude API, switching it to a local backend takes 30 seconds.
C
Coding
Self-hosted code autocomplete
GitHub Copilot-style autocomplete, on your own server.
AgenticMedium
MCPNo
RAM8-24GB depending on model
PlatformDocker (Mac · Linux · Windows)
Self-hosted alternative to Copilot. Runs as a Docker container, supports VS Code, JetBrains, Vim. Uses local code models like StarCoder.
Free (open source); paid Tabby Pro tier exists
Redundancy check — Overlaps GitHub Copilot. Tabby is self-hosted; Copilot is cloud.
Best for: watchmen in regulated industries who can't send code to a cloud autocomplete service.
Niche. Use only if compliance forbids cloud Copilot.
S
Knowledge / RAG
Local RAG over your documents
ChatGPT-style chat over your own files. 100% local.
AgenticMedium
MCPPlugin
RAM8-16GB + model RAM
PlatformMac · Linux · Windows
Drop in PDFs, Word docs, websites — AnythingLLM ingests them, stores them in a local vector database, lets you chat with the corpus using a local LLM. Free, open-source, polished UI.
Free (open source); paid cloud tier exists
Redundancy check — Overlaps Khoj / Open WebUI for RAG.
Best for: watchmen who want NotebookLM but local — chat over your own books, sermons, family records.
Best NotebookLM alternative that runs entirely on your machine.
A
Knowledge / RAG
Personal AI search engine
AI search across your notes, emails, files.
AgenticMedium
MCPPlugin
RAM8-16GB + model RAM
PlatformMac · Linux · Windows
Open-source 'Perplexity for your own stuff.' Indexes Obsidian, Notion, GitHub, email, and runs AI search + chat over them. Self-hosted; free.
Free (self-hosted)
Redundancy check — Overlaps AnythingLLM; Khoj is more search-focused.
Best for: watchmen with a deep personal corpus (Obsidian + email + files) who want AI search over all of it.
Killer pairing with Obsidian. Free, fast, private.
B
Knowledge / RAG
Open-source Perplexity clone
Perplexity, but local + uses your own models.
AgenticMedium
MCPNo
RAM8-16GB + model RAM
PlatformMac · Linux · Windows
Open-source Perplexity-like interface for AI search. Web search + LLM synthesis with citations. Runs locally; uses any LLM (local or API).
Free (open source)
Redundancy check — Overlaps Perplexity Pro for research.
Best for: watchmen who do research-heavy work and want Perplexity's UX without a subscription.
Solid Perplexity alternative for the local-first watchman.
C
Knowledge / RAG
Local-first knowledge graph + AI search plugin
Obsidian + AI plugins, fully local.
AgenticMedium
MCPPlugin
RAMPer model used
PlatformMac · Linux · Windows
Same Obsidian as the cloud Toolkit entry — but with the Smart Connections plugin pointed at a local LLM, every note in your vault becomes AI-searchable without sending data to a cloud.
Obsidian free; Smart Connections plugin free; some plugins paid
Redundancy check — Same Obsidian as the cloud entry, just configured locally.
Best for: watchmen already on Obsidian who want a privacy-first AI layer over their notes.
If Obsidian is your second brain, add Smart Connections + a local model and you have private AI search over everything.
A
Automation
Self-hosted automation platform
Same n8n from the Toolkit — running on your machine.
AgenticHigh
MCPNative
RAM4-8GB for n8n + per model
PlatformDocker (Mac · Linux · Windows)
Self-host via Docker. All AI nodes work with local Ollama / LM Studio. The watchman's choice for full data sovereignty + automation. MCP-native.
Free (self-hosted; Docker)
Redundancy check — Same n8n as the cloud entry.
Best for: watchmen who want enterprise-grade automation with zero data leaving their network.
If you have a homelab, n8n + Ollama is the local automation stack.
S
Automation
Local Model Context Protocol servers
Run MCP servers on your machine to wire local tools into Cowork.
AgenticHigh
MCPNative
RAMMinimal (per server)
PlatformMac · Linux · Windows
Anthropic publishes reference MCP servers (filesystem, git, sqlite, etc.) and the community has hundreds more. Run them locally and Cowork or Claude Code can use them as tools — purely local.
Free (open source)
Redundancy check — Different layer — these are the building blocks for agentic systems.
Best for: watchmen extending Cowork with local capabilities (custom databases, internal APIs, file systems).
Future-proof. As MCP grows, more watchman-relevant servers will exist.
C
Automation
Open-source agentic desktop tool
Same Goose from the Frontier — running purely local.
AgenticVery High
MCPNative
RAMPer model + 1-2GB for Goose
PlatformMac · Linux · Windows
Block's open-source MCP-native desktop agent. Point it at a local Ollama backend and you have a fully local autonomous agent.
Free, open source
Redundancy check — Same Goose as the Frontier listing; this is the local-first config.
Best for: watchmen who want Cowork-class agentic capability without the Anthropic subscription.
Best free agent. Pair with Llama 4 70B locally for a serious autonomous setup.
No tools in this filter.