The Sovereign Stack — Local AI Toolkit

S LLM Runner

Ollama ↗

Local LLM runner (CLI + API)

The standard for running open models on your own machine.

AgenticLow MCPPlugin RAM8GB (small) · 32GB+ for 70B class PlatformMac · Linux · Windows

Single-command install, pulls models like Docker images. Runs Llama, Qwen, DeepSeek, Gemma, Mistral, etc. on Mac / Linux / Windows. Has a REST API so other tools can hit it.

Free (open source)

Redundancy check — Overlaps LM Studio (CLI vs GUI). Most watchmen pick one.

Best for: technical watchmen comfortable with the terminal; the foundation other local tools build on.

Start here. Most-used local runner for a reason — simple, stable, fast.

A LLM Runner

LM Studio ↗

Local LLM runner (GUI app)

The watchman-friendly way to run local models.

AgenticLow MCPYes RAM8GB · 32GB+ for 70B class PlatformMac · Linux · Windows

Desktop app for Mac / Linux / Windows. Browse, download, run open-weight models from a clean GUI. Includes a chat interface and an OpenAI-compatible API. No terminal required.

Free

Redundancy check — Overlaps Ollama. LM Studio has a nicer UI; Ollama has a smaller resource footprint.

Best for: watchmen who want point-and-click local AI without learning a terminal.

Best entry point if you don't already love the terminal. Free, fast, works.

B LLM Runner

MLX ↗

Apple Silicon ML framework

The fastest way to run local models on Mac.

AgenticN/A MCPN/A RAMSame as model: 16GB for 7B, 64GB+ for 70B Q4 PlatformApple Silicon only (M1/M2/M3/M4)

Apple's native ML framework for M-series chips. Models converted to MLX format run faster and with lower memory than llama.cpp on Apple Silicon. Pair with mlx-lm or mlx-examples to use it.

Free (open source)

Redundancy check — Different layer than Ollama — Ollama can use MLX as a backend on Macs.

Best for: Mac watchmen who want maximum performance from their unified memory.

If you have a Mac Studio M3 Ultra, this is what makes 70B+ models feel snappy.

C LLM Runner

llama.cpp ↗

Local LLM inference engine

The C++ engine under most local LLM tools.

AgenticN/A MCPN/A RAMPer model PlatformMac · Linux · Windows

Powers Ollama, LM Studio, and many others under the hood. Direct command-line use is technical; most watchmen use it via Ollama or LM Studio. The GGUF model format originated here.

Free (open source)

Redundancy check — Indirectly used by Ollama / LM Studio.

Best for: watchmen who want fine-grained control over inference parameters and model formats.

You don't need this directly unless you're benchmarking. Trust Ollama / LM Studio to handle it.

A LLM Runner

Open WebUI ↗

Local web UI for any LLM backend

ChatGPT-style UI for your local models.

AgenticMedium MCPYes RAMServer: 4GB · plus per-model RAM PlatformDocker (Mac · Linux · Windows)

Runs as a Docker container, connects to Ollama / LM Studio / OpenAI-compatible backends. Adds chat history, RAG over uploaded files, multi-user, prompt library — feels like ChatGPT, runs local.

Free (open source)

Redundancy check — Different from Ollama — Ollama is the engine, Open WebUI is the GUI on top.

Best for: watchmen who want a polished web interface for their local models.

Best 'looks like ChatGPT, runs on my hardware' choice. Pair with Ollama; you're set.

B Frontier Model

Llama 4 (Meta) ↗

Open-weight frontier model (Meta's last open flagship)

Meta's open frontier — Scout & Maverick.

AgenticHigh MCPN/A RAMScout Q4: ~64GB · Maverick: larger PlatformAny platform via Ollama / MLX

The Scout and Maverick variants, Scout's long-context window unmatched for big documents; quality competitive with top closed models on many tasks. Note: in April 2026 Meta's newest flagship (Muse Spark) went closed-weight, so Llama 4 is — for now — the last open Meta model. Qwen and DeepSeek are the open frontier going forward.

Free (Meta license; not pure OSS but generous)

Redundancy check — Overlaps Qwen 3.6 / DeepSeek — which are now the more actively-advancing open models.

Best for: watchmen who want a proven, strong open-weight English-language model.

Proven and solid. But for a fresh local stack in 2026, reach for Qwen 3.6 or DeepSeek V4 first — Meta's open line has paused.

A Frontier Model

Qwen 3.5 / 3.6 (Alibaba)

Open-weight frontier family (multilingual, MoE + dense)

Alibaba's open frontier. Best-in-class multilingual + strong coding.

AgenticVery High MCPYes RAM27B Q4: ~22GB · 235B-A22B Q4: 96GB+ PlatformAny via Ollama / MLX / vLLM

Qwen 3.6 27B is the best dense coding model you can run locally (~77% SWE-bench, ~22GB). Qwen 3.5 covers 200+ languages and scales to a 235B-A22B MoE that activates ~22B params, so it runs at 22B speed on 128GB+ unified memory. Apache 2.0.

Free (Apache 2.0)

Redundancy check — Overlaps Llama 4 / DeepSeek for English; wins on multilingual + dense coding.

Best for: watchmen doing multilingual ministry, coding, or who want the most capable open MoE.

The 27B is the everyday workhorse on 32-64GB; pull the 235B only if you've got 128GB+.

C Frontier Model

DeepSeek V4

Open-weight frontier reasoning model (MoE, 1M context)

The DeepSeek that tops the open leaderboards — on your own machine.

AgenticHigh MCPN/A RAMQ4: ~96-110GB · Q2: ~64GB PlatformMac (slow) · Linux + GPU (fast)

DeepSeek V4 (early 2026, MIT license) leads open models on raw capability — ~80% SWE-bench and a 1M-token context, with R1-style reasoning built in. Large MoE; quantized to int4 it fits on a 128GB+ Mac (with patience) or runs fast on a Linux GPU box.

Free (MIT)

Redundancy check — Overlaps Qwen 3.5 / Llama 4 for top-tier reasoning; V4 leads on raw benchmarks.

Best for: watchmen who want the strongest open reasoning model, no API key, full privacy.

Frontier-class for free — but heavy. Worth it only if you've got 128GB+ or a GPU box.

B Frontier Model

Mistral Large 3 / Medium 3.5

European open-weight frontier model

France's open-weight contender — strong code, EU-friendly.

AgenticHigh MCPN/A RAMMedium 3.5: ~24GB · Large 3 Q4: 64GB+ PlatformAny via Ollama

Mistral Large 3 (dense flagship) and the lighter Medium 3.5 (~77% SWE-bench, the EU coding pick). Strong on European languages and technical work. Medium 3.5 fits ~32GB; Large 3 wants 64GB+ at int4.

Free weights (commercial license for business use)

Redundancy check — Overlaps Llama 4 / Qwen; Mistral often wins on French / Spanish / German.

Best for: watchmen in European-language ministries or technical work where Mistral's tuning shines.

Worth a side-by-side vs Llama / Qwen if your work touches European languages.

S Frontier Model

Gemma 4 (Google)

Open-weight Google model family (Apache 2.0)

Google's open-weight family — now the on-device champion.

AgenticMedium MCPN/A RAME4B: 3GB · 26B-A4B Q4: ~18GB · 31B Q4: ~20GB PlatformAny via Ollama / MLX / LM Studio

Released April 2026 under Apache 2.0. Four sizes: E2B / E4B (edge — E4B runs in ~3GB with multimodal audio), 26B-A4B (Mixture-of-Experts, ~3.8B active — the practical local pick), and 31B dense (maximum quality). The 26B MoE reaches ~97% of the 31B’s quality at a fraction of the compute.

Free (Apache 2.0 — unrestricted commercial use)

Redundancy check — Overlaps Phi-4 / Qwen 3.6 for the 'capable small model' slot.

Best for: watchmen who want the most capable model that still runs on modest hardware — laptop to Mac Studio.

Pull Gemma 4 first. The 26B MoE runs great on 32GB; the E4B even runs on phone-class hardware.

A Frontier Model

Phi-4 / Phi-4 Mini (Microsoft)

Small but capable Microsoft models

Punches above its weight — and Mini runs on almost anything.

AgenticMedium MCPN/A RAMMini: 4GB · Phi-4 Q4: 9GB PlatformAny via Ollama / MLX

Phi-4 (dense 14B) scores higher than its size suggests on reasoning. Phi-4 Mini is the best pick for 4-8GB machines. Fast on any modern Mac; great for lighter hardware.

Free (MIT)

Redundancy check — Overlaps Gemma 4 for the 'capable small model' slot.

Best for: watchmen on lighter hardware (8-32GB) who still want solid reasoning.

If your Mac is 8-32GB, Phi-4 (or Mini) + Gemma 4 are your workhorses. Tiny, fast, smart.

S Image

FLUX.1 dev ↗

Open-weight image generation

The current open-weight image champion.

AgenticLow MCPN/A RAM24GB+ unified or VRAM (full) · 12GB+ quantized PlatformMac · Linux + GPU · Windows + GPU

Released by Black Forest Labs (the team behind Stable Diffusion). FLUX.1 [dev] is the open-weight version that runs locally with 24GB+ VRAM (or quantized on Apple Silicon). Quality rivals Midjourney for many use cases.

Free (non-commercial license; FLUX [pro] is commercial via API)

Redundancy check — Overlaps Stable Diffusion 3.5 (FLUX has better quality in 2026).

Best for: watchmen who want Midjourney-quality output running on their own hardware.

Run via ComfyUI for full control. Quantized FLUX runs on 24GB+ Mac unified memory.

C Image

Stable Diffusion 3.5 ↗

Open-weight image generation

Stability AI's flagship open model.

AgenticLow MCPN/A RAM16GB+ unified or VRAM PlatformMac · Linux + GPU · Windows + GPU

Stable Diffusion 3.5 Large (8B params) is the latest. Free for non-commercial, commercial license available. Less SOTA than FLUX in 2026 but huge ecosystem of fine-tunes and LoRAs.

Free (Stability community license)

Redundancy check — Overlaps FLUX.1 — SD has more community fine-tunes; FLUX has better default quality.

Best for: watchmen who want a vast library of fine-tuned styles and LoRAs.

Use FLUX as your default; pull SD 3.5 for community fine-tunes and specific styles.

A Image

ComfyUI ↗

Visual workflow editor for image / video models

Node-based UI for FLUX / SD / video models.

AgenticLow MCPPlugin RAMPer loaded model PlatformMac · Linux · Windows

Web app that runs locally. Build node graphs to chain image gen, upscale, ControlNet, etc. The standard for serious local image work. Steeper learning curve than DrawThings or Fooocus.

Free (open source)

Redundancy check — Different layer — ComfyUI runs models like FLUX / SD via workflows.

Best for: watchmen making serious volume of visual content who want a reproducible pipeline.

Pair with FLUX. Steeper curve, but once you have a workflow saved, it's repeatable.

S Voice

Whisper.cpp ↗

Local speech-to-text (Whisper port)

Whisper running locally — no API calls.

AgenticN/A MCPN/A RAM2-8GB depending on model size PlatformMac · Linux · Windows

C++ port of OpenAI's Whisper, optimized for CPU and Apple Silicon. Real-time transcription on a Mac. Much faster than the original Python implementation.

Free (MIT)

Redundancy check — Different from cloud Whisper API (this runs locally, no upload).

Best for: watchmen transcribing sensitive content (counseling notes, sermon prep) who don't want audio leaving the machine.

Use this over the API when content is sensitive. Free and fast.

A Voice

WhisperX ↗

Whisper + speaker diarization

Whisper with speaker labels and word-level timestamps.

AgenticN/A MCPN/A RAM8-16GB PlatformMac · Linux · Windows

Builds on Whisper to add speaker diarization (who said what) and word-level alignment. Useful for transcribing conversations, panel recordings, sermons with multiple voices.

Free (BSD)

Redundancy check — Adds to Whisper.cpp; not a replacement.

Best for: watchmen transcribing multi-speaker recordings (conversations, panels, group prayer, board meetings).

Use when you need to know who said what. Free.

C Voice

Kokoro / OpenVoice

Local text-to-speech / voice cloning

Local TTS — your voice, on your machine.

AgenticN/A MCPN/A RAMKokoro: 2-4GB (CPU ok) · OpenVoice: 8-16GB PlatformMac · Linux · Windows

Kokoro (2026) is an 82M-param TTS model that sounds better than models 20x its size and runs on plain CPU — the new default for local narration. OpenVoice still leads for voice cloning from short samples. Both keep audio off the cloud.

Free (open source)

Redundancy check — Different from ElevenLabs (cloud, polished). Kokoro closes much of the quality gap, for free + private.

Best for: watchmen who want natural narration (devotionals, audio overviews) or private voice cloning without uploading samples.

Kokoro is shockingly good for its size and runs on CPU. Start there; reach for OpenVoice only if you need cloning.

S Coding

Continue.dev ↗

VS Code extension w/ local model support

Cursor-like AI coding with your local LLM.

AgenticHigh MCPYes RAMPer model used PlatformMac · Linux · Windows

Open-source VS Code (and JetBrains) extension. Connect to Ollama / LM Studio for autocomplete and chat using local models. Free, private.

Free (open source)

Redundancy check — Overlaps Cursor for AI coding; Continue is free + uses your local models.

Best for: watchmen who want Cursor-style coding without the subscription, using their local models.

Best free Cursor alternative for the local-first watchman.

A Coding

Cline (formerly Claude Dev) ↗

Open-source agentic VS Code extension

Open-source autonomous coding agent.

AgenticVery High MCPYes RAMPer model used PlatformMac · Linux · Windows

VS Code extension that operates like a Claude Code clone — autonomous agent that reads/writes files, runs commands, plans multi-step changes. Works with Anthropic, OpenAI, or local models via Ollama.

Free (open source); pay for whichever model API you use

Redundancy check — Closest local-friendly alternative to Claude Code.

Best for: watchmen who want an autonomous coding agent that can run on local models.

Excellent. Pairs with Llama 4 70B locally for a free Claude Code substitute.

B Coding

Aider (local backend) ↗

Terminal AI pair programmer (local-friendly)

Same Aider from the Toolkit — running on your local LLM.

AgenticHigh MCPYes RAMPer model used PlatformMac · Linux · Windows

Aider supports any OpenAI-API-compatible backend. Point it at Ollama or LM Studio and you have a fully local Aider session. No API charges.

Free

Redundancy check — Same Aider as in the cloud; this is the same tool with a different model.

Best for: terminal-loving watchmen who want Claude Code's shape with their own local model.

If you already use Aider with Claude API, switching it to a local backend takes 30 seconds.

C Coding

Tabby ↗

Self-hosted code autocomplete

GitHub Copilot-style autocomplete, on your own server.

AgenticMedium MCPNo RAM8-24GB depending on model PlatformDocker (Mac · Linux · Windows)

Self-hosted alternative to Copilot. Runs as a Docker container, supports VS Code, JetBrains, Vim. Uses local code models like StarCoder.

Free (open source); paid Tabby Pro tier exists

Redundancy check — Overlaps GitHub Copilot. Tabby is self-hosted; Copilot is cloud.

Best for: watchmen in regulated industries who can't send code to a cloud autocomplete service.

Niche. Use only if compliance forbids cloud Copilot.

S Knowledge / RAG

AnythingLLM ↗

Local RAG over your documents

ChatGPT-style chat over your own files. 100% local.

AgenticMedium MCPPlugin RAM8-16GB + model RAM PlatformMac · Linux · Windows

Drop in PDFs, Word docs, websites — AnythingLLM ingests them, stores them in a local vector database, lets you chat with the corpus using a local LLM. Free, open-source, polished UI.

Free (open source); paid cloud tier exists

Redundancy check — Overlaps Khoj / Open WebUI for RAG.

Best for: watchmen who want NotebookLM but local — chat over your own books, sermons, family records.

Best NotebookLM alternative that runs entirely on your machine.

A Knowledge / RAG

Khoj ↗

Personal AI search engine

AI search across your notes, emails, files.

AgenticMedium MCPPlugin RAM8-16GB + model RAM PlatformMac · Linux · Windows

Open-source 'Perplexity for your own stuff.' Indexes Obsidian, Notion, GitHub, email, and runs AI search + chat over them. Self-hosted; free.

Free (self-hosted)

Redundancy check — Overlaps AnythingLLM; Khoj is more search-focused.

Best for: watchmen with a deep personal corpus (Obsidian + email + files) who want AI search over all of it.

Killer pairing with Obsidian. Free, fast, private.

B Knowledge / RAG

Perplexica ↗

Open-source Perplexity clone

Perplexity, but local + uses your own models.

AgenticMedium MCPNo RAM8-16GB + model RAM PlatformMac · Linux · Windows

Open-source Perplexity-like interface for AI search. Web search + LLM synthesis with citations. Runs locally; uses any LLM (local or API).

Free (open source)

Redundancy check — Overlaps Perplexity Pro for research.

Best for: watchmen who do research-heavy work and want Perplexity's UX without a subscription.

Solid Perplexity alternative for the local-first watchman.

C Knowledge / RAG

Obsidian + Smart Connections ↗

Local-first knowledge graph + AI search plugin

Obsidian + AI plugins, fully local.

AgenticMedium MCPPlugin RAMPer model used PlatformMac · Linux · Windows

Same Obsidian as the cloud Toolkit entry — but with the Smart Connections plugin pointed at a local LLM, every note in your vault becomes AI-searchable without sending data to a cloud.

Obsidian free; Smart Connections plugin free; some plugins paid

Redundancy check — Same Obsidian as the cloud entry, just configured locally.

Best for: watchmen already on Obsidian who want a privacy-first AI layer over their notes.

If Obsidian is your second brain, add Smart Connections + a local model and you have private AI search over everything.

A Automation

n8n self-hosted ↗

Self-hosted automation platform

Same n8n from the Toolkit — running on your machine.

AgenticHigh MCPNative RAM4-8GB for n8n + per model PlatformDocker (Mac · Linux · Windows)

Self-host via Docker. All AI nodes work with local Ollama / LM Studio. The watchman's choice for full data sovereignty + automation. MCP-native.

Free (self-hosted; Docker)

Redundancy check — Same n8n as the cloud entry.

Best for: watchmen who want enterprise-grade automation with zero data leaving their network.

If you have a homelab, n8n + Ollama is the local automation stack.

S Automation

MCP Servers ↗

Local Model Context Protocol servers

Run MCP servers on your machine to wire local tools into Cowork.

AgenticHigh MCPNative RAMMinimal (per server) PlatformMac · Linux · Windows

Anthropic publishes reference MCP servers (filesystem, git, sqlite, etc.) and the community has hundreds more. Run them locally and Cowork or Claude Code can use them as tools — purely local.

Free (open source)

Redundancy check — Different layer — these are the building blocks for agentic systems.

Best for: watchmen extending Cowork with local capabilities (custom databases, internal APIs, file systems).

Future-proof. As MCP grows, more watchman-relevant servers will exist.

C Automation

Goose (local backend) ↗

Open-source agentic desktop tool

Same Goose from the Frontier — running purely local.

AgenticVery High MCPNative RAMPer model + 1-2GB for Goose PlatformMac · Linux · Windows

Block's open-source MCP-native desktop agent. Point it at a local Ollama backend and you have a fully local autonomous agent.

Free, open source

Redundancy check — Same Goose as the Frontier listing; this is the local-first config.

Best for: watchmen who want Cowork-class agentic capability without the Anthropic subscription.

Best free agent. Pair with Llama 4 70B locally for a serious autonomous setup.

Local AI Toolkit

Hardware Sizing Guide

Where local AI starts being useful

70B territory — frontier class

DeepSeek V4 / Qwen 235B territory

The 28 local tools

Ollama ↗

LM Studio ↗

MLX ↗

llama.cpp ↗

Open WebUI ↗

Llama 4 (Meta) ↗

Qwen 3.5 / 3.6 (Alibaba)

DeepSeek V4

Mistral Large 3 / Medium 3.5

Gemma 4 (Google)

Phi-4 / Phi-4 Mini (Microsoft)

FLUX.1 dev ↗

Stable Diffusion 3.5 ↗

ComfyUI ↗

Whisper.cpp ↗

WhisperX ↗

Kokoro / OpenVoice

Continue.dev ↗

Cline (formerly Claude Dev) ↗

Aider (local backend) ↗

Tabby ↗

AnythingLLM ↗

Khoj ↗

Perplexica ↗

Obsidian + Smart Connections ↗

n8n self-hosted ↗

MCP Servers ↗

Goose (local backend) ↗