← USMC Ministries Mission USMC Ministries · The Sovereign Stack

Local AI Toolkit

If you've already invested $6k+ into the iron — a Mac Studio with 128GB+ unified memory, or a similar Linux workstation with a 24GB+ GPU — this is the watchman's local toolkit. No subscriptions. No data leaving your machine. Total privacy. The agent runs on your hardware.

Most watchmen don't need this page. Cowork + ChatGPT Plus + Otter for $60/mo will outperform a local stack until you specifically need privacy, sovereignty, or unlimited usage at zero marginal cost. This page is for the watchmen who've already crossed that line.

30 local tools7 categories1 hardware tier guide
All tools listed are open-source or have free local-use tiers. Pricing snapshot: 2026 Q2. Local AI is moving even faster than cloud AI — expect significant turnover at every 6-month refresh.
SSTART HERE— Don't skip these.
ACORE— Adopt in the first month.
BDEPTH— After the basics are running.
COPTIONAL— Most watchmen skip these.

The 30 local tools

Each card shows agent capability, Model Context Protocol (MCP) support, minimum RAM, and supported platforms.

S LLM Runner

Ollama

Local LLM runner (CLI + API)

The standard for running open models on your own machine.

AgenticLow MCPPlugin RAM8GB (small) · 32GB+ for 70B class PlatformMac · Linux · Windows
Single-command install, pulls models like Docker images. Runs Llama, Qwen, DeepSeek, Gemma, Mistral, etc. on Mac / Linux / Windows. Has a REST API so other tools can hit it.
Free (open source)
Redundancy check — Overlaps LM Studio (CLI vs GUI). Most watchmen pick one.
Best for: technical watchmen comfortable with the terminal; the foundation other local tools build on.
Start here. Most-used local runner for a reason — simple, stable, fast.
S LLM Runner

LM Studio

Local LLM runner (GUI app)

The watchman-friendly way to run local models.

AgenticLow MCPYes RAM8GB · 32GB+ for 70B class PlatformMac · Linux · Windows
Desktop app for Mac / Linux / Windows. Browse, download, run open-weight models from a clean GUI. Includes a chat interface and an OpenAI-compatible API. No terminal required.
Free
Redundancy check — Overlaps Ollama. LM Studio has a nicer UI; Ollama has a smaller resource footprint.
Best for: watchmen who want point-and-click local AI without learning a terminal.
Best entry point if you don't already love the terminal. Free, fast, works.
A LLM Runner

MLX

Apple Silicon ML framework

The fastest way to run local models on Mac.

AgenticN/A MCPN/A RAMSame as model: 16GB for 7B, 64GB+ for 70B Q4 PlatformApple Silicon only (M1/M2/M3/M4)
Apple's native ML framework for M-series chips. Models converted to MLX format run faster and with lower memory than llama.cpp on Apple Silicon. Pair with mlx-lm or mlx-examples to use it.
Free (open source)
Redundancy check — Different layer than Ollama — Ollama can use MLX as a backend on Macs.
Best for: Mac watchmen who want maximum performance from their unified memory.
If you have a Mac Studio M3 Ultra, this is what makes 70B+ models feel snappy.
A LLM Runner

llama.cpp

Local LLM inference engine

The C++ engine under most local LLM tools.

AgenticN/A MCPN/A RAMPer model PlatformMac · Linux · Windows
Powers Ollama, LM Studio, and many others under the hood. Direct command-line use is technical; most watchmen use it via Ollama or LM Studio. The GGUF model format originated here.
Free (open source)
Redundancy check — Indirectly used by Ollama / LM Studio.
Best for: watchmen who want fine-grained control over inference parameters and model formats.
You don't need this directly unless you're benchmarking. Trust Ollama / LM Studio to handle it.
A LLM Runner

Open WebUI

Local web UI for any LLM backend

ChatGPT-style UI for your local models.

AgenticMedium MCPYes RAMServer: 4GB · plus per-model RAM PlatformDocker (Mac · Linux · Windows)
Runs as a Docker container, connects to Ollama / LM Studio / OpenAI-compatible backends. Adds chat history, RAG over uploaded files, multi-user, prompt library — feels like ChatGPT, runs local.
Free (open source)
Redundancy check — Different from Ollama — Ollama is the engine, Open WebUI is the GUI on top.
Best for: watchmen who want a polished web interface for their local models.
Best 'looks like ChatGPT, runs on my hardware' choice. Pair with Ollama; you're set.
S Frontier Model

Llama 4 (Meta)

Open-weight frontier model family

Meta's open frontier — the new flagship.

AgenticHigh MCPN/A RAM8B: 6GB · 70B Q4: 48GB · 405B: 256GB+ (cluster) PlatformAny platform via Ollama / MLX
Released early-mid 2025. Sizes 8B / 70B / 405B (with multimodal variants). 70B is the sweet spot for 128GB+ Macs. Quality competitive with GPT-4 / Claude 3.5 Sonnet on many benchmarks.
Free (Meta license; not pure OSS but generous)
Redundancy check — Overlaps Qwen 3 / DeepSeek for general reasoning.
Best for: watchmen who want the strongest open-weight English-language model.
Default model for the local stack. 70B Q4 on a 128GB Mac and you're set.
A Frontier Model

Qwen 3 (Alibaba)

Open-weight Chinese frontier model (multilingual, MoE)

Alibaba's open frontier. Strong reasoning. Strong multilingual.

AgenticVery High MCPYes RAM0.6B: 2GB · 32B: 24GB · 235B-A22B Q4: 96GB+ PlatformAny via Ollama / MLX / vLLM
Family includes 0.6B → 235B-A22B (Mixture-of-Experts). The 235B-A22B MoE activates ~22B parameters at inference, so it runs at 22B speed on 128GB+ unified memory. State-of-the-art reasoning in mid-2025.
Free (Apache 2.0)
Redundancy check — Overlaps Llama 4 / DeepSeek for English; better for Chinese / multilingual.
Best for: watchmen doing multilingual ministry, technical reasoning, or who want the most capable open MoE.
Worth pulling alongside Llama. Try both on a real task; one will click for your style.
A Frontier Model

DeepSeek R1 / V3

Open-weight reasoning model (671B MoE)

The free DeepSeek you saw on the cloud — running on your machine.

AgenticHigh MCPN/A RAMQ4: ~96GB · Q2: ~64GB PlatformMac (slow) · Linux + GPU (fast)
DeepSeek released V3 (general) and R1 (reasoning) as open weights in late 2024 / early 2025. 671B MoE activates ~37B at inference. Quantized to int4, fits on a 128GB+ Mac with patience.
Free (MIT-style license)
Redundancy check — Overlaps Qwen 3 / Llama 4 for top-tier reasoning.
Best for: watchmen who want frontier reasoning on their own hardware, no API key required.
Patience required (slow on Mac without dedicated inference). But it's frontier-class for free.
B Frontier Model

Mistral Large 2

European open-weight frontier model

France's open-weight contender.

AgenticHigh MCPN/A RAMQ4: 64GB+ PlatformAny via Ollama
123B dense model. Strong code + multilingual. Mistral's flagship open release. Runs comfortably on 96GB+ Macs at int4.
Free for research; commercial license required for business use
Redundancy check — Overlaps Llama 4 70B. Mistral often wins on European languages.
Best for: watchmen in European-language ministries or doing technical work where Mistral's tuning shines.
Worth a side-by-side test against Llama if your work touches French / Spanish / German.
B Frontier Model

Mixtral 8×22B (Mistral)

Mixture-of-Experts open-weight model

176B parameters, 39B active. Speed of small, quality of large.

AgenticHigh MCPN/A RAMQ4: 80GB · Q5: 96GB PlatformAny via Ollama
Sparse MoE — 8 experts of 22B each, 2 activate per token. Effective speed of ~39B parameters but quality closer to dense 100B+ models.
Free (Apache 2.0)
Redundancy check — Overlaps Llama 4 / Qwen 3 MoE for general-purpose reasoning.
Best for: watchmen who want quality output at faster speeds than dense models.
Solid second model behind Llama / Qwen. Pull it if you want to A/B test.
B Frontier Model

Gemma 3 (Google)

Open-weight Google model family

Google's open-weight family — small, fast, capable.

AgenticMedium MCPN/A RAM1B: 4GB · 27B Q4: 18GB PlatformAny via Ollama / MLX
Sizes 1B / 4B / 9B / 27B. Trained on Gemini-2-class data. The 27B variant runs comfortably on 32GB Macs and rivals much larger models for most tasks.
Free (Gemma license)
Redundancy check — Overlaps Phi-4 for the 'small but mighty' slot.
Best for: watchmen with mid-tier hardware (32-64GB) who want capable local AI.
Best balance of capability vs. RAM footprint for the 32-64GB Mac.
B Frontier Model

Command R+ (Cohere)

Open-weight enterprise / RAG-tuned model

Cohere's RAG-tuned open release.

AgenticVery High MCPN/A RAMQ4: 56GB PlatformAny via Ollama
104B dense model from Cohere. Particularly strong at retrieval-augmented generation and tool use. Less popular than Llama but quietly excellent for business workflows.
Free for non-commercial; commercial license required
Redundancy check — Overlaps Llama 4 / Qwen for general use; differentiates on RAG.
Best for: watchmen running local RAG over their own documents (sermons, books, business records).
Underrated for the watchman who wants strong RAG without sending docs to the cloud.
B Frontier Model

Phi-4 (Microsoft)

Small but capable Microsoft model

14B that punches above its weight.

AgenticMedium MCPN/A RAMQ4: 9GB PlatformAny via Ollama / MLX
Microsoft's Phi-4 is dense at 14B. Trained on synthetic + curated data, it scores higher than expected on reasoning benchmarks. Fast on any modern Mac.
Free (MIT)
Redundancy check — Overlaps Gemma 3 27B for the 'capable small model' slot.
Best for: watchmen running on lighter hardware (16-32GB) who still want frontier-tier reasoning.
If your Mac is 16GB or 32GB, this is your workhorse. Tiny, fast, smart.
B Image

FLUX.1 dev

Open-weight image generation

The current open-weight image champion.

AgenticLow MCPN/A RAM24GB+ unified or VRAM (full) · 12GB+ quantized PlatformMac · Linux + GPU · Windows + GPU
Released by Black Forest Labs (the team behind Stable Diffusion). FLUX.1 [dev] is the open-weight version that runs locally with 24GB+ VRAM (or quantized on Apple Silicon). Quality rivals Midjourney for many use cases.
Free (non-commercial license; FLUX [pro] is commercial via API)
Redundancy check — Overlaps Stable Diffusion 3.5 (FLUX has better quality in 2026).
Best for: watchmen who want Midjourney-quality output running on their own hardware.
Run via ComfyUI for full control. Quantized FLUX runs on 24GB+ Mac unified memory.
B Image

Stable Diffusion 3.5

Open-weight image generation

Stability AI's flagship open model.

AgenticLow MCPN/A RAM16GB+ unified or VRAM PlatformMac · Linux + GPU · Windows + GPU
Stable Diffusion 3.5 Large (8B params) is the latest. Free for non-commercial, commercial license available. Less SOTA than FLUX in 2026 but huge ecosystem of fine-tunes and LoRAs.
Free (Stability community license)
Redundancy check — Overlaps FLUX.1 — SD has more community fine-tunes; FLUX has better default quality.
Best for: watchmen who want a vast library of fine-tuned styles and LoRAs.
Use FLUX as your default; pull SD 3.5 for community fine-tunes and specific styles.
B Image

ComfyUI

Visual workflow editor for image / video models

Node-based UI for FLUX / SD / video models.

AgenticLow MCPPlugin RAMPer loaded model PlatformMac · Linux · Windows
Web app that runs locally. Build node graphs to chain image gen, upscale, ControlNet, etc. The standard for serious local image work. Steeper learning curve than DrawThings or Fooocus.
Free (open source)
Redundancy check — Different layer — ComfyUI runs models like FLUX / SD via workflows.
Best for: watchmen making serious volume of visual content who want a reproducible pipeline.
Pair with FLUX. Steeper curve, but once you have a workflow saved, it's repeatable.
A Voice

Whisper.cpp

Local speech-to-text (Whisper port)

Whisper running locally — no API calls.

AgenticN/A MCPN/A RAM2-8GB depending on model size PlatformMac · Linux · Windows
C++ port of OpenAI's Whisper, optimized for CPU and Apple Silicon. Real-time transcription on a Mac. Much faster than the original Python implementation.
Free (MIT)
Redundancy check — Different from cloud Whisper API (this runs locally, no upload).
Best for: watchmen transcribing sensitive content (counseling notes, sermon prep) who don't want audio leaving the machine.
Use this over the API when content is sensitive. Free and fast.
B Voice

WhisperX

Whisper + speaker diarization

Whisper with speaker labels and word-level timestamps.

AgenticN/A MCPN/A RAM8-16GB PlatformMac · Linux · Windows
Builds on Whisper to add speaker diarization (who said what) and word-level alignment. Useful for transcribing conversations, panel recordings, sermons with multiple voices.
Free (BSD)
Redundancy check — Adds to Whisper.cpp; not a replacement.
Best for: watchmen transcribing multi-speaker recordings (conversations, panels, group prayer, board meetings).
Use when you need to know who said what. Free.
C Voice

OpenVoice / Bark

Local text-to-speech / voice cloning

Local TTS — your voice, on your machine.

AgenticN/A MCPN/A RAM8-16GB PlatformMac · Linux + GPU · Windows + GPU
OpenVoice and Bark are leading open-source TTS engines. Bark generates expressive natural speech with non-verbal vocals (laughs, sighs). OpenVoice clones a voice from short samples.
Free (open source)
Redundancy check — Different from ElevenLabs (cloud, polished). Local TTS is rougher but free + private.
Best for: watchmen experimenting with voice cloning on private data without uploading samples to a service.
Local TTS quality lags ElevenLabs. Use these for privacy-sensitive experiments only.
A Coding

Continue.dev

VS Code extension w/ local model support

Cursor-like AI coding with your local LLM.

AgenticHigh MCPYes RAMPer model used PlatformMac · Linux · Windows
Open-source VS Code (and JetBrains) extension. Connect to Ollama / LM Studio for autocomplete and chat using local models. Free, private.
Free (open source)
Redundancy check — Overlaps Cursor for AI coding; Continue is free + uses your local models.
Best for: watchmen who want Cursor-style coding without the subscription, using their local models.
Best free Cursor alternative for the local-first watchman.
A Coding

Cline (formerly Claude Dev)

Open-source agentic VS Code extension

Open-source autonomous coding agent.

AgenticVery High MCPYes RAMPer model used PlatformMac · Linux · Windows
VS Code extension that operates like a Claude Code clone — autonomous agent that reads/writes files, runs commands, plans multi-step changes. Works with Anthropic, OpenAI, or local models via Ollama.
Free (open source); pay for whichever model API you use
Redundancy check — Closest local-friendly alternative to Claude Code.
Best for: watchmen who want an autonomous coding agent that can run on local models.
Excellent. Pairs with Llama 4 70B locally for a free Claude Code substitute.
C Coding

Aider (local backend)

Terminal AI pair programmer (local-friendly)

Same Aider from the Compass — running on your local LLM.

AgenticHigh MCPYes RAMPer model used PlatformMac · Linux · Windows
Aider supports any OpenAI-API-compatible backend. Point it at Ollama or LM Studio and you have a fully local Aider session. No API charges.
Free
Redundancy check — Same Aider as in the cloud; this is the same tool with a different model.
Best for: terminal-loving watchmen who want Claude Code's shape with their own local model.
If you already use Aider with Claude API, switching it to a local backend takes 30 seconds.
C Coding

Tabby

Self-hosted code autocomplete

GitHub Copilot-style autocomplete, on your own server.

AgenticMedium MCPNo RAM8-24GB depending on model PlatformDocker (Mac · Linux · Windows)
Self-hosted alternative to Copilot. Runs as a Docker container, supports VS Code, JetBrains, Vim. Uses local code models like StarCoder.
Free (open source); paid Tabby Pro tier exists
Redundancy check — Overlaps GitHub Copilot. Tabby is self-hosted; Copilot is cloud.
Best for: watchmen in regulated industries who can't send code to a cloud autocomplete service.
Niche. Use only if compliance forbids cloud Copilot.
A Knowledge / RAG

AnythingLLM

Local RAG over your documents

ChatGPT-style chat over your own files. 100% local.

AgenticMedium MCPPlugin RAM8-16GB + model RAM PlatformMac · Linux · Windows
Drop in PDFs, Word docs, websites — AnythingLLM ingests them, stores them in a local vector database, lets you chat with the corpus using a local LLM. Free, open-source, polished UI.
Free (open source); paid cloud tier exists
Redundancy check — Overlaps Khoj / Open WebUI for RAG.
Best for: watchmen who want NotebookLM but local — chat over your own books, sermons, family records.
Best NotebookLM alternative that runs entirely on your machine.
B Knowledge / RAG

Khoj

Personal AI search engine

AI search across your notes, emails, files.

AgenticMedium MCPPlugin RAM8-16GB + model RAM PlatformMac · Linux · Windows
Open-source 'Perplexity for your own stuff.' Indexes Obsidian, Notion, GitHub, email, and runs AI search + chat over them. Self-hosted; free.
Free (self-hosted)
Redundancy check — Overlaps AnythingLLM; Khoj is more search-focused.
Best for: watchmen with a deep personal corpus (Obsidian + email + files) who want AI search over all of it.
Killer pairing with Obsidian. Free, fast, private.
C Knowledge / RAG

Perplexica

Open-source Perplexity clone

Perplexity, but local + uses your own models.

AgenticMedium MCPNo RAM8-16GB + model RAM PlatformMac · Linux · Windows
Open-source Perplexity-like interface for AI search. Web search + LLM synthesis with citations. Runs locally; uses any LLM (local or API).
Free (open source)
Redundancy check — Overlaps Perplexity Pro for research.
Best for: watchmen who do research-heavy work and want Perplexity's UX without a subscription.
Solid Perplexity alternative for the local-first watchman.
C Knowledge / RAG

Obsidian + Smart Connections

Local-first knowledge graph + AI search plugin

Obsidian + AI plugins, fully local.

AgenticMedium MCPPlugin RAMPer model used PlatformMac · Linux · Windows
Same Obsidian as the cloud Compass entry — but with the Smart Connections plugin pointed at a local LLM, every note in your vault becomes AI-searchable without sending data to a cloud.
Obsidian free; Smart Connections plugin free; some plugins paid
Redundancy check — Same Obsidian as the cloud entry, just configured locally.
Best for: watchmen already on Obsidian who want a privacy-first AI layer over their notes.
If Obsidian is your second brain, add Smart Connections + a local model and you have private AI search over everything.
C Automation

n8n self-hosted

Self-hosted automation platform

Same n8n from the Compass — running on your machine.

AgenticHigh MCPNative RAM4-8GB for n8n + per model PlatformDocker (Mac · Linux · Windows)
Self-host via Docker. All AI nodes work with local Ollama / LM Studio. The watchman's choice for full data sovereignty + automation. MCP-native.
Free (self-hosted; Docker)
Redundancy check — Same n8n as the cloud entry.
Best for: watchmen who want enterprise-grade automation with zero data leaving their network.
If you have a homelab, n8n + Ollama is the local automation stack.
B Automation

MCP Servers

Local Model Context Protocol servers

Run MCP servers on your machine to wire local tools into Cowork.

AgenticHigh MCPNative RAMMinimal (per server) PlatformMac · Linux · Windows
Anthropic publishes reference MCP servers (filesystem, git, sqlite, etc.) and the community has hundreds more. Run them locally and Cowork or Claude Code can use them as tools — purely local.
Free (open source)
Redundancy check — Different layer — these are the building blocks for agentic systems.
Best for: watchmen extending Cowork with local capabilities (custom databases, internal APIs, file systems).
Future-proof. As MCP grows, more watchman-relevant servers will exist.
B Automation

Goose (local backend)

Open-source agentic desktop tool

Same Goose from the Frontier — running purely local.

AgenticVery High MCPNative RAMPer model + 1-2GB for Goose PlatformMac · Linux · Windows
Block's open-source MCP-native desktop agent. Point it at a local Ollama backend and you have a fully local autonomous agent.
Free, open source
Redundancy check — Same Goose as the Frontier listing; this is the local-first config.
Best for: watchmen who want Cowork-class agentic capability without the Anthropic subscription.
Best free agent. Pair with Llama 4 70B locally for a serious autonomous setup.
No tools in this filter.

Hardware Sizing Guide

What fits in what. Pick the tier that matches your machine; the cards above tell you which tools live there.

32 GB · REALISTIC FLOOR

Where local AI starts being useful

  • Models: Gemma 3 27B Q4 (18GB), Phi-4 14B (9GB)
  • Image: SD 3.5, FLUX.1 quantized
  • Coding: Continue / Cline with 27B-class
  • Use: daily workflows. Below this, local feels like a science project.
64 GB · STRONG

70B territory — frontier class

  • Models: Llama 4 70B Q4 (~48GB), Mistral Large 2 Q4
  • Image: FLUX.1 dev full
  • Multiple loaded: One 70B + one small model
  • Use: frontier-class output, agentic flows.
128 GB · CAPTAIN'S TIER

Qwen 235B-A22B territory

  • Models: Qwen 3 235B-A22B Q4 (96GB+) · Llama 4 70B + small model · DeepSeek R1 Q2 (~64GB) · Mixtral 8×22B Q4
  • Image: FLUX dev full + SD 3.5 swappable
  • Stack loaded: Frontier model + agent (Goose/Cline) + RAG + n8n self-hosted
  • Use: sovereign AI workstation. The setup this page is built for.
192 GB+ · MAX

The full stack

  • Models: DeepSeek R1 Q4 (~96-100GB), Qwen 235B Q5
  • Multiple frontier loaded: Llama 4 70B + Qwen 32B + image + voice all live
  • Use: dev-grade local AI lab. Cluster-class without a cluster.

Honest note: below 32 GB unified RAM, local AI will technically run (Phi-4, Gemma 1B-9B) but most watchmen will find it slower and less useful than the cloud Compass tools. The hardware investment only pays off above 32 GB.