Self-Host a Local AI Coding Workhorse
Run Gemma 4 or Qwen3-Coder locally via Ollama or llama.cpp in Docker, then delegate mechanical coding tasks to it while Claude handles the thinking. Free tokens, zero leakage.
All the articles with the tag "ai".
Stop burning expensive AI tokens on boring grunt work. The overseer/workhorse pattern routes mechanical tasks to a cheap model and saves more than you'd think.
Wire a self-hosted SearXNG instance into Claude Code via a Bash wrapper for private, scriptable web search — and when to use it vs the built-in tool.
Dify is an open-source LLM-app builder you can self-host. Visual workflow editor, RAG, agents, tool use — without writing 500 lines of LangChain glue.
CUDA vs ROCm for AI on Linux: NVIDIA's easy path, AMD's emotional journey, and why CPU inference isn't dead yet. Real Docker setups included.
LLaMA, Mistral, Falcon, GPT — the LLM landscape is crowded. Compare model families, sizes, licensing, and what each is actually good for.
Temperature, top-p, top-k, context length — LLM inference parameters explained so you stop guessing why the model gives weird output.
Confused by AI agent frameworks? Compare LangGraph, CrewAI, and AutoGen with real Python examples, a no-nonsense breakdown, and zero hype. Pick the right one.
GGUF, GGML, AWQ, GPTQ — LLM file formats and quantization levels explained: trade-offs between model quality, size, and inference speed.
LiteLLM proxies every LLM — local or cloud — behind one OpenAI-compatible endpoint. Pair it with vLLM for GPU-backed serving and ditch the SDK sprawl.
Pixtral, Qwen3-VL, and Gemma 4 compared for local multimodal use in 2026. LLaVA is dead; here's what to run in Ollama for OCR, screenshots, and vision tasks.
Master Ollama with Modelfiles, GPU tuning, API usage, and performance tricks. Stop running 70B models on 8GB VRAM and wondering why everything is slow.