Self-Host a Local AI Coding Workhorse
Run Gemma 4 or Qwen3-Coder locally via Ollama or llama.cpp in Docker, then delegate mechanical coding tasks to it while Claude handles the thinking. Free tokens, zero leakage.
All the articles with the tag "ollama".
Run Gemma 4 or Qwen3-Coder locally via Ollama or llama.cpp in Docker, then delegate mechanical coding tasks to it while Claude handles the thinking. Free tokens, zero leakage.
LLaMA, Mistral, Falcon, GPT — the LLM landscape is crowded. Compare model families, sizes, licensing, and what each is actually good for.
Temperature, top-p, top-k, context length — LLM inference parameters explained so you stop guessing why the model gives weird output.
GGUF, GGML, AWQ, GPTQ — LLM file formats and quantization levels explained: trade-offs between model quality, size, and inference speed.
Pixtral, Qwen3-VL, and Gemma 4 compared for local multimodal use in 2026. LLaVA is dead; here's what to run in Ollama for OCR, screenshots, and vision tasks.
Master Ollama with Modelfiles, GPU tuning, API usage, and performance tricks. Stop running 70B models on 8GB VRAM and wondering why everything is slow.
Ollama keeps models in VRAM after every request. Control GPU usage with keep_alive, force-unload via the API, and check memory to stop the reload cycle.
Local LLMs can call tools, query APIs, and run code if you set them up right. Function calling on Ollama and llama.cpp explained — patterns that actually work.
Gemma 4 vs Qwen3.6: sizes, reasoning, coding benchmarks, and which model you should actually pull for your home lab rig.
How tiny 7B and 8B models keep punching above their weight — knowledge distillation, the teacher-student trick that makes local AI actually usable on home hardware.
You know how to pull and run a model. Now learn Modelfiles, GPU layer tuning, the REST API, running multiple models without OOM-killing your server, and actually useful system prompts.
Most people use OpenAI's embeddings because it's easy. But local embeddings exist. How to pick and when it actually matters.