Tag: ollama

All the articles with the tag "ollama".

Local Coding Agents Need Less Context

31 Jul, 2026

Local coding agents don't fail because your 27B model is too small. They fail because you let 200K tokens of garbage pile up in the context window. Cap it low.

Self-Host a Local AI Coding Workhorse

17 Jun, 2026

Self-host Gemma 4 or Qwen3-Coder via Ollama or llama.cpp in Docker, then let Claude delegate the grunt work to it. Free tokens, zero code leakage.

Exploring the Diverse World of LLM Models

24 Apr, 2024 · Updated: 9 Jun, 2026

LLaMA, Mistral, Falcon, GPT: the LLM landscape is crowded. Compare model families, sizes, licensing, and what each is actually good for.

Key Parameters of Large Language Models

15 Jul, 2024 · Updated: 9 Jun, 2026

Temperature, top-p, top-k, context length: LLM inference parameters explained, so you stop guessing why the model gives weird output.

Large Language Model Formats and Quantization

29 Apr, 2024 · Updated: 9 Jun, 2026

GGUF, GGML, AWQ, GPTQ: LLM file formats and quantization levels explained, with the trade-offs between model quality, size, and inference speed.

Local Vision LLMs Worth Running in 2026

5 Jun, 2026 · Updated: 9 Jun, 2026

Pixtral, Qwen3-VL, and Gemma 4 compared for local multimodal use in 2026. LLaVA is dead; here's what to run in Ollama for OCR, screenshots, and vision tasks.

Ollama Beyond the Basics: Model Management, Custom Models, and Optimization

26 Sep, 2025 · Updated: 9 Jun, 2026

Master Ollama with Modelfiles, GPU tuning, API usage, and performance tricks. Stop running 70B models on 8GB VRAM and wondering why everything is slow.

Ollama Memory Management: Why Models Keep Loading

22 Jan, 2026 · Updated: 9 Jun, 2026

Ollama keeps models in VRAM after every request. Control GPU usage with keep_alive, force-unload via the API, and check memory to stop the reload cycle.

Function Calling in Local LLMs

7 Jun, 2026

Local LLMs can call tools, query APIs, and run code if you set them up right. Function calling on Ollama and llama.cpp explained, with patterns that actually work.

Gemma 4 vs Qwen3.6

6 Jun, 2026

Gemma 4 vs Qwen3.6: sizes, reasoning, coding benchmarks, and which model you should actually pull for your home lab rig.

LLM Distillation Explained

2 Jun, 2026

How tiny 7B and 8B models keep punching above their weight: knowledge distillation, the teacher-student trick that makes local AI actually usable on home hardware.

Ollama Model Management: Beyond ollama run

26 Apr, 2026

You can pull and run a model. Now learn Modelfiles, GPU layer tuning, the REST API, running multiple models without OOM-killing your server, and useful system prompts.