Tag: ollama

All the articles with the tag "ollama".

Self-Host a Local AI Coding Workhorse

Run Gemma 4 or Qwen3-Coder locally via Ollama or llama.cpp in Docker, then delegate mechanical coding tasks to it while Claude handles the thinking. Free tokens, zero leakage.

Exploring the Diverse World of LLM Models

LLaMA, Mistral, Falcon, GPT — the LLM landscape is crowded. Compare model families, sizes, licensing, and what each is actually good for.

Key Parameters of Large Language Models

Temperature, top-p, top-k, context length — LLM inference parameters explained so you stop guessing why the model gives weird output.

Local Vision LLMs Worth Running in 2026

Pixtral, Qwen3-VL, and Gemma 4 compared for local multimodal use in 2026. LLaVA is dead; here's what to run in Ollama for OCR, screenshots, and vision tasks.

Function Calling in Local LLMs

Local LLMs can call tools, query APIs, and run code if you set them up right. Function calling on Ollama and llama.cpp explained — patterns that actually work.

Gemma 4 vs Qwen3.6

Gemma 4 vs Qwen3.6: sizes, reasoning, coding benchmarks, and which model you should actually pull for your home lab rig.

LLM Distillation Explained

How tiny 7B and 8B models keep punching above their weight — knowledge distillation, the teacher-student trick that makes local AI actually usable on home hardware.

Ollama Model Management: Beyond ollama run

You know how to pull and run a model. Now learn Modelfiles, GPU layer tuning, the REST API, running multiple models without OOM-killing your server, and actually useful system prompts.