Tag: llm

All the articles with the tag "llm".

Mixture of Experts (MoE) for Self-Hosters, Demystified

19 Jul, 2026

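MoE LLMs like Mixtral and DeepSeek-V3 reach 70B-class quality or better while activating only a fraction of their weights per token (roughly 13B of 47B for Mixtral 8x7B, 37B of 671B for DeepSeek-V3). Here's how sparse activation works and how to run it at home.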

Speculative Decoding: Faster LLMs With a Tiny Sidekick

15 Jul, 2026

Speculative decoding, Gemma 4 MTP, and DeepSeek DSpark all make LLMs 2-6x faster losslessly. How each works, and which to use for local inference vs. serving.

Karakeep: Self-Hosted Bookmarks With AI Tagging

14 Jul, 2026

Karakeep (formerly Hoarder) is a self-hosted Pocket alternative with AI auto-tagging, full-text search, and Docker Compose deploy in under 10 minutes.

Stop Feeding the AI Your Whole Repo

4 Jul, 2026

Comparing context-mode, code-review-graph, token-savior, and claude-context — MCP tools that stop AI coding agents from reading your whole codebase blind.

RTK vs snip vs lean-ctx: Token Killers

3 Jul, 2026

RTK, snip, and lean-ctx filter your AI coding agent's output before it burns context tokens. I run RTK daily — here's the one I'd actually switch to.

AI Swarm Audited My 840-Post Blog

27 Jun, 2026

I pointed a parallel swarm of AI agents at 840 technical posts to fact-check and refresh them — the architecture, the token bill, and the guardrails.

Used GPU Buying Guide for Home Lab LLMs

26 Jun, 2026

Your no-BS guide to buying a used GPU for local LLM inference in 2026. Budget tiers from $200 to $1,500 and up, real VRAM math, and tips to dodge the scams.

Claude Code in a Homelab Workflow

24 Jun, 2026

Claude Code puts an agentic AI assistant in your terminal for real homelab work — compose files, bash, Ansible, systemd. The honest take on cost and data.

Self-Host a Local AI Coding Workhorse

17 Jun, 2026

Self-host Gemma 4 or Qwen3-Coder via Ollama or llama.cpp in Docker, then let Claude delegate the grunt work to it. Free tokens, zero code leakage.

Give Your AI Agent a Cheap Intern

16 Jun, 2026

Stop burning expensive AI tokens on boring grunt work. The overseer/workhorse pattern routes mechanical tasks to a cheap model and saves more than you'd think.

Claude Code + SearXNG: Private Web Search

15 Jun, 2026

Wire a self-hosted SearXNG instance into Claude Code via a Bash wrapper for private, scriptable web search — and when to use it vs the built-in tool.

Dify: Visual Agent Workflows

9 Jun, 2026

Dify is an open-source LLM-app builder you can self-host. Visual workflow editor, RAG, agents, tool use — without writing 500 lines of LangChain glue.