Topic
AI & LLMs
Models you can run on your own hardware, prompt patterns that ship, agent frameworks that don't catch fire, and the awkward questions nobody answers in the breathless launch posts. Ollama, vLLM, llama.cpp, LocalAI, plus the quieter stuff — embeddings, RAG, evals, and figuring out when the cloud API is actually the right answer. If you'd rather understand the trade-offs than chase benchmarks, you'll feel at home here.
59 articles in this topic.
Featured posts
- Self-Host a Local AI Coding Workhorse
Run Gemma 4 or Qwen3-Coder locally via Ollama or llama.cpp in Docker, then delegate mechanical coding tasks to it while Claude handles the thinking. Free tokens, zero leakage.
14 min read
- Give Your AI Agent a Cheap Intern
Stop burning expensive AI tokens on boring grunt work. The overseer/workhorse pattern routes mechanical tasks to a cheap model and saves more than you'd think.
12 min read
- Claude Code + SearXNG: Private Web Search
Wire a self-hosted SearXNG instance into Claude Code via a Bash wrapper for private, scriptable web search — and when to use it vs the built-in tool.
10 min read
- Dify: Visual Agent Workflows
Dify is an open-source LLM-app builder you can self-host. Visual workflow editor, RAG, agents, tool use — without writing 500 lines of LangChain glue.
12 min read
- CUDA vs ROCm vs CPU: Running AI on Whatever GPU You've Got
CUDA vs ROCm for AI on Linux: NVIDIA's easy path, AMD's emotional journey, and why CPU inference isn't dead yet. Real Docker setups included.
Updated · 9 min read
- Exploring the Diverse World of LLM Models
LLaMA, Mistral, Falcon, GPT — the LLM landscape is crowded. Compare model families, sizes, licensing, and what each is actually good for.
Updated · 5 min read
All AI & LLMs articles
- Self-Host a Local AI Coding Workhorse
- Give Your AI Agent a Cheap Intern
- Claude Code + SearXNG: Private Web Search
- Dify: Visual Agent Workflows
- CUDA vs ROCm vs CPU: Running AI on Whatever GPU You've Got
- Exploring the Diverse World of LLM Models
- Key Parameters of Large Language Models
- LangGraph vs CrewAI vs AutoGen: AI Agent Frameworks for Mere Mortals
- Large Language Model Formats and Quantization
- LiteLLM & vLLM: One API to Rule All Your Models
- Local Vision LLMs Worth Running in 2026
- Ollama Beyond the Basics: Model Management, Custom Models, and Optimization
- Ollama Memory Management: Why Models Keep Loading
- Open WebUI vs LibreChat: Self-Hosted ChatGPT Alternatives Compared
- Piper vs Coqui: Text-to-Speech on Your Own Hardware (Because AWS Polly Charges Per Character Like It's 1999 SMS)
- Prompt Engineering for Generative AI 101
- Stable Diffusion vs ComfyUI vs Fooocus: AI Image Generation at Home
- Text Generation Web UI vs KoboldCpp: Power User LLM Interfaces
- OpenRouter vs LiteLLM
- Function Calling in Local LLMs
- Gemma 4 vs Qwen3.6
- AnythingLLM as Knowledge Base
- MCP Servers: Tools for LLMs
- RAG Evaluation with Ragas
- LLM Distillation Explained
- Open WebUI Tools, Functions & Pipelines: Extend Your Local LLM
- Self-Supervised Learning Explained
- Ollama Model Management: Beyond ollama run
- Continue.dev vs Cody vs Tabby: AI Code Help Without the Cloud
- LangGraph vs CrewAI vs AutoGen: AI Agents Without the Hype
- Qdrant vs Weaviate vs Chroma: Vector DB Showdown
- LangChain vs LlamaIndex: RAG Framework Showdown
- The Embedding Model Choice Nobody Explains
- GPU Memory Math: Will This Model Actually Fit?
- Beyond RAG: When a Virtual Filesystem Works Better
- Running Gemma 4 Locally with Ollama
- 1-Bit LLMs: The Quantization Endgame
- AMD Lemonade: Local LLM Serving for AMD GPUs
- When to Use Structured Output (JSON Mode) in LLMs
- Using AI to Find Security Bugs in Your Code
- LLM Temperature and top_p Explained Without the Math
- LLM Backends: vLLM vs llama.cpp vs Ollama
- RAG Chunking: Why Chunk Size Is Everything
- System Prompts: The LLM Feature Most People Ignore
- LLM Quantization: Q4_K_M Isn't Always the Best Choice
- Running Multiple Ollama Models Without Running Out of RAM
- Context Window vs Token Limit: Not the Same Thing
- RAG on a Budget: Building a Knowledge Base with Ollama & ChromaDB
- Stable Diffusion vs ComfyUI vs Fooocus: AI Image Generation at Home
- n8n + LLM: Building Automations That Actually Think
- LLM Fine-Tuning for Mortals: LoRA, QLoRA, and Your Gaming GPU
- Whisper & Faster-Whisper: Self-Hosted Speech-to-Text That Actually Works
- Continue.dev vs Cody vs Tabby: AI Code Assistants That Live on Your Machine
- Flowise vs Langflow: Build AI Pipelines Without Writing a Novel
- n8n vs Node-RED: Automate Everything Without Learning to Code (Much)
- Prompts for Image Generation in Stable Diffusion
- Ollama: Powerful Language Models on Your Own Machine
- Unleash the Power of LLMs with LocalAI
- Machine Learning models (AI)