Tag: llm

All the articles with the tag "llm".

LiteLLM & vLLM: One API to Rule All Your Models

Your app calls OpenAI, your side project calls Anthropic, your homelab whispers to Ollama — and your codebase looks like a crime scene. LiteLLM and vLLM are the dynamic duo that puts a single sane API in front of every model you'll ever run, local or cloud.

RAG on a Budget: Building a Knowledge Base with Ollama & ChromaDB

Stop paying per-token to ask questions about your own documents. This guide walks you through building a fully local RAG pipeline with Ollama and ChromaDB — from Docker Compose to Python code — so your AI can actually know things without hallucinating them.

Running Gemma 4 Locally with Ollama

Google's Gemma 4 is the best open model they've shipped yet. Here's how to pull it, run it, and actually use it for real work with Ollama on your own hardware.

1-Bit LLMs: The Quantization Endgame

1-bit models store weights as -1, 0, or 1. That sounds insane until you see them run a 100B-parameter model on a laptop CPU. Here's what's actually happening.