Tag: llm

All articles tagged "llm".

CUDA vs ROCm vs CPU: Running AI on Whatever GPU You've Got

26 Aug, 2025 · Updated: 9 Jun, 2026

CUDA vs ROCm for AI on Linux: NVIDIA's easy path, AMD's emotional journey, and why CPU inference isn't dead yet. Real Docker setups included.

Exploring the Diverse World of LLM Models

24 Apr, 2024 · Updated: 9 Jun, 2026

LLaMA, Mistral, Falcon, GPT — the LLM landscape is crowded. Compare model families, sizes, licensing, and what each is actually good for.

Key Parameters of Large Language Models

15 Jul, 2024 · Updated: 9 Jun, 2026

Temperature, top-p, top-k, context length — LLM inference parameters explained so you stop guessing why the model gives weird output.

LangGraph vs CrewAI vs AutoGen: AI Agent Frameworks for Mere Mortals

22 Nov, 2025 · Updated: 9 Jun, 2026

Confused by AI agent frameworks? Compare LangGraph, CrewAI, and AutoGen with real Python examples, a no-nonsense breakdown, and zero hype. Pick the right one.

Large Language Model Formats and Quantization

29 Apr, 2024 · Updated: 9 Jun, 2026

GGUF, GGML, AWQ, GPTQ — LLM file formats and quantization levels explained: trade-offs between model quality, size, and inference speed.

LiteLLM & vLLM: One API to Rule All Your Models

25 Feb, 2026 · Updated: 9 Jun, 2026

LiteLLM proxies every LLM — local or cloud — behind one OpenAI-compatible endpoint. Pair it with vLLM for GPU-backed serving and ditch the SDK sprawl.

Local Vision LLMs Worth Running in 2026

5 Jun, 2026 · Updated: 9 Jun, 2026

Pixtral, Qwen3-VL, and Gemma 4 compared for local multimodal use in 2026. LLaVA is dead; here's what to run in Ollama for OCR, screenshots, and vision tasks.

Ollama Beyond the Basics: Model Management, Custom Models, and Optimization

26 Sep, 2025 · Updated: 9 Jun, 2026

Master Ollama with Modelfiles, GPU tuning, API usage, and performance tricks. Stop running 70B models on 8GB VRAM and wondering why everything is slow.

Ollama Memory Management: Why Models Keep Loading

22 Jan, 2026 · Updated: 9 Jun, 2026

By default, Ollama keeps a model in VRAM for a few minutes after each request. Control GPU usage with keep_alive, force-unload via the API, and check memory to stop the reload cycle.

Open WebUI vs LibreChat: Self-Hosted ChatGPT Alternatives Compared

27 Oct, 2025 · Updated: 9 Jun, 2026

Self-hosting a ChatGPT alternative? Open WebUI owns local Ollama models; LibreChat handles Claude, GPT, Gemini, and more. Setup, RAG, and trade-offs compared.

Prompt Engineering for Generative AI 101

17 Jun, 2024 · Updated: 9 Jun, 2026

Write prompts that get useful results — role prompting, few-shot examples, chain-of-thought, and the patterns that work across any LLM.

Text Generation Web UI vs KoboldCpp: Power User LLM Interfaces

2 Jan, 2026 · Updated: 9 Jun, 2026

Text Generation Web UI vs KoboldCpp: setup, model formats, samplers, APIs, and performance compared so you can pick the right local LLM frontend fast.