Continue.dev + Ollama: Local Code Assistant for Cheap

By SumGuy 8 min read

The Cloud AI Coding Tax is Ridiculous

So you’ve been using GitHub Copilot, or maybe Claude in your IDE, and the bill keeps showing up like an unwanted houseguest. $10/month here, $20/month there. For a side project? For learning? For code you’re never selling?

It’s like hiring a premium taxi to drive around your own driveway.

Here’s the thing: local LLM coding is finally good enough to be your default. And with Continue.dev + Ollama, you can have autocomplete and chat AI running on your own machine for exactly zero dollars per month (minus your electricity bill).

I’m not saying it beats Claude. But for most coding tasks (refactoring, scaffolding, tests, debugging, explaining code) a local model running on your hardware is fast, free, and private. Your code never leaves your machine. No vendor lock-in. No surprise rate limits at 2 AM.

Let’s set it up.


What You’re Installing

Continue.dev is an open-source IDE extension that brings AI chat and autocomplete into VS Code, JetBrains IDEs, and others. It’s pluggable—you can point it at OpenAI, Anthropic, Ollama, or even a local vLLM server.

Ollama is a local LLM runtime. Install it, download a model, and it runs on your CPU or GPU without needing to futz with CUDA drivers or venv hell. Run ollama pull mistral and, once the multi-gigabyte download finishes, you have a 7B-parameter model ready to go.

Together: IDE + local inference = autocomplete that’s instant (no network latency), chat that never touches the cloud, and code that stays yours.


Step 1: Install Ollama

Head to https://ollama.com and grab the binary for your OS. It’s dead simple.

# macOS / Linux (via curl)
curl -fsSL https://ollama.com/install.sh | sh
# Or download from https://ollama.com/download

Start the server:

ollama serve

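It’ll default to http://localhost:11434. Leave that running in a terminal or as a systemd service. (If the installer already started Ollama in the background, ollama serve will complain that the port is in use; that just means the server is already up.)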
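
A quick way to confirm the server is actually listening is to hit its HTTP API. The sketch below is a minimal check, assuming curl is installed and the default address above. OLLAMA_URL and the two function names are made up for this example; /api/tags is Ollama's endpoint for listing local models.

```shell
# Default address from the text. OLLAMA_URL is a variable invented for this
# sketch (not an Ollama setting); export it first if your server lives elsewhere.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Exit status 0 if the server answers on /api/tags, non-zero otherwise.
ollama_up() {
  curl -fsS --max-time 3 "$OLLAMA_URL/api/tags" >/dev/null 2>&1
}

# Print a one-line status either way.
ollama_status() {
  if ollama_up; then
    echo "ollama: reachable at $OLLAMA_URL"
  else
    echo "ollama: not reachable at $OLLAMA_URL (is 'ollama serve' running?)"
  fi
}
```

Paste the functions into your shell and run ollama_status. If it reports "not reachable", sort that out before touching the IDE.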


Step 2: Pull Your First Model

Open a new terminal and grab a model. I recommend Mistral 7B for pure coding—it’s fast, decent at reasoning, and won’t nuke your RAM or GPU.

ollama pull mistral

Other solid picks for coding:

For your first run, stick with mistral. It’s the Goldilocks of local coding models—not too slow, not too dumb.

Test it:

ollama run mistral "explain what a goroutine is"

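If you get text back, Ollama is working. Good. With a prompt on the command line, the command exits on its own after answering; if you started an interactive session with plain ollama run mistral, leave it with /bye and move on.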
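
Editor integrations talk to Ollama over its local HTTP API rather than the CLI, so it can be useful to repeat the test that way. This is a rough sketch, assuming curl and a single-line prompt. json_escape, generate_payload, and ask_ollama are hypothetical helper names; /api/generate with "stream": false is Ollama's one-shot completion endpoint.

```shell
# Escape backslashes and double quotes so the prompt is safe inside a JSON string.
# Assumption: the prompt is a single line (newlines are not handled here).
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build the request body: $1 = model name, $2 = prompt.
# "stream": false asks for one JSON reply instead of a token stream.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$(json_escape "$2")"
}

# POST the prompt to the local server and print the raw JSON reply.
ask_ollama() {
  curl -fsS http://localhost:11434/api/generate -d "$(generate_payload "$1" "$2")"
}
```

Running ask_ollama mistral "explain what a goroutine is" should print a JSON object whose "response" field holds the answer.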


Step 3: Install Continue in Your IDE

VS Code

  1. Open the Extensions sidebar (Ctrl+Shift+X / Cmd+Shift+X)
  2. Search for “Continue”
  3. Install the one from Continue, Inc (verified checkmark)
  4. Reload VS Code

JetBrains (IntelliJ, PyCharm, GoLand, etc.)

  1. Settings/Preferences → Plugins → Marketplace
  2. Search “Continue”
  3. Install, restart IDE

It’ll add a chat sidebar and inline autocomplete.


Step 4: Configure Continue to Use Your Local Ollama

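Continue reads its settings from ~/.continue/config.json (newer releases also accept a config.yaml in the same folder; this post sticks with the JSON form). Create or edit it: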

{
  "models": [
    {
      "title": "Mistral Local",
      "provider": "ollama",
      "model": "mistral",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Mistral Local",
    "provider": "ollama",
    "model": "mistral",
    "apiBase": "http://localhost:11434"
  },
  "customCommands": [
    {
      "name": "test",
      "description": "Generate unit tests",
      "prompt": "{{{ input }}}\n\nWrite unit tests for the selected code."
    },
    {
      "name": "refactor",
      "description": "Suggest refactoring",
      "prompt": "{{{ input }}}\n\nSuggest a refactoring of the selected code and explain the changes."
    }
  ]
}

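What’s happening here: models lists what the chat sidebar can use, tabAutocompleteModel picks the model behind inline completions, and apiBase points both at the Ollama server from Step 1. The last block registers /test and /refactor as shortcuts you can type in chat.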

Save it. Continue will auto-reload.
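
One common failure is a config that names a model you never pulled. The helper below is a sketch for catching that. It assumes the config keeps one "model": "..." pair per line (as in the template above) and that ollama list prints a header row followed by one model per line. config_models and missing_models are names invented here.

```shell
# Print each distinct "model" value found in a Continue config.json.
# Assumption: at most one "model": "..." pair per line.
config_models() {
  sed -n 's/.*"model"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | sort -u
}

# Print the models the config names that `ollama list` does not show.
missing_models() {
  # First column of every row after the header is the model name with its tag.
  pulled=$(ollama list | awk 'NR > 1 { print $1 }')
  for m in $(config_models "$1"); do
    case "$m" in
      *:*) want="$m" ;;
      *)   want="$m:latest" ;;   # a bare name refers to the "latest" tag
    esac
    printf '%s\n' "$pulled" | grep -qxF "$want" || echo "$m"
  done
}
```

missing_models ~/.continue/config.json prints nothing when every model is present; anything it does print needs an ollama pull.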


Step 5: Fire It Up

Open a code file. You should see a Continue sidebar on the right (or press Ctrl+L in VS Code).

  1. Chat: Type a question, like “What does this function do?” Continue sends your code to the local Ollama instance and an answer comes back within 1–5 seconds (depending on model and hardware).

  2. Autocomplete: Start typing. You’ll see inline suggestions pop up—press Tab to accept. This is running the tabAutocompleteModel on every keystroke, so keep it light (smaller model = faster suggestions).

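  3. Edit Commands: Highlight code and press Ctrl+I (or Cmd+I on Mac) to open the inline edit panel, then type an instruction such as “Add error handling” or “Convert to async/await”. If your Continue version binds a different key, its keyboard shortcuts list will show it.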

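All of this is happening locally. No API keys. Just your code and a model running on your hardware. One caveat: Continue sends anonymous usage telemetry by default, so set "allowAnonymousTelemetry": false in config.json if you want that off too.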


Model Picks for Different Scenarios

Autocomplete (you need speed):

Chat (you can wait a bit):

The rule: Autocomplete should be fast (use 7B–13B). Chat can be slower if it’s smarter (you can bump to 34B).


Hardware Reality Check

CPU only (no GPU):

GPU (NVIDIA/AMD/Mac Metal):

Pro tip: You don’t need a special quantized download. Ollama’s default tags are already Q4-quantized, which is pretty lean; other quantization levels are listed as separate tags on each model’s page at ollama.com.


Comparing Chat vs. Autocomplete: When to Use What

Use autocomplete for:

Use chat for:

The difference: Autocomplete is always on (slowing you down if it’s bad). Chat is pull-based (you ask when you’re stuck). If your autocomplete model is laggy, it’s worth dropping to a smaller model or disabling it.
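
To put a number on "laggy", Ollama's non-streaming /api/generate reply reports how many tokens it generated (eval_count) and how long that took in nanoseconds (eval_duration). Here is a small sketch that turns a saved reply into tokens per second, assuming the reply is a single line of JSON. json_int and tokens_per_sec are made-up helper names.

```shell
# Pull an integer field out of a one-line JSON reply without needing jq.
#   $1 = field name, $2 = JSON text
json_int() {
  printf '%s' "$2" |
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# tokens per second = eval_count / (eval_duration in nanoseconds / 1e9)
tokens_per_sec() {
  count=$(json_int eval_count "$1")
  nanos=$(json_int eval_duration "$1")
  awk -v c="$count" -v n="$nanos" 'BEGIN { printf "%.1f\n", c * 1e9 / n }'
}
```

Feed it the output of a curl call to /api/generate with "stream": false. Comparing the number across two models on your own hardware is a more reliable guide than any general rule.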


Why Local Coding Is Finally Worth It

Three reasons this changed:

  1. Model quality jumped. A year ago, 7B models were pretty dumb. Now? Mistral 7B is genuinely capable. You’re not sacrificing that much compared to cloud models.

  2. Speed. No network round-trip. Autocomplete suggestions are instant. Chat responses start appearing immediately. It feels snappier than cloud.

  3. The privacy angle is real. If you’re working on proprietary code, medical records, financial data, or anything IP-sensitive, shipping it to a third-party API is a non-starter. With local inference, none of it leaves your machine.


The Gotchas

Memory usage. A 7B model takes ~4–5GB of RAM. A 13B model takes ~8–10GB. If you’re on a laptop with 8GB total, you’ll feel it. Quantization helps; smaller models help more.
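
Those figures follow from simple arithmetic: weight memory is roughly parameters × bits per weight ÷ 8. As a back-of-the-envelope sketch, assume about 4.5 effective bits per weight for a Q4 model (an assumption; the exact figure depends on the quantization format) and ignore context cache and runtime overhead, which come on top. weights_gb is a name made up for this example.

```shell
# Rough size of the model weights in GB.
#   $1 = parameters in billions, $2 = bits per weight
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb 7 4.5     # 7B model at ~4.5 bits per weight
weights_gb 13 4.5    # 13B model at ~4.5 bits per weight
weights_gb 7 16      # the same 7B model unquantized (16-bit)
```

That prints 3.9, 7.3, and 14.0, which lines up with the ranges above once overhead is added, and shows why an unquantized 7B model is a bad fit for an 8GB laptop.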

No internet = no context. Local models can’t browse the web or check the latest docs. You’ll need to paste context into chat yourself. (This is also a feature if you don’t want your queries logged.)

Autocomplete can be annoying. If it’s wrong a lot, it kills your flow. Dial it in by picking the right model and tweaking the config to trigger less often (see the Continue docs for the debounceDelay setting under tabAutocompleteOptions).

Hallucinations are real. Local models are more likely to confidently spout fake code. Trust but verify. Always run tests before shipping.


The Real Talk

Could you just pay for Copilot or Claude? Sure. But here’s the thing: if you’re building side projects, learning, or writing throwaway scripts, paying for cloud AI feels like overkill. It’s like subscribing to a premium car service when you drive 50 miles a week.

Local LLM coding isn’t a replacement for Claude if you’re shipping production code that needs bulletproof logic. But for the 80% case—scaffolding, refactoring, tests, explanations—a local model on your machine does the job, costs zero per month, and keeps your code off someone’s logging server.

Set it up tonight. Pull Mistral. Open Continue. Start a chat. You’ll be shocked how good it feels.


Next Steps

  1. Install Ollama from https://ollama.com
  2. ollama pull mistral
  3. Install Continue extension in your IDE
  4. Create ~/.continue/config.json (use the template above)
  5. Start coding

If you hit issues, the Continue docs are solid: https://docs.continue.dev/

Enjoy your personal AI coding assistant. No subscription required.

