Using Ollama (Local AI)

Ollama lets you run AI models directly on your computer. Your documents never leave your machine, there are no API costs, and it works offline.

Why local AI

With Ollama, every AI conversation, document analysis, and search query runs on your hardware. This means complete privacy — nothing is sent to external servers. It also means you can use Return’s AI features without an internet connection or a paid API subscription.

The trade-off is that local models are smaller than cloud models like Claude or GPT-4. They work well for most tasks — summarization, editing, Q&A, translation — but may not match cloud quality on highly complex analysis.

Installation

1. Download Ollama from ollama.com and install it.
2. Open your terminal and pull a chat model:

ollama pull llama3.2

For semantic search across your documents, also pull an embedding model:

ollama pull nomic-embed-text

That’s it. Ollama runs as a background service — Return connects to it automatically.
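
To confirm the install from the terminal before opening Return, a small check like the following can help. This is a minimal sketch: check_install is a hypothetical helper name, and it relies only on the standard ollama --version and ollama list commands.

```shell
# Hypothetical helper: confirm the Ollama CLI is installed and list pulled models.
check_install() {
  # `command -v` succeeds only if the shell can find an `ollama` command.
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama not found on PATH; reinstall it or open a new terminal" >&2
    return 1
  fi
  ollama --version   # prints the installed version
  ollama list        # should include llama3.2 and nomic-embed-text
}
```

Paste the function into your terminal and run check_install. If either model is missing from the list, pull it again.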

Verify the connection

Open Settings (Cmd+,) and go to the AI tab. You should see Ollama listed as available. If the status shows a green indicator, the connection is working.

Make sure the Mode is set to Local and Ollama is selected as the active provider.

Choose your models

In the AI settings, you can select which Ollama models to use:

Chat model — handles conversations, document editing, and analysis. Default: llama3.2. For better quality on a capable machine, try llama3.1:8b or mistral.
Embedding model — powers semantic search across your document library. Default: nomic-embed-text. This is the recommended choice for most setups.

To see what models you have installed, run ollama list in your terminal.

Recommended models

Model	Size	Best for
`llama3.2`	~2 GB	General use, fast responses
`llama3.1:8b`	~4.7 GB	Better quality, still reasonably fast
`mistral`	~4.1 GB	Good at following instructions
`nomic-embed-text`	~274 MB	Semantic search (embedding model)

Pull any model with ollama pull model-name, replacing model-name with a name from the table above (for example, ollama pull mistral).

Enable semantic search

Semantic search lets you find documents by meaning, not just keywords. To enable it:

1. Make sure an embedding model is installed (ollama pull nomic-embed-text).
2. In Settings → AI, enable Semantic Indexing.
3. Return indexes your documents in the background.

Once indexing completes, the search in the Explorer and the /search command will use semantic matching.
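
Semantic search relies on the embedding model turning text into a vector of numbers, so that texts with similar meaning get similar vectors. To see this for yourself, you can request an embedding from Ollama's local HTTP API. The sketch below assumes the default address http://localhost:11434 and Ollama's /api/embed endpoint. embed_text is a hypothetical helper, and this is not a description of how Return calls Ollama internally.

```shell
# Hypothetical helper: ask the local Ollama server to embed a piece of text.
embed_text() {
  # $1 = text to embed. This sketch does not JSON-escape the text, so it is
  # assumed to contain no double quotes or backslashes.
  curl -s http://localhost:11434/api/embed \
    -d "{\"model\": \"nomic-embed-text\", \"input\": \"$1\"}"
}
```

Running embed_text "quarterly budget review" should print a JSON response whose embeddings field holds the vector for that text.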

Troubleshooting

Ollama shows as unavailable in Settings

Ollama might not be running. Open your terminal and run ollama serve. On macOS, you can also start the Ollama app from Applications — it runs the server automatically.
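
To tell whether the server is actually down, you can query it directly from the terminal. This is a rough sketch that assumes Ollama's default address, http://localhost:11434; ollama_status is a hypothetical helper name.

```shell
# Hypothetical helper: report whether the local Ollama server answers HTTP requests.
ollama_status() {
  # /api/tags lists installed models; any successful reply means the server is up.
  # -s hides progress, -f fails on HTTP errors, --max-time avoids hanging.
  if curl -sf --max-time 3 http://localhost:11434/api/tags >/dev/null; then
    echo "Ollama is running"
  else
    echo "Ollama is not reachable; start it with: ollama serve"
  fi
}
```

If this reports that Ollama is running but Settings still shows it as unavailable, recheck that the Mode is set to Local and Ollama is the active provider.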

Responses are very slow

Local AI speed depends on your hardware, mainly how much memory is available and whether the model can run on a GPU. If responses take too long, try a smaller model like llama3.2 instead of larger variants. Closing other resource-heavy applications can also help.
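
To see how fast a model actually runs on your machine, Ollama can print timing statistics and report where the model is loaded. A rough sketch, where speed_check is a hypothetical helper name and the prompt is arbitrary:

```shell
# Hypothetical helper: run one short prompt with timing stats, then show loaded models.
speed_check() {
  model="${1:-llama3.2}"   # defaults to the small chat model
  # --verbose prints timing statistics (including tokens per second) after the reply.
  ollama run --verbose "$model" "Summarize the benefits of local AI in one sentence."
  # `ollama ps` lists loaded models and whether they run on the CPU or GPU.
  ollama ps
}
```

Comparing speed_check llama3.2 with speed_check llama3.1:8b gives a rough idea of how much slower the larger model is on your hardware.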

Model not found

If Return says the model isn’t available, check that you’ve pulled it. Run ollama list to see installed models. If the model is missing, run ollama pull model-name, using the same name that is selected in Settings → AI.
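
The check-then-pull steps can be combined into one small script. This is a sketch under the assumption that ollama list prints a header row followed by one model per line, with the name in the first column. model_installed and ensure_model are hypothetical helper names.

```shell
# Hypothetical helper: succeed if the named model appears in `ollama list`.
model_installed() {
  ollama list | awk -v want="$1" '
    # Skip the header row; the first column is the model name.
    # Models pulled without a tag are listed with ":latest" appended.
    NR > 1 && ($1 == want || $1 == (want ":latest")) { found = 1 }
    END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: pull the model only when it is missing.
ensure_model() {
  if model_installed "$1"; then
    echo "$1 is already installed"
  else
    ollama pull "$1"
  fi
}
```

For example, ensure_model nomic-embed-text downloads the embedding model only if it is not already present.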

Port conflict

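Ollama runs on port 11434 by default. If another service is already using that port, Ollama won’t start. Run lsof -i :11434 to see which process holds the port, then stop that service. If the process listed is Ollama itself, the server is already running and there is no conflict.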
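
The raw lsof output can be noisy, so a short wrapper that prints only the listening process may be easier to read. This is a minimal sketch that assumes the standard lsof column layout (command name first, PID second). check_port is a hypothetical helper name.

```shell
# Hypothetical helper: show which process, if any, is listening on a port.
check_port() {
  port="${1:-11434}"   # Ollama's default port
  # -n/-P skip name lookups; -sTCP:LISTEN keeps only listening sockets.
  # Row 2 of lsof output is the first listener: column 1 = command, column 2 = PID.
  listener=$(lsof -nP -iTCP:"$port" -sTCP:LISTEN 2>/dev/null |
    awk 'NR == 2 { print $1 " (PID " $2 ")" }')
  if [ -z "$listener" ]; then
    echo "Port $port is free"
  else
    echo "Port $port is in use by $listener"
  fi
}
```

Running check_port with no arguments checks port 11434 and prints the command name and PID of the listener, which is usually enough to identify the application to quit.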