Using Ollama (Local AI)
Ollama lets you run AI models directly on your computer. Your documents never leave your machine, there are no API costs, and it works offline.
Why local AI
With Ollama, every AI conversation, document analysis, and search query runs on your hardware. This means complete privacy — nothing is sent to external servers. It also means you can use Return’s AI features without an internet connection or a paid API subscription.
The trade-off is that local models are smaller than cloud models like Claude or GPT-4. They work well for most tasks — summarization, editing, Q&A, translation — but may not match cloud quality on highly complex analysis.
Installation
- Download Ollama from ollama.com and install it
- Open your terminal and pull a chat model:
ollama pull llama3.2
- For semantic search across your documents, also pull an embedding model:
ollama pull nomic-embed-text
- That’s it. Ollama runs as a background service — Return connects to it automatically.
Verify the connection
Open Settings (Cmd+,) and go to the AI tab. You should see Ollama listed as available. If the status shows a green indicator, the connection is working.
Make sure the Mode is set to Local and Ollama is selected as the active provider.
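If you want to check the connection outside of Return, you can query Ollama's HTTP API directly. The following is a minimal Python sketch. It assumes Ollama is on its default address (http://localhost:11434) and uses its GET /api/version endpoint. The function name ollama_version is hypothetical, and Return may check the connection differently.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: Ollama is listening on its default address and port.
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def ollama_version(base_url: str = DEFAULT_OLLAMA_URL, timeout: float = 2.0) -> str | None:
    """Return the running server's version string, or None if it can't be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, HTTP errors, and timeouts;
        # ValueError covers a reply that isn't valid JSON.
        return None
    return data.get("version") if isinstance(data, dict) else None


if __name__ == "__main__":
    version = ollama_version()
    if version is None:
        print("Ollama is not reachable. Try running `ollama serve`.")
    else:
        print(f"Ollama {version} is running.")
```

If this prints that Ollama is not reachable, see "Ollama shows as unavailable in Settings" under Troubleshooting.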
Choose your models
In the AI settings, you can select which Ollama models to use:
- Chat model: handles conversations, document editing, and analysis. The default is llama3.2. For better quality on a capable machine, try llama3.1:8b or mistral.
- Embedding model: powers semantic search across your document library. The default is nomic-embed-text, which is the recommended choice for most setups.
To see what models you have installed, run ollama list in your terminal.
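A running Ollama server exposes the same list over HTTP at GET /api/tags. The sketch below assumes that the response contains a models array whose entries each have a name field. The helper names are hypothetical. The network call only works while Ollama is running, so the demo parses a trimmed sample response instead.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any


def model_names(tags_response: dict[str, Any]) -> list[str]:
    """Extract installed model names from a GET /api/tags response body."""
    return sorted(m["name"] for m in tags_response.get("models", []))


def fetch_installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models are installed (like `ollama list`)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))


if __name__ == "__main__":
    # Trimmed example of the response shape; real entries carry more fields.
    sample = {"models": [{"name": "nomic-embed-text:latest"}, {"name": "llama3.2:latest"}]}
    print(model_names(sample))  # ['llama3.2:latest', 'nomic-embed-text:latest']
```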
Recommended models
| Model | Download size | Best for |
|---|---|---|
| llama3.2 | ~2 GB | General use, fast responses |
| llama3.1:8b | ~4.7 GB | Better quality, still reasonably fast |
| mistral | ~4.1 GB | Good at following instructions |
| nomic-embed-text | ~274 MB | Semantic search (embedding model) |
Pull any model with ollama pull model-name.
Enable semantic search
Semantic search lets you find documents by meaning, not just keywords. To enable it:
- Make sure an embedding model is installed (ollama pull nomic-embed-text)
- In Settings → AI, enable Semantic Indexing
- Return will index your documents in the background
Once indexing completes, both the Explorer search and the /search command use semantic matching.
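Return does not describe how its index works internally, so the sketch below only illustrates the general technique. The query and each document are turned into embedding vectors, and documents are ranked by cosine similarity to the query.

- The embed helper assumes Ollama's POST /api/embed endpoint and a running server.
- The other function names are hypothetical.
- The three-dimensional vectors are toy stand-ins for real embeddings, which have hundreds of dimensions.

```python
from __future__ import annotations

import json
import math
import urllib.request


def embed(texts: list[str], model: str = "nomic-embed-text",
          base_url: str = "http://localhost:11434") -> list[list[float]]:
    """Embed texts with Ollama's /api/embed endpoint (needs a running server)."""
    body = json.dumps({"model": model, "input": texts}).encode()
    req = urllib.request.Request(f"{base_url}/api/embed", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embeddings"]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (document, score) pairs, most similar first."""
    scores = [(doc, cosine(query_vec, vec)) for doc, vec in doc_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings.
    docs = {"notes/recipes.md": [0.9, 0.1, 0.0], "notes/taxes.md": [0.0, 0.2, 0.9]}
    print(rank([0.8, 0.2, 0.1], docs)[0][0])  # notes/recipes.md
```

Texts with similar meaning tend to get vectors that point in similar directions. This is why a query can match a document that shares none of its keywords.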
Troubleshooting
Ollama shows as unavailable in Settings
Ollama might not be running. Open your terminal and run ollama serve. On macOS, you can also start the Ollama app from Applications — it runs the server automatically.
Responses are very slow
Local AI speed depends on your hardware. If responses take too long, try a smaller model like llama3.2 instead of larger variants. Closing other resource-heavy applications can also help.
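To compare models on your own machine, measure generation speed rather than judging it by feel. Ollama's generate API reports two timing fields:

- eval_count: the number of tokens generated.
- eval_duration: the time spent generating, in nanoseconds.

Tokens per second is therefore eval_count * 1e9 / eval_duration. The sketch below assumes the default address and a non-streaming request. The helper names and the test prompt are hypothetical. From the terminal, ollama run llama3.2 --verbose prints a similar eval rate after each reply.

```python
from __future__ import annotations

import json
import urllib.request


def tokens_per_second(final_chunk: dict) -> float:
    """Generation speed from the timing fields of a finished Ollama response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    duration_ns = final_chunk.get("eval_duration", 0)
    if not duration_ns:
        return 0.0
    return final_chunk.get("eval_count", 0) * 1e9 / duration_ns


def measure(model: str, prompt: str = "Summarize why the sky is blue in one sentence.",
            base_url: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation and return its tokens per second."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{base_url}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=300) as resp:
        return tokens_per_second(json.load(resp))


if __name__ == "__main__":
    # Example timing fields: 120 tokens generated in 4 seconds.
    print(tokens_per_second({"eval_count": 120, "eval_duration": 4_000_000_000}))  # 30.0
```

With Ollama running, calling measure("llama3.2") and then measure("llama3.1:8b") gives a like-for-like comparison on the same prompt.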
Model not found
If Return says the model isn’t available, make sure you’ve pulled it. Run ollama list to see installed models; if the model isn’t listed, run ollama pull model-name.
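When comparing names, note that Ollama adds an implicit :latest tag to a model name that has none. For example, ollama pull llama3.2 shows up in ollama list as llama3.2:latest. The hypothetical helper below sketches that comparison, and Return's own matching may differ. It does not resolve tag aliases. For example, llama3.1:latest and llama3.1:8b may refer to the same weights but are compared as different names.

```python
from __future__ import annotations


def normalize(name: str) -> str:
    """Add Ollama's implicit ':latest' tag when a model name has no tag."""
    # Only look after the last '/', so a registry host with a port
    # (e.g. 'host:5000/model') isn't mistaken for a tag.
    return name if ":" in name.rsplit("/", 1)[-1] else f"{name}:latest"


def is_installed(wanted: str, installed: list[str]) -> bool:
    """True if `wanted` matches one of the names printed by `ollama list`."""
    return normalize(wanted) in {normalize(n) for n in installed}


if __name__ == "__main__":
    installed = ["llama3.2:latest", "nomic-embed-text:latest"]
    print(is_installed("llama3.2", installed))     # True
    print(is_installed("llama3.1:8b", installed))  # False
```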
Port conflict
Ollama runs on port 11434 by default. If another service uses that port, Ollama won’t start. Check with lsof -i :11434 and stop the conflicting service.
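lsof is a macOS and Linux tool. As a cross-platform alternative, the Python sketch below reports whether anything is accepting connections on the port, but it cannot tell you which program that is. Run it while Ollama is stopped. If the port still shows as in use, another service holds it. The function name port_in_use is hypothetical.

```python
from __future__ import annotations

import socket


def port_in_use(port: int = 11434, host: str = "127.0.0.1") -> bool:
    """True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection,
        # and an error code (instead of raising) when nothing is listening.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use():
        print("Port 11434 is taken. Run `lsof -i :11434` to see which process owns it.")
    else:
        print("Port 11434 is free.")
```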