Ollama on macOS

This post is also a show-and-tell of options for how to use and play with Ollama. I tend to run Ollama as a headless system on my Mac and send requests to it via VPN for remote processing.

The main goal is a persistent configuration for OLLAMA_HOST on macOS. The most reliable way to do this is to create a LaunchAgent plist file.

Mac Mini M4 Pro

Personally, I think an Apple Mac Mini M4 Pro is currently one of the best and most stress-free ways to run an LLM locally. Of course, the Mac Studio models have more power, but they also cost significantly more, which is only justified if you can get a serious return on the investment.

Mac Mini M4 Pro with Ulanzi QT01

If you can share your data with the cloud and the data volume is moderate, you are likely better off just using an API from OpenAI or Anthropic. If you have a lot of data to process, can accept the reduced quality of smaller local models, and have the time to let jobs run for days on end, a Mac becomes interesting. I paired my Mac with an external fan to support its cooling.

I ran into the issue that Ollama does not bind to 0.0.0.0 by default. Since I wanted to use it remotely from mobile apps, and possibly from another computer through an overlay network, I needed it to listen on 0.0.0.0 instead of just localhost. As this is not the standard setup, I documented it here for everyone with the same issue.

Here’s how:

  1. First, create a new plist file in your user’s LaunchAgents directory:
mkdir -p ~/Library/LaunchAgents
nano ~/Library/LaunchAgents/com.ollama.host.plist
  2. Add the following content to the plist file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.ollama.host</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/launchctl</string>
        <string>setenv</string>
        <string>OLLAMA_HOST</string>
        <string>0.0.0.0:11434</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>
  3. Load the LaunchAgent:
launchctl load ~/Library/LaunchAgents/com.ollama.host.plist

This will set the environment variable every time you log in, persisting across reboots. To verify it’s working, you can restart your computer and then check the environment variable:

launchctl getenv OLLAMA_HOST
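
To confirm that Ollama itself is reachable on the new address, and not just that the variable is set, you can query its version endpoint from another machine. This is a quick sanity check, assuming the default port 11434 and that the Ollama server has been restarted since the agent was loaded; the hostname is a placeholder for your Mac’s name or IP:

# replace your-mac.local with your Mac's hostname or IP address
curl http://your-mac.local:11434/api/version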

If you ever need to remove this configuration, you can unload the LaunchAgent:

launchctl unload ~/Library/LaunchAgents/com.ollama.host.plist
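
Unloading only prevents the agent from running at the next login; the variable stays set in the current launchd session. If you also want to clear it right away, unset it as well:

launchctl unsetenv OLLAMA_HOST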

Apple M-Series LLM Performance

If you’re second-guessing which Mac is right for your LLM needs, the benchmarks below can give you a good overview and a feel for the differences, especially if you already own a device and are wondering whether to upgrade to a more powerful one.

This is a collection of short llama.cpp benchmarks using LLaMA 7B on various Apple Silicon hardware. It can be useful to compare the performance that llama.cpp achieves across the M-series chips and hopefully answer questions of people wondering if they should upgrade or not.

https://github.com/ggerganov/llama.cpp/discussions/4167 (2025-02-14)

How to use Ollama on a Mac

My own tools

I think I’m quite special in this regard: I have plenty of tools I built myself that I use with Ollama. One example is a tool that lets me run a prompt against a whole range of models to compare and use them, plus storage for prompts I use regularly.

Self-made query system (2025-03-11)

So when I’m working on a new process to automate, I compare both the prompts and the results across different models. For certain tasks, a reasoning model like DeepSeek is better; for others, I see great performance from Gemma 2, Llama 3.1, Qwen 2.5, or Mistral. I try not only to optimize the prompt but also to select the model best suited to bulk processing of a given task. With these smaller models, it’s often easier to pick the model that works best for the job at hand than to keep optimizing the prompt alone.
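
A minimal sketch of the idea behind such a comparison (not my actual tool; the model names are just examples and are assumed to be pulled locally already) is a simple loop that sends the same prompt to several models through the ollama CLI:

#!/bin/sh
# run one prompt against several local models and print each answer
PROMPT="Summarize the strengths and weaknesses of small local LLMs in three sentences."
for MODEL in llama3.1 gemma2 qwen2.5 mistral; do
  echo "=== $MODEL ==="
  ollama run "$MODEL" "$PROMPT"
done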

On Mobile

If I want to run something on my phone, I more often than not fall back to cloud offerings from big providers like OpenAI, Anthropic, or DeepSeek. But with my Ollama app, I also have the option to run requests at home. To reach my Mac, I use an overlay network from Tailscale, which gives me the flexibility to route requests to my Mac from anywhere.

https://apps.apple.com/us/app/my-ollama/id6738298481 (2025-03-11)

Apple iOS App “My Ollama”
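
Pointing the app, or any other client, at the Mac over Tailscale is just a matter of entering the Mac’s Tailscale hostname (or IP) and port 11434 as the server address. For example (the hostname below is a placeholder for your own MagicDNS name, and this assumes the OLLAMA_HOST setup from above is in place):

# list the models available on the remote Mac (hostname is a placeholder)
curl http://my-mac.tailnet-name.ts.net:11434/api/tags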

On Computer

Another way to use it is, again via Tailscale, from my regular workstation to my Mac, with either AnythingLLM or Open WebUI; both support remote Ollama hosts. That way the processing does not tie up my workstation’s local GPU. It is a luxurious way of working, but it also lets me run long-running jobs that process data for days or weeks by simply leaving them running in the background.
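
As a sketch of how this can look with Open WebUI (assuming Docker on the workstation and, again, a placeholder Tailscale hostname), the remote Ollama instance is passed in via the OLLAMA_BASE_URL environment variable:

docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://my-mac.tailnet-name.ts.net:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main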