Ollama is great for running a local, self-hosted AI REST API. You can load all kinds of models, whether for chatting, vision, or embeddings. As far as I know, voice models can't yet be used for input or output, but that may only be a matter of time.
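To give an idea of what that looks like in practice, here is a minimal sketch that talks to Ollama's chat endpoint on its default port 11434. It assumes the server is already running and that the model named in the request (llama3 here, just as an example) has been pulled beforehand:

    import requests

    # Ask a locally running Ollama server for a chat completion.
    # Assumes Ollama listens on its default port 11434 and that the
    # model "llama3" has already been pulled (adjust to taste).
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "llama3",
            "messages": [
                {"role": "user", "content": "Summarize why local inference matters."}
            ],
            "stream": False,  # return one complete JSON object instead of a stream
        },
        timeout=120,
    )
    response.raise_for_status()
    print(response.json()["message"]["content"])

The same server also exposes endpoints for embeddings and plain generation, so the pattern above carries over to those use cases as well.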
Why Ollama
Why would you want to run Ollama? First of all, the models you can load are likely less powerful than what you can consume from the OpenAI or Anthropic APIs. However, everything is processed fully locally: you gain the privacy of your data not going to a US datacenter; it doesn't even have to leave your house. Additionally, if you need to process massive amounts of data and the open models are precise enough, local processing may take longer, but running the same job through public APIs can rack up vast costs. Even by accident you might burn through $80 in no time because you missed an error check in your code. With Ollama, you only have the upfront cost of the device and the energy it burns, and no sudden surprises. Your processing is of course limited, though; you can't massively parallelize. So it highly depends, but there is a sweet spot where solving issues on your own machine pays off.