
What I learned about Ollama and models

While I’m aware these models aren’t limited or bound to Ollama, Ollama is still the way I interface with and use them. Here I try to keep notes on how I use certain models and what I like or dislike about them.

This is a purely subjective view. There are many attempts at objectively measuring the performance of models; this is not one of them.
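Since everything below runs through Ollama’s local REST API, here is a minimal sketch of how I’d call it from Python. The endpoint and payload shape follow Ollama’s documented `/api/generate` route; the model name and prompt are just examples.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local port

def build_payload(model: str, prompt: str) -> dict:
    # stream=False makes Ollama return one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the text response."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (needs a running Ollama server with the model pulled, e.g. `ollama pull gemma2:27b`):
#   print(generate("gemma2:27b", "Fix the grammar of this sentence: ..."))
```

The same pattern works for every chat/completion model in the table below; only the `model` string changes.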

| Model | VRAM Size | Quantization | Remarks from my personal experience |
|---|---|---|---|
| gemma3:27b | 20627.03 MB | Q4_K_M | 📕 When using Gemma 3 to correct my English, the text it creates, while grammatically correct, no longer means the same thing. I therefore prefer Gemma 2 for correcting English, as it preserves the content and writing style better.<br>👁‍🗨 Supports vision |
| gemma2:27b | 19903.46 MB | Q4_0 | 📕 Very good at fixing English or German texts.<br>⭐ My most used and favorite model at the moment. |
| qwen3:32b | 24773.64 MB | Q4_K_M | |
| qwen3:30b-a3b | 20293.31 MB | Q4_K_M | For fast answers |
| deepseek-r1:32b | 22385.44 MB | Q4_K_M | |
| mistral:7b | 6095.05 MB | Q4_0 | 📘 An efficient and cheap model that stays truer to the source than others when summarizing texts and processing time is an issue. I summarized over 300,000 text samples in around ~400 hours. |
| falcon3:10b | 8176.71 MB | | |
| llama3.1:latest | 6942.47 MB | Q4_K_M | |
| llama3.2:latest | 4090.59 MB | Q4_K_M | |
| llama3.2-vision:latest | 11696.27 MB | Q4_K_M | 👁‍🗨 Supports vision |
| devstral:latest | 16035.07 MB | Q4_K_M | Devstral is built in a collaboration between Mistral AI and All Hands AI 🙌 It is light enough to run on a single RTX 4090 or a Mac with 32 GB RAM, making it an ideal choice for local deployment and on-device use. |
| Osmosis/Osmosis-Structure-0.6B:latest | 2905.68 MB | unknown | A specialized small language model (SLM) designed to excel at structured output generation. |
| bge-m3 | 1690.55 MB | F16 | 🎯 A multilingual embedding model and my model of choice for RAG. I usually combine it with Qdrant.<br>⭐ My most used and favorite embedding model at the moment. |
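For the bge-m3 entry above, a short sketch of how embeddings come out of Ollama. The `/api/embeddings` route and its `{"model", "prompt"}` payload are Ollama’s documented embedding endpoint; the cosine helper is only there to show what a vector store like Qdrant computes for you once the vectors are upserted.

```python
import json
import math
import urllib.request

EMBED_URL = "http://localhost:11434/api/embeddings"  # Ollama's embedding endpoint

def embed(text: str, model: str = "bge-m3") -> list:
    """Ask the local Ollama server for an embedding vector."""
    data = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        EMBED_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two vectors -- the metric Qdrant typically uses."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Usage (needs a running Ollama server with bge-m3 pulled):
#   v1, v2 = embed("local LLMs"), embed("running models on my Mac")
#   print(cosine(v1, v2))
# In my actual setup the vectors go into a Qdrant collection instead of being
# compared by hand.
```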

Reflection on what I run on Ollama

In this section, I reflect on my actual usage. Read more about these graphs.

Week 18 (2025)

I’m a user of all kinds of AI, including OpenAI’s ChatGPT and Anthropic’s Claude, and potentially soon also the Grok API via Azure. But some data I process locally, both because I don’t want to send it to those companies’ APIs and for my personal learning journey. This week we see large usage of mistral:7b. I like to use mistral:7b for summarization when power consumption is an issue. As I was processing a hundred thousand entries, I chose mistral:7b over gemma2:27b to keep accuracy reasonably high while still being able to process everything in acceptable time on my Mac.
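A batch job over that many entries mostly comes down to a loop around the `/api/generate` call. This is a hedged sketch, not my exact pipeline: the prompt wording and the `summaries.jsonl` checkpoint file are my own invention, but appending one JSON line per entry means a crash or power outage loses nothing and the job can resume where it stopped.

```python
import json
import pathlib
import urllib.request

OUT = pathlib.Path("summaries.jsonl")  # hypothetical checkpoint/output file

def build_summary_prompt(text: str) -> str:
    # Prompt wording is illustrative; the point is a short, faithful summary
    return "Summarize the following text in two sentences, staying true to it:\n\n" + text

def summarize(text: str, model: str = "mistral:7b") -> str:
    payload = {"model": model, "prompt": build_summary_prompt(text), "stream": False}
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def run_batch(entries: list) -> None:
    # Skip entries already summarized in a previous run, then append new results
    done = sum(1 for _ in OUT.open()) if OUT.exists() else 0
    with OUT.open("a") as f:
        for entry in entries[done:]:
            f.write(json.dumps({"text": entry, "summary": summarize(entry)}) + "\n")

# Usage (needs a running Ollama server with mistral:7b pulled):
#   run_batch(load_my_entries())
```

Ollama queues concurrent requests anyway, so a plain sequential loop is usually enough on a single machine.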

During week 18 (2025) (Apr 28 – May 4), Ollama processed 11 model sessions with a total runtime of 7 hr, 4 min across 8 different models.

Week 19 (2025)

I’m a huge fan of gemma2:27b and was excited about the release of gemma3:27b. However, after testing and comparing both models, I still prefer gemma2:27b, because in my summarization and translation workflows it produced results closer to what I expected. gemma3:27b made a lot more “creative” changes and even altered the meaning in unintended ways.

During week 19 (2025) (May 5 – May 11), Ollama processed 58 model sessions with a total runtime of 4 hr, 37 min across 8 different models.

Week 20 (2025)

I more or less abandoned my summarization and translation workflow attempts with gemma3:27b and switched back to gemma2:27b, which is a kickass model. I also updated the Qdrant vector database with new content, where my go-to model is bge-m3:latest.

I was also exploring the new model qwen3:30b-a3b, but couldn’t find a use case for it. Still, it’s always good to see and try new models.

During week 20 (2025) (May 12 – May 18), Ollama processed 63 model sessions with a total runtime of 1 day, 9 hr across 5 different models.

Week 21 (2025)

This was a more or less regular week, with large summarization jobs running on my still-favorite model, gemma2:27b.

During week 21 (2025) (May 19 – May 25), Ollama processed 34 model sessions with a total runtime of 1 day across 10 different models.

Week 22 (2025)

We see two new models on the list. I was playing with devstral:latest, a derivative of Mistral tailored for coding. I used it in conjunction with All Hands, which provides a chat interface for building your applications. This was my best outcome to date for code generation on local, limited hardware.

In general, however, we see that my regular summarization and translation workflow is still processing the bulk of the time using Gemma2.

During week 22 (2025) (May 26 – Jun 1), Ollama processed 31 model sessions with a total runtime of 16 hr, 28 min across 4 different models.

Week 23 (2025)

This was not a good week for my Mac. After a power outage from a storm, I lost access to it for about five days. Therefore fewer workloads were processed this week, and nothing new could be tested. I was forced to use low-power Gemma models and found that they are not up to the task: below 12B, Gemma is pretty much unusable for translation and summarization in my case. I’m happy my Mac is now working again.

During week 23 (2025) (Jun 2 – Jun 8), Ollama processed 7 model sessions with a total runtime of 1 day, 15 hr across 2 different models.