While I’m aware these models aren’t limited or bound to Ollama, Ollama is still the way I interface with and use them. Here I keep notes on how I use certain models and what I like or dislike about them.
This is a purely subjective view; there are many attempts at objectively measuring the performance of models, and this is not one of them.
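Every model below is used through the same local endpoint. As a minimal sketch (assuming Ollama is running on its default port 11434 and the model has already been pulled), a single completion request looks like this:

```python
# Minimal sketch: one completion request against a locally served model.
# Assumes Ollama listens on its default port 11434 and the model is pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma2:27b",  # swap in any tag from the table below
        "prompt": "Fix the grammar: 'He go to school yesterday.'",
        "stream": False,        # one JSON object instead of a token stream
    },
    timeout=300,
)
print(resp.json()["response"])
```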
Model | VRAM Size | Quantization | Remarks from my personal experience |
---|---|---|---|
gemma3:27b | 20627.03 MB | Q4_K_M | 📕 When using Gemma 3 to correct my English, the text it produces, while grammatically correct, no longer means the same thing. I therefore prefer Gemma 2 for correcting English: it does the job well and better preserves the content and writing style. 👁🗨 Supports vision |
gemma2:27b | 19903.46 MB | Q4_0 | 📕 Very good at fixing English or German texts. ⭐ This is my most used and favorite model at the moment. |
qwen3:32b | 24773.64 MB | Q4_K_M | |
qwen3:30b-a3b | 20293.31 MB | Q4_K_M | For fast answers |
deepseek-r1:32b | 22385.44 MB | Q4_K_M | |
mistral:7b | 6095.05 MB | Q4_0 | 📘 An efficient and cheap model that stays truer to the source than others when summarizing texts and processing time is an issue. I summarized over 300’000 text samples in roughly 400 hours. |
falcon3:10b | 8176.71 MB | unknown | |
llama3.1:latest | 6942.47 MB | Q4_K_M | |
llama3.2:latest | 4090.59 MB | Q4_K_M | |
llama3.2-vision:latest | 11696.27 MB | Q4_K_M | 👁🗨 Supports vision |
devstral:latest | 16035.07 MB | Q4_K_M | Devstral was built through a collaboration between Mistral AI and All Hands AI 🙌 It is light enough to run on a single RTX 4090 or a Mac with 32GB of RAM, making it an ideal choice for local deployment and on-device use. |
Osmosis/Osmosis-Structure-0.6B:latest | 2905.68 MB | unknown | A specialized small language model (SLM) designed to excel at structured output generation (see the sketch after the table) |
bge-m3 | 1690.55 MB | F16 | 🎯 A multi-language embedding model and my model of choice for RAG; I usually combine it with Qdrant (see the sketch after the table). ⭐ This is my most used and favorite embedding model at the moment. |
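The structured-output claim for Osmosis-Structure-0.6B can be exercised through Ollama's JSON mode. A minimal sketch; the extraction prompt and the expected fields are my own illustration, not from the model card:

```python
# Minimal sketch: constrain a small model to emit valid JSON via Ollama's
# "format" option. The prompt and expected fields are illustrative only.
import json
import requests

r = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "Osmosis/Osmosis-Structure-0.6B:latest",
        "prompt": "Extract name and city as JSON from: 'Anna moved to Zurich.'",
        "format": "json",  # Ollama constrains the output to valid JSON
        "stream": False,
    },
    timeout=120,
)
print(json.loads(r.json()["response"]))
```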
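And here is a minimal sketch of the bge-m3 + Qdrant pairing from the last row, assuming a local Qdrant instance on its default port 6333; the collection name `notes` and the sample documents are hypothetical placeholders (bge-m3 produces 1024-dimensional vectors):

```python
# Minimal sketch: embed texts with bge-m3 via Ollama and index them in Qdrant.
# Assumes Ollama on port 11434 and Qdrant on port 6333; the collection name
# "notes" and the documents are placeholders.
import requests
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

def embed(text: str) -> list[float]:
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "bge-m3", "prompt": text},
        timeout=60,
    )
    return r.json()["embedding"]  # bge-m3 yields a 1024-dimensional vector

client = QdrantClient(url="http://localhost:6333")
client.recreate_collection(
    collection_name="notes",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

docs = ["Gemma 2 is great at fixing German texts.", "Mistral 7B is fast and frugal."]
client.upsert(
    collection_name="notes",
    points=[
        PointStruct(id=i, vector=embed(d), payload={"text": d})
        for i, d in enumerate(docs)
    ],
)

# Retrieve the note closest to a query.
hits = client.search(
    collection_name="notes", query_vector=embed("fast summarization"), limit=1
)
print(hits[0].payload["text"])
```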
Reflection on what I run on Ollama
In this section, I reflect on my actual usage. Read more about these graphs.
Week 18 (2025)
I’m a user of all kinds of AI, including OpenAI ChatGPT and Anthropic Claude, and potentially soon the Grok API via Azure. But some data I process locally, both because I don’t want to send it to those companies’ APIs and for my personal learning journey. This week shows heavy usage of Mistral:7B. I like Mistral:7B for summarization when power consumption is an issue. As I was processing a hundred thousand entries, I chose Mistral:7B over Gemma2:27B: it keeps accuracy reasonably high while finishing in acceptable time on my Mac.
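A minimal sketch of such a batch summarization loop; the prompt wording and the input list are illustrative placeholders, not my actual pipeline:

```python
# Minimal sketch: batch-summarize text samples with mistral:7b via Ollama.
# The prompt and the sample list are illustrative placeholders.
import requests

def summarize(text: str) -> str:
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral:7b",
            "prompt": f"Summarize the following text in two sentences:\n\n{text}",
            "stream": False,
        },
        timeout=300,
    )
    return r.json()["response"]

samples = ["first text ...", "second text ..."]  # in reality: many thousands
for i, sample in enumerate(samples):
    print(i, summarize(sample))
```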

Week 19 (2025)
I’m a huge fan of gemma2:27b and was excited about the release of gemma3:27b. However, after testing and comparing both models, I still prefer gemma2:27b: in my summarization and translation workflows it produces results closer to what I expect. Gemma3:27b made a lot more “creative” changes and even altered the content in unintended ways.
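That comparison is easy to reproduce: send an identical instruction to both tags and check how faithful each answer stays to the input. A minimal sketch with an illustrative prompt:

```python
# Minimal sketch: run the same correction prompt against both Gemma tags
# and compare how much each one rewrites. The prompt is illustrative.
import requests

PROMPT = (
    "Correct the English without changing meaning or style:\n\n"
    "Our Mac have lost power during the storm."
)

for model in ("gemma2:27b", "gemma3:27b"):
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    )
    print(f"--- {model} ---\n{r.json()['response']}\n")
```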

Week 20 (2025)
I more or less abandoned my summarization and translation workflow attempts with gemma3:27b and switched back to gemma2:27b, which is a kickass model. I also updated the Qdrant vector database with new content, where my go-to model is bge-m3:latest.
I was also exploring the new model qwen3:30b-a3b, but couldn’t find a use case for it yet. Still, it’s always good to see and try new models.

Week 21 (2025)
This was a more or less regular week, with large summarization jobs running on my still-favorite model, gemma2:27b.

Week 22 (2025)
We see two new models on the list. I was playing with devstral:latest, a derivative of Mistral tailored for coding. I used it in conjunction with All Hands, which provides a chat interface for building applications. This was my best outcome to date for code generation with local, limited compute.
In general, however, my regular summarization and translation workflow still accounts for the bulk of the processing time, running on Gemma2.

Week 23 (2025)
This was not a good week for my Mac. After a power outage caused by a storm, I lost access to it for about five days. As a result, fewer workloads were processed this week, and nothing new could be tested. I was forced to use low-power Gemma 2 models and found that they are not up to the task: anything below Gemma 12b is pretty much unusable for translation and summarization in my case. I’m happy my Mac is working again.
