As I work every day, I constantly see recurring work that I would like to optimize and automate. There are use cases that feel obvious that they should work and easy to do, but the real-world experience is different. Read about my learning.
Process Graph data with Vision Model
One use case I came across is that my DDoS traffic scrubbing provider sends me a daily PDF report of cleaned traffic. However, reviewing a PDF daily is not my thing. With ImageMagick, I can convert the PDF chart to an image and then process it in a Vision LLM model.

I did experiment with llama3.2-vision:11b-instruct-q8_0 but was completely unsuccessful at reading the graph consistently using Ollama. It could read the graph but there was too much variability, and it could not correctly understand the different lines. I was quite shocked at this as I expected no issues reading a simple graph with llama3.2-vision:11b-instruct-q8_0. This is once more expectation meeting reality.
Switching to OpenAI gpt-4o-mini resolved this issue. The model had no issue processing the graphs and returning consistent and correct answers for the graph. However, using this proprietary model I can no longer run it on my local computer. For Ollama, I have no solution yet as in my case for llama3.2-vision:90b I do not have a powerful enough system at hand. On the OpenAI API, processing images also comes with a high price compared to text; the token usage is always substantial in those cases. This can be optimized by downscaling the image, but as the processing already was troublesome with regular resolution this was not an option. I processed around 1000 graphs 1980 x 1280 for $5 in January 2025.