My experience with LLM use cases

As I work every day, I constantly see recurring tasks that I would like to optimize and automate. Some use cases feel like they should obviously work and be easy to build, but the real-world experience is different. Here is what I learned.

Processing Graph Data with a Vision Model

One use case I came across: my DDoS traffic scrubbing provider sends me a daily PDF report of cleaned traffic. Reviewing a PDF by hand every day is not my thing, though. With ImageMagick, I can convert the PDF chart to an image and then process it with a vision LLM.

Fake example for DDoS Protection Service (2025-01-14)
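The pipeline can be sketched roughly like this: render the PDF page to an image with ImageMagick (e.g. `magick -density 150 report.pdf chart.png`), then attach the image to a chat request as a base64 data URL. This is a minimal sketch, not my production script; the file name and prompt are placeholders, and the message shape follows OpenAI's documented vision input format.

```python
import base64

def build_vision_request(image_bytes: bytes, prompt: str,
                         model: str = "gpt-4o-mini") -> dict:
    """Build an OpenAI chat-completions payload with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # Images are passed as a data URL inside an image_url part.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# Example (dummy bytes stand in for the rendered chart):
payload = build_vision_request(b"\x89PNG...",
                               "What is the peak value of each line?")
```

The resulting dict can be sent to the chat completions endpoint with any HTTP client; downscaling the image before encoding is where the cost savings discussed below would come in.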

I experimented with llama3.2-vision:11b-instruct-q8_0 via Ollama but could not get it to read the graph consistently. It could read the graph, but there was too much variability, and it could not reliably distinguish the different lines. This surprised me, as I expected a simple graph to pose no problem for llama3.2-vision:11b-instruct-q8_0. Once more, expectation meets reality.

Switching to OpenAI's gpt-4o-mini resolved this. The model had no trouble processing the graphs and returned consistent, correct answers. The downside of a proprietary model is that I can no longer run it on my local machine. For Ollama I have no solution yet, since I do not have a system powerful enough to run llama3.2-vision:90b. On the OpenAI API, image processing also comes at a high price compared to text; token usage for images is always substantial. This can be mitigated by downscaling the image, but since processing was already troublesome at full resolution, that was not an option. In January 2025, I processed around 1,000 graphs at 1980 × 1280 pixels for about $5.

CAPTCHA Solving with an LLM

A common nuisance on the internet is the CAPTCHA. Some are so hard that even humans struggle to solve them. Others, like the example here, are so basic that an LLM can solve them very reliably. CAPTCHAs have become much weaker since the rise of vision-capable LLMs. Running an LLM does create substantial CPU load, however, so it still slows down automated actions, and slower also means more expensive for the automator. But what used to be almost unsolvable can now be one click away.

Ollama using an LLM vision-enabled model to solve a simple CAPTCHA (2025-04-01)
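The same idea works locally against Ollama's HTTP API, which accepts base64-encoded images in the `images` field of `/api/generate`. A minimal sketch, assuming Ollama is running on its default port with a vision-capable model pulled; the model name and file path are placeholders:

```python
import base64
import json
import urllib.request

def build_ollama_request(image_bytes: bytes, prompt: str,
                         model: str = "llama3.2-vision:11b-instruct-q8_0") -> bytes:
    """Encode an Ollama /api/generate request body with one inline image."""
    body = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a token stream
    }
    return json.dumps(body).encode("utf-8")

def solve_captcha(path: str) -> str:
    """Send a CAPTCHA image to a local Ollama instance and return its answer."""
    with open(path, "rb") as f:
        data = build_ollama_request(
            f.read(), "Return only the characters shown in this CAPTCHA.")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate", data=data,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

In practice the returned text still needs normalization (whitespace, letter case) before it is submitted to the CAPTCHA form.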

Even established OCR engines like Tesseract struggle with basic CAPTCHAs, and they require a perfect match to pass.

Tesseract OCR attempting to read the text in an image.

This development means the value of a CAPTCHA is decreasing, and IP reputation-based systems are further on the rise, which also gives large players in the field a perfect way to collect tracking data across the web.