DeepSeek-R1 (Reasoning models)

DeepSeek’s first-generation reasoning models, with performance comparable to OpenAI-o1.

What is a reasoning model?

In the words of OpenAI:

A new series of AI models designed to spend more time thinking before they respond. As AI becomes more advanced, it will solve increasingly complex and critical problems. It also takes significantly more compute to power these capabilities.

https://openai.com/o1/ (2025-01-22)

Or, as an excerpt from Wikipedia puts it:

It spends time “thinking” before it answers, making it better at complex reasoning tasks, science and programming…

https://en.wikipedia.org/wiki/OpenAI_o1 (2025-01-22)

Now we have two options: OpenAI’s o1 model and DeepSeek R1. The main difference is that the OpenAI model is only available to paying customers, and only in a very limited quantity. DeepSeek R1, thanks to its much lower computational requirements and pricing, can be used online on the DeepSeek website. And because it is open source, you can also run it locally with Ollama.

Why is DeepSeek such a big deal?

DeepSeek, a Chinese AI company, has achieved a remarkable breakthrough in AI model development that could reshape the entire industry. While traditional AI models like those from OpenAI and Anthropic are said to require massive computing resources, costing $100M+ and thousands of expensive GPUs, DeepSeek reports having trained comparable or better-performing models for roughly $5M. They accomplished this through several innovative approaches: using lower numerical precision (8-bit instead of 32-bit numbers), which reduces memory needs by 75%; implementing multi-token prediction, which handles several tokens at once instead of one at a time; and building on a mixture-of-experts design, an “expert system” that only activates the relevant portions of the model as needed.

The implications are profound, both technically and economically. Instead of running every parameter for every token like a traditional dense model, DeepSeek’s approach keeps only 37B of its 671B parameters active at any time. That sharply cuts the compute needed per token, and the smaller distilled variants can run on regular gaming GPUs rather than specialized data center hardware. Their engineering team reportedly even went below Nvidia’s CUDA layer, hand-tuning parts of the GPU code at the assembly-like PTX level. This democratization of AI development could dramatically reshape the competitive landscape, potentially threatening Nvidia’s business model and the advantages currently held by major tech companies. The fact that DeepSeek has made their innovations open source suggests we’re at an inflection point where AI development could become significantly more accessible and less expensive.
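The memory figures above are easy to check with a little arithmetic. The sketch below is a back-of-the-envelope estimate only: it assumes 1 GB = 10^9 bytes, counts weights alone (no activations or cache), and takes 671B as the total parameter count from the largest model tag on Ollama. The function names are made up for illustration.

```shell
#!/bin/sh
# Rough weight memory: parameters (in billions) x bytes per parameter = GB.
# Assumption: 1 GB = 10^9 bytes, weights only.
weights_gb() {
    # $1 = parameters in billions, $2 = bytes per parameter
    echo $(( $1 * $2 ))
}

# Percentage of memory saved when going from $1 to $2 bytes per parameter.
savings_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

echo "671B weights at 32-bit:      $(weights_gb 671 4) GB"
echo "671B weights at 8-bit:       $(weights_gb 671 1) GB"
echo "saved by 8-bit vs 32-bit:    $(savings_pct 4 1)%"
echo "37B active weights at 8-bit: $(weights_gb 37 1) GB"
```

Keep in mind that activating only 37B parameters per token reduces compute, not storage: the full set of weights generally still has to be held in memory, so the smaller variants are the realistic option for a single consumer GPU.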

Try Online

DeepSeek-V3 achieves a significant breakthrough in inference speed over previous models.

It tops the leaderboard among open-source models and rivals the most advanced closed-source models globally.

https://www.deepseek.com (2025-01-22)

Try locally with Ollama

https://ollama.com/library/deepseek-r1 (self)

1.5B Qwen DeepSeek R1

ollama pull deepseek-r1:1.5b

7B Qwen DeepSeek R1

ollama pull deepseek-r1:7b

8B Llama DeepSeek R1

ollama pull deepseek-r1:8b

14B Qwen DeepSeek R1

ollama pull deepseek-r1:14b

32B Qwen DeepSeek R1

ollama pull deepseek-r1:32b

70B Llama DeepSeek R1

ollama pull deepseek-r1:70b

671B DeepSeek R1

ollama pull deepseek-r1:671b
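Once a model is pulled, passing a prompt to `ollama run` answers a single question without opening the interactive chat. A small wrapper might look like the sketch below; the function names `model_tag` and `ask` are made up here, and the example prompt is arbitrary.

```shell
#!/bin/sh
# Build the Ollama tag for a given size, e.g. "7b" -> "deepseek-r1:7b".
model_tag() {
    echo "deepseek-r1:$1"
}

# Pull the model (a no-op if it is already up to date), then ask one
# question non-interactively. Requires a local Ollama install.
ask() {
    tag=$(model_tag "$1")
    ollama pull "$tag" && ollama run "$tag" "$2"
}

# Example, left commented out so the file can be sourced safely:
# ask 7b "How many prime numbers are there between 1 and 50?"
```

Starting with one of the small tags keeps the download and memory footprint manageable; the size can be swapped later without changing anything else.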

What I learned using DeepSeek-R1

Prompting reasoning models feels quite different. Many of the prompt engineering habits you have built for instruct models need to change when working with reasoning models. Zero-shot prompting seems to be a great approach for them. Asking the model to take all the time it needs, as opposed to asking for fast results, also appears to improve results.

Reasoning models take notably longer to respond, but they also deliver great results in many use cases. DeepSeek seems to produce great results, in some benchmarks even better than OpenAI’s o1 model, for a fraction of the price, whether you pay for the API or for your own compute. It can even be self-hosted because it is open source, whereas OpenAI’s o1 model is not available for local use.
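As a concrete illustration of that prompting style, the sketch below builds a zero-shot prompt: just the task, no worked examples, plus a sentence inviting the model to take its time. The helper name `zero_shot_prompt` and the exact wording of the added sentence are my own assumptions, not a recommended formula.

```shell
#!/bin/sh
# Build a zero-shot prompt: the bare task followed by a hint that the
# model may take as long as it needs. No examples are included.
zero_shot_prompt() {
    printf '%s\n\nTake all the time you need to think this through before answering.\n' "$1"
}

# Example, left commented out because it needs a local Ollama install:
# ollama run deepseek-r1:7b "$(zero_shot_prompt 'What is 17 * 23?')"
```

The point of keeping the prompt this plain is to leave the step-by-step work to the model's own reasoning phase rather than scripting it in advance.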

Hallucination of browsing

I have seen deepseek-r1:32b refer to looking things up online even though, in my example, it has no means to do so. It thinks it is looking things up, but all of this happens only as part of the reasoning.

2025-02-01: deepseek-r1:32b “looking up” facts without having any real access.
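One way to keep an eye on this is to separate the reasoning from the final answer and scan the reasoning for claims of browsing. The sketch below assumes the model output wraps its reasoning in `<think>` and `</think>` tags, each on its own line, which is how deepseek-r1 output has typically appeared for me; newer Ollama versions may present the reasoning differently. The function names and the phrase list are hypothetical.

```shell
#!/bin/sh
# Print only the text after the closing </think> line.
# If no </think> is found, print the input unchanged.
answer_only() {
    awk '
        found       { print; next }
        /<\/think>/ { found = 1; next }
                    { buf = buf $0 "\n" }
        END         { if (!found) printf "%s", buf }
    '
}

# Print only the lines between <think> and </think>.
reasoning_only() {
    awk '/<\/think>/ { inside = 0 } inside { print } /<think>/ { inside = 1 }'
}

# Succeed if the text on stdin claims to look something up.
# The phrase list is a rough guess and will miss many wordings.
mentions_browsing() {
    grep -q -i -E 'look(ing|ed)? (this|that|it|things)? ?up|search(ing|ed)? (online|the web)'
}

# Example, left commented out because it needs a local Ollama install:
# ollama run deepseek-r1:32b "Who won the 2010 World Cup?" > out.txt
# reasoning_only < out.txt | mentions_browsing && echo "claims to browse"
# answer_only < out.txt
```

A match only means the reasoning talks about looking something up; it says nothing about whether the final answer is right, so treat it as a prompt to verify the facts yourself.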