How to use or disable Model Thinking in Ollama (Python Tutorial)
For Reasoning Models such as DeepSeek-R1 & Qwen3
This short tutorial covers how to enable or disable model thinking in Ollama. This feature lets you stop reasoning models such as DeepSeek-R1 or Qwen3 from outputting their chain-of-thought (CoT) reasoning, which results in lower latency and faster responses.
The video covers the latest updates, how to install Ollama on your computer, how to run it locally, and how to turn thinking mode on or off.
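If the qwen3:0.6b model isn't on your machine yet, you can download it from Python before running the code below. A minimal sketch, assuming a current version of the Ollama Python library, which exposes a pull helper:

# Pull the model used in this tutorial (a no-op if it is already present)
from ollama import pull

pull("qwen3:0.6b")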
FULL CODE
# Ollama Thinking Mode Toggle
# Important: update to the most recent version of the Ollama desktop application as well as the Ollama Python library
# pip install -U ollama
# Code
from ollama import Client
client = Client()  # connects to the local Ollama server (default: http://localhost:11434)
question = "Is the Earth a planet? Respond with either 'yes' or 'no', no other words"
### Thinking Disabled
resp_fast = client.chat(
    model="qwen3:0.6b",
    messages=[{"role": "user", "content": question}],
    think=False,
)
# Access Response
print(resp_fast["message"]["content"])
### Thinking Enabled
resp_verbose = client.chat(
    model="qwen3:0.6b",
    messages=[{"role": "user", "content": question}],
    think=True,
)
# Access Thinking
print(resp_verbose["message"]["thinking"])
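The reasoning text and the final answer are separate fields, and disabling thinking is where the latency savings come from. A quick sketch to check both, reusing client and question from the code above and Python's standard time module (timings will vary with your hardware and model size):

import time

# The final answer is still available alongside the reasoning
print(resp_verbose["message"]["content"])

# Rough latency comparison: thinking off vs. on
for label, think in [("thinking off", False), ("thinking on", True)]:
    start = time.perf_counter()
    client.chat(
        model="qwen3:0.6b",
        messages=[{"role": "user", "content": question}],
        think=think,
    )
    print(f"{label}: {time.perf_counter() - start:.2f}s")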
Subscribe to the Deep Charts YouTube Channel for more informative AI and Machine Learning Tutorials.