What DeepSeek Is (and Isn't)
DeepSeek is a Chinese AI lab that builds large language models focused on reasoning, mathematics, and code. Its flagship model, DeepSeek-R1, shows its reasoning step by step before delivering a final answer. This transparency into how the model thinks — not just what it concludes — is its most distinctive trait.
That reasoning capability is genuine. DeepSeek performs competitively with frontier models on maths, formal logic, and code, at up to 95% lower API cost than GPT-5. Its models are open-weight under the MIT license (R1, V3 series, VL2, Janus-Pro, and distilled variants from 1.5B to 671B parameters), so developers can run them locally, fine-tune them, or use them commercially.
Where DeepSeek is less distinctive: no ecosystem integration with productivity tools, and web search that exists but is less reliable than competitors. The chat interface at chat.deepseek.com is clean, with toggles for DeepThink (R1 reasoning) and Search (web access), plus file and image upload. As a Chinese company, DeepSeek raises serious privacy questions — conversations are processed under Chinese data laws. Multiple governments (US federal, Italy, South Korea, Australia, Taiwan, India, Czech Republic, Netherlands) have banned it on official devices, and data protection authorities in France, Ireland, Belgium, the Netherlands, and Germany have launched formal investigations. In January 2025, Wiz Research found an exposed database with plaintext chat histories. In March 2026, Germany's BSI confirmed that a pilot project using DeepSeek-V3 inadvertently transmitted classified metadata to DeepSeek's Shanghai cluster. For some users this is manageable; for others handling sensitive data, it is a dealbreaker.
If that list of concerns feels overwhelming, here's the practical takeaway: DeepSeek is a powerful reasoning tool with real data governance risks. This path teaches you to use its strengths while managing those risks honestly.
This path teaches you to use DeepSeek where it excels — structured reasoning, technical problem-solving, and code — while being honest about its limits.
Get Started
Open chat.deepseek.com and create an account. The interface has a text input, a DeepThink toggle (R1 reasoning), a Search toggle (web access), and a paperclip icon for file/image upload. Turn DeepThink on. Leave Search off for now.
Type this prompt exactly:
A bat and a ball cost $1.10 together. The bat costs $1.00 more than the ball. How much does the ball cost? Show your complete reasoning before answering.
Read the response carefully. Notice three things:
- The thinking process — a visible reasoning chain before the answer, often in a collapsible "Thinking" section. The model working through logic, not decoration.
- Correct answer — the ball costs $0.05, not $0.10. A classic cognitive bias trap. Did DeepSeek avoid the intuitive wrong answer and explain why?
- Depth of explanation — did it set up equations, identify the common mistake, and walk through the algebra? Or just state the answer?
If you've used other AI assistants and found their reasoning shallow -- just confident statements without showing the work -- you've identified exactly the gap DeepSeek fills.
If you got a visible reasoning chain that correctly solved a problem designed to trick intuition, you have seen what makes DeepSeek different. Everything here builds on that capability.
Core Skill 1: Reasoning and Problem-Solving
With DeepThink enabled, DeepSeek constructs a logical pathway, considers edge cases, and sometimes catches its own mistakes mid-reasoning. The default model is DeepSeek-V3.2; DeepThink activates R1-0528, scoring 87.5% on AIME 2025 (up from 70%) with added JSON output and function calling.
🔢87.5%
DeepSeek R1-0528's score on AIME 2025 — up from 70% on the previous version. One of the highest reasoning benchmark scores among all AI models.
Source: DeepSeek, 2026
We used to ask DeepSeek factual questions and wonder why it was so slow. The overhead of DeepThink makes sense for logic problems -- it's overkill for simple lookups. Learning when to toggle it on was the breakthrough.
Exercise: DeepThink — On vs. Off
This exercise takes about 10 minutes and answers the question everyone has: is the extra thinking time actually worth it?
- Choose a complex reasoning question. We suggest: "A company has £2M revenue, 40% margins, and wants to expand to 3 new markets. What is the optimal allocation strategy and why?" — but any multi-step problem works.
- Ask DeepSeek with DeepThink OFF. Note the response time, length, and how many reasoning steps it shows.
- Ask the exact same question with DeepThink ON. Note the same metrics.
- Fill in this comparison:
- Response time: Off ___ vs On ___
- Reasoning steps shown: Off ___ vs On ___
- Did it catch edge cases? Off ___ vs On ___
- Which would you trust for a real decision?
- Reflect: when is the extra thinking time worth it, and when is the faster response sufficient? Most of us land on a simple rule — DeepThink for decisions with consequences, off for everything else.
What to observe: The difference is often not just depth but direction. DeepThink sometimes catches assumptions that the fast response treats as given. That is where the real value hides.
The key principle: ask DeepSeek to reason, not just answer. "What is the optimal strategy?" gets a decent response. "Analyse this step by step, identify key variables, consider alternatives, and explain which is optimal and why" activates the full engine.
Reasoning shines on formal logic, mathematical proofs, multi-constraint optimisation, and problems where the obvious answer is wrong. It overcomplicates simple factual questions and straightforward tasks.
Knowledge Check
A colleague asks you to use DeepSeek to check whether next Friday is a public holiday. What should you tell them?
When reasoning works well
Last verified: March 2026
- Mathematical proofs and derivations — multi-step algebra, calculus, number theory
- Logic puzzles and constraint satisfaction — problems with multiple interacting rules
- Decision analysis — weighing tradeoffs with explicit criteria
- Debugging logical arguments — finding flaws in reasoning chains
📈 DeepSeek R1 Task Accuracy
Source: AI Tutorium internal testing, March 2026
When it falls short
Last verified: March 2026
- Simple factual recall — reasoning mode adds overhead without benefit
- Subjective or creative tasks — chain-of-thought can make creative writing feel mechanical
- Tasks requiring current information — the Search toggle provides web access, but accuracy on current events is ~62%, well below competitors. Verify independently
- Over-reasoning — sometimes produces a long reasoning chain that circles back to a simple conclusion
Exercise: Multi-Step Logic Problem
Scenario: Test whether DeepSeek's reasoning handles problems with non-obvious solutions.
Task: With DeepThink enabled, send this prompt:
Three people check into a hotel room costing $30, each paying $10. The manager realises it should cost $25 and gives $5 to the bellhop to return. The bellhop keeps $2 and returns $1 each. Now each person paid $9 (total $27), the bellhop has $2 — that's $29. Where is the missing dollar? Explain the flaw.
What to observe: Does DeepSeek identify the misleading framing? Does the reasoning chain show where the accounting trick lies?
Reflection: Open the "Thinking" section. Did the model consider and reject wrong approaches before arriving at the correct explanation?
Exercise: Mathematical Reasoning Under Constraints
Scenario: See how DeepSeek handles setting up equations from a word description.
Task: Prompt DeepSeek:
A farmer has chickens and cows. He counts 50 heads and 140 legs. How many of each does he have? Solve using two methods: simultaneous equations and logical elimination. Then explain which method generalises better to three or more animal types.
What to observe: Does DeepSeek produce two genuinely distinct methods? Does the comparison show real analytical thinking?
Reflection: Try the same prompt without DeepThink enabled. Is there a noticeable difference in depth and quality?
Core Skill 2: Code and Technical Work
DeepSeek generates clean, well-structured code across Python, JavaScript/TypeScript, Java, C++, Go, and Rust. Its strength is not just writing code that runs, but explaining design decisions and suggesting alternatives. API pricing is attractive: V3.2 costs $0.28/$0.42 per million input/output tokens; R1 costs $0.55/$2.19.
⚖️ V3.2 vs R1 API Cost (per million tokens)
Model Strengths Input Cost Output Cost DeepSeek V3.2 Fast general model, no reasoning chain $0.28 $0.42 DeepSeek R1 Full reasoning chain, highest accuracy $0.55 $2.19
What it handles well
Last verified: March 2026
- Algorithm design — from sorting to graph algorithms to dynamic programming
- Debugging with reasoning — traces through code logic to find actual bugs, not just pattern-match common errors
- Code explanation — breaks down complex functions line by line
- Test generation — produces meaningful test cases including edge cases
- Language translation — converting between languages while preserving idioms
Where it struggles
Last verified: March 2026
- Cutting-edge framework APIs — may use outdated patterns after training cutoff. Search toggle helps in chat, but API users do not get web search
- Large-context code tasks — loses coherence with very long files or many files at once
- Visual or UI code — can generate CSS or layout code, but cannot evaluate whether it looks right
- Environment-specific configuration — Docker, CI/CD, deployment configs often need adjustment
Exercise: Debug and Optimise
Scenario: You have a function that works but is inefficient.
Task: Send this prompt:
This Python function is too slow for large inputs. Identify the performance problem, explain why, and rewrite with optimal time complexity. Include Big-O analysis for both versions. def find_pairs(nums, target): pairs = [] for i in range(len(nums)): for j in range(i+1, len(nums)): if nums[i] + nums[j] == target: pairs.append((nums[i], nums[j])) return pairs
What to observe: Does it correctly identify the O(n squared) problem and propose a hash-based O(n) solution? Does the reasoning chain analyse the nested loop structure?
Reflection: Test the optimised code with a large input. Does it perform better, or did DeepSeek introduce a correctness bug?
DeepSeek can generate code and explain design decisions — but why is it important to actually run and test its optimised code, rather than trusting the reasoning chain?
The reasoning chain shows the model's intended logic, not verified behaviour. DeepSeek can produce a convincing explanation of why a hash-based approach is O(n) while introducing a subtle bug — like mishandling duplicate pairs or off-by-one errors. The reasoning chain passing your smell test is necessary but not sufficient; execution is the only proof.
Exercise: Explain and Extend
Scenario: You have encountered unfamiliar code and need to understand it before modifying it.
Task: Find a function from an open-source project. Paste it into DeepSeek and prompt:
Explain what this function does, line by line. Then identify one limitation or edge case the original author didn't handle, and write an improved version that addresses it. Explain your changes.
What to observe: Is the explanation accurate? Is the limitation real? Does the improvement fix the issue without breaking existing behaviour?
Reflection: For what types of code comprehension does this save meaningful time versus reading documentation?
Core Skill 3: Research and Analysis
In our experience, DeepSeek's greatest research value isn't web search -- it's the analytical framework it brings to information you already have.
DeepSeek has a Search toggle for web access, and R1 was one of the first reasoning models to integrate web search. However, its accuracy on current events (~62%) is inconsistent. DeepSeek's greatest research value is structured analysis: organising, comparing, and reasoning about information methodically — an analytical partner that excels at systematic thinking.
This works well for comparing approaches or thinking through implications. For current facts or breaking news, verify Search results independently or use a tool with more reliable web access.
If DeepSeek's web search accuracy is only ~62% on current events, why is it still valuable for research tasks?
DeepSeek's research strength is not web search — it is structured analysis. Organising information, comparing options against explicit criteria, and reasoning through implications are all tasks that rely on the model's logic engine, not its ability to fetch current data. You bring the facts; DeepSeek brings the analytical framework.
Exercise: Structured Comparison
Scenario: You need to choose between two approaches and want systematic analysis.
Task: Pick a real decision you face (or use this example) and prompt:
I'm choosing between a monolithic and microservices architecture for a web application serving 10,000 daily users. Analyse systematically: define evaluation criteria, assess each option against them, identify assumptions that would change your recommendation, and give a clear recommendation with reasoning.
What to observe: Does DeepSeek define criteria before evaluating, or jump to a conclusion? Does it acknowledge genuine tradeoffs?
Reflection: How does this compare to asking a colleague? Where does it add value, and where does it lack domain context?
Exercise: Concept Synthesis
Scenario: You are preparing to write about a topic and need to organise your understanding.
Task: Choose a topic you know moderately well and prompt:
Explain the relationship between [Topic A] and [Topic B]. Start with how they're understood independently, then cover where they intersect, where they conflict, and what most people get wrong about their relationship. Structure this as a briefing for writing an article.
What to observe: Does the synthesis reveal connections you had not considered? Is the "what most people get wrong" section insightful or obvious? Since you know this topic, you can evaluate accuracy directly.
Reflection: What topic familiarity do you need for DeepSeek's synthesis to be useful? If you knew nothing, could you spot errors?
Core Skill 4: Understanding the Tradeoffs
Choose DeepSeek when: you need step-by-step reasoning, code with explanation, structured analysis, local model deployment (MIT license), or low API cost (V3.2: $0.28/$0.42 per million tokens; R1: $0.55/$2.19). Off-peak discounts of up to 75% (R1) and 50% (V3) are available.
Choose a different tool when: you need reliable current information (DeepSeek's web search is less accurate than competitors), productivity suite integration, fluid creative writing, or your data policies prohibit Chinese-jurisdiction services.
Knowledge Check
Your company's legal team handles confidential client contracts. A lawyer wants to use DeepSeek to analyse a contract clause for logical inconsistencies. What is the best advice?
Multimodal capabilities: Chat supports image upload (up to 20 per session) and file upload (PDF, DOCX, PPTX, XLSX, TXT, code files — up to 50). VL2 handles vision-language tasks in three sizes; Janus-Pro (1B, 7B) handles image understanding and generation. A natively multimodal V4 (1T parameters, 1M token context, native image/video/text generation) is expected in April 2026, alongside a next-generation reasoning model, R2. Growing, but still trailing leading competitors.
Open-weight models: The MIT-licensed family includes R1 (671B), R1-0528, R1-Zero, six distilled variants (1.5B-70B), R1-0528 distilled variants (Qwen3-8B), V3 series (V3-0324 through V3.2, plus V3.1-Terminus and V3.2-Speciale), VL2 (three sizes), and Janus-Pro (1B, 7B). Full-size models require substantial hardware; distilled variants run on consumer GPUs.
Exercise: Head-to-Head Comparison
Task: Take a technical problem from your own work and send the same prompt to DeepSeek and one other AI assistant:
[Paste a real problem you're working on — a bug, a design decision, an analysis task. Ask for step-by-step reasoning and a recommendation.]
What to observe: Compare reasoning depth, correctness, clarity, and whether either tool flagged something the other missed.
Reflection: For what tasks would you default to DeepSeek? For what tasks the other tool? Write your personal decision rule.
Exercise: Privacy Assessment
Task: Before using DeepSeek with real work data, think through the privacy implications:
I'm considering using DeepSeek for [describe a real work task]. The data involved includes [describe data type]. What are the privacy and data governance considerations I should evaluate before proceeding?
What to observe: Note the irony: you are asking DeepSeek about its own privacy risks. Use this as a starting point, not a final answer. Remember the government bans in 8+ countries, formal data protection investigations across Europe, the January 2025 breach that exposed chat histories in plaintext, and the March 2026 German BSI incident involving classified metadata.
Reflection: Where is your personal line for what you will and will not send to DeepSeek? Does your organisation have a policy?
Challenge Exercises
These combine multiple skills and require critical evaluation of DeepSeek's outputs.
Challenge 1: The Reasoning Stress Test
Scenario: You want to find the boundary of DeepSeek's reasoning capability.
Task: Design a logic problem with at least 5 interacting constraints. Send it with DeepThink enabled. Progressively add constraints until reasoning breaks down or produces an incorrect answer.
Deliverable: Document where quality degraded. Failure mode — wrong answer stated confidently, acknowledged uncertainty, or silently dropped a constraint?
Success criteria: You can articulate the complexity boundary beyond which you would not trust DeepSeek without verification.
Challenge 2: The Code Review Partner
Scenario: You want to use DeepSeek as part of your code review process.
Task: Take a real pull request or code change. Paste the diff and prompt: "Review this code change. Identify bugs, performance issues, security concerns, and style problems. For each, explain why it matters and suggest a fix." Compare against actual review comments or your own judgement.
Deliverable: A scored assessment: issues caught, false positives raised, critical issues missed.
Success criteria: You can articulate whether DeepSeek adds value to your review process and where human reviewers remain essential.
Challenge 3: Build a Decision Framework
Scenario: Your team is evaluating AI tools and needs a recommendation.
Task: Use DeepSeek to build a decision framework for choosing between AI assistants by task category. Feed it your real use cases and ask it to define evaluation criteria and scoring rubrics. Then use that framework to evaluate DeepSeek alongside two competitors.
Deliverable: An evaluation matrix with scores, reasoning, and a recommendation per task category.
Success criteria: The framework is honest — DeepSeek should not win every category. If it does, adjust your criteria.
Quick Reference
Prompting Patterns That Work
- Activate reasoning: "Think through this step by step before answering."
- Request multiple approaches: "Solve this using two different methods and compare them."
- Demand tradeoff analysis: "What are the tradeoffs? Under what conditions would each be preferable?"
- Code with explanation: "Write the code, then explain each design decision and identify edge cases."
- Structured analysis: "Define criteria, evaluate options against those criteria, and state your assumptions."
- Self-correction: "Check your answer for errors before presenting it."
DeepSeek's Strengths
Last verified: March 2026
- Transparent chain-of-thought reasoning (DeepThink mode, powered by R1-0528)
- Strong mathematical and formal logic capabilities (R1-0528: 87.5% on AIME 2025)
- Competitive code generation and debugging across major languages
- Extensive open-weight model family, all MIT-licensed for commercial use
- Dramatically lower API cost (V3.2: $0.28/$0.42 per 1M tokens)
- File and image upload in chat; growing multimodal capabilities via VL2 and Janus-Pro
- Web search via Search toggle in chat interface
DeepSeek's Limitations
Last verified: March 2026
- Web search exists but less reliable than competitors (~62% on current events); API has no web search
- No integration with productivity tools or ecosystems
- Multimodal growing (VL2, Janus-Pro, file/image upload) but trails leading competitors
- Data under Chinese jurisdiction — government bans in 8+ countries, formal investigations by data protection authorities in France, Ireland, Belgium, Netherlands, and Germany, multiple documented breaches. Review your policies
- Can over-reason on simple tasks, adding unnecessary complexity
When-to-Use Checklist
- Does this task require step-by-step reasoning or logical analysis?
- Is the task primarily about code, maths, or structured analysis?
- Do I need reliable current information? (DeepSeek's Search toggle is inconsistent — verify results or use a different tool for time-sensitive facts.)
- Is the data I am sharing acceptable to send to a service under Chinese data law?
- Would I benefit from seeing the model's reasoning process, not just its conclusion?
What you've developed here goes beyond knowing one tool. The ability to match a problem to the right reasoning approach -- and to verify that reasoning critically -- is a skill that transfers to every AI tool you'll use from here.
Practice Project
You've seen DeepSeek reason through textbook problems. But the real test is whether it can reason through yours — the messy, context-heavy problems that show up in your actual work.
Reasoning Task Evaluation
Time: 50-60 minutes
What you'll build: A comparative evaluation of DeepSeek against 5 complex problems from your real work — documenting where its reasoning shines, where it struggles, and where another tool might serve you better.
Why this matters: DeepSeek's reasoning chain is impressive in demos, but the question that matters is: does it help with the problems you actually face? Some tasks benefit enormously from step-by-step transparency. Others don't need it — and the extra reasoning steps just slow things down. This evaluation gives you a clear, evidence-based answer for your specific work. We've found that people are often surprised by which problems DeepSeek handles best — it's rarely the ones you'd predict.
Steps
- Select 5 complex problems from your work. Choose problems that genuinely require reasoning — not simple lookups or creative writing. Good candidates: analysing a business decision with trade-offs, debugging a process that isn't working, evaluating competing options with multiple criteria, breaking down a technical challenge into steps, or finding the flaw in someone's argument. Write each problem out clearly, including the context DeepSeek needs to work with. Remember the data governance considerations from this path — don't upload anything you wouldn't want processed under Chinese data law.
- Test each problem with DeepThink on. Open chat.deepseek.com, enable DeepThink, and paste each problem one at a time (fresh conversation for each). Read the full reasoning chain, not just the final answer. For each problem, note: Did the reasoning chain follow a logical path? Did it catch nuances or edge cases you hadn't considered? Did it over-complicate things? Did it reach the right conclusion? Record these observations immediately — don't rely on memory.
- Test at least 2 of the same problems with another tool. Pick your 2 most important problems and run them through ChatGPT, Claude, or whichever tool you normally use. Compare the outputs side by side. Where did DeepSeek's visible reasoning add value? Where did the other tool produce a better result despite not showing its work? Be specific about what "better" means for each problem — more accurate, more actionable, faster, more nuanced.
- Build your decision matrix. Create a simple table with your 5 problems as rows and these columns: Problem Type, DeepSeek Score (1-5), Reasoning Chain Useful? (yes/no/partially), Better Tool Alternative, and When to Use DeepSeek. Fill it in based on your evidence. This becomes your personal reference for knowing when to reach for DeepSeek versus something else.
Deliverable: A decision matrix with 5 evaluated problems, each scored and annotated with whether DeepSeek's reasoning approach was the right fit — plus a one-paragraph summary of your overall assessment.
Stretch goal: Test one of your problems with DeepThink off (using the base V3.2 model) and compare. Does the lighter model handle it well enough, or does the reasoning chain genuinely add value for that specific problem type?
Reflection: For how many of your 5 problems was DeepSeek genuinely the best tool? If the answer is 1 or 2, that's not a failure — it's clarity. Knowing exactly when to use a specialised tool is far more valuable than trying to use it for everything.
What you've built here isn't loyalty to one tool — it's the ability to match the right AI to the right problem. That judgement is portable, and it gets sharper every time you test it against real work.