How Claude Actually Works: Models, Memory & Mindset
Understand the engine — so you can get dramatically better results every time
Why Most People Get Mediocre Results
Most users treat Claude like a search engine. Experts treat it like a brilliant colleague who needs the right context. That mental model gap produces wildly different results from the same tool.
This lesson gives you the foundation — model selection, tokens, system prompts, and how Claude thinks — that makes every advanced technique in the rest of Level 3 work better.
The Model Lineup
Claude Opus — Most intelligent. Use for: complex reasoning, long document analysis, research synthesis, difficult architecture decisions. Slower, higher cost. Best when quality matters more than speed.
Claude Sonnet — Best balance of intelligence and speed. Handles 90% of real tasks with high quality. Default choice unless you have a specific reason to switch.
Claude Haiku — Fastest, lowest cost. Best for high-volume automation, simple classification, quick structured extraction. Use in pipelines where you need thousands of calls.
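If you call the API from a pipeline, this guidance can be encoded as a small routing helper. A minimal sketch — the model IDs below are illustrative placeholders, not guaranteed current names; check Anthropic's model list before using them:

```python
# Hypothetical routing helper: map task shape to a model tier.
# Model IDs are placeholders -- consult the current model list for real ones.
MODEL_BY_TIER = {
    "deep": "claude-opus-4",       # complex reasoning, long-document analysis
    "default": "claude-sonnet-4",  # ~90% of real tasks
    "bulk": "claude-haiku-4",      # high-volume classification/extraction
}

def pick_model(task: str, high_volume: bool = False,
               needs_deep_reasoning: bool = False) -> str:
    """Return a model ID using the rule of thumb from this lesson."""
    if high_volume:
        return MODEL_BY_TIER["bulk"]        # thousands of calls -> cheapest tier
    if needs_deep_reasoning:
        return MODEL_BY_TIER["deep"]        # quality over speed
    return MODEL_BY_TIER["default"]         # when in doubt, Sonnet
```

For example, `pick_model("classify support tickets", high_volume=True)` routes to the bulk tier, while an architecture decision with `needs_deep_reasoning=True` routes to the deep tier.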
Real Example: Same Request, Three Models
Here's the same request sent to all three models. Notice the differences in depth, structure, and nuance:
We're deciding whether to build our own auth system or use Auth0. We're a 5-person startup, 6 months from launch, TypeScript/Next.js stack.
Haiku: Use Auth0. Building auth yourself takes 2-4 weeks and introduces security risks. Auth0 handles OAuth, MFA, session management, and compliance (SOC 2, GDPR). For a 5-person team 6 months from launch, your engineering time is better spent on your core product. Auth0 free tier covers up to 7,500 monthly active users — enough to launch. Switch to paid ($240/month) at scale if needed.
Sonnet:

Recommendation: Use Auth0 now, with a clear exit strategy.

Why Auth0 wins for your situation:
- 6 months to launch means ~24 weeks. A solid auth system (SSO, MFA, password reset, session management, token rotation) takes 3-5 weeks to build correctly and 2-3 weeks to harden. That's 20%+ of your runway.
- Auth0 is SOC 2 Type II certified. Your custom solution won't be for years.
- You get social login (Google, GitHub) out of the box — typically +15-25% in signup conversion.

The hidden cost of DIY auth: Every auth bug that reaches production is a reputational event. Session fixation, token leakage, password hashing mistakes — these are career-defining incidents at a startup.

Exit strategy (when to consider leaving Auth0):
- Hitting 50k+ MAU and paying $3k+/month → evaluate Clerk or Supabase Auth
- Need custom auth flows Auth0 can't support → build specific pieces only
- Building an identity product → then auth is your core competency

Implementation path:
1. next-auth v5 + Auth0 adapter (4 hours setup)
2. Add to CLAUDE.md: "Auth uses next-auth with Auth0 adapter in src/auth/"
3. Set up Auth0 tenants with dev / staging / prod separation from day one

The only case for DIY: if your product IS authentication (building an identity provider for enterprises). Otherwise, use the tool.
Opus:

This is a classic build-vs-buy decision with some startup-specific wrinkles worth thinking through carefully. My recommendation: Auth0, but let me explain why the tradeoffs are more nuanced than "just use Auth0." The standard case for Auth0 is clear: [...deeper analysis of security surface area, regulatory trajectory, team velocity math, specific Next.js integration patterns, vendor lock-in risk analysis, migration complexity to alternative providers, pricing model at different growth scenarios, and 3 specific implementation decisions to make before starting...]

The one scenario where I'd reconsider: If your target customer is enterprise, and you expect SOC 2 Type II requirements within 18 months, Auth0's enterprise tier pricing ($2/MAU above 7,500) can become expensive fast. In that case, evaluate Stytch or Clerk — similar capabilities, more predictable pricing at enterprise scale...
Notice: Opus proactively identifies an edge case (enterprise customers) that Sonnet and Haiku both missed. That's when Opus earns its cost.
Context Windows and Tokens
A token is roughly 0.75 words (~4 characters). Claude processes your entire conversation in a single context window. Current Claude models support a 200,000-token context window — approximately 150,000 words.
- Paste whole documents — a 60-page PDF, an entire codebase, 200 emails. Claude handles it. No need to chunk manually.
- Long sessions accumulate context — in a 4-hour session, Claude is reading everything you've said. Run /compact when switching tasks to free space.
- Irrelevant context hurts — pasting 50 unrelated files dilutes focus. Be intentional about what you include.
Your system prompt / CLAUDE.md:     ~1,500 tokens (2 pages)
5 files you pasted for context:     ~8,000 tokens (10-12 pages of code)
Conversation so far:                ~3,000 tokens
Current question:                     ~200 tokens
─────────────────────────────────────────────
Total used:                        ~12,700 tokens
Remaining (200k window):          ~187,300 tokens

You have massive room. In a normal 1-hour session you'll rarely exceed 40,000 tokens even with large file pastes.
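The arithmetic behind a budget like this is simple enough to sketch: estimate tokens as roughly characters divided by 4, sum what's in the window, and subtract from the limit. This is a back-of-envelope helper, not a real tokenizer:

```python
CONTEXT_WINDOW = 200_000  # tokens, per the lesson above

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token on average."""
    return max(1, len(text) // 4)

def remaining_budget(used_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens still available in the context window."""
    return window - used_tokens

# Reproducing the example budget: system prompt + pasted files + chat + question
used = 1_500 + 8_000 + 3_000 + 200
print(used)                    # 12700
print(remaining_budget(used))  # 187300
```

For real accounting, use an actual token counter; this sketch only shows why a typical session leaves most of the window free.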
System Prompts — The Hidden Layer
A system prompt is a set of instructions Claude reads before your conversation starts. It shapes tone, behavior, focus, and constraints — invisibly from the user's perspective.
- Claude Projects — "Project instructions" field = your system prompt. Every chat starts with it.
- Claude Code — Reads CLAUDE.md as a system prompt at session start.
- Cowork Skills — Each SKILL.md becomes part of the prompt when the skill runs.
- API calls — The system parameter in every API request.
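In an API request, the system prompt travels as a top-level field, separate from the messages list. A minimal sketch of the request shape — the model ID is an illustrative placeholder, and the manager persona is just the example from this lesson:

```python
# Shape of a Messages API request with a system prompt.
# The model ID is illustrative; swap in a current one from the model list.
request = {
    "model": "claude-sonnet-4",
    "max_tokens": 1024,
    "system": (
        "You are a senior engineering manager at a Series B startup. "
        "Be direct and practical, not HR boilerplate. "
        "Flag if something is legally sensitive."
    ),
    "messages": [
        {"role": "user",
         "content": "What should I do about my underperforming team member?"}
    ],
}

# With the official SDK this payload would be passed as:
#   client.messages.create(**request)
# Note: "system" is a top-level parameter, not a message with a system role.
```

The key point is structural: the system prompt sits outside the conversation turns, which is why it shapes every response without appearing in the chat itself.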
Question: "What should I do about my underperforming team member?"

WITHOUT system prompt → Claude gives generic management advice: "Consider having a direct conversation, setting clear expectations, providing regular feedback, and if needed, a performance improvement plan..."

WITH system prompt: "You are a senior engineering manager at a Series B startup. I manage a team of 6. Responses should be direct and practical, not HR boilerplate. Flag if something is legally sensitive."

→ Claude's response: "Direct answer: how long and in what ways are they underperforming? If <3 months, this is a management problem (unclear expectations, wrong fit for current work). If >6 months with documented feedback, this is a PIP conversation — talk to HR first to understand your company's process, because wrongful termination risk is real even at startups. What specifically is not meeting the bar?"

The second response is actionable. The first is useless.
How Claude "Thinks": Constitutional AI
Claude was trained with Constitutional AI (CAI) — a process where Claude learned to evaluate its own responses against principles: be helpful, be harmless, be honest. Claude isn't just pattern-matching; it reasons about whether its responses are actually good.
What this means for you:
- Tell Claude your intent. "I'm building a product for healthcare professionals" gives Claude context to calibrate helpfulness vs. caution appropriately.
- Claude can push back. If you ask for something that conflicts with its values, it will say so — and often offer an alternative approach.
- Be direct. Claude wants to help. Vague requests get vague results — not because Claude can't, but because it doesn't know what "excellent" looks like for your specific situation.
Hands-On Exercise (~25 min)
Pick a real decision you're facing at work. Send the same prompt to Haiku, Sonnet, and Opus (switch models with /model haiku, etc.). Document where Sonnet adds value over Haiku, and where Opus goes deeper than Sonnet. The result is your personal model-selection rule.
Create a Claude Project for your most common work type. Add Project Instructions that include: your role, your company context, how you like answers structured, and 3-5 "do not" rules. Ask a real work question. Compare to asking the same question without the system prompt in a new conversation. Save the version that worked as your template.
Use when: complex multi-step reasoning, nuanced analysis, the difference between good and great matters. Costs more, thinks deeper.
Default for 90% of tasks. Best speed/quality balance. If you're not sure which model, use Sonnet.
Use for: high-volume pipelines, simple classification, fast structured extraction. Fastest and cheapest.
~150,000 words. Paste whole documents, large codebases, many emails. Use /compact in long sessions to free space.
Hidden instructions that shape all of Claude's responses. Set via Project Instructions, CLAUDE.md, or API system parameter.
Claude reasons about whether its responses are good, not just pattern-matches. Give Claude your intent for better calibration.