TL;DR
Spanish prompts cost ~50% more tokens than English, but the quality difference is negligible for coding tasks. Using your native language lets you think faster and give clearer instructions — that matters more than token savings.
The Setup
I'm a Spanish-speaking software engineer who uses Claude Code as my primary development tool. Over the past months, I've accumulated 21,000+ sessions and 245,000+ messages. My workflow is heavily execution-oriented: long debugging sessions, architecture discussions, code reviews — all in Spanish.
At some point I asked myself the obvious question: should I be writing all these prompts in English instead? English is the lingua franca of programming. Most training data is in English. Every benchmark is in English. Surely I'm leaving performance on the table?
I decided to find out.
Part 1: The Token Tax
LLMs don't see words — they see tokens. And tokenizers are heavily optimized for English. The word authentication is a single token. Its Spanish equivalent, autenticación, is three tokens.
I ran equivalent prompts through cl100k_base as a proxy for Claude's tokenizer to measure the real cost. These are actual prompts I use daily in Claude Code:
| Prompt Type | English | Spanish | Overhead |
|---|---|---|---|
| Bug fix instruction | 20 tokens | 31 tokens | +55% |
| API endpoint creation | 32 tokens | 46 tokens | +44% |
| Refactoring task | 26 tokens | 41 tokens | +58% |
| Documentation generation | 33 tokens | 52 tokens | +58% |
| DevOps debugging | 37 tokens | 58 tokens | +57% |
| Code review (long) | 65 tokens | 106 tokens | +63% |
| Architecture discussion | 58 tokens | 93 tokens | +60% |
| Total across all tests | 352 tokens | 544 tokens | +54.5% |
That's a 54.5% token overhead on average. For longer, more complex prompts, it gets worse — the code review prompt hit +63%.
Why is the gap so large?
BPE tokenizers build their vocabulary from training data, which is predominantly English. Common English words get their own tokens, while Spanish words are often split into several subword pieces.
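To see the mechanism concretely, here is a toy BPE trainer — a deliberately minimal sketch, not Claude's actual tokenizer — trained on an English-heavy corpus. Because merge rules are learned by frequency, the common English word collapses into a single token while its rarer Spanish counterpart stays fragmented:

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Learn BPE merges: repeatedly fuse the most frequent adjacent symbol pair."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            symbols, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    symbols.append(word[i] + word[i + 1])
                    i += 2
                else:
                    symbols.append(word[i])
                    i += 1
            new_vocab[tuple(symbols)] += freq
        vocab = new_vocab
    return merges

def encode(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = list(word)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    return symbols

# English-heavy toy corpus: 100 English occurrences vs. 5 Spanish ones.
corpus = ["authentication"] * 100 + ["autenticación"] * 5
merges = train_bpe(corpus, num_merges=13)

print(encode("authentication", merges))  # a single token
print(encode("autenticación", merges))   # several subword pieces
```

The exact splits differ from a production tokenizer, but the dynamic is the same: vocabulary slots go to whatever the training data contains most of.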
One notable exception: middleware stays at 1 token in both languages, because technical terms borrowed directly from English don't pay the tax. This matters in practice: the more technical your prompt, the more English loanwords it contains, and the smaller the gap becomes.
Part 2: The Context Window Reality
In a typical Claude Code session, your prompts are a tiny fraction of the total context. What actually fills the window is the system prompt, tool definitions, file contents, and tool results, not your typing.
Your messages typically account for ~10% of the context. A 54% overhead on 10% of context is a ~5.5% increase in total token usage. That's the real number.
54% sounds terrifying. 5.5% sounds manageable. Both numbers are correct — the difference is what you measure against.
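The arithmetic behind that number can be sketched directly. In the snippet below, only the ~10% share for your messages comes from this analysis; the other context shares are illustrative assumptions:

```python
# Rough context-budget model for a Claude Code session.
# Only the "your messages" share (~10%) comes from the analysis above;
# the remaining shares are illustrative assumptions.
context_shares = {
    "system prompt + tool definitions": 0.30,  # assumption
    "file contents read into context":  0.35,  # assumption
    "tool results and diffs":           0.25,  # assumption
    "your messages":                    0.10,  # from the analysis
}
spanish_overhead = 0.545  # +54.5%, applied to your messages only

total_increase = context_shares["your messages"] * spanish_overhead
print(f"Total context increase: {total_increase * 100:.2f}%")
```

The overhead only multiplies against the slice you actually type, which is why the headline 54% shrinks to single digits.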
Part 3: Does Language Affect Quality?
This is the question that actually matters. Tokens are money, but quality is everything.
For code generation: no meaningful difference
Claude generates the same Go, TypeScript, or Python regardless of whether you asked in English or Spanish. The code output is language-agnostic. Variable names, function signatures, and architecture decisions don't change based on prompt language.
Consider these two equivalent prompts:
```
Add pagination to the list endpoint. Use cursor-based
pagination with a default page size of 20.
```

```
Añade paginación al endpoint de listado. Usa paginación
basada en cursor con un tamaño de página por defecto de 20.
```
Both produce identical code. The generated function names, the SQL queries, the response structs — all the same. Claude understands the intent regardless of the language wrapping it.
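For concreteness, here is a minimal sketch of the kind of code either prompt might yield — in Python, with hypothetical names (`list_items`, an in-memory table) rather than Claude's literal output:

```python
import base64
import json

# Hypothetical in-memory table, sorted by primary key.
ITEMS = [{"id": i, "name": f"item-{i}"} for i in range(1, 101)]

def _encode_cursor(last_id):
    return base64.urlsafe_b64encode(json.dumps({"last_id": last_id}).encode()).decode()

def _decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor))["last_id"]

def list_items(cursor=None, page_size=20):
    """Cursor-based pagination: return rows after the cursor's last seen id."""
    last_id = _decode_cursor(cursor) if cursor else 0
    page = [row for row in ITEMS if row["id"] > last_id][:page_size]
    # Only hand out a cursor when the page was full, i.e. more rows may exist.
    next_cursor = _encode_cursor(page[-1]["id"]) if len(page) == page_size else None
    return {"items": page, "next_cursor": next_cursor}
```

Whether the request arrived in English or Spanish, the identifiers, the default page size of 20, and the opaque-cursor design come out the same.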
For reasoning tasks: marginal English advantage
Academic benchmarks like MMLU and GSM8K show a small advantage (2-5%) for English prompts on reasoning-heavy tasks. This is expected — the model has seen more chain-of-thought reasoning examples in English during training.
But here's the catch: those benchmarks test the model's language, not yours. When you write in Spanish, Claude still reasons internally in whatever representation it uses and only translates the final output. You're not forcing it to "think in Spanish."
For instruction clarity: native language wins
This is where the real difference lives. I tested my own typing speed and found I write ~30% faster in Spanish than in English. Not just faster — also more accurately. When I tried writing a quick note in English to test this, I immediately produced a typo. In Spanish? Clean on the first try.
When I tried writing prompts in English:
- I spent more time crafting the wording than describing what I wanted
- I occasionally used ambiguous phrasing that I wouldn't use in Spanish
- Complex architectural discussions lost nuance
- I defaulted to simpler descriptions to avoid language friction
- More typos, which sometimes confused the model
When I write in Spanish, I express exactly what I mean, with all the nuance, qualifications, and edge cases. A prompt that takes me 5 seconds in Spanish takes 8-10 in English — and the Spanish version is usually more precise and typo-free.
The best prompt isn't the one with fewest tokens. It's the one that most accurately describes what you want. Native language gives you that.
Part 4: The Numbers from 21,000 Sessions
I've been using Spanish exclusively with Claude Code since running this analysis. Here's what the data shows:
- 245,295 messages exchanged, all conversations in Spanish
- 697,087 bash commands executed successfully
- 228,000+ code edits with no language-related quality issues
- 1,219 sessions marked satisfied — satisfaction rate unaffected by language choice
I've never had Claude misunderstand a Spanish prompt in a way that it wouldn't have misunderstood in English. The failure modes are the same: ambiguous requirements, missing context, underspecified edge cases. Language isn't the bottleneck — clarity is.
Part 5: The Real Cost Calculation
Let's put actual numbers on this. With Claude Sonnet at $3/M input tokens:
| Scenario | English | Spanish | Extra Cost |
|---|---|---|---|
| Single prompt (avg) | 35 tokens | 54 tokens | $0.000057 |
| Typical session (20 prompts) | 700 tokens | 1,080 tokens | $0.00114 |
| Heavy day (200 prompts) | 7,000 tokens | 10,800 tokens | $0.0114 |
| Month of intense usage | 140K tokens | 216K tokens | $0.228 |
The Spanish "tax" on my input prompts costs roughly 23 cents per month. That's less than a cent per working day. And remember: this only affects your messages, not the system prompt, tool results, or code content that make up the bulk of token usage.
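The table is easy to sanity-check. This snippet recomputes each row from the $3 per million input-token price:

```python
PRICE_PER_TOKEN = 3.00 / 1_000_000  # Claude Sonnet input pricing

# (english_tokens, spanish_tokens) for each scenario in the table above
scenarios = {
    "single prompt":   (35, 54),
    "session (20)":    (700, 1_080),
    "heavy day (200)": (7_000, 10_800),
    "month":           (140_000, 216_000),
}

for name, (english, spanish) in scenarios.items():
    extra_cost = (spanish - english) * PRICE_PER_TOKEN
    print(f"{name}: +${extra_cost:.6f}")
```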
With Claude Code's subscription model, this is even less relevant — you're paying a flat rate regardless of token count.
The Verdict
Native Language Advantages
- Think faster, prompt faster
- More precise instructions
- Better architectural discussions
- Lower cognitive overhead
- Natural expression of edge cases
English Advantages
- ~35% fewer tokens on prompts (the flip side of Spanish's ~55% overhead)
- 2-5% better on reasoning benchmarks
- Marginal edge on rare/niche topics
- Easier to share prompts with English-speaking teams
Use your native language.
The token overhead is real but irrelevant at the scale of actual usage. The quality difference is measurable in benchmarks but invisible in practice. The clarity advantage of thinking in your own language is significant and compounds over thousands of interactions.
The one exception: if you're writing prompts that will be shared as templates with an English-speaking team, write those in English. But for your daily work — your debugging sessions, your architecture discussions, your "fix this bug" messages — use whatever language lets you think fastest.
That's what I do. 21,000 sessions and counting.
Analysis by Jairo Caro-Accino. Token counts measured with cl100k_base tokenizer. Session data from Claude Code usage analytics.