TL;DR

Spanish prompts cost ~50% more tokens than English, but the quality difference is negligible for coding tasks. Using your native language lets you think faster and give clearer instructions — that matters more than token savings.

The Setup

I'm a Spanish-speaking software engineer who uses Claude Code as my primary development tool. Over the past months, I've accumulated 21,000+ sessions and 245,000+ messages. My workflow is heavily execution-oriented: long debugging sessions, architecture discussions, code reviews — all in Spanish.

At some point I asked myself the obvious question: should I be writing all these prompts in English instead? English is the lingua franca of programming. Most training data is in English. Every benchmark is in English. Surely I'm leaving performance on the table?

I decided to find out.


Part 1: The Token Tax

LLMs don't see words — they see tokens. And tokenizers are heavily optimized for English. The word authentication is a single token. Its Spanish equivalent, autenticación, is three tokens.

I ran equivalent prompts through Claude's tokenizer (cl100k_base proxy) to measure the real cost. These are actual prompts I use daily in Claude Code:

Prompt type                   English      Spanish      Overhead
Bug fix instruction           20 tokens    31 tokens    +55%
API endpoint creation         32 tokens    46 tokens    +44%
Refactoring task              26 tokens    41 tokens    +58%
Documentation generation      33 tokens    52 tokens    +58%
DevOps debugging              37 tokens    58 tokens    +57%
Code review (long)            65 tokens    106 tokens   +63%
Architecture discussion       58 tokens    93 tokens    +60%
Total across all tests        352 tokens   544 tokens   +54.5%

That's a 54.5% token overhead on average. For longer, more complex prompts, it gets worse — the code review prompt hit +63%.
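The overhead figures are plain ratios of the measured counts. A minimal sketch to reproduce the arithmetic, using the per-prompt numbers from the table above:

```python
# Measured (English, Spanish) token counts from the table above.
COUNTS = {
    "Bug fix instruction": (20, 31),
    "API endpoint creation": (32, 46),
    "Refactoring task": (26, 41),
    "Documentation generation": (33, 52),
    "DevOps debugging": (37, 58),
    "Code review (long)": (65, 106),
    "Architecture discussion": (58, 93),
}

def overhead_pct(english: int, spanish: int) -> int:
    """Percentage of extra tokens the Spanish version costs."""
    return round((spanish - english) / english * 100)

for name, (en, es) in COUNTS.items():
    print(f"{name}: +{overhead_pct(en, es)}%")
```

To measure your own prompts, encode each version with a tokenizer library and compare the lengths the same way; the counts above came from a cl100k_base proxy.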

Why is the gap so large?

BPE tokenizers build their vocabulary from training data, which is predominantly English. Common English words get their own tokens. Spanish words often get split into subword pieces:

English                   Tokens    Spanish                       Tokens
authentication            1         autenticación                 3
deployment                1         despliegue                    4
configuration             1         configuración                 3
database                  1         base de datos                 3
environment               1         entorno                       2
backward compatibility    2         compatibilidad hacia atrás    5
middleware                1         middleware                    1

Notice that middleware stays at 1 token in both languages — technical terms borrowed directly from English don't pay the tax. This is actually important: the more technical your prompt, the more English loanwords it contains, and the smaller the gap becomes.


Part 2: The Context Window Reality

In a typical Claude Code session, your prompts are a tiny fraction of the total context. Here's what actually fills the context window:

  • System prompt and tools: ~35% (English)
  • Code, file contents, diffs: ~40% (English)
  • Tool results (bash, grep): ~15% (English)
  • Your messages: ~10% (your language)

Your messages typically account for ~10% of the context. A 54% overhead on 10% of context is a ~5.5% increase in total token usage. That's the real number.

54% sounds terrifying. 5.5% sounds manageable. Both numbers are correct — the difference is what you measure against.
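The back-of-envelope above generalizes: multiply the per-prompt overhead by the share of context your own messages occupy. A sketch using the ~10% share and +54.5% overhead measured here:

```python
def effective_overhead(prompt_share: float, prompt_overhead: float) -> float:
    """Total context growth when only your own messages pay the overhead."""
    return prompt_share * prompt_overhead

# ~10% of the context is your messages, each costing ~54.5% more tokens.
total = effective_overhead(0.10, 0.545)
print(f"{total * 100:.2f}% of total context")
```

The same formula explains why overhead matters more in chat-style usage, where your messages are a much larger share of the context than they are in an agentic coding session.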


Part 3: Does Language Affect Quality?

This is the question that actually matters. Tokens are money, but quality is everything.

For code generation: no meaningful difference

Claude generates the same Go, TypeScript, or Python regardless of whether you asked in English or Spanish. The code output is language-agnostic. Variable names, function signatures, and architecture decisions don't change based on prompt language.

Consider these two equivalent prompts:

English

Add pagination to the list endpoint. Use cursor-based
pagination with a default page size of 20.
Spanish

Añade paginación al endpoint de listado. Usa paginación
basada en cursor con un tamaño de página por defecto de 20.

Both produce identical code. The generated function names, the SQL queries, the response structs — all the same. Claude understands the intent regardless of the language wrapping it.
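To make that concrete, here is the kind of handler either prompt tends to produce. This is a hypothetical in-memory sketch; the function name and response shape are illustrative, not taken from an actual session:

```python
def list_items(items, cursor=None, page_size=20):
    """Cursor-based pagination: 'cursor' is the id of the last item seen."""
    start = 0
    if cursor is not None:
        # Resume just after the item whose id matches the cursor.
        start = next(i + 1 for i, item in enumerate(items)
                     if item["id"] == cursor)
    page = items[start:start + page_size]
    # A full page means there may be more; expose the last id as the cursor.
    next_cursor = page[-1]["id"] if len(page) == page_size else None
    return {"items": page, "next_cursor": next_cursor}
```

Whichever language the instruction was written in, the defaults it specifies (cursor-based, page size 20) land in the same place in the generated code.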

For reasoning tasks: marginal English advantage

Academic benchmarks like MMLU and GSM8K show a small advantage (2-5%) for English prompts on reasoning-heavy tasks. This is expected — the model has seen more chain-of-thought reasoning examples in English during training.

But here's the catch: those benchmarks test the model's language, not yours. When you write in Spanish, Claude still reasons internally in whatever representation it uses and only translates the final output. You're not forcing it to "think in Spanish."

For instruction clarity: native language wins

This is where the real difference lives. I tested my own typing speed and found I write ~30% faster in Spanish than in English, and more accurately: a quick test note in English produced an immediate typo, while the Spanish version came out clean on the first try.

When I write in Spanish, I express exactly what I mean, with all the nuance, qualifications, and edge cases. A prompt that takes me 5 seconds in Spanish takes 8-10 in English, and the Spanish version is usually more precise and typo-free.

The best prompt isn't the one with fewest tokens. It's the one that most accurately describes what you want. Native language gives you that.


Part 4: The Numbers from 21,000 Sessions

I've been using Spanish exclusively with Claude Code since running this analysis. Here's the clearest pattern from those sessions:

I've never had Claude misunderstand a Spanish prompt in a way that it wouldn't have misunderstood in English. The failure modes are the same: ambiguous requirements, missing context, underspecified edge cases. Language isn't the bottleneck — clarity is.


Part 5: The Real Cost Calculation

Let's put actual numbers on this. With Claude Sonnet at $3/M input tokens:

Scenario                       English        Spanish         Extra cost
Single prompt (avg)            35 tokens      54 tokens       $0.000057
Typical session (20 prompts)   700 tokens     1,080 tokens    $0.00114
Heavy day (200 prompts)        7,000 tokens   10,800 tokens   $0.0114
Month of intense usage         140K tokens    216K tokens     $0.228
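Each row is the same calculation: the token delta times the input price. A sketch at Sonnet's $3 per million input tokens:

```python
PRICE_PER_TOKEN = 3 / 1_000_000  # Claude Sonnet input pricing: $3/M tokens

def extra_cost(english_tokens: int, spanish_tokens: int) -> float:
    """Dollar cost of the additional tokens the Spanish prompt uses."""
    return (spanish_tokens - english_tokens) * PRICE_PER_TOKEN

print(extra_cost(35, 54))            # single prompt
print(extra_cost(700, 1_080))        # typical session (20 prompts)
print(extra_cost(140_000, 216_000))  # month of intense usage
```

Swap in your own model's input price to rerun the estimate; only the constant changes.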

The Spanish "tax" on my input prompts costs roughly 23 cents per month, about a cent per working day. And remember: this only affects your messages, not the system prompt, tool results, or code content that make up the bulk of token usage.

With Claude Code's subscription model, this is even less relevant — you're paying a flat rate regardless of token count.


The Verdict

Native Language Advantages

  • Think faster, prompt faster
  • More precise instructions
  • Better architectural discussions
  • Lower cognitive overhead
  • Natural expression of edge cases

English Advantages

  • ~35% fewer prompt tokens (equivalently, Spanish costs ~54% more)
  • 2-5% better on reasoning benchmarks
  • Marginal edge on rare/niche topics
  • Easier to share prompts with English-speaking teams

Use your native language.

The token overhead is real but irrelevant at the scale of actual usage. The quality difference is measurable in benchmarks but invisible in practice. The clarity advantage of thinking in your own language is significant and compounds over thousands of interactions.

The one exception: if you're writing prompts that will be shared as templates with an English-speaking team, write those in English. But for your daily work — your debugging sessions, your architecture discussions, your "fix this bug" messages — use whatever language lets you think fastest.

That's what I do. 21,000 sessions and counting.


Analysis by Jairo Caro-Accino. Token counts measured with cl100k_base tokenizer. Session data from Claude Code usage analytics.