Posts tagged with production-llm

The Prompt Engineer · Jul 9 · 5 min read

Make It Answer Before It Answers

Turn one, the customer-support agent nails it — polite, on-policy, cites the right documentation.

arq · instruction-following · structured-reasoning

The Prompt Engineer · Jul 8 · 5 min read

30,000 Tokens Before Hello

Claude Fable 5 burns 30,000 tokens of system instructions before you type a single character.

system-prompt · prompt-architecture · production-llm

The Prompt Engineer · Jul 7 · 4 min read

Field Names Are Instructions

Somebody ran GPT-4o-mini on GSM8K — grade-school math, the kind LLMs are supposed to be good at — and got 31.8% accuracy.

structured-output · constrained-decoding · json-schema

The Prompt Engineer · Jul 5 · 5 min read

Your Model Thinks Until You Stop It

Every reasoning model ships with the same default: think as hard as you can, every time.

thinking-tokens · reasoning-budget · cost-optimization

The Prompt Engineer · Jul 4 · 4 min read

More Context Made It Dumber

Last month I watched a team migrate their RAG pipeline from 32K context to a shiny new 1M-token model.

context-rot · context-window · reasoning-degradation

The Prompt Engineer · Jun 2 · 5 min read

Your Agent Is Paying Full Price Every Turn

Most prompt engineering advice focuses on what to say to the model.

prompt-caching · agentic-systems · cost-optimization

The Prompt Engineer · Jun 1 · 4 min read

Effort Ate My Prompt

Three days ago, Anthropic shipped Claude Opus 4.8.

effort-levels · claude-opus-4-8 · prompt-engineering

The Prompt Engineer · May 29 · 4 min read

Three Edits Beat a Full Rewrite

Most prompt engineers don't know when to stop editing. You tweak the system message, run ten test cases, change three words, run again.

prompt-optimization · automated-prompting · dspy

The Prompt Engineer · May 26 · 4 min read

The Prompt Got Demoted

Last week I spent three hours debugging a RAG agent that kept hallucinating company policy details.

context-engineering · prompt-engineering · anthropic

The Prompt Engineer · May 23 · 4 min read

Drop Your Examples

Every prompt engineering guide from 2023 to mid-2025 hammered the same advice: give the model 3-5 worked examples, then ask your question.

few-shot · chain-of-thought · reasoning-models

The Prompt Engineer · May 22 · 5 min read

Your Safety Layer Is Your Biggest Usability Bug

You shipped the guardrails. You added the system prompt hardening, the input classifiers, the output filters.

over-refusal · llm-security · production-llm

The Prompt Engineer · May 21 · 4 min read

Same Endpoint, Different Brain

On May 5, OpenAI swapped GPT-5.3 Instant for GPT-5.

prompt-drift · model-versioning · production-llm

The Prompt Engineer · May 18 · 5 min read

Chain of Thought Taught Your Model to Lie Better

Last month I added chain-of-thought prompting to a medical Q&A pipeline. Hallucination rate dropped.

chain-of-thought · hallucination-detection · production-llm

The Prompt Engineer · May 16 · 4 min read

Your JSON Schema Is Making Your Model Dumber

Everyone loves structured outputs. You slap a JSON schema on your API call, get perfectly typed responses, skip the regex parsing nightmares.

structured-output · constrained-decoding · reasoning

The Prompt Engineer · May 15 · 4 min read

The Assert That Passed Once

You wrote a prompt. You wrote a test.

prompt-testing · ci-cd · production-llm

The Prompt Engineer · May 13 · 5 min read

Stop Trusting Your Model's First Answer

Your LLM got the math problem right 74% of the time. But if you'd asked it five times and taken the majority vote, that number would have jumped to 92%.

self-consistency · inference-optimization · chain-of-thought

The Prompt Engineer · May 11 · 5 min read

Let a Smaller Model Edit Your Prompt First

Last month a SaaS company posted their API bill: 42,000 per month on LLM calls, down to 2,100 after one infrastructure change. No model swap.

prompt-compression · llmlingua · cost-optimization

The Prompt Engineer · May 4 · 4 min read

Most of Your Prompts Don't Need Your Best Model

Last month I audited a startup's LLM spend. They were sending 100% of traffic to Claude Opus.

prompt-routing · model-selection · cost-optimization

The Prompt Engineer · Apr 29 · 4 min read

The 69% You Never Optimize

Datadog just published their State of AI Engineering report for 2026, and one number stopped me cold: 69% of all input tokens in production LLM calls are...

system-prompt · token-optimization · production-llm

The Prompt Engineer · Apr 28 · 5 min read

The Prompt Hidden in Your JSON Schema

Most teams I talk to treat their JSON schema like plumbing — define the shape, get valid output, move on.

structured-output · json-schema · prompt-engineering
