Outcome-Oriented Prompting: Define Success, Then Generate
Shift prompting from instructing the start to defining verifiable outcomes and success tests, then use reasoning-enabled models to draft, evaluate, and iterate until the result meets objective criteria.
I’ve said this before, but it’s worth repeating: one of the best ways to think about prompting is in terms of outcomes.
In the early days with models like GPT‑2 or GPT‑3, prompting was mostly about creating a pattern for the model to continue. If I wanted it to generate an article, I’d write the beginning myself—a title, maybe a first paragraph—and let the model pick up from there. You could also describe what you wanted, and the model would often do a decent job. But a lot of the time, the most reliable approach was literally giving it the first part of the text and letting it complete the rest. That’s where the “autocomplete system” reputation came from. It’s overly simplistic, but it’s not hard to see why it stuck.
As models got generally smarter—and were trained on more examples of what completed tasks should look like, and explicitly trained that when a user asks them to do a thing, they should just go do it—prompting changed a lot. But that still doesn’t mean the model automatically knows what you mean.
I can say, “Write an amazing article about Mars colonization,” and the model will write something. But my idea of “amazing” and its idea of “amazing” might be very different. Part of the problem with writing with LLMs is that you often get good-quality writing, but it’s the average: writing by a committee of people who all have English degrees, but no particular style that stands out. Whatever distinctive edges exist tend to get smoothed out and averaged together. And when you’re working on other tasks—like app design or game design—adjectives only get you so far.
The introduction of reasoning models changes the game. These models don’t just understand the task; they get another pass at it. They can take instructions, break them into steps, and then evaluate and revise until they meet the goal. That enables a different (and better) way to prompt: focus less on the beginning and more on the outcome.
Vague prompt:
Write an amazing article about Mars colonization.
Outcome-oriented prompt:
Task: Write an article about Mars colonization.
Success criteria:
- 900 to 1,100 words
- Exactly 6 sections with H2 headings
- Includes: logistics, life support, governance, economics, ethics, timeline
- Includes one argument for and one argument against colonization
- Avoids hype language and avoids first-person voice
- Ends with a concrete 5-step action plan
Return format:
- Markdown
- One-sentence thesis at top
The second version gives the model something testable instead of subjective.
It used to be that you’d focus on the start: “Here’s how I want you to begin; here’s the direction.” Now you can focus on the end: “Here’s what the final thing should look like.” That sounds like a small shift, but it’s actually a big one, because an outcome can be a test.
An outcome can say:
- It should include X.
- It should be N words.
- It should use M paragraphs.
- It should follow this style (and avoid that style).
- It should include this kind of reflection.
- It should solve this specific problem.
- If tools are available, it should pass these tests.
You could have given those instructions to earlier models, and they would try. The difference now is that reasoning models are much more likely to hit those constraints, because they’re better at turning your requirements into a checklist, generating a draft, and then re-checking the draft against the checklist until it passes.
So my prompting style has changed considerably. I don’t just describe what I want. I describe the test for the final product. I’m effectively saying: this is what success looks like.
That’s the single best area where most people can improve their prompting: get more descriptive about the outcome, and do it objectively. Not “make it awesome,” but “make it verifiably this.” “Objective” doesn’t mean you can’t talk about feel or style—you can still say “it should feel like this, not like that”—but you have to pin it down in a way that can be judged. How long is it? What does it cover? What does it avoid? What problem does it solve? If you have access to code execution or other tools, what tests does it need to pass?
The goal is to tell the model how to evaluate whether it succeeded.
This also lines up with how these models are trained. Reasoning models are built with reinforcement fine-tuning: during training, the model’s attempts are evaluated over and over to see whether they reached a good outcome. Often the feedback is a score (say, 0 to 5), but the important part is that it’s a hard metric—clear, checkable feedback about what “success” means.
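To make that training analogy concrete, here’s a minimal sketch of a hard-metric grader: one point per satisfied criterion, yielding a 0-to-5 score. The specific criteria and hype words are illustrative stand-ins, not a real training rubric.

```python
def score_draft(draft: str) -> int:
    """Grade a draft on a 0-5 scale: one point per satisfied criterion.
    The criteria below are illustrative stand-ins for real success tests."""
    criteria = [
        900 <= len(draft.split()) <= 1100,           # length in range
        draft.count("## ") == 6,                     # six H2 sections (crude count)
        "argument for" in draft.lower(),             # pro argument present
        "argument against" in draft.lower(),         # con argument present
        not any(w in draft.lower()                   # no hype language
                for w in ("amazing", "revolutionary")),
    ]
    return sum(criteria)
```

A grader like this is deliberately dumb: it doesn’t judge quality, only whether the measurable constraints were hit. That’s exactly what makes it a hard metric.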
Practical Example: Add an Explicit Self-Check
Before finalizing, run a checklist:
1) Word count in range? (900-1,100)
2) All 6 required sections present?
3) At least one pro and one con included?
4) Final section includes exactly 5 action steps?
If any item fails, revise and check again.
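The checklist above can also be expressed as an executable test. Here’s a minimal sketch, assuming the draft is a Markdown string with H2 headings; the pro/con check is a crude keyword match (a real pipeline might hand that item to a grader model):

```python
import re

def check_draft(draft: str) -> list[str]:
    """Run the self-check checklist against a Markdown draft.
    Returns a list of failed items; an empty list means the draft passes."""
    failures = []

    # 1) Word count in range?
    words = len(draft.split())
    if not 900 <= words <= 1100:
        failures.append(f"word count {words} outside 900-1100")

    # 2) All 6 required sections present (H2 headings)?
    headings = re.findall(r"^## .+$", draft, flags=re.MULTILINE)
    if len(headings) != 6:
        failures.append(f"expected 6 H2 sections, found {len(headings)}")

    # 3) At least one pro and one con included? (crude keyword check)
    lower = draft.lower()
    if "argument for" not in lower or "argument against" not in lower:
        failures.append("missing explicit pro/con arguments")

    # 4) Final section includes exactly 5 action steps?
    last_section = draft.split("## ")[-1]
    steps = re.findall(r"^\d+[.)]\s", last_section, flags=re.MULTILINE)
    if len(steps) != 5:
        failures.append(f"expected 5 action steps, found {len(steps)}")

    return failures
```

Whether the check runs inside the model’s own reasoning or in an external loop, the point is the same: “revise and check again” only works if the check is something that can actually pass or fail.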
You can mimic that in your prompts. Provide a bad example and a good example. Don’t just label them; explain why one is better than the other. That turns your prompt into more than an answer key—it becomes an explanation of the answer key, which helps the model infer the test it’s supposed to satisfy.
So: think in terms of outcomes, especially objective outcomes.
Ask yourself: if I had to hand a complete stranger only the result, plus a few lines on a piece of paper explaining how to judge it, what would I write so they could decide whether it succeeded or failed?
Previously, the question was: “If I had to give a complete stranger instructions to do a thing, what would I say?”
Now the question is: “If I had to give a complete stranger instructions to evaluate a thing, what would I say?”