Posts

Browse the complete archive of prompting notes in this reading order.

How I Became OpenAI's First Prompt Engineer

Dive deep into an AI frontier, rigorously test and document prompts, and openly share useful findings to stand out and land a pioneering role like OpenAI's first prompt engineer.

The Evolution of Prompts: From Completion to Systems

Prompts have evolved from pattern-based completion to outcome-focused instructions; the practical takeaway is to give the model the simplest, clearest description of the finished product and its success criteria so it can deliver the desired outcome.

Base Models vs Post-Training: What Each Layer Does

Base models are broad, raw text learners, while post-training adds an instruction-driven layer that greatly increases usefulness but can lead to overfitting, so the takeaway is to balance raw capabilities with careful post-training and prompt design.

GPT Demo Set List: Early Prompt Patterns That Still Hold Up

Curated prompts from a GPT-3 demo set reveal practical capabilities: token-based world view, autocomplete, structured text, translation, summarization, tone and persona control, multi-voice outputs, and turning unstructured text into structured data.

GPT-3 Grammar and Style Editing in Practice

GPT-3 enables advanced grammar and style edits, tone adjustment, coherence improvements, and format transformations across text without explicit training as a dedicated grammar tool.

Rethinking best_of in GPT-3: Why It Misleads

Relying on best_of to improve LLM accuracy is misguided. The practical fix is to define clear task boundaries with better prompts and to ground interpretation with outlier examples, which can let you use smaller models and single-shot prompts while reducing cost.

Temperature in LLMs Explained: What It Actually Controls

Temperature adds a controlled amount of randomness so the model explores alternative paths; it does not boost creativity. It helps break repetitive outputs, risks nonsensical results at high values, and is often unnecessary with modern models.
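The mechanism behind that summary can be sketched directly: temperature divides the logits before the softmax, so low values sharpen the distribution toward the top token and high values flatten it toward uniform. A minimal sketch; the logits below are illustrative, not from any real model.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=None):
    """Scale logits by 1/temperature, softmax, then sample one index.

    Low temperature sharpens the distribution (near-deterministic);
    high temperature flattens it (more exploration, more nonsense risk).
    """
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max before exp for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i, probs
    return len(probs) - 1, probs
```

With logits [2.0, 1.0, 0.1], a temperature of 0.1 gives the top token nearly all of the probability mass, while a temperature of 10 leaves the three options close to uniform.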

Creating Better Quiz Distractors with LLMs

Crafting plausible quiz distractors is hard; a practical workaround is to use a smaller model with a higher temperature to generate incorrect-but-plausible options, though results can still vary.

Using Small Models for Complex Natural-Language Tasks

Thoughtful prompting and lightweight schemas let small language models reliably convert flexible natural-language input into structured data for real-world tasks like scheduling, at a fraction of the typical cost.
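A minimal sketch of the lightweight-schema idea applied to scheduling: pair the user's free-form text with an explicit JSON schema in the prompt so even a small model has an unambiguous target. The schema fields and prompt wording here are hypothetical illustrations, not taken from the post.

```python
import json

# Hypothetical schema for turning text like "lunch with Sam next Tuesday
# at noon" into structured data; the field names are illustrative.
EVENT_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "date": {"type": "string", "description": "ISO 8601 date"},
        "time": {"type": "string", "description": "24-hour HH:MM"},
    },
    "required": ["title", "date", "time"],
}

def build_extraction_prompt(user_text: str) -> str:
    """Embed the schema in the prompt so the model's output target is explicit."""
    return (
        "Extract the event from the text below as JSON matching this schema:\n"
        + json.dumps(EVENT_SCHEMA)
        + "\n\nText: " + user_text + "\nJSON:"
    )
```

The resulting prompt string is what you would send to a small model; constraining the output to a schema is what makes the small model's behavior reliable.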

Large Text Pattern Analysis with Prompted Models

Feed large batches of text into a single context window to extract overall patterns and sentiment across many posts, enabling scalable, non-sequential analysis while monitoring for hallucinations.

Compute at Scale: Growth, Limits, and AI Demand

Compute needs will rise with human ambition, potentially to about 1,000× today's levels, and will be met through strategic, highway-like infrastructure expansion and smarter use rather than by chasing unlimited physical scaling.

Embedding-Based Retrieval Strategies That Actually Work

Embeddings are learned, high-dimensional representations used for retrieval, and the practical takeaway is to standardize and synthesize documents into retrieval-optimized representations rather than embedding raw text.
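Once documents are synthesized into retrieval-optimized representations and embedded, retrieval itself reduces to nearest-neighbor search over vectors. A minimal sketch using cosine similarity; the toy 3-dimensional vectors stand in for real embedding-model outputs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, index, top_k=2):
    """index: list of (doc_id, vector) pairs; returns the top_k doc ids
    ranked by cosine similarity to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]
```

In practice the index would hold embeddings of the synthesized, retrieval-optimized document representations rather than embeddings of the raw text.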

Grounding Prompts with Wikidata and SPARQL

Ground model outputs in Wikidata by constructing SPARQL queries with correct property and entity IDs, optionally aided by a lightweight query generator or retrieval workflow, to fetch real data and reduce hallucinations.
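A minimal sketch of the Wikidata grounding step: construct a SPARQL query with explicit IDs (P19 is Wikidata's real place-of-birth property; Q937 in the test below is Albert Einstein) aimed at the public endpoint at query.wikidata.org/sparql. The helpers only build the query and request URL; actually issuing the HTTP request is left out.

```python
import urllib.parse

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

def build_birthplace_query(entity_id: str) -> str:
    """Return a SPARQL query for an entity's place of birth (property P19)."""
    return (
        "SELECT ?placeLabel WHERE {\n"
        f"  wd:{entity_id} wdt:P19 ?place .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )

def query_url(sparql: str) -> str:
    """URL-encode the query for a GET request; format=json asks for JSON results."""
    return WIKIDATA_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": sparql, "format": "json"}
    )
```

Getting the property and entity IDs right is the hard part a lightweight query generator or retrieval step would handle; the query execution itself is mechanical.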

The Prompt Context Flywheel for Continuous Improvement

Periodically mine conversations, have an LLM propose updated prompts that reflect current context, and deploy the improved prompt as a living prompt context flywheel—either in production or via shadow testing—to steadily improve responses.

Fine-Tuning Fundamentals: When to Use It and When Not To

Fine-tuning is a final option after prompting and RAG, chosen for memorization of facts or generalization of behavior, with practical steps to test on small models first and format data accordingly (facts in the assistant message; behavior in user/assistant pairs) before scaling.
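The two data formats mentioned above can be sketched as chat-style fine-tuning examples. The `{"messages": [...]}` JSONL shape is the common convention for chat fine-tuning, though the exact schema varies by provider; the example contents below are invented for illustration.

```python
import json

# Fact memorization: the fact lives in the assistant message, so the
# model learns to produce it.
fact_example = {
    "messages": [
        {"role": "user", "content": "What is our refund window?"},
        {"role": "assistant",
         "content": "Refunds are accepted within 30 days of purchase."},
    ]
}

# Behavior generalization: user/assistant pairs demonstrating the target
# transformation, repeated across many varied inputs.
behavior_example = {
    "messages": [
        {"role": "user", "content": "make this formal: hey, send the report"},
        {"role": "assistant",
         "content": "Could you please send the report at your earliest convenience?"},
    ]
}

def to_jsonl(examples):
    """Serialize examples one-per-line, the usual fine-tuning file format."""
    return "\n".join(json.dumps(e) for e in examples)
```

Testing this format on a small model first, as the post suggests, keeps iteration cheap before scaling up the dataset or the base model.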

Fine-Tuning Methods Guide: SFT, DPO, and Beyond

Fine-tuning is a toolbox of SFT, DPO, reinforcement fine-tuning, and vision fine-tuning; pick the method by your goal (memorization vs generalization, explicit behavior, reasoning with graders, or robust augmentation) rather than defaulting to one technique.

Cost Savings via Fine-Tuning Smaller Models

Fine-tune a smaller model on high-quality examples derived from a larger model to preserve performance while substantially lowering per-call costs, with potential to step down to even smaller models as you scale the dataset.

Lessons from an Ambitious AI Build

Tackling a truly ambitious AI build forces intense, hands-on learning in prompt design, tool usage, and system design tradeoffs, yielding practical, scalable know-how for real AI apps.

Why I Didn't Launch AI Channels

High costs and slow response times with GPT-3 made AI Channels impractical as a consumer product, so I prioritized learning and joined OpenAI instead of launching.

Big and Small Models in Robotics: A Hybrid Architecture

Adopt a layered, multi-model architecture in robotics that pairs large, high-level models for complex reasoning with fast, specialized models for real-time perception and control, with coordinated handoffs to balance latency, capability, and safety.

The Frontier Is Wider Than It Looks

The frontier is wider than ever, and the key takeaway is to invest in reasoning-based prompting and a middle layer of classification to guide answers, enabling safer, cheaper, and more reliable AI.

Challenging AI Paper Claims with Practical Replication

Bold claims of AI limitations are often training artifacts in a fast-moving field; treat them as testable hypotheses and verify by re-running experiments with varied data formats so the model learns relationships in its outputs, not just the prompts.