Use Markdown to structure the prompts AI writes for you

When you ask an AI to write a system prompt for your product, ask it to use Markdown formatting. Hierarchical headers, bullet points, numbered lists, and section separators give the prompt a parseable structure that helps the consuming LLM process it faster and more reliably.

This is an important distinction: I am not talking about formatting your own conversational prompts in Markdown. When you are chatting with an AI to build something, stream of consciousness works fine. But the output — the system prompt that will run inside your product, controlling how your AI agent behaves — benefits enormously from clear structure.

Why does structure matter? An LLM reading a system prompt is doing the same thing a human does when scanning a document: it looks for hierarchy, grouping, and emphasis to determine what matters and how pieces relate. A wall of unformatted text forces the model to infer structure from context. Markdown makes the structure explicit.

One effective pattern I use:

  • H1 for the prompt's identity — who the agent is and its primary purpose
  • H2 for major behavioral categories — conversation flow, content rules, safety guardrails
  • H3 for specific rules within categories — individual behaviors and constraints
  • Bullet points for enumerated items — lists of allowed actions, prohibited topics, response formats

In practice, the structural instruction requires just one short sentence — see the prompt example below. The output arrives with the structure I need, and I can iterate on individual sections without rethinking the whole document.

This hierarchy lets you reason about the prompt spatially. When a behavior goes wrong, you can locate the relevant section, read the rule, and determine whether the issue is a missing instruction, a conflicting instruction, or a placement problem. That brings us to placement.

Use Markdown to structure the prompts AI writes for you

Ask for structure and elegance

Use md format for the prompt you create. Follow the u-rule. Ensure it is elegant and clean.

Three instructions, each doing heavy lifting: Markdown gives the prompt parseable hierarchy, the U-rule puts critical rules at the top and bottom where compliance is highest, and "elegant" triggers self-assessment that removes bloat and redundancy.

The U-shape principle — placement determines compliance

LLMs pay the most attention to what appears at the beginning and end of a prompt. Content in the middle gets less weight. This creates a U-shaped attention curve that has direct implications for how you structure system prompts.

Critical rules belong at the top (primacy) and bottom (recency) of a prompt. Not buried in the middle. If you have a safety guardrail that must never be violated, placing it in section 14 of a 20-section prompt is asking for trouble. Move it to the top three sections or repeat it at the bottom.

I discovered this through debugging. A behavior rule was being followed inconsistently — sometimes the agent obeyed it, sometimes it ignored it. The rule was correct, clearly written, unambiguous. The problem was purely spatial: it sat in the middle of a long prompt where the model's attention was weakest. Moving it to the top of the prompt fixed the compliance issue immediately, with no change to the wording.

When I suspect a placement issue, I ask the AI to review the prompt structure and identify any critical rules sitting in the middle third — then move them to the top or bottom, same wording, no rewrite needed.

The practical takeaway: when you organize a system prompt, think about where each instruction sits, not just what it says. Your most important behavioral rules should occupy prime real estate — the opening sections and the closing sections. Routine instructions and contextual information can fill the middle.

This is a separate concept from instruction dilution, which I will cover next. U-shape is about placement within a prompt. Dilution is about count of rules. Both affect compliance, but through different mechanisms.

The U-shape principle — placement determines compliance

Instruction dilution — fewer rules, better compliance

Every behavioral rule you add to a system prompt reduces compliance with all the other rules. This is instruction dilution, and it is one of the most counterintuitive aspects of AI prompt engineering craft.

The instinct when something goes wrong is to add a rule: "Do not do X." The problem is that adding rule number 15 does not just control X — it slightly weakens the model's adherence to rules 1 through 14. The attention budget is finite. More rules means less attention per rule.

I target ten or fewer behavioral rules per prompt. Not ten categories — ten actual behavioral instructions that the agent must actively follow. Context, background information, and reference material do not count the same way. It is the imperative instructions —"always do this," "never do that" — that compete for the model's compliance bandwidth.

To audit rule count, I ask the AI to identify every imperative in the prompt — anything that says always, never, or must — and suggest which to combine, move to a skill file, or remove if the total exceeds ten.

If you find yourself auditing rule counts regularly, maybe you should save this as a skill — a plain text file that the AI writes for itself, describing how to handle a recurring task. It picks up that file next time it encounters the same situation. In this case, a prompt audit checklist that runs whenever you update a system prompt.

When I need to add a new rule, I look for an existing rule to remove, combine, or demote. Can two related rules merge into one? Can a rule be moved from the system prompt into a skill or tool description where it applies more narrowly? Can a broad rule be replaced with a more targeted condition-action pair?

This discipline forces clarity. If you can only have ten rules, you think carefully about which behaviors truly need to be in the prompt and which can be handled through other mechanisms. The result is a focused, high-compliance prompt rather than a comprehensive-but-ignored one.

One signal that dilution is occurring: the model starts violating rules it previously followed reliably. If compliance degrades after adding new instructions, the solution is usually subtraction, not more addition.

Instruction dilution — fewer rules, better compliance

Using "elegant" as a quality trigger — and reading what AI writes

LLMs are surprisingly mediocre at writing prompts for other LLMs. Left to their own defaults, they produce verbose, repetitive, loosely structured text that throws words at behavioral problems. The prompt works, technically, but it is cluttered and harder for the consuming model to parse than it needs to be.

One technique I have found effective is asking whether the prompt is "elegant." The word works because it encodes multiple quality dimensions simultaneously — conciseness, clarity, structure, absence of waste. When you ask an AI to evaluate whether its own output is elegant, it activates a self-assessment mode that catches bloat, redundancy, and poor organization. This applies beyond prompts: you can ask whether code is elegant, whether a UI flow is elegant, whether documentation is elegant. It consistently triggers a more critical evaluation than "is this good?"

But even with the elegance check, AI-generated prompts are sub-optimal by default. This leads to the uncomfortable truth: you need to read the prompts your AI writes for your systems. Not all the time. If everything works, there is no need to audit every prompt. But when quality problems appear — inconsistent agent behavior, weird responses, rule violations — the first thing to check is the system prompt itself.

Read it as a human. Is it clear? Is it well-organized? Could you follow these instructions if you were the agent? If a section confuses you, it will confuse the model too. I have found issues this way that no amount of meta-prompting would have caught: contradictory rules in different sections, ambiguous phrasing that could be interpreted two ways, important context buried where it has no effect.

When reviewing a prompt for elegance, I ask it to flag any bloat, redundancy, or poor organization and revise to be more concise without losing behavioral rules. When a manual review reveals a specific problem — two sections that contradict each other — I name both sections and ask which instruction to remove. The principle is pragmatic: trust but verify. Let AI write prompts, ask it to make them elegant, but read the output yourself when things go wrong.

Using "elegant" as a quality trigger — and reading what AI writes

Targeted prompt surgery instead of wholesale rewrites

When a system prompt produces unwanted behavior, the temptation is to rewrite the whole thing or add a pile of new rules. Both approaches usually make things worse. Rewrites introduce new problems. More rules trigger instruction dilution.

The better approach is targeted prompt surgery: identify the specific section causing the issue, and make the minimum edit that fixes the behavior.

This requires diagnosis first. Which behavior is wrong? Where in the prompt is that behavior governed? Is the rule missing, ambiguous, contradicted by another rule, or just in a low-attention position? Each diagnosis leads to a different surgical fix:

  • Missing rule: Add one focused instruction in a high-attention position
  • Ambiguous phrasing: Rewrite the specific sentence to use condition-action format ("when X happens, do Y")
  • Contradicting rules: Remove one of the conflicting instructions — do not try to add a meta-rule to resolve the conflict, because that just adds dilution
  • Poor placement: Move the rule to the top or bottom of the prompt (the U-shape fix)

Condition-action format deserves special emphasis. Instead of broad imperative statements ("NEVER discuss politics"), try specific triggers with specific responses ("When a user raises a political topic, acknowledge their interest and redirect to the lesson content"). The model can follow a concrete procedure more reliably than it can interpret a broad prohibition.

In practice, the prompt example at the end of this section shows this approach — it asks whether something small can be tweaked without cluttering the prompt, rather than issuing a rewrite request. One behavior, one section, one change.

I also avoid emphatic formatting —"NEVER," "MUST," "CRITICAL" in all caps. These feel like they should increase compliance, but they often do not. Calm, specific instructions outperform emphatic vague ones. A clear condition-action pair beats a capitalized command almost every time.

The craft of prompt engineering is ultimately editorial. You are refining a document through small, deliberate changes and observing the behavioral results. Each edit teaches you something about how the model reads and interprets instructions. Over time, you develop an intuition for what will work — but the principle stays the same: change as little as possible, observe the effect, and iterate.

Targeted prompt surgery instead of wholesale rewrites

The surgical fix

Have a look at the actual rendered prompt sent to the AI. Is it crystal clear and elegant? Are we asking it to handle too many things? Do we need to consider some architectural change, or can we tweak something to fix this, without cluttering the prompt?

Starting with the rendered prompt — not the source — is the key move. It shows exactly what the AI sees, which is often different from what you think you wrote. The question of architectural change vs. a small tweak keeps the diagnosis honest: sometimes the prompt is fine and the problem is a structural issue that more rules cannot fix.

Frequently asked questions

What is the difference between conversational prompting and system prompt engineering?
Conversational prompting is how you talk to an AI during a work session — it can be informal and stream-of-consciousness. System prompt engineering is writing the instructions that live inside your product and control how an AI agent behaves for your users. System prompts need careful structure, placement, and rule management because they run thousands of times with no human oversight.
How many behavioral rules should a system prompt contain?
I target ten or fewer imperative behavioral rules per prompt. Every additional rule reduces compliance with all existing rules through instruction dilution. If you need to add a new rule, look for an existing rule to remove, combine, or move to a more targeted location like a tool description or skill file.
Why does the position of a rule in a prompt affect compliance?
LLMs exhibit a U-shaped attention pattern: they pay the most attention to the beginning and end of a prompt, with weaker attention in the middle. Critical rules placed in the middle sections of a long prompt get less reliable compliance. Moving the same rule — with identical wording — to the top or bottom of the prompt can fix compliance issues immediately.
Should I use emphatic formatting like capital letters and exclamation marks in prompts?
Generally, no. Calm, specific instructions in condition-action format (when X happens, do Y) outperform emphatic vague commands (NEVER do X). Specificity beats emphasis. The model responds better to a clear procedure it can follow than to a strongly worded prohibition it must interpret.
When should I read the system prompts AI generates for my product?
When things work well, there is no need to audit every prompt. But when you observe quality problems — inconsistent behavior, rule violations, confused responses — read the system prompt as a human. If a section confuses you, it will confuse the model too. Look for contradictions, ambiguity, poor placement, and bloat.