gusto.md Format Specification

Version: 0.1.2
Status: Draft, under active development
License: Apache-2.0

A format specification for describing a brand's verbal identity to AI agents and content tools. A GUSTO.md file gives agents a persistent, structured understanding of how a brand sounds — its vocabulary, sentence rhythm, tonal modes, cultural references, and refusals — so that every piece of generated copy stays on voice across every surface a brand touches.

This document is the normative reference. The specification is opinionated and authored by the gusto.md project. Vendors, tools, and brands are free to adopt, implement, and extend it under the Apache-2.0 license.


Background and Position

The visual layer of brand identity has a working machine-readable standard. The W3C Design Tokens Community Group has shipped a stable specification (DTCG, 2025.10). Google's DESIGN.md format builds on that work to describe a full visual system — colors, typography, spacing, components — in a file that AI agents can read and apply.

The verbal layer has no equivalent. Brand voice today lives in PDFs, Notion pages, and the trained instincts of human writers. AI tools either ignore voice (and default to a generic "tasteful tech" tone) or solve it inside vendor silos (Jasper, Copy.ai, Contentstack — each with proprietary JSON formats that don't move between tools).

GUSTO.md fills that gap. The specification is opinionated, the file format is portable, and the lint rules are deterministic. Vendors are not required to participate in a standards body to support it. Brands are not required to commit to a vendor to use it. The format is open, the license is permissive, and the design decisions documented here are the considered output of a single project — not a committee.

This is the same posture that produced Markdown, AGENTS.md, and the early DTCG drafts. Standardization through adoption, not consensus.

GUSTO.md is designed as the verbal companion to a DESIGN.md file. The two formats sit alongside each other in a project, share the same conceptual model (machine-readable tokens + human-readable rationale), and are intended to be consumed together by the same generation of AI agents.


File Structure

A GUSTO.md file has two layers, in a fixed order.

  1. YAML front matter — Machine-readable voice tokens, delimited by --- fences at the top of the file.
  2. Markdown body — Human-readable voice rationale organized into ## sections.

The tokens are the normative values. The prose provides context for how to apply them. An agent that consumes a GUSTO.md file should treat the tokens as ground truth for lintable decisions (cadence, banned phrases, refusals) and the prose as guidance for judgement calls (atmosphere, cultural reference, register).

Tokens are not a substitute for prose. The prose is the primary guidance for an agent producing copy; tokens are the enforcement surface for lint and validation tools. An agent reading both should weight the prose for tone and judgment, and apply the tokens for checking and post-editing.
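As a minimal sketch of the two-layer split, the helper below separates the front matter from the markdown body. The name `split_gusto` is hypothetical and not part of the reference CLI. It assumes the opening `---` fence is the first line of the file and returns the raw YAML text unparsed, so a real consumer would still hand that text to a YAML parser.

```python
def split_gusto(text: str) -> tuple[str, str]:
    """Split a GUSTO.md document into (front_matter, body).

    Assumption: the file opens with a '---' line and the next '---'
    line closes the YAML front matter.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("front matter must open with '---' on the first line")
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            front_matter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).strip()
            return front_matter, body
    raise ValueError("front matter is missing its closing '---' fence")


sample = """---
name: "Heritage"
voice:
  formality: medium
---

## Voice Atmosphere

Heritage speaks the way a senior editor speaks.
"""

front_matter, body = split_gusto(sample)
print(front_matter.splitlines()[0])  # name: "Heritage"
print(body.splitlines()[0])          # ## Voice Atmosphere
```

Keeping the two layers separate from the start makes it easier to route tokens to lint and validation code and prose to the generation prompt.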

Minimal Example

---
version: "0.1.2"
name: "Heritage"
voice:
  formality: medium
  density: high
  warmth: medium
  irony: low
  imperative_ratio: 0.4
rhythm:
  avg_sentence_length: 14
  max_sentence_length: 22
vocabulary:
  preferred:
    - crafted
    - considered
    - direct
  banned:
    - "take it to the next level"
    - "game-changer"
    - "supercharge"
  avoid:
    - "just"
    - "very"
    - "we hope"
refusals:
  - no_apology_as_style
  - no_exclamation_for_emphasis
---

## Voice Atmosphere

Heritage speaks the way a senior editor speaks — calm, declarative, never
breathless. The voice trusts the reader to follow without being led.

## Vocabulary Palette
...

An agent reading this file knows several things immediately: that sentences should average about 14 words with a ceiling of 22, that "game-changer" is hard-banned (an error on use), that "very" and "just" are softer hedges to avoid where possible (a warning on use), and that apologies should not be used as a stylistic device. The prose tells the agent why: Heritage is editorial, not promotional.


Token Schema

The YAML front matter defines token groups. Every group is optional; the only required field is the top-level name. Groups that are present must follow the schema below.

Top-Level Fields

version: <string>             # optional, current: "0.1.2"
name: <string>                # required
description: <string>         # optional, one sentence
extends: <string>             # optional, see Extends below

Voice Tokens

Voice tokens describe the overall stance of the voice on a small number of orthogonal axes. Values are categorical (low | medium | high) for human-judgment axes, and numeric (0.0–1.0) for ratio axes.

voice:
  formality: <low | medium | high>
  density: <low | medium | high>
  warmth: <low | medium | high>
  irony: <low | medium | high>
  imperative_ratio: <number 0.0–1.0>

| Axis | Meaning | Example: low | Example: high |
| --- | --- | --- | --- |
| formality | Register distance from spoken conversation | Liquid Death | A bank's terms of service |
| density | Information per sentence | A poem | Apple spec page |
| warmth | Affective closeness to the reader | A coroner's report | A children's book |
| irony | Distance between literal and intended meaning | A safety placard | Liquid Death |
| imperative_ratio | Share of sentences that command the reader | Editorial prose (~0.1) | Liquid Death (~0.7) |

Voice tokens are deliberately few. Adding more axes invites false precision — voice is not a vector space, and an eight-axis system suggests a calibration we cannot deliver. Five axes capture the meaningful distinctions; further specificity belongs in prose.

Rhythm Tokens

Rhythm tokens describe sentence-level cadence in measurable terms. These are the primary linting surface — most cadence violations can be checked deterministically.

rhythm:
  avg_sentence_length: <number>      # words
  max_sentence_length: <number>      # words
  paragraph_style: <single_sentence_allowed | dense_only>
  exclamation_policy: <forbidden | tagline_only | sparing | free>
  semicolon_policy: <forbidden | sparing | free>

Rhythm tokens are advisory targets, not hard rules. The linter reports violations as warnings; tools may choose to enforce or soften.

Vocabulary Tokens

Vocabulary tokens are the most directly lintable group. They define what to reach for, what to avoid softly, and what to ban hard.

vocabulary:
  preferred:
    - <word or phrase>
  banned:                            # hard ban — error on use, no register exceptions
    - <word or phrase>
  avoid:                             # soft avoid — warning on use, register may override
    - <word or phrase>
  signature_phrases:
    - <phrase that is uniquely the brand's>
  reclaimed_terms:                   # optional
    - term: <word>
      note: <why this term is used unusually>

The banned and avoid lists are deliberately separate. banned is for marketing clichés, hype words, and retired phrases — vocabulary that should not appear in any consumer-facing copy regardless of register. avoid is for filler words, hedges, and apologetic vocabulary — vocabulary that should generally not appear but can occasionally serve cadence or warmth in specific registers (legal, support, error). The linter treats banned violations as errors and avoid violations as warnings.

The signature_phrases list is intentionally distinct from preferred. Signature phrases are brand-owned (e.g., Murder your thirst. for Liquid Death, Designed by Apple in California. for Apple). A consumer tool must not transfer signature phrases across brands.
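The banned and avoid severities can be sketched as a simple scan over a piece of copy. This is an illustration under stated assumptions, not the reference implementation: `find_vocabulary_issues` is a hypothetical name, and the matching rules (case-insensitive, whole-word) are a choice made here, since the specification only says the copy "contains" the string.

```python
import re


def find_vocabulary_issues(copy: str, vocabulary: dict) -> list[tuple[str, str]]:
    """Return (severity, phrase) pairs for banned and avoid matches."""
    findings = []
    for list_name, severity in (("banned", "error"), ("avoid", "warning")):
        for phrase in vocabulary.get(list_name, []):
            # Assumption: match whole words only, ignoring case,
            # so "just" does not fire on "adjust".
            pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
            if re.search(pattern, copy, flags=re.IGNORECASE):
                findings.append((severity, phrase))
    return findings


vocabulary = {
    "banned": ["game-changer", "supercharge"],
    "avoid": ["just", "very"],
}
copy = "This game-changer will adjust to you. It is very quiet."
print(find_vocabulary_issues(copy, vocabulary))
# [('error', 'game-changer'), ('warning', 'very')]
```

A register-aware tool could downgrade or suppress the warning entries for registers such as legal, support, or error, while leaving the error entries untouched.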

Reclaimed Terms

Some voices use ordinary words in deliberately unusual ways. The reclaimed_terms list flags these for consumer tools so that linters don't flag them as off-voice and prompt-builders don't strip their irony.

vocabulary:
  reclaimed_terms:
    - term: healthy
      note: "Used straight-faced as wellness vocabulary, played for ironic contrast against the brand's violence imagery."
    - term: hydration
      note: "Reclaimed wellness term, used in deadpan voice to land the joke."

Reclaimed terms are not banned, not preferred — they are a third category. A consumer tool should preserve them in output and treat the note as guidance for register.

Register Tokens

Register tokens describe how the voice adjusts across surfaces. Each register is a named context with override rules.

register:
  <register_name>:
    formality: <override>           # optional, overrides voice.formality
    density: <override>             # optional
    warmth: <override>              # optional
    irony: <override>               # optional
    max_sentence_length: <number>   # optional, overrides rhythm
    notes: <prose>                  # optional, free text

Conventional register names — used by linters and tools — are:

marketing | support | error | developer | newsroom | legal | sustainability | social | packaging

Custom register names are permitted; consumers should preserve unknown registers without error.
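One way to read the override rules is as a shallow overlay of a register onto the top-level tokens. The sketch below assumes the front matter has already been parsed into a dictionary; `effective_tokens` is a hypothetical helper, and falling back to the top-level values for an undeclared register is an assumption rather than something the specification states.

```python
VOICE_OVERRIDES = ("formality", "density", "warmth", "irony")


def effective_tokens(tokens: dict, register_name: str) -> dict:
    """Return the voice and rhythm values in force for one register."""
    voice = dict(tokens.get("voice", {}))
    rhythm = dict(tokens.get("rhythm", {}))
    # Assumption: an undeclared register simply applies no overrides.
    overrides = tokens.get("register", {}).get(register_name, {})
    for axis in VOICE_OVERRIDES:
        if axis in overrides:
            voice[axis] = overrides[axis]
    if "max_sentence_length" in overrides:
        rhythm["max_sentence_length"] = overrides["max_sentence_length"]
    return {"voice": voice, "rhythm": rhythm}


tokens = {
    "voice": {"formality": "medium", "warmth": "medium"},
    "rhythm": {"max_sentence_length": 22},
    "register": {
        "error": {"warmth": "high", "max_sentence_length": 12, "notes": "Plain and patient."},
    },
}
print(effective_tokens(tokens, "error"))
# {'voice': {'formality': 'medium', 'warmth': 'high'}, 'rhythm': {'max_sentence_length': 12}}
```

The notes field is left out of the result on purpose: it is free text for the generation prompt, not a value to overlay.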

Refusal Tokens

Refusals are the non-negotiable rules of the voice. They are listed as either named constants (linter-aware) or free-form strings.

refusals:
  - <refusal_name_or_freeform_string>

The specification defines a starter set of refusal names. These names are recognized by linters and produce specific findings when violated. Unknown refusal names are preserved without error.

Named Refusals — Style and Voice

| Refusal name | Meaning |
| --- | --- |
| no_apology_as_style | Apologies must be substantive, not stylistic. |
| no_exclamation_for_emphasis | Exclamation points used only where genuinely earned. |
| no_stacked_adjectives | Three or more flat adjectives in a row are rejected. |
| no_all_caps_for_emphasis | ALL CAPS is not a substitute for word choice. |
| no_mid_sentence_capitalization | No Capitalization Mid-Sentence For Emphasis. |
| no_marketing_cliches | Banned-phrase enforcement is strict in marketing contexts. |
| no_specs_in_marketing_headlines | Specs follow narrative, not the other way around. |
| no_introducing_as_opener | "Introducing..." is a retired opener. |
| no_version_2_framing | "X 2.0" framing rejected. |
| no_first_without_qualification | Claims of "first" require specific qualification. |

Named Refusals — Ethics and Reader Treatment

| Refusal name | Meaning |
| --- | --- |
| no_punching_down | Voice does not target identity, vulnerable groups, or individuals. |
| no_real_violence_references | Where violence vocabulary is used, only cartoon horror, never real-world events. |
| no_competitor_disparagement_by_name | Comparisons are oblique. |
| no_user_in_consumer_copy | "User" is reserved for developer-facing surfaces. |
| no_manufactured_urgency | No "limited time," no countdown copy, no fear-driven pressure. |
| no_ai_as_a_feature | Don't sell "AI" as the feature. Name what the feature does. |

Brands may add their own refusal strings beyond this set. Consumers should preserve unknown refusals as free-form strings and treat them as advisory guidance.

Cultural Reference Tokens

Cultural references shape feel rather than syntax. They are advisory, not lintable.

references:
  drawn_from:
    - <cultural_touchstone>
  avoided:
    - <cultural_touchstone>

Token Types

| Type | Format | Example |
| --- | --- | --- |
| Categorical | One of an enumerated set | low, medium, high |
| Number | Float or integer | 12, 0.7 |
| String | Quoted string | "Murder your thirst." |
| List | YAML sequence | [crafted, considered, direct] |
| Token Reference | {path.to.token} | {vocabulary.preferred} |

Token References

A token may reference another token by path. This allows a register override to reuse a top-level token without restating it.

voice:
  formality: low

register:
  error:
    formality: "{voice.formality}"   # explicitly reuses top-level value
  marketing:
    formality: high                  # overrides

Token references resolve at consumer time. A reference that does not resolve to a defined token produces a broken-ref linter error.
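Resolution by path can be sketched as a walk down the parsed token tree. The names `resolve` and `BrokenRef` are hypothetical, and the sketch makes two assumptions the specification does not settle: a reference must occupy the whole string value, and references that point at other references are not followed.

```python
import re

# Assumption: a reference is the entire value, e.g. "{voice.formality}".
REFERENCE = re.compile(r"^\{([A-Za-z0-9_.]+)\}$")


class BrokenRef(Exception):
    """Raised when a reference does not resolve (the broken-ref case)."""


def resolve(value, tokens: dict):
    """Resolve a '{path.to.token}' string; return other values unchanged."""
    match = REFERENCE.match(value) if isinstance(value, str) else None
    if match is None:
        return value
    node = tokens
    for key in match.group(1).split("."):
        if not isinstance(node, dict) or key not in node:
            raise BrokenRef(f"broken-ref: {value}")
        node = node[key]
    return node


tokens = {
    "voice": {"formality": "low"},
    "register": {"error": {"formality": "{voice.formality}"}},
}
print(resolve(tokens["register"]["error"]["formality"], tokens))  # low
print(resolve("high", tokens))                                    # high
```

Raising an exception keeps the sketch short; a linter would more likely collect a broken-ref finding and continue.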


Section Order and Aliases

Sections in the markdown body use ## headings. Sections may be omitted, but those present should appear in the canonical order below. Sections are referenced by name, not number: the canonical order is checked by the linter, not by author-supplied numbering. Authors should not prefix section headings with numbers.

| Order | Section | Aliases |
| --- | --- | --- |
| 1 | Voice Atmosphere | Overview, Brand Voice |
| 2 | Vocabulary Palette | Vocabulary |
| 3 | Sentence Rhythm | Rhythm, Cadence |
| 4 | Cultural References | Reference Universe |
| 5 | Tonal Modes | Register, Modes |
| 6 | Refusals | Hard Rules |
| 7 | Anti-patterns | Anti-patterns and Banned Phrases |
| 8 | Voice in Context | Surfaces, Applied Voice |
| 9 | Agent Prompt Guide | Prompt Guide, Implementation |

The canonical names are normative; aliases are accepted by consumers. Out-of-order sections produce a section-order linter warning, not an error — older files may have legitimate variations.
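The order and alias rules can be sketched as a lookup from heading text to canonical position. `check_sections` is a hypothetical name, the input is assumed to be the list of ## heading texts already extracted from the body, and case-insensitive matching is an assumption made here.

```python
import re

CANONICAL = [
    ("Voice Atmosphere", ["Overview", "Brand Voice"]),
    ("Vocabulary Palette", ["Vocabulary"]),
    ("Sentence Rhythm", ["Rhythm", "Cadence"]),
    ("Cultural References", ["Reference Universe"]),
    ("Tonal Modes", ["Register", "Modes"]),
    ("Refusals", ["Hard Rules"]),
    ("Anti-patterns", ["Anti-patterns and Banned Phrases"]),
    ("Voice in Context", ["Surfaces", "Applied Voice"]),
    ("Agent Prompt Guide", ["Prompt Guide", "Implementation"]),
]

# Map every accepted heading (lowercased) to its canonical position.
POSITION = {}
for index, (name, aliases) in enumerate(CANONICAL):
    for heading in [name, *aliases]:
        POSITION[heading.lower()] = index


def check_sections(headings: list[str]) -> list[str]:
    """Return section-numbered and section-order findings for the headings."""
    findings = []
    last = -1
    for heading in headings:
        # Strip author-supplied numbering such as "1. " or "2) ".
        stripped = re.sub(r"^\d+[.)]?\s+", "", heading)
        if stripped != heading:
            findings.append(f"section-numbered: {heading}")
        position = POSITION.get(stripped.lower())
        if position is None:
            continue  # unknown headings are preserved without a finding
        if position < last:
            findings.append(f"section-order: {heading}")
        last = max(last, position)
    return findings


print(check_sections(["Overview", "Refusals", "1. Cadence"]))
# ['section-numbered: 1. Cadence', 'section-order: 1. Cadence']
```

Duplicate headings are deliberately out of scope here; they belong to the separate duplicate-section rule.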

extends for Inherited Voice

A GUSTO.md file may reference another GUSTO.md file as a base. The top-level extends field resolves to a URL or path; the consumer must merge the base file's tokens with the local file, with local values taking precedence.

extends: "./parent-brand.gusto.md"
name: "Sub-brand"
voice:
  warmth: high                       # overrides parent

This supports multi-brand systems where a parent voice has variants. Consumers should resolve extends chains to a maximum depth of five, and produce a circular-extends error on cycles.
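The merge and the chain limits can be sketched as follows. Several choices here are assumptions, not specified behavior: nested groups are merged key by key, lists are replaced rather than concatenated, "depth of five" is read as at most five extends hops, and `load` is a hypothetical callable that maps a path to an already-parsed token dictionary.

```python
MAX_DEPTH = 5


def merge_tokens(base: dict, local: dict) -> dict:
    """Deep-merge two token trees; local values take precedence."""
    merged = dict(base)
    for key, value in local.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_tokens(merged[key], value)
        else:
            merged[key] = value  # scalars and lists: local replaces base
    return merged


def resolve_extends(path: str, load, seen: tuple = ()) -> dict:
    """Resolve an extends chain starting at `path`."""
    if path in seen:
        raise ValueError(f"circular-extends: {path}")
    if len(seen) > MAX_DEPTH:
        raise ValueError("extends chain exceeds the maximum depth of five")
    tokens = load(path)
    parent = tokens.get("extends")
    if parent is None:
        return tokens
    base = resolve_extends(parent, load, seen + (path,))
    return merge_tokens(base, tokens)


files = {
    "parent.gusto.md": {"name": "Parent", "voice": {"warmth": "low", "irony": "low"}},
    "child.gusto.md": {
        "extends": "parent.gusto.md",
        "name": "Sub-brand",
        "voice": {"warmth": "high"},
    },
}
resolved = resolve_extends("child.gusto.md", files.__getitem__)
print(resolved["name"], resolved["voice"])
# Sub-brand {'warmth': 'high', 'irony': 'low'}
```

Resolving relative paths and URLs against the extending file's location is left out; the dictionary lookup stands in for that step.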


Consumer Behavior

Consumers of GUSTO.md files (linters, generators, AI agents, design tools) should behave predictably when they encounter content outside the spec.

| Scenario | Behavior |
| --- | --- |
| Unknown section heading | Preserve; do not error |
| Numbered section heading (e.g. ## 1. Voice Atmosphere) | Strip numbering; resolve by name; produce section-numbered warning |
| Unknown token group | Preserve; produce info finding |
| Unknown axis under voice | Accept; produce info finding |
| Unknown refusal name | Preserve; treat as free-form string |
| Unknown register name | Preserve; apply overrides as given |
| Categorical value outside enum | invalid-value error |
| Numeric value outside expected range | out-of-range warning |
| Duplicate section heading | Error; reject the file |
| Token reference does not resolve | broken-ref error |

This permissive posture is deliberate. The spec will grow; consumers built against 0.1 should not break against 0.2 files. Strict enforcement applies only to clearly malformed input.
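The split between strict and permissive handling can be illustrated on the voice group alone. `check_voice` is a hypothetical helper, and the finding name `unknown-axis` is invented for this sketch, since the specification asks for an info finding there without naming it.

```python
LEVELS = ("low", "medium", "high")
CATEGORICAL_AXES = ("formality", "density", "warmth", "irony")


def check_voice(voice: dict) -> list[tuple[str, str, str]]:
    """Return (severity, rule, axis) findings for the voice group."""
    findings = []
    for axis, value in voice.items():
        if axis in CATEGORICAL_AXES:
            if value not in LEVELS:
                findings.append(("error", "invalid-value", axis))
        elif axis == "imperative_ratio":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                findings.append(("error", "invalid-value", axis))
            elif not 0.0 <= value <= 1.0:
                findings.append(("warning", "out-of-range", axis))
        else:
            # Unknown axes are accepted; "unknown-axis" is a made-up label.
            findings.append(("info", "unknown-axis", axis))
    return findings


voice = {
    "formality": "medium",
    "warmth": "extreme",
    "imperative_ratio": 1.4,
    "playfulness": "high",
}
print(check_voice(voice))
# [('error', 'invalid-value', 'warmth'), ('warning', 'out-of-range', 'imperative_ratio'), ('info', 'unknown-axis', 'playfulness')]
```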


Linting Rules

The reference linter (gusto-lint) runs the following rules against a parsed GUSTO.md. Each rule produces findings at a fixed severity level.

| Rule | Severity | What it checks |
| --- | --- | --- |
| broken-ref | error | Token references that don't resolve |
| duplicate-section | error | Same ## heading appears twice |
| invalid-value | error | Categorical value outside enum, or wrong type |
| circular-extends | error | extends chain forms a cycle |
| missing-name | error | No name field |
| section-order | warning | Sections appear out of canonical order |
| section-numbered | warning | Section heading carries author-supplied numbering |
| out-of-range | warning | Numeric value outside expected range |
| banned-in-preferred | warning | A word appears in both preferred and banned |
| reclaimed-in-banned | warning | A reclaimed_terms.term value also appears in banned |
| signature-thin | warning | signature_phrases empty when irony: high or strongly stylized voice |
| register-undefined | warning | A register is referenced in prose but not declared in tokens |
| token-summary | info | Summary of how many tokens are defined in each group |
| prose-thin | info | A section heading exists but body is under 100 words |

Linting validates structure and consistency. It does not evaluate generated copy against the voice — that is a separate function, described below.
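Three of the consistency rules above are simple enough to sketch against a parsed token dictionary. This is not the gusto-lint implementation: `lint_tokens` is a hypothetical name, and comparing vocabulary entries case-insensitively is an assumption.

```python
def lint_tokens(tokens: dict) -> list[tuple[str, str, str]]:
    """Return (severity, rule, subject) findings for three lint rules."""
    findings = []
    if not tokens.get("name"):
        findings.append(("error", "missing-name", "name"))
    vocabulary = tokens.get("vocabulary", {})
    # Assumption: entries are compared without regard to case.
    banned = {phrase.lower() for phrase in vocabulary.get("banned", [])}
    for phrase in vocabulary.get("preferred", []):
        if phrase.lower() in banned:
            findings.append(("warning", "banned-in-preferred", phrase))
    for entry in vocabulary.get("reclaimed_terms", []):
        if entry["term"].lower() in banned:
            findings.append(("warning", "reclaimed-in-banned", entry["term"]))
    return findings


tokens = {
    "vocabulary": {
        "preferred": ["crafted", "Direct"],
        "banned": ["direct", "healthy"],
        "reclaimed_terms": [{"term": "healthy", "note": "Used deadpan."}],
    },
}
print(lint_tokens(tokens))
# [('error', 'missing-name', 'name'), ('warning', 'banned-in-preferred', 'Direct'), ('warning', 'reclaimed-in-banned', 'healthy')]
```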


Validation of Generated Copy

A linter validates the GUSTO.md file itself. A separate function — referred to here as copy validation — checks whether a piece of generated copy conforms to the file's rules.

Copy validation is the consumer's responsibility. The specification defines the surface for this validation: a copy validator that reads a GUSTO.md file and a piece of copy and returns findings.

The reference CLI exposes this as:

gusto check <copy.txt> --against GUSTO.md

Expected findings include:

| Finding | Severity | Trigger |
| --- | --- | --- |
| banned-phrase-used | error | Copy contains a string from vocabulary.banned (hard-banned regardless of register) |
| avoid-phrase-used | warning | Copy contains a string from vocabulary.avoid (soft-avoid; register may justify) |
| sentence-over-max | warning | A sentence exceeds rhythm.max_sentence_length |
| avg-length-drift | warning | Average sentence length deviates from rhythm.avg_sentence_length by more than 30% |
| exclamation-violation | warning | Exclamation point used against exclamation_policy |
| semicolon-violation | warning | Semicolon used against semicolon_policy |
| signature-phrase-foreign | error | Signature phrase from another GUSTO.md appears in this copy |
| refusal-suspected | info | Heuristic match against a refusals rule (e.g., text matches an apology pattern when no_apology_as_style is declared) |

Copy validation is intentionally pragmatic. It catches obvious violations, not nuanced ones. Voice is partly judgment, and the validator does not pretend otherwise.
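The two rhythm findings can be sketched with a deliberately naive sentence splitter. Both function names are hypothetical, and the splitter (break after ., !, or ? followed by whitespace, count whitespace-separated words) is an assumption; it will miscount abbreviations and decimals, which fits the pragmatic posture described above.

```python
import re


def split_sentences(copy: str) -> list[str]:
    """Naive splitter: break after ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", copy.strip())
    return [part for part in parts if part]


def check_rhythm(copy: str, rhythm: dict) -> list[str]:
    """Return sentence-over-max and avg-length-drift findings."""
    findings = []
    lengths = [len(sentence.split()) for sentence in split_sentences(copy)]
    if not lengths:
        return findings
    max_length = rhythm.get("max_sentence_length")
    if max_length is not None:
        for length in lengths:
            if length > max_length:
                findings.append(f"sentence-over-max: {length} words")
    target = rhythm.get("avg_sentence_length")
    if target:
        average = sum(lengths) / len(lengths)
        # Drift is measured relative to the target, with the 30% threshold.
        if abs(average - target) / target > 0.30:
            findings.append(f"avg-length-drift: average {average:.1f}, target {target}")
    return findings


rhythm = {"avg_sentence_length": 14, "max_sentence_length": 22}
print(check_rhythm("Made well. Made once. Made to last.", rhythm))
# ['avg-length-drift: average 2.3, target 14']
```

The example also shows a limit of the drift rule: very short copy such as a tagline will drift from the average by construction, so a tool may want to skip the check below some minimum length.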


CLI Reference

A reference CLI is available as @gusto-md/cli on npm. All commands accept a file path or - for stdin and output JSON by default.

gusto lint

Validate a GUSTO.md file for structural correctness.

npx @gusto-md/cli lint GUSTO.md

Exit code 1 if errors are found, 0 otherwise.

gusto check

Validate a piece of copy against a GUSTO.md file.

npx @gusto-md/cli check hero.txt --against GUSTO.md

gusto diff

Compare two GUSTO.md files and report token-level changes.

npx @gusto-md/cli diff GUSTO.md GUSTO-v2.md

Exit code 1 if regressions are detected (added refusals violated by historical copy, removed banned phrases, etc.).

gusto export

Export a GUSTO.md file to other consumer formats.

npx @gusto-md/cli export --format system-prompt GUSTO.md > prompt.md
npx @gusto-md/cli export --format claude-project GUSTO.md > claude-config.json

| Format | Output | Description |
| --- | --- | --- |
| system-prompt | Markdown | A ready-to-paste system prompt for an LLM |
| claude-project | JSON | Configuration for Anthropic Claude Projects |
| openai-instructions | Markdown | Custom instructions for OpenAI assistants |
| json | JSON | Pure data dump of all tokens |

gusto spec

Output the GUSTO.md format specification (useful for injecting spec context into agent prompts).

npx @gusto-md/cli spec
npx @gusto-md/cli spec --rules-only --format json

Integration Guidance for Vendors

Tools and platforms adopting GUSTO.md should follow these conventions.

Read. Accept a GUSTO.md file as an input artifact. Parse the YAML front matter as voice tokens. Treat the markdown body as guidance prose. Pass both to the underlying generation model — tokens as rules, prose as context.

Write. When producing a GUSTO.md from a tool's internal voice profile, emit tokens conformant to this specification. Preserve unknown fields when round-tripping.

Validate. Before generating copy on a user's behalf, lint the GUSTO.md and surface errors to the user. Do not silently ignore broken references or invalid values.

Silent application. The voice should be applied without surfacing the rules to the user. Generated copy should not include meta-commentary like "following your spec" or "according to your rules." The voice is the output, not the rules behind it. This is best-practice guidance rather than a strict requirement: output testing during the development of the Liquid Death exemplar (two surfaces, six runs total, May 2026) did not surface meta-commentary even without this guidance in place. The recommendation is included to guide vendor implementations toward the cleaner pattern.

Round-trip. A GUSTO.md exported from a tool, imported into a second tool, and exported again should remain equivalent at the token level. Prose may be reformatted but should not be discarded.

Attribute. A user-facing surface that consumes GUSTO.md should indicate which file is active, so the user can verify what voice is being applied.

A reference implementation, integration kit, and example consumer adapters are maintained in the gusto.md repository.


Versioning and Compatibility

The format follows a <major>.<minor>.<patch> versioning scheme. Patch releases are non-breaking clarifications and additions to the named-constants sets. Minor releases may add new token groups or rules. Major releases may break compatibility. A version 1.0 will be declared when the specification has stabilized through real-world use.

| Change type | Version bump |
| --- | --- |
| Add a named refusal or named register | Patch |
| Clarify wording without changing behavior | Patch |
| Add a new token group, axis, or category | Minor |
| Add a new lint rule at info or warning severity | Minor |
| Rename a canonical section (with alias for back-compat) | Minor |
| Add a new lint rule at error severity | Major |
| Remove or rename a token without alias | Major |
| Change the semantics of an existing token | Major |

Consumers should declare which spec version they implement. Files should declare the version they target via the version field. Consumers encountering a file targeting a newer spec version should attempt best-effort parsing and surface an info-level finding.
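The newer-version case can be sketched as a tuple comparison. The function names are hypothetical, and treating a non-numeric version such as "alpha" as an info finding rather than an error is an assumption made to match the permissive posture of the rest of the specification.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse 'major.minor.patch' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def version_findings(file_version, consumer_version: str) -> list[tuple[str, str]]:
    """Return an info finding when a file targets a newer spec version."""
    if file_version is None:
        return []  # the version field is optional
    try:
        targets_newer = parse_version(str(file_version)) > parse_version(consumer_version)
    except ValueError:
        # Assumption: unparseable versions are reported, not rejected.
        return [("info", f"unrecognized version {file_version!r}; parsing best-effort")]
    if targets_newer:
        message = f"file targets {file_version}, consumer implements {consumer_version}"
        return [("info", message)]
    return []


print(version_findings("0.2.0", "0.1.2"))
# [('info', 'file targets 0.2.0, consumer implements 0.1.2')]
print(version_findings("0.1.1", "0.1.2"))
# []
```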


Status and Roadmap

This specification is 0.1.2. The format, schema, lint rules, and CLI surface are under active development. Breaking changes are expected before 1.0.

Near-term priorities for the spec:

  • Stabilize the voice axis enumeration
  • Continue expanding the named refusals and named registers sets through exemplar authoring
  • Define a JSON Schema for the YAML front matter
  • Publish reference exemplar files alongside the spec (Apple and Liquid Death in this release; three additional voice families to follow)
  • Ship gusto check as the first copy validator

The specification is authored and maintained by the gusto.md project. Issues, discussion, and proposed changes are tracked in the public repository.


Changelog

0.1.2

  • Split vocabulary.banned into vocabulary.banned (hard ban — error on use) and vocabulary.avoid (soft avoid — warning on use). The single banned list was treating context-blind hedges and marketing clichés at the same severity, causing models to over-correct and strip natural language flow. Apple exemplar migrates to the new shape (filler and hedge words moved to avoid); Liquid Death exemplar's list is uniformly hard-banned and stays in banned only.
  • Added avoid-phrase-used copy-validation finding at warning severity, to accompany the existing banned-phrase-used finding (now scoped to hard bans only).
  • Added clarification in the File Structure section that prose is the primary guidance for an agent producing copy, and tokens are the enforcement surface for lint and validation tools. The two-layer architecture was documented but the hierarchy was not; output testing across May 2026 (Apple) and May 2026 (Liquid Death, two surfaces) confirmed that prose carries voice atmosphere and tokens enforce vocabulary at the word level. A separate observation — that token-enforced cadence may compress brand-flavored vocabulary at tight length budgets — was surfaced in the same tests and is logged for monitoring across future exemplars rather than acted on at this version.
  • Added "Silent application" paragraph to the Integration Guidance for Vendors section, recommending that voice be applied without surfacing the rules to the reader. Best-practice guidance, not an urgent fix: the Liquid Death output tests (two surfaces, six runs total) did not reproduce the May Apple regression in which the model cited the spec back to itself. The recommendation is included to guide vendor implementations toward the cleaner pattern.
  • Updated minimal example and all internal version references from 0.1.1 to 0.1.2.

0.1.1

  • Renamed all internal section references from numbered (e.g. "Section 3.2") to named (e.g. "the Voice Tokens section"). Section numbers were a documentation convention, not a token, and exemplar authors were mistakenly numbering their markdown headings to match.
  • Added section-numbered warning to the linter, with consumer behavior to strip author-supplied numbering automatically.
  • Expanded named refusals from 8 to 16, split into two tables — style/voice and ethics/reader treatment. New names: no_all_caps_for_emphasis, no_mid_sentence_capitalization, no_specs_in_marketing_headlines, no_introducing_as_opener, no_version_2_framing, no_first_without_qualification, no_manufactured_urgency, no_ai_as_a_feature. Surfaced through writing the Apple exemplar.
  • Added Reclaimed Terms subsection under Vocabulary Tokens with worked example. The schema field existed in 0.1 but had no example; needed before authoring brands that play with wellness or corporate vocabulary ironically.
  • Added reclaimed-in-banned linter rule to catch authors who reclaim a term in one list and ban it in another.
  • Updated minimal example and all references from version: "alpha" to version: "0.1.1".

0.1 (alpha)

  • Initial draft. Five voice axes, eight named refusals, nine canonical sections, reference CLI commands defined.

License

This specification is published under the Apache License 2.0. Exemplar GUSTO.md files in the reference collection are published under the MIT License.


GUSTO.md is the verbal companion to a DESIGN.md file. Where DESIGN.md captures the visual layer — colors, typography, spacing — GUSTO.md captures the verbal layer: vocabulary, rhythm, register, refusals, and references. Together, the two formats describe a brand a coding agent can read.