gusto.md Format Specification

Version: 0.1.2
Status: Draft, under active development
License: Apache-2.0

A format specification for describing a brand's verbal identity to AI agents and content tools. A GUSTO.md file gives agents a persistent, structured understanding of how a brand sounds — its vocabulary, sentence rhythm, tonal modes, cultural references, and refusals — so that every piece of generated copy stays on voice across every surface a brand touches.

This document is the normative reference. The specification is opinionated and authored by the gusto.md project. Vendors, tools, and brands are free to adopt, implement, and extend it under the Apache-2.0 license.

Background and Position

The visual layer of brand identity has a working machine-readable standard. The W3C Design Tokens Community Group has shipped a stable specification (DTCG, 2025.10). Google's DESIGN.md format builds on that work to describe a full visual system — colors, typography, spacing, components — in a file that AI agents can read and apply.

The verbal layer has no equivalent. Brand voice today lives in PDFs, Notion pages, and the trained instincts of human writers. AI tools either ignore voice (and default to a generic "tasteful tech" tone) or solve it inside vendor silos (Jasper, Copy.ai, Contentstack — each with proprietary JSON formats that don't move between tools).

GUSTO.md fills that gap. The specification is opinionated, the file format is portable, and the lint rules are deterministic. Vendors are not required to participate in a standards body to support it. Brands are not required to commit to a vendor to use it. The format is open, the license is permissive, and the design decisions documented here are the considered output of a single project — not a committee.

This is the same posture that produced Markdown, AGENTS.md, and the early DTCG drafts. Standardization through adoption, not consensus.

GUSTO.md is designed as the verbal companion to a DESIGN.md file. The two formats sit alongside each other in a project, share the same conceptual model (machine-readable tokens + human-readable rationale), and are intended to be consumed together by the same generation of AI agents.

File Structure

A GUSTO.md file has two layers, in a fixed order.

YAML front matter — Machine-readable voice tokens, delimited by --- fences at the top of the file.
Markdown body — Human-readable voice rationale organized into ## sections.

The tokens are the normative values. The prose provides context for how to apply them. An agent that consumes a GUSTO.md file should treat the tokens as ground truth for lintable decisions (cadence, banned phrases, refusals) and the prose as guidance for judgment calls (atmosphere, cultural reference, register).

Tokens are not a substitute for prose. The prose is the primary guidance for an agent producing copy; tokens are the enforcement surface for lint and validation tools. An agent reading both should weight the prose for tone and judgment, and apply the tokens for checking and post-editing.
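The two layers can be separated before any YAML parsing takes place. The following is a minimal sketch, assuming the front matter opens on the first line with a `---` fence and closes at the next line that is exactly `---`. The function name `split_gusto` is hypothetical and is not part of the reference CLI.

```python
def split_gusto(text: str) -> tuple[str, str]:
    """Split a GUSTO.md document into (front_matter, body) strings.

    Assumption: the file starts with a `---` line and the front matter
    ends at the next line that is exactly `---`.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("front matter must open with a --- fence on line 1")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()
            return front, body
    raise ValueError("front matter is missing its closing --- fence")


doc = """---
name: "Heritage"
voice:
  formality: medium
---

## Voice Atmosphere

Calm, declarative, never breathless.
"""
front, body = split_gusto(doc)
```

The `front` string would then be handed to a YAML parser of the consumer's choice, and `body` kept as prose for the generation model.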

Minimal Example

---
version: "0.1.2"
name: "Heritage"
voice:
  formality: medium
  density: high
  warmth: medium
  irony: low
  imperative_ratio: 0.4
rhythm:
  avg_sentence_length: 14
  max_sentence_length: 22
vocabulary:
  preferred:
    - crafted
    - considered
    - direct
  banned:
    - "take it to the next level"
    - "game-changer"
    - "supercharge"
  avoid:
    - "just"
    - "very"
    - "we hope"
refusals:
  - no_apology_as_style
  - no_exclamation_for_emphasis
---

## Voice Atmosphere

Heritage speaks the way a senior editor speaks — calm, declarative, never
breathless. The voice trusts the reader to follow without being led.

## Vocabulary Palette
...

An agent reading this file knows several things immediately: that sentences should average 14 words and never exceed 22, that "game-changer" is hard-banned (an error on use), that "very" and "just" are softer hedges to avoid where possible (a warning on use), and that apologies should not be used as a stylistic device. The prose tells the agent why — Heritage is editorial, not promotional.

Token Schema

The YAML front matter defines token groups. All groups are optional except name. Groups present must follow the schema below.

Top-Level Fields

version: <string>             # optional, current: "0.1.2"
name: <string>                # required
description: <string>         # optional, one sentence
extends: <string>             # optional, see Extends below

Voice Tokens

Voice tokens describe the overall stance of the voice on a small number of orthogonal axes. Values are categorical (low | medium | high) for human-judgment axes, and numeric (0.0–1.0) for ratio axes.

voice:
  formality: <low | medium | high>
  density: <low | medium | high>
  warmth: <low | medium | high>
  irony: <low | medium | high>
  imperative_ratio: <number 0.0–1.0>

Axis	Meaning	Example: low	Example: high
`formality`	Register distance from spoken conversation	Liquid Death	A bank's terms of service
`density`	Information per sentence	A poem	Apple spec page
`warmth`	Affective closeness to the reader	A coroner's report	A children's book
`irony`	Distance between literal and intended meaning	A safety placard	Liquid Death
`imperative_ratio`	Share of sentences that command the reader	Editorial prose (~0.1)	Liquid Death (~0.7)

Voice tokens are deliberately few. Adding more axes invites false precision — voice is not a vector space, and an eight-axis system suggests a calibration we cannot deliver. Five axes capture the meaningful distinctions; further specificity belongs in prose.

Rhythm Tokens

Rhythm tokens describe sentence-level cadence in measurable terms. These are the primary linting surface — most cadence violations can be checked deterministically.

rhythm:
  avg_sentence_length: <number>      # words
  max_sentence_length: <number>      # words
  paragraph_style: <single_sentence_allowed | dense_only>
  exclamation_policy: <forbidden | tagline_only | sparing | free>
  semicolon_policy: <forbidden | sparing | free>

Rhythm tokens are advisory targets, not hard rules. The linter reports violations as warnings; tools may choose to enforce or soften.
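A cadence check of this kind might look like the sketch below. It assumes a naive sentence splitter (any run of `.`, `!`, or `?` ends a sentence) and whitespace-delimited words; the specification does not define either, and the helper names are hypothetical. The finding name `sentence-over-max` is the one listed under Validation of Generated Copy.

```python
import re


def sentence_lengths(copy: str) -> list[int]:
    """Word count per sentence, using a naive split on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", copy) if s.strip()]
    return [len(s.split()) for s in sentences]


def check_max_length(copy: str, rhythm: dict) -> list[dict]:
    """Return one warning per sentence longer than max_sentence_length."""
    limit = rhythm.get("max_sentence_length")
    if limit is None:
        return []
    return [
        {"rule": "sentence-over-max", "severity": "warning", "words": n}
        for n in sentence_lengths(copy)
        if n > limit
    ]


rhythm = {"avg_sentence_length": 14, "max_sentence_length": 22}
copy = (
    "We make tools. They are built to last and to be repaired "
    "by the people who own them."
)
# A tighter, hypothetical limit of 10 words flags the second sentence.
findings = check_max_length(copy, {"max_sentence_length": 10})
```

A production splitter would need to handle abbreviations, decimals, and ellipses, which this sketch deliberately ignores.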

Vocabulary Tokens

Vocabulary tokens are the most directly lintable group. They define what to reach for, what to avoid softly, and what to ban hard.

vocabulary:
  preferred:
    - <word or phrase>
  banned:                            # hard ban — error on use, no register exceptions
    - <word or phrase>
  avoid:                             # soft avoid — warning on use, register may override
    - <word or phrase>
  signature_phrases:
    - <phrase that is uniquely the brand's>
  reclaimed_terms:                   # optional
    - term: <word>
      note: <why this term is used unusually>

The banned and avoid lists are deliberately separate. banned is for marketing clichés, hype words, and retired phrases — vocabulary that should not appear in any consumer-facing copy regardless of register. avoid is for filler words, hedges, and apologetic vocabulary — vocabulary that should generally not appear but can occasionally serve cadence or warmth in specific registers (legal, support, error). The linter treats banned violations as errors and avoid violations as warnings.

The signature_phrases list is intentionally distinct from preferred. Signature phrases are brand-owned (e.g., Murder your thirst. for Liquid Death, Designed by Apple in California. for Apple). A consumer tool must not transfer signature phrases across brands.
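The two severities for `banned` and `avoid` can be illustrated with a short scanner. This is a sketch under stated assumptions: matching is case-insensitive and anchored on word boundaries, which the specification does not mandate, and `scan_vocabulary` is a hypothetical name. The finding names come from the Validation of Generated Copy section.

```python
import re


def scan_vocabulary(copy: str, vocabulary: dict) -> list[dict]:
    """Flag banned phrases as errors and avoid phrases as warnings.

    Matching is case-insensitive on word boundaries; this is an
    illustrative choice, not a rule stated by the specification.
    """
    findings = []
    levels = [
        ("banned", "banned-phrase-used", "error"),
        ("avoid", "avoid-phrase-used", "warning"),
    ]
    for group, rule, severity in levels:
        for phrase in vocabulary.get(group, []):
            pattern = r"\b" + re.escape(phrase) + r"\b"
            if re.search(pattern, copy, flags=re.IGNORECASE):
                findings.append(
                    {"rule": rule, "severity": severity, "phrase": phrase}
                )
    return findings


vocabulary = {
    "banned": ["game-changer", "supercharge"],
    "avoid": ["just", "very"],
}
findings = scan_vocabulary("This is just a game-changer for teams.", vocabulary)
```

Word-boundary matching keeps "just" from firing inside "Adjust"; a register-aware tool could then choose to drop the warning-level findings for registers where the soft-avoid vocabulary is justified.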

Reclaimed Terms

Some voices use ordinary words in deliberately unusual ways. The reclaimed_terms list flags these for consumer tools so that linters don't flag them as off-voice and prompt-builders don't strip their irony.

vocabulary:
  reclaimed_terms:
    - term: healthy
      note: "Used straight-faced as wellness vocabulary, played for ironic contrast against the brand's violence imagery."
    - term: hydration
      note: "Reclaimed wellness term, used in deadpan voice to land the joke."

Reclaimed terms are not banned, not preferred — they are a third category. A consumer tool should preserve them in output and treat the note as guidance for register.

Register Tokens

register:
  <register_name>:
    formality: <override>           # optional, overrides voice.formality
    density: <override>             # optional
    warmth: <override>              # optional
    irony: <override>               # optional
    max_sentence_length: <number>   # optional, overrides rhythm
    notes: <prose>                  # optional, free text

Conventional register names — used by linters and tools — are:

Custom register names are permitted; consumers should preserve unknown registers without error.
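One way a consumer might apply a register is to layer its overrides on top of the top-level `voice` and `rhythm` groups. The sketch below assumes the tokens are already parsed into a Python dict; `effective_voice` is a hypothetical helper, and the register values are invented for illustration.

```python
def effective_voice(tokens: dict, register_name: str) -> dict:
    """Merge register overrides onto the top-level voice and rhythm tokens."""
    voice = dict(tokens.get("voice", {}))
    rhythm = dict(tokens.get("rhythm", {}))
    override = tokens.get("register", {}).get(register_name, {})
    for key, value in override.items():
        if key == "max_sentence_length":
            rhythm[key] = value   # overrides rhythm, per the schema
        elif key != "notes":
            voice[key] = value    # formality, density, warmth, irony
    return {"voice": voice, "rhythm": rhythm, "notes": override.get("notes")}


tokens = {
    "voice": {"formality": "medium", "warmth": "medium"},
    "rhythm": {"max_sentence_length": 22},
    "register": {"error": {"warmth": "high", "max_sentence_length": 16}},
}
resolved = effective_voice(tokens, "error")
```

A register that is not declared simply falls back to the top-level values, which matches the instruction to handle unknown registers without error.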

Refusal Tokens

Refusals are the non-negotiable rules of the voice. They are listed as either named constants (linter-aware) or free-form strings.

refusals:
  - <refusal_name_or_freeform_string>

The specification defines a starter set of refusal names. These names are recognized by linters and produce specific findings when violated. Unknown refusal names are preserved without error.

Named Refusals — Style and Voice

Refusal name	Meaning
`no_apology_as_style`	Apologies must be substantive, not stylistic.
`no_exclamation_for_emphasis`	Exclamation points used only where genuinely earned.
`no_stacked_adjectives`	Three or more flat adjectives in a row are rejected.
`no_all_caps_for_emphasis`	ALL CAPS is not a substitute for word choice.
`no_mid_sentence_capitalization`	No Capitalization Mid-Sentence For Emphasis.
`no_marketing_cliches`	Banned-phrase enforcement is strict in marketing contexts.
`no_specs_in_marketing_headlines`	Specs follow narrative, not the other way around.
`no_introducing_as_opener`	"Introducing..." is a retired opener.
`no_version_2_framing`	"X 2.0" framing rejected.
`no_first_without_qualification`	Claims of "first" require specific qualification.

Named Refusals — Ethics and Reader Treatment

Refusal name	Meaning
`no_punching_down`	Voice does not target identity, vulnerable groups, or individuals.
`no_real_violence_references`	Where violence vocabulary is used, only cartoon horror — never real-world events.
`no_competitor_disparagement_by_name`	Comparisons are oblique.
`no_user_in_consumer_copy`	"User" is reserved for developer-facing surfaces.
`no_manufactured_urgency`	No "limited time," no countdown copy, no fear-driven pressure.
`no_ai_as_a_feature`	Don't sell "AI" as the feature. Name what the feature does.

Brands may add their own refusal strings beyond this set. Consumers should preserve unknown refusals as free-form strings and treat them as advisory guidance.
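A consumer can separate linter-aware names from free-form strings with a simple membership test. The sketch below copies the sixteen names from the two tables above; `classify_refusals` is a hypothetical helper name, and the free-form string in the example is invented.

```python
NAMED_REFUSALS = {
    # Style and voice
    "no_apology_as_style",
    "no_exclamation_for_emphasis",
    "no_stacked_adjectives",
    "no_all_caps_for_emphasis",
    "no_mid_sentence_capitalization",
    "no_marketing_cliches",
    "no_specs_in_marketing_headlines",
    "no_introducing_as_opener",
    "no_version_2_framing",
    "no_first_without_qualification",
    # Ethics and reader treatment
    "no_punching_down",
    "no_real_violence_references",
    "no_competitor_disparagement_by_name",
    "no_user_in_consumer_copy",
    "no_manufactured_urgency",
    "no_ai_as_a_feature",
}


def classify_refusals(refusals: list[str]) -> dict:
    """Split refusals into linter-aware names and advisory free-form strings."""
    named = [r for r in refusals if r in NAMED_REFUSALS]
    free_form = [r for r in refusals if r not in NAMED_REFUSALS]
    return {"named": named, "free_form": free_form}


result = classify_refusals(
    ["no_apology_as_style", "Never compare the product to food."]
)
```

Nothing is discarded: unrecognized entries are kept in `free_form` so they can still be passed to the model as advisory guidance.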

Cultural Reference Tokens

Cultural references shape feel rather than syntax. They are advisory, not lintable.

references:
  drawn_from:
    - <cultural_touchstone>
  avoided:
    - <cultural_touchstone>

Token Types

Type	Format	Example
Categorical	One of an enumerated set	`low`, `medium`, `high`
Number	Float or integer	`12`, `0.7`
String	Quoted string	`"Murder your thirst."`
List	YAML sequence	`[crafted, considered, direct]`
Token Reference	`{path.to.token}`	`{vocabulary.preferred}`

Token References

A token may reference another token by path. This allows a register override to reuse a top-level token without restating it.

voice:
  formality: low

register:
  error:
    formality: "{voice.formality}"   # explicitly reuses top-level value
  marketing:
    formality: high                  # overrides

Token references resolve at consumer time. A reference that does not resolve to a defined token produces a broken-ref linter error.
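Reference resolution could be sketched as a recursive walk over the parsed tokens. The version below assumes a reference is a string consisting solely of `{path.to.token}`, resolves one level only (it does not chase a reference that points at another reference), and leaves unresolved references in place while recording a `broken-ref` finding. The helper names are hypothetical.

```python
import re

REF = re.compile(r"^\{([A-Za-z0-9_.]+)\}$")


def lookup(tokens: dict, path: str):
    """Follow a dotted path through nested dicts; raise KeyError if absent."""
    node = tokens
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(path)
        node = node[part]
    return node


def resolve_refs(node, root: dict, findings: list):
    """Return a copy of node with {path.to.token} strings replaced."""
    if isinstance(node, dict):
        return {k: resolve_refs(v, root, findings) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_refs(v, root, findings) for v in node]
    if isinstance(node, str):
        match = REF.match(node)
        if match:
            try:
                return lookup(root, match.group(1))
            except KeyError:
                findings.append(
                    {"rule": "broken-ref", "severity": "error", "ref": node}
                )
    return node


tokens = {
    "voice": {"formality": "low"},
    "register": {
        "error": {"formality": "{voice.formality}"},
        "marketing": {"formality": "{voice.tone}"},  # deliberately broken
    },
}
findings = []
resolved = resolve_refs(tokens, tokens, findings)
```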

Section Order and Aliases

Sections in the markdown body use ## headings. Sections may be omitted, but those present must appear in the canonical order below. Sections are referenced by name, not number — the canonical order is enforced by linting, not by author-supplied numbering. Authors should not prefix section headings with numbers.

Order	Section	Aliases
1	Voice Atmosphere	Overview, Brand Voice
2	Vocabulary Palette	Vocabulary
3	Sentence Rhythm	Rhythm, Cadence
4	Cultural References	Reference Universe
5	Tonal Modes	Register, Modes
6	Refusals	Hard Rules
7	Anti-patterns	Anti-patterns and Banned Phrases
8	Voice in Context	Surfaces, Applied Voice
9	Agent Prompt Guide	Prompt Guide, Implementation

The canonical names are normative; aliases are accepted by consumers. Out-of-order sections produce a section-order linter warning, not an error — older files may have legitimate variations.
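The ordering rule can be checked by mapping every canonical name and alias to its position and scanning the headings once. This is a sketch, not the reference linter: heading comparison is case-insensitive by assumption, unknown headings are skipped rather than flagged, and `check_section_order` is a hypothetical name.

```python
CANONICAL = [
    ("Voice Atmosphere", ["Overview", "Brand Voice"]),
    ("Vocabulary Palette", ["Vocabulary"]),
    ("Sentence Rhythm", ["Rhythm", "Cadence"]),
    ("Cultural References", ["Reference Universe"]),
    ("Tonal Modes", ["Register", "Modes"]),
    ("Refusals", ["Hard Rules"]),
    ("Anti-patterns", ["Anti-patterns and Banned Phrases"]),
    ("Voice in Context", ["Surfaces", "Applied Voice"]),
    ("Agent Prompt Guide", ["Prompt Guide", "Implementation"]),
]

# Map every accepted heading (canonical name or alias) to its position.
ORDER = {}
for index, (name, aliases) in enumerate(CANONICAL):
    for label in [name, *aliases]:
        ORDER[label.lower()] = index


def check_section_order(headings: list[str]) -> list[dict]:
    """Warn for each known section that appears after a later one."""
    findings = []
    highest = -1
    for heading in headings:
        position = ORDER.get(heading.strip().lower())
        if position is None:
            continue  # unknown section: preserve, do not error
        if position < highest:
            findings.append(
                {"rule": "section-order", "severity": "warning", "section": heading}
            )
        else:
            highest = position
    return findings


findings = check_section_order(["Overview", "Cadence", "Vocabulary", "Refusals"])
```

Here "Vocabulary" (position 2 in the table) follows "Cadence" (position 3), so it is the only heading reported.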

`extends` for Inherited Voice

A GUSTO.md file may reference another GUSTO.md file as a base. The top-level extends field resolves to a URL or path; the consumer must merge the base file's tokens with the local file, with local values taking precedence.

extends: "./parent-brand.gusto.md"
name: "Sub-brand"
voice:
  warmth: high                       # overrides parent

This supports multi-brand systems where a parent voice has variants. Consumers should resolve extends chains to a maximum depth of five, and produce a circular-extends error on cycles.
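Inheritance could be resolved along the lines of the sketch below. Several choices here are assumptions rather than rules stated by the specification: nested groups are merged key by key, lists are replaced wholesale by the local value, "depth of five" is read as at most five `extends` hops, and `load` is a hypothetical callable that returns the parsed tokens for a path.

```python
MAX_DEPTH = 5


def deep_merge(base: dict, local: dict) -> dict:
    """Merge local onto base; local values win, nested dicts merge by key."""
    merged = dict(base)
    for key, value in local.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_extends(path: str, load, seen: tuple = ()) -> dict:
    """Return the tokens for path with every ancestor merged in."""
    if path in seen:
        raise ValueError("circular-extends: " + " -> ".join([*seen, path]))
    if len(seen) > MAX_DEPTH:
        raise ValueError("extends chain deeper than %d" % MAX_DEPTH)
    tokens = dict(load(path))
    parent = tokens.pop("extends", None)
    if parent is None:
        return tokens
    base = resolve_extends(parent, load, (*seen, path))
    return deep_merge(base, tokens)


# In-memory stand-in for files on disk (illustrative data only).
files = {
    "parent.gusto.md": {
        "name": "Parent",
        "voice": {"warmth": "medium", "irony": "low"},
    },
    "child.gusto.md": {
        "extends": "parent.gusto.md",
        "name": "Sub-brand",
        "voice": {"warmth": "high"},
    },
}
merged = resolve_extends("child.gusto.md", files.__getitem__)
```

The sub-brand keeps the parent's `irony: low` while its own `warmth: high` takes precedence. Resolving relative paths and URLs is left out of the sketch.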

Consumer Behavior

Consumers of GUSTO.md files (linters, generators, AI agents, design tools) should behave predictably when they encounter content outside the spec.

Scenario	Behavior
Unknown section heading	Preserve; do not error
Numbered section heading (e.g. `## 1. Voice Atmosphere`)	Strip numbering; resolve by name; produce `section-numbered` warning
Unknown token group	Preserve; produce `info` finding
Unknown axis under `voice`	Accept; produce `info` finding
Unknown refusal name	Preserve; treat as free-form string
Unknown register name	Preserve; apply overrides as given
Categorical value outside enum	`invalid-value` error
Numeric value outside expected range	`out-of-range` warning
Duplicate section heading	Error; reject the file
Token reference does not resolve	`broken-ref` error

This permissive posture is deliberate. The spec will grow; consumers built against 0.1 should not break against 0.2 files. Strict enforcement applies only to clearly malformed input.
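The split between strict and permissive handling can be seen in a small validator for the `voice` group. This is an illustrative sketch: `check_voice` is a hypothetical helper, and the label `unknown-axis` is invented here, since the table above only requires an `info` finding for that case.

```python
LEVELS = {"low", "medium", "high"}
CATEGORICAL_AXES = {"formality", "density", "warmth", "irony"}


def check_voice(voice: dict) -> list[tuple]:
    """Return (axis, finding, severity) tuples for the voice group."""
    findings = []
    for axis, value in voice.items():
        if axis in CATEGORICAL_AXES:
            if not isinstance(value, str) or value not in LEVELS:
                findings.append((axis, "invalid-value", "error"))
        elif axis == "imperative_ratio":
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                findings.append((axis, "invalid-value", "error"))
            elif not 0.0 <= value <= 1.0:
                findings.append((axis, "out-of-range", "warning"))
        else:
            # Unknown axis: accepted, reported at info level only.
            findings.append((axis, "unknown-axis", "info"))
    return findings


findings = check_voice(
    {
        "formality": "medium",
        "warmth": "extreme",
        "imperative_ratio": 1.4,
        "swagger": "high",
    }
)
```

Only the malformed categorical value is an error; the out-of-range ratio and the unrecognized axis are reported without rejecting the file.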

Linting Rules

The reference linter (gusto-lint) runs the following rules against a parsed GUSTO.md. Each rule produces findings at a fixed severity level.

Rule	Severity	What it checks
`broken-ref`	error	Token references that don't resolve
`duplicate-section`	error	Same `##` heading appears twice
`invalid-value`	error	Categorical value outside enum, or wrong type
`circular-extends`	error	`extends` chain forms a cycle
`missing-name`	error	No `name` field
`section-order`	warning	Sections appear out of canonical order
`section-numbered`	warning	Section heading carries author-supplied numbering
`out-of-range`	warning	Numeric value outside expected range
`banned-in-preferred`	warning	A word appears in both `preferred` and `banned`
`reclaimed-in-banned`	warning	A `reclaimed_terms.term` value also appears in `banned`
`signature-thin`	warning	`signature_phrases` empty when `irony: high` or strongly stylized voice
`register-undefined`	warning	A register is referenced in prose but not declared in tokens
`token-summary`	info	Summary of how many tokens are defined in each group
`prose-thin`	info	A section heading exists but body is under 100 words

Linting validates structure and consistency. It does not evaluate generated copy against the voice — that is a separate function, described below.
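Three of the consistency rules in the table need nothing more than set membership. The sketch below shows one possible shape for them, assuming parsed tokens in a dict and case-insensitive comparison (an assumption; the specification does not state how terms are compared). `lint_consistency` is a hypothetical name and is not the reference linter.

```python
def lint_consistency(tokens: dict) -> list[tuple]:
    """Return (rule, severity, subject) tuples for three consistency rules."""
    findings = []
    if not tokens.get("name"):
        findings.append(("missing-name", "error", None))
    vocab = tokens.get("vocabulary", {})
    banned = {word.lower() for word in vocab.get("banned", [])}
    for word in vocab.get("preferred", []):
        if word.lower() in banned:
            findings.append(("banned-in-preferred", "warning", word))
    for entry in vocab.get("reclaimed_terms", []):
        if entry.get("term", "").lower() in banned:
            findings.append(("reclaimed-in-banned", "warning", entry["term"]))
    return findings


# Deliberately inconsistent tokens (illustrative data only).
tokens = {
    "vocabulary": {
        "preferred": ["crafted", "Hydration"],
        "banned": ["hydration", "supercharge"],
        "reclaimed_terms": [{"term": "hydration", "note": "deadpan"}],
    }
}
findings = lint_consistency(tokens)
```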

Validation of Generated Copy

A linter validates the GUSTO.md file itself. A separate function — referred to here as copy validation — checks whether a piece of generated copy conforms to the file's rules.

Copy validation is the consumer's responsibility. The specification defines the surface for this validation: a copy validator that reads a GUSTO.md file and a piece of copy and returns findings.

The reference CLI exposes this as:

gusto check <copy.txt> --against GUSTO.md

Expected findings include:

Finding	Severity	Trigger
`banned-phrase-used`	error	Copy contains a string from `vocabulary.banned` (hard-banned regardless of register)
`avoid-phrase-used`	warning	Copy contains a string from `vocabulary.avoid` (soft-avoid; register may justify)
`sentence-over-max`	warning	A sentence exceeds `rhythm.max_sentence_length`
`avg-length-drift`	warning	Average sentence length deviates from `rhythm.avg_sentence_length` by more than 30%
`exclamation-violation`	warning	Exclamation point used against `exclamation_policy`
`semicolon-violation`	warning	Semicolon used against `semicolon_policy`
`signature-phrase-foreign`	error	Signature phrase from another GUSTO.md appears in this copy
`refusal-suspected`	info	Heuristic match against a `refusals` rule (e.g., text matches an apology pattern when `no_apology_as_style` is declared)

Copy validation is intentionally pragmatic. It catches obvious violations, not nuanced ones. Voice is partly judgment, and the validator does not pretend otherwise.
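The `avg-length-drift` trigger is the least obvious entry in the table, so a worked sketch may help. It reads "deviates by more than 30%" as a relative deviation from the target, |average - target| / target > 0.30, which is an interpretation rather than a definition given by the specification. Sentence splitting is naive, and `avg_length_drift` is a hypothetical name.

```python
import re


def avg_length_drift(copy: str, target: float, tolerance: float = 0.30):
    """Return a warning dict if the mean sentence length drifts past tolerance."""
    sentences = [s for s in re.split(r"[.!?]+", copy) if s.strip()]
    if not sentences or not target:
        return None
    average = sum(len(s.split()) for s in sentences) / len(sentences)
    drift = abs(average - target) / target
    if drift > tolerance:
        return {
            "rule": "avg-length-drift",
            "severity": "warning",
            "average": average,
            "drift": round(drift, 2),
        }
    return None


# Two three-word sentences against a target of 14 words.
result = avg_length_drift("Short copy here. It stays brief.", target=14)
```

With a target of 14, the tolerated band under this reading runs from 9.8 to 18.2 words, so a single ten-word sentence passes while the two three-word sentences above do not.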

CLI Reference

A reference CLI is available as @gusto-md/cli on npm. All commands accept a file path, or - for stdin, and output JSON by default.

`gusto lint`

Validate a GUSTO.md file for structural correctness.

npx @gusto-md/cli lint GUSTO.md

Exit code 1 if errors are found, 0 otherwise.

`gusto check`

Validate a piece of copy against a GUSTO.md file.

npx @gusto-md/cli check hero.txt --against GUSTO.md

`gusto diff`

Compare two GUSTO.md files and report token-level changes.

npx @gusto-md/cli diff GUSTO.md GUSTO-v2.md

Exit code 1 if regressions are detected (added refusals violated by historical copy, removed banned phrases, etc.).
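One of the regressions named above, a removed banned phrase, can be detected with a list comparison. The sketch below is an illustration of that single check, not the behavior of the reference CLI; `banned_regressions` is a hypothetical name, and the mapping to an exit code simply mirrors the sentence above.

```python
def banned_regressions(old: dict, new: dict) -> list[str]:
    """Return banned phrases present in the old tokens but absent from the new."""
    before = old.get("vocabulary", {}).get("banned", [])
    after = set(new.get("vocabulary", {}).get("banned", []))
    return [phrase for phrase in before if phrase not in after]


old = {"vocabulary": {"banned": ["game-changer", "supercharge"]}}
new = {"vocabulary": {"banned": ["game-changer"], "avoid": ["just"]}}

removed = banned_regressions(old, new)
exit_code = 1 if removed else 0
```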

`gusto export`

Export a GUSTO.md file to other consumer formats.

npx @gusto-md/cli export --format system-prompt GUSTO.md > prompt.md
npx @gusto-md/cli export --format claude-project GUSTO.md > claude-config.json

Format	Output	Description
`system-prompt`	Markdown	A ready-to-paste system prompt for an LLM
`claude-project`	JSON	Configuration for Anthropic Claude Projects
`openai-instructions`	Markdown	Custom instructions for OpenAI assistants
`json`	JSON	Pure data dump of all tokens

`gusto spec`

Output the GUSTO.md format specification (useful for injecting spec context into agent prompts).

npx @gusto-md/cli spec
npx @gusto-md/cli spec --rules-only --format json

Integration Guidance for Vendors

Tools and platforms adopting GUSTO.md should follow these conventions.

Read. Accept a GUSTO.md file as an input artifact. Parse the YAML front matter as voice tokens. Treat the markdown body as guidance prose. Pass both to the underlying generation model — tokens as rules, prose as context.

Write. When producing a GUSTO.md from a tool's internal voice profile, emit tokens conformant to this specification. Preserve unknown fields when round-tripping.

Validate. Before generating copy on a user's behalf, lint the GUSTO.md and surface errors to the user. Do not silently ignore broken references or invalid values.

Silent application. The voice should be applied without surfacing the rules to the user. Generated copy should not include meta-commentary like "following your spec" or "according to your rules." The voice is the output, not the rules behind it. This is best-practice guidance rather than a strict requirement: output testing during the development of the Liquid Death exemplar (two surfaces, six runs total, May 2026) did not surface meta-commentary even without this guidance in place. The recommendation is included to guide vendor implementations toward the cleaner pattern.

Round-trip. A GUSTO.md exported from a tool, imported into a second tool, and exported again should remain equivalent at the token level. Prose may be reformatted but should not be discarded.

Attribute. A user-facing surface that consumes GUSTO.md should indicate which file is active, so the user can verify what voice is being applied.

A reference implementation, integration kit, and example consumer adapters are maintained in the gusto.md repository.

Versioning and Compatibility

The format follows a <major>.<minor>.<patch> versioning scheme. Patch releases are non-breaking clarifications and additions to the named-constants sets. Minor releases may add new token groups or rules. Major releases may break compatibility. A version 1.0 will be declared when the specification has stabilized through real-world use.

Change type	Version bump
Add a named refusal or named register	Patch
Clarify wording without changing behavior	Patch
Add a new token group, axis, or category	Minor
Add a new lint rule at `info` or `warning` severity	Minor
Rename a canonical section (with alias for back-compat)	Minor
Add a new lint rule at `error` severity	Major
Remove or rename a token without alias	Major
Change the semantics of an existing token	Major

Consumers should declare which spec version they implement. Files should declare the version they target via the version field. Consumers encountering a file targeting a newer spec version should attempt best-effort parsing and surface an info-level finding.
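The newer-version case might be handled as in the sketch below. It assumes version strings are dotted integers such as "0.1.2" (a non-numeric value like the earlier "alpha" would need separate handling) and pads missing components with zero. The function names and the message wording are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn '0.1' or '0.1.2' into a comparable (major, minor, patch) tuple."""
    numbers = [int(part) for part in text.split(".")] + [0, 0]
    return tuple(numbers[:3])


def check_version(file_version, implemented: str = "0.1.2"):
    """Return an info finding when the file targets a newer spec version."""
    if file_version is None:
        return None  # version is optional
    if parse_version(file_version) > parse_version(implemented):
        return {
            "severity": "info",
            "message": "file targets %s; this consumer implements %s"
            % (file_version, implemented),
        }
    return None


newer = check_version("0.2")
```

The finding is informational only; the consumer continues with best-effort parsing rather than rejecting the file.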

Status and Roadmap

This specification is 0.1.2. The format, schema, lint rules, and CLI surface are under active development. Breaking changes are expected before 1.0.

Near-term priorities for the spec:

Stabilize the voice axis enumeration
Continue expanding the named refusals and named registers sets through exemplar authoring
Define a JSON Schema for the YAML front matter
Publish reference exemplar files alongside the spec (Apple and Liquid Death in this release; three additional voice families to follow)
Ship gusto check as the first copy validator

The specification is authored and maintained by the gusto.md project. Issues, discussion, and proposed changes are tracked in the public repository.

Changelog

0.1.2

Split vocabulary.banned into vocabulary.banned (hard ban — error on use) and vocabulary.avoid (soft avoid — warning on use). The single banned list was treating context-blind hedges and marketing clichés at the same severity, causing models to over-correct and strip natural language flow. Apple exemplar migrates to the new shape (filler and hedge words moved to avoid); Liquid Death exemplar's list is uniformly hard-banned and stays in banned only.
Added avoid-phrase-used copy-validation finding at warning severity, to accompany the existing banned-phrase-used finding (now scoped to hard bans only).
Added clarification in the File Structure section that prose is the primary guidance for an agent producing copy, and tokens are the enforcement surface for lint and validation tools. The two-layer architecture was documented but the hierarchy was not; output testing in May 2026 (Apple, and Liquid Death across two surfaces) confirmed that prose carries voice atmosphere and tokens enforce vocabulary at the word level. A separate observation, that token-enforced cadence may compress brand-flavored vocabulary at tight length budgets, was surfaced in the same tests and is logged for monitoring across future exemplars rather than acted on at this version.
Added "Silent application" paragraph to the Integration Guidance for Vendors section, recommending that voice be applied without surfacing the rules to the reader. Best-practice guidance, not an urgent fix: the Liquid Death output tests (two surfaces, six runs total) did not reproduce the May Apple regression in which the model cited the spec back to itself. The recommendation is included to guide vendor implementations toward the cleaner pattern.
Updated minimal example and all internal version references from 0.1.1 to 0.1.2.

0.1.1

Renamed all internal section references from numbered (e.g. "Section 3.2") to named (e.g. "the Voice Tokens section"). Section numbers were a documentation convention, not a token, and exemplar authors were mistakenly numbering their markdown headings to match.
Added section-numbered warning to the linter, with consumer behavior to strip author-supplied numbering automatically.
Expanded named refusals from 8 to 16, split into two tables — style/voice and ethics/reader treatment. New names: no_all_caps_for_emphasis, no_mid_sentence_capitalization, no_specs_in_marketing_headlines, no_introducing_as_opener, no_version_2_framing, no_first_without_qualification, no_manufactured_urgency, no_ai_as_a_feature. Surfaced through writing the Apple exemplar.
Added Reclaimed Terms subsection under Vocabulary Tokens with worked example. The schema field existed in 0.1 but had no example; needed before authoring brands that play with wellness or corporate vocabulary ironically.
Added reclaimed-in-banned linter rule to catch authors who reclaim a term in one list and ban it in another.
Updated minimal example and all references from version: "alpha" to version: "0.1.1".

0.1 (alpha)

Initial draft. Five voice axes, eight named refusals, nine canonical sections, reference CLI commands defined.

License

This specification is published under the Apache License 2.0. Exemplar GUSTO.md files in the reference collection are published under the MIT License.

GUSTO.md is the verbal companion to a DESIGN.md file. Where DESIGN.md captures the visual layer — colors, typography, spacing — GUSTO.md captures the verbal layer: vocabulary, rhythm, register, refusals, and references. Together, the two formats describe a brand a coding agent can read.
