# gusto.md Format Specification

**Version:** 0.1.2
**Status:** Draft, under active development
**License:** Apache-2.0

A format specification for describing a brand's verbal identity to AI agents and content tools. A GUSTO.md file gives agents a persistent, structured understanding of how a brand sounds — its vocabulary, sentence rhythm, tonal modes, cultural references, and refusals — so that every piece of generated copy stays on voice across every surface a brand touches.

This document is the normative reference. The specification is opinionated and authored by the gusto.md project. Vendors, tools, and brands are free to adopt, implement, and extend it under the Apache-2.0 license.

---

## Background and Position

The visual layer of brand identity has a working machine-readable standard. The W3C Design Tokens Community Group has shipped a stable specification (DTCG, 2025.10). Google's DESIGN.md format builds on that work to describe a full visual system — colors, typography, spacing, components — in a file that AI agents can read and apply.

The verbal layer has no equivalent. Brand voice today lives in PDFs, Notion pages, and the trained instincts of human writers. AI tools either ignore voice (and default to a generic "tasteful tech" tone) or solve it inside vendor silos (Jasper, Copy.ai, Contentstack — each with proprietary JSON formats that don't move between tools).

GUSTO.md fills that gap. The specification is opinionated, the file format is portable, and the lint rules are deterministic. Vendors are not required to participate in a standards body to support it. Brands are not required to commit to a vendor to use it. The format is open, the license is permissive, and the design decisions documented here are the considered output of a single project — not a committee.

This is the same posture that produced Markdown, AGENTS.md, and the early DTCG drafts. Standardization through adoption, not consensus.

GUSTO.md is designed as the verbal companion to a DESIGN.md file. The two formats sit alongside each other in a project, share the same conceptual model (machine-readable tokens + human-readable rationale), and are intended to be consumed together by the same generation of AI agents.

---

## File Structure

A GUSTO.md file has two layers, in a fixed order.

1. **YAML front matter** — Machine-readable voice tokens, delimited by `---` fences at the top of the file.
2. **Markdown body** — Human-readable voice rationale organized into `##` sections.

The tokens are the normative values. The prose provides context for how to apply them. An agent that consumes a GUSTO.md file should treat the tokens as ground truth for lintable decisions (cadence, banned phrases, refusals) and the prose as guidance for judgment calls (atmosphere, cultural reference, register).

Tokens are not a substitute for prose. The prose is the primary guidance for an agent producing copy; tokens are the enforcement surface for lint and validation tools. An agent reading both should weight the prose for tone and judgment, and apply the tokens for checking and post-editing.
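The two-layer layout can be separated with a few lines of code before any YAML parsing happens. The following is a minimal sketch, assuming the front matter is delimited by lines containing only `---`; the function name `split_gusto` is hypothetical and not part of the reference tooling.

```python
def split_gusto(text: str) -> tuple[str, str]:
    """Return (front_matter_yaml, markdown_body) for a GUSTO.md document."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' fence")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "\n".join(lines[1:i])          # token layer, still raw YAML
            body = "\n".join(lines[i + 1:]).strip()  # prose layer
            return front, body
    raise ValueError("missing closing '---' fence")


sample = """---
name: "Heritage"
voice:
  formality: medium
---

## Voice Atmosphere

Heritage speaks the way a senior editor speaks.
"""

front, body = split_gusto(sample)
print(front)
print(body.splitlines()[0])
```

The returned front matter is still text; a consumer would hand it to a YAML parser of its choice and pass the body to the model as guidance prose.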

### Minimal Example

```yaml
---
version: "0.1.2"
name: "Heritage"
voice:
  formality: medium
  density: high
  warmth: medium
  irony: low
  imperative_ratio: 0.4
rhythm:
  avg_sentence_length: 14
  max_sentence_length: 22
vocabulary:
  preferred:
    - crafted
    - considered
    - direct
  banned:
    - "take it to the next level"
    - "game-changer"
    - "supercharge"
  avoid:
    - "just"
    - "very"
    - "we hope"
refusals:
  - no_apology_as_style
  - no_exclamation_for_emphasis
---

## Voice Atmosphere

Heritage speaks the way a senior editor speaks — calm, declarative, never
breathless. The voice trusts the reader to follow without being led.

## Vocabulary Palette
...
```

An agent reading this file knows several things immediately: that sentences should average 14 words and stay within 22, that "game-changer" is hard-banned (an error on use), that "just" and "very" are softer hedges to avoid where possible (a warning on use), and that apologies should not be used as a stylistic device. The prose tells the agent *why*: Heritage is editorial, not promotional.

---

## Token Schema

The YAML front matter defines token groups. All token groups are optional; the only required field is the top-level `name`. Groups present must follow the schema below.

### Top-Level Fields

```yaml
version: <string>             # optional, current: "0.1.2"
name: <string>                # required
description: <string>         # optional, one sentence
extends: <string>             # optional, see Extends below
```

### Voice Tokens

Voice tokens describe the overall stance of the voice on a small number of orthogonal axes. Values are categorical (`low | medium | high`) for human-judgment axes, and numeric (0.0–1.0) for ratio axes.

```yaml
voice:
  formality: <low | medium | high>
  density: <low | medium | high>
  warmth: <low | medium | high>
  irony: <low | medium | high>
  imperative_ratio: <number 0.0–1.0>
```

| Axis | Meaning | Example: low | Example: high |
|---|---|---|---|
| `formality` | Register distance from spoken conversation | Liquid Death | A bank's terms of service |
| `density` | Information per sentence | A poem | Apple spec page |
| `warmth` | Affective closeness to the reader | A coroner's report | A children's book |
| `irony` | Distance between literal and intended meaning | A safety placard | Liquid Death |
| `imperative_ratio` | Share of sentences that command the reader | Editorial prose (~0.1) | Liquid Death (~0.7) |

Voice tokens are deliberately few. Adding more axes invites false precision — voice is not a vector space, and an eight-axis system suggests a calibration we cannot deliver. Five axes capture the meaningful distinctions; further specificity belongs in prose.
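As an illustration of how a consumer might check the `voice` group, here is a small sketch that operates on an already-parsed dictionary. The function name `check_voice` and the `unknown-axis` label are assumptions for this example; the specification only says that unknown axes produce an `info` finding.

```python
LEVELS = {"low", "medium", "high"}
CATEGORICAL_AXES = ("formality", "density", "warmth", "irony")


def check_voice(voice: dict) -> list[tuple[str, str, str]]:
    """Return (rule, severity, message) findings for a parsed `voice` group."""
    findings = []
    for axis, value in voice.items():
        if axis in CATEGORICAL_AXES:
            if value not in LEVELS:
                findings.append(("invalid-value", "error", f"voice.{axis}: {value!r}"))
        elif axis == "imperative_ratio":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                findings.append(("invalid-value", "error", f"voice.{axis}: {value!r}"))
            elif not 0.0 <= value <= 1.0:
                findings.append(("out-of-range", "warning", f"voice.{axis}: {value}"))
        else:
            # Unknown axes are accepted; "unknown-axis" is a hypothetical label.
            findings.append(("unknown-axis", "info", f"voice.{axis}"))
    return findings


example = {"formality": "medium", "irony": "extreme", "imperative_ratio": 1.4, "swagger": "high"}
for finding in check_voice(example):
    print(finding)
```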

### Rhythm Tokens

Rhythm tokens describe sentence-level cadence in measurable terms. These are the primary linting surface — most cadence violations can be checked deterministically.

```yaml
rhythm:
  avg_sentence_length: <number>      # words
  max_sentence_length: <number>      # words
  paragraph_style: <single_sentence_allowed | dense_only>
  exclamation_policy: <forbidden | tagline_only | sparing | free>
  semicolon_policy: <forbidden | sparing | free>
```

Rhythm tokens are advisory targets, not hard rules. Copy validation reports rhythm violations as warnings; tools may choose to enforce or soften them.

### Vocabulary Tokens

Vocabulary tokens are the most directly lintable group. They define what to reach for, what to avoid softly, and what to ban hard.

```yaml
vocabulary:
  preferred:
    - <word or phrase>
  banned:                            # hard ban — error on use, no register exceptions
    - <word or phrase>
  avoid:                             # soft avoid — warning on use, register may override
    - <word or phrase>
  signature_phrases:
    - <phrase that is uniquely the brand's>
  reclaimed_terms:                   # optional
    - term: <word>
      note: <why this term is used unusually>
```

The `banned` and `avoid` lists are deliberately separate. `banned` is for marketing clichés, hype words, and retired phrases: vocabulary that should not appear in any consumer-facing copy regardless of register. `avoid` is for filler words, hedges, and apologetic vocabulary, which should generally not appear but can occasionally serve cadence or warmth in specific registers (legal, support, error). Copy validation reports `banned` hits as errors and `avoid` hits as warnings.
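The two severities can be sketched as a single pass over a piece of copy. This example assumes case-insensitive, whole-word matching, which the specification does not mandate; the function name `find_vocabulary_hits` is hypothetical.

```python
import re


def find_vocabulary_hits(copy: str, vocabulary: dict) -> list[tuple[str, str, str]]:
    """Return (finding, severity, phrase) for banned and avoid hits in copy."""
    findings = []
    for key, finding, severity in (
        ("banned", "banned-phrase-used", "error"),
        ("avoid", "avoid-phrase-used", "warning"),
    ):
        for phrase in vocabulary.get(key, []):
            # Assumption: match whole words only, so "just" does not hit "adjust".
            pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
            if re.search(pattern, copy, flags=re.IGNORECASE):
                findings.append((finding, severity, phrase))
    return findings


vocabulary = {
    "banned": ["take it to the next level", "game-changer", "supercharge"],
    "avoid": ["just", "very", "we hope"],
}
print(find_vocabulary_hits("This is just a game-changer for adjusters.", vocabulary))
```

A register-aware tool could downgrade or drop the `avoid` findings for registers such as `support` or `legal`, while leaving `banned` findings untouched.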

The `signature_phrases` list is intentionally distinct from `preferred`. Signature phrases are brand-owned (e.g., *Murder your thirst.* for Liquid Death, *Designed by Apple in California.* for Apple). A consumer tool must not transfer signature phrases across brands.

#### Reclaimed Terms

Some voices use ordinary words in deliberately unusual ways. The `reclaimed_terms` list flags these for consumer tools so that linters don't flag them as off-voice and prompt-builders don't strip their irony.

```yaml
vocabulary:
  reclaimed_terms:
    - term: healthy
      note: "Used straight-faced as wellness vocabulary, played for ironic contrast against the brand's violence imagery."
    - term: hydration
      note: "Reclaimed wellness term, used in deadpan voice to land the joke."
```

Reclaimed terms are not banned, not preferred — they are a third category. A consumer tool should preserve them in output and treat the `note` as guidance for register.

### Register Tokens

Register tokens describe how the voice adjusts across surfaces. Each register is a named context with override rules.

```yaml
register:
  <register_name>:
    formality: <override>           # optional, overrides voice.formality
    density: <override>             # optional
    warmth: <override>              # optional
    irony: <override>               # optional
    max_sentence_length: <number>   # optional, overrides rhythm.max_sentence_length
    notes: <prose>                  # optional, free text
```

Conventional register names — used by linters and tools — are:

`marketing | support | error | developer | newsroom | legal | sustainability | social | packaging`

Custom register names are permitted; consumers should preserve unknown registers without error.
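One way to read the override rules is as a shallow merge of the register's fields over the top-level defaults. The sketch below assumes tokens have already been parsed into a dictionary; `resolve_register` is a hypothetical helper, not a function of the reference CLI.

```python
def resolve_register(tokens: dict, register_name: str) -> dict:
    """Merge voice and rhythm defaults with one register's overrides."""
    effective = dict(tokens.get("voice", {}))
    rhythm = tokens.get("rhythm", {})
    if "max_sentence_length" in rhythm:
        effective["max_sentence_length"] = rhythm["max_sentence_length"]
    overrides = tokens.get("register", {}).get(register_name, {})
    for key, value in overrides.items():
        if key != "notes":  # `notes` is free text, not an override
            effective[key] = value
    return effective


tokens = {
    "voice": {"formality": "medium", "warmth": "medium"},
    "rhythm": {"max_sentence_length": 22},
    "register": {
        "legal": {"formality": "high", "max_sentence_length": 30, "notes": "Plain, complete sentences."},
    },
}
print(resolve_register(tokens, "legal"))
print(resolve_register(tokens, "packaging"))  # undeclared register falls back to defaults
```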

### Refusal Tokens

Refusals are the non-negotiable rules of the voice. They are listed as either named constants (linter-aware) or free-form strings.

```yaml
refusals:
  - <refusal_name_or_freeform_string>
```

The specification defines a starter set of refusal names. These names are recognized by linters and produce specific findings when violated. Unknown refusal names are preserved without error.

#### Named Refusals — Style and Voice

| Refusal name | Meaning |
|---|---|
| `no_apology_as_style` | Apologies must be substantive, not stylistic. |
| `no_exclamation_for_emphasis` | Exclamation points used only where genuinely earned. |
| `no_stacked_adjectives` | Three or more flat adjectives in a row are rejected. |
| `no_all_caps_for_emphasis` | ALL CAPS is not a substitute for word choice. |
| `no_mid_sentence_capitalization` | No Capitalization Mid-Sentence For Emphasis. |
| `no_marketing_cliches` | Banned-phrase enforcement is strict in marketing contexts. |
| `no_specs_in_marketing_headlines` | Specs follow narrative, not the other way around. |
| `no_introducing_as_opener` | "Introducing..." is a retired opener. |
| `no_version_2_framing` | "X 2.0" framing rejected. |
| `no_first_without_qualification` | Claims of "first" require specific qualification. |

#### Named Refusals — Ethics and Reader Treatment

| Refusal name | Meaning |
|---|---|
| `no_punching_down` | Voice does not target identity, vulnerable groups, or individuals. |
| `no_real_violence_references` | Where violence vocabulary is used, only cartoon horror — never real-world events. |
| `no_competitor_disparagement_by_name` | Comparisons are oblique. |
| `no_user_in_consumer_copy` | "User" is reserved for developer-facing surfaces. |
| `no_manufactured_urgency` | No "limited time," no countdown copy, no fear-driven pressure. |
| `no_ai_as_a_feature` | Don't sell "AI" as the feature. Name what the feature does. |

Brands may add their own refusal strings beyond this set. Consumers should preserve unknown refusals as free-form strings and treat them as advisory guidance.
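A consumer might separate linter-aware names from free-form strings along these lines. The set below copies the sixteen names from the two tables above; the helper name `partition_refusals` is an assumption made for illustration.

```python
NAMED_REFUSALS = frozenset({
    # Style and voice
    "no_apology_as_style", "no_exclamation_for_emphasis", "no_stacked_adjectives",
    "no_all_caps_for_emphasis", "no_mid_sentence_capitalization", "no_marketing_cliches",
    "no_specs_in_marketing_headlines", "no_introducing_as_opener", "no_version_2_framing",
    "no_first_without_qualification",
    # Ethics and reader treatment
    "no_punching_down", "no_real_violence_references", "no_competitor_disparagement_by_name",
    "no_user_in_consumer_copy", "no_manufactured_urgency", "no_ai_as_a_feature",
})


def partition_refusals(refusals: list[str]) -> tuple[list[str], list[str]]:
    """Split refusals into (named, free_form); nothing is dropped."""
    named = [r for r in refusals if r in NAMED_REFUSALS]
    free_form = [r for r in refusals if r not in NAMED_REFUSALS]
    return named, free_form


named, free_form = partition_refusals(["no_apology_as_style", "Never mention the weather."])
print(named)
print(free_form)
```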

### Cultural Reference Tokens

Cultural references shape feel rather than syntax. They are advisory, not lintable.

```yaml
references:
  drawn_from:
    - <cultural_touchstone>
  avoided:
    - <cultural_touchstone>
```

---

## Token Types

| Type | Format | Example |
|---|---|---|
| Categorical | One of an enumerated set | `low`, `medium`, `high` |
| Number | Float or integer | `12`, `0.7` |
| String | Quoted string | `"Murder your thirst."` |
| List | YAML sequence | `[crafted, considered, direct]` |
| Token Reference | `{path.to.token}` | `{vocabulary.preferred}` |

### Token References

A token may reference another token by path. This allows a register override to reuse a top-level token without restating it.

```yaml
voice:
  formality: low

register:
  error:
    formality: "{voice.formality}"   # explicitly reuses top-level value
  marketing:
    formality: high                  # overrides
```

Token references resolve at consumer time. A reference that does not resolve to a defined token produces a `broken-ref` linter error.
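Reference resolution can be sketched as a recursive walk over the parsed tokens. This example assumes a reference is a string consisting solely of `{path.to.token}` and resolves a single level only (a reference that points at another reference is returned as-is); the helper names are hypothetical.

```python
import re

REF = re.compile(r"^\{([A-Za-z0-9_.]+)\}$")


def lookup(tokens: dict, path: str):
    """Follow a dotted path through nested dicts, raising KeyError if absent."""
    node = tokens
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(path)
        node = node[part]
    return node


def resolve_refs(node, tokens: dict, errors: list):
    """Return a copy of node with references replaced; record broken ones."""
    if isinstance(node, dict):
        return {k: resolve_refs(v, tokens, errors) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_refs(v, tokens, errors) for v in node]
    if isinstance(node, str):
        match = REF.match(node)
        if match:
            try:
                return lookup(tokens, match.group(1))
            except KeyError:
                errors.append(("broken-ref", "error", node))
    return node


tokens = {
    "voice": {"formality": "low"},
    "register": {
        "error": {"formality": "{voice.formality}"},
        "marketing": {"formality": "high"},
        "legal": {"warmth": "{voice.warmth}"},  # voice.warmth is not defined
    },
}
errors = []
resolved = resolve_refs(tokens, tokens, errors)
print(resolved["register"]["error"]["formality"])
print(errors)
```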

---

## Section Order and Aliases

Sections in the markdown body use `##` headings. Sections may be omitted, but those present should appear in the canonical order below. Sections are referenced by name, not number: the canonical order is checked by linting, not by author-supplied numbering. Authors should not prefix section headings with numbers.

| Order | Section | Aliases |
|---|---|---|
| 1 | Voice Atmosphere | Overview, Brand Voice |
| 2 | Vocabulary Palette | Vocabulary |
| 3 | Sentence Rhythm | Rhythm, Cadence |
| 4 | Cultural References | Reference Universe |
| 5 | Tonal Modes | Register, Modes |
| 6 | Refusals | Hard Rules |
| 7 | Anti-patterns | Anti-patterns and Banned Phrases |
| 8 | Voice in Context | Surfaces, Applied Voice |
| 9 | Agent Prompt Guide | Prompt Guide, Implementation |

The canonical names are normative; aliases are accepted by consumers. Out-of-order sections produce a `section-order` linter warning, not an error — older files may have legitimate variations.
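The alias table and ordering check can be sketched as follows. The input is assumed to be the list of `##` heading texts in document order; case-insensitive matching and the numbering pattern are assumptions of this example, and `check_sections` is a hypothetical name.

```python
import re

CANONICAL = [
    ("Voice Atmosphere", ["Overview", "Brand Voice"]),
    ("Vocabulary Palette", ["Vocabulary"]),
    ("Sentence Rhythm", ["Rhythm", "Cadence"]),
    ("Cultural References", ["Reference Universe"]),
    ("Tonal Modes", ["Register", "Modes"]),
    ("Refusals", ["Hard Rules"]),
    ("Anti-patterns", ["Anti-patterns and Banned Phrases"]),
    ("Voice in Context", ["Surfaces", "Applied Voice"]),
    ("Agent Prompt Guide", ["Prompt Guide", "Implementation"]),
]
ORDER = {}
for index, (name, aliases) in enumerate(CANONICAL):
    for label in [name, *aliases]:
        ORDER[label.lower()] = (index, name)

NUMBER_PREFIX = re.compile(r"^\d+[.)]?\s+")  # e.g. "1. " or "2) "


def check_sections(headings: list[str]) -> list[tuple[str, str, str]]:
    """Return section-numbered and section-order warnings for heading texts."""
    findings = []
    last_index = -1
    for heading in headings:
        stripped = NUMBER_PREFIX.sub("", heading)
        if stripped != heading:
            findings.append(("section-numbered", "warning", heading))
        entry = ORDER.get(stripped.lower())
        if entry is None:
            continue  # unknown sections are preserved without error
        index, name = entry
        if index < last_index:
            findings.append(("section-order", "warning", name))
        last_index = max(last_index, index)
    return findings


print(check_sections(["Overview", "1. Cadence", "Vocabulary", "Our Story", "Refusals"]))
```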

### `extends` for Inherited Voice

A GUSTO.md file may reference another GUSTO.md file as a base. The top-level `extends` field holds a URL or path to the base file; the consumer must merge the base file's tokens with the local file, with local values taking precedence.

```yaml
extends: "./parent-brand.gusto.md"
name: "Sub-brand"
voice:
  warmth: high                       # overrides parent
```

This supports multi-brand systems where a parent voice has variants. Consumers should resolve `extends` chains to a maximum depth of five, and produce a `circular-extends` error on cycles.
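A possible shape for `extends` resolution is sketched below. To stay self-contained it reads parsed tokens from an in-memory dictionary keyed by path instead of fetching files. Several choices here are assumptions the specification does not settle: nested mappings merge key by key, local lists replace base lists, and a chain longer than five ancestors raises an error.

```python
MAX_DEPTH = 5


def deep_merge(base: dict, local: dict) -> dict:
    """Merge local over base; nested dicts merge, everything else is replaced."""
    merged = dict(base)
    for key, value in local.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_extends(path: str, files: dict, seen: tuple = ()) -> dict:
    """Resolve the extends chain starting at path using pre-parsed files."""
    if path in seen:
        raise ValueError("circular-extends: " + " -> ".join([*seen, path]))
    if len(seen) > MAX_DEPTH:
        # Assumption: exceeding the depth limit is treated as an error.
        raise ValueError("extends chain deeper than " + str(MAX_DEPTH))
    tokens = dict(files[path])
    parent = tokens.pop("extends", None)
    if parent is None:
        return tokens
    base = resolve_extends(parent, files, (*seen, path))
    return deep_merge(base, tokens)


files = {
    "./parent-brand.gusto.md": {"name": "Parent", "voice": {"warmth": "low", "irony": "low"}},
    "./sub-brand.gusto.md": {
        "extends": "./parent-brand.gusto.md",
        "name": "Sub-brand",
        "voice": {"warmth": "high"},
    },
}
print(resolve_extends("./sub-brand.gusto.md", files))
```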

---

## Consumer Behavior

Consumers of GUSTO.md files (linters, generators, AI agents, design tools) should behave predictably when they encounter content outside the spec.

| Scenario | Behavior |
|---|---|
| Unknown section heading | Preserve; do not error |
| Numbered section heading (e.g. `## 1. Voice Atmosphere`) | Strip numbering; resolve by name; produce `section-numbered` warning |
| Unknown token group | Preserve; produce `info` finding |
| Unknown axis under `voice` | Accept; produce `info` finding |
| Unknown refusal name | Preserve; treat as free-form string |
| Unknown register name | Preserve; apply overrides as given |
| Categorical value outside enum | `invalid-value` error |
| Numeric value outside expected range | `out-of-range` warning |
| Duplicate section heading | `duplicate-section` error; reject the file |
| Token reference does not resolve | `broken-ref` error |

This permissive posture is deliberate. The spec will grow; consumers built against `0.1` should not break against `0.2` files. Strict enforcement applies only to clearly malformed input.

---

## Linting Rules

The reference linter (`gusto-lint`) runs the following rules against a parsed GUSTO.md. Each rule produces findings at a fixed severity level.

| Rule | Severity | What it checks |
|---|---|---|
| `broken-ref` | error | Token references that don't resolve |
| `duplicate-section` | error | Same `##` heading appears twice |
| `invalid-value` | error | Categorical value outside enum, or wrong type |
| `circular-extends` | error | `extends` chain forms a cycle |
| `missing-name` | error | No `name` field |
| `section-order` | warning | Sections appear out of canonical order |
| `section-numbered` | warning | Section heading carries author-supplied numbering |
| `out-of-range` | warning | Numeric value outside expected range |
| `banned-in-preferred` | warning | A word appears in both `preferred` and `banned` |
| `reclaimed-in-banned` | warning | A `reclaimed_terms.term` value also appears in `banned` |
| `signature-thin` | warning | `signature_phrases` empty when `irony: high` or strongly stylized voice |
| `register-undefined` | warning | A register is referenced in prose but not declared in tokens |
| `token-summary` | info | Summary of how many tokens are defined in each group |
| `prose-thin` | info | A section heading exists but body is under 100 words |

Linting validates structure and consistency. It does not evaluate generated *copy* against the voice — that is a separate function, described below.
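Three of the consistency rules in the table can be sketched against parsed tokens as follows. This is not the `gusto-lint` implementation; the function name `lint_tokens` and the case-insensitive comparison are assumptions for illustration.

```python
def lint_tokens(tokens: dict) -> list[tuple[str, str, str]]:
    """Return (rule, severity, detail) for a few structural consistency checks."""
    findings = []
    if not tokens.get("name"):
        findings.append(("missing-name", "error", "no `name` field"))
    vocab = tokens.get("vocabulary", {})
    banned = {phrase.lower() for phrase in vocab.get("banned", [])}
    for phrase in vocab.get("preferred", []):
        if phrase.lower() in banned:
            findings.append(("banned-in-preferred", "warning", phrase))
    for entry in vocab.get("reclaimed_terms", []):
        if entry["term"].lower() in banned:
            findings.append(("reclaimed-in-banned", "warning", entry["term"]))
    return findings


tokens = {
    "vocabulary": {
        "preferred": ["crafted", "Supercharge"],
        "banned": ["supercharge", "hydration"],
        "reclaimed_terms": [{"term": "hydration", "note": "Deadpan wellness term."}],
    }
}
for finding in lint_tokens(tokens):
    print(finding)
```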

---

## Validation of Generated Copy

A linter validates the GUSTO.md file itself. A separate function — referred to here as **copy validation** — checks whether a piece of generated copy conforms to the file's rules.

Copy validation is the consumer's responsibility. The specification defines the surface for this validation: a copy validator that reads a GUSTO.md file and a piece of copy and returns findings.

The reference CLI exposes this as:

```
gusto check <copy.txt> --against GUSTO.md
```

Expected findings include:

| Finding | Severity | Trigger |
|---|---|---|
| `banned-phrase-used` | error | Copy contains a string from `vocabulary.banned` (hard-banned regardless of register) |
| `avoid-phrase-used` | warning | Copy contains a string from `vocabulary.avoid` (soft-avoid; register may justify) |
| `sentence-over-max` | warning | A sentence exceeds `rhythm.max_sentence_length` |
| `avg-length-drift` | warning | Average sentence length deviates from `rhythm.avg_sentence_length` by more than 30% |
| `exclamation-violation` | warning | Exclamation point used against `exclamation_policy` |
| `semicolon-violation` | warning | Semicolon used against `semicolon_policy` |
| `signature-phrase-foreign` | error | Signature phrase from another GUSTO.md appears in this copy |
| `refusal-suspected` | info | Heuristic match against a `refusals` rule (e.g., text matches an apology pattern when `no_apology_as_style` is declared) |

Copy validation is intentionally pragmatic. It catches obvious violations, not nuanced ones. Voice is partly judgment, and the validator does not pretend otherwise.
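The two sentence-length findings can be approximated with a naive sentence splitter. The sketch below assumes sentences end in `.`, `!`, or `?` followed by whitespace and counts whitespace-separated words, so abbreviations and decimals will be miscounted; `check_rhythm` is a hypothetical name, not the `gusto check` implementation.

```python
import re


def split_sentences(copy: str) -> list[str]:
    """Naive split on terminal punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", copy.strip())
    return [part for part in parts if part]


def check_rhythm(copy: str, rhythm: dict) -> list[tuple[str, str, str]]:
    """Return sentence-over-max and avg-length-drift warnings for copy."""
    findings = []
    lengths = [len(sentence.split()) for sentence in split_sentences(copy)]
    if not lengths:
        return findings
    max_len = rhythm.get("max_sentence_length")
    if max_len is not None:
        for count in lengths:
            if count > max_len:
                findings.append(("sentence-over-max", "warning", f"{count} words > {max_len}"))
    target = rhythm.get("avg_sentence_length")
    if target:
        average = sum(lengths) / len(lengths)
        # The 30% threshold comes from the findings table above.
        if abs(average - target) / target > 0.30:
            findings.append(("avg-length-drift", "warning", f"average {average:.1f} vs target {target}"))
    return findings


rhythm = {"avg_sentence_length": 14, "max_sentence_length": 22}
print(check_rhythm("Short one. Tiny.", rhythm))
```

Very short copy, such as a tagline, will almost always trip `avg-length-drift` under this approach, so a tool may want a minimum sentence count before applying it.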

---

## CLI Reference

A reference CLI is available as `@gusto-md/cli` on npm. All commands accept a file path or `-` for stdin and output JSON by default.

### `gusto lint`

Validate a GUSTO.md file for structural correctness.

```
npx @gusto-md/cli lint GUSTO.md
```

Exit code `1` if errors are found, `0` otherwise.

### `gusto check`

Validate a piece of copy against a GUSTO.md file.

```
npx @gusto-md/cli check hero.txt --against GUSTO.md
```

### `gusto diff`

Compare two GUSTO.md files and report token-level changes.

```
npx @gusto-md/cli diff GUSTO.md GUSTO-v2.md
```

Exit code `1` if regressions are detected (added refusals violated by historical copy, removed banned phrases, etc.).

### `gusto export`

Export a GUSTO.md file to other consumer formats.

```
npx @gusto-md/cli export --format system-prompt GUSTO.md > prompt.md
npx @gusto-md/cli export --format claude-project GUSTO.md > claude-config.json
```

| Format | Output | Description |
|---|---|---|
| `system-prompt` | Markdown | A ready-to-paste system prompt for an LLM |
| `claude-project` | JSON | Configuration for Anthropic Claude Projects |
| `openai-instructions` | Markdown | Custom instructions for OpenAI assistants |
| `json` | JSON | Pure data dump of all tokens |

### `gusto spec`

Output the GUSTO.md format specification (useful for injecting spec context into agent prompts).

```
npx @gusto-md/cli spec
npx @gusto-md/cli spec --rules-only --format json
```

---

## Integration Guidance for Vendors

Tools and platforms adopting GUSTO.md should follow these conventions.

**Read.** Accept a GUSTO.md file as an input artifact. Parse the YAML front matter as voice tokens. Treat the markdown body as guidance prose. Pass both to the underlying generation model — tokens as rules, prose as context.

**Write.** When producing a GUSTO.md from a tool's internal voice profile, emit tokens conformant to this specification. Preserve unknown fields when round-tripping.

**Validate.** Before generating copy on a user's behalf, lint the GUSTO.md and surface errors to the user. Do not silently ignore broken references or invalid values.

**Silent application.** The voice should be applied without surfacing the rules to the user. Generated copy should not include meta-commentary like "following your spec" or "according to your rules." The voice is the output, not the rules behind it. This is best-practice guidance rather than a strict requirement: output testing during the development of the Liquid Death exemplar (two surfaces, six runs total, May 2026) did not surface meta-commentary even without this guidance in place. The recommendation is included to guide vendor implementations toward the cleaner pattern.

**Round-trip.** A GUSTO.md exported from a tool, imported into a second tool, and exported again should remain equivalent at the token level. Prose may be reformatted but should not be discarded.

**Attribute.** A user-facing surface that consumes GUSTO.md should indicate which file is active, so the user can verify what voice is being applied.

A reference implementation, integration kit, and example consumer adapters are maintained in the gusto.md repository.

---

## Versioning and Compatibility

The format follows a `<major>.<minor>.<patch>` versioning scheme. Patch releases are non-breaking clarifications and additions to the named-constants sets. Minor releases may add new token groups or rules. Major releases may break compatibility. A version `1.0` will be declared when the specification has stabilized through real-world use.

| Change type | Version bump |
|---|---|
| Add a named refusal or named register | Patch |
| Clarify wording without changing behavior | Patch |
| Add a new token group, axis, or category | Minor |
| Add a new lint rule at `info` or `warning` severity | Minor |
| Rename a canonical section (with alias for back-compat) | Minor |
| Add a new lint rule at `error` severity | Major |
| Remove or rename a token without alias | Major |
| Change the semantics of an existing token | Major |

Consumers should declare which spec version they implement. Files should declare the version they target via the `version` field. Consumers encountering a file targeting a newer spec version should attempt best-effort parsing and surface an `info`-level finding.
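The newer-version case might be handled as in the sketch below, which assumes purely numeric dotted versions (a legacy value such as `"alpha"` would need separate handling). The finding name `newer-spec-version` is hypothetical; the specification only calls for an `info`-level finding.

```python
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "0.1.2" into (0, 1, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_version(file_version: Optional[str], implemented: str = "0.1.2") -> list[tuple[str, str, str]]:
    """Return an info finding when the file targets a newer spec version."""
    if file_version is None:
        return []  # `version` is optional
    if parse_version(file_version) > parse_version(implemented):
        detail = f"file targets {file_version}, consumer implements {implemented}"
        return [("newer-spec-version", "info", detail)]
    return []


print(check_version("0.2.0"))
print(check_version("0.1.1"))
```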

---

## Status and Roadmap

This specification is `0.1.2`. The format, schema, lint rules, and CLI surface are under active development. Breaking changes are expected before `1.0`.

Near-term priorities for the spec:

- Stabilize the voice axis enumeration
- Continue expanding the named refusals and named registers sets through exemplar authoring
- Define a JSON Schema for the YAML front matter
- Publish reference exemplar files alongside the spec (Apple and Liquid Death in this release; three additional voice families to follow)
- Ship `gusto check` as the first copy validator

The specification is authored and maintained by the gusto.md project. Issues, discussion, and proposed changes are tracked in the public repository.

---

## Changelog

### 0.1.2

- Split `vocabulary.banned` into `vocabulary.banned` (hard ban — error on use) and `vocabulary.avoid` (soft avoid — warning on use). The single `banned` list was treating context-blind hedges and marketing clichés at the same severity, causing models to over-correct and strip natural language flow. Apple exemplar migrates to the new shape (filler and hedge words moved to `avoid`); Liquid Death exemplar's list is uniformly hard-banned and stays in `banned` only.
- Added `avoid-phrase-used` copy-validation finding at warning severity, to accompany the existing `banned-phrase-used` finding (now scoped to hard bans only).
- Added clarification in the File Structure section that prose is the primary guidance for an agent producing copy, and tokens are the enforcement surface for lint and validation tools. The two-layer architecture was documented but the hierarchy was not; output testing in May 2026 (Apple, and Liquid Death on two surfaces) confirmed that prose carries voice atmosphere and tokens enforce vocabulary at the word level. A separate observation, that token-enforced cadence may compress brand-flavored vocabulary at tight length budgets, was surfaced in the same tests and is logged for monitoring across future exemplars rather than acted on at this version.
- Added "Silent application" paragraph to the Integration Guidance for Vendors section, recommending that voice be applied without surfacing the rules to the reader. Best-practice guidance, not an urgent fix: the Liquid Death output tests (two surfaces, six runs total) did not reproduce the May Apple regression in which the model cited the spec back to itself. The recommendation is included to guide vendor implementations toward the cleaner pattern.
- Updated minimal example and all internal version references from `0.1.1` to `0.1.2`.

### 0.1.1

- Renamed all internal section references from numbered (e.g. "Section 3.2") to named (e.g. "the Voice Tokens section"). Section numbers were a documentation convention, not a token, and exemplar authors were mistakenly numbering their markdown headings to match.
- Added `section-numbered` warning to the linter, with consumer behavior to strip author-supplied numbering automatically.
- Expanded named refusals from 8 to 16, split into two tables — style/voice and ethics/reader treatment. New names: `no_all_caps_for_emphasis`, `no_mid_sentence_capitalization`, `no_specs_in_marketing_headlines`, `no_introducing_as_opener`, `no_version_2_framing`, `no_first_without_qualification`, `no_manufactured_urgency`, `no_ai_as_a_feature`. Surfaced through writing the Apple exemplar.
- Added `Reclaimed Terms` subsection under Vocabulary Tokens with worked example. The schema field existed in 0.1 but had no example; needed before authoring brands that play with wellness or corporate vocabulary ironically.
- Added `reclaimed-in-banned` linter rule to catch authors who reclaim a term in one list and ban it in another.
- Updated minimal example and all references from `version: "alpha"` to `version: "0.1.1"`.

### 0.1 (alpha)

- Initial draft. Five voice axes, eight named refusals, nine canonical sections, reference CLI commands defined.

---

## License

This specification is published under the Apache License 2.0. Exemplar GUSTO.md files in the reference collection are published under the MIT License.

---

*GUSTO.md is the verbal companion to a DESIGN.md file. Where DESIGN.md captures the visual layer — colors, typography, spacing — GUSTO.md captures the verbal layer: vocabulary, rhythm, register, refusals, and references. Together, the two formats describe a brand a coding agent can read.*
