Labs · open research

Labs.

Experiments currently running on the practice. The hypothesis, the method, what gets published, and when. Built so a reader can argue with me before the results land.

running2026-06-01 → 2026-07-01

Claude vs GPT on brand-voice consistency — 30-day evaluation

Hypothesis. Same system prompt, same brand voice guidelines, same source material. Hypothesis: Claude holds brand voice more consistently across long documents; GPT wins on short headlines. We measure how many sentences need rewriting before the team would publish.

Method.

Pull 25 archived Margin Notes paragraphs (deck + body lines), strip the original phrasing.
Feed each to Claude and GPT with the same 400-word brand-voice system prompt.
Score each output 1–3 on: voice match, claim accuracy, would-publish-as-is.
Repeat across three model versions of each over the month.

What I'm publishing. The cleaned scorecard (anonymised samples) at the end of the 30 days. If a model surprises me materially before then, an interim Field Notes.

← Back to home Margin Notes · AI Marketing Audit