Methodology

How We Grade

Every outcome on EvidencedBy gets an A–F grade for the strength of the evidence: how confident we are that the effect is real and meaningful. A grade is not a direct measure of how large the effect is (effect size is only one of five inputs), and it says nothing about how safe or right the intervention is for you.

Inputs

What goes into a grade

Each grade weighs five dimensions together. No single one is decisive — a huge effect in one tiny, unreplicated study does not earn a top grade.

Study design: Randomized controlled trials and meta-analyses of RCTs carry the most weight; prospective cohorts less; cross-sectional, open-label, or anecdotal reports least.
Sample size: Total number of participants across the evidence base. Small pilot studies (n<30) are treated as preliminary regardless of how clean the result looks.
Replication: Has the finding been independently reproduced by separate research groups? A single striking study, unreplicated, is capped well below the top grades.
Effect size: Is the effect large enough to matter in everyday life? A tightly proven but trivially small effect is graded lower because it has little practical use.
Recency & durability: Has the effect survived in recent, better-powered, pre-registered studies, or does it rest mainly on older work that predates modern methodological standards?
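
Two of the rules above are explicit enough to sketch in code: studies with n<30 count as preliminary, and a finding from a single research group is capped below the top grades. The Python below is a minimal illustration, not EvidencedBy's actual procedure. The `Study` record, the `grade_ceiling` function, and the choice of C as the ceiling are all assumptions made for the example; the text does not say exactly where the cap sits.

```python
from dataclasses import dataclass

PILOT_THRESHOLD = 30   # from the text: n < 30 is treated as preliminary
ASSUMED_CAP = "C"      # assumption: the source does not say where the cap sits


@dataclass(frozen=True)
class Study:
    """Hypothetical record for one study in an outcome's evidence base."""
    design: str          # e.g. "rct", "cohort", "open_label"
    n: int               # participants in this study
    research_group: str  # used to judge independent replication


def grade_ceiling(studies: list[Study]) -> str:
    """Return the best grade the two caps would still allow.

    This is only a ceiling; the real grade also weighs design,
    effect size, and recency, which are not modelled here.
    """
    if not studies:
        raise ValueError("need at least one study")
    only_pilots = all(s.n < PILOT_THRESHOLD for s in studies)
    groups = {s.research_group for s in studies}
    if only_pilots or len(groups) < 2:
        return ASSUMED_CAP
    return "A"


single = [Study("rct", 400, "lab_a")]
replicated = single + [Study("rct", 250, "lab_b")]
print(grade_ceiling(single))      # capped: one group, no replication
print(grade_ceiling(replicated))  # no cap applies
```

Returning a ceiling rather than a grade keeps the sketch consistent with the point that no single dimension is decisive: the caps can only hold a grade down, never award one.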

The scale

A through F

A
Strong evidence — Consistent, high-quality evidence of a meaningful effect.
Multiple well-powered RCTs and/or a robust meta-analysis, independently replicated, showing a consistent and practically meaningful effect that holds up in recent work.
B
Moderate evidence — Good evidence, with some gaps or inconsistency.
At least one well-conducted RCT (or several smaller ones) with supportive replication, but limited by sample size, mixed results, or thin recent confirmation.
C
Preliminary / mixed — Suggestive but unsettled — could go either way.
Early or conflicting trials, small samples, or reliance on non-randomized designs. A real effect is plausible but not established.
D
Weak evidence — Little support; mostly indirect or low-quality data.
Evidence limited to open-label, observational, mechanistic, or animal studies, or human trials with serious methodological problems.
E
Very weak / anecdotal — Essentially only theory or anecdote in humans.
Claims rest on mechanism, tradition, or anecdote, with no credible human outcome trials.
F
Evidence against — Good evidence the effect does NOT occur (for this outcome/population).
Well-conducted studies show no effect, or the effect fails to replicate under rigorous testing.
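
For anyone handling these grades in code, the scale can be held as a simple lookup. The sketch below copies the labels from the scale above; the names `GRADE_LABELS`, `describe`, and `is_evidence_against` are hypothetical. It treats F separately because F does not mean "even less evidence than E": it means good evidence that the effect does not occur, so sorting outcomes alphabetically by grade would conflate two different things.

```python
# Labels copied from the scale above; the dict and helpers are illustrative.
GRADE_LABELS = {
    "A": "Strong evidence",
    "B": "Moderate evidence",
    "C": "Preliminary / mixed",
    "D": "Weak evidence",
    "E": "Very weak / anecdotal",
    "F": "Evidence against",
}


def describe(grade: str) -> str:
    """Return a short display string such as 'B: Moderate evidence'."""
    return f"{grade}: {GRADE_LABELS[grade]}"


def is_evidence_against(grade: str) -> bool:
    """True only for F, which signals evidence of no effect, not missing evidence."""
    if grade not in GRADE_LABELS:
        raise KeyError(grade)
    return grade == "F"


print(describe("C"))
print(is_evidence_against("E"), is_evidence_against("F"))
```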

One bar for everything

Supplements and practices, graded the same way

A breathing protocol and a supplement are held to the same evidentiary bar. The dimensions above apply equally to a randomized trial of a capsule and a randomized trial of a meditation program.