EvidencedBy

Methodology

How We Grade

Every outcome on EvidencedBy gets an A–F grade for the strength of its evidence: how confident we are that the effect is real and meaningful. A grade is not a direct measure of how large the effect is, nor of whether the intervention is safe or right for you.

Inputs

What goes into a grade

Each grade weighs five dimensions together. No single dimension is decisive: a huge effect in one tiny, unreplicated study does not earn a top grade.

Study design
Randomized controlled trials and meta-analyses of RCTs carry the most weight; prospective cohorts less; cross-sectional, open-label, or anecdotal reports least.
Sample size
Total number of participants across the evidence base. Small pilot studies (n<30) are treated as preliminary regardless of how clean the result looks.
Replication
Has the finding been independently reproduced by separate research groups? A single striking study, unreplicated, is capped well below the top grades.
Effect size
Is the effect large enough to matter in everyday life? A tightly proven but trivially small effect is graded lower on practical usefulness.
Recency & durability
Has the effect survived in recent, better-powered, pre-registered studies, or does it rest mainly on older work from before modern methodological standards?

The scale

A through F

  • A

    Strong evidence: consistent, high-quality evidence of a meaningful effect.

    Multiple well-powered RCTs and/or a robust meta-analysis, independently replicated, showing a consistent and practically meaningful effect that holds up in recent work.

  • B

    Moderate evidence: good evidence, with some gaps or inconsistency.

    At least one well-conducted RCT (or several smaller ones) with supportive replication, but limited by sample size, mixed results, or thin recent confirmation.

  • C

    Preliminary / mixed: suggestive but unsettled; the effect could go either way.

    Early or conflicting trials, small samples, or reliance on non-randomized designs. A real effect is plausible but not established.

  • D

    Weak evidence: little support; mostly indirect or low-quality data.

    Evidence limited to open-label, observational, mechanistic, or animal studies, or human trials with serious methodological problems.

  • E

    Very weak / anecdotal: essentially only theory or anecdote in humans.

    Claims rest on mechanism, tradition, or anecdote, with no credible human outcome trials.

  • F

    Evidence against: good evidence that the effect does NOT occur (for this outcome/population).

    Well-conducted studies show no effect, or the effect fails to replicate under rigorous testing.

One bar for everything

Supplements and practices, graded the same way

A breathing protocol and a supplement are held to the same evidentiary bar. The dimensions above apply equally to a randomized trial of a capsule and a randomized trial of a meditation program.