Methodology
How We Grade
Every outcome on EvidencedBy gets an A–F grade for the strength of the evidence, meaning how confident we are that the effect is real and meaningful. A grade is not a ranking by effect size, which is only one of five inputs, nor a judgment of how safe or right the intervention is for you.
Inputs
What goes into a grade
Each grade weighs five dimensions together. No single one is decisive — a huge effect in one tiny, unreplicated study does not earn a top grade.
- Study design: Randomized controlled trials (RCTs) and meta-analyses of RCTs carry the most weight; prospective cohorts less; cross-sectional, open-label, or anecdotal reports least.
- Sample size: Total number of participants across the evidence base. Small pilot studies (n < 30) are treated as preliminary regardless of how clean the result looks.
- Replication: Has the finding been independently reproduced by separate research groups? A single striking study, unreplicated, is capped well below the top grades.
- Effect size: Is the effect large enough to matter in everyday life? A tightly proven but trivially small effect is graded lower because it has little practical use.
- Recency & durability: Has the effect survived in recent, better-powered, pre-registered studies, or does it rest mainly on older work from before modern methodological standards?
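As a rough illustration only, the two explicit caps in the list above (small pilots and unreplicated findings) could be sketched in Python like this. The `Evidence` fields, the `grade_ceiling` helper, and the choice of C as the ceiling are all assumptions made for the sketch; the text says only that such evidence is "preliminary" or "capped well below the top grades", and it does not describe a formula for combining the five dimensions.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical record of the five inputs; field names are illustrative."""
    best_design: str            # e.g. "rct", "cohort", "cross_sectional", "anecdote"
    total_participants: int     # summed across the evidence base
    independent_groups: int     # separate research groups reporting the finding
    meaningful_effect: bool     # large enough to matter in everyday life
    recent_confirmation: bool   # holds up in recent, pre-registered studies


def grade_ceiling(ev: Evidence) -> str:
    """Return the best grade the two caps allow (assumed ceilings)."""
    ceiling = "A"
    # Letters later in the alphabet are weaker grades, so max() on the
    # letters keeps whichever ceiling is stricter.
    if ev.total_participants < 30:
        # Small pilots are "preliminary"; assumed here to mean at most C.
        ceiling = max(ceiling, "C")
    if ev.independent_groups < 2:
        # A single unreplicated finding; assumed here to mean at most C.
        ceiling = max(ceiling, "C")
    return ceiling
```

A ceiling is only an upper bound: evidence that clears both caps still has to earn its grade on all five dimensions together.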
The scale
A through F
- A
Strong evidence — Consistent, high-quality evidence of a meaningful effect.
Multiple well-powered RCTs and/or a robust meta-analysis, independently replicated, showing a consistent and practically meaningful effect that holds up in recent work.
- B
Moderate evidence — Good evidence, with some gaps or inconsistency.
At least one well-conducted RCT (or several smaller ones) with supportive replication, but limited by sample size, mixed results, or thin recent confirmation.
- C
Preliminary / mixed — Suggestive but unsettled — could go either way.
Early or conflicting trials, small samples, or reliance on non-randomized designs. A real effect is plausible but not established.
- D
Weak evidence — Little support; mostly indirect or low-quality data.
Evidence limited to open-label, observational, mechanistic, or animal studies, or human trials with serious methodological problems.
- E
Very weak / anecdotal — Essentially only theory or anecdote in humans.
Claims rest on mechanism, tradition, or anecdote, with no credible human outcome trials.
- F
Evidence against — Good evidence the effect does NOT occur (for this outcome/population).
Well-conducted studies show no effect, or the effect fails to replicate under rigorous testing.
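One detail of the scale is easy to miss: F is not simply one step weaker than E. E means there are no credible human trials, while F means good trials exist and point to no effect. A minimal Python sketch of the scale as a lookup makes that distinction explicit; the `Grade` enum and both helper functions are hypothetical names used only for illustration.

```python
from enum import Enum


class Grade(Enum):
    """Letter grades with the short labels used in the scale above."""
    A = "Strong evidence"
    B = "Moderate evidence"
    C = "Preliminary / mixed"
    D = "Weak evidence"
    E = "Very weak / anecdotal"
    F = "Evidence against"


def is_evidence_against(grade: Grade) -> bool:
    """True only for F: good evidence that the effect does not occur."""
    return grade is Grade.F


def describe(letter: str) -> str:
    """Format a grade letter with its label, e.g. for display next to an outcome."""
    grade = Grade[letter.upper()]
    return f"{grade.name}: {grade.value}"
```

Treating F as a separate case, rather than as the bottom of a single ladder, keeps "we do not know" (E) distinct from "we looked carefully and found nothing" (F).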
One bar for everything
Supplements and practices, graded the same way
A breathing protocol and a supplement are held to the same evidentiary bar. The dimensions above apply equally to a randomized trial of a capsule and a randomized trial of a meditation program.