Literature Review

The research foundation.

Every capability in Hologram traces back to a specific body of peer-reviewed work on grading reform, learner modeling, and tutoring. This is the curated version of that reading list — organized by the three questions that matter most to a K-12 decision-maker.

Evidence is ranked strongest to weakest: meta-analysis › systematic review › randomized controlled trial › quasi-experimental › foundational / preprint. Our own efficacy data will come from the Fall 2026 pilot.

01 / Evidence

Standards-based grading

The research base for grading practices that report mastery by learning standard rather than rolled-up percentages.
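To make the contrast concrete, here is a minimal sketch of the two reporting styles. It is an illustration only, not Hologram's implementation: the standard labels (`STD-A`, `STD-B`), the scores, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical gradebook rows: (standard label, points earned, points possible).
# Labels and scores are invented for illustration.
ROWS = [
    ("STD-A", 9, 10),
    ("STD-A", 10, 10),
    ("STD-B", 4, 10),
    ("STD-B", 5, 10),
]


def rolled_up_percentage(rows):
    """Single overall percentage across every assessed item."""
    earned = sum(e for _, e, _ in rows)
    possible = sum(p for _, _, p in rows)
    return 100.0 * earned / possible


def percentage_by_standard(rows):
    """One percentage per standard, so a weak standard stays visible."""
    totals = defaultdict(lambda: [0, 0])
    for standard, earned, possible in rows:
        totals[standard][0] += earned
        totals[standard][1] += possible
    return {s: 100.0 * e / p for s, (e, p) in totals.items()}


if __name__ == "__main__":
    print(rolled_up_percentage(ROWS))
    print(percentage_by_standard(ROWS))
```

In this made-up gradebook the rolled-up figure hides the fact that one standard is strong and the other is weak, which is the information a per-standard report keeps.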

RCT

The Impacts of a Standards-Based Grading System Emphasizing Formative Assessment, Feedback, and Re-Assessment

Lawrence, N., Krier, K., & Posner, M. · Journal of Research on Educational Effectiveness (PARLO cluster RCT, 29 schools) · 2023

A cluster randomized controlled trial testing the PARLO (Proficiency-based Assessment and Reassessment of Learning Outcomes) system in 9th-grade mathematics. Students in the PARLO condition scored 0.33 SD higher on end-of-course algebra and geometry tests than controls — roughly 36–45% of a year of learning. Hologram's standards-level grading and re-assessment loop is built on this same intervention pattern.
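For readers unfamiliar with effect sizes reported in SD units, the sketch below shows the basic arithmetic of a standardized mean difference (difference in group means divided by the pooled standard deviation). It is a simplification under stated assumptions: the function name and sample scores are hypothetical, and a cluster RCT such as PARLO would estimate its effect with a model that accounts for students being nested in schools, which this sketch ignores.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(treatment, control):
    """Difference in means expressed in pooled-standard-deviation units.

    Simplified: treats every student as independent (no clustering).
    """
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Invented test scores, for illustration only.
    treated_scores = [72, 80, 68, 90, 77]
    control_scores = [70, 75, 66, 85, 74]
    print(round(standardized_mean_difference(treated_scores, control_scores), 2))
```

An effect of 0.33 SD means the treatment group's average sits about a third of a standard deviation above the control group's average on the same test.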

Meta-analysis / synthesis

Formative Assessment: A Meta-Analysis and a Call for Research

Kingston, N., & Nash, B. · Educational Measurement: Issues and Practice, 30(4), 28–37 · 2011

The most-cited independent meta-analysis of K-12 formative assessment. Across 42 independent effect sizes the median observed effect was 0.25 SD, with ELA outperforming math. Important as a reality check on the often-inflated claims of the Black and Wiliam era. Hologram’s design is calibrated to the rigorous end of this evidence base: feedback, re-assessment, and mastery tracking, not vibes.

Scholarly book

Grading With Integrity: A Research-Based Approach

Guskey, T. R. · Corwin Press · 2024

Guskey is the most-cited living scholar on classroom grading reform. His recent Corwin work frames the “knowing-doing gap” in standards-based grading: teachers believe in the approach, but lack tools that make it practical at scale. Hologram is built to close exactly that gap.

Policy data

Competency Education State by State

KnowledgeWorks · KnowledgeWorks.org · 2024

Tracks every U.S. state's posture on competency- and mastery-based education. Dozens of states have active SBG-supportive policies, with a clear directional trend away from seat-time and toward mastery. This is the policy tailwind behind our go-to-market timing.

02 / Method

Bayesian Knowledge Tracing

Thirty years of peer-reviewed work on probabilistic models of per-skill mastery — the technique Hologram uses to score every interaction.

Foundational

Knowledge tracing: Modeling the acquisition of procedural knowledge

Corbett, A. T., & Anderson, J. R. · User Modeling and User-Adapted Interaction, 4(4), 253–278 · 1994

The canonical Bayesian Knowledge Tracing paper. Introduces the four-parameter model (prior, learn rate, guess, slip) that Hologram still uses at its core to estimate per-skill mastery probabilities from observed student work. Thirty years of replication stand behind this model.
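The four parameters combine in a short two-step update: a Bayesian correction of the mastery estimate given the observed answer, followed by a chance of learning on that practice opportunity. The sketch below follows the standard textbook form of that update. The class name, function names, and parameter values are illustrative assumptions, not Hologram's production code or fitted values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    prior: float  # P(L0): probability the skill is mastered before any practice
    learn: float  # P(T): probability of learning the skill at each opportunity
    guess: float  # P(G): probability of a correct answer without mastery
    slip: float   # P(S): probability of an incorrect answer despite mastery


def bkt_update(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """Return the mastery probability after one observed response."""
    if correct:
        evidence_mastered = p_mastery * (1 - params.slip)
        evidence_unmastered = (1 - p_mastery) * params.guess
    else:
        evidence_mastered = p_mastery * params.slip
        evidence_unmastered = (1 - p_mastery) * (1 - params.guess)
    total = evidence_mastered + evidence_unmastered
    # Step 1: posterior probability of mastery given the response (Bayes' rule).
    posterior = evidence_mastered / total if total > 0 else p_mastery
    # Step 2: the student may also have learned the skill on this opportunity.
    return posterior + (1 - posterior) * params.learn


def trace(responses, params: BKTParams):
    """Mastery estimate after each response, starting from the prior."""
    p = params.prior
    history = []
    for correct in responses:
        p = bkt_update(p, correct, params)
        history.append(p)
    return history


if __name__ == "__main__":
    # Illustrative values only; real parameters are fitted per skill from data.
    demo = BKTParams(prior=0.3, learn=0.2, guess=0.2, slip=0.1)
    for step, p in enumerate(trace([True, False, True, True], demo), start=1):
        print(step, round(p, 3))
```

Because the guess and slip parameters are explicit, a single lucky answer or careless mistake moves the estimate less than a consistent run of responses does, which is part of why the model is easy to explain to teachers.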

Systematic review

Twenty-five years of Bayesian knowledge tracing: a systematic review

Šarić-Grgić, T., Grubišić, A., & Gašpar, A. · User Modeling and User-Adapted Interaction, 34, 1127–1173 · 2024

The most recent comprehensive synthesis of BKT research. Confirms BKT remains competitive with deep learning approaches on real K-12 data, is more interpretable, and requires dramatically less data to bootstrap — three properties that matter in a privacy-constrained K-12 setting.

Foundational

Deep Knowledge Tracing

Piech, C., Spencer, J., Huang, J., Ganguli, S., Sahami, M., Guibas, L., & Sohl-Dickstein, J. · Advances in Neural Information Processing Systems (NeurIPS) · 2015

Showed that recurrent neural networks can outperform classical BKT on some benchmarks. A later replication (Khajah et al. 2016) found the gap largely closes once BKT is extended with forgetting, skill discovery, and student-ability parameters. Hologram uses BKT as the interpretable default and reserves DKT-class methods for research, not student-facing decisions.

03 / Pedagogy

Socratic tutoring

Empirical evidence for tutoring that asks guiding questions rather than delivering answers — including the newest AI-tutoring RCTs.

RCT

AI tutoring outperforms in-class active learning: an RCT introducing a novel research-based design

Kestin, G., et al. · Scientific Reports, 15, 17458 · 2025

A randomized trial in an authentic undergraduate physics course where a Socratic, mastery-oriented AI tutor outperformed in-class active learning on the posttest. The authors attribute the gain to the tutor's research-based design rather than to AI tutoring in general.

Preprint · RCT

Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise

Wang, R. E., et al. · Preprint · Stanford + Harvard · arXiv:2410.03017 · 2024

A preregistered RCT of 900 tutors and 1,800 K-12 students from under-served communities, in mathematics. Students paired with AI-assisted tutors were 4 percentage points more likely to master topics (p < 0.01), with a 9 p.p. gain for students of lower-rated tutors. Direct evidence that AI support which steers human tutors toward guiding questions lifts real K-12 outcomes.

Preprint · RCT

Socratic AI in K-12 Science Classrooms: Effects on Critical Thinking, Motivation, and Self-Regulation

Kao, Grant, & Woltering · Preprint · Research Square · 2025

One of the first K-12 RCTs specifically testing Socratic AI dialogue against both control and non-Socratic AI conditions. The Socratic condition produced significantly greater gains in scientific argumentation, critical thinking, and metacognitive self-regulation. As a preprint, it is a strong signal that is still awaiting peer review.

This evidence is the floor, not the ceiling.

Hologram’s own efficacy results will come from the Fall 2026 pilot, measured against the same evidence standards you see above. If you’d like to be part of that cohort or review the full research notes, get in touch.