The most important number in AI in education right now is negative. In a 2024 randomized controlled trial by Bastani et al. at Wharton, students given unguided ChatGPT access scored 17% worse on exams taken without AI than students who never used it.
Students given the same model wrapped in tutoring guardrails improved roughly 127% on practice problems.
That single split explains most of what happened in 2024 through 2026: South Korea pausing the world's largest AI textbook rollout, the EU classifying educational AI as high-risk, and parents in Santa Barbara organizing under the banner "Pencils, Not Pixels."
TL;DR: AI tutoring produces large, real learning gains when it's pedagogically scaffolded, and measurable harm when it isn't. Meanwhile, the governance layer (the EU AI Act, FTC enforcement, child-rights frameworks) is hardening fast, and South Korea's August 2024 rollback of its AI Digital Textbook shows that even well-funded national deployments stall without teacher and parent buy-in. The deciding variable is design and governance, since the underlying models are now broadly similar.
Here is the quotable version: in education, the AI model is a commodity; the pedagogy wrapped around it is the product.
Key takeaways
- Guardrailed AI tutoring delivered ~127% practice gains in the Wharton RCT; unguided access produced -17% on no-AI transfer exams.
- South Korea deferred its AI Digital Textbook (AIDT) rollout on 4 August 2024 after union and parent opposition; only 32.4% of schools opted into the 2025 pilot.
- The EU AI Act treats admission, grading, and cheating-detection systems as high-risk, and has banned emotion recognition in classrooms since 2 February 2025.
- School surveillance and proctoring tools (Bark, Gaggle, Proctorio) are a mature market with almost no independent efficacy research behind them.
- Bias findings reproduce across systems: commercial speech recognition shows ~35% word-error rates for African American English speakers versus ~19% for white speakers, per Koenecke et al. (PNAS 2020).
What does the evidence say about personalized learning?
Personalized learning works, with two big conditions: the gains concentrate among low prior-achievers, and the tool must constrain how students use it.
The market context first. HolonIQ projects global education spending will reach $10 trillion by 2030, and Market.us forecasts a 35.1% CAGR for the AI sub-market. Platforms like Khan Academy's Khanmigo, Carnegie Learning's MATHia, and McGraw Hill's ALEKS anchor the commercial landscape, using techniques that range from Bayesian Knowledge Tracing to LLM-driven tutoring.
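Bayesian Knowledge Tracing, one of the techniques named above, models a student's mastery of a single skill as a hidden yes/no state. The estimate is updated after every answer. Below is a minimal sketch of the standard four-parameter update. The parameter values and function names are illustrative assumptions. They are not values or code taken from MATHia, ALEKS, or any other vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    """Four-parameter BKT model for one skill. Default values are illustrative only."""

    p_init: float = 0.2      # P(L0): probability the skill is mastered before practice
    p_transit: float = 0.15  # P(T): probability of learning at each practice opportunity
    p_slip: float = 0.1      # P(S): probability of answering wrong despite mastery
    p_guess: float = 0.25    # P(G): probability of answering right without mastery


def predict_correct(p_mastery: float, params: BKTParams) -> float:
    """Probability that the next answer is correct, given the current mastery estimate."""
    return p_mastery * (1 - params.p_slip) + (1 - p_mastery) * params.p_guess


def update_mastery(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """Condition on one observed answer (Bayes' rule), then apply the learning transition."""
    if correct:
        mastered = p_mastery * (1 - params.p_slip)
        not_mastered = (1 - p_mastery) * params.p_guess
    else:
        mastered = p_mastery * params.p_slip
        not_mastered = (1 - p_mastery) * (1 - params.p_guess)
    posterior = mastered / (mastered + not_mastered)
    # The student may also have learned from this practice opportunity.
    return posterior + (1 - posterior) * params.p_transit


def trace(answers: list[bool], params: BKTParams = BKTParams()) -> list[float]:
    """Mastery estimate after each answer in a sequence."""
    p = params.p_init
    estimates = []
    for correct in answers:
        p = update_mastery(p, correct, params)
        estimates.append(p)
    return estimates


if __name__ == "__main__":
    for step, p in enumerate(trace([False, True, True, True]), start=1):
        print(f"after answer {step}: P(mastered) = {p:.3f}")
```

A tutor built this way would typically stop assigning practice on a skill once the estimate crosses a mastery threshold. That threshold, for example 0.95, is another tunable assumption.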
But vendor claims and independent evidence are different things. A 2025 systematic review in Discover Education finds measurable but uneven effect sizes, strongest in mathematics and among students starting furthest behind. No leading K-12 platform has an independent RCT matching the rigor of the Wharton study.
The Wharton numbers deserve a closer look.
The unguided group looked productive during practice and then performed worse than the control group when the AI was taken away. Students had outsourced the thinking. Any school evaluating an AI tutor should treat "what happens when the tool is removed" as the primary metric, because practice-session gains can mask exactly this failure mode.
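The "remove the tool" metric can be made concrete. The sketch below compares a treatment arm with the control group twice: once on AI-assisted practice and once on an exam taken without AI. It flags the pattern where practice improves but unassisted performance falls. The function names and example scores are hypothetical. They are not the study's data or analysis code.

```python
from dataclasses import dataclass


def relative_change(treatment_mean: float, control_mean: float) -> float:
    """Percentage difference of a treatment mean relative to the control mean."""
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return 100.0 * (treatment_mean - control_mean) / control_mean


@dataclass
class ArmResult:
    name: str
    practice_change: float  # % vs control on AI-assisted practice
    transfer_change: float  # % vs control on an exam taken without AI


def evaluate_arm(name: str, practice: float, practice_control: float,
                 exam: float, exam_control: float) -> ArmResult:
    return ArmResult(name,
                     relative_change(practice, practice_control),
                     relative_change(exam, exam_control))


def masks_outsourcing(result: ArmResult) -> bool:
    """Flag the failure mode: practice gains alongside worse unassisted performance."""
    return result.practice_change > 0 and result.transfer_change < 0


if __name__ == "__main__":
    # Hypothetical mean scores (0-100), not data from the Wharton study.
    arm = evaluate_arm("unguided", practice=60, practice_control=40, exam=51, exam_control=60)
    print(arm, "masks outsourcing:", masks_outsourcing(arm))
```

The key design choice is reporting both numbers side by side. A dashboard that showed only `practice_change` would have made this arm look like a success.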
How is AI surveillance in schools actually deployed?
AI surveillance in schools splits into two product categories: always-on communication monitoring in K-12, and webcam-based exam proctoring in higher ed. Both are commercially mature and empirically under-evaluated.
On the monitoring side, products like Bark, Gaggle, Securly, and GoGuardian scan student messages, email, and browsing on school accounts, generating alerts for indicators of bullying, self-harm, or predation. The pitch is child safety.
The gap is that no peer-reviewed study in the public record measures these systems' true-positive and false-positive rates against a gold standard.
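Such a study would score the system's alerts against a hand-labeled gold standard. The sketch below shows what it would measure. It also shows why precision matters: when genuine risk is rare, a system with a high detection rate can still produce mostly false alarms. All counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertMetrics:
    true_positive_rate: float   # concerning messages that triggered an alert / all concerning messages
    false_positive_rate: float  # benign messages that triggered an alert / all benign messages
    precision: float            # concerning messages among all alerts


def _safe_ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else float("nan")


def score_alerts(alerted: list[bool], gold: list[bool]) -> AlertMetrics:
    """Compare a system's alerts with gold-standard labels (True = genuinely concerning)."""
    if len(alerted) != len(gold):
        raise ValueError("alerted and gold must have the same length")
    tp = sum(1 for a, g in zip(alerted, gold) if a and g)
    fp = sum(1 for a, g in zip(alerted, gold) if a and not g)
    fn = sum(1 for a, g in zip(alerted, gold) if not a and g)
    tn = len(gold) - tp - fp - fn
    return AlertMetrics(
        true_positive_rate=_safe_ratio(tp, tp + fn),
        false_positive_rate=_safe_ratio(fp, fp + tn),
        precision=_safe_ratio(tp, tp + fp),
    )


if __name__ == "__main__":
    # Hypothetical audit: 10,000 messages, 20 genuinely concerning.
    # The system catches 18 of the 20 and also flags 200 of the 9,980 benign messages.
    gold = [True] * 20 + [False] * 9_980
    alerted = [True] * 18 + [False] * 2 + [True] * 200 + [False] * 9_780
    m = score_alerts(alerted, gold)
    print(f"TPR={m.true_positive_rate:.2f} FPR={m.false_positive_rate:.3f} precision={m.precision:.3f}")
```

In this hypothetical, the system detects 90% of concerning messages and has a false-positive rate of about 2%. Even so, about 92% of its alerts concern students who were never at risk.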
There's also a quieter cost. Privacy advocates point to a chilling effect documented in workplace monitoring research: students who know an algorithm reads their messages write differently, even when no alert ever fires.
Proctoring has drawn sharper criticism. Proctorio, the category leader, received the 2021 BigBrotherAwards in education, with the laudation cataloging concerns about continuous webcam monitoring and biometric data collection. Documented disputes at multiple universities involve systems flagging eye movements, fidgeting, and reading aloud, behaviors common among students with ADHD, autism, or anxiety.
A practical mitigation exists, though. Universities like Oregon State publish deployment guidance that pairs proctoring with human review of every flag. If your institution uses these tools, mandatory human adjudication before any academic-integrity action is the minimum defensible configuration.
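In software terms, mandatory human adjudication is a gate. An automated flag can open a review, but no academic-integrity case proceeds until a named human reviewer has confirmed it. Below is a hypothetical sketch. The class, enum, and field names are illustrative and do not correspond to any proctoring vendor's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewOutcome(Enum):
    PENDING = "pending"
    DISMISSED = "dismissed"   # reviewer found no integrity concern
    CONFIRMED = "confirmed"   # reviewer judged the flag credible


@dataclass
class ProctoringFlag:
    student_id: str
    reason: str                     # e.g. "gaze off-screen", as reported by the tool
    reviewer: Optional[str] = None  # human who examined the recording
    outcome: ReviewOutcome = ReviewOutcome.PENDING


def record_review(flag: ProctoringFlag, reviewer: str, outcome: ReviewOutcome) -> None:
    """Attach a human decision to an automated flag."""
    if outcome is ReviewOutcome.PENDING:
        raise ValueError("a review must reach a decision")
    flag.reviewer = reviewer
    flag.outcome = outcome


def may_open_integrity_case(flag: ProctoringFlag) -> bool:
    """An automated flag alone is never sufficient; a named human must confirm it."""
    return flag.reviewer is not None and flag.outcome is ReviewOutcome.CONFIRMED
```

Reviewers would also want any accommodations on file in view. The behaviors these tools flag overlap with those common among students with ADHD, autism, or anxiety.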
What happened with South Korea's AI Digital Textbook?
South Korea ran the world's most ambitious national AI classroom deployment, then paused it. That sequence is the most instructive case study in AI governance in schools to date.
The AI Digital Textbook (AIDT) program, led by KERIS and the Ministry of Education, is a cloud platform that personalizes problem sets, tracks learning analytics, and feeds teacher dashboards. The 2025 pilot covered English, math, and computer science across selected grades, with 76 approved AIDT products. Of 11,932 schools, 3,870 (32.4%) opted in.
Then came the reversal. On 4 August 2024, the Ministry of Education announced it was deferring full national deployment. The Korean Teachers and Education Workers' Union had raised concerns about workload, screen time, and algorithmic instruction substituting for teacher judgment.
Parents objected to the granular learning-behavior data AIDT collects on minors, data governed by Korea's Personal Information Protection Act, which requires parental consent for children under 14 and carries penalties up to 3% of relevant revenue.
The lesson generalizes. South Korea has centralized education governance, a strong EdTech industry, and more than a decade of national digital-education policy, including its 2011 SMART Education initiative. If a top-down AI mandate can stall there, it can stall anywhere teachers and parents aren't convinced.
A 32.4% opt-in rate was a signal, and the government read it.
How does Serbia's approach differ from Korea's?
Serbia chose curriculum over platform. Its National AI Strategy 2020-2025, adopted in December 2019 with UNDP and World Bank support, names education as one of four priority sectors.
The flagship program, AI4Youth, is run by the non-profit Petlja Foundation with the Ministry of Education, UNDP, UNICEF, and UNESCO, and teaches secondary students AI concepts, machine learning, and Python.
| Dimension | South Korea (AIDT) | Serbia (AI4Youth) |
|---|---|---|
| Model | Top-down national platform | Donor-supported curriculum program |
| Lead actor | KERIS (government agency) | Petlja Foundation (NGO) + Ministry |
| Target | Instructional core (AI textbooks) | AI literacy and teacher training |
| Scale signal | 3,870 schools (32.4%) in 2025 pilot | Multiple secondary schools, cohort-based |
| Status | Deferred since 4 Aug 2024 | Proceeding without major controversy |
The trade-offs cut both ways. Serbia's approach generates no surveillance controversy and builds durable human capacity, but it has no rigorous impact evaluation and touches far fewer students per year.
Korea's approach could move learning outcomes at national scale, but it is currently parked. For ministries watching both, the synthesis is to sequence them: teacher capacity and AI literacy first, platform deployment second.
What does ethical AI in education require now?
Ethical AI in education stopped being a discussion topic and became a compliance regime between 2023 and 2025. Four layers now interlock.
Hard law. The EU AI Act classifies educational admission, learning-outcome evaluation, test scoring, and cheating detection as high-risk under Annex III. Its Article 5 prohibition on emotion recognition in education has been in force since 2 February 2025. A Digital Omnibus agreement in May 2026 pushed the high-risk obligations to 2 December 2027, but the prohibitions stand.
Enforcement. The FTC's Edmodo order (May 2023, $6 million penalty suspended to $650,000) was the first to prohibit an EdTech company from demanding more student data than an educational service reasonably needs, and from outsourcing COPPA compliance to school districts. In the US, FERPA covers school-held records, but the FTC is the regulator vendors now fear.
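Data minimization of the kind the Edmodo order requires can be enforced at the point of collection. In the hypothetical sketch below, each feature declares the student fields it needs. Any request carrying extra fields is rejected rather than stored. The feature and field names are illustrative assumptions, not terms from the order.

```python
# Hypothetical allowlists: each feature declares the only student fields it may collect.
REQUIRED_FIELDS: dict[str, frozenset[str]] = {
    "assignment_submission": frozenset({"student_id", "class_id", "submission_text"}),
    "progress_report": frozenset({"student_id", "class_id", "scores"}),
}


class OverCollectionError(ValueError):
    """Raised when a request carries student data the feature does not need."""


def collect(feature: str, record: dict) -> dict:
    """Accept a student record only if every field is on the feature's allowlist."""
    allowed = REQUIRED_FIELDS.get(feature)
    if allowed is None:
        raise ValueError(f"no data allowlist declared for feature {feature!r}")
    extra = set(record) - allowed
    if extra:
        # Reject instead of silently storing (or silently dropping) the extra fields,
        # so over-collection surfaces during development rather than in production.
        raise OverCollectionError(f"{feature} does not need: {sorted(extra)}")
    return dict(record)
```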
Soft law. UNESCO's 2023 guidance on generative AI recommends a minimum age of 13 for GenAI interaction and frames the technology strictly as a support tool. The UN Committee on the Rights of the Child's General Comment No. 25 requires states to protect children from surveillance and arbitrary data collection. Age thresholds worldwide are converging on 13-14: COPPA at 13, GDPR Article 8 at 13-16, Korea at 14.
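A platform might encode these age thresholds as a lookup it consults before enabling GenAI features for a student. The sketch below is simplified and assumes the jurisdiction keys and function names shown, all of which are hypothetical. It encodes only the headline ages cited above and ignores each law's exceptions. UNESCO's age of 13 is left out because it is a recommendation, not a legal threshold.

```python
from typing import Iterable, Optional

# Headline ages cited in the text; each law has exceptions this sketch ignores.
CONSENT_AGE = {
    "US_COPPA": 13,  # parental consent needed for children under 13
    "KR_PIPA": 14,   # parental consent needed for children under 14
}
GDPR_DEFAULT_AGE = 16  # Article 8 default
GDPR_LOWEST_AGE = 13   # lowest age a member state may set


def consent_threshold(jurisdiction: str, gdpr_member_state_age: Optional[int] = None) -> int:
    """Age below which parental consent is required in the given jurisdiction."""
    if jurisdiction == "EU_GDPR":
        age = GDPR_DEFAULT_AGE if gdpr_member_state_age is None else gdpr_member_state_age
        if not GDPR_LOWEST_AGE <= age <= GDPR_DEFAULT_AGE:
            raise ValueError("GDPR Article 8 lets member states choose an age from 13 to 16")
        return age
    if jurisdiction not in CONSENT_AGE:
        raise ValueError(f"unknown jurisdiction: {jurisdiction}")
    return CONSENT_AGE[jurisdiction]


def needs_parental_consent(age: int, jurisdiction: str,
                           gdpr_member_state_age: Optional[int] = None) -> bool:
    return age < consent_threshold(jurisdiction, gdpr_member_state_age)


def strictest_threshold(jurisdictions: Iterable[str],
                        gdpr_member_state_age: Optional[int] = None) -> int:
    """A platform serving several jurisdictions can apply the highest threshold everywhere."""
    return max(consent_threshold(j, gdpr_member_state_age) for j in jurisdictions)
```

Applying the strictest threshold everywhere is the simplest conservative choice. The trade-off is that it locks older students out in jurisdictions with lower thresholds.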
Bias evidence. This layer is reproducible, which makes it actionable. Koenecke et al. (PNAS 2020) found commercial speech recognition systems hit roughly 35% word-error rates for African American English speakers versus 19% for white speakers, a direct problem for speech-based tutoring. NIST's 2019 vendor test found face-matching false positives 10 to 100 times higher for Black and Asian faces. Švábenský et al. (EDM 2024) extended this to learning analytics, documenting regional bias in models predicting Filipino students' performance across 48.7 million Canvas log records.
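Speech-recognition bias audits rest on a simple metric: word error rate (WER). WER is the word-level edit distance between a reference transcript and the system's output, divided by the number of reference words. The sketch below runs a per-group audit. The group labels and transcripts are hypothetical, and a real audit would use large, matched samples.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                           # reference word deleted
                curr[j - 1] + 1,                       # extra word inserted
                prev[j - 1] + (ref_word != hyp_word),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_by_group(samples: Iterable[Tuple[str, str, str]]) -> dict:
    """Mean per-utterance WER for each speaker group; samples are (group, reference, hypothesis)."""
    scores = defaultdict(list)
    for group, reference, hypothesis in samples:
        scores[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(v) / len(v) for group, v in scores.items()}


def disparity_ratio(group_wer: dict) -> float:
    """Highest group WER divided by lowest; 1.0 means parity."""
    lowest = min(group_wer.values())
    if lowest == 0:
        raise ValueError("ratio is undefined when a group has zero WER")
    return max(group_wer.values()) / lowest
```

Applied to the averages Koenecke et al. report (about 0.35 versus 0.19), the disparity ratio is roughly 1.8.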
What this means for you
If you run a district, school, or EdTech procurement process, the evidence supports a short list of moves you can make now.
Buy scaffolding, not chatbots. Require vendors to show how their tool constrains student interaction toward productive struggle. The Wharton RCT is your reference: open-ended access to a frontier model is the configuration with documented negative effects.
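One way a tool might constrain interaction toward productive struggle is a hint ladder. Support escalates from a nudge toward a worked step only as the student makes distinct attempts, and the full solution comes last. The sketch below is hypothetical. The tier names and the attempt threshold are assumptions, and it does not describe the tutor used in the Wharton study.

```python
from dataclasses import dataclass, field

# Hypothetical support tiers, from least to most revealing.
HINT_TIERS = ("conceptual_nudge", "targeted_hint", "worked_step")
SOLUTION_AFTER_ATTEMPTS = 4  # hypothetical: full solution only after sustained effort


@dataclass
class ProblemSession:
    attempts: list[str] = field(default_factory=list)

    def record_attempt(self, answer: str) -> None:
        """Store a student attempt; blank submissions are not counted as effort."""
        cleaned = answer.strip()
        if cleaned:
            self.attempts.append(cleaned)

    def next_support(self) -> str:
        """Most revealing support earned so far; repeating the same answer does not escalate."""
        distinct = len(set(self.attempts))
        if distinct == 0:
            return "ask_student_to_try_first"
        if distinct >= SOLUTION_AFTER_ATTEMPTS:
            return "full_solution"
        return HINT_TIERS[min(distinct, len(HINT_TIERS)) - 1]
```

A real tutor would pair each tier with model-generated text. The point of the sketch is that the escalation policy, not the model, decides how much is revealed.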
Treat surveillance and tutoring as separate decisions. They arrive bundled in platform contracts, but their evidence bases and legal exposure differ completely. McKinsey's K-12 teacher analysis estimates that 20-40% of teacher hours go to activities that existing technology could automate; that productivity case stands on its own without any monitoring component.
Map your tools to Annex III now. If you operate in or sell into the EU, anything touching grading, admission, or cheating detection carries high-risk obligations from December 2027, and the emotion-recognition ban already applies.
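A first-pass inventory can be as simple as tagging each deployed tool with the functions it performs. Those tags are then checked against the categories and dates named in the text. The sketch below is a hypothetical triage aid. The function tags and tool names are assumptions, and it paraphrases the categories rather than modeling the Act's exceptions.

```python
from datetime import date
from typing import Set

# Paraphrased categories from the text; exceptions in the Act are not modeled.
PROHIBITED = {"emotion_recognition"}
HIGH_RISK = {"admission", "learning_outcome_evaluation", "test_scoring", "cheating_detection"}
PROHIBITION_APPLIES = date(2025, 2, 2)
HIGH_RISK_APPLIES = date(2027, 12, 2)  # deferred date cited in the text


def classify(functions: Set[str]) -> str:
    """'not_listed' means none of the categories above, not a legal finding of low risk."""
    if functions & PROHIBITED:
        return "prohibited"
    if functions & HIGH_RISK:
        return "high_risk"
    return "not_listed"


def obligations_apply(functions: Set[str], on: date) -> bool:
    """Whether the tool's tier already carries obligations on the given date."""
    tier = classify(functions)
    if tier == "prohibited":
        return on >= PROHIBITION_APPLIES
    if tier == "high_risk":
        return on >= HIGH_RISK_APPLIES
    return False


if __name__ == "__main__":
    # Hypothetical district inventory: tool name -> functions it performs.
    inventory = {
        "essay_grader": {"learning_outcome_evaluation"},
        "exam_proctoring": {"cheating_detection", "emotion_recognition"},
        "lesson_planner": {"content_generation"},
    }
    today = date(2026, 1, 15)
    for tool, functions in inventory.items():
        print(tool, classify(functions), "in force:", obligations_apply(functions, today))
```

Note that a single emotion-recognition feature moves an otherwise high-risk proctoring suite into the prohibited tier.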
Budget for teachers, not just licenses. Every credible source, from the Springer systematic review to McKinsey, identifies teacher professional development as the binding constraint. Korea's rollback and the Santa Barbara parent revolt both trace back to deployments that outran the people expected to live with them. New York City's path, from banning ChatGPT in January 2023 to a permissive-with-guardrails policy backed by teacher training, is the pattern that has held up.
The technology cleared its proof-of-concept bar in 2024. Whether it narrows or widens educational gaps is now a governance question, and the jurisdictions answering it deliberately are the ones worth copying.
Sources
- Generative AI Without Guardrails Can Harm Learning (Bastani et al., SSRN)
- AI in adaptive education: systematic review (Discover Education, Springer)
- Khan Academy partnership (OpenAI)
- EU AI Act Annex III (AI Act Service Desk)
- FTC v. Edmodo press release (FTC)
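- Guidance for Generative AI in Education and Research (UNESCO)
- General Comment No. 25 on children's rights in relation to the digital environment (UN Committee on the Rights of the Child, 2021)
- Article 8: Conditions applicable to child's consent in relation to information society services (GDPR)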
- Korean PIPA legal text (Prighter)
- BigBrotherAwards 2021: Proctorio
- How AI will impact K-12 teachers (McKinsey)
- Evaluating algorithmic bias in academic prediction models (Švábenský et al., EDM 2024)
- AI Watch: National AI strategies, European perspective (JRC)
- "Pencils, Not Pixels" parent campaign (Santa Barbara Independent)
- New York's AI education policy landscape (PedagogyFutures)
