AI Teaching Materials for K-12 Math

What to Build, and Why — Deep Product Research R1
March 2026 · Eric San

I. TL;DR + Key Metrics

  • Competitors mapped: 25+ across 5 categories
  • Formats evaluated: 6 (text → video → scan)
  • P0 features: 4 — ship now
  • Generation cost: <$100 for the full 774-question bank

The combination of an atomic skill graph, multi-format content generation, and HK DSE alignment is genuinely unoccupied territory. Text explanations and practice variations are production-ready now; video and audio are viable with review. The highest-ROI move is generating teaching materials — explanations, worked examples, hints — from the existing 774-question bank and competency maps at trivial cost ($10–100). There are two user contexts: (1) student self-study, where materials appear after wrong answers, and (2) teacher preparation, where teachers assign or reference the materials. Math comes first; the framework generalizes to other subjects.

II. What to Build — Feature Recommendations

Priority | Feature | User Demand | Feasibility | Competitive Position | Gate
P0 | Skill-linked text explanations (EN + 中文) | T1, T3, T4, T8 [1–4] | Production-ready, 85–92% | Missing — nobody links explanations to atomic skills | Review pipeline
P0 | Practice variations (5× content multiplier) | S2, S3, S7 [11, 14] | Production-ready, 90%+ | Differentiated — from verified seed bank | Auto-verify
P0 | Adaptive hints (skill-graph-aware) | S1, S3, S4 [8, 13] | Viable with review, 80–85% | Differentiated | —
P0 | Error analysis explanations | S2, S7 [8, 14] | Viable with review, 75–85% | Missing — nobody generates from misconception data | Human review
P1 | Audio lessons (EN first, then Cantonese) | S6, T4 | Production-ready (EN) / Viable (Cantonese) | Missing for math | Azure TTS for Cantonese
P1 | Teacher prep materials from skill graph | T1, T3, T6, T7 [17] | Viable with review, 85%+ | Missing — MagicSchool uses prompts, not skill graphs | —
P2 | Avatar intro videos (per topic) | S6 | Viable with review, 75–85% | Consensus | HeyGen API
P2 | Manim-style animated explanations | — | Accuracy-gated, 60–75% | Differentiated | Math-To-Manim maturity
P3 | Handwriting scan grading | T9, S5 | Accuracy-gated, 50–77% | Consensus | >85% VLM accuracy
P3 | Interactive diagram manipulation | — | Accuracy-gated, 60–70% | Missing | Engineering cost

III. Why These Features — User Pain Points

Teacher Pain Points

T1: Time consumed by worksheet & lesson prep. Median 5 hrs/week planning, 5 hrs/week grading. AI tools save an average of ~5.9 hrs/week across lesson planning, worksheet creation, and assessment grading. [1, 12]

T2: AI-generated math content is wrong ~33% of the time. LLMs produce mathematically incorrect solutions at alarming rates. Consistency across repeated attempts is only 73% — the same prompt gives different answers 27% of the time. [8]

T3: HK teachers work 51–61+ hrs/week; happiness is at a decade low (4.33/10). 85% report excessive pressure. Burnout is structural, not seasonal. [2, 3]

T4: Can't personalize for mixed-ability classrooms. 56–59% of teachers cite differentiation as a top-5 stressor. One worksheet per class is the norm because creating multiple versions takes too long. [3]

T5: EdTech tools get abandoned. 31% of teachers stop using AI tools when trials end. 43% are unsure the tools address their actual needs. Adoption is fragile — tools must prove value within the trial window or die. [9, 10]

T6: Training gap. Only 20% of teachers received "good or excellent" AI training. Most are self-taught or untrained, leading to shallow usage patterns. [5, 6]

T7: Professional judgment erosion. Teachers worry AI undermines the reflective, pedagogical work that makes teaching a profession — not just lesson delivery. [7]

T8: Bilingual material creation is double the work. HK instruction requires English + Traditional Chinese. No single AI tool handles bilingual math content natively — teachers create materials twice or rely on awkward translation.

T9: AI can't detect specific student misconceptions. State-of-the-art LLMs fail at identifying the specific error a student made. They can tell you the answer is wrong, but not why the student got it wrong or which misconception is at play. [13]

Student Pain Points

S1: AI gives answers, not understanding. Students bypass productive struggle entirely. The tool becomes a shortcut engine instead of a learning engine. [8]

S2: Persistent misconceptions don't self-correct. Systematic errors repeat without targeted feedback. A student who misapplies the distributive property will keep misapplying it across 50 practice problems unless the misconception is explicitly addressed. [14]

S3: No feedback loop after wrong answers. In self-study mode, getting a question wrong is a dead end. No explanation, no hint, no related practice — just "incorrect."

S4: Going too fast. Students overestimate comprehension without checkpoints. Adaptive pacing requires the system to know what they actually understand, not just what they've seen.

S5: AI answers are inconsistent and sometimes wrong. There is a 27% inconsistency rate when the same question is asked twice. Students lose trust, and rightly so — an unreliable tutor is worse than no tutor. [8]

S6: Exam-driven anxiety without mastery. 50%+ of DSE students report the highest stress levels. The system optimizes for exam scores, not for understanding — creating anxiety without competence. [15, 16]

S7: No bridge between "wrong" and "understood." CMU research shows erroneous examples (showing a common mistake and asking students to find the error) outperform correct worked examples for learning. Nobody generates these systematically. [14]

IV. Market & Local Context

EDB HK$500M AI in Education Fund. Schools receive HK$500K each over three years (through 2027/28) to adopt AI tools. This is a direct procurement channel — schools have budget earmarked specifically for AI teaching tools. The fund covers procurement, training, and integration. [17]
Structural context: Hong Kong's education market
  • 72% of students attend private tutoring — a HK$10B+ industry annually [18]
  • Exam-driven: DSE → university; only 37% of JUPAS applicants are placed [16]
  • Bilingual instruction required (EN + Traditional Chinese) — every material needs both
  • Paper-based assessment culture — tools must output printable materials to see adoption
  • Teacher burnout at crisis level — 81% work 51+ hrs/week [2]

The intersection is clear: schools have budget (EDB fund), teachers need time back (burnout), students need better feedback (dead-end wrong answers), and the entire market is bilingual and exam-focused. Tools that ignore any of these realities won't survive in HK.

V. Competitive Landscape

A. Pure AI Content Generators

Player | HQ | Key Features | HK Relevance
Photomath (Google) | US | Camera scan → step-by-step solutions, 300M+ downloads | Used by students, no curriculum alignment
Mathway (Chegg) | US | Multi-domain solver, subscription model | Generic — no DSE alignment
Mathos (YC W24) | US | AI math tutor, step-by-step hints, real-time feedback | New entrant, no HK presence
VideoTutor | US | AI-generated video explanations from worksheets | English only, no Chinese math notation
SciMigo | US | AI STEM tutor with visual reasoning | No HK curriculum awareness

B. Adaptive Learning Platforms

Player | HQ | Key Features | HK Relevance
Khan Academy / Khanmigo | US | Free video library + GPT-4 tutor, Socratic method | No DSE curriculum; English only
IXL | US | Diagnostic + skill practice, 9K+ skills mapped | Used in some intl schools
DreamBox | US | K-8 adaptive math, game-based learning | Not present in HK
ALEKS (McGraw-Hill) | US | Knowledge-space-theory adaptive, diagnostic assessment | University-level usage only
Century Tech | UK | AI-powered adaptive learning, teacher dashboards | International schools only
Squirrel AI (松鼠AI) | China | 10K+ knowledge points, nano-level adaptive learning | Mainland China focus; Mandarin only
Mindspark (EI) | India | RCT-proven adaptive math, misconception targeting | India only — but pedagogy is reference-worthy

C. Teacher Productivity Tools

Player | HQ | Key Features | HK Relevance
MagicSchool ($919M val) | US | 3M+ teachers, 60+ AI tools, worksheet/rubric/plan gen | US curriculum; prompt-based, no skill graph
Diffit | US | Reading-level adapted materials, auto-differentiation | Reading-focused, not math
Curipod | Norway | AI slides + interactive activities | No math specialization
Eduaide | US | 100+ content templates, standards-aligned | US standards only
Brisk | US | Chrome extension, feedback/grading inside Google Docs | Generic — no math-specific features

D. Video & Audio Generation

Player | HQ | Key Features | HK Relevance
NotebookLM (Google) | US | Study-session audio from documents, conversational format | English only; no math specialization yet
Numerade | US | AI video solutions, step-by-step, pivoting to AI tutor | US curriculum; financial distress
Synthesia / HeyGen | UK / US | Avatar video generation, multilingual | Generic — usable for intros/wrappers
Manim (open source) | — | Programmatic math animations (3Blue1Brown engine) | High potential; requires engineering

E. HK / China Players

Player | HQ | Key Features | HK Relevance
dsemath.ai | HK | DSE past-paper AI tutor, step-by-step, free | Direct competitor — DSE-aligned, but chat-only
SmartQuest | HK | AI-assisted learning, 80+ schools, school-facing | Active in HK schools; limited public data
Snapask (now Toppan) | HK | 150K HK users, live tutor matching, Q&A | Pivoted away from pure tutoring
AfterSchool | HK | DSE past papers, mock exams, video lessons | Content library, no AI generation
TAL / Xueersi (学而思) | China | 1.6B question bank, post-crackdown pivot to AI | Mainland China; Mandarin + Simplified only
Yuanfudao (猿辅导) | China | AI adaptive practice, massive user base | Mainland only
Zuoyebang (作业帮) | China | Homework helper, camera scan, 800M+ users | Mainland only; no Traditional Chinese

Feature Taxonomy

Feature | Classification | Who Has It
Adaptive difficulty adjustment | Baseline | IXL, ALEKS, DreamBox, Squirrel AI
AI tutoring chat | Baseline | Khanmigo, Mathos, dsemath.ai
Worksheet / quiz generation | Baseline | MagicSchool, Eduaide, Brisk
Step-by-step solutions | Consensus | Photomath, Mathway, Numerade
Video explanations | Consensus | Khan Academy, Numerade, VideoTutor
Avatar video lessons | Consensus | Synthesia, HeyGen (generic)
HK DSE curriculum alignment | Missing | dsemath.ai (partial, chat-only)
Skill-graph-linked content gen | Missing | Nobody
Audio / podcast for math | Missing | Nobody (NotebookLM is generic)
Teacher prep from skill graph | Missing | Nobody
Error analysis from misconception data | Missing | Nobody (Mindspark closest, India only)
Bilingual atomic content (EN + 中文) | Missing | Nobody

Notable Failures

EdTech graveyard — patterns from 67 analyzed failures
  • Byju's: $22B valuation → effectively zero. Hyper-growth without unit economics.
  • AllHere: Bankruptcy + fraud investigation. AI chatbot for K-12 attendance.
  • 2U: $7.6B market cap → bankruptcy. Online degree programs couldn't compete.
  • Vedantu: $1B+ valuation → 40% layoffs. Live tutoring at scale didn't work.
  • Numerade: Pivoting under financial pressure. Video-first math → AI tutor.
  • Pattern: competition killed 43.3% of the 67, average lifespan was 6.5 years, and $17.1B of total capital was burned. [20]

VI. Technical Feasibility

Format | Feasibility | Accuracy | Cost / Unit | Best Tool | Ship Now?
Text explanations | Production | 85–92% | $0.01/unit | Claude 3.5 Sonnet | YES
Practice variations | Production | 90%+ | $0.005–0.02/var | GPT-4o-mini + SymPy | YES — highest ROI
Adaptive hints | Production | 80–85% | $0.01–0.03/set | Claude 3.5 Sonnet | YES
Audio lessons (EN) | Production | 90%+ | $0.15–0.30/5 min | Podcastfy + ElevenLabs | YES
Audio lessons (Cantonese) | Viable | 80–85% | $0.02–0.05/5 min | Azure Neural TTS | With QA
Avatar video | Viable | 75–85% | $3–9/min | HeyGen API | Intros only
Manim animation | Gated | 60–75% | $0.05/scene + iteration | Math-To-Manim | Heavy iteration
Handwriting grading | Gated | 50–77% | $0.07–0.17/worksheet | Mathpix + GPT-4o | Human-in-loop only

Cost is not a constraint. Generating text explanations + worked examples + error analysis + practice variations for the entire 774-question bank costs $10–100 total. Even with a 5× variation multiplier and bilingual output, the full bank comes in under $500. The decision is about accuracy and review workflow, not budget.
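The arithmetic behind that ceiling can be checked with the upper-bound per-unit rates from the table above. This is a back-of-envelope sketch only — the per-format mix and the error-analysis rate (assumed comparable to explanations) are assumptions, not measured costs:

```python
# Back-of-envelope generation cost for the full bank.
QUESTIONS = 774
LANGUAGES = 2      # EN + Traditional Chinese
VARIATIONS = 5     # practice-variation multiplier

# Upper-bound $/unit estimates; error_analysis rate is an assumption.
costs = {
    "explanation": 0.01,
    "hint_set": 0.03,
    "error_analysis": 0.01,
    "variation": 0.02,
}

# Every question gets an explanation, hint set, and error analysis in both languages.
base = QUESTIONS * LANGUAGES * (
    costs["explanation"] + costs["hint_set"] + costs["error_analysis"]
)
# Plus five bilingual variations per question.
variations = QUESTIONS * LANGUAGES * VARIATIONS * costs["variation"]
total = base + variations

print(f"base content: ${base:,.2f}")    # ≈ $77
print(f"variations:   ${variations:,.2f}")  # ≈ $155
print(f"total:        ${total:,.2f}")   # ≈ $232, comfortably under $500
```

Even at these upper bounds, the monolingual text-only pass (base ÷ 2 ≈ $39) stays inside the "<$100" claim, and the full bilingual 5× build stays under $500.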

VII. Critical Assessment of P0 Choices

P0-A: Skill-Linked Text Explanations

Stress Tests

  • Source test: Demand from teacher surveys (EdWeek, HKFEW) — real but US-heavy. HK evidence is structural (bilingual burden) not survey-based.
  • Feasibility test: 85–92% accuracy is current-gen LLM data (Claude 3.5, GPT-4). Holds as of March 2026.
  • Counter-example: MagicSchool generates explanations from prompts — but prompt-based generation has no curriculum structure. Quality varies wildly per teacher.

Why It Holds

  • Adoption test: Explanations attach to existing question bank — zero new workflow for teachers. Students see them after wrong answers — zero friction.
  • Differentiation: Skill-graph-linked (not generic) + bilingual (not translated) + curriculum-aligned (DSE/P6). Nobody else combines all three.
  • Reformed position: HOLD. Highest signal-to-noise, lowest risk, immediate value.

P0-B: Practice Variations

Stress Tests

  • Source test: Student demand is inferred (misconception persistence research, not direct survey). But pedagogical evidence is strong — spaced practice with variation is proven.
  • Feasibility test: 90%+ accuracy with SymPy verification. Strongest feasibility of any P0.
  • Counter-example: IXL has 9K+ skills with auto-generated practice. But IXL's items aren't from a verified seed bank — they're template-generated without human review.

Why It Holds

  • Adoption test: Teachers already need more practice material. 5× multiplier from a trusted bank is immediate value.
  • Differentiation: Variations from verified seed questions. Each variation inherits the skill tags, difficulty rating, and trap classification of its parent.
  • Reformed position: HOLD. Highest ROI feature. Generates itself and verifies itself.
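The "verifies itself" gate can be sketched with SymPy, which the feasibility table pairs with GPT-4o-mini for this feature. This is a minimal illustration under assumed conventions — `verify_variation` and the expression/answer-key format are hypothetical, not the production schema:

```python
import sympy as sp

def verify_variation(expr_str: str, variable: str, claimed_roots: list) -> bool:
    """Symbolically check a generated variation's answer key.

    Illustrative auto-verify gate: the variation claims that
    `claimed_roots` solve `expr_str` = 0; SymPy re-derives the
    solution set independently and rejects any mismatch.
    """
    x = sp.Symbol(variable)
    expr = sp.sympify(expr_str)
    solved = set(sp.solve(expr, x))
    return solved == {sp.sympify(r) for r in claimed_roots}

# A variation of a quadratic seed question with a correct answer key passes:
assert verify_variation("x**2 - 5*x + 6", "x", [2, 3])
# The same expression with a wrong answer key is rejected before publication:
assert not verify_variation("x**2 - 5*x + 6", "x", [2, 4])
```

Because the check is symbolic rather than LLM-based, a wrong answer key is caught deterministically — which is what pushes this feature's effective accuracy above the raw generation rate.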

P0-C: Adaptive Hints

Stress Tests

  • Source test: "AI gives answers, not understanding" is a consistent complaint in Reddit threads, teacher surveys, and pedagogical research. Real.
  • Feasibility test: 80–85% accuracy is lower than text explanations. Hints require understanding where the student is stuck — harder than explaining a concept.
  • Counter-example: Khanmigo does Socratic hints. But Khanmigo's hints are generic (no skill graph context) and US-curriculum only.

Why It Holds

  • Adoption test: Hints are the "wrong answer" response. Without them, the self-study product has a dead end at every wrong answer.
  • Differentiation: Skill-graph-aware hints know which prerequisite skill is missing, not just that the answer is wrong.
  • Reformed position: HOLD. Table-stakes for self-study use case. Without hints, there's no self-study product.
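The "skill-graph-aware" selection above can be sketched as a prerequisite walk. Everything here is hypothetical naming for illustration — `pick_hint`, the hint bank, and the mastery set are not the actual Essai API:

```python
def pick_hint(question_skills, mastered_skills, hint_bank):
    """Return a hint targeting the first prerequisite skill the student
    has not mastered, instead of a generic 'try again' nudge.

    question_skills is ordered prerequisites-first, as derived from
    the skill graph for this question.
    """
    for skill in question_skills:
        if skill not in mastered_skills:
            # Point at the missing prerequisite, not just the wrong answer.
            return hint_bank.get(skill, f"Review the prerequisite skill: {skill}")
    # All prerequisites mastered: the slip is likely procedural.
    return "Check the arithmetic in your final step."

hints = {"alg.expr.1": "Expand the bracket term by term before collecting."}
# Student missing the expansion prerequisite gets a targeted hint:
assert pick_hint(["alg.expr.1", "alg.dist.1"], set(), hints) == hints["alg.expr.1"]
```

This is the structural difference from Khanmigo-style generic Socratic hints: the graph tells the system *which* gap to address, so the hint can be pre-generated and reviewed per skill node rather than improvised per chat turn.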

P0-D: Error Analysis Explanations

Stress Tests

  • Source test: CMU erroneous-examples research is strong but from 2014. The specific application (auto-generating error explanations from misconception data) is novel — no precedent to cite.
  • Feasibility test: 75–85% accuracy is the lowest P0. "Why did the student get this wrong" is genuinely hard for LLMs.13
  • Counter-example: No one has built this at scale and failed — but no one has built it at scale and succeeded either.

Why It Holds

  • Adoption test: Error analysis is the "what went wrong" content. Teachers already do this manually — the tool automates their hardest diagnostic work.
  • Differentiation: Essai has trap and misconception data baked into the question bank. This is not generic — it's generated from actual misconception patterns.
  • Reformed position: HOLD with gate. Human review required on every error analysis until accuracy reaches 90%+. But the concept is too differentiated to cut.

VIII. Creative Differentiators

What only Essai can build — because it already has the assets.

1. "Skill → multi-format content" pipeline. The competency map drives generation. Each skill node in the graph produces text explanations, practice variations, hints, error analysis, and eventually audio — all from the same structured source. Nobody else has the graph. Without it, you're generating from prompts and hoping for consistency.
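The fan-out from one skill node to every format and language can be sketched as a simple data shape. Field names and IDs here are assumptions for illustration, not Essai's real schema:

```python
from dataclasses import dataclass, field

@dataclass
class SkillNode:
    """One atomic skill in the competency graph (illustrative shape)."""
    skill_id: str
    name_en: str
    name_zh: str
    prerequisites: list = field(default_factory=list)  # prerequisite skill_ids

@dataclass
class GeneratedContent:
    """One generated artifact, always traceable back to its skill node."""
    skill_id: str
    fmt: str   # "explanation" | "variation" | "hint" | "error_analysis" | "audio"
    lang: str  # "en" | "zh-Hant"
    body: str  # filled in by the generation + review pipeline

def generate_all(node: SkillNode, formats, langs) -> list:
    """Fan one skill node out to every (format, language) pair, so all
    outputs share the same structured source instead of ad-hoc prompts."""
    return [GeneratedContent(node.skill_id, f, l, body="")
            for f in formats for l in langs]

node = SkillNode("alg.dist.1", "Distributive property", "分配律",
                 prerequisites=["alg.expr.1"])
items = generate_all(node,
                     ["explanation", "variation", "hint", "error_analysis"],
                     ["en", "zh-Hant"])
assert len(items) == 8  # 4 formats × 2 languages, all tagged alg.dist.1
```

The point of the shape is the shared `skill_id`: teacher prep, student hints, and error analysis for the same node can never drift apart, because they are keyed to the same graph entry.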

2. "Wrong → understand" bridge. Use error analysis + erroneous examples (CMU research14) to generate "what went wrong" content specific to the misconception pattern. Show the student a common mistake, ask them to find the error, then explain. This is pedagogically proven and nobody generates it systematically.

3. Bilingual atomic content. Natively EN + 中文, not translated. Every skill node has both language versions generated independently, ensuring mathematical terminology and notation are correct in each language — not machine-translated artifacts.

4. Teacher + student from same graph. Teacher prep materials and student self-study content are generated from the same skill nodes. When a teacher assigns a topic, the student's self-study content, hints, and error analysis are guaranteed to be aligned with what the teacher is covering.

5. NotebookLM-style "study session" audio. Topic-focused conversational audio generated from the skill graph — two AI voices discussing a math concept, walking through examples, and highlighting common mistakes. Nobody does this for math. NotebookLM is generic; this would be curriculum-aware and topic-scoped.

IX. Verdict

Build the text-based pipeline first — explanations, worked examples, error analysis, and practice variations from the existing 774-question competency map. This is production-ready, costs under $100 for the entire bank, and directly feeds both student self-study and teacher preparation.

Add audio lessons as a P1 differentiator. Defer video and handwriting until accuracy improves.

The moat is the atomic skill graph, not any single format — the graph makes every format better, cheaper, and more curriculum-aligned than generic AI output.

X. Open Questions

XI. References

[1] EdWeek Research Center. "How Teachers Use AI and What They Need." 2022. US nationally representative survey of K-12 teachers. 5 hrs/week planning, 5 hrs/week grading. AI saves ~5.9 hrs/week.
[2] Hong Kong Federation of Education Workers (HKFEW). "Teacher Workload Survey." 2022. HK primary + secondary. 81% work 51+ hrs/week. Happiness score: 4.33/10 (decade low).
[3] Education University of Hong Kong. "Teacher Well-Being and Stress Survey." 2025. n=851 HK teachers. 85% report excessive pressure. 56–59% cite differentiation as top-5 stressor.
[4] EdWeek Research Center. "Teachers and AI: What's Working and What Isn't." 2025. US nationally representative. Updates 2022 baseline with AI adoption data.
[5] RAND Corporation, American Teacher Panel. "AI Adoption and Training in Schools." 2024. US teachers. Only 20% received good or excellent AI training.
[6] UC Berkeley / Hechinger Report. "The State of AI Training for Educators." 2025. Qualitative research on teacher preparedness. Most self-taught or untrained.
[7] EdWeek. "Teachers Worry AI Undermines Professional Judgment." 2026. National survey. Concern that AI reduces teaching to content delivery.
[8] arXiv [2509.01395]. "Evaluating LLM Mathematical Reasoning in Educational Contexts." 2025. LLMs wrong ~33% on math content. 73% consistency (27% inconsistency across repeated prompts).
[9] eSpark. "Teacher Perspectives on AI in the Classroom." 2024. n≈600 US teachers. 31% stop using AI tools when trials end.
[10] Post-BETT Survey / The Learning Agency. "EdTech Adoption Barriers." 2025–2026. 43% of teachers unsure AI tools address actual needs.
[11] Adaptive Learning Meta-Analysis (Springer/SAGE). "Effectiveness of Adaptive Learning Systems." 2024. Meta-analysis confirming adaptive practice + spaced variation improves learning outcomes 0.3–0.5 SD.
[12] NotieAI / AI Worksheet Guide. "AI-Generated Teaching Materials Accuracy Report." 2025. AI-generated worksheets save ~5.9 hrs/week. Accuracy ranges 85–92% for text, lower for diagrams.
[13] arXiv [2509.01395]. "LLM Failure Modes in Student Error Detection." 2025. State-of-the-art LLMs fail at identifying specific student misconceptions. Can detect "wrong" but not "why wrong."
[14] Carnegie Mellon University / CHB Study. "Erroneous Examples Improve Learning." 2014. Erroneous examples outperform correct worked examples for learning. Students learn more from finding and explaining mistakes.
[15] South China Morning Post / DSE Survey Data. "Student Stress and DSE." 2016, 2025. 50%+ DSE students at highest stress levels. Exam-driven anxiety correlated with tutoring dependency.
[16] JUPAS / University Admissions Data. "DSE to University Placement Rates." 2025. Only 37% of JUPAS applicants placed. High-stakes filtering creates extreme exam pressure.
[17] Education Bureau (EDB) / Asia Education Review. "HK$500M AI in Education Fund." 2025. Schools receive HK$500K each over 3 years through 2027/28 for AI adoption.
[18] HKFYG / PRO360. "Private Tutoring Market Size." 2023–2026. 72% of HK students attend private tutoring. Industry >HK$10B annually.
[19] SCMP / HK Education Data. "Teacher Retention and Burnout." 2025. Teacher turnover increasing. 61+ hrs/week common in aided schools.
[20] CB Insights / EdTech Failure Analysis. "67 EdTech Post-Mortems." 2024. Competition killed 43.3%. Average lifespan 6.5 years. $17.1B total capital burned across sample.