AI Teaching Materials for K-12 Math

What to Build, and Why — Deep Product Research R1
March 2026 · Eric San

I. TL;DR + Key Metrics

  • Competitors mapped: 25+ across 5 categories
  • Formats evaluated: 6 (text → video → scan)
  • P0 features: 4 — ship now
  • Generation cost: <$100 for the full 774-question bank

The combination of an atomic skill graph, multi-format content generation, and HK DSE alignment is genuinely unoccupied territory. Text explanations and practice variations are production-ready now; video and audio are viable with review. The highest-ROI move is generating teaching materials — explanations, worked examples, hints — from the existing 774-question bank and competency maps at trivial cost ($10–100). There are two user contexts: (1) student self-study, where materials appear after wrong answers, and (2) teacher preparation, where teachers assign or reference the materials. Math comes first; the framework generalizes to other subjects.

II. What to Build — Feature Recommendations

Priority | Feature | User Demand | Feasibility | Competitive Position | Gate
P0 | Skill-linked text explanations (EN + 中文) | T1, T3, T4, T8 [1–4] | Production-ready, 85–92% | Missing — nobody links explanations to atomic skills | Review pipeline
P0 | Practice variations (5× content multiplier) | S2, S3, S7 [11, 14] | Production-ready, 90%+ | Differentiated — from verified seed bank | Auto-verify
P0 | Adaptive hints (skill-graph-aware) | S1, S3, S4 [8, 13] | Viable with review, 80–85% | Differentiated | —
P0 | Error analysis explanations | S2, S7 [8, 14] | Viable with review, 75–85% | Missing — nobody generates from misconception data | Human review
P1 | Audio lessons (EN first, then Cantonese) | S6, T4 | Production-ready (EN) / Viable (Cantonese) | Missing for math | Azure TTS for Cantonese
P1 | Teacher prep materials from skill graph | T1, T3, T6, T7 [17] | Viable with review, 85%+ | Missing — MagicSchool uses prompts, not skill graphs | —
P2 | Avatar intro videos (per topic) | S6 | Viable with review, 75–85% | Consensus | HeyGen API
P2 | Manim-style animated explanations | — | Accuracy-gated, 60–75% | Differentiated | Math-To-Manim maturity
P3 | Handwriting scan grading | T9, S5 | Accuracy-gated, 50–77% | Consensus | >85% VLM accuracy
P3 | Interactive diagram manipulation | — | Accuracy-gated, 60–70% | Missing | Engineering cost

III. Why These Features — User Pain Points

Teacher Pain Points

T1: Time consumed by worksheet & lesson prep. Median 5 hrs/week planning, 5 hrs/week grading. AI tools save an average of ~5.9 hrs/week across lesson planning, worksheet creation, and assessment grading. [1, 12]

T2: AI-generated math content is wrong ~33% of the time. LLMs produce mathematically incorrect solutions at alarming rates. Consistency across repeated attempts is only 73% — the same prompt gives different answers 27% of the time. [8]

T3: HK teachers work 51–61+ hrs/week; happiness is at a decade low (4.33/10). 85% report excessive pressure. Burnout is structural, not seasonal. [2, 3]

T4: Can't personalize for mixed-ability classrooms. 56–59% of teachers cite differentiation as a top-5 stressor. One worksheet per class is the norm because creating multiple versions takes too long. [3]

T5: EdTech tools get abandoned. 31% of teachers stop using AI tools when trials end. 43% are unsure the tools address their actual needs. Adoption is fragile — tools must prove value within the trial window or die. [9, 10]

T6: Training gap. Only 20% of teachers received "good or excellent" AI training. Most are self-taught or untrained, leading to shallow usage patterns. [5, 6]

T7: Professional judgment erosion. Teachers worry AI undermines the reflective, pedagogical work that makes teaching a profession — not just lesson delivery. [7]

T8: Bilingual material creation is double the work. HK instruction requires English + Traditional Chinese. No single AI tool handles bilingual math content natively — teachers create materials twice or rely on awkward translation.

T9: AI can't detect specific student misconceptions. State-of-the-art LLMs fail at identifying the specific error a student made. They can tell you the answer is wrong, but not why the student got it wrong or which misconception is at play. [13]

Student Pain Points

S1: AI gives answers, not understanding. Students bypass productive struggle entirely. The tool becomes a shortcut engine instead of a learning engine. [8]

S2: Persistent misconceptions don't self-correct. Systematic errors repeat without targeted feedback. A student who misapplies the distributive property will keep misapplying it across 50 practice problems unless the misconception is explicitly addressed. [14]

S3: No feedback loop after wrong answers. In self-study mode, getting a question wrong is a dead end. No explanation, no hint, no related practice — just "incorrect."

S4: Going too fast. Students overestimate comprehension without checkpoints. Adaptive pacing requires the system to know what they actually understand, not just what they've seen.

S5: AI answers are inconsistent and sometimes wrong. There is a 27% inconsistency rate when the same question is asked twice. Students lose trust, and rightly so — an unreliable tutor is worse than no tutor. [8]

S6: Exam-driven anxiety without mastery. 50%+ of DSE students report the highest stress levels. The system optimizes for exam scores, not for understanding — creating anxiety without competence. [15, 16]

S7: No bridge between "wrong" and "understood." CMU research shows erroneous examples (showing a common mistake and asking students to find the error) outperform correct worked examples for learning. Nobody generates these systematically. [14]

IV. Market & Local Context

EDB HK$500M AI in Education Fund. Schools receive HK$500K each over three years (through 2027/28) to adopt AI tools. This is a direct procurement channel — schools have budget earmarked specifically for AI teaching tools. The fund covers procurement, training, and integration. [17]
Structural context: Hong Kong's education market
  • 72% of students attend private tutoring — a HK$10B+ industry annually [18]
  • Exam-driven: DSE → university; only 37% of JUPAS applicants are placed [16]
  • Bilingual instruction required (EN + Traditional Chinese) — every material needs both
  • Paper-based assessment culture — tools must output printable materials to see adoption
  • Teacher burnout at crisis level — 81% work 51+ hrs/week [2]

The intersection is clear: schools have budget (EDB fund), teachers need time back (burnout), students need better feedback (dead-end wrong answers), and the entire market is bilingual and exam-focused. Tools that ignore any of these realities won't survive in HK.

V. Competitive Landscape

A. Pure AI Content Generators

Player | HQ | Key Features | HK Relevance
Photomath (Google) | US | Camera scan → step-by-step solutions, 300M+ downloads | Used by students, no curriculum alignment
Mathway (Chegg) | US | Multi-domain solver, subscription model | Generic — no DSE alignment
Mathos (YC W24) | US | AI math tutor, step-by-step hints, real-time feedback | New entrant, no HK presence
VideoTutor | US | AI-generated video explanations from worksheets | English only, no Chinese math notation
SciMigo | US | AI STEM tutor with visual reasoning | No HK curriculum awareness

B. Adaptive Learning Platforms

Player | HQ | Key Features | HK Relevance
Khan Academy / Khanmigo | US | Free video library + GPT-4 tutor, Socratic method | No DSE curriculum; English only
IXL | US | Diagnostic + skill practice, 9K+ skills mapped | Used in some intl schools
DreamBox | US | K-8 adaptive math, game-based learning | Not present in HK
ALEKS (McGraw-Hill) | US | Knowledge-space-theory adaptive, diagnostic assessment | University-level usage only
Century Tech | UK | AI-powered adaptive learning, teacher dashboards | International schools only
Squirrel AI (松鼠AI) | China | 10K+ knowledge points, nano-level adaptive learning | Mainland China focus; Mandarin only
Mindspark (EI) | India | RCT-proven adaptive math, misconception targeting | India only — but pedagogy is reference-worthy

C. Teacher Productivity Tools

Player | HQ | Key Features | HK Relevance
MagicSchool ($919M val) | US | 3M+ teachers, 60+ AI tools, worksheet/rubric/plan gen | US curriculum; prompt-based, no skill graph
Diffit | US | Reading-level adapted materials, auto-differentiation | Reading-focused, not math
Curipod | Norway | AI slides + interactive activities | No math specialization
Eduaide | US | 100+ content templates, standards-aligned | US standards only
Brisk | US | Chrome extension, feedback/grading inside Google Docs | Generic — no math-specific features

D. Video & Audio Generation

Player | HQ | Key Features | HK Relevance
NotebookLM (Google) | US | Study-session audio from documents, conversational format | English only; no math specialization yet
Numerade | US | AI video solutions, step-by-step, pivoting to AI tutor | US curriculum; financial distress
Synthesia / HeyGen | UK / US | Avatar video generation, multilingual | Generic — usable for intros/wrappers
Manim (open source) | — | Programmatic math animations (3Blue1Brown engine) | High potential; requires engineering

E. HK / China Players

Player | HQ | Key Features | HK Relevance
dsemath.ai | HK | DSE past-paper AI tutor, step-by-step, free | Direct competitor — DSE-aligned, but chat-only
SmartQuest | HK | AI-assisted learning, 80+ schools, school-facing | Active in HK schools; limited public data
Snapask (now Toppan) | HK | 150K HK users, live tutor matching, Q&A | Pivoted away from pure tutoring
AfterSchool | HK | DSE past papers, mock exams, video lessons | Content library, no AI generation
TAL / Xueersi (学而思) | China | 1.6B question bank, post-crackdown pivot to AI | Mainland China; Mandarin + Simplified only
Yuanfudao (猿辅导) | China | AI adaptive practice, massive user base | Mainland only
Zuoyebang (作业帮) | China | Homework helper, camera scan, 800M+ users | Mainland only; no Traditional Chinese

Feature Taxonomy

Feature | Classification | Who Has It
Adaptive difficulty adjustment | Baseline | IXL, ALEKS, DreamBox, Squirrel AI
AI tutoring chat | Baseline | Khanmigo, Mathos, dsemath.ai
Worksheet / quiz generation | Baseline | MagicSchool, Eduaide, Brisk
Step-by-step solutions | Consensus | Photomath, Mathway, Numerade
Video explanations | Consensus | Khan Academy, Numerade, VideoTutor
Avatar video lessons | Consensus | Synthesia, HeyGen (generic)
HK DSE curriculum alignment | Missing | dsemath.ai (partial, chat-only)
Skill-graph-linked content gen | Missing | Nobody
Audio / podcast for math | Missing | Nobody (NotebookLM is generic)
Teacher prep from skill graph | Missing | Nobody
Error analysis from misconception data | Missing | Nobody (Mindspark closest, India only)
Bilingual atomic content (EN + 中文) | Missing | Nobody

Notable Failures

EdTech graveyard — patterns from 67 analyzed failures
  • Byju's: $22B valuation → effectively zero. Hyper-growth without unit economics.
  • AllHere: Bankruptcy + fraud investigation. AI chatbot for K-12 attendance.
  • 2U: $7.6B market cap → bankruptcy. Online degree programs couldn't compete.
  • Vedantu: $1B+ valuation → 40% layoffs. Live tutoring at scale didn't work.
  • Numerade: Pivoting under financial pressure. Video-first math → AI tutor.
  • Pattern: competition killed 43.3% of the 67, average lifespan was 6.5 years, and $17.1B of total capital was burned. [20]

VI. Technical Feasibility

Format | Feasibility | Accuracy | Cost / Unit | Best Tool | Ship Now?
Text explanations | Production | 85–92% | $0.01/unit | Claude 3.5 Sonnet | YES
Practice variations | Production | 90%+ | $0.005–0.02/var | GPT-4o-mini + SymPy | YES — highest ROI
Adaptive hints | Production | 80–85% | $0.01–0.03/set | Claude 3.5 Sonnet | YES
Audio lessons (EN) | Production | 90%+ | $0.15–0.30/5 min | Podcastfy + ElevenLabs | YES
Audio lessons (Cantonese) | Viable | 80–85% | $0.02–0.05/5 min | Azure Neural TTS | With QA
Avatar video | Viable | 75–85% | $3–9/min | HeyGen API | Intros only
Manim animation | Gated | 60–75% | $0.05/scene + iteration | Math-To-Manim | Heavy iteration
Handwriting grading | Gated | 50–77% | $0.07–0.17/worksheet | Mathpix + GPT-4o | Human-in-loop only

Cost is not a constraint. Generating text explanations + worked examples + error analysis + practice variations for the entire 774-question bank costs $10–100 total. Even with a 5× variation multiplier and bilingual output, the full bank comes in under $500. The decision is about accuracy and review workflow, not budget.
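The arithmetic behind that ceiling can be checked with the upper-bound per-unit rates from the table above. This is a back-of-envelope sketch only — the per-format mix and the error-analysis rate (assumed comparable to explanations) are assumptions, not measured costs:

```python
# Back-of-envelope generation cost for the full bank.
QUESTIONS = 774
LANGUAGES = 2      # EN + Traditional Chinese
VARIATIONS = 5     # practice-variation multiplier

# Upper-bound $/unit estimates; error_analysis rate is an assumption.
costs = {
    "explanation": 0.01,
    "hint_set": 0.03,
    "error_analysis": 0.01,
    "variation": 0.02,
}

# Every question gets an explanation, hint set, and error analysis in both languages.
base = QUESTIONS * LANGUAGES * (
    costs["explanation"] + costs["hint_set"] + costs["error_analysis"]
)
# Plus five bilingual variations per question.
variations = QUESTIONS * LANGUAGES * VARIATIONS * costs["variation"]
total = base + variations

print(f"base content: ${base:,.2f}")    # ≈ $77
print(f"variations:   ${variations:,.2f}")  # ≈ $155
print(f"total:        ${total:,.2f}")   # ≈ $232, comfortably under $500
```

Even at these upper bounds, the monolingual text-only pass (base ÷ 2 ≈ $39) stays inside the "<$100" claim, and the full bilingual 5× build stays under $500.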

VII. Critical Assessment of P0 Choices

P0-A: Skill-Linked Text Explanations

Stress Tests

  • Source test: Demand from teacher surveys (EdWeek, HKFEW) — real but US-heavy. HK evidence is structural (bilingual burden) not survey-based.
  • Feasibility test: 85–92% accuracy is current-gen LLM data (Claude 3.5, GPT-4). Holds as of March 2026.
  • Counter-example: MagicSchool generates explanations from prompts — but prompt-based generation has no curriculum structure. Quality varies wildly per teacher.

Why It Holds

  • Adoption test: Explanations attach to existing question bank — zero new workflow for teachers. Students see them after wrong answers — zero friction.
  • Differentiation: Skill-graph-linked (not generic) + bilingual (not translated) + curriculum-aligned (DSE/P6). Nobody else combines all three.
  • Reformed position: HOLD. Highest signal-to-noise, lowest risk, immediate value.

P0-B: Practice Variations

Stress Tests

  • Source test: Student demand is inferred (misconception persistence research, not direct survey). But pedagogical evidence is strong — spaced practice with variation is proven.
  • Feasibility test: 90%+ accuracy with SymPy verification. Strongest feasibility of any P0.
  • Counter-example: IXL has 9K+ skills with auto-generated practice. But IXL's items aren't from a verified seed bank — they're template-generated without human review.

Why It Holds

  • Adoption test: Teachers already need more practice material. 5× multiplier from a trusted bank is immediate value.
  • Differentiation: Variations from verified seed questions. Each variation inherits the skill tags, difficulty rating, and trap classification of its parent.
  • Reformed position: HOLD. Highest ROI feature. Generates itself and verifies itself.
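The "verifies itself" gate can be sketched with SymPy, which the feasibility table pairs with GPT-4o-mini for this feature. This is a minimal illustration under assumed conventions — `verify_variation` and the expression/answer-key format are hypothetical, not the production schema:

```python
import sympy as sp

def verify_variation(expr_str: str, variable: str, claimed_roots: list) -> bool:
    """Symbolically check a generated variation's answer key.

    Illustrative auto-verify gate: the variation claims that
    `claimed_roots` solve `expr_str` = 0; SymPy re-derives the
    solution set independently and rejects any mismatch.
    """
    x = sp.Symbol(variable)
    expr = sp.sympify(expr_str)
    solved = set(sp.solve(expr, x))
    return solved == {sp.sympify(r) for r in claimed_roots}

# A variation of a quadratic seed question with a correct answer key passes:
assert verify_variation("x**2 - 5*x + 6", "x", [2, 3])
# The same expression with a wrong answer key is rejected before publication:
assert not verify_variation("x**2 - 5*x + 6", "x", [2, 4])
```

Because the check is symbolic rather than LLM-based, a wrong answer key is caught deterministically — which is what pushes this feature's effective accuracy above the raw generation rate.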

P0-C: Adaptive Hints

Stress Tests

  • Source test: "AI gives answers, not understanding" is a consistent complaint in Reddit threads, teacher surveys, and pedagogical research. Real.
  • Feasibility test: 80–85% accuracy is lower than text explanations. Hints require understanding where the student is stuck — harder than explaining a concept.
  • Counter-example: Khanmigo does Socratic hints. But Khanmigo's hints are generic (no skill graph context) and US-curriculum only.

Why It Holds

  • Adoption test: Hints are the "wrong answer" response. Without them, the self-study product has a dead end at every wrong answer.
  • Differentiation: Skill-graph-aware hints know which prerequisite skill is missing, not just that the answer is wrong.
  • Reformed position: HOLD. Table-stakes for self-study use case. Without hints, there's no self-study product.
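The "skill-graph-aware" selection above can be sketched as a prerequisite walk. Everything here is hypothetical naming for illustration — `pick_hint`, the hint bank, and the mastery set are not the actual Essai API:

```python
def pick_hint(question_skills, mastered_skills, hint_bank):
    """Return a hint targeting the first prerequisite skill the student
    has not mastered, instead of a generic 'try again' nudge.

    question_skills is ordered prerequisites-first, as derived from
    the skill graph for this question.
    """
    for skill in question_skills:
        if skill not in mastered_skills:
            # Point at the missing prerequisite, not just the wrong answer.
            return hint_bank.get(skill, f"Review the prerequisite skill: {skill}")
    # All prerequisites mastered: the slip is likely procedural.
    return "Check the arithmetic in your final step."

hints = {"alg.expr.1": "Expand the bracket term by term before collecting."}
# Student missing the expansion prerequisite gets a targeted hint:
assert pick_hint(["alg.expr.1", "alg.dist.1"], set(), hints) == hints["alg.expr.1"]
```

This is the structural difference from Khanmigo-style generic Socratic hints: the graph tells the system *which* gap to address, so the hint can be pre-generated and reviewed per skill node rather than improvised per chat turn.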

P0-D: Error Analysis Explanations

Stress Tests

  • Source test: CMU erroneous-examples research is strong but from 2014. The specific application (auto-generating error explanations from misconception data) is novel — no precedent to cite.
  • Feasibility test: 75–85% accuracy is the lowest P0. "Why did the student get this wrong" is genuinely hard for LLMs.13
  • Counter-example: No one has built this at scale and failed — but no one has built it at scale and succeeded either.

Why It Holds

  • Adoption test: Error analysis is the "what went wrong" content. Teachers already do this manually — the tool automates their hardest diagnostic work.
  • Differentiation: Essai has trap and misconception data baked into the question bank. This is not generic — it's generated from actual misconception patterns.
  • Reformed position: HOLD with gate. Human review required on every error analysis until accuracy reaches 90%+. But the concept is too differentiated to cut.

VIII. Creative Differentiators

What only Essai can build — because it already has the assets.

1. "Skill → multi-format content" pipeline. The competency map drives generation. Each skill node in the graph produces text explanations, practice variations, hints, error analysis, and eventually audio — all from the same structured source. Nobody else has the graph. Without it, you're generating from prompts and hoping for consistency.
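The fan-out from one skill node to every format and language can be sketched as a simple data shape. Field names and IDs here are assumptions for illustration, not Essai's real schema:

```python
from dataclasses import dataclass, field

@dataclass
class SkillNode:
    """One atomic skill in the competency graph (illustrative shape)."""
    skill_id: str
    name_en: str
    name_zh: str
    prerequisites: list = field(default_factory=list)  # prerequisite skill_ids

@dataclass
class GeneratedContent:
    """One generated artifact, always traceable back to its skill node."""
    skill_id: str
    fmt: str   # "explanation" | "variation" | "hint" | "error_analysis" | "audio"
    lang: str  # "en" | "zh-Hant"
    body: str  # filled in by the generation + review pipeline

def generate_all(node: SkillNode, formats, langs) -> list:
    """Fan one skill node out to every (format, language) pair, so all
    outputs share the same structured source instead of ad-hoc prompts."""
    return [GeneratedContent(node.skill_id, f, l, body="")
            for f in formats for l in langs]

node = SkillNode("alg.dist.1", "Distributive property", "分配律",
                 prerequisites=["alg.expr.1"])
items = generate_all(node,
                     ["explanation", "variation", "hint", "error_analysis"],
                     ["en", "zh-Hant"])
assert len(items) == 8  # 4 formats × 2 languages, all tagged alg.dist.1
```

The point of the shape is the shared `skill_id`: teacher prep, student hints, and error analysis for the same node can never drift apart, because they are keyed to the same graph entry.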

2. "Wrong → understand" bridge. Use error analysis + erroneous examples (CMU research14) to generate "what went wrong" content specific to the misconception pattern. Show the student a common mistake, ask them to find the error, then explain. This is pedagogically proven and nobody generates it systematically.

3. Bilingual atomic content. Natively EN + 中文, not translated. Every skill node has both language versions generated independently, ensuring mathematical terminology and notation are correct in each language — not machine-translated artifacts.

4. Teacher + student from same graph. Teacher prep materials and student self-study content are generated from the same skill nodes. When a teacher assigns a topic, the student's self-study content, hints, and error analysis are guaranteed to be aligned with what the teacher is covering.

5. NotebookLM-style "study session" audio. Topic-focused conversational audio generated from the skill graph — two AI voices discussing a math concept, walking through examples, and highlighting common mistakes. Nobody does this for math. NotebookLM is generic; this would be curriculum-aware and topic-scoped.

IX. Verdict

Build the text-based pipeline first — explanations, worked examples, error analysis, and practice variations from the existing 774-question competency map. This is production-ready, costs under $100 for the entire bank, and directly feeds both student self-study and teacher preparation.

Add audio lessons as a P1 differentiator. Defer video and handwriting until accuracy improves.

The moat is the atomic skill graph, not any single format — the graph makes every format better, cheaper, and more curriculum-aligned than generic AI output.

X. Open Questions

XI. References

[1] EdWeek Research Center. "How Teachers Use AI and What They Need." 2022. US nationally representative survey of K-12 teachers. 5 hrs/week planning, 5 hrs/week grading. AI saves ~5.9 hrs/week.
[2] Hong Kong Federation of Education Workers (HKFEW). "Teacher Workload Survey." 2022. HK primary + secondary. 81% work 51+ hrs/week. Happiness score: 4.33/10 (decade low).
[3] Education University of Hong Kong. "Teacher Well-Being and Stress Survey." 2025. n=851 HK teachers. 85% report excessive pressure. 56–59% cite differentiation as top-5 stressor.
[4] EdWeek Research Center. "Teachers and AI: What's Working and What Isn't." 2025. US nationally representative. Updates 2022 baseline with AI adoption data.
[5] RAND Corporation, American Teacher Panel. "AI Adoption and Training in Schools." 2024. US teachers. Only 20% received good or excellent AI training.
[6] UC Berkeley / Hechinger Report. "The State of AI Training for Educators." 2025. Qualitative research on teacher preparedness. Most self-taught or untrained.
[7] EdWeek. "Teachers Worry AI Undermines Professional Judgment." 2026. National survey. Concern that AI reduces teaching to content delivery.
[8] arXiv [2509.01395]. "Evaluating LLM Mathematical Reasoning in Educational Contexts." 2025. LLMs wrong ~33% on math content. 73% consistency (27% inconsistency across repeated prompts).
[9] eSpark. "Teacher Perspectives on AI in the Classroom." 2024. n≈600 US teachers. 31% stop using AI tools when trials end.
[10] Post-BETT Survey / The Learning Agency. "EdTech Adoption Barriers." 2025–2026. 43% of teachers unsure AI tools address actual needs.
[11] Adaptive Learning Meta-Analysis (Springer/SAGE). "Effectiveness of Adaptive Learning Systems." 2024. Meta-analysis confirming adaptive practice + spaced variation improves learning outcomes 0.3–0.5 SD.
[12] NotieAI / AI Worksheet Guide. "AI-Generated Teaching Materials Accuracy Report." 2025. AI-generated worksheets save ~5.9 hrs/week. Accuracy ranges 85–92% for text, lower for diagrams.
[13] arXiv [2509.01395]. "LLM Failure Modes in Student Error Detection." 2025. State-of-the-art LLMs fail at identifying specific student misconceptions. Can detect "wrong" but not "why wrong."
[14] Carnegie Mellon University / CHB Study. "Erroneous Examples Improve Learning." 2014. Erroneous examples outperform correct worked examples for learning. Students learn more from finding and explaining mistakes.
[15] South China Morning Post / DSE Survey Data. "Student Stress and DSE." 2016, 2025. 50%+ DSE students at highest stress levels. Exam-driven anxiety correlated with tutoring dependency.
[16] JUPAS / University Admissions Data. "DSE to University Placement Rates." 2025. Only 37% of JUPAS applicants placed. High-stakes filtering creates extreme exam pressure.
[17] Education Bureau (EDB) / Asia Education Review. "HK$500M AI in Education Fund." 2025. Schools receive HK$500K each over 3 years through 2027/28 for AI adoption.
[18] HKFYG / PRO360. "Private Tutoring Market Size." 2023–2026. 72% of HK students attend private tutoring. Industry >HK$10B annually.
[19] SCMP / HK Education Data. "Teacher Retention and Burnout." 2025. Teacher turnover increasing. 61+ hrs/week common in aided schools.
[20] CB Insights / EdTech Failure Analysis. "67 EdTech Post-Mortems." 2024. Competition killed 43.3%. Average lifespan 6.5 years. $17.1B total capital burned across sample.