This is the checkpoint after the first P6 difficulty-expansion pass. Instead of jumping straight into generator code, we assembled a reviewable set of question families, tagged their provenance, and filtered them through the local atomic graph plus the HK curriculum.[1]
The goal is not to import “harder foreign questions” blindly. The goal is to improve variety in question templates and variety in difficulty while staying syllabus-safe and debuggable later.[2]
The current P6 bank covers the TSA floor well: forward percentages, direct averages, direct speed formula, ratio division, formula-based area/volume, and standard data handling.[1]
Different source systems contribute different kinds of value. HKAT is the main local spine. Singapore (PSLE) and UK (SATs) sources contribute reasoning styles. HKMO/HKIMO show the ceiling but should not leak into the core bank.[3][4][5][6]
| Source | What It Gives Us | Status In This Checkpoint |
|---|---|---|
| HKAT | Cross-domain questions, explicit equation use, harder-but-local benchmark | PRIMARY SPINE |
| HK school tiered objectives | Evidence for what “high” looks like inside HK primary practice | LOCAL VALIDATION |
| UK SATs | Backward / missing-number reasoning and answer-checking habits | PATTERN DONOR |
| PSLE | Optimization and comparison styles | STRETCH ONLY |
| HKMO / HKIMO | Competition ceiling and enrichment lane | HOLD OUT |
| Internal synthesis | Combining already-owned skills into new families | CASE BY CASE |
Template variety is not just “more topics.” It is more ways for a student to be asked to think. This checkpoint expands the bank along five meaningful directions.[1][2]
Move from “apply a formula” to “recover the hidden original.”
Examples: original price from sale price, required score from target average.
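The two backward examples reduce to inverting a forward formula. A minimal sketch, assuming a single percentage discount and equally weighted tests; the function names and sample numbers are illustrative, not taken from the bank:

```python
from fractions import Fraction

def original_price(sale_price, discount_percent):
    """Recover the pre-discount price from the sale price.

    Forward:  sale = original * (1 - d/100)
    Backward: original = sale / (1 - d/100)
    """
    remaining = 1 - Fraction(discount_percent, 100)
    return Fraction(sale_price) / remaining

def required_score(current_scores, target_average):
    """Score needed on one more test to reach the target average."""
    total_tests = len(current_scores) + 1
    return target_average * total_tests - sum(current_scores)

# Illustrative numbers only (assumed, not sample items from the checkpoint).
print(original_price(240, 20))            # 300
print(required_score([78, 85, 80], 84))   # 93
```

Exact fractions keep generated answers free of floating-point noise, which makes it easier to check that an item has a clean answer.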
Move from single-topic prompts to data-plus-percentage or discount-plus-equation questions.
This is the cleanest way to make the bank feel more exam-like without leaving syllabus.
Move from one-shot percentage change to sequential change on changing bases.
Example: 27% removed, then 40% of the remainder.
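The changing-base idea can be sketched as follows. The starting quantity of 500 is an assumed value, and "40% of the remainder" is read here as a portion taken from what is left after the first removal:

```python
from fractions import Fraction

def remaining_after(total, percent_steps):
    """Apply successive percentage removals, each on the current remainder."""
    amount = Fraction(total)
    for percent in percent_steps:
        amount -= amount * Fraction(percent, 100)
    return amount

total = 500  # hypothetical starting quantity

after_first = remaining_after(total, [27])      # 73% of the original is left
second_part = after_first * Fraction(40, 100)   # 40% of the remainder
naive_part = total * Fraction(40, 100)          # common error: 40% of the original

print(after_first)   # 365
print(second_part)   # 146
print(naive_part)    # 200
```

The gap between 146 and 200 is the distractor this family is meant to expose: the second percentage applies to a smaller base.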
Move from exact calculation only to “does this answer even make sense?”
This improves number sense, not just procedure.
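One way to make "does this answer make sense?" checkable is to compare a candidate answer against a rounded estimate. This is only a sketch: the one-significant-figure rounding rule and the 20% tolerance are assumptions, not rules from the checkpoint.

```python
def round_to_one_sig_fig(n):
    """Round a positive integer to one significant figure (48 -> 50, 21 -> 20)."""
    magnitude = 10 ** (len(str(n)) - 1)
    return (n + magnitude // 2) // magnitude * magnitude

def is_reasonable_product(a, b, candidate, tolerance=0.2):
    """Check whether a candidate answer for a * b is close to the rounded estimate."""
    estimate = round_to_one_sig_fig(a) * round_to_one_sig_fig(b)
    return abs(candidate - estimate) <= tolerance * estimate

print(is_reasonable_product(48, 21, 1008))    # True  (estimate is 50 * 20 = 1000)
print(is_reasonable_product(48, 21, 10080))   # False (off by a factor of ten)
```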
Use existing owned skills in new combinations, such as speed plus ratio comparison.
Useful, but teacher validation should come first.
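A speed-plus-ratio item could look like the sketch below, which compares two travellers' speeds as a ratio in lowest terms. The numbers are invented for illustration, and both travellers are assumed to use the same distance and time units:

```python
from fractions import Fraction

def speed_ratio(dist_a, time_a, dist_b, time_b):
    """Return speed A : speed B as a pair of integers in lowest terms."""
    ratio = Fraction(dist_a, time_a) / Fraction(dist_b, time_b)
    return ratio.numerator, ratio.denominator

# A covers 12 km in 40 min, B covers 15 km in 45 min (assumed values).
print(speed_ratio(12, 40, 15, 45))   # (9, 10)
```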
These are strong for enrichment, but they currently fail the overlap test for the core bank.
Good inspiration, bad default import.
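The overlap test is not specified in this checkpoint. One plausible reading is a set-inclusion check: a family passes only if every skill it needs already exists in the local syllabus graph. The skill identifiers and the function below are hypothetical placeholders for that reading:

```python
# Hypothetical skill identifiers; the real atomic graph is not shown here.
SYLLABUS_SKILLS = frozenset({
    "percentage", "average", "ratio", "speed",
    "simple_equation", "data_reading",
})

def overlap_report(required_skills, syllabus=SYLLABUS_SKILLS):
    """Report whether every required skill is covered by the syllabus set."""
    missing = set(required_skills) - syllabus
    return {"passes": not missing, "missing": sorted(missing)}

print(overlap_report({"percentage", "simple_equation"}))
# {'passes': True, 'missing': []}
print(overlap_report({"ratio", "modular_arithmetic"}))
# {'passes': False, 'missing': ['modular_arithmetic']}
```

Returning the missing skills, rather than a bare pass/fail, keeps a held-out decision easy to revisit if school materials later confirm a skill.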
The checkpoint is not only adding new prompt shapes. It is deliberately adding the missing upper band. Current production P6 has no advanced questions after calibration.[1]
| Difficulty lever | Old bank tendency | New checkpoint shift |
|---|---|---|
| Reasoning direction | Forward only | Backward and reverse reasoning |
| Topic count | One topic per item | Two or three topics interleaved |
| Information use | All information directly useful | Interpret, filter, or compare information |
| Decision demand | Compute a single answer | Judge, compare, justify, or choose best fit |
| Mathematical communication | Short numeric answer | Equation setup or explanation-like structure |
The table below is the practical build list. It is where provenance, template variety, and difficulty variety meet.
| Family | Sample IDs | Source | Decision | What It Improves |
|---|---|---|---|---|
| Backward percentage / recover original | BR-1, BR-3 | HKAT + HK school tiered objectives | ADOPT | Backward reasoning + equation-friendly difficulty |
| Backward average / required score | BR-2 | HKAT + UK SATs reasoning style | ADOPT | Reverse mean reasoning + threshold thinking |
| Sequential percentages | SP-1, SP-2 | HK school tiered objectives | ADAPT | Changing-base percentage reasoning |
| Data reading + percentage | CT-1 | HKAT | ADOPT | Cross-domain question shape |
| Discount + equation | CT-2 | HKAT | ADOPT | Equation setup + percentage integration |
| Speed + ratio comparison | CT-3 | Internal synthesis / PSLE-style comparison | ADAPT | New fused template from owned skills |
| Deal comparison / least-cost package | CO-1, CO-2 | PSLE | OUT | Comparison / optimization inspiration only |
| Estimation / reasonableness | EST-1, EST-2 | TSA + UK SATs | ADAPT | Number sense and answer-checking |
| Pattern / sequence families | held out | UK SATs + competition | OUT | Enrichment only |
| Competition families | held out | HKMO / HKIMO | OUT | Ceiling reference, not core syllabus |
The point of the sample set is to show the flavor of the uplift, not to pretend all 12 belong in production as-is.[1]
Build now: backward percentage, backward average, data-plus-percentage cross-domain, and discount-plus-equation.
Build as stretch: sequential percentages, speed-plus-ratio, estimation / reasonableness.
Hold out: PSLE-style optimization, pattern families, and competition-only content.
Why this order works: it gives us visible variety gains while keeping the core bank locally defensible.
The right next move is selective uplift, not broad import. HKAT should be the main backbone for P6 difficulty expansion. UK SATs and PSLE are useful only when they donate a reasoning pattern that still survives local overlap checks. HKMO/HKIMO should remain ceiling references rather than core bank inputs.
Template variety improves when we move from forward single-topic prompts into backward, cross-domain, and judgment-heavy prompts. Difficulty variety improves when those new prompts introduce reasoning reversal, topic interleaving, filtering, and equation setup in a controlled way.
Practical build order: write generators for the four `adopt` families now, keep the three `adapt` families as stretch items pending teacher validation, and keep the `out_of_syllabus` families out of the main P6 bank until school materials confirm them.
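The build order can be captured as a small lookup from decision tag to build lane. This sketch follows the build-now, stretch, and hold-out grouping stated above. The family keys and lane labels are illustrative names, not identifiers from the codebase:

```python
from collections import defaultdict

# Decision tags follow the build order described above; keys are hypothetical.
FAMILY_DECISIONS = {
    "backward_percentage": "adopt",
    "backward_average": "adopt",
    "data_plus_percentage": "adopt",
    "discount_plus_equation": "adopt",
    "sequential_percentages": "adapt",
    "speed_plus_ratio": "adapt",
    "estimation_reasonableness": "adapt",
    "deal_comparison": "out_of_syllabus",
    "pattern_sequence": "out_of_syllabus",
    "competition": "out_of_syllabus",
}

LANES = {
    "adopt": "build now",
    "adapt": "stretch, pending teacher validation",
    "out_of_syllabus": "hold out",
}

def build_plan(decisions):
    """Group families into build lanes by their decision tag."""
    plan = defaultdict(list)
    for family, decision in decisions.items():
        plan[LANES[decision]].append(family)
    return dict(plan)

for lane, families in build_plan(FAMILY_DECISIONS).items():
    print(f"{lane}: {', '.join(families)}")
```

Keeping the decision tag next to each family means a later teacher review only needs to change one value to move a family between lanes.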