Scoring a 2018 HL paper, set under the syllabus that preceded AA HL, against current boundaries is like reading a map to the wrong destination. Recent sessions are widely described by students as more time-pressured and stylistically distinct, especially in unfamiliar problem framing, which means a raw percentage earned on older material can imply a different final grade than the same percentage on a 2024–2025 paper mapped onto today’s boundaries. Students building confidence primarily on pre-2022 papers aren’t necessarily getting stronger; they’re getting better at a version of the exam that no longer sets the standard.
Because grade boundaries and paper difficulty profiles move together, a score from IB Mathematics AA HL mock exams only becomes diagnostic when it is held against the right generation of paper and the right boundary band. The same percentage on a 2018 paper and a 2024 paper does not carry the same grade implication when mapped onto current boundaries: students who build confidence mainly on older papers may be misled about how secure their grade is, while those working only on newer papers often misread lower raw scores as failure instead of a boundary effect. The reliable way to read mocks now is to match the paper generation, translate raw marks into current boundary ranges per paper, and turn that into a personal error budget.
Translating Grade Boundaries Into Score Targets
AA HL grade boundaries change every session and can differ by timezone. In November 2025, AA HL marks out of 100 show TZ1 Grade 6 = 64–77 and Grade 7 = 78–100, while TZ3 Grade 6 = 63–74 and Grade 7 = 75–100. That 3-mark difference in the Grade 7 cutoff between TZ3 and TZ1 (75 versus 78 on the same scaled final mark) within a single session is exactly why anchoring to a fixed target like 80% or 85% is unreliable: the threshold that earns a 7 depends on which timezone you sat, and it shifts session to session.
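To make the timezone effect concrete, here is a minimal sketch that looks up a grade from the two bands quoted above. The names `BOUNDARIES` and `grade_for` are illustrative, and only the Grade 6 and Grade 7 bands are included because those are the only ones given here.

```python
# Grade 6 and 7 bands as quoted in the text (final marks out of 100).
# Lower grades are omitted because their bands are not listed here.
BOUNDARIES = {
    "TZ1": {6: (64, 77), 7: (78, 100)},
    "TZ3": {6: (63, 74), 7: (75, 100)},
}


def grade_for(mark, timezone):
    """Return 6 or 7 if the mark falls in a listed band, else None."""
    for grade, (low, high) in BOUNDARIES[timezone].items():
        if low <= mark <= high:
            return grade
    return None  # below the Grade 6 band; not covered by this sketch


# The same final mark lands on different grades in the two timezones.
same_mark = 76
tz1_grade = grade_for(same_mark, "TZ1")
tz3_grade = grade_for(same_mark, "TZ3")
```

A final mark of 76 sits inside the TZ1 Grade 6 band but inside the TZ3 Grade 7 band, which is the whole argument against a single fixed percentage target.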
A flat percentage target—80%, 85%, whatever the forum consensus settled on this week—collapses all of that into a number that fits no specific paper, no specific weighting, and no specific timezone. The workflow below converts your boundary band into something you can actually train against.
- Pick a planning point inside your timezone’s recent FINAL boundary band for your target grade (often near the middle).
- List each component you use in mocks (Paper 1, Paper 2, Paper 3, plus IA if included) with its maximum marks and its official weighting percentage, taken from your course guide or teacher rather than from social-media templates or third-party summaries, because mock formats can vary by school.
- For each component, divide your raw score by that paper’s maximum marks to find the percentage you earned on that paper, multiply by its weighting percentage, and add all of these weighted contributions together; the result is your current weighted total on the same 0–100 scale as the published boundaries.
- Work out how far you are from your planning point by subtracting your current weighted total from that planning mark; if the result is zero or negative, you are already at or above that point.
- Decide how to share any positive gap: split it evenly across the papers (the balanced split) or load more of it onto the paper you can most reliably improve (the strength-tilted focus).
- For each paper, turn the share of the gap you have assigned to it into a raw-mark target by asking how much you need that paper’s weighted contribution to increase, then converting that increase back into the number of extra marks based on its maximum marks and weighting.
- Define your mark budget for each paper by subtracting your target raw mark for that paper from its maximum marks; the result is how many marks you can still afford to lose on that paper in future mocks.
- After every full-condition mock, run back through the seven steps above to update your planning point, targets, and remaining mark budget, and keep whichever gap-sharing approach you chose (the balanced split or the strength-tilted focus) fixed for 2–3 weeks unless your planning boundary point changes or one paper’s remaining mark budget is effectively zero.
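The workflow above can be sketched in a few lines of Python. This is one possible reading, not an official method: the maximum marks and weightings below (110/30%, 110/30%, 55/20%, and an IA of 20 marks at 20%) are assumptions that should be checked against your course guide, the mock scores are invented, and the planning point of 70 is simply a value near the middle of the TZ1 Grade 6 band quoted earlier. Targets are rounded up to whole marks.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    max_marks: int
    weight_pct: float  # weighting in percentage points (assumed values)
    raw: float         # marks earned in the latest mock (invented data)


def weighted_total(components):
    """Sum of (raw / max) * weighting, on the 0-100 boundary scale."""
    return sum(c.raw * c.weight_pct / c.max_marks for c in components)


def mark_budgets(components, planning_point, shares):
    """Return {paper: (target_raw, marks_you_can_still_lose)}.

    `shares` gives the relative share of the gap assigned to each paper;
    components missing from `shares` (e.g. the IA) are left unchanged.
    """
    gap = max(0.0, planning_point - weighted_total(components))
    total_share = sum(shares.values())
    budgets = {}
    for c in components:
        if c.name not in shares:
            continue
        extra_weighted = gap * shares[c.name] / total_share
        # Convert weighted points back into raw marks for this paper.
        extra_raw = extra_weighted * c.max_marks / c.weight_pct
        target = min(c.max_marks, math.ceil(c.raw + extra_raw - 1e-9))
        budgets[c.name] = (target, c.max_marks - target)
    return budgets


mock = [
    Component("P1", 110, 30, 77),
    Component("P2", 110, 30, 66),
    Component("P3", 55, 20, 33),
    Component("IA", 20, 20, 16),
]
current = weighted_total(mock)  # 21 + 18 + 12 + 16 = 67
balanced = mark_budgets(mock, planning_point=70, shares={"P1": 1, "P2": 1, "P3": 1})
# balanced == {"P1": (81, 29), "P2": (70, 40), "P3": (36, 19)}
```

For a strength-tilted focus, pass unequal shares such as `{"P1": 1, "P2": 2, "P3": 1}`; the function normalises them, so only the ratios matter.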
Converting Mock Results Into an Error Budget
A raw mock mark becomes useful only when you know how those lost marks are distributed. Sorting every loss into four buckets—conceptual, procedural, communication, and time-management—turns a single percentage into a map of what’s going wrong, and each type demands a structurally different preparation response.
A specialist IB Math AA HL tutoring resource notes that the course and exams demand non-calculator fluency, proof, and sustained reasoning under pressure. Paper 1 is a 120-minute non-calculator paper that leans on precise algebraic reasoning, so small procedural errors there can be disproportionately expensive. Paper 3 is a 60-minute paper built around two extended unfamiliar problems, meaning conceptual gaps and time-management failures can compound: getting stuck early or misreading the structure can cascade into large losses. The same error type behaves differently across papers—which is why a single percentage obscures more than it reveals.
Use the per-paper mark budget you built from the boundary workflow as the constraint for everything that follows. Start with the paper where that budget is smallest, identify the error type consuming the largest share of lost marks there—especially repeatable losses like procedural slips, communication gaps, or predictable time sinks—and commit your next 7–10 days to that category. Knowing which error to fix gets you halfway there; the harder question is which papers to use when you go back to fix it.
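As a rough illustration of that selection rule, the sketch below picks the paper with the smallest remaining budget and then the error bucket that cost the most marks on it. The function name `next_focus`, the budgets, and the logged losses are all hypothetical; in practice the losses would come from your own debrief sheet.

```python
from collections import Counter

CATEGORIES = ("conceptual", "procedural", "communication", "time-management")


def next_focus(budgets, losses):
    """Return (paper, category, share_of_lost_marks) to prioritise next.

    budgets: {paper: marks you can still afford to lose}
    losses:  {paper: [(category, marks_lost), ...]} from the mock debrief
    """
    paper = min(budgets, key=budgets.get)  # tightest budget first
    tally = Counter()
    for category, marks in losses[paper]:
        if category not in CATEGORIES:
            raise ValueError(f"unknown error category: {category}")
        tally[category] += marks
    category, lost = tally.most_common(1)[0]
    return paper, category, lost / sum(tally.values())


# Hypothetical budgets and debrief data, for illustration only.
budgets = {"P1": 29, "P2": 40, "P3": 19}
losses = {
    "P1": [("procedural", 12), ("communication", 5)],
    "P2": [("conceptual", 8)],
    "P3": [("conceptual", 6), ("time-management", 9),
           ("procedural", 3), ("time-management", 4)],
}
paper, category, share = next_focus(budgets, losses)
# paper == "P3", category == "time-management", share == 13 / 22
```

Here Paper 3 has the smallest budget and time-management accounts for 13 of its 22 lost marks, so that category would claim the next 7–10 days.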
Sequencing Your Paper Archive Over 12 Weeks
Authentic 2024–2025 AA HL papers are a finite resource, and using them for topic drilling burns through your highest-fidelity calibration tools before you need them most. A better approach divides a twelve-week window into two phases: weeks 1–6 focus on topic-specific questions and third-party prediction sets to build skills and coverage, while weeks 7–12 reserve current-style past papers for timed, full-paper simulations under exam conditions.
A major revision provider for AA HL Paper 3 explicitly separates exam-style case-study sets—designed to replicate the real Paper 3 format and difficulty—from topic-specific drills intended for earlier learning. That logic scales cleanly to all three papers: treat exam-style, current-format papers as simulation tools, not everyday exercises.
- Weeks 7–10 (simulate): Sit one full paper per week under exam conditions, alternating Paper 1 and Paper 2; every other week, add a full Paper 3 in the same style as the real exam.
- Weeks 11–12 (calibrate): Increase to two full-condition sits per week and reserve your closest-matching 2024–2025 sets for these final rehearsals.
- Next-day rule: After each simulation, debrief the very next day using your four error categories and update your P1/P2/P3 mark budgets before doing any additional practice.
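The simulation cadence above can be laid out as a simple week-by-week plan. This sketch makes two assumptions the text leaves open: Paper 3 is added in weeks 8 and 10 (any every-other-week pattern would do), and the two sits in weeks 11 and 12 are Paper 1 and Paper 2. The function name `simulation_schedule` is illustrative.

```python
def simulation_schedule():
    """Return {week: [papers to sit under exam conditions]} for weeks 7-12."""
    schedule = {}
    # Weeks 7-10: one full paper per week, alternating Paper 1 and Paper 2.
    for i, week in enumerate(range(7, 11)):
        sits = ["Paper 1" if i % 2 == 0 else "Paper 2"]
        if i % 2 == 1:
            # Assumption: the every-other-week Paper 3 falls in weeks 8 and 10.
            sits.append("Paper 3")
        schedule[week] = sits
    # Weeks 11-12: two full-condition sits per week.
    # Assumption: these are Paper 1 and Paper 2 from the closest-matching sets.
    for week in (11, 12):
        schedule[week] = ["Paper 1", "Paper 2"]
    return schedule


plan = simulation_schedule()
for week, sits in plan.items():
    print(f"Week {week}: {', '.join(sits)} (debrief the next day)")
```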
Reframing Difficulty as a Calibration Signal
Difficulty shift is information. When AA HL feels harder on current-style papers, that’s not evidence that a 7 is receding—it’s a signal that the gap between your preparation baseline and the exam’s actual demands is wider than your scores on older material suggested.
Students who align practice with current-style papers and timezone-specific boundary bands build an accurate picture of where they stand. Those chasing hours against outdated benchmarks build a flattering one—and the difference shows up at the boundary, not before. A per-paper mark budget paired with the weeks 7–12 simulation cadence doesn’t soften the difficulty. It converts each mock into a specific deficit on a specific paper. That’s what makes the next decision obvious.
Turning Calibration into Your Next Mock Plan
Sequence is the part that actually matters. Running the boundary workflow after a poorly timed mock, against the wrong timezone’s numbers, produces calibration theater—figures that look precise and point in the wrong direction. Get the generation right, identify your planning point, then let the per-paper mark budgets do the diagnostic work they’re built for.
After your next full-condition sit, add up your weighted paper contributions to find where you land on the 0–100 boundary scale, identify your timezone’s planning point, and write your P1, P2, and P3 mark budgets at the top of the debrief sheet—before anything else. The score tells you what happened. The budget tells you what to do about it.