On the paid full speaking test, your final score is reviewed and signed off by a qualified human grader. The AI pre-grades to keep things consistent and to save the grader from typing every word of feedback from scratch. The judgment, the override, and the certificate are theirs.
✦ Certified Japanese instructors
Graders are working Japanese teachers and licensed examiners. They have heard thousands of L2 learners at every level and know what an Ii3 speaker sounds like compared to an Ii4.
✦ Calibrated to the ii-5 rubric
Every grader uses the same construct definitions, tier weights, and scoring scale. The grader portal seeds each question with the AI pre-grade, so two different graders reviewing the same submission converge on similar scores.
✦ Final say on the certificate
The AI is advisory. The human grader can override any AI score and rewrite any feedback, and is the only one who can issue a passing certificate. Your certificate has their judgment behind it, not just a model's.
Step 01: AI pre-grades in the background
The moment you submit a full test, the AI pipeline runs. Whisper transcribes every answer, signals are extracted, and Claude Opus 4.7 produces per-question 1 to 5 scores plus written feedback. This happens whether or not you stay on the page.
Step 02: A human grader reviews everything
A certified grader opens your submission. They see the AI pre-grade as a starting point, listen to every audio response, edit any scores they disagree with, and finalize the per-construct feedback. Tier weights are applied automatically: primary constructs count ×3, secondary constructs count ×2, and observed constructs are noted but not scored.
Step 03: You get your certified result
Once the human grader submits, you receive a notification email and your results page unlocks. If you pass, your certificate is ready to view and share. The result is final and reflects both AI grounding and human judgment.
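The three steps can be pictured as a small data model: the AI pre-grade supplies starting scores, any score the grader edits replaces the AI's, and the tier weights are then applied. What follows is only a minimal sketch under stated assumptions. The class name, field names, and construct names are hypothetical, and combining the weighted scores as a weighted mean is an assumption; the text gives the multipliers (primary ×3, secondary ×2, observed not scored) but not the exact formula.

```python
from dataclasses import dataclass, field
from typing import Dict

# Tier multipliers as stated in the text; "observed" constructs carry no weight.
TIER_WEIGHTS = {"primary": 3, "secondary": 2, "observed": 0}


@dataclass
class QuestionGrade:
    """Hypothetical record for one question in a submission."""
    tiers: Dict[str, str]        # construct name -> tier on this question
    ai_scores: Dict[str, int]    # construct name -> 1 to 5 score from the AI pre-grade
    overrides: Dict[str, int] = field(default_factory=dict)  # grader edits, if any

    def final_scores(self) -> Dict[str, int]:
        # A human override always takes precedence over the AI pre-grade.
        return {**self.ai_scores, **self.overrides}

    def weighted_score(self) -> float:
        # Assumed combination rule: weighted mean over the scored constructs.
        scores = self.final_scores()
        total = 0
        total_weight = 0
        for construct, tier in self.tiers.items():
            weight = TIER_WEIGHTS[tier]
            if weight == 0:
                continue  # observed constructs are not scored
            total += weight * scores[construct]
            total_weight += weight
        if total_weight == 0:
            raise ValueError("question has no scored constructs")
        return total / total_weight


grade = QuestionGrade(
    tiers={
        "phonological_accuracy": "primary",
        "discourse_flow": "secondary",
        "grammar": "observed",
    },
    ai_scores={"phonological_accuracy": 4, "discourse_flow": 3, "grammar": 2},
    overrides={"discourse_flow": 4},  # the grader disagreed with the AI here
)
print(grade.weighted_score())  # (3*4 + 2*4) / (3 + 2) = 4.0
```

In this sketch the observed grammar score is kept for feedback but never enters the arithmetic, which mirrors the "observed, not scored" tier.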
The Inversion Measurement Model
Each question targets specific constructs.
Every audio response is evaluated by how it sounds to a native listener. What you say matters, but how you say it matters more. No matter your level, the ii-5 model judges your Japanese the way a real native ear would.
Not every question measures all five constructs. A reading task is built to test pronunciation and discourse flow; a casual conversation prompt is built to test grammar and situational appropriateness. We score each question only on the constructs it was designed to test, and we weight those scores by how central each construct is to that prompt.
Across the full test, every construct shows up on many questions, weighted differently each time. The Inversion Measurement Model combines those weighted scores so that your Phonological Accuracy result comes from the questions that actually depended on pronunciation, your Discourse Flow result comes from the questions that actually depended on sustained speech, and so on.
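One way to picture this combination is a per-construct weighted mean across questions, sketched below. The function name, the tuple layout, the third example question, and the choice of a weighted mean are all assumptions made for illustration; the text states only that scores are weighted by how central each construct is to a prompt, and it does not publish the actual formula.

```python
from collections import defaultdict

# Tier multipliers from the grading steps; observed constructs are not scored.
TIER_WEIGHTS = {"primary": 3, "secondary": 2}


def construct_results(questions):
    """Return an assumed weighted-mean score per construct across all questions.

    Each question is a list of (construct, tier, score) tuples.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for question in questions:
        for construct, tier, score in question:
            weight = TIER_WEIGHTS.get(tier, 0)
            if weight == 0:
                continue  # observed only: contributes nothing to the result
            weighted_sum[construct] += weight * score
            weight_total[construct] += weight
    return {c: weighted_sum[c] / weight_total[c] for c in weighted_sum}


questions = [
    # Reading task: pronunciation is central, sustained speech is secondary.
    [("phonological_accuracy", "primary", 4), ("discourse_flow", "secondary", 3)],
    # Casual conversation prompt: pronunciation is observed but not scored.
    [("grammar", "primary", 5), ("phonological_accuracy", "observed", 2)],
    # Hypothetical extended-answer prompt.
    [("discourse_flow", "primary", 5), ("phonological_accuracy", "secondary", 3)],
]

results = construct_results(questions)
for construct, score in sorted(results.items()):
    print(f"{construct}: {score:.1f}")
```

Here the pronunciation score of 2 on the conversation prompt has no effect on the Phonological Accuracy result, because that prompt was not designed to test pronunciation.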