The ii-5 model

How we grade your Japanese.

Every ii-5 test scores your speaking across five constructs. The placement test gives you a quick AI read. The Baseline diagnostic deepens that into a study plan with charts. The full speaking test adds human review and a certified result. This page explains every piece of how that grading works, so the numbers feel honest.

00 How natives speak

The mechanics that make Japanese sound Japanese.

Native Japanese isn't just correct grammar with a Japanese accent. It is built on pitch patterns, even mora timing, vowel length, and intonation shapes that don't exist the same way in English. The four features below are where most learners sound textbook, and where small adjustments make you sound natural. They feed directly into the Phonological Accuracy and Discourse Flow constructs.

Pitch accent

高低アクセント (kōtei akusento)

Each mora is either high or low. The same string can mean different things depending on which moras are high. English speakers naturally stress for emphasis; Japanese uses fixed pitch patterns that change the word itself.

hashi · chopsticks · Head-high (頭高型)
hashi · bridge · End-high (尾高型)
hashi · edge · Flat (平板型)

Three different words. Same hiragana. The only thing separating them is which moras are high.
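
To make the three patterns concrete, here is a minimal sketch of the standard textbook description of Tokyo-dialect pitch accent, where a word's high/low shape follows from the position of its accent. It is an illustration only, not the ii-5 grader's method; the function name `pitch_pattern` and the H/L string notation are assumptions made for this example. A following particle is included because end-high and flat words only sound different once a particle is attached.

```python
def pitch_pattern(n_moras: int, accent: int, with_particle: bool = True) -> str:
    """Return an H/L string for a word under the textbook Tokyo-accent rule.

    accent = 0 means flat (no drop); accent = k means the pitch drops
    right after the k-th mora. Hypothetical helper for illustration.
    """
    if n_moras < 1 or not 0 <= accent <= n_moras:
        raise ValueError("accent must be between 0 and the number of moras")

    total = n_moras + (1 if with_particle else 0)
    pattern = []
    for i in range(1, total + 1):
        if accent == 0:
            high = i > 1            # flat: low start, then high through the particle
        elif accent == 1:
            high = i == 1           # head-high: only the first mora is high
        else:
            high = 1 < i <= accent  # low start, high up to the accent, then low
        pattern.append("H" if high else "L")
    return "".join(pattern)


# hashi + particle (for example が), three moras in total
print(pitch_pattern(2, 1))  # chopsticks, head-high -> HLL
print(pitch_pattern(2, 2))  # bridge, end-high      -> LHL
print(pitch_pattern(2, 0))  # edge, flat            -> LHH
```

Without the particle, bridge and edge both come out as LH, which is why the difference is easiest to hear in a full phrase.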

Mora timing

拍 (haku)

Every mora gets roughly the same amount of time. A small っ (促音) isn't a consonant cluster; it's a full beat of silence. Skipping that beat changes the word.

2 beats
kite · come (imperative)
3 beats
kitte · postage stamp

English speakers often rush through the small っ. To a native ear, kitte said as kite is a different word, not a sloppy one.
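
The beat counts above can be reproduced with a few lines of code. This is a minimal sketch that assumes kana-only input; the function name `count_moras` and the set of small kana are choices made for this example, not part of the ii-5 pipeline. The rule it applies: every kana is one mora, except small ゃ, ゅ, ょ and small vowels, which merge with the kana before them. Small っ, ん, and the long-vowel mark ー each count as a full beat.

```python
# Small kana that combine with the preceding kana into a single mora.
# Note that small っ is deliberately NOT in this set: it is its own beat.
COMBINING_SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_moras(kana: str) -> int:
    """Count moras in a kana string (hypothetical helper, kana-only input)."""
    return sum(
        1
        for ch in kana
        if ch not in COMBINING_SMALL_KANA and not ch.isspace()
    )


print(count_moras("きて"))    # kite  -> 2
print(count_moras("きって"))  # kitte -> 3
```

Kanji and punctuation are not handled; a real pipeline would first convert text to a kana reading.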

Long vowels

長音 (chōon)

Doubling a vowel doubles its length, and the lengthened vowel counts as its own mora. Trim the length and you change the word entirely.

4 beats
obasan (おばさん) · aunt
5 beats
obaasan (おばあさん) · grandmother

Other common pairs to watch: ビル (building) vs ビール (beer), おじさん (uncle) vs おじいさん (grandfather), 主人 (husband) vs 囚人 (prisoner).

Sentence intonation

文末イントネーション

Japanese uses pitch direction at the end of a phrase to mark questions, confirmation, and affirmation. The word itself doesn't change; the rise or fall at the end does the work.

行く。 iku · I'm going · Statement (drop)
行く? iku? · are you going? · Question (rise)

Same verb, no question particle, and a statement becomes a question. Casual Japanese leans hard on this rise to mark questions, especially among friends.

None of these features are gradable in isolation; they show up across every construct. A learner who nails mora timing tends to score well on Phonological Accuracy. A learner with natural sentence intonation tends to score well on Discourse Flow. Speak the way these patterns work and the rest tends to follow.

01 The five constructs

What we measure when you speak.

Five constructs cover the things a friendly Japanese listener actually notices. They are scored independently, and different test levels weight them differently.

01 PA

Phonological Accuracy

音声の正確さ

Whether Japanese sounds come out the way a native listener expects. Long vowels (長音), geminate (doubled) consonants (促音 っ), moraic nasal n (撥音 ん), pitch accent, and devoiced vowels all live here.

Strong

おばあさんが きょうとへ いきました。

The long vowel in obaa-san and the long vowel in Kyoto both land cleanly.

Weak

おばさんが きょとへ いきました。

Long vowels collapsed into short ones. "Obaa-san" becomes "oba-san" (a different word: aunt instead of grandmother).

Whisper transcripts are noisy on L2 Japanese, so the AI looks at clarity and consistency across the whole sample, not single-word verdicts.

02 LF

Lexi Flex

語彙の柔軟さ

Vocabulary range and flexibility. Word choice variety, appropriateness, and the ability to find a word that fits the intended meaning rather than circling around it.

Strong

友達と 待ち合わせを して、駅で 30分 ぐらい 待っていました。

Specific verbs (待ち合わせをする, 待つ) and a precise time expression.

Weak

友達と あう。駅で 長い 時間 いる。

Vague generic verbs (あう, いる) and approximate time. Communication works, but the listener has to fill in the details.

A larger vocabulary lets you say what you actually mean. The AI rewards precise word choices over hedging vocabulary.

03 GC

Grammatical Control

文法の運用

Particles, verb conjugation, tense, aspect, and sentence structure. Errors are expected at lower levels; the AI calibrates expectations to the test level rather than against native speakers.

Strong

昨日 図書館で 本を 読んでいたら、友達が 来ました。

Past progressive (読んでいた) plus the たら construction, used here in its "when" sense, connects two events cleanly.

Weak

昨日 図書館 本 読む。友達 来る。

Particles are missing and the verbs are left in dictionary form. The listener can guess the meaning but has to work for it.

Self-correction (catching a particle mistake and fixing it mid-sentence) is treated as a strength, not a weakness.

04 DF

Discourse Flow

談話の流れ

Coherence across multiple sentences. Use of connectives (それで, でも, だから), the ability to develop an idea over several turns, repair, and natural turn-taking signals.

Strong

最初は 難しいと 思いました。でも、毎日 練習したら、だんだん 楽しくなって きました。

Two sentences joined by でも, then a result phrase with なって きました showing change over time.

Weak

難しい。練習する。楽しい。

Three disconnected fragments. The ideas might be the same, but the listener has to construct the relationship between them.

On a reading task, this becomes about how naturally you carry the passage forward, not whether you connect ideas.

05 ST

Situational Tuning

場面への調整

Register choice (formal vs casual), politeness level, contextual appropriateness, and whether you interpret what the question is actually asking. A polite response to a casual question is just as off as the reverse.

Strong

先生、すみません。少し お時間 よろしいですか。

Formal address, hedged request, polite verb form. The listener (a teacher) is treated correctly.

Weak

先生、ちょっと いい?

The grammar is fine, but the register is wrong for the listener. Native ears notice immediately.

On higher-level tests, situational tuning carries more weight. On lower-level tests, it is observed but not scored heavily.

02 How the AI grades

AI is your first read. Not your final score.

We use AI for what it is genuinely good at: fast, consistent grounding on every transcript and audio signal. The model is anchored to specific evidence, not vibes. It does not invent claims. Anywhere the AI is uncertain, that uncertainty is recorded and surfaced to the human grader (on the full test) or shown to you directly (on the Baseline diagnostic).

Role 01

Transcribe your speech

OpenAI Whisper turns your audio into a Japanese transcript with word-by-word timestamps. This is the foundation for everything that follows. Whisper is biased toward Japanese vocabulary for the test prompts so short clips do not get mis-decoded as English filler.

Role 02

Extract audio signals

Speaking pace (words per minute), pause patterns, pacing consistency, and ASR clarity are computed from the Whisper timestamps. Japanese filler words like えーと and あの are counted and density-normalized so a 30-second response with two fillers is not lumped with a 5-second response that has two.
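
The density normalization can be sketched as follows. This is a minimal illustration under stated assumptions, not the production code: the `Word` record, the filler list, and the choice of "fillers per minute" as the unit are all hypothetical. It assumes word-level timestamps of the kind described above and measures duration from the first word's start to the last word's end.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with start/end times in seconds (assumed shape)."""
    text: str
    start: float
    end: float


# Illustrative filler list; a real list would be longer and context-aware,
# since あの can also be a plain demonstrative ("that").
FILLERS = {"えーと", "えっと", "あの", "あのー"}


def filler_density(words: list[Word]) -> float:
    """Return fillers per minute of speech, or 0.0 for an empty sample."""
    if not words:
        return 0.0
    duration = words[-1].end - words[0].start
    if duration <= 0:
        return 0.0
    count = sum(1 for w in words if w.text in FILLERS)
    return count * 60.0 / duration


long_answer = [
    Word("えーと", 0.0, 0.5),
    Word("あの", 10.0, 10.4),
    Word("いきます", 29.0, 30.0),
]
short_answer = [
    Word("えーと", 0.0, 0.5),
    Word("あの", 1.0, 1.4),
    Word("はい", 4.5, 5.0),
]

print(filler_density(long_answer))   # 2 fillers over 30 s -> 4.0 per minute
print(filler_density(short_answer))  # 2 fillers over 5 s  -> 24.0 per minute
```

Both answers contain two fillers, but the rate separates an occasional hesitation from a response that is mostly filler.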

Role 03

Score each construct

The placement test sends transcripts and signals (and the actual image, when an image is part of the prompt) to GPT-4o. The full speaking test sends the same evidence plus per-question tier weights to Claude Opus 4.7. Both score every construct 1 to 5 on every gradable question.

Role 04

Write specific feedback

The AI produces warm, specific feedback in English with Japanese sounds and particles referenced inline. Hedging language like "try to" or "consider" is banned in favor of direct, actionable next steps.

The AI never decides your certificate. On the full speaking test, the AI produces a pre-grade. A certified human grader then reviews everything, edits what they disagree with, and submits the final result. Without that human step, no certificate is issued.

03 How humans grade

The certified result lives with a person.

On the paid full speaking test, your final score is reviewed and signed off by a qualified human grader. The AI pre-grades to keep things consistent and to save the grader from typing every word of feedback from scratch. The judgment, the override, and the certificate are theirs.

Certified Japanese instructors

Graders are working Japanese teachers and licensed examiners. They have heard thousands of L2 learners at every level and know what an ii3 speaker sounds like compared to an ii4.

Calibrated to the ii-5 rubric

Every grader uses the same construct definitions, tier weights, and scoring scale. The grader portal seeds each question with the AI pre-grade so two different graders on the same submission converge to similar scores.

Final say on the certificate

The AI is advisory. The human grader can override any AI score, rewrite any feedback, and is the only one who can issue a passing certificate. Your certificate has their judgment behind it, not just a model.

Step 01

AI pre-grades in the background

The moment you submit a full test, the AI pipeline runs. Whisper transcribes every answer, signals are extracted, and Claude Opus 4.7 produces per-question 1 to 5 scores plus written feedback. This happens whether or not you stay on the page.

Step 02

A human grader reviews everything

A certified grader opens your submission. They see the AI pre-grade as a starting point, listen to every audio response, edit any scores they disagree with, and finalize the per-construct feedback. Tier weights are applied automatically (primary ×3, secondary ×2, observed not scored).

Step 03

You get your certified result

Once the human grader submits, you receive a notification email and your results page unlocks. If you pass, your certificate is ready to view and share. The result is final and reflects both AI grounding and human judgment.

The Inversion Measurement Model

Each question targets specific constructs.

Every audio response is evaluated by how it sounds to a native listener. What you say matters, but how you say it matters more. No matter your level, the ii-5 model judges your Japanese the way a real native ear would.

Not every question measures all five constructs. A reading task is built to test pronunciation and discourse flow; a casual conversation prompt is built to test grammar and situational appropriateness. We score each question only on the constructs it was designed to test, and we weight those scores by how central each construct is to that prompt.

Across the full test, every construct shows up on many questions, weighted differently each time. The Inversion Measurement Model combines those weighted scores so that your Phonological Accuracy result comes from the questions that actually depended on pronunciation, your Discourse Flow result comes from the questions that actually depended on sustained speech, and so on.
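
One way to picture that combination is a weighted average per construct, using the tier weights this page gives for the full test (primary ×3, secondary ×2, observed not scored). The sketch below is an assumption about the arithmetic, not a description of the actual Inversion Measurement Model; the function name `combine_scores` and the input shape are hypothetical.

```python
from collections import defaultdict

# Tier weights as stated on this page; "observed" contributes nothing.
TIER_WEIGHTS = {"primary": 3, "secondary": 2, "observed": 0}


def combine_scores(questions: list[dict[str, tuple[str, int]]]) -> dict[str, float]:
    """Combine per-question scores into one weighted mean per construct.

    Each question maps a construct code to (tier, score), score in 1..5.
    Constructs that were only ever observed are left out of the result.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for question in questions:
        for construct, (tier, score) in question.items():
            weight = TIER_WEIGHTS[tier]
            weighted_sum[construct] += weight * score
            weight_total[construct] += weight
    return {
        construct: weighted_sum[construct] / weight_total[construct]
        for construct in weighted_sum
        if weight_total[construct] > 0
    }


answers = [
    {"PA": ("primary", 4), "DF": ("secondary", 3)},                       # reading task
    {"GC": ("primary", 3), "ST": ("secondary", 4), "PA": ("observed", 2)},  # casual conversation
    {"PA": ("secondary", 5)},
]
result = combine_scores(answers)
print(round(result["PA"], 2))  # (3*4 + 2*5) / (3 + 2) -> 4.4
```

The observed PA score of 2 on the conversation prompt does not pull the PA result down, because that question was not designed to test pronunciation.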

Try it for yourself

See your score across these five.

The placement test is free and runs in under five minutes. Take it once to see roughly where you sit, then choose the depth you want.