
We Made Every Sentence in Your Past Mistakes Searchable, Word by Word

Most vocab apps treat a saved sentence as one frozen string. Tap it, get a definition, or nothing. We tokenize every saved sentence into phrases AND words, filter out function words, and let you tap any chip to look it up — without leaving the card.

Quick answer: When SpeakShark catches a sentence-level mistake in your speaking session — like "I doing really well or I feel great" — most apps would save that as one opaque string and let you stare at it. We split every saved item into phrases and content words, hide the boring "the / a / is" stops, and turn every meaningful chunk into a clickable chip that opens the full dictionary popup. Try it on a free account — the Vocabulary tab in Library is where it lives.

I want to talk about a small UX detail in SpeakShark that I think makes a disproportionate difference to how much learners actually use their saved mistakes — and why it took us four iterations to get right.

The problem with sentence-length vocab items

When you finish a Daily Talk conversation, the AI saves your mistakes to your Vocabulary bank. Some of those mistakes are clean single-word substitutions:

I winned → ✓ I won

Easy. The card shows "won," you tap it, you get the dictionary entry, you move on.

But a lot of mistakes aren't single words. They're sentence fragments:

"that i am doing well or that i feel great" → ✓ "that I'm doing really well, or that I feel great"

You can't look that up. Free Dictionary API returns 404 for anything longer than ~5 words. Datamuse tokenizes per-word and doesn't know what to do with a clause. So historically, sentence-length cards just sat there — frozen text, no audio button, no lookup, no path forward for the learner.

The first version of our Vocabulary tab was honest about this: long items got a card with no interactivity. The user could read the correction, and that was it. Looking at telemetry, about 47% of saved items were over 5 words long. Nearly half the bank was essentially dead text.

That's the problem we set out to fix.

What "tokenization" means here

Tokenization in language tooling usually means splitting text into atomic tokens — words, punctuation, etc. We do that, but with two ESL-specific twists:

  1. Filter out stop words. Function words ("the", "is", "I", "you", "a", "of") are not learnable vocabulary. A Vietnamese intermediate learner doesn't need to look up "the." Filtering them out cuts noise dramatically.

  2. Preserve phrases. Collocations and phrasal verbs are the part of vocabulary that ESL learners actually need to study. "Doing really well" is one learnable unit. Breaking it into ["doing", "really", "well"] loses the natural English pairing. So we extract phrases first, then individual words.

The result: tapping on a sentence-length card shows you the chips that are actually worth studying — and skips the ones you already know.

The algorithm, walked through

Here's the actual logic, with a worked example. Saved sentence:

"that I am doing really well or that I feel great"

Step 1: lowercase, strip punctuation, split on whitespace:

["that", "i", "am", "doing", "really", "well", "or",
 "that", "i", "feel", "great"]

Step 2: walk left-to-right, grouping tokens into "runs" — every time we hit a stop word, we close the current run and start a new one. Stop words include the usual suspects: articles, pronouns, auxiliaries, modals, conjunctions, common prepositions, wh-words, "this/that/these/those."

"that"   → stop word — closes run (run is empty, skip)
"i"      → stop word — closes run
"am"     → stop word — closes run
"doing"  → content, start run: ["doing"]
"really" → content, append: ["doing", "really"]
"well"   → content, append: ["doing", "really", "well"]
"or"     → stop word — closes run → push ["doing", "really", "well"]
"that"   → stop word
"i"      → stop word
"feel"   → content, start run: ["feel"]
"great"  → content, append: ["feel", "great"]
END      → close last run → push ["feel", "great"]

Runs: [["doing", "really", "well"], ["feel", "great"]]

Step 3: emit the multi-word phrases first (length ≥ 2), then the individual content words. Dedupe along the way:

Phrases:  "doing really well", "feel great"
Words:    "doing", "really", "well", "feel", "great"

Final chips: ["doing really well", "feel great", "doing", "really", "well", "feel", "great"]

Cap at 8 chips per card so the UI doesn't blow up on long sentences.
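The three steps and the cap can be sketched in a few lines of TypeScript. This is a minimal sketch, not the production code: the function name extractChips, the constant names, and the short stop-word list are all assumptions made for illustration, and the real list is longer.

```typescript
// Hypothetical sample of stop words; the production list covers more
// articles, pronouns, auxiliaries, modals, conjunctions, and prepositions.
const STOP_WORDS: ReadonlySet<string> = new Set([
  "a", "an", "the", "i", "you", "he", "she", "it", "we", "they",
  "i'm", "it's", "that's",
  "am", "is", "are", "was", "were", "be", "been", "do", "does", "did",
  "can", "could", "will", "would", "should", "may", "might", "must",
  "and", "or", "but", "of", "to", "in", "on", "at", "for", "with",
  "what", "who", "when", "where", "why", "how",
  "this", "that", "these", "those",
]);

// Cap from the article: at most 8 chips per card.
const MAX_CHIPS = 8;

function extractChips(sentence: string, maxChips: number = MAX_CHIPS): string[] {
  // Step 1: lowercase, strip punctuation (apostrophes kept), split on whitespace.
  const tokens = sentence
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0);

  // Step 2: group consecutive content words into runs.
  // A stop word closes the current run; empty runs are skipped.
  const runs: string[][] = [];
  let current: string[] = [];
  for (const token of tokens) {
    if (STOP_WORDS.has(token)) {
      if (current.length > 0) runs.push(current);
      current = [];
    } else {
      current.push(token);
    }
  }
  if (current.length > 0) runs.push(current);

  // Step 3: multi-word phrases first, then individual content words.
  const phrases = runs.filter((run) => run.length >= 2).map((run) => run.join(" "));
  const words: string[] = [];
  for (const run of runs) words.push(...run);

  // Dedupe while preserving order (Set keeps insertion order), then cap.
  return Array.from(new Set(phrases.concat(words))).slice(0, maxChips);
}
```

Called on the worked example, this sketch returns the same seven chips in the same order as the "Final chips" list. Because the cap is applied last, phrases are kept ahead of single words when a long sentence overflows it.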

Why phrases come first

Two reasons.

One: phrases are higher-information. "Doing really well" is a complete English collocation. Looking up the phrase teaches you register (informal, friendly), pairing ("doing well" not "making well"), and intensification ("really" works here; "very" would feel off). Looking up "doing" alone gets you a verb definition you already half-know.

Two: phrases get visually larger chips. We style phrases (chips containing spaces) bigger and bolder than word chips, so the natural reading order is:

[ doing really well ]  [ feel great ]
[doing] [really] [well] [feel] [great]

Phrases on top. Words below as a fallback. This is a small thing but I think it nudges learners toward the higher-leverage lookup.
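One way to get that two-row layout is to classify each chip by whether it contains a space, then split the list into a phrase row and a word row. The sketch below assumes hypothetical names (ChipView, toChipView, layoutChipRows); the actual styling and component code are not shown in this post.

```typescript
type ChipKind = "phrase" | "word";

interface ChipView {
  text: string;
  kind: ChipKind; // drives styling: phrases render bigger and bolder
}

// A chip counts as a phrase when it contains whitespace.
function toChipView(chip: string): ChipView {
  return { text: chip, kind: /\s/.test(chip) ? "phrase" : "word" };
}

// Split chips into the two visual rows, keeping the original order in each.
function layoutChipRows(chips: string[]): { phraseRow: ChipView[]; wordRow: ChipView[] } {
  const views = chips.map(toChipView);
  return {
    phraseRow: views.filter((view) => view.kind === "phrase"),
    wordRow: views.filter((view) => view.kind === "word"),
  };
}
```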

The Quizlet-style card click

While I was rebuilding the vocab tab, I tested it on five learners. Three of them — completely independently — tried to tap the whole card to look up the word. I had buttons for everything; the card itself was a static rectangle. They all thought it should "open."

That's a strong signal. Flashcard apps (Quizlet, Anki, RemNote) trained an entire generation of learners to expect "tap the card" to mean "show me more." So we wired it up.

The whole vocab card is now a button. For short items (≤3 words), the click looks up the whole phrase. For longer items, it looks up the first extracted chip, which is the first multi-word phrase in the sentence whenever one exists. Inner controls (the audio speaker, the mastery toggle, the individual word chips) call event.stopPropagation() so they don't accidentally trigger the parent card lookup.

For multi-word items, we also show a quiet "Tap card for full definition" hint, plus the chip row. Either route works. The user picks whichever feels natural.
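The card-click rule and the inner-control guard can be sketched as two small functions. The names cardLookupTarget and handleInnerControlClick are hypothetical, and the event parameter is typed structurally so the sketch runs without a DOM or React; in a real component it would be the click event.

```typescript
// Threshold from the article: items of 3 words or fewer are looked up whole.
const SHORT_ITEM_MAX_WORDS = 3;

// Decide what a click on the card itself should look up.
function cardLookupTarget(itemText: string, chips: string[]): string {
  const trimmed = itemText.trim();
  const wordCount = trimmed.split(/\s+/).filter((word) => word.length > 0).length;
  if (wordCount <= SHORT_ITEM_MAX_WORDS) return trimmed;
  // Longer items: use the first extracted chip. Falling back to the full
  // text when no chip exists is an assumption of this sketch.
  return chips.length > 0 ? chips[0] : trimmed;
}

// Wrap an inner control's action so the click does not bubble up
// to the parent card and trigger the card-level lookup.
function handleInnerControlClick(
  event: { stopPropagation(): void },
  action: () => void,
): void {
  event.stopPropagation();
  action();
}
```

Keeping the target decision in a plain function means it can be checked without rendering a card.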

The explanation fallback

There's a quieter problem with sentence-length items: even after we tokenize, sometimes the chip extraction doesn't yield anything usable. Example: an item where the AI saved a tone correction like "that's not the most natural way to say it in casual conversation." The phrase chips would be {most natural way, casual conversation}. The user taps most natural way, and Free Dictionary 404s on a 3-word non-phrase. Datamuse returns nothing meaningful.

The popup would be empty. The user would close it and lose trust in the feature.

So we added a fallback: when the dictionary lookup returns "not found," we display the AI's saved explanation for that mistake in the popup body. That text is always meaningful — it's the AI's note about why the original was wrong, written when the mistake happened. The popup says, in effect:

"We couldn't find a dictionary entry for 'most natural way', but here's what was wrong with how you used it in your session: 'In English, we usually say the most natural way — but adding to put it makes the sentence flow better.'"

The fallback is plumbed through as a fallbackNote prop on DictionaryPopup. When VocabularyPage opens a lookup, it stashes the card's explanation in component state, and passes it to the popup. The popup uses it whenever the API didn't return content. Closing the popup clears the fallback.
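The decision the popup makes can be sketched as a pure function. Only the fallbackNote name comes from the description above; LookupResult, PopupBody, and resolvePopupBody are hypothetical names, and the "empty" branch is a defensive assumption for the case where no note was saved.

```typescript
// Hypothetical shape of a dictionary lookup outcome.
type LookupResult =
  | { status: "found"; definition: string }
  | { status: "not_found" };

interface PopupBody {
  source: "dictionary" | "fallback" | "empty";
  text: string;
}

// Pick what the popup body shows: the dictionary entry when there is one,
// otherwise the explanation the AI saved when the mistake happened.
function resolvePopupBody(
  term: string,
  result: LookupResult,
  fallbackNote?: string,
): PopupBody {
  if (result.status === "found") {
    return { source: "dictionary", text: result.definition };
  }
  const note = fallbackNote ? fallbackNote.trim() : "";
  if (note.length > 0) {
    return {
      source: "fallback",
      text:
        `We couldn't find a dictionary entry for '${term}', ` +
        `but here's what was wrong with how you used it in your session: '${note}'`,
    };
  }
  // No dictionary entry and no saved note: nothing to show.
  return { source: "empty", text: "" };
}
```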

This single addition turned "empty popup = broken" into "always has content = always useful." Telemetry shows the popup close rate dropped from ~32% (lots of "this didn't help me, close") to ~9%.

The junk vocab filter

While we were at it: a separate but related problem. Sometimes the AI saves obvious garbage — "n/a", "incomplete sentence", "no error", "unclear". These come from the LLM occasionally hallucinating a structured response when the actual sentence was fine. Pre-filter, they showed up as cards in the bank.

We hand-built a JUNK_VOCAB_PHRASES Set in the library route — about 12 known-bad phrases. Anything matching gets dropped at query time before it ever reaches the user. The filter runs server-side, so junk never even hits the client.
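A filter like this can be sketched as below. The Set name comes from the paragraph above, but it holds only the four phrases quoted in this post, not the full list of about 12. The VocabItem shape, the dropJunkItems name, and the normalization step (trim, lowercase, strip trailing punctuation) are assumptions of the sketch.

```typescript
// Only the four examples quoted in this post; the real set is larger.
const JUNK_VOCAB_PHRASES: ReadonlySet<string> = new Set([
  "n/a",
  "incomplete sentence",
  "no error",
  "unclear",
]);

// Hypothetical shape of a saved vocabulary item.
interface VocabItem {
  id: string;
  text: string;
  explanation: string;
}

// Assumed normalization so "N/A." and " no error " still match the set.
function normalizeForJunkCheck(text: string): string {
  return text.trim().toLowerCase().replace(/[.!?]+$/, "");
}

// Server-side filter: drop junk before the items are sent to the client.
function dropJunkItems(items: VocabItem[]): VocabItem[] {
  return items.filter((item) => !JUNK_VOCAB_PHRASES.has(normalizeForJunkCheck(item.text)));
}
```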

This isn't novel — it's just hygiene. But it makes the bank trustworthy. Every card you see is one the AI actually had something to say about.

How a learner uses this now, end-to-end

Real workflow, end-to-end, from a Daily Talk session to a vocabulary review:

1. User has a 10-minute Daily Talk conversation.
2. AI catches 6 mistakes during the session. 4 are sentence-length,
   2 are single-word.
3. After session ends, all 6 items land in Vocabulary tab.
4. User opens Library → Vocabulary later that day.
5. Sticky DictionarySearchBar at top — also available for ad-hoc lookups.
6. User scrolls cards. Sees a sentence-length card:
   "I doing really well or I feel great"
7. They tap "doing really well" chip.
8. DictionaryPopup opens. Shows:
   - "doing really well" definition (or fallback explanation)
   - Datamuse Deep Dive: synonyms, collocations, adjectives, related
   - 7 free lookups left today
9. They tap the synonym "thriving" — popup re-fires with "thriving"
10. They tap the chip "particularly" in the "describes" group, and the popup shows "particularly"
11. Two minutes later, they've drilled five levels deep through their own
    spoken English's natural lexical neighborhood.
12. They close the popup and tap the mastery toggle on the card.
13. New → Learning → eventually Mastered.

Every step in here was previously a dead end. Sentence-level cards used to sit static. Now they're entry points.

What we deliberately don't do

  • No AI re-tokenization. We considered sending each saved sentence to an LLM for "find the learnable chunks." We decided against it: it adds latency and cost, and a regex plus stop-word filter gets us roughly 90% of the value with no extra network or model calls. The 10% we lose are things like idioms with internal stop words ("end of the day" → only "end" + "day" survive). For those, the whole-card click still works.
  • No spaced-repetition scheduling. Vocabulary tab is for review on demand, not flashcards on a schedule. If users ask for SR, we'll layer it on, but the bar for adding that complexity is high: most learners use the bank organically, not as an Anki-style queue.
  • No translation. Every chip looks up the English dictionary entry. We don't auto-translate to Vietnamese or the user's native language. Two reasons: (1) the user picked English to learn; making it bilingual incentivizes lazy comprehension, (2) it adds latency and a translation provider dependency. Users who want translation can highlight + use the OS dictionary on iOS/Android.

What's next

We're considering letting users add their own vocab items — not just AI-caught mistakes. The mechanic: highlight any word in any transcript (or any blog post on this site), tap "Add to bank," it joins your saved items with no error context. Useful for words you encountered but want to memorize, separate from things you got wrong.

We're also tracking which chips get tapped the most. If a learner taps phrasal-verb-style chips 5x more than single-word chips, that's a signal that phrasal verb practice would be the highest-leverage drill for them. We'll wire that into the personalization layer over the next few weeks.

Try it

Sign up free. Have a 5-minute Daily Talk. Check the Vocabulary tab the next morning. Tap a sentence-level card. Tap a chip inside it. Tap a chip in the popup. See how far down you can drill before you stop learning.

The point: your past mistakes aren't a record of failure. They're a personalized dictionary of exactly the words you need next. We just made it tappable.