Why Most Dictionary Apps Are Stuck in 2010 (And What We Built Instead)

Looking up a word in 2026 still gets you a definition, an IPA transcription, and an audio clip. That's the same thing dictionaries did 15 years ago. We rebuilt the SpeakShark dictionary popup with seven layers of data per lookup (collocations, frequency, adjective patterns, related words, rhymes) because knowing a definition isn't the same as knowing a word.

Quick answer: Most dictionary apps tell you what a word means. They don't tell you which words appear next to it in real English, how common it is, what adjectives native speakers pair with it, or which related concepts cluster around it. Those are the things that make the difference between recognizing a word and using one. We rebuilt the SpeakShark dictionary popup to show all of it — in one tap, no leaving the conversation.

If you've ever looked up a word, understood the definition perfectly, and then a week later used it slightly wrong in a sentence, you've felt this gap directly. Definition isn't usage. Usage is everything around the word: collocations, register, frequency, the adjectives that naturally combine with it. Most dictionaries don't show that.

This post is about what we built and why we think most dictionary apps are solving the wrong problem.

The 2010 dictionary stack

Open any major dictionary app today — Cambridge, Merriam-Webster, Oxford Learner's, Google's "define" widget. You'll see roughly the same things:

  1. The word itself
  2. IPA pronunciation
  3. Audio button (sometimes US / UK)
  4. Part of speech
  5. 2–5 definitions
  6. An example sentence per definition (sometimes)
  7. Synonyms (usually a short, low-quality list)

This is a 2010-era dictionary. It hasn't changed because most users don't realize what they're missing.

What they're missing is distributional data — information about how the word is actually used by real English speakers. That's the data that turns recognition into production.

Three concrete examples:

"Wreak" has the dictionary definition "to cause something bad to happen." That's correct. It's also useless if you don't know that "wreak" is almost always followed by "havoc". 95%+ of real-world uses of "wreak" are paired with "havoc," "destruction," or "vengeance." A learner who picks up "wreak" from a dictionary and uses it in "the kids wreak fun" sounds wrong even though the definition fits.

"Ocean" is a noun. Dictionary gives you definitions. But what adjectives do native speakers use to describe oceans? "Vast", "deep", "calm", "rough", "boundless", "endless". Knowing these isn't optional — it's how you talk about oceans without sounding like a Wikipedia entry.

"Get" is one of the most common verbs in English. Frequency: ~500 occurrences per million words. "Esoteric" is rare — about 1 per million. A dictionary that doesn't tell you this gives equal weight to both, and the learner ends up overusing rare words in casual conversation. The frequency signal is what tells you whether to use this word at all.

We wanted SpeakShark's dictionary to fix all three.

The seven layers

When you look up a word in SpeakShark now, the popup runs seven parallel API calls and renders the merged result in ~400 milliseconds. Here's what each layer gives you:

1. The Definition (Free Dictionary API)

Standard dictionary entries. Definitions per part of speech. Example sentences. This is the table-stakes layer.

2. IPA + Syllables + Part of Speech (Datamuse metadata)

/words?sp={word}&qe=sp&md=prsf&ipa=1

In one call, we get the IPA pronunciation, syllable count, and part of speech tags. This is the same data a dedicated phonetics dictionary would give you — but at the same latency as a normal lookup.
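To make that concrete, here is a minimal sketch of how one result from that call could be unpacked. It assumes the response shape we understand Datamuse to use (a `numSyllables` field, plus `tags` entries such as `n`, `ipa_pron:...` and `f:...`); `DatamuseEntry`, `WordMetadata` and `parseMetadata` are hypothetical names for illustration, not SpeakShark's production code.

```typescript
// One Datamuse result when md=prsf&ipa=1 is requested (assumed shape).
interface DatamuseEntry {
  word: string;
  score?: number;
  numSyllables?: number;
  tags?: string[];
}

// Hypothetical normalized shape for the popup header.
interface WordMetadata {
  ipa: string | null;
  syllables: number | null;
  partsOfSpeech: string[];
  perMillion: number | null;
}

// Part-of-speech tags: noun, verb, adjective, adverb, unknown.
const POS_TAGS = new Set(["n", "v", "adj", "adv", "u"]);

function parseMetadata(entry: DatamuseEntry): WordMetadata {
  const meta: WordMetadata = {
    ipa: null,
    syllables: entry.numSyllables ?? null,
    partsOfSpeech: [],
    perMillion: null,
  };
  for (const tag of entry.tags ?? []) {
    if (tag.startsWith("ipa_pron:")) {
      // IPA transcription, present because ipa=1 was passed.
      meta.ipa = tag.slice("ipa_pron:".length).trim();
    } else if (tag.startsWith("f:")) {
      // Frequency in occurrences per million words.
      const value = Number.parseFloat(tag.slice(2));
      meta.perMillion = Number.isNaN(value) ? null : value;
    } else if (POS_TAGS.has(tag)) {
      meta.partsOfSpeech.push(tag);
    }
    // Any other tag (for example the ARPAbet "pron:" tag) is ignored.
  }
  return meta;
}
```

Every field falls back to `null` or an empty list, so a result with missing tags still produces something the header can render.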

For Vietnamese learners specifically, the IPA matters a lot. Vietnamese learners often guess pronunciation from spelling — which fails on English's irregular orthography. IPA forces correct pronunciation.

3. Frequency Label

Datamuse returns a tags: ["f:N.NN"] field, where N.NN is the word's occurrences per million words in Google Books Ngrams. We bucket it into 4 labels:

  • Very common (≥100/M) — like "get", "work", "go". Use freely.
  • Common (≥10/M) — "remarkable", "particularly", "challenge". Use them.
  • Uncommon (≥1/M) — "esoteric", "ostensible". Use sparingly.
  • Rare (<1/M) — Avoid in conversation unless quoting.

A bright green pill labels the band right under the IPA. Learners stop overusing rare academic vocabulary in casual chat because they can see the word is rare.
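The bucketing is simple enough to sketch in a few lines. This is an illustration using the four thresholds listed above; `frequencyLabel` is a hypothetical name, and returning `null` for a missing frequency (so no pill is shown) is our assumption about the sensible fallback.

```typescript
type FrequencyLabel = "Very common" | "Common" | "Uncommon" | "Rare";

// Thresholds are occurrences per million words.
function frequencyLabel(perMillion: number | null): FrequencyLabel | null {
  if (perMillion === null) return null; // no data: show no pill
  if (perMillion >= 100) return "Very common";
  if (perMillion >= 10) return "Common";
  if (perMillion >= 1) return "Uncommon";
  return "Rare";
}
```

With the numbers quoted earlier, a word at roughly 500 per million lands in "Very common" and a word at about 1 per million lands in "Uncommon".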

4. Synonyms (rel_syn)

/words?rel_syn={word} — WordNet-derived synonyms.

The killer use case: "Don't say good every time." Look up good and the synonyms section gives you excellent · fantastic · superb · great · wonderful · pleasant · favorable · decent as clickable chips. Tap any chip to look up that word and decide if it fits.

This isn't novel by itself. What's novel is that the synonyms ship in the same popup as everything else, so you don't context-switch.

5. Collocations Before + After (rel_bgb + rel_bga)

This is the gem.

/words?rel_bgb={word} returns the words that statistically appear before the queried word in real English. rel_bga returns the words that appear after.

Example for havoc:

  • Words before: wreak, wreaks, cause, bring
  • Words after: on, with, among, everywhere

So the popup tells you, point-blank: if you're using the word "havoc," it almost always follows "wreak" or "cause," and is often followed by "on" or "with."

No dictionary I've used surfaces this information this prominently. It's what turns "I know what havoc means" into "I know how to use havoc in a real sentence."

Vietnamese learners especially struggle with collocations because Vietnamese lexical pairing rules don't transfer. "Do homework" vs "make homework" can only be learned through collocational data — there's no grammar rule that predicts it.
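A rough sketch of the before/after lookup, for readers who want to try it themselves. The base URL and the `rel_bgb`, `rel_bga` and `max` parameters are Datamuse's; the helper names (`collocationUrl`, `toWords`, `fetchCollocations`) and the default of 8 chips are assumptions made for this example, not a description of SpeakShark's code.

```typescript
const DATAMUSE_WORDS = "https://api.datamuse.com/words";

type Relation = "rel_bgb" | "rel_bga";

// Build the request URL; `max` caps how many results Datamuse returns.
function collocationUrl(relation: Relation, word: string, max = 8): string {
  const params = new URLSearchParams({ [relation]: word, max: String(max) });
  return `${DATAMUSE_WORDS}?${params.toString()}`;
}

// Datamuse responds with an array of { word, score } objects.
// Anything that does not match that shape is skipped.
function toWords(payload: unknown): string[] {
  if (!Array.isArray(payload)) return [];
  return payload
    .map((item) =>
      typeof item === "object" && item !== null
        ? (item as { word?: unknown }).word
        : undefined,
    )
    .filter((w): w is string => typeof w === "string");
}

// Fetch both directions in parallel. Errors propagate to the caller,
// which decides how the popup should degrade.
async function fetchCollocations(
  word: string,
): Promise<{ before: string[]; after: string[] }> {
  const [before, after] = await Promise.all(
    (["rel_bgb", "rel_bga"] as const).map(async (relation) => {
      const response = await fetch(collocationUrl(relation, word));
      if (!response.ok) {
        throw new Error(`Datamuse ${relation} failed: ${response.status}`);
      }
      return toWords(await response.json());
    }),
  );
  return { before, after };
}
```

Calling `fetchCollocations("havoc")` should give a `before` list and an `after` list along the lines of the example above, though the exact words depend on what Datamuse returns at the time.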

6. Adjectives That Describe It (rel_jjb)

/words?rel_jjb={word} — adjectives that native speakers commonly pair with this noun.

Look up ocean:

  • vast, deep, calm, wide, pacific, atlantic, salt, frozen, boundless

Look up decision:

  • final, right, wrong, important, difficult, quick, informed, executive

This populates "what adjectives sound natural with this noun" — which is one of the highest-impact moves a learner can make to sound less robotic. The adjective informed is band-7+ vocabulary for decision; most learners default to "good decision" or "bad decision" because they never saw a native pairing.

7. Related Words (rel_trg) + Rhymes (rel_rhy)

rel_trg returns "trigger" words that statistically co-occur with the queried word — topic clustering.

Look up cow: milking, farm, barn, dairy, pasture, calf, udder, moo

This is useful for building thematic vocabulary. A learner doing a "farm life" unit can tap cow and see the surrounding vocabulary cluster.

Rhymes (rel_rhy) are mostly for fun, and for songwriting / poetry users. Look up motion: notion, potion, ocean, devotion. We added them because they're low-effort and cool, not because they're core to language learning.

How the popup renders

Everything above lives in one popup. From top to bottom:

[ word ]                                          x

/ipa/  ·  2 syllables  ·  [Very Common]    🔊US 🔊UK

NOUN
1. Definition 1
   "example sentence"
2. Definition 2

VERB
1. Definition...

═══════════════════════════════════════════
✨ SYNONYMS           [chip] [chip] [chip] ...
🔗 OFTEN PAIRED WITH
   Before "word":    [chip] [chip] [chip] ...
   After "word":     [chip] [chip] [chip] ...
📚 DESCRIBED AS       [chip] [chip] [chip] ...
#️⃣ RELATED           [chip] [chip] [chip] ...
🎵 RHYMES             [chip] [chip] [chip] ...
═══════════════════════════════════════════

5 free lookups left today.  Upgrade for unlimited →

Every chip is clickable. Tap a synonym and the popup re-fires lookup() with that word. Tap a collocation, same thing. You can drill from ocean → boundless (adjective from "described as") → discover the synonyms of boundless (infinite, endless) → tap one → its collocations. A learner can do a 90-second deep dive into the lexical neighborhood of any word.

The chips are color-coded by section — synonyms in emerald, collocations in violet, adjectives in cyan, related words in amber, rhymes in rose. Quick visual scan tells you which kind of information you're looking at.
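One way to model the chip sections is a small table of section definitions that the renderer walks in order. The sketch below is hypothetical (the `ChipSection` shape, `CHIP_SECTIONS` and `visibleSections` are names invented here); only the Datamuse parameters, section order and colors come from this post.

```typescript
interface ChipSection {
  title: string;    // heading shown in the popup
  relation: string; // Datamuse query parameter that feeds the section
  color: string;    // chip color for the section
}

const CHIP_SECTIONS: ChipSection[] = [
  { title: "Synonyms", relation: "rel_syn", color: "emerald" },
  { title: "Often paired with: before", relation: "rel_bgb", color: "violet" },
  { title: "Often paired with: after", relation: "rel_bga", color: "violet" },
  { title: "Described as", relation: "rel_jjb", color: "cyan" },
  { title: "Related", relation: "rel_trg", color: "amber" },
  { title: "Rhymes", relation: "rel_rhy", color: "rose" },
];

// Sections with no chips are skipped entirely rather than rendered empty.
function visibleSections(
  chipsByRelation: Record<string, string[]>,
): ChipSection[] {
  return CHIP_SECTIONS.filter(
    (section) => (chipsByRelation[section.relation] ?? []).length > 0,
  );
}
```

Keeping the order, endpoint and color in one place means adding or removing a section is a one-line change.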

Why we picked Datamuse over building this ourselves

Datamuse is a free, no-auth public API that wraps WordNet, ConceptNet, and Google Books Ngrams. It's not perfect — definitions are thin (we still use Free Dictionary for those), and it has no audio. But it does the one thing that matters for distributional data exceptionally well, and it does it for free.

Alternatives we considered:

| Source | Why we didn't use it |
| --- | --- |
| Wordnik | Paid, requires API key, lower rate limits |
| Cambridge API | Paid, expensive, English-only restrictions |
| Build our own Ngrams pipeline | 4-week project for the same outcome |
| OpenAI for collocations | $0.001 per call, would burn cost as we scale |

Datamuse: free, no auth, 7-request parallel fetch under 400ms, decent quality across all 13 of its rel_* endpoints. It's the right tradeoff for this layer.

We added a CSP allowlist for api.datamuse.com and we cache nothing — Datamuse is fast enough that re-fetching on each lookup is cheaper than implementing client-side cache invalidation.

The graceful degradation story

Seven parallel API calls means seven things that can fail. Here's what happens when each fails:

| Failure | What you see |
| --- | --- |
| Free Dictionary 404 | Definition section empty; Datamuse sections still render |
| Datamuse metadata fails | No IPA/frequency badge; rest renders |
| One specific rel_* endpoint fails | That section silently drops; others render |
| All Datamuse fails (CSP block?) | Popup looks like the old dictionary (definitions only) |

Promise.allSettled everywhere. The popup is never empty unless literally every source fails — which would mean the user is offline.
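The pattern looks roughly like this. It is a minimal sketch, not SpeakShark's actual code: `mergeSections` and `loadSections` are hypothetical names, and each loader is assumed to resolve to a list of chip words.

```typescript
// Minimal settled-result shape, matching what Promise.allSettled resolves to.
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

// Keep only the sections whose request succeeded and returned something.
function mergeSections(
  names: string[],
  results: Settled<string[]>[],
): Record<string, string[]> {
  const sections: Record<string, string[]> = {};
  results.forEach((result, i) => {
    if (result.status === "fulfilled" && result.value.length > 0) {
      sections[names[i]] = result.value;
    }
    // Rejected or empty: this section is dropped, the others still render.
  });
  return sections;
}

// `loaders` maps a section name to a function that fetches its data.
async function loadSections(
  loaders: Record<string, () => Promise<string[]>>,
): Promise<Record<string, string[]>> {
  const names = Object.keys(loaders);
  // The async wrapper turns a synchronous throw into a rejection,
  // so one broken loader cannot abort the whole batch.
  const results = await Promise.allSettled(
    names.map(async (name) => loaders[name]()),
  );
  return mergeSections(names, results);
}
```

Unlike `Promise.all`, `Promise.allSettled` never rejects, so a single failed endpoint costs one section instead of the whole popup.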

What this changes for the workflow

Before, looking up a word was a 3-step ritual:

  1. Open SpeakShark to see the word
  2. Switch to Cambridge or Google to see collocations
  3. Switch to another resource to check frequency

Now it's 1 step: tap the word in your transcript. Popup opens. Scroll down for the deep dive. Done.

Power user move: if a learner is reviewing a past session's vocab card, every chip is clickable. They can do an entire vocabulary review session by drilling: ocean → vast → expansive → enormous, tracing the natural English lexical web for ten minutes. No context switching. Everything in-app.

Limits

A few things this deep dive doesn't do, and probably shouldn't:

  • Idioms — Datamuse doesn't have great idiom data. For idioms we still rely on Free Dictionary (which has decent idiom entries) plus the Grammar Reference's "Featured Phrases" section.
  • Slang — Datamuse skews toward written English (Google Books). "Rizz", "gyatt", "delulu" — none of these will surface meaningful collocations. For Gen-Z slang we'd need Urban Dictionary, which has well-known noise/NSFW problems.
  • Multi-word phrases — Searching for "look up" works (Datamuse handles short multi-words), but "for the time being" sometimes returns nothing because tokenization is per-word.

For each limitation, the popup degrades cleanly. If rel_jjb returns nothing for an idiom, the section just doesn't render.

Try it

Open SpeakShark, look up any word from the Dictionary tab — or tap a vocabulary chip in your past mistakes. The deep dive is on every free account from day one. Pro lifts the daily lookup quota to unlimited.

The fastest way to feel the difference: look up a word you think you know. Then look at the collocations. If even one is unfamiliar, that's a vocabulary upgrade you didn't know you needed.

Definition is just the start.