Update — 24 May 2026. We rolled out the hands-free version of Daily Talk earlier this week, then rolled it back the same day after telemetry showed a subset of sessions sitting silent for 30 seconds on turn 1. The cause is a Chrome browser-permission edge case we want to fix properly (with a visible "enable hands-free" tap inside a user gesture) before we re-ship. Daily Talk is currently tap-to-speak. The design described below is where we're heading, not where we are today. We'll publish a follow-up when the proper version is live.

Quick answer: If you've ever practised English with an AI app and felt the rhythm broken by tapping a mic button every turn — you're not imagining it. Walkie-talkie UX is bad for conversation practice because the gap between "I'm done speaking" and "tap to send" is where your fluency dies. SpeakShark is moving Daily Talk toward a hands-free flow where the mic listens automatically after the AI's greeting and the AI responds the moment you stop talking. We tried shipping it this week, hit a browser edge case, rolled it back, and are wiring it up the right way.

I'll be honest — we shipped the tap-to-speak version of Daily Talk for nine months before realising it was the wrong default. The reasons it was wrong are not what I expected.

This post is about what changed and why I think it matters more than any AI model upgrade we've done.

The walkie-talkie problem

Every conversation app I've tried (Speak, ELSA, Quazel, ChatGPT Voice Mode, Replika, even Duolingo's Max tier) uses some flavour of "tap to start, tap to stop." The mental model is a walkie-talkie: you press a button, talk, and release; the other party hears you and responds.

This works fine for short utterances. It breaks for actual conversation.

Here's what I noticed when I watched twenty learners use SpeakShark with tap-to-speak:

1. They paused mid-sentence to think about pressing the button. A learner would say "I think the weather is — (hand reaches for screen) — going to be nice." That hand reach is dead air, and dead air during your own sentence is one of the worst feelings in language learning. It interrupts the flow that you're paying us to help build.

2. They forgot to press stop. The recorder would keep running for ten or fifteen seconds after they finished talking. The AI eventually got the full audio, but the lag felt like the app froze. Half of them assumed the app was broken.

3. They never used barge-in. If the AI started saying something boring or off-topic, the user couldn't interrupt without tapping a button to start a new recording. Real conversation has interruptions. Walkie-talkie conversation doesn't.

4. The first turn was always awkward. New users land on the session, hear the AI greeting, and then sit in silence because they don't realise they have to tap. We had a "👇 Tap to speak" hint on the button. Even with that, half of new users got stuck on turn 1.

The cumulative effect: conversations that should have flowed like a coffee chat felt like submitting form fields one at a time.

What "real conversation" actually requires

I went back to first principles. What does a real phone call look like? Three properties matter:

Always-on listening. The other person can speak any time. You don't grant them permission per utterance.
End-of-turn detection. They know you're done because you stopped talking, not because you announced it.
Interruption. Either party can interrupt at any moment.

The mic-tap UX violates all three.
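
To make those three properties concrete, here is a minimal sketch of a turn-taking state machine. It is not SpeakShark's actual code: the state names, event names, and the `nextTurnState` function are all hypothetical, chosen only to show where each property lives.

```typescript
// Hypothetical states and events for illustration; none of these names
// come from the real app.
type TurnState = "aiSpeaking" | "listening" | "userSpeaking" | "aiThinking";
type TurnEvent = "aiFinished" | "speechStart" | "silenceDetected" | "replyReady";

function nextTurnState(state: TurnState, event: TurnEvent): TurnState {
  // Interruption: the user can barge in while the AI is still talking.
  if (state === "aiSpeaking" && event === "speechStart") return "userSpeaking";
  // Always-on listening: the mic opens by itself once the AI finishes.
  if (state === "aiSpeaking" && event === "aiFinished") return "listening";
  if (state === "listening" && event === "speechStart") return "userSpeaking";
  // End-of-turn detection: silence, not a button press, closes the turn.
  if (state === "userSpeaking" && event === "silenceDetected") return "aiThinking";
  if (state === "aiThinking" && event === "replyReady") return "aiSpeaking";
  // Any other combination leaves the state unchanged.
  return state;
}
```

Notice that no transition is triggered by a tap. In a walkie-talkie design, every one of these arrows would need a button event instead.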

Why we'd disabled the hands-free flow before

We've actually had hands-free conversation in SpeakShark before. We had it on in an early IELTS practice version, and learners loved it. Then we disabled it after a real problem cropped up: TV and other ambient noise was triggering false recordings.

We swung too far. Yes, ambient false-triggers are annoying. But they're annoying for a small slice of users in a small slice of sessions. The other 90% lost the hands-free flow entirely as collateral damage. Bad trade.

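We re-enabled hands-free this week, but only for Daily Talk. As the update at the top explains, it went back off the same day, and Daily Talk stays tap-to-speak until the permission fix lands.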

Why Daily Talk specifically

Different modes need different defaults. Daily Talk is the one where conversation IS the point. Challenges have timed prompts where you need control over when the clock starts. Role Play is scenario-driven and turn structure is part of the practice. Pronunciation is per-word drilling where automatic turn-taking would just get in the way.

So Daily Talk gets the hands-free flow; the other modes keep the tap. That's a deliberate choice, not laziness — we don't want one mode's defaults to break another mode's practice.
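
One way to keep per-mode defaults from leaking into each other is a single lookup table. The sketch below is an assumption about how that could look, not the real configuration: `PracticeMode`, `InputStyle`, `targetInputStyle`, and the `handsFreeShipped` flag are hypothetical names.

```typescript
// Hypothetical mode and input-style names, for illustration only.
type PracticeMode = "dailyTalk" | "challenges" | "rolePlay" | "pronunciation";
type InputStyle = "handsFree" | "tapToSpeak";

// The intended default for each mode, as described in the post.
const targetInputStyle: Record<PracticeMode, InputStyle> = {
  dailyTalk: "handsFree",
  challenges: "tapToSpeak",
  rolePlay: "tapToSpeak",
  pronunciation: "tapToSpeak",
};

// While the hands-free path is rolled back, every mode falls back to the tap.
function inputStyleFor(mode: PracticeMode, handsFreeShipped: boolean): InputStyle {
  const target = targetInputStyle[mode];
  return target === "handsFree" && !handsFreeShipped ? "tapToSpeak" : target;
}
```

With a table like this, flipping Daily Talk back to tap-to-speak is a one-flag change and cannot touch Challenges, Role Play, or Pronunciation by accident.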

The flow we're aiming for

Here's what a Daily Talk session is meant to look like once the hands-free version ships properly:

1. User picks a teacher (Sarah / James / Emily / Liam). No topic picker.
2. Tap "Start". Page loads.
3. AI greeting plays automatically.
4. AI finishes. The mic silently starts listening. No "tap to speak" button needed.
5. User speaks. Words appear on screen as they're spoken (live transcript bubble).
6. User stops talking. The app detects the silence and the AI responds.
7. The conversation continues. The user never touches the mic button.

The mic button stays on screen as a fallback for the small number of cases where automatic detection doesn't catch the start (usually very noisy environments), but most users would go an entire session without using it.
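
The "detects the silence" step can be sketched as a simple energy-based end-of-turn check. This is a generic technique, not necessarily what SpeakShark runs: the function name, the options, and the idea of feeding it one loudness value per audio frame are all assumptions made for illustration.

```typescript
// Hypothetical tuning knobs; real values would need testing with learners.
interface EndOfTurnOptions {
  frameMs: number;         // duration of each audio frame
  speechThreshold: number; // level at or above which a frame counts as speech
  silenceMs: number;       // how long silence must last to end the turn
}

// Returns the index of the frame where the turn ends, or -1 if it is still open.
function findEndOfTurn(levels: number[], opts: EndOfTurnOptions): number {
  const framesNeeded = Math.ceil(opts.silenceMs / opts.frameMs);
  let heardSpeech = false;
  let silentRun = 0;
  let index = 0;
  for (const level of levels) {
    if (level >= opts.speechThreshold) {
      // Speech resets the silence counter, so a short thinking pause is fine.
      heardSpeech = true;
      silentRun = 0;
    } else if (heardSpeech) {
      // Silence only counts once the user has actually started talking.
      silentRun += 1;
      if (silentRun >= framesNeeded) return index;
    }
    index += 1;
  }
  return -1;
}
```

With 100 ms frames and a 300 ms silence window, the levels `[0, 0, 0.5, 0.6, 0, 0, 0, 0]` end the turn at frame index 6: the leading silence is ignored and the third quiet frame after speech closes the turn. A plain loudness threshold like this is also exactly what a nearby TV can fool, which is why the mic button stays as a fallback.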

In the version that's live right now, steps 1–3 work as described, but step 4 is still a manual tap on the mic button. We turned auto-listen on briefly this week and saw telemetry showing some users sitting silent on turn 1 — the browser had blocked the silent permission upgrade, the mic was never actually listening, and there was no visible cue that anything was wrong. Rather than ship a half-working "magic" UX, we reverted. The proper fix is to put a one-time "Tap to enable hands-free" gesture in front of it, which we're building next.
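
The lesson from the rollback can be written down as a small decision function: never auto-listen unless the mic is known to be granted and the user has tapped to enable it, and show a visible cue otherwise. This is a sketch under assumptions, not the fix being built. `planTurnOne` and `ListenPlan` are hypothetical names; the three permission strings mirror the `PermissionState` values the browser Permissions API reports.

```typescript
// Same three values as the Permissions API's PermissionState.
type MicPermission = "granted" | "prompt" | "denied";

// Hypothetical plan object describing what the UI should do on turn 1.
interface ListenPlan {
  autoListen: boolean;       // open the mic automatically after the greeting
  showEnablePrompt: boolean; // show the visible "Tap to enable hands-free" button
}

function planTurnOne(permission: MicPermission, tappedEnable: boolean): ListenPlan {
  if (permission === "granted" && tappedEnable) {
    // The mic was unlocked inside a user gesture, so auto-listen is safe.
    return { autoListen: true, showEnablePrompt: false };
  }
  if (permission === "denied") {
    // Never assume the mic is live; stay on the manual flow without nagging.
    return { autoListen: false, showEnablePrompt: false };
  }
  // Otherwise make the state visible instead of failing silently.
  return { autoListen: false, showEnablePrompt: true };
}
```

The important branch is the last one: it replaces a silent failure with something on screen the user can act on.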

The bonus feature: live transcript bubble

While we were in there rebuilding the flow, we brought back something we'd removed weeks ago: the live transcript bubble.

When you speak, your words appear in the chat in real time. We'd disabled this before because partial transcripts sometimes showed broken or repeated words. That's true — live partial transcripts are messier than the final cleaned-up version. But again, we'd over-corrected. The information that your voice is being picked up is worth the occasional ugly word. New users especially need that signal — they need to see the app reacting to their voice, not just trust that something will happen in five seconds.

So the live bubble is back. It shows partial words during recording. When the turn submits, the bubble is replaced by the clean final transcript with errors highlighted.
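
The partial-then-final behaviour can be sketched as a tiny reducer over transcript events. The names below (`Bubble`, `TranscriptEvent`, `applyTranscriptEvent`) are hypothetical and error highlighting is left out; the sketch only shows how messy partials get overwritten and how the final text wins.

```typescript
// Hypothetical shapes for illustration; one bubble per user turn.
interface Bubble {
  text: string;
  status: "partial" | "final";
}

type TranscriptEvent =
  | { kind: "partial"; text: string }
  | { kind: "final"; text: string };

// Pass null as the bubble at the start of each new turn.
function applyTranscriptEvent(bubble: Bubble | null, event: TranscriptEvent): Bubble {
  if (bubble !== null && bubble.status === "final" && event.kind === "partial") {
    // A late partial must not overwrite the clean final transcript.
    return bubble;
  }
  // Each partial replaces the previous one; the final replaces them all.
  return { text: event.text, status: event.kind };
}
```

Because every partial replaces the last one, a broken or repeated word only lives on screen until the next update arrives.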

The higher-EQ rewrite

While we were rebuilding the flow, we also rewrote how the AI talks back. The old persona was, frankly, casual-friend energy — lots of "Oh wow!" "Hmm interesting!" "Yeah totally!" Reasonable for some learners, condescending for others.

The new persona targets a thoughtful, emotionally intelligent adult:

No robotic affirmations. No "Great answer!" / "Excellent!" / "Wonderful!" after every turn.
Active listening. The AI references specifics from what you just said, not generic follow-ups.
Validation before pivot. When you share something hard, the AI acknowledges it before redirecting.
No slang. No "lol", "bro", "lowkey." This is IELTS-band-7-to-8 register — natural educated English, not Gen-Z TikTok English.
Phrasal verbs and idioms woven in naturally — "come across", "end up", "get on with", "right up my alley", "in the long run" — because that's how natives actually speak.
Sentence variety — simple, compound, complex sentences mixed; cohesive devices like "however", "that said", "on the other hand" used appropriately.

The opening few turns are gently guided — you'll be asked your name and what you do, which gives the AI context and gives new learners a soft on-ramp. After that, free conversation.

What this is and isn't

This is not voice-cloning, real-time AI, or a tech moonshot. The pieces have existed for years. The thing that's new is combining them in defaults that respect how humans actually have conversations.

This is not "perfect" yet. Once hands-free is back, false-positive ambient triggers will still happen sometimes: if you sit next to a TV running an English movie, the app will hear it and start recording. There's no good client-side fix for that yet. Future versions will let you tap an explicit "Pause auto-mic" button mid-session if you walk into a noisy room.

This is not a replacement for tap-to-speak. The button is still there. It's a fallback. If you prefer manual control — or you're in Challenges / Role Play / Pronunciation mode — the tap is still the default UX.

Try it (today, honestly)

If you want to try Daily Talk as it stands today, create a free account, pick a teacher, and start a session. The AI greets you, the new higher-EQ persona kicks in, and the live transcript bubble shows your words as you speak. All of that is live. The one piece that isn't yet is auto-listen: you'll still tap the mic to start each turn while we wire the hands-free path up properly.

If you'd rather wait until the no-tap version is back, keep an eye on the blog — we'll publish a follow-up when it ships properly.

The button is still in the way for now. We're working on it.

Why We're Rebuilding Daily Talk Around the 'No Tap' Idea (Work in Progress)
