Dictation
Live dictation requires the Input Monitoring permission so the hotkey can fire while another app is focused. Without it, dictation is unavailable, but you can still transcribe audio and video files from the menu bar.
The basic flow
- Click into any text field: an email, a code editor, an LLM prompt, the search bar, anything that accepts typing.
- Press and hold your hotkey (Right Option by default).
- Speak. The recording overlay shows up with a live waveform so you know it’s listening.
- Release the key. TongueType transcribes locally and pastes the text where your cursor is.
That’s the whole loop. No window to switch to, no “send” button, no waiting on a server.
Choosing a hotkey
Open Settings… from the menu bar (⌘,) and pick from the Hotkey list. You’ve got 24 options:
- Modifier keys: Right or Left ⌥, ⌘, ⇧, or ⌃.
- Function keys: F1 through F15.
- The Globe key (fn) on modern Apple keyboards.
Most people stick with Right Option. It’s easy to hold with a thumb and you almost never use it for anything else. If you do something hand-heavy on the right side of the keyboard, try Left Option or Right Shift instead.
About the Globe key
If you pick fn, you may need to set “Press Globe key to” → “Do Nothing” in System Settings → Keyboard. Otherwise, macOS will open the emoji picker every time you press the key to dictate.
About function keys
Some function keys are mapped to brightness, volume, and similar controls by default. Enable “Use F1, F2, etc. as standard function keys” in System Settings → Keyboard if you want a clean function-key hotkey.
The grace period
If you tap the hotkey instead of holding it, TongueType ignores the tap. The default grace period is 100 ms: long enough to absorb accidental brushes against the key, short enough that intentional holds feel instant.
You can drag the slider in Settings from 0 to 1000 ms. A few starting points:
- 0 ms: instant, but you’ll get the occasional false start when you bump the key.
- 100 ms: the default. Feels instant for normal use.
- 250–500 ms: if you’re using a key you also use for other things (like ⌘), bump this up so quick keyboard shortcuts don’t trigger dictation.
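If it helps to see the rule spelled out, here’s a minimal sketch of the tap-versus-hold decision. It isn’t TongueType’s actual code: the function name and parameters are made up for illustration, and it assumes a press counts as a hold once it has lasted at least the grace period.

```python
def should_start_recording(held_ms: float, grace_ms: float = 100.0) -> bool:
    """Decide whether a key press counts as a hold (hypothetical helper).

    held_ms:  how long the hotkey has been down so far.
    grace_ms: the grace period from Settings (0 to 1000 ms, default 100).
    """
    if not 0 <= grace_ms <= 1000:
        raise ValueError("grace period must be between 0 and 1000 ms")
    # Assumption: anything shorter than the grace period is treated as a tap.
    return held_ms >= grace_ms


print(should_start_recording(40))                 # quick brush against the key
print(should_start_recording(150))                # deliberate hold
print(should_start_recording(200, grace_ms=300))  # ⌘ shortcut with a longer grace period
```

With a 300 ms grace period, a 200 ms press (about the length of a quick ⌘C) is ignored, which is why a longer setting suits keys you also use for shortcuts.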
Double-tap to latch (hands-free mode)
Holding a key while you talk is great for short bursts, but tedious for long ones. Turn on Double-tap to latch in Settings to fix that.
With latch enabled:
- Tap the hotkey twice within one second.
- TongueType keeps recording even after you let go.
- Tap the hotkey again to stop, or hit Esc to throw the recording out.
It’s especially nice for long-form dictation: meeting notes, blog drafts, anything where you’ll be talking for more than a few seconds.
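The latch steps can be pictured as a small state machine. The sketch below is only an illustration, not how TongueType is implemented: the class, method names, and returned strings are hypothetical, it models taps only (not held recordings), and it assumes that stopping a latched recording sends it off for transcription.

```python
from typing import Optional

DOUBLE_TAP_WINDOW_S = 1.0  # "twice within one second"


class LatchSketch:
    """Hypothetical model of double-tap-to-latch, driven by tap timestamps."""

    def __init__(self) -> None:
        self.latched = False
        self._last_tap: Optional[float] = None

    def tap(self, now: float) -> str:
        if self.latched:
            # A single tap while latched ends the recording.
            self.latched = False
            self._last_tap = None
            return "stop_and_transcribe"
        if self._last_tap is not None and now - self._last_tap <= DOUBLE_TAP_WINDOW_S:
            # Second tap arrived in time: keep recording hands-free.
            self.latched = True
            self._last_tap = None
            return "start_latched"
        # First tap, or the previous one was too long ago.
        self._last_tap = now
        return "ignored"

    def escape(self) -> str:
        if self.latched:
            # Esc drops the audio; nothing is transcribed.
            self.latched = False
            self._last_tap = None
            return "discard"
        return "ignored"


sketch = LatchSketch()
print(sketch.tap(0.0))  # first tap
print(sketch.tap(0.4))  # second tap within one second
print(sketch.tap(5.0))  # tap again to stop
```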
Canceling a recording
Press Esc at any point while recording. The audio is dropped, no transcription runs, and TongueType resets. Works for both held and latched recordings.
Canceling with your voice
End a recording with “scratch that,” “cancel that,” or “discard that” and TongueType throws the whole thing away instead of pasting it. Useful when you realize you’ve gone off the rails halfway through a sentence.
You can change the cancel phrases, or turn the feature off entirely, in Postprocessing.
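Here’s a rough sketch of how an “ends with a cancel phrase” check could work. It’s a guess at the idea, not TongueType’s real matching logic: the function names are invented, and it assumes matching ignores case and punctuation and only looks at the last words of the transcript.

```python
import string

# The three default phrases named above.
DEFAULT_CANCEL_PHRASES = ("scratch that", "cancel that", "discard that")


def _words(text: str) -> list:
    """Lowercase the text and strip punctuation from each word."""
    stripped = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in stripped if w]


def should_discard(transcript: str, phrases=DEFAULT_CANCEL_PHRASES) -> bool:
    """Return True if the transcript ends with one of the cancel phrases."""
    words = _words(transcript)
    for phrase in phrases:
        target = _words(phrase)
        if target and words[-len(target):] == target:
            return True
    return False


print(should_discard("Send it by Friday, scratch that."))  # phrase at the end
print(should_discard("Scratch that idea and move on"))     # phrase not at the end
```

Passing an empty tuple for `phrases` models turning the feature off: nothing is ever discarded.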
The recording overlay
While you’re recording, a small pill appears on screen with a pulsing dot, your custom “listening” label, and a live audio waveform. Where it appears (top, center, or bottom of the screen) and what it looks like are configurable in Settings & appearance.
Output: insert vs clipboard
By default, TongueType inserts text at your cursor. If you’d rather have it land on the clipboard so you can paste it yourself, switch Output mode in Settings to Copy text to clipboard. Useful if you bounce between machines via screen sharing, or you just want a buffer.
If you skipped the Accessibility permission during setup, TongueType automatically falls back to clipboard mode and shows a brief reminder to press ⌘V. The transcription still arrives. See Step 4 of the setup guide if you’d like to enable direct insertion.
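The fallback rule boils down to one check. The snippet below is an illustrative sketch with made-up names (`effective_output_mode`, the `"insert"` and `"clipboard"` labels); it is not TongueType’s settings format.

```python
VALID_MODES = ("insert", "clipboard")  # hypothetical labels for the two output modes


def effective_output_mode(preferred: str, accessibility_granted: bool) -> str:
    """Return the mode actually used, given the setting and the permission."""
    if preferred not in VALID_MODES:
        raise ValueError(f"unknown output mode: {preferred!r}")
    # Direct insertion needs the Accessibility permission; without it,
    # fall back to the clipboard so the transcription still arrives.
    if preferred == "insert" and not accessibility_granted:
        return "clipboard"
    return preferred


print(effective_output_mode("insert", accessibility_granted=False))
```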
“Loading model…”
The Whisper model takes a minute or two to load onto the Neural Engine the first time after each launch. TongueType kicks that load off in the background as soon as the app starts (and as soon as the setup wizard opens), so by the time you reach for the hotkey it’s usually ready. If you’re quick, you may see a brief loading overlay, followed by a small “ready” sound. Once the model is loaded, later transcriptions skip this wait.
Where to next
- Postprocessing: turn spoken phrases like “new line” or “open parenthesis” into actual symbols, and write your own rules.
- Settings & appearance: microphones, languages, output mode, and the recording overlay.