Guide · June 8, 2026 · 7 min read
How AI Meeting Notes Work Without Storing Any Audio
AI meeting notes can be made without storing audio by transcribing the sound live, in memory, and immediately discarding it. The audio is never written to disk or uploaded as a file; only the resulting text transcript and AI summary are saved. Nod holds audio in memory for roughly five seconds, then releases it.
If you've read a privacy page that says "we don't store your recordings," it's fair to wonder whether that's literally true and how it could possibly work. A summary has to come from somewhere — so where does the audio go? This article walks through the actual mechanics: how transcription happens without a saved recording, what gets kept, what gets thrown away, and where the honest limits of "no stored audio" are.
Do AI note takers store your recordings?
Most do, by default. A traditional meeting bot dials into your call, records the whole thing, and uploads that audio file to a cloud service so it can be transcribed, summarized, and — usually — replayed later. The recording sticks around. That stored file is convenient if you want to scrub back to minute 14, but it's also a permanent copy of a confidential conversation sitting on someone else's server.
Some newer tools record audio and then delete it after transcribing. Granola, for example, captures your device's audio, uses it to enhance your notes, and removes the recording afterward — a meaningful step up from keeping everything forever.
A smaller group goes further and never writes an audio file at all. Instead of "record, then delete," the design is "transcribe in flight, never save." That's the category this article is about, and it's how Nod works. The distinction matters: a recording that's never created can't be leaked, subpoenaed, or quietly retained past its supposed deletion date.
How transcription works without saving the audio
The key idea is streaming. Your Mac is already producing audio during a call — your microphone is picking up your voice, and the system is playing back everyone else's. A bot-free desktop app reads those two streams at the operating-system level. It's listening to your computer, not joining the meeting as a participant.
Instead of accumulating that audio into a file, the app processes it in a continuous pipeline:
- Capture. The app reads the live microphone and system-audio streams locally on the Mac. Nothing has joined the call; there's no extra name in the attendee list.
- Buffer briefly. The incoming sound is collected into a short in-memory window — a chunk of roughly five seconds — held only in RAM. No file is opened on disk.
- Transcribe the chunk. That chunk is sent to a speech-to-text engine, which returns text.
- Release the audio. As soon as the transcript for that chunk comes back, the audio bytes are discarded. The memory is freed and reused by the next chunk.
- Save only the text. The transcript text (and later, an AI summary built from it) is what gets stored.
Repeat that loop a few hundred times across a meeting and you've transcribed the whole conversation without ever having written a recording. There's no audio file, no waveform export, and no cloud-stored recording at any point in the chain. This is documented on Nod's Security & Privacy page, in the audio-recordings and how-data-flows sections.
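The loop above can be sketched in a few lines of Python. This is a minimal illustration, not Nod's actual code: the names `chunked` and `transcribe_stream` are hypothetical, the `transcribe` callable stands in for whatever speech-to-text engine is used, and the chunk size assumes 16 kHz, 16-bit mono audio, a format this article does not specify.

```python
from typing import Callable, Iterable, Iterator, List

# Assumption: ~5 s of 16 kHz, 16-bit mono PCM. The real audio format is not stated here.
CHUNK_BYTES = 5 * 16_000 * 2


def chunked(frames: Iterable[bytes], chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytearray]:
    """Group incoming audio frames into fixed-size chunks held only in memory."""
    buf = bytearray()
    for frame in frames:
        buf.extend(frame)
        while len(buf) >= chunk_bytes:
            chunk = buf[:chunk_bytes]   # copy one full window out of the buffer
            del buf[:chunk_bytes]       # keep only the remainder for the next window
            yield chunk
    if buf:
        yield buf                       # final partial chunk when the stream ends


def transcribe_stream(
    frames: Iterable[bytes],
    transcribe: Callable[[bytearray], str],  # hypothetical speech-to-text call
    chunk_bytes: int = CHUNK_BYTES,
) -> List[str]:
    """Return transcript text per chunk; no audio is written to disk or kept."""
    transcript: List[str] = []
    for chunk in chunked(frames, chunk_bytes):
        text = transcribe(chunk)        # audio goes in, text comes out
        chunk.clear()                   # drop the audio bytes once text is back
        transcript.append(text)         # only text accumulates
    return transcript
```

With a stand-in transcriber such as `lambda chunk: f"{len(chunk)} bytes heard"`, feeding 250 frames of 1,600 bytes each yields three transcript entries: two full chunks and one partial. The only thing that grows over the meeting is the list of strings.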
The in-memory buffer (why ~5 seconds is enough)
Five seconds sounds short, but it's plenty. Speech-to-text models don't need the whole meeting at once — they work on overlapping windows of audio a few seconds long, which is enough context to transcribe a phrase accurately. By keeping the buffer that small, the app never holds more than a few seconds of sound in memory at a time, and that sound exists only in volatile RAM that's continuously overwritten. Nothing is ever flushed to disk. If your Mac lost power mid-call, there would be no recording to recover, because none was ever being written.
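To put rough numbers on the buffer, here is a small back-of-the-envelope calculation. The sample rate, bit depth, and channel count are assumptions chosen for illustration (16 kHz, 16-bit mono is a common speech-to-text input format); the figures Nod actually uses are not given in this article.

```python
import math


def chunk_size_bytes(
    seconds: float = 5.0,
    sample_rate_hz: int = 16_000,   # assumed
    bytes_per_sample: int = 2,      # assumed 16-bit PCM
    channels: int = 1,              # assumed mono
) -> int:
    """Bytes of raw PCM audio needed to hold `seconds` of sound."""
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


def chunks_in_meeting(minutes: float, seconds_per_chunk: float = 5.0) -> int:
    """How many buffer-sized chunks a meeting of `minutes` is split into."""
    return math.ceil(minutes * 60 / seconds_per_chunk)


print(chunk_size_bytes())                 # 160000   (about 160 KB in memory at a time)
print(chunks_in_meeting(30))              # 360      (loop iterations for a 30-minute call)
print(chunk_size_bytes(seconds=30 * 60))  # 57600000 (what a saved 30-minute recording would take)
```

Under these assumptions the app holds on the order of 160 KB of audio per stream at any moment, compared with tens of megabytes for a stored recording of the same call.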
What's stored vs what's discarded
This is the part worth being precise about. Here's exactly what happens to each kind of data when Nod takes notes:
| Discarded (never saved) | Stored (saved to your account) |
|---|---|
| Raw audio of the call | Text transcript (per chunk, with speaker side + timestamp) |
| Any audio file on disk | AI-generated meeting summary |
| Waveform / audio export | Search embeddings of your transcript (so you can ask across meetings) |
| Cloud-stored recording | Light metadata: title, time, language, session type |
| — | A one-time consent acknowledgement audit row (no meeting content) |
The audio appears only in the "discarded" column. Everything in the "stored" column is text or numbers derived from text — the things you'd actually want to read, search, and act on. Those stored items live in the EU (Supabase Postgres on AWS eu-west-1, Ireland), encrypted at rest with AES-256, with per-user Row-Level Security so no other account can read your data. Because there's no recording, "give me the audio back" is a request Nod literally cannot fulfill — only the transcript and summary exist.
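One way to picture the "stored" column is as a record type that has no place to put audio. The sketch below is illustrative only: `TranscriptChunk`, its field names, and `full_transcript` are hypothetical and are not Nod's actual schema. It simply mirrors the table's "text transcript (per chunk, with speaker side + timestamp)" row.

```python
from dataclasses import dataclass, fields
from typing import Iterable


@dataclass(frozen=True)
class TranscriptChunk:
    """Hypothetical stored row: text and light metadata only, no audio field."""
    meeting_id: str
    speaker_side: str   # e.g. "me" (microphone) or "them" (system audio); labels assumed
    started_at_s: int   # offset from the start of the meeting, in seconds
    text: str


def full_transcript(chunks: Iterable[TranscriptChunk]) -> str:
    """Assemble per-chunk text into one readable transcript, ordered by time."""
    ordered = sorted(chunks, key=lambda c: c.started_at_s)
    return "\n".join(f"[{c.started_at_s}s] {c.speaker_side}: {c.text}" for c in ordered)


rows = [
    TranscriptChunk("m1", "them", 5, "Sounds good."),
    TranscriptChunk("m1", "me", 0, "Shall we start?"),
]
print(full_transcript(rows))
# [0s] me: Shall we start?
# [5s] them: Sounds good.
```

Because every field is a string or an integer, a summary or search index built from these rows can only ever be derived from text.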
Why does storing audio create risk?
Every stored recording is a liability that grows over time. Three concrete reasons people avoid it:
Breach surface. A library of full meeting recordings on a vendor's server is a high-value target. If that store is compromised, the attacker gets your actual conversations, not just notes. No stored audio means there's nothing in that particular bucket to steal.
Subpoena and legal exposure. Recordings can be compelled in litigation. A conversation that was never saved as audio can't be produced as audio — there's simply no file to hand over. That's a meaningful reduction in exposure for sensitive client, legal, or health-related discussions.
Retention obligations. Once you hold recordings, you inherit the duty to manage, secure, and eventually delete them — and to prove you did. Not creating the file in the first place is the cleanest form of data minimization there is.
This is the same logic behind avoiding intrusive meeting bots in general. If you want the deeper version, see why stored recordings are risky and the broader case for bot-free meeting notes.
Does no stored audio mean fully offline?
No — and this is the honest nuance that's easy to get wrong. "No stored audio" describes what happens to the recording. It does not mean the entire process runs on your machine with no internet.
Here's the accurate picture for Nod. The capture is local: the audio is read on your Mac through macOS APIs, and no recording is ever written. But the transcription and summarization run in the cloud — the short audio chunks are sent over an encrypted connection to be transcribed, and the transcript is sent to an AI model to produce the summary. Those calls go to EU-hosted inference configured with Zero Data Retention, and the upstream providers contractually do not train on your data. The audio chunk is processed and released; it isn't stored anywhere along the way, including by the inference providers.
So Nod is local capture plus EU cloud inference with no stored audio — not an on-device, fully offline transcription tool. If you specifically need everything to run with no network at all, that's a different category of product (typically built around a local Whisper model on your machine), and you should look at those directly. We cover the difference in detail in the guide to local AI meeting notes for Mac. The strength of the no-stored-audio approach isn't that it's offline; it's that there's no recording to leak, the processing happens under strict no-retention and no-training terms, and your notes stay searchable across every conversation.
Frequently asked questions
Does Nod keep a recording of my meeting?
No. Nod never writes an audio file. Sound is held in memory for roughly five seconds while a chunk is transcribed, then the bytes are released. Only the resulting transcript and summary are saved. This is documented on the Security & Privacy page.
Is the audio uploaded anywhere?
Short audio chunks are sent over an encrypted connection to be transcribed, then immediately discarded — they're never stored as a file by Nod or by the transcription provider. What's persisted is text: the transcript and summary, stored in the EU.
Can I get the audio back later?
No. Because no recording is ever created, there's nothing to retrieve. You'll have the full text transcript and the AI summary, but the audio itself only ever existed for a few seconds in memory.
If audio isn't stored, how is the summary made?
The summary is built from the transcript text, not from a stored recording. Each chunk of roughly five seconds is transcribed in flight; those transcripts are assembled into the full text, and an AI model turns that text into a structured recap covering topics, decisions, action items, and open questions.
Is this the same as an offline transcription tool?
No. Nod captures audio locally but transcribes and summarizes via EU cloud inference with Zero Data Retention. A fully offline tool runs the speech-to-text model on your own machine with no network. Both can avoid storing audio; only the offline tools keep everything on-device. See GDPR-compliant transcription for how Nod handles the cloud hop.
Try Nod
Nod is an AI notepad for macOS that captures your Mac's own audio — no bot, no participant in the call — and turns each meeting into a clean, searchable summary without ever storing the recording. It's free during private beta, with pricing published before any billing begins. If you want notes without a saved recording, you can download Nod for Mac and try it on your next call.