The DEVONtechnologies Blog

How to Transcribe Speech in DEVONthink

June 24, 2025 — Jim Neumann
Screenshot of a DEVONthink window showing the Annotations & Reminders inspector. A transcript with timestamps is visible as an annotation.

If you have audio or video files you’ve downloaded or recorded yourself, the Pro and Server editions of DEVONthink let you use external AI to convert the audio into text. You can use the transcript to read what was said, find media documents through search, take notes on the text, or navigate through the audio.

First, open Settings > AI > Transcription and choose a Destination where the transcript will be stored. This setting also applies to text recognition in images. The options are:

  • Searchable Text: DEVONthink stores the transcript silently in the database’s index, making it possible to search for text in the media document.
  • Annotation: This creates an annotation document for the text, automatically linked to the media document. You can access the transcript in the Info: Annotations & Reminders inspector or open it in a separate window.
  • Comment: DEVONthink stores the transcribed text in the Finder Comment for the media document.

Then, choose an engine in the Audio & Video section. Apple’s free speech engines work well. The local engine requires Siri to be enabled. The remote option sends the audio to Apple’s servers for processing, though we do employ privacy-preserving techniques for the data being sent. And if you want to use the OpenAI model, you’ll need an API key.

Enabling Add timestamps to transcription records the time at certain points in the audio, with at least one timestamp per minute. When used with the Annotation destination, these timestamps act as clickable links, so you can jump to specific times in the document’s audio. Lastly, you can set the specific language of the audio if needed, though Automatic works well in many cases.
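
To get a feel for what “at least one timestamp per minute” can look like in a transcript, here is a minimal sketch in Python. It is not how DEVONthink works internally. The `Segment` type, the `add_timestamps` function, and the `[HH:MM:SS]` format are all hypothetical and chosen only for illustration. The sketch assumes the speech engine returns short text segments with a start time in seconds, and it stamps the first segment that begins in each new minute.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized chunk of speech (hypothetical structure)."""
    start: float  # seconds from the beginning of the audio
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time offset as HH:MM:SS (an assumed display format)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def add_timestamps(segments: list[Segment]) -> str:
    """Prefix the first segment that starts in each new minute with a timestamp."""
    lines = []
    last_minute = None  # minute index of the most recent stamped segment
    for seg in segments:
        minute = int(seg.start) // 60
        if minute != last_minute:
            lines.append(f"[{format_timestamp(seg.start)}] {seg.text}")
            last_minute = minute
        else:
            lines.append(seg.text)
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Segment(0, "Welcome to the show."),
        Segment(50, "Today we talk about archives."),
        Segment(70, "Let's start with the basics."),
    ]
    print(add_timestamps(demo))
```

In this sketch, a segment only receives a timestamp when it starts in a minute that has not been stamped yet, so closely spaced segments stay unmarked and the transcript remains readable.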

Now, to transcribe a media document, select it and choose Data > Recognition > Transcribe Speech from the menu bar or the context menu. You can also run the transcription through automation actions and commands, such as smart rules, batch processing, and scripting.