Skip to content

Transcribes a directory of voice notes (WAV files) and forwards them to an Emacs org-file.

Notifications You must be signed in to change notification settings

kheast/voicenotes2org

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 

Repository files navigation

voicenotes2org

./formatted-output.png

voicenotes2org is a Python script which collects WAV files in a given directory, sends them to Google Cloud Platform (GCP) for transcription, and then formats the resulting transcripts into a combined org file, including links back to the original audio.

Each note becomes a heading in the org file, and includes:

  1. The date and time of the note
  2. An org-link of type voicenote, which, when followed, plays the original audio file in EMMS
  3. Google’s transcript of the note, broken down into 10-second segments. Each segment begins with a voicenote link which will play the original audio file at the time offset which corresponds to that segment.

Prerequisites

This uses Google’s Cloud Speech-to-Text API, and you will need your own GCP account. Make sure you have a service account JSON.

Other than that, you’ll need python3 and ffmpeg on your system. Only tested on Arch Linux.

Installation

Clone this repo, then install like so:

git clone https://github.com/bgutter/voicenotes2org
cd voicenotes2org
pip install . # optionally with sudo, depending on your system

It’s also on PyPI as voicenotes2org, but not usually up to date there.

sudo pip install voicenotes2org

Basic Usage

Transcription jobs can be defined on the command line, or in a config file.

CLI Example:

> voicenotes2org --voice_notes_dir=~/new-voice-notes/ --archive_dir=~/org/archived-voice-notes/ --org_transcript_file=~/org/unfiled-voice-notes.org

…or…

Config File:

> cat ~/.config/voicenotes2org/default.toml
voice_notes_dir="~/new-voice-notes/"
archive_dir="~/org/archived-voice-notes/"
org_transcript_file="~/org/unfiled-voice-notes.org"

> voicenotes2org

Note that, in the config file, all relative paths will be interpreted as relative to the config file. For example, “filename_regex.txt” in ~/.config/voicenotes2org/default.toml will be treated as ~/.config/voicenotes2org/filename_regex.txt.

In both case, the script will find every WAV file in ~/new-voice-notes/, and transcribe them. After transcription, they will be moved to ~/org/archived-voice-notes/. If ~/org/unfiled-voice-notes.org does not exist, it will be created with an eval header statement which defines the voicenote link type. If the file already exists, voicenotes2org will only append content, leaving existing content unmodified. There will be one new heading for each WAV file transcribed.

Optional Arguments

OptionMeaning
--gcp_credentials_pathPath to JSON file. If provided, use this to access the Google Speech-to-Text API. If missing, you must have configured the GOOGLE_APPLICATION_CREDENTIALS environment variable!
--voicenote_filename_regex_pathPath to a text file containing a Python regex which will be used to match and parse voice note filenames. It MUST contain named groups for year, month, day, hour, minute, and ampm. All but ampm are local date/time (or, whatever you want, really), 12 hour clock. ampm should be either literally am or pm. This is an unsanitized input. Be smart.
--max_concurrent_requestsMaximum number of concurrent transcription requests.
--verboseDefault false. Print the name of WAV files currently being transcribed.
--just_copyBoolean. Default false. If true, don’t remove audio from original folder.

If you prefer to avoid eval statements in your file headers, you may instead include this somewhere in your init code:

(org-link-set-parameters "voicenote"
                         :follow (lambda (content)
                                   (cl-multiple-value-bind (file seconds)
                                       (split-string content ":")
                                     (emms-play-file file)
                                     (sit-for 0.5)
                                     (emms-seek-to (string-to-number seconds)))))

Example Output

Formatted Output:

./formatted-output.png

Plain Text:

# -*- eval: (org-link-set-parameters "voicenote" :follow (lambda (content) (cl-multiple-value-bind (file seconds) (split-string content ":") (emms-play-file file) (sit-for 0.5) (emms-seek-to seconds)))) -*-
#+TITLE: Unfiled Voice Notes

C-c C-o on any link to play clip starting from that offset.

* New Voice Note
[2020-01-01 Wed 00:52]
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:0][Archived Clip]]

[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:0][00:00]] this is a second voice note I am talking into a phone right now roses are red violets are blue
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 143.wav:10][00:10]] blah blah blah


* New Voice Note
[2020-01-01 Wed 00:52]
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:0][Archived Clip]]

[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:0][00:00]] this is a voice note for testing this is the first one that I will do I'm going to talk about nothing
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-52 AM 142.wav:10][00:10]] because I don't know what else to say


* New Voice Note
[2020-01-01 Wed 00:53]
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:0][Archived Clip]]

[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:0][00:00]] Mona Lisa lost her smile the painters hands are trembling now and if she's out
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:10][00:10]] there running wild it's just because I taught her how the Masterpiece that we had planned is laying shattered
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:20][00:20]] on the ground Mona Lisa lost her smile and the painters hands are trembling now and the eyes that used to burn for
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:30][00:30]] me now they no longer look my way and the love that used to be why it just got lost in yesterday
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:40][00:40]] and if she seems cold to the touch well there used to be burn a flame I gave to a little took
[[voicenote:~/org/archived-voice-notes/My recording 2020-01-01 12-53 AM 144.wav:50][00:50]] too much til I erased the painter's name ... too much till I erased the painter's name

WAV file naming rules

Unless you define your own regex file, WAV files must be named according to the following pattern:

.* YYYY-MM-DD H-MM AM|PM .*.wav

Where:

  • YYYY is the year.
  • MM is zero-padded month.
  • DD is zero-padded day.
  • H is unpadded (sorry) hour in 12-hour format.
  • MM is zero-padded minute.
  • AM|PM is literally just “AM” or “PM”.
  • Everything is whitespace delimited.

🚨 Limitations 🚨

Many corners have been cut in the making of this script. If literally anyone else ever uses this code, these issues might be worth fixing some day.

Only WAV files are supported

Wouldn’t be hard to figure out the file format, but Google’s transcription API requires non-WAV formats specify things like sample rate and encoding. I did not need this.

Ugliness caused by avoiding Google Cloud Storage

Google caps the duration of audio which has been inlined into the transcription request at 1 minute. Anything longer than that, and you need to configure a Google Cloud Storage bucket. I didn’t want to, so I split each voice note into 55-second chunks with a 5-second overlap.

For example, a 3 minute long voice note is actually transcribed in 4 separate chunks:

  1. 0:00 to 0:55 – 55 seconds
  2. 0:50 to 1:45 – 55 seconds, first 5 overlap
  3. 1:40 to 2:35 – 55 seconds, first 5 overlap
  4. 2:30 to 3:00 – 30 seconds, first 5 overlap

To reduce (or, maybe produce) confusion, I insert an ellipsis (…) into the transcription wherever we’re about to start inserting overlapped content. For example:

and we went to the store for some ... the store for some candy to bring with us

This is ugly and lazy and later versions might improve this.

Example Workflow

This is how I integrate my voice recordings into org-mode.

Convenient Voice Recording

I record voice notes on my Android device using “Easy Voice Recorder”. I use this app specifically because it provides a system shortcut to toggle recording. The first invocation of this shortcut begins recording, and the second stops recording, saving the audio to a new WAV file. A third invocation would start recording again, but with another new file.

This app also lets you specify how audio files should be named, which makes it easy to encode date and time.

Most importantly, I use the “Button Mapper” app to bind a long-press of the volume-up key to this shortcut. This works even when the screen is off.

With this setup, ideas, tasks, and notes can be recorded instantly and effortlessly. Just long hold the volume up key, say whatever needs to be said, and long hold again to complete the file. No unlocking the phone, and no interacting with the touchscreen.

Alternatively, If you don’t mind carrying a second device, a dedicated voice recorder would work at least as well.

Syncing The Audio Files

I use Syncthing to sync the voice notes directory on my Android device to a directory on my PC. This is probably the easiest way to achieve near realtime syncing, and Syncthing is FOSS!

Alternatively, you can manually copy the files every evening over USB, or SSH, or Google Drive, or…well, you get the idea.

Transcription

In my org directory structure, I have a file dedicated to receiving transcribed, but not yet properly filed, voice notes. Let’s say that this is at ~/org/unfiled-voice-notes.org. Let’s also assume that my untranscribed voice notes are synced – by Syncthing – to ~/new-voice-notes/.

If I run the example command under the Basic Usage heading, then absent any errors, ~/new-voice-notes/ will be cleared out. This frees up space on the phone, though otherwise isn’t all that important. What is important is that, for each processed audio file, a new heading will appended to ~/org/unfiled-voice-notes.org. The audio file will now live in ~/org/archived-voice-notes/, and any file links in the org entries will point to this location. Because the links are absolute, the headings can be moved around wherever you’d like and will not break.

Filing

Once voicenotes2org has returned, you should open ~/org/unfiled-voice-notes.org in Emacs, then use org-refile to pop each entry into a more proper location in your org directory structure. Make sure you’ve configured org-refile-targets first!

About

Transcribes a directory of voice notes (WAV files) and forwards them to an Emacs org-file.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%