Szymon Kaliski

Prototyping Component Re-Use, and the Simplest Whisper Wrapper

Hi!

In the last issue I mentioned I've been working on some more computer-y side-projects, and here's the first one: Vineyard — an exploration of prototype-based component re-use.

This idea has been brewing in the back of my mind for years. At the moment it still feels undercooked, but I'm curious to get some feedback and hear your thoughts. Go check it out, there's a bit of a longer write-up and a live version to play with ↗.


A friend recently made a dictation app in a weekend ↗.

Here's my 15-minute version:

#!/usr/bin/env bash

set -euo pipefail

export OPENAI_API_KEY="$(cat ~/.openai-key)"

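# BSD/macOS mktemp only substitutes trailing X's, so use a temp directory
# rather than a template with a .wav suffix
tmpdir=$(mktemp -d)
tmp="$tmpdir/whisper.wav"
trap 'rm -rf "$tmpdir"' EXIT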

echo "Recording... (press Ctrl-C to stop)"
rec --no-show-progress -c 1 -b 16 -r 16000 "$tmp"

[ "$(soxi -r "$tmp")" != "16000" ] && sox "$tmp" -q -r 16000 "${tmp%.wav}_16k.wav" && mv "${tmp%.wav}_16k.wav" "$tmp"

text=$(openai api audio.transcriptions.create -m whisper-1 -f "$tmp" --response-format text)

# Quote the variable so whitespace and glob characters are copied verbatim
printf '%s' "$text" | pbcopy
printf "\nTranscript:\n%s\n" "$text"

Adding a global shortcut in Hammerspoon ↗ was another 5 minutes on top of that.

Yes, it sends the audio files to OpenAI and requires a network connection — I don't care, for my limited use it's more than ok.

It would be easy to speed up the recorded audio before beaming it up to OpenAI to save on some of the cost ↗.
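
A minimal sketch of that speed-up step, assuming sox's tempo effect is good enough for speech. The helper name speed_up_audio and the 1.5 default factor are made up for illustration and not measured against transcription quality:

```shell
#!/usr/bin/env bash

# Hypothetical helper, not part of the script above: writes a sped-up copy
# of an audio file using sox's `tempo` effect, which changes speed without
# changing pitch. `-s` selects sox's speech-tuned settings.
speed_up_audio() {
  local input="$1"
  local output="$2"
  local factor="${3:-1.5}"  # assumed default, tune by ear

  sox "$input" "$output" tempo -s "$factor"
}

# Possible use in the script above, just before the transcription call:
#   fast="${tmp%.wav}_fast.wav"
#   speed_up_audio "$tmp" "$fast" 1.5 && mv "$fast" "$tmp"
```

A higher factor means a shorter file to upload, but at some point the transcript will likely start to suffer, so it's worth trying a few values on your own voice.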

Worth Checking Out

What I've been reading lately:

On the web: