This is a collection of command-line and GUI tools for capturing and analyzing audio data.
The most interesting tool is called keytap - it can guess which keyboard keys are pressed using only the audio captured by the computer's microphone.
Check this blog post for more details:
The keytap2 tool also recovers text from audio. It does not require training data - instead it uses statistical information about the frequencies of letters and n-grams in the English language. The tool is still in development, but you can see a short demonstration here:
SDL2 (libsdl) - used to capture audio and to open GUI windows

[Ubuntu]
$ sudo apt install libsdl2-dev

[Mac OS with brew]
$ brew install sdl2
FFTW3 (fftw, optional) - some of the helper tools perform Fourier transforms
Linux and Mac OS
git clone https://github.com/ggerganov/kbd-audio
cd kbd-audio
git submodule update --init
mkdir build && cd build
cmake ..
make
(todo, PRs welcome)
Short summary of the available tools. If a tool's status is not stable, expect problems and suboptimal results.
Record audio to a raw binary file on disk
./record-full output.kbd [-cN]
Play back a recording captured via the record-full tool
./play-full input.kbd [-pN]
Record audio only while typing. Useful for collecting training data for keytap.
./record output.kbd [-cN]
Play back a recording created via the record tool
./play input.kbd [-pN]
Detect pressed keys via microphone audio capture in real-time. Uses training data captured via the record tool.
./keytap input0.kbd [input1.kbd] [input2.kbd] ... [-cN] [-pF] [-tF]
Detect pressed keys via microphone audio capture in real-time. Uses training data captured via the record tool. GUI version.
./keytap-gui input0.kbd [input1.kbd] [input2.kbd] ... [-cN]
keytap2-gui (work in progress)
Detect pressed keys via microphone audio capture. Uses statistical information (n-gram frequencies) about the language. No training data is required. The 'recording.kbd' input file has to be generated via the record-full tool and contains the audio data that will be analyzed. The 'n-gram.txt' file has to contain n-gram probabilities for the corresponding language.
./keytap2-gui recording.kbd n-gram.txt
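To illustrate why n-gram statistics help when no training data is available, here is a minimal Python sketch that ranks candidate transcriptions by how plausible their letter bigrams are. It is not the keytap2 implementation and does not read the 'n-gram.txt' file (its format is not described here); the function names `train_bigram_model` and `score`, the sample corpus, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical alphabet for this sketch: lowercase ASCII letters plus space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def normalize(text):
    """Lowercase the text and keep only characters from ALPHABET."""
    return [c for c in text.lower() if c in ALPHABET]


def train_bigram_model(corpus):
    """Estimate bigram log-probabilities from a sample text (add-one smoothing)."""
    letters = normalize(corpus)
    counts = Counter(zip(letters, letters[1:]))
    total = sum(counts.values())
    vocab = len(ALPHABET) ** 2  # number of possible bigrams
    log_probs = {
        pair: math.log((n + 1) / (total + vocab)) for pair, n in counts.items()
    }
    # Log-probability assigned to any bigram that never occurred in the corpus.
    unseen = math.log(1 / (total + vocab))
    return log_probs, unseen


def score(text, model):
    """Sum of bigram log-probabilities; higher means more language-like."""
    log_probs, unseen = model
    letters = normalize(text)
    return sum(log_probs.get(pair, unseen) for pair in zip(letters, letters[1:]))


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog and then the other thing there"
    model = train_bigram_model(sample)
    # Two candidate readings of the same three key presses.
    for candidate in ("the", "hte"):
        print(candidate, score(candidate, model))
```

With a realistic corpus, a candidate such as "the" scores higher than "hte" because its bigrams are far more common in English. A ranking of this kind is what lets language statistics stand in for per-keyboard training data.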
Visualize waveforms recorded with the record-full tool. Can also play back the audio data.
Visualize training data recorded with the record tool. Can also play back the audio data.
Any feedback about the performance of the tools is highly appreciated. Please drop a comment here.