Quick Start with Whisper

Whisper is OpenAI's general-purpose speech recognition model for converting speech to text. This guide shows you how to set up and run Whisper using the LlamaEdge whisper API server, which provides an OpenAI-compatible API.

Install WasmEdge

First, you'll need WasmEdge along with the plugin required for Whisper. Open your terminal and run the following command to install WasmEdge:

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s
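To confirm the install worked before moving on, a quick check along these lines can help. This is a minimal sketch: it assumes the installer wrote an environment file at `$HOME/.wasmedge/env` (the plugin steps in this guide use the same `$HOME/.wasmedge` directory), and `have_cmd` is a hypothetical helper name used only for this example.

```shell
# Load the environment file written by the installer, if present.
# Assumption: the installer placed it at $HOME/.wasmedge/env.
if [ -f "$HOME/.wasmedge/env" ]; then
  . "$HOME/.wasmedge/env"
fi

# Hypothetical helper: succeeds when the given command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd wasmedge; then
  wasmedge --version
else
  echo "wasmedge not found on PATH; open a new terminal or re-run the installer" >&2
fi
```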

Next, install the wasi-nn-whisper plugin manually. We're working to improve the installation experience. Stay tuned.

For Mac Apple Silicon

# Download the whisper plugin for Mac Apple Silicon
curl -LO https://github.com/WasmEdge/WasmEdge/releases/download/0.14.1/WasmEdge-plugin-wasi_nn-whisper-0.14.1-darwin_arm64.tar.gz

# Unzip the plugin to $HOME/.wasmedge/plugin
tar -xzf WasmEdge-plugin-wasi_nn-whisper-0.14.1-darwin_arm64.tar.gz -C $HOME/.wasmedge/plugin

For CUDA 12.0 (Ubuntu)

# Download the whisper plugin for CUDA 12.0
curl -LO https://github.com/WasmEdge/WasmEdge/releases/download/0.14.1/WasmEdge-plugin-wasi_nn-whisper-cuda-12.0-0.14.1-ubuntu20.04_x86_64.tar.gz

# Unzip the plugin to $HOME/.wasmedge/plugin
tar -xzf WasmEdge-plugin-wasi_nn-whisper-cuda-12.0-0.14.1-ubuntu20.04_x86_64.tar.gz -C $HOME/.wasmedge/plugin

For CUDA 11.3 (Ubuntu)

# Download the whisper plugin for CUDA 11.3
curl -LO https://github.com/WasmEdge/WasmEdge/releases/download/0.14.1/WasmEdge-plugin-wasi_nn-whisper-cuda-11.3-0.14.1-ubuntu20.04_x86_64.tar.gz

# Unzip the plugin to $HOME/.wasmedge/plugin
tar -xzf WasmEdge-plugin-wasi_nn-whisper-cuda-11.3-0.14.1-ubuntu20.04_x86_64.tar.gz -C $HOME/.wasmedge/plugin

For other platforms, check the plugin release assets page.
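The three download-and-extract sequences above differ only in the asset name, so they can be folded into one small helper. The sketch below is illustrative: `plugin_asset` and `install_plugin` are hypothetical names, and the target labels (`darwin_arm64`, `cuda-12.0`, `cuda-11.3`) are simply taken from the file names listed in this guide.

```shell
# Hypothetical helper: map a target label to the 0.14.1 asset name used above.
plugin_asset() {
  version=0.14.1
  case "$1" in
    darwin_arm64)
      echo "WasmEdge-plugin-wasi_nn-whisper-${version}-darwin_arm64.tar.gz" ;;
    cuda-12.0)
      echo "WasmEdge-plugin-wasi_nn-whisper-cuda-12.0-${version}-ubuntu20.04_x86_64.tar.gz" ;;
    cuda-11.3)
      echo "WasmEdge-plugin-wasi_nn-whisper-cuda-11.3-${version}-ubuntu20.04_x86_64.tar.gz" ;;
    *)
      echo "unsupported target: $1" >&2
      return 1 ;;
  esac
}

# Hypothetical helper: download the asset and extract it into the plugin directory.
install_plugin() {
  asset=$(plugin_asset "$1") || return 1
  mkdir -p "$HOME/.wasmedge/plugin"
  curl -LO "https://github.com/WasmEdge/WasmEdge/releases/download/0.14.1/$asset" &&
    tar -xzf "$asset" -C "$HOME/.wasmedge/plugin"
}

# Example (not run here): install_plugin darwin_arm64
```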

Download the portable API server app

Download the API server application. It is a single Wasm file, so it is lightweight (3.7 MB) and cross-platform.

curl -LO https://github.com/LlamaEdge/whisper-api-server/releases/download/0.3.9/whisper-api-server.wasm

Download the whisper model

You can browse the available ggml models at https://huggingface.co/ggerganov/whisper.cpp/tree/main. This guide uses the medium model:

curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.bin
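Since the server is started from the current directory, it can be worth confirming that both downloads are present and non-empty first. A minimal sketch, where `check_downloads` is a hypothetical helper name and the file names are the ones used in this guide:

```shell
# Hypothetical helper: report every listed file that is missing or empty.
# Returns 0 only when all files exist and have a non-zero size.
check_downloads() {
  rc=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example (not run here):
#   check_downloads whisper-api-server.wasm ggml-medium.bin && echo "ready to start"
```

This only checks presence and size; it does not verify that a download is complete or uncorrupted.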

Run the Whisper model

Start the whisper API server.

wasmedge --dir .:. whisper-api-server.wasm -m ggml-medium.bin

The server will start on port 8080 by default.
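Loading the model can take a moment, so scripts that send requests right after launch may want to wait until the port accepts connections. The sketch below polls with curl; `wait_for_server` is a hypothetical helper, and it only checks that something answers HTTP at the URL, not that the model has finished loading.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# curl exits 0 for any HTTP response (including 4xx), so this only proves
# that a server is listening.
wait_for_server() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example (not run here):
#   wait_for_server http://localhost:8080/ && echo "server is up"
```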

Use the API

Transcribe an audio file (Speech-to-Text)

Download a test audio file:

curl -LO https://github.com/LlamaEdge/whisper-api-server/raw/main/data/test.wav

Send a transcription request:

curl --location 'http://localhost:8080/v1/audio/transcriptions' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file=@"test.wav"'

For non-English audio, specify the language with the language form field, as in the request below. To find the right language code, refer to the List of ISO 639 language codes.

curl --location 'http://localhost:8080/v1/audio/transcriptions' \
 --header 'Content-Type: multipart/form-data' \
 --form 'file=@"test.wav"' \
 --form 'language="ja"'

Example response:

{
    "text": "[00:00:00.000 --> 00:00:03.540]  This is a test record for Whisper.cpp"
}
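The `text` field carries a timestamp prefix for each segment. If you only want the spoken text, the prefix can be stripped with `sed`. This is a sketch based solely on the response format shown above; `strip_timestamps` is a hypothetical helper name, and extracting the field with `jq -r .text` assumes you have jq installed.

```shell
# Hypothetical helper: remove "[hh:mm:ss.mmm --> hh:mm:ss.mmm]" prefixes from stdin.
strip_timestamps() {
  sed 's/\[[0-9:.]* --> [0-9:.]*\] *//g'
}

# Using the text value from the example response above:
text='[00:00:00.000 --> 00:00:03.540]  This is a test record for Whisper.cpp'
printf '%s\n' "$text" | strip_timestamps
# prints: This is a test record for Whisper.cpp

# With a live server and jq installed (assumption), the two can be chained:
#   curl -s http://localhost:8080/v1/audio/transcriptions --form 'file=@test.wav' \
#     | jq -r .text | strip_timestamps
```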

Translate Audio (Speech-to-Text with Translation)

Download a test audio file:
```
curl -LO https://github.com/LlamaEdge/whisper-api-server/raw/main/data/test_cn.wav
```
This audio contains the Chinese sentence 这里是中文广播, which means "This is a Chinese broadcast."

Send a translation request:

curl --location 'http://localhost:8080/v1/audio/translations' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file=@"test_cn.wav"' \
  --form 'language="zh"'

Example response:

{
  "text": "[00:00:00.000 --> 00:00:04.000]  This is a Chinese broadcast."
}
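The transcription and translation requests differ only in the endpoint name and the optional language field, so a small wrapper can cover both. This is a sketch under stated assumptions: `whisper_request` and `WHISPER_BASE_URL` are hypothetical names introduced for this example, and the Content-Type header is left out because curl sets it automatically when `--form` is used.

```shell
# Hypothetical setting: base URL of the server, defaulting to the port used in this guide.
WHISPER_BASE_URL=${WHISPER_BASE_URL:-http://localhost:8080}

# Hypothetical helper: whisper_request <transcriptions|translations> <file> [language]
whisper_request() {
  endpoint=$1
  file=$2
  lang=${3:-}

  # Only the two endpoints covered in this guide are accepted.
  case "$endpoint" in
    transcriptions|translations) ;;
    *)
      echo "unknown endpoint: $endpoint" >&2
      return 2 ;;
  esac

  # Fail early instead of sending a request without an audio file.
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 2
  fi

  if [ -n "$lang" ]; then
    curl -sS --location "$WHISPER_BASE_URL/v1/audio/$endpoint" \
      --form "file=@$file" --form "language=$lang"
  else
    curl -sS --location "$WHISPER_BASE_URL/v1/audio/$endpoint" \
      --form "file=@$file"
  fi
}

# Examples (not run here):
#   whisper_request transcriptions test.wav
#   whisper_request transcriptions test.wav ja
```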

That's all! Use whisper to process your audio now!
