
Whisper-FastAPI

Whisper-FastAPI is a simple Python FastAPI interface for konele and OpenAI-compatible services. It is based on the faster-whisper project and provides a konele-compatible API through which translations and transcriptions can be obtained over WebSocket connections or POST requests.

Features

  • Translation and Transcription: The application exposes a konele-compatible API through which translations and transcriptions can be obtained over WebSocket connections or POST requests.
  • Language Support: If no language is specified, it is detected automatically from the first 30 seconds of audio.
  • WebSocket and POST Support: The project serves a WebSocket endpoint at /konele/ws and a POST endpoint at /konele/post.
  • Audio Transcriptions: The /v1/audio/transcriptions endpoint allows users to upload an audio file and receive a transcription in response, with an optional response_type parameter. The response_type can be 'json', 'text', 'tsv', 'srt', or 'vtt'.
  • Simplified Chinese: Traditional Chinese output is automatically converted to Simplified Chinese for konele using the opencc library, as sketched below.
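
A minimal sketch of that conversion step, assuming the opencc Python package with the standard 't2s' (Traditional-to-Simplified) configuration; the project's exact configuration may differ:

from opencc import OpenCC

# Convert Traditional Chinese text to Simplified Chinese using the 't2s' config.
converter = OpenCC("t2s")
print(converter.convert("音訊轉錄"))  # -> "音讯转录"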

Usage

Konele Voice Typing

For konele voice typing, you can use either the WebSocket endpoint or the POST endpoint.

  • WebSocket: Connect to /konele/ws (or /v1/konele/ws) and send audio data. The server responds with the transcription or translation.
  • POST Method: Send a POST request to /konele/post (or /v1/konele/post) with the audio data in the body. The server responds with the transcription or translation; see the sketch below.
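
A minimal sketch of the POST variant using Python's requests library, assuming the endpoint accepts raw audio bytes in the request body; the file name, URL, and content type below are placeholders and may differ from what the server actually expects:

import requests

# Read a recorded audio file (konele clients typically send WAV/PCM data).
with open("speech.wav", "rb") as f:
    audio_bytes = f.read()

# Send the audio in the request body and print the transcription or translation.
resp = requests.post(
    "http://localhost:5000/konele/post",
    data=audio_bytes,
    headers={"Content-Type": "audio/x-wav"},  # assumed content type
)
print(resp.text)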

You can also quickly test against the demo I have set up at https://yongyuancv.cn/v1/konele/ws and https://yongyuancv.cn/v1/konele/post

OpenAI Whisper Service

To use the OpenAI-compatible Whisper service, send a POST request to /v1/audio/transcriptions with an audio file. The server responds with the transcription in the format specified by the response_type parameter.
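
A minimal sketch of such a request using Python's requests library, assuming the audio is uploaded as a multipart form field named file and response_type is sent as a form field; these names follow the OpenAI-style convention and, like the file name and URL, are assumptions:

import requests

# Upload an audio file to the OpenAI-compatible transcription endpoint.
with open("speech.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/v1/audio/transcriptions",
        files={"file": ("speech.wav", f, "audio/wav")},
        data={"response_type": "json"},  # 'json', 'text', 'tsv', 'srt', or 'vtt'
    )
print(resp.text)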

You can also quickly test against the demo I have set up at https://yongyuancv.cn/v1/audio/transcriptions

My demo runs the large-v2 model on an RTX 3060.

Getting Started

To run the application, you need Python installed on your machine. Clone the repository and install the required dependencies:

git clone https://github.com/heimoshuiyu/whisper-fastapi.git
cd whisper-fastapi
pip install -r requirements.txt

You can then run the application using the following command (the model will be downloaded from Hugging Face if it is not already in the cache directory):

python whisper_fastapi.py --host 0.0.0.0 --port 5000 --model large-v2

This will start the application on http://<your-ip-address>:5000.

Limitation

Because inference runs synchronously, this API can handle only one request at a time.