This project provides a speech-to-text API backend built on OpenAI's Whisper model, served via the faster-whisper library. The backend receives audio from a mobile frontend application, transcribes it, and returns the resulting text.
- OpenAI Whisper Integration: Leveraging Whisper's advanced speech-to-text capabilities.
- Mobile Frontend Integration: The mobile app sends audio data to the backend for transcription.
- RESTful API: Exposes endpoints to handle audio input and return transcribed text.
- Backend: Flask-based API using OpenAI's Whisper model via faster-whisper for speech recognition.
- Mobile Frontend: Kotlin app that captures speech and sends it to the backend.
- API: RESTful API facilitating communication between the mobile frontend and the backend.
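The API's transcription flow (upload an audio file, receive a job ID, poll a status endpoint, shown in the usage examples below) can be sketched as a minimal in-memory job store. This is an illustrative sketch, not the project's actual app.py: the `JobStore` class, the background thread, and the `status`/`text` field names are assumptions, and the `transcribe_fn` callback stands in for a real faster-whisper call.

```python
import threading
import uuid

# Hypothetical in-memory job store illustrating the async transcribe/status
# pattern: submitting audio returns a job ID immediately, and the caller
# polls for the finished transcription later.
class JobStore:
    def __init__(self):
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, audio_bytes, transcribe_fn):
        """Create a job, run transcription in a background thread, return its ID."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = {"status": "processing", "text": None}

        def worker():
            # In the real backend this would be a faster-whisper model call.
            text = transcribe_fn(audio_bytes)
            with self._lock:
                self._jobs[job_id] = {"status": "done", "text": text}

        threading.Thread(target=worker, daemon=True).start()
        return job_id

    def status(self, job_id):
        with self._lock:
            return self._jobs.get(job_id, {"status": "unknown", "text": None})

if __name__ == "__main__":
    store = JobStore()
    job_id = store.submit(b"fake-audio", lambda audio: "hello world")
    import time
    time.sleep(0.2)
    print(store.status(job_id))
```

In a Flask app, the POST endpoint would call `submit()` and return the job ID, and the status endpoint would call `status()` with the ID from the URL.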
- Python (tested on 3.12.5)
- faster-whisper (OpenAI Whisper models from Hugging Face)
- Flask
- Android Studio (Kotlin development environment)
- Clone the repository:
git clone https://github.com/LukasSeratowicz/Speech-to-Text-with-OpenAI-Whisper-API-Backend-and-Mobile-Integration.git
cd Speech-to-Text-with-OpenAI-Whisper-API-Backend-and-Mobile-Integration
- Open the Backend folder:
cd Backend
- Install dependencies:
pip install -r requirements.txt
- Run the API backend:
python3 app.py
- Open the Frontend folder as a project in Android Studio
- Enjoy!
- Send an audio file from the mobile app to the backend API.
- The backend processes the file using OpenAI Whisper and returns the transcribed text.
- Example request for Windows testing (using curl). Upload an audio file:
curl -X POST -F "[email protected]" http://localhost:5000/transcribe
The response contains a job ID; poll the status endpoint with that ID to retrieve the result:
curl http://localhost:5000/status/b3f2bb95-cd03-46e3-a416-74eb5678472c
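The same upload-then-poll flow can be scripted in Python. The sketch below is self-contained: a stub HTTP server stands in for the Flask backend so the client code is runnable as-is. The JSON field names (`id`, `status`, `text`) are assumptions about the backend's response shape, and for simplicity the client posts raw bytes rather than the multipart form upload (`-F "file=@..."`) shown in the curl example.

```python
import json
import threading
import urllib.request
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stub server standing in for the real Flask backend (app.py), used here only
# so the client flow below can run end-to-end without the model installed.
JOBS = {}

class StubHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == "/transcribe":
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            job_id = str(uuid.uuid4())
            JOBS[job_id] = {"status": "done", "text": "hello world"}
            self._reply({"id": job_id})

    def do_GET(self):
        if self.path.startswith("/status/"):
            job_id = self.path.rsplit("/", 1)[-1]
            self._reply(JOBS.get(job_id, {"status": "unknown"}))

    def _reply(self, payload):
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def transcribe(base_url, audio_bytes):
    """POST audio to /transcribe, then fetch /status/<id> for the result."""
    req = urllib.request.Request(
        f"{base_url}/transcribe", data=audio_bytes, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        job_id = json.load(resp)["id"]
    with urllib.request.urlopen(f"{base_url}/status/{job_id}") as resp:
        return json.load(resp)

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    print(transcribe(base, b"fake-wav-bytes"))
    server.shutdown()
```

Against the real backend you would point `base_url` at `http://localhost:5000` and, if needed, retry the status request until `status` is no longer processing.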
Feel free to copy and modify as you please, but make sure to credit us:
L. Seratowicz [email protected]
T. Czajkowski