You need to agree to share your contact information to access this model
This repository is publicly accessible, but you have to accept the conditions to access its files and content.
This model is released under the Creative Commons Attribution-NonCommercial 4.0 International License, also known as CC BY-NC 4.0.
Access is limited to non-commercial use. Users must provide appropriate attribution when sharing or adapting the model where required by the license. Commercial use, including use in paid products, paid services, or commercial deployment, is not permitted under this license.
Model Card for cohere-transcribe-03-2026-taiwanese-hakka
This model is a fine-tuned version of CohereLabs/cohere-transcribe-03-2026 for Taiwanese Hakka automatic speech recognition with Hanzi output.
Training process
The model was trained with the following hardware and hyperparameters:
- Hardware: 4x NVIDIA L40S
- Per-device batch size: 4
- Gradient accumulation steps: 32
- Total training steps: 4795
- Best checkpoint: step 2877
- Learning rate: 2e-4
- Warmup ratio: 0.02
- Optimizer: adamw_torch_fused
- LR scheduler type: linear
- Decoder prompt language: zh
- Max audio length: 35 seconds
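The settings above imply an effective batch size and a warmup length that the list does not state directly. The sketch below derives them under two assumptions that the model card does not confirm: training used plain data parallelism across the four GPUs, and warmup steps are rounded up from the ratio. The `linear_lr` helper is a hypothetical illustration of a linear schedule with warmup, not the training code.

```python
import math

# Values copied from the hyperparameter list above.
NUM_GPUS = 4
PER_DEVICE_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 32
TOTAL_STEPS = 4795
BEST_STEP = 2877
PEAK_LR = 2e-4
WARMUP_RATIO = 0.02

# Assumption: pure data parallelism, so every GPU contributes a full micro-batch.
effective_batch_size = NUM_GPUS * PER_DEVICE_BATCH_SIZE * GRAD_ACCUM_STEPS

# Assumption: the warmup length is the ratio times the total steps, rounded up.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (hypothetical helper)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = TOTAL_STEPS - step
    return PEAK_LR * max(0.0, remaining / (TOTAL_STEPS - warmup_steps))


print(f"effective batch size: {effective_batch_size} utterances per optimizer step")
print(f"warmup steps: {warmup_steps}")
print(f"learning rate at step {BEST_STEP}: {linear_lr(BEST_STEP):.2e}")
```

Under these assumptions, each optimizer step sees 512 utterances and warmup lasts 96 steps.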
Training data
The model was trained on the following datasets:
- formospeech/hat_asr_sixian_reading_clean
- formospeech/hat_asr_sixian_broadcast_clean
- formospeech/hat_asr_nansixian_reading_clean
- formospeech/hat_asr_hailu_reading_clean
- formospeech/hat_tts_hailu_clean
- formospeech/hat_tts_sixian_clean
- formospeech/fsr23_eval_clean
- formospeech/fsr25_warmup_reading_clean
- formospeech/fsr25_train_clean
- formospeech/fsr25_final_clean
- formospeech/fsr25_warmup_media_clean
- formospeech/hakka_elearning_example_clean
- formospeech/hakkatv_hanzawa_clean
- formospeech/hakka_elearning_yt_clean
- formospeech/hakkaradio_news_clean
Comparison with formospeech/whisper-large-v2-taiwanese-hakka-v1
| Model | Hailu CER | Hailu Norm CER | Sixian CER | Sixian Norm CER | Speed (RTFx) |
|---|---|---|---|---|---|
| formospeech/whisper-large-v2-taiwanese-hakka-v1 | 7.21 | 3.29 | 8.69 | 4.88 | 144.45 |
| formospeech/cohere-transcribe-03-2026-taiwanese-hakka | 10.80 | 3.99 | 13.26 | 5.58 | 524.88 |
Speed numbers above are temporary values taken from the Open ASR Leaderboard. They will be replaced later with direct vLLM measurements.
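For readers unfamiliar with the two metrics in the table, the sketch below uses their usual definitions: CER as the character-level edit distance divided by the reference length, and RTFx as seconds of audio processed per second of compute. This is not the evaluation script behind the reported numbers. It assumes the table values are percentages, and it does not reproduce the text normalization behind the "Norm CER" columns, which the model card does not describe. All function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: higher means faster than real time."""
    return audio_seconds / processing_seconds


# Ratio of the two RTFx values reported in the table above.
speed_ratio = 524.88 / 144.45
print(f"relative throughput: {speed_ratio:.2f}x")
```

Read this way, the table shows a trade-off: the Cohere-based model reports roughly 3.6 times the throughput of the Whisper-based model, at the cost of higher CER on both dialects.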
Usage
Access and Authentication
This model is hosted as a gated Hugging Face repository. Before using it:
- Visit the model page and request access.
- Log in with the same Hugging Face account that has been granted access.
- Authenticate your local environment with a Hugging Face access token.
A read token is sufficient for inference.
pip install -U huggingface_hub
hf auth login
Alternatively, you can provide the token through the HF_TOKEN environment variable:
export HF_TOKEN=hf_xxx
Do not hard-code your Hugging Face token in scripts, notebooks, or public repositories.
If you see an error such as Cannot access gated repo, make sure that:
- your Hugging Face account has been granted access to this model;
- hf auth whoami shows the expected account;
- HF_HUB_DISABLE_IMPLICIT_TOKEN is not set.
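The checks above can also be scripted. The following is a minimal sketch, not an official diagnostic: `find_token_issues` and `check_gated_access` are hypothetical helper names, the set of truthy values for `HF_HUB_DISABLE_IMPLICIT_TOKEN` is an assumption about how `huggingface_hub` parses boolean variables, and `config.json` is assumed to exist in the repository.

```python
import os
from typing import List, Mapping, Optional

MODEL_ID = "formospeech/cohere-transcribe-03-2026-taiwanese-hakka"

# Assumption: values that huggingface_hub treats as "true" for boolean variables.
TRUTHY = {"1", "ON", "YES", "TRUE"}


def find_token_issues(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return warnings about token-related environment variables."""
    env = os.environ if env is None else env
    issues = []
    if not env.get("HF_TOKEN"):
        issues.append("HF_TOKEN is not set; only a token saved by `hf auth login` can be used.")
    if env.get("HF_HUB_DISABLE_IMPLICIT_TOKEN", "").upper() in TRUTHY:
        issues.append("HF_HUB_DISABLE_IMPLICIT_TOKEN is set; the saved token is not sent automatically.")
    return issues


def check_gated_access(model_id: str = MODEL_ID, filename: str = "config.json") -> bool:
    """Download one small file; return False if the account lacks access to the gated repo."""
    # Imported lazily so the environment check works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    from huggingface_hub.errors import GatedRepoError

    try:
        hf_hub_download(repo_id=model_id, filename=filename)
    except GatedRepoError:
        return False
    return True
```

Printing the result of `find_token_issues()` and then calling `check_gated_access()` before loading the model surfaces an access problem early, without downloading the full weights. Errors other than the gated-repo error (for example network failures) are left to propagate.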
This model follows the same inference interface as Cohere Transcribe in transformers.
Installation
pip install "transformers>=5.4.0" torch huggingface_hub soundfile librosa sentencepiece protobuf
Python API
from transformers import AutoProcessor, CohereAsrForConditionalGeneration
from transformers.audio_utils import load_audio

model_id = "formospeech/cohere-transcribe-03-2026-taiwanese-hakka"

# Load the processor and the model; device_map="auto" places the weights on an available device.
processor = AutoProcessor.from_pretrained(model_id)
model = CohereAsrForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# Load the audio resampled to 16 kHz.
audio = load_audio("path/to/audio.wav", sampling_rate=16000)

# language="zh" matches the decoder prompt language used during fine-tuning.
inputs = processor(audio, sampling_rate=16000, return_tensors="pt", language="zh")
inputs = inputs.to(model.device, dtype=model.dtype)

# Generate token IDs and decode them to Hanzi text.
outputs = model.generate(**inputs, max_new_tokens=256)
text = processor.decode(outputs, skip_special_tokens=True)
print(text)
Notes
- This release contains inference files only. Optimizer states and trainer checkpoints are intentionally excluded.
- The tokenizer and processor format follow the upstream Cohere Transcribe release.
Model tree for formospeech/cohere-transcribe-03-2026-taiwanese-hakka
Base model: CohereLabs/cohere-transcribe-03-2026
Evaluation results
- CER on Hakkaradio News Clean (Sixian) test set (self-reported): 13.260
- Normalized CER on Hakkaradio News Clean (Sixian) test set (self-reported): 5.580
- CER on Hakkaradio News Clean (Hailu) test set (self-reported): 10.800
- Normalized CER on Hakkaradio News Clean (Hailu) test set (self-reported): 3.990