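I currently use speech-dispatcher with Piper (via Pied, which is distributed as a Flatpak) as my system speech synthesizer.
However, I have run into a problem: speech-dispatcher (using the Chinese voice model zh_CN-huayan-medium.onnx) skips Chinese characters when reading Chinese text and speaks only the letters and digits.
After some searching I found a fix: I added GenericLanguage "zh" "zh-CN" "utf-8" to ~/.config/speech-dispatcher/modules/piper.conf and restarted the speech-dispatcher service. Running spd-say "一二三四" in a terminal emulator now produces correct speech, and with the GNOME screen reader enabled, the Chinese parts of the system UI are read correctly as well.
But in the Flatpak versions of Firefox and Foliate, speech-dispatcher still speaks only the letters and digits.
As far as I can tell, the Flatpak versions of Firefox and Foliate reach the system speech-dispatcher service through speech-dispatcher.socket, and the service status shows that the configuration in ~/.config/speech-dispatcher/modules/piper.conf has been loaded.
I'm not sure which step is going wrong. I first suspected the Flatpak sandbox, but Firefox installed with pacman has the same problem.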
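One way to narrow this down might be to repeat the working spd-say test with different language tags, since a browser may request a different tag than the terminal does. The sketch below is only a guess at a reproduction: the list of tags is an assumption (I don't know which tag Firefox or Foliate actually sends), and the sample text is made up. It relies only on the spd-say options -w (wait), -o (output module) and -l (language).

```python
import shutil
import subprocess

# Language tags to try. Which tag a given client really sends is an assumption.
CANDIDATE_TAGS = ["zh", "zh-CN", "zh-cn", "en-US"]

# Mixed sample so it is audible whether only the letters and digits are spoken.
SAMPLE_TEXT = "一二三四 abc 123"


def build_command(tag, text=SAMPLE_TEXT, module="piper"):
    """Return the spd-say argument list for one language tag."""
    return ["spd-say", "-w", "-o", module, "-l", tag, text]


def main():
    if shutil.which("spd-say") is None:
        print("spd-say not found on PATH; nothing to run")
        return
    for tag in CANDIDATE_TAGS:
        print(f"language tag {tag!r}: listen for whether the Chinese characters are spoken")
        # check=False: a failing tag should not stop the remaining attempts.
        subprocess.run(build_command(tag), check=False)


if __name__ == "__main__":
    main()
```

If one of the tags reproduces the "letters and digits only" behaviour from the terminal, that would point at language matching in the module configuration rather than at the Flatpak sandbox.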
Relevant information:
~/.config/speech-dispatcher/speechd.conf
###
### THIS CONFIG WAS GENERATED BY PIED
###
SymbolsPreproc "char"
SymbolsPreprocFile "gender-neutral.dic"
SymbolsPreprocFile "font-variants.dic"
SymbolsPreprocFile "symbols.dic"
SymbolsPreprocFile "emojis.dic"
SymbolsPreprocFile "orca.dic"
SymbolsPreprocFile "orca-chars.dic"
AddModule "piper" "sd_generic" "piper.conf"
DefaultModule "piper"
LanguageDefaultModule "zh" "piper"
LanguageDefaultModule "zh-CN" "piper"
Include "clients/*.conf"
~/.config/speech-dispatcher/modules/piper.conf
GenericExecuteSynth "if command -v sox > /dev/null; then\
PROCESS=\'sox -r 22050 -c 1 -b 16 -e signed-integer -t raw - -t wav - tempo $RATE pitch $PITCH norm\';\
if command -v paplay > /dev/null; then\
OUTPUT=\'$PLAY_COMMAND\';\
else\
OUTPUT=\'aplay\';\
fi;\
elif command -v paplay > /dev/null; then\
PROCESS=\'cat\'; OUTPUT=\'$PLAY_COMMAND --raw --channels 1 --rate 22050\';\
else\
PROCESS=\'cat\'; OUTPUT=\'aplay -t raw -c 1 -r 22050 -f S16_LE\';\
fi;\
echo \'$DATA\' | /home/shaan/.var/app/com.mikeasoft.pied/data/pied/piper/piper --model /home/shaan/.var/app/com.mikeasoft.pied/data/pied/models/zh_CN-huayan-medium.onnx -s 0 --output_raw | $PROCESS | $OUTPUT;"
GenericLanguage "zh" "zh-CN" "utf-8"
GenericRateAdd 1
GenericPitchAdd 1
GenericVolumeAdd 1
GenericRateMultiply 1
GenericPitchMultiply 1000
AddVoice "zh_CN" "MALE1" "Piper"
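For reference, a small sketch that lists which language codes a piper.conf declares through GenericLanguage, and checks a few candidate tags against them. It assumes that sd_generic compares the first GenericLanguage argument with the requested language exactly; I have not verified that, so treat the "declared / not declared" result as a hint only. The function names and the candidate tags are made up for illustration.

```python
import re

# Matches lines such as: GenericLanguage "zh" "zh-CN" "utf-8"
GENERIC_LANGUAGE = re.compile(
    r'^\s*GenericLanguage\s+"([^"]*)"\s+"([^"]*)"\s+"([^"]*)"',
    re.MULTILINE,
)


def declared_languages(conf_text):
    """Map each declared language code to (name passed to the synth, charset)."""
    return {
        m.group(1): (m.group(2), m.group(3))
        for m in GENERIC_LANGUAGE.finditer(conf_text)
    }


def is_declared(tag, languages):
    """Exact-match check; the exact-match rule itself is an assumption."""
    return tag in languages


if __name__ == "__main__":
    # Excerpt of the configuration shown above.
    sample = 'GenericLanguage "zh" "zh-CN" "utf-8"\nGenericRateAdd 1\n'
    languages = declared_languages(sample)
    for tag in ["zh", "zh-CN", "zh_CN", "en"]:
        status = "declared" if is_declared(tag, languages) else "not declared"
        print(f"{tag}: {status}")
```

To check the real file, the sample string can be replaced with the contents of ~/.config/speech-dispatcher/modules/piper.conf.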
systemctl --user status speech-dispatcher.socket
● speech-dispatcher.socket - Speech Dispatcher Socket
Loaded: loaded (/usr/lib/systemd/user/speech-dispatcher.socket; enabled; preset: enabled)
Active: active (running) since Fri 2025-11-14 12:32:34 CST; 10h ago
Invocation: 9a4d8fe80e744c909544d5aca963b655
Triggers: ● speech-dispatcher.service
Listen: /run/user/1000/speech-dispatcher/speechd.sock (Stream)
11月 14 12:32:34 cyan systemd[5306]: Listening on Speech Dispatcher Socket.
systemctl --user status speech-dispatcher.service
● speech-dispatcher.service - Common interface to speech synthesizers
Loaded: loaded (/usr/lib/systemd/user/speech-dispatcher.service; static)
Active: active (running) since Fri 2025-11-14 22:43:09 CST; 21min ago
Invocation: e2546061208f4ba9a3ec5bfb55944829
TriggeredBy: ● speech-dispatcher.socket
Main PID: 429325 (speech-dispatch)
Tasks: 12 (limit: 37210)
Memory: 7.9M (peak: 148.2M)
CPU: 18.391s
CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/speech-dispatcher.service
├─429325 /usr/bin/speech-dispatcher -s -t 0
├─429345 /usr/lib/speech-dispatcher/speech-dispatcher-modules/sd_generic /home/shaan/.config/speech-dispatcher/modules/piper.conf
└─429349 /usr/lib/speech-dispatcher/speech-dispatcher-modules/sd_dummy /home/shaan/.config/speech-dispatcher/modules/
11月 14 22:43:09 cyan systemd[5306]: Stopped Common interface to speech synthesizers.
11月 14 22:43:09 cyan systemd[5306]: speech-dispatcher.service: Consumed 24.601s CPU time, 213.9M memory peak.
11月 14 22:43:09 cyan systemd[5306]: Started Common interface to speech synthesizers.
11月 14 22:43:10 cyan speech-dispatcher[429325]: [Fri Nov 14 22:43:10 2025 : 12693] speechd: Speech Dispatcher 0.12.1 starting
11月 14 22:43:10 cyan speech-dispatcher[429325]: [Fri Nov 14 22:43:10 2025 : 12800] speechd: Configuration has been read from "/home/shaan/.config/speech-dispatcher/speechd.conf"