T.E.O.'s Draft--Cascading Speech Style Sheets

JuanJo Miguez (JuanJo.Miguez@esat.kuleuven.ac.be)
Wed, 21 Feb 1996 16:37:44 +0100 (MET)

T.E.O.'s Draft--Cascading Speech Style Sheets

K.U. Leuven

Ing. to be Juan Jose Miguez Iglesias Juanjo.Miguez@KULeuven.ac.be
in. Filip Evenepoel Filip.Evenepoel@KULeuven.ac.be
in. Bart Bauwens Bart.Bauwens@KULeuven.ac.be
Prof.dr.in Jan Engelen Jan.Engelen@KULeuven.ac.be
Prof.ing Antonio S. Pena from the E.T.S.I. Telecomunication of Vigo (Spain)

A simple definition

The T.E.O. group at the Katholieke Universiteit Leuven in Belgium believes that the best way to include speech in CSS is to keep it simple and general, so that it is easy to use. We agree with T. V. Raman's initial draft that including speech in CSS is very worthwhile, but we do not want to make it complicated. Many people are not familiar with decibels, most current speech synthesizers are mono, and it is easier to set features with small abstract numbers; these values are then mapped to the real values of each synthesizer.

We have defined the set of properties for Cascading Speech Style Sheets in the same way as the CSS1 working draft:

Speech

Volume
Value: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Initial: 0
Applies to: All elements
Example: volume: 5
The default value is 0 because normally there will be no sound; when any other value is specified, the speech synthesizer starts working. Each speech synthesizer has its own range of volume values (the same holds for all the other properties), so these abstract values will be mapped onto the real values used by the synthesizer.
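As an illustration of the mapping idea, here is a minimal sketch of how a user agent might turn the abstract volume into a synthesizer's own value. The function name `map_volume`, the parameters `native_min` and `native_max`, and the choice of a linear scale over levels 1 to 10 are our assumptions for the example; the draft only says that a mapping takes place.

```python
from typing import Optional


def map_volume(level: int, native_min: float, native_max: float) -> Optional[float]:
    """Map the abstract volume (0..10) onto a synthesizer's native range.

    Assumption: 0 means "no speech" (returned as None) and levels 1..10 are
    spread linearly between native_min and native_max.
    """
    if not isinstance(level, int) or not 0 <= level <= 10:
        raise ValueError("volume must be an integer from 0 to 10")
    if level == 0:
        return None  # initial value: the synthesizer stays silent
    return native_min + (level - 1) * (native_max - native_min) / 9
```

For a hypothetical synthesizer whose volume runs from 0 to 90, `map_volume(5, 0, 90)` gives 40.0 and `map_volume(10, 0, 90)` gives 90.0.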
Speed
Value: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Initial: UA specific
Applies to: All elements
Example: speed: 6
Some users (especially blind people) prefer very fast speech because they have very good hearing, so they can read web pages very quickly. That is why we chose such a wide range. Of course "speed: 0" is not allowed, because you would not hear anything.
Voice-type
Value: | child1 | child2 | male1 | male2 | female1 | female2 |
Initial: UA specific
Applies to: All elements
Example: voice-type: female1
This sets the physical features of the articulating voice. For example, the voices of a boy, a woman, a man and a Terminator all sound different, which is why this property exists.
Pitch
Value: | 1 | 2 | 3 | 4 | 5 | 6 |
Initial: UA specific
Applies to: All elements
Example: pitch: 4
This is a small range for the mean fundamental frequency (F0). The same person (the same voice type) can speak, on average, with a lower or a higher voice, which gives the impression of a different voice. Combining "pitch" and "voice-type" gives, for example:
if voice-type=child1, pitch=1 (low voice)  --> real mean frequency: 150 Hz
if voice-type=child1, pitch=6 (high voice) --> real mean frequency: 350 Hz
if voice-type=male2,  pitch=1 (low voice)  --> real mean frequency:  50 Hz
if voice-type=male2,  pitch=6 (high voice) --> real mean frequency: 150 Hz
All these voices sound different. We get a wide range of different voices because the pitch value is mapped to a different real F0 depending on the voice-type. That is why 6 possible values of pitch are enough for a simple definition.
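The combination of pitch and voice-type can be sketched as a small lookup plus interpolation. Only the four end points for child1 and male2 come from the examples above; the names `F0_RANGES_HZ` and `real_f0`, and the linear spacing of the intermediate pitch values, are assumptions made for the sketch. The other voice types are left out because the draft gives no figures for them.

```python
# End points (pitch 1 and pitch 6) taken from the examples in the draft, in Hz.
F0_RANGES_HZ = {
    "child1": (150.0, 350.0),
    "male2": (50.0, 150.0),
}


def real_f0(voice_type: str, pitch: int) -> float:
    """Return the real mean F0 in Hz for a voice type and a pitch of 1..6.

    Assumption: pitch values 2..5 are spaced linearly between the end points.
    """
    if pitch not in range(1, 7):
        raise ValueError("pitch must be an integer from 1 to 6")
    if voice_type not in F0_RANGES_HZ:
        raise ValueError("no F0 range known for voice type " + repr(voice_type))
    low, high = F0_RANGES_HZ[voice_type]
    return low + (pitch - 1) * (high - low) / 5
```

With these figures, "pitch: 4" gives 270.0 Hz for child1 and 110.0 Hz for male2, so the same pitch value still produces clearly different voices.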
Prosidy
Value: | on | off |
Initial: on
Applies to: All elements
Example: prosidy: off
With prosidy activated, the synthesizer produces intonation (the evolution of F0 over time), so the speech can sound hard, soft, angry, questioning and so on. With "prosidy: off" the result is like the voice of a robot (some blind people prefer this kind of voice, together with very fast speech).
Language
Value: a language code defined in ISO 639 (Codes for the representation of names of languages)
Initial: en
Applies to: All elements
Example: language: fr
You can specify any language, because the same message is pronounced differently from one language to another (e.g. fr, nl, es, en). For example, the Apollo II (a multilingual speech synthesizer) supports 7 languages (Russian, English, French, Spanish, ...). The default value is English because it is the most used language on the web. Although many languages are not supported now, and perhaps never will be, it is better to include all of them than only a small subset.
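To show how small the resulting property set is, here is a rough sketch of a checker for a list of speech declarations. The function name `parse_speech_declarations`, the semicolon-separated input, and the test for the language value (two lowercase letters, without looking the code up in ISO 639) are assumptions made for the example and not part of the draft.

```python
import re

# Allowed values, copied from the property definitions above.
ALLOWED = {
    "volume": {str(n) for n in range(0, 11)},
    "speed": {str(n) for n in range(1, 11)},
    "voice-type": {"child1", "child2", "male1", "male2", "female1", "female2"},
    "pitch": {str(n) for n in range(1, 7)},
    "prosidy": {"on", "off"},
}

# Assumption: only the two-letter shape is checked, not membership in ISO 639.
LANGUAGE_CODE = re.compile(r"[a-z]{2}")


def parse_speech_declarations(text: str) -> dict:
    """Parse 'name: value; name: value' and reject unknown names or values."""
    result = {}
    for declaration in text.split(";"):
        declaration = declaration.strip()
        if not declaration:
            continue  # tolerate a trailing semicolon
        name, colon, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        if not colon:
            raise ValueError("missing ':' in " + repr(declaration))
        if name == "language":
            if not LANGUAGE_CODE.fullmatch(value):
                raise ValueError("not a two-letter language code: " + repr(value))
        elif name not in ALLOWED:
            raise ValueError("unknown property: " + repr(name))
        elif value not in ALLOWED[name]:
            raise ValueError("bad value for " + name + ": " + repr(value))
        result[name] = value
    return result
```

For instance, "volume: 5; speed: 6; voice-type: female1; language: fr" is accepted, while "speed: 0" is rejected, as the definition of speed requires.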

This is a DRAFT. We have discussed it among ourselves, and now it is your turn to say whether you like it as it is or would like to discuss some of its features. I hope you will tell us what you think about it. Thank you!

Kath. Universiteit Leuven--Dept. Electrotechniek (ESAT), T.E.O. Juanjo.Miguez@KULeuven.ac.be

----------------------------------------------------------------
Juan Jose Miguez Iglesias
Kath. Universiteit Leuven             | Phone : +32 16 32 18 66
Dept. Electrotechniek (ESAT), T.E.O.  |
Kard. Mercierlaan 94                  | Fax   : +32 16 32 19 86
B-3001 LEUVEN - HEVERLEE
Address: Groenveldlaan 1, 8/107 ; Heverlee (Leuven) B-3001
Phone: 206185 or 235201
E-mail: Juanjo.Miguez@esat.kuleuven.ac.be
        jmiguez@ait.uvigo.es
----------------------------------------------------------------
