In this area we propose a contest among speech synthesis systems, similar to the Blizzard Challenge but focused on emotional speech. Voice development will be carried out on a common Spanish database: a professional male speaker simulating four emotions (happiness, sadness, anger and surprise) plus a neutral speaking style. The development corpus will consist of more than three hours of recordings. In addition to the speech waveforms and text transcriptions, automatic segmentation labels can be supplied.

Participating teams will have several weeks to build their systems from the supplied development material. A set of test sentences will then be released, and the teams will have one week to synthesize them and send the audio files back. No manual intervention is allowed during synthesis: this rules out prompt sculpting and using different subsets of the database for different test sentences or sentence types, unless the selection is a fully automatic part of the system. Recordings or models from other speakers may, however, be used. Each participant is expected to provide at least ten listeners for the evaluation (native Spanish speakers and, preferably, speech experts).

Information about the test sentences will not be made public until the evaluation campaign is launched, to avoid any tailoring of the TTS systems to them. The test corpus will consist of more than one hundred sentences, although only some of them will be used in the subjective evaluation, which will assess the identifiability of the emotion, the perceived emotional intensity, the naturalness of the synthetic speech and its similarity to the natural voice of the speaker.

Each participant is expected to submit not only the synthesized waveforms but also a paper describing the main characteristics of their system. The description paper must use the LNAI template available at http://iberspeech2012.ii.uam.es.

All the details can be found in the Albayzin 2012 Speech Synthesis Evaluation Plan.


Deadline: July 16th 2012
Procedure: Send an e-mail to the organization contact, with a copy to the Chairs of the Albayzin 2012 Evaluations, providing the following information:

        Group name
        Group ID
        Contact person
        Email address
        Postal address

Data delivery

Starting on June 22nd 2012, once registration data have been validated, the development datasets will be released via web to registered participants only. Both 16-bit/16 kHz and 16-bit/48 kHz wav files will be supplied and either may be used, but the submitted wav files must have a 16 kHz sampling rate.
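
As an illustration of the sampling-rate requirement, the following is a minimal sketch of a pre-submission check using Python's standard `wave` module. It is not part of the official evaluation tooling: the function name `check_wav` is hypothetical, and the 16-bit check is an assumption based on the format of the supplied files, since the text only mandates the 16 kHz sampling rate for submissions.

```python
import wave

REQUIRED_RATE_HZ = 16000   # required for submitted files, as stated above
ASSUMED_WIDTH_BITS = 16    # assumption: same sample width as the supplied files


def check_wav(path):
    """Return a list of problems found in the WAV header (empty if none)."""
    problems = []
    # Only the header is inspected; the audio samples are not read.
    with wave.open(path, "rb") as wav:
        rate_hz = wav.getframerate()
        width_bits = wav.getsampwidth() * 8

    if rate_hz != REQUIRED_RATE_HZ:
        problems.append(
            f"sampling rate is {rate_hz} Hz, expected {REQUIRED_RATE_HZ} Hz"
        )
    if width_bits != ASSUMED_WIDTH_BITS:
        problems.append(
            f"sample width is {width_bits} bits, expected {ASSUMED_WIDTH_BITS} bits"
        )
    return problems
```

A team building its voice from the 48 kHz recordings could run such a check over every synthesized file after downsampling, and submit only when it returns an empty list for all of them.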


        May 28, 2012: Registration is open.
        June 22, 2012: Development data are released via web.
        July 16, 2012: Registration deadline.
        September 17, 2012: Evaluation data are released via web and system submission is open.
        September 24, 2012: Deadline for submitting the synthesised evaluation sentences and system descriptions.
        September 26-October 10, 2012: Evaluation campaign
        October 15, 2012: Notification of the evaluation results to each of the participants.
        November 21-23, 2012: Albayzin 2012 Workshop at IberSpeech 2012, Madrid, Spain.


Juan Manuel Montero Martínez
Speech Technology Group (GTH)
Department of Electronic Engineering (IEL)
Technical University of Madrid (UPM)
Ciudad universitaria s/n
28040 Madrid - SPAIN

Additional information