2012 ALBAYZIN EVALUATIONS: SEARCH ON SPEECH

Post-Eval Updates:

The complete MAVIR Corpus has been made available for research purposes on the MAVIR Corpus website (in Spanish), upon signature of a free research license. The website includes the audio recordings, texts, videos and a revised phonological lexicon.

The materials used in the Search on Speech evaluation (development and evaluation materials, including speech and time-aligned keywords) have also been made available for research purposes on the same website, again upon signature of a free research license.

The results of the evaluation have been published in the following journal article, a collaborative work by all the participants:

J. Tejedor, D. T. Toledano, X. Anguera, A. Varona, L. F. Hurtado, A. Miguel and J. Colás, "Query-by-Example Spoken Term Detection ALBAYZIN 2012 evaluation: overview, systems, results, and discussion", EURASIP Journal on Audio, Speech, and Music Processing, Springer, Vol. 23, pp. 1-17, September 2013. [PDF] [Bibtex]

Search on Speech evaluation:

The Search on Speech evaluation of ALBAYZIN 2012 involves searching audio content for a list of terms/queries. The evaluation focuses on retrieving the appropriate audio files that contain any of those terms/queries, together with the occurrences and their timestamps. It consists of three different tasks (the detection output common to all three is sketched after the list):

1) Keyword Spotting, where the input to the system is a list of terms that is known when the audio is processed, so word-based recognizers can be used effectively to hypothesize detections.

2) Spoken Term Detection, where the input to the system is a list of terms (as in the Keyword Spotting task), but this list is unknown when the audio is processed. This is the same task as the one proposed in the NIST STD 2006 evaluation [1].

3) Query-by-example Spoken Term Detection, where the input to the system is an acoustic example per query, so no prior knowledge of the correct word/phone transcription of each query can be assumed. This is the same task as the one proposed in the MediaEval 2011 Spoken Web Search evaluation [2].
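
In all three tasks, the system output is a set of hypothesized detections with timestamps. As a minimal illustration (the field names below are ours, not necessarily the official submission format, which is specified in the evaluation plan), a detection can be represented in Python as:

from dataclasses import dataclass

@dataclass
class Detection:
    term_id: str       # searched term (or query identifier in Query-by-example)
    audio_file: str    # audio file in which the term is hypothesized
    start_time: float  # start timestamp, in seconds
    end_time: float    # end timestamp, in seconds
    score: float       # detection confidence
    decision: bool     # hard YES/NO decision, used by ATWV scoring

# Hypothetical example (term and file name are illustrative only):
# Detection("informatica", "mavir06_session1.wav", 127.34, 127.91, 0.82, True)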


Database description:

Data provided by the organizers for system training, development and evaluation come from recordings of the MAVIR workshops held in 2006, 2007 and 2008 (Corpus MAVIR 2006, 2007 and 2008), all in Spanish. However, any kind of data can be used for system training/development, provided that these data are fully described in the system description.

For the Keyword Spotting and Spoken Term Detection tasks, orthographic transcriptions of the selected list of terms, along with the occurrences and timestamps of each term in the training/development data, will be provided. This training/development list consists of 268 different terms, each a single word of 7 to 16 graphemes. For the Query-by-example Spoken Term Detection task, 60 queries (i.e., 60 acoustic examples, one example per query) will be extracted from the Keyword Spotting/Spoken Term Detection training/development list of terms; occurrences and timestamps of these 60 queries will also be provided. The training/development data amount to about 5 hours of speech in total.

For evaluation, the test speech data amount to about 2 hours in total. Note that no orthographic transcription of the test data will be provided. For the Keyword Spotting and Spoken Term Detection tasks, only the list of terms used for evaluation will be provided; this list consists of 155 different terms, each a single word of 7 to 16 graphemes. For the Query-by-example Spoken Term Detection task, 60 queries, extracted from the Keyword Spotting/Spoken Term Detection test list of terms, will be used.


System evaluation:

The Figure of Merit (FOM) for word spotting [3], as defined in the HTK Book, will be the primary metric for the Keyword Spotting task, whereas the Actual Term Weighted Value (ATWV) [1] will be the primary metric for the Spoken Term Detection and Query-by-example Spoken Term Detection tasks.
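
For orientation, the following is a minimal, unofficial Python sketch of both metrics; it is not the evaluation's scoring tool. It assumes the detections have already been aligned against the time-aligned reference, so per-term counts of correct detections and false alarms are available; all function and variable names are illustrative.

def fom(detections, n_true, hours):
    # Figure of Merit for ONE keyword (HTK Book definition): the detection
    # rate averaged as the tolerated false alarms grow from 1 to 10 per hour.
    # detections: (score, is_correct) pairs; n_true: reference occurrences
    # (assumed > 0); hours: duration of the searched speech.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    hits = 0
    pct_before_fa = []  # % of true occurrences found before each false alarm
    for _, is_correct in ranked:
        if is_correct:
            hits += 1
        else:
            pct_before_fa.append(100.0 * hits / n_true)
    n = int(10 * hours)   # whole number of false alarms at 10 FA/hour
    a = 10 * hours - n    # fractional interpolation factor
    # if the system produced fewer false alarms, later ranks see every hit
    pct_before_fa += [100.0 * hits / n_true] * max(0, n + 1 - len(pct_before_fa))
    return (sum(pct_before_fa[:n]) + a * pct_before_fa[n]) / (10 * hours)

def atwv(per_term_counts, speech_seconds):
    # Actual Term Weighted Value (NIST STD 2006), computed from the system's
    # hard YES decisions. per_term_counts maps each term to a tuple
    # (n_true, n_correct, n_spurious) of reference occurrences, correct
    # detections and false alarms.
    beta = 999.9  # (C/V) * (1/P_target - 1) with C=0.1, V=1, P_target=1e-4
    twv = []
    for n_true, n_correct, n_spurious in per_term_counts.values():
        if n_true == 0:  # terms with no reference occurrences are not scored
            continue
        p_miss = 1.0 - n_correct / n_true
        # non-target trials approximated as one per second of speech
        p_fa = n_spurious / (speech_seconds - n_true)
        twv.append(1.0 - p_miss - beta * p_fa)
    return sum(twv) / len(twv) if twv else 0.0

The large beta reflects the very low prior probability of a term occurring at any particular second of speech, so false alarms are penalized heavily relative to misses.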

Participants may submit systems for one, two or all three of the tasks (Keyword Spotting, Spoken Term Detection and Query-by-example Spoken Term Detection). For each task entered, participants are required to submit one primary system and may submit up to two contrastive systems.


The detailed ALBAYZIN 2012 Search on Speech evaluation plan is available here.


Registration process:

Interested groups must register for the evaluation before July 16, 2012, by contacting the organizing team at:

javier.tejedor at uam.es

with a copy to the other Chair of the ALBAYZIN 2012 Evaluations:

javier.gonzalez at uam.es

and providing the following information:

  • Research group.
  • Institution.
  • Contact person.
  • Email.


Schedule (Tentative):

  • May 18, 2012. The evaluation plan is released through the website of IberSpeech 2012. Registration for Albayzin 2012 Search on Speech opens.
  • June 26, 2012. The training and development data are released via web.
  • July 16, 2012. Registration for Albayzin 2012 Search on Speech closes.
  • September 5, 2012. The evaluation data are released via web. System submission (via e-mail) opens.
  • September 24, 2012. Deadline for submitting system results and system descriptions. EXTENDED to September 28, 2012 (23:59 Pacific Time).
  • October 15, 2012. Results are sent to the participants.
  • November 21-23, 2012. IberSpeech 2012, Madrid, Spain. Albayzin 2012 Search on Speech Workshop: presentation of systems and discussion of results. Plenary session: summary of results.


References:

[1] http://www.itl.nist.gov/iad/mig/tests/std/2006/index.html

[2] Rajput, N., Metze, F.: Spoken Web Search. In: Proceedings of MediaEval'11, pp. 1–2 (2011).

[3] http://htk.eng.cam.ac.uk/docs/docs.shtml


Evaluation Organization:

Javier Tejedor - Human Computer Technology Laboratory (HCTLab), Universidad Autónoma de Madrid (UAM)

Doroteo Torre - ATVS Biometric Recognition Group, Universidad Autónoma de Madrid (UAM)

José Colás - Human Computer Technology Laboratory (HCTLab), Universidad Autónoma de Madrid (UAM)


For any questions concerning the evaluation, please contact Javier Tejedor: javier.tejedor at uam.es

Javier Tejedor

Human Computer Technology Laboratory  (HCTLab)

Av. Francisco Tomás y Valiente 11, 28049.

Escuela Politécnica Superior, Universidad Autónoma de Madrid.

Web page: www.hctlab.com
