Takao Kobayashi, Keikichi Hirose, Satoshi Nakamura (Eds.):
INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010.
ISCA 2010
Keynotes
- Steve Young:
Still talking to machines (cognitively speaking).
1-10

- Tohru Ifukube:
Sound-based assistive technology supporting "seeing", "hearing" and "speaking" for the disabled and the elderly.
11-19

- Chiu-yu Tseng:
Beyond sentence prosody.
20-29

Special Session:
Models of Speech - In Search of Better Representations
- Hosung Nam, Vikramjit Mitra, Mark Tiede, Elliot Saltzman, Louis Goldstein, Carol Y. Espy-Wilson, Mark Hasegawa-Johnson:
A procedure for estimating gestural scores from natural speech.
30-33

- Yen-Liang Shue, Gang Chen, Abeer Alwan:
On the interdependencies between voice quality, glottal gaps, and voice-source related acoustic measures.
34-37

- Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino:
Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems.
38-41

- Sadao Hiroya, Takemi Mochida:
Phase equalization-based autoregressive model of speech signals.
42-45

- Yi Xu, Santitham Prom-on:
Articulatory-functional modeling of speech prosody: a review.
46-49

- Humberto M. Torres, Hansjörg Mixdorff, Jorge A. Gurlekian, Hartmut R. Pfitzinger:
Two new estimation methods for a superpositional intonation model.
50-53

ASR:
Acoustic Models I-III
- Simon Wiesler, Georg Heigold, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney:
A discriminative splitting criterion for phonetic decision trees.
54-57

- Mark J. F. Gales, Kai Yu:
Canonical state models for automatic speech recognition.
58-61

- Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen:
Restructuring exponential family mixture models.
62-65

- Françoise Beaufays, Vincent Vanhoucke, Brian Strope:
Unsupervised discovery and training of maximally dissimilar cluster models.
66-69

- Khe Chai Sim:
Probabilistic state clustering using conditional random field for context-dependent acoustic modelling.
70-73

- Xie Sun, Yunxin Zhao:
Integrate template matching and statistical modeling for speech recognition.
74-77

- George Saon, Hagen Soltau:
Boosting systems for LVCSR.
1341-1344

- Vaibhava Goel, Tara N. Sainath, Bhuvana Ramabhadran, Peder A. Olsen, David Nahamoo, Dimitri Kanevsky:
Incorporating sparse representation phone identification features in automatic speech recognition using exponential families.
1345-1348

- Xin Chen, Yunxin Zhao:
Integrating MLP features and discriminative training in data sampling based ensemble acoustic modeling.
1349-1352

- Jui-Ting Huang, Mark Hasegawa-Johnson:
Semi-supervised training of Gaussian mixture models by conditional entropy minimization.
1353-1356

- Guangchuan Shi, Yu Shi, Qiang Huo:
A study of irrelevant variability normalization based training and unsupervised online adaptation for LVCSR.
1357-1360

- Roger Hsiao, Florian Metze, Tanja Schultz:
Improvements to generalized discriminative feature transformation for speech recognition.
1361-1364

- Karel Veselý, Lukas Burget, Frantisek Grézl:
Parallel training of neural networks for speech recognition.
2934-2937

- Rita Singh, Benjamin Lambert, Bhiksha Raj:
The use of sense in unsupervised training of acoustic models for ASR systems.
2938-2941

- Jun Du, Yu Hu, Hui Jiang:
Boosted mixture learning of Gaussian mixture HMMs for speech recognition.
2942-2945

- Volker Leutnant, Reinhold Haeb-Umbach:
On the exploitation of hidden Markov models and linear dynamic models in a hybrid decoder architecture for continuous speech recognition.
2946-2949

- Alberto Abad, Thomas Pellegrini, Isabel Trancoso, João Paulo Neto:
Context dependent modelling approaches for hybrid speech recognizers.
2950-2953

- Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi:
A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination.
2954-2957

- Hank Liao, Christopher Alberti, Michiel Bacchiani, Olivier Siohan:
Decision tree state clustering with word and syllable features.
2958-2961

- Hiroshi Fujimura, Takashi Masuko, Mitsuyoshi Tachimori:
A duration modeling technique with incremental speech rate normalization.
2962-2965

- Martin Wöllmer, Yang Sun, Florian Eyben, Björn Schuller:
Long short-term memory networks for noise robust speech recognition.
2966-2969

- Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada:
One-model speech recognition and synthesis based on articulatory movement HMMs.
2970-2973

- Xiaodong Cui, Jian Xue, Pierre L. Dognin, Upendra V. Chaudhari, Bowen Zhou:
Acoustic modeling with bootstrap and restructuring for low-resourced languages.
2974-2977

- Tetsuo Kosaka, Keisuke Goto, Takashi Ito, Masaharu Katoh:
Lecture speech recognition by combining word graphs of various acoustic models.
2978-2981

- Khe Chai Sim, Shilin Liu:
Semi-parametric trajectory modelling using temporally varying feature mapping for speech recognition.
2982-2985

- Dong Yu, Li Deng:
Deep-structured hidden conditional random fields for phonetic recognition.
2986-2989

- Jonathan Malkin, Jeff A. Bilmes:
Semi-supervised learning for improved expression of uncertainty in discriminative classifiers.
2990-2993

- Peder A. Olsen, Vaibhava Goel, Charles A. Micchelli, John R. Hershey:
Modeling posterior probabilities using the linear exponential family.
2994-2997

Spoken Dialogue Systems I, II
- Fabrice Lefèvre, François Mairesse, Steve Young:
Cross-lingual spoken language understanding from unaligned data using discriminative classification models and machine translation.
78-81

- Rajesh Balchandran, Leonid Rachevsky, Bhuvana Ramabhadran, Miroslav Novak:
Techniques for topic detection based processing in spoken dialog systems.
82-85

- Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin:
Optimizing spoken dialogue management with fitted value iteration.
86-89

- Filip Jurcícek, Blaise Thomson, Simon Keizer, François Mairesse, Milica Gasic, Kai Yu, Steve Young:
Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems.
90-93

- Alexander Schmitt, Michael Scholz, Wolfgang Minker, Jackson Liscombe, David Suendermann:
Is it possible to predict task completion in automated troubleshooters?
94-97

- David Suendermann, Jackson Liscombe, Roberto Pieraccini:
Minimally invasive surgery for spoken dialog systems.
98-101

Spoken Dialogue Systems II
- Ramón López-Cózar, David Griol:
New technique to enhance the performance of spoken dialogue systems based on dialogue states-dependent language models and grammatical rules.
2998-3001

- Lluís F. Hurtado, Joaquin Planells, Encarna Segarra, Emilio Sanchis, David Griol:
A stochastic finite-state transducer approach to spoken dialog management.
3002-3005

- Romain Laroche, Philippe Bretier, Ghislain Putois:
Enhanced monitoring tools and online dialogue optimisation merged into a new spoken dialogue system design experience.
3006-3009

- Romain Laroche, Ghislain Putois, Philippe Bretier:
Optimising a handcrafted dialogue system design.
3010-3013

- Felix Putze, Tanja Schultz:
Utterance selection for speech acts in a cognitive tourguide scenario.
3014-3017

- Gabriel Parent, Maxine Eskenazi:
Lexical entrainment of real users in the Let's Go spoken dialog system.
3018-3021

- Silvia Quarteroni, Meritxell González, Giuseppe Riccardi, Sebastian Varges:
Combining user intention and error modeling for statistical dialog simulators.
3022-3025

- Jaakko Hakulinen, Markku Turunen, Raul Santos de la Camara, Nigel Crook:
Parallel processing of interruptions and feedback in companions affective dialogue system.
3026-3029

- Antoine Raux, Neville Mehta, Deepak Ramachandran, Rakesh Gupta:
Dynamic language modeling using Bayesian networks for spoken dialog systems.
3030-3033

- Sunao Hara, Norihide Kitaoka, Kazuya Takeda:
Automatic detection of task-incompleted dialog for spoken dialog system based on dialog act n-gram.
3034-3037

- Wei-Bin Liang, Chung-Hsien Wu, Yu-Cheng Hsiao:
Dialogue act detection in error-prone spoken dialogue systems using partial sentence tree and latent dialogue act matrix.
3038-3041

- Tatsuya Kawahara, Kouhei Sumi, Zhi-Qiang Chang, Katsuya Takanashi:
Detection of hot spots in poster conversations based on reactive tokens of audience.
3042-3045

- Yoichi Matsuyama, Shinya Fujie, Hikaru Taniyama, Tetsunori Kobayashi:
Psychological evaluation of a group communication activation robot in a party game.
3046-3049

- Kyoko Matsuyama, Kazunori Komatani, Ryu Takeda, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno:
Analyzing user utterances in barge-in-able spoken dialogue system for improving identification accuracy.
3050-3053

- Mattias Heldner, Jens Edlund, Julia Hirschberg:
Pitch similarity in the vicinity of backchannels.
3054-3057

- Khiet P. Truong, Ronald Poppe, Dirk Heylen:
A rule-based backchannel prediction model using pitch and pause information.
3058-3061

Speech Perception:
Factors Influencing Perception
Prosody:
Models
- Tomás Dubeda, Katalin Mády:
Nucleus position within the intonation phrase: a typological study of English, Czech and Hungarian.
126-129

- Yong-cheol Lee, Satoshi Nambu:
Focus-sensitive operator or focus inducer: always and only.
130-133

- Jiahong Yuan, Mark Liberman:
F0 declination in English and Mandarin broadcast news speech.
134-137

- Katrin Schweitzer, Michael Walsh, Bernd Möbius, Hinrich Schütze:
Frequency of occurrence effects on pitch accent realisation.
138-141

- César González Ferreras, Carlos Vivaracho-Pascual, David Escudero Mancebo, Valentín Cardeñoso-Payo:
On the automatic ToBI accent type identification from data.
142-145

- Andrew Rosenberg:
AutoBI - a tool for automatic ToBI annotation.
146-149

Speech Synthesis:
Unit Selection and Others
- Volker Strom, Simon King:
A classifier-based target cost for unit selection speech synthesis trained on perceptual data.
150-153

- Wei Zhang, Xiaodong Cui:
Applying scalable phonetic context similarity in unit selection of concatenative text-to-speech.
154-157

- Mitsuaki Isogai, Hideyuki Mizuno:
Speech database reduction method for corpus-based TTS system.
158-161

- Heng Lu, Zhen-Hua Ling, Si Wei, Li-Rong Dai, Ren-Hua Wang:
Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier.
162-165

- Hanna Silén, Elina Helander, Jani Nurminen, Konsta Koppinen, Moncef Gabbouj:
Using robust Viterbi algorithm and HMM-modeling in unit selection TTS to replace units of poor quality.
166-169

- Yeon-Jun Kim, Marc C. Beutnagel:
Automatic detection of abnormal stress patterns in unit selection synthesis.
170-173

- Daniel Tihelka, Jirí Kala, Jindrich Matousek:
Enhancements of Viterbi search for fast unit selection synthesis.
174-177

- Thomas Ewender, Beat Pfister:
Accurate pitch marking for prosodic modification of speech segments.
178-181

- Shifeng Pan, Meng Zhang, Jianhua Tao:
A novel hybrid approach for Mandarin speech synthesis.
182-185

- Josafá de Jesus Aguiar Pontes, Sadaoki Furui:
Modeling liaison in French by using decision trees.
186-189

- Jian Luan, Jian Li:
Improvement on plural unit selection and fusion.
190-193

- Alok Parlikar, Alan W. Black, Stephan Vogel:
Improving speech synthesis of machine translation output.
194-197

- Ghislain Putois, Jonathan Chevelu, Cédric Boidin:
Paraphrase generation to improve text-to-speech synthesis.
198-201

ASR:
Search, Decoding and Confidence Measures I, II
- Chang Woo Han, Shin Jae Kang, Chul Min Lee, Nam Soo Kim:
Phone mismatch penalty matrices for two-stage keyword spotting via multi-pass phone recognizer.
202-205

- Petr Motlícek, Fabio Valente, Philip N. Garner:
English spoken term detection in multilingual recordings.
206-209

- Icksang Han, Chiyoun Park, Jeongmi Cho, Jeongsu Kim:
A hybrid approach to robust word lattice generation via acoustic-based word detection.
210-213

- Volker Steinbiss, Martin Sundermeyer, Hermann Ney:
Direct observation of pruning errors (DOPE): a search analysis tool.
214-217

- David Rybach, Michael Riley:
Direct construction of compact context-dependency transducers from data.
218-221

- Miroslav Novak:
Incremental composition of static decoding graphs with label pushing.
222-225

- Zhanlei Yang, Wenju Liu:
A novel path extension framework using steady segment detection for Mandarin speech recognition.
226-229

- Ralf Schlüter, Markus Nußbaum-Thom, Hermann Ney:
On the relation of Bayes risk, word error, and word posteriors in ASR.
230-233

- David Nolden, Hermann Ney, Ralf Schlüter:
Time conditioned search in automatic speech recognition reconsidered.
234-237

- Satoshi Kobashikawa, Taichi Asami, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi:
Efficient data selection for speech recognition based on prior confidence estimation using speech and context independent models.
238-241

- Atsunori Ogawa, Atsushi Nakamura:
A novel confidence measure based on marginalization of jointly estimated error cause probabilities.
242-245

- Julien Fayolle, Fabienne Moreau, Christian Raymond, Guillaume Gravier, Patrick Gros:
CRF-based combination of contextual features to improve a posteriori word-level confidence measures.
1942-1945

- Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll:
Recognition of spontaneous conversational speech using long short-term memory phoneme predictions.
1946-1949

- Thomas Pellegrini, Isabel Trancoso:
Improving ASR error detection with non-decoder based features.
1950-1953

- Ladan Golipour, Douglas D. O'Shaughnessy:
Phoneme classification and lattice rescoring based on a k-NN approach.
1954-1957

- Jeff Bilmes, Hui Lin:
Online adaptive learning for speech recognition decoding.
1958-1961

- Takaaki Hori, Shinji Watanabe, Atsushi Nakamura:
Improvements of search error risk minimization in Viterbi beam search for speech recognition.
1962-1965

Special-Purpose Speech Applications
- Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko:
Evaluation of a silent speech interface based on magnetic sensing.
246-249

- Rubén San Segundo, Verónica López, Raquel Martín, Syaheerah L. Lutfi, Javier Ferreiros, Ricardo de Córdoba, José Manuel Pardo:
Advanced speech communication system for deaf people.
250-253

- Sethserey Sam, Eric Castelli, Laurent Besacier:
Unsupervised acoustic model adaptation for multi-origin non native ASR.
254-257

- Dilek Hakkani-Tür, Dimitra Vergyri, Gökhan Tür:
Speech-based automated cognitive status assessment.
258-261

- Toru Imai, Shinichi Homma, Akio Kobayashi, Takahiro Oku, Shoei Sato:
Speech recognition with a seamlessly updated language model for real-time closed-captioning.
262-265

- Takuya Nishimoto, Takayuki Watanabe:
The comparison between the deletion-based methods and the mixing-based methods for audio CAPTCHA systems.
266-269

- Martine Adda-Decker, Lori Lamel, Natalie D. Snoeren:
Comparing mono- & multilingual acoustic seed models for a low e-resourced language: a case-study of Luxembourgish.
270-273

- R. J. J. H. van Son, Irene Jacobi, Frans J. M. Hilgers:
Manipulating tracheoesophageal speech.
274-277

- David Imseng, Hervé Bourlard, Mathew Magimai-Doss:
Towards mixed language speech recognition systems.
278-281

- Etienne Barnard, Johan Schalkwyk, Charl Johannes van Heerden, Pedro J. Moreno:
Voice search for development.
282-285

- Gina-Anne Levow, Susan Duncan, Edward T. King:
Cross-cultural investigation of prosody in verbal feedback in interactional rapport.
286-289

- Mary Tai Knox, Gerald Friedland:
Multimodal speaker diarization using oriented optical flow histograms.
290-293

- Catherine Middag, Yvan Saeys, Jean-Pierre Martens:
Towards an ASR-free objective analysis of pathological speech.
294-297

Speech Analysis
- Keith W. Godin, John H. L. Hansen:
Session variability contrasts in the MARP corpus.
298-301

- Kazuhiro Kondo, Yusuke Takano:
Estimation of two-to-one forced selection intelligibility scores by speech recognizers using noise-adapted models.
302-305

- Thomas Schaaf, Florian Metze:
Analysis of gender normalization using MLP and VTLN features.
306-309

- Guillaume Aimetti, Roger K. Moore, Louis ten Bosch:
Discovering an optimal set of minimally contrasting acoustic speech units: a point of focus for whole-word pattern matching.
310-313

- Themos Stafylakis, Xavier Anguera:
Improvements to the equal-parameter BIC for speaker diarization.
314-317

- Nima Mesgarani, Samuel Thomas, Hynek Hermansky:
A multistream multiresolution framework for phoneme recognition.
318-321

- Giampiero Salvi, Fabio Tesser, Enrico Zovato, Piero Cosi:
Cluster analysis of differential spectral envelopes on emotional speech.
322-325

- Sam Bowman, Karen Livescu:
Modeling pronunciation variation with context-dependent articulatory feature decision trees.
326-329

- Bhiksha Raj, Kevin W. Wilson, Alexander Krueger, Reinhold Haeb-Umbach:
Ungrounded independent non-negative factor analysis.
330-333

- John R. Hershey, Peder A. Olsen, Steven J. Rennie:
Signal interaction and the devil function.
334-337

Systems for LVCSR
- Yuya Akita, Masato Mimura, Graham Neubig, Tatsuya Kawahara:
Semi-automated update of automatic transcription system for the Japanese national congress.
338-341

- Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Language model cross adaptation for LVCSR system combination.
342-345

- Shinji Watanabe, Takaaki Hori, Atsushi Nakamura:
Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data.
346-349

- Pavel Kveton, Miroslav Novak:
Accelerating hierarchical acoustic likelihood computation on graphics processors.
350-353

- Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche, Pedro J. Moreno:
Search by voice in Mandarin Chinese.
354-357

- Thomas Hain, Lukas Burget, John Dines, Philip N. Garner, Asmaa El Hannani, Marijn Huijbregts, Martin Karafiát, Mike Lincoln, Vincent Wan:
The AMIDA 2009 meeting transcription system.
358-361

Speaker Characterization and Recognition I-IV
- William M. Campbell, Zahi N. Karam:
Simple and efficient speaker comparison using approximate KL divergence.
362-365

- Hanwu Sun, Bin Ma, Chien-Lin Huang, Trung Hieu Nguyen, Haizhou Li:
The IIR NIST SRE 2008 and 2010 summed channel speaker recognition systems.
366-369

- Chien-Lin Huang, Hanwu Sun, Bin Ma, Haizhou Li:
Speaker characterization using long-term and temporal information.
370-373

- Sergio Perez-Gomez, Daniel Ramos, Javier Gonzalez-Dominguez, Joaquin Gonzalez-Rodriguez:
Score-level compensation of extreme speech duration variability in speaker verification.
374-377

- Alberto Abad, Isabel Trancoso:
Speaker recognition experiments using connectionist transformation network features.
378-381

- Yun Lei, John H. L. Hansen:
Speaker recognition using supervised probabilistic principal component analysis.
382-385

- Benjamin Bigot, Julien Pinquier, Isabelle Ferrane, Régine André-Obrecht:
Looking for relevant features for speaker role recognition.
1057-1060

- Marcel Kockmann, Lukas Burget, Ondrej Glembek, Luciana Ferrer, Jan Cernocký:
Prosodic speaker verification using subspace multinomial models with intersession compensation.
1061-1064

- Eryu Wang, Kong-Aik Lee, Bin Ma, Haizhou Li, Wu Guo, Li-Rong Dai:
The estimation and kernel metric of spectral correlation for text-independent speaker verification.
1065-1068

- Rahim Saeidi, Pejman Mowlaee, Tomi Kinnunen, Zheng-Hua Tan, Mads Græsbøll Christensen, Søren Holdt Jensen, Pasi Fränti:
Improving monaural speaker identification by double-talk detection.
1069-1072

- B. Avinash, S. Guruprasad, B. Yegnanarayana:
Exploring subsegmental and suprasegmental features for a text-dependent speaker verification in distant speech signals.
1073-1076

- Qingsong Liu, Wei Huang, Dongxing Xu, Hongbin Cai, Beiqian Dai:
A fast implementation of factor analysis for speaker verification.
1077-1080

- Ce Zhang, Rong Zheng, Bo Xu:
An investigation into direct scoring methods without SVM training in speaker verification.
1437-1440

- Reda Jourani, Khalid Daoudi, Régine André-Obrecht, Driss Aboutajdine:
Large margin Gaussian mixture models for speaker identification.
1441-1444

- Rong Zheng, Bo Xu:
On the use of Gaussian component information in the generative likelihood ratio estimation for speaker verification.
1445-1448

- Man-Wai Mak, Wei Rao:
Acoustic vector resampling for GMMSVM-based speaker verification.
1449-1452

- Konstantin Biatov:
A fast speaker indexing using vector quantization and second order statistics with adaptive threshold computation.
1453-1456

- Gang Wang, Xiaojun Wu, Thomas Fang Zheng:
Using phoneme recognition and text-dependent speaker verification to improve speaker segmentation for Chinese speech.
1457-1460

- Claudio Garretón, Néstor Becerra Yoma:
On enhancing feature sequence filtering with filter-bank energy transformation in speaker verification with telephone speech.
1461-1464

- Donglai Zhu, Bin Ma, Kong-Aik Lee, Cheung-Chi Leung, Haizhou Li:
MAP estimation of subspace transform for speaker recognition.
1465-1468

- Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ji Ming:
A longest matching segment approach for text-independent speaker recognition.
1469-1472

- Ville Hautamäki, Tomi Kinnunen, Mohaddeseh Nosratighods, Kong-Aik Lee, Bin Ma, Haizhou Li:
Approaching human listener accuracy with modern speaker verification.
1473-1476

- Jouni Pohjalainen, Rahim Saeidi, Tomi Kinnunen, Paavo Alku:
Extended weighted linear prediction (XLP) analysis of speech and its application to speaker verification in adverse conditions.
1477-1480

- Guoli Ye, Brian Mak:
The use of subvector quantization and discrete densities for fast GMM computation for speaker verification.
1481-1484

- Fred S. Richardson, Joseph P. Campbell:
Transcript-dependent speaker recognition using Mixer 1 and 2.
2102-2105

- Thomas Drugman, Thierry Dutoit:
On the potential of glottal signatures for speaker recognition.
2106-2109

- R. Padmanabhan, Hema A. Murthy:
Acoustic feature diversity and speaker verification.
2110-2113

- Omid Dehzangi, Bin Ma, Engsiong Chng, Haizhou Li:
A discriminative performance metric for GMM-UBM speaker identification.
2114-2117

- Xavier Anguera, Jean-François Bonastre:
A novel speaker binary key derived from anchor models.
2118-2121

- Weiqiang Zhang, Yan Deng, Liang He, Jia Liu:
Variant time-frequency cepstral features for speaker recognition.
2122-2125

- Ning Wang, P. C. Ching, Tan Lee:
Exploitation of phase information for speaker recognition.
2126-2129

- Yanhua Long, Li-Rong Dai, Bin Ma, Wu Guo:
Effects of the phonological relevance in speaker verification.
2130-2133

- Gabriel Hernández Sierra, Jean-François Bonastre, Driss Matrouf, José R. Calvo:
Topological representation of speech for speaker recognition.
2134-2137

- Seyed Omid Sadjadi, John H. L. Hansen:
Assessment of single-channel speech enhancement techniques for speaker identification under mismatched conditions.
2138-2141

- Xiang Zhang, Chuan Cao, Lin Yang, Hongbin Suo, Jianping Zhang, Yonghong Yan:
Speaker recognition using the resynthesized speech via spectrum modeling.
2142-2145

Source Separation
- Robert Peharz, Michael Stark, Franz Pernkopf, Yannis Stylianou:
A factorial sparse coder model for single channel source separation.
386-389

- Yasmina Benabderrahmane, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy:
Oriented PCA method for blind speech separation of convolutive mixtures.
390-393

- Hsin-Lung Hsieh, Jen-Tzung Chien:
Online Gaussian process for nonstationary speech separation.
394-397

- Meng Yu, Wenye Ma, Jack Xin, Stanley Osher:
Convexity and fast speech extraction by split Bregman method.
398-401

- Wenye Ma, Meng Yu, Jack Xin, Stanley Osher:
Reducing musical noise in blind source separation by time-domain sparse filters and split Bregman method.
402-405

- John Woodruff, Rohit Prabhavalkar, Eric Fosler-Lussier, DeLiang Wang:
Combining monaural and binaural evidence for reverberant speech segregation.
406-409

Speech Synthesis:
HMM-Based Speech Synthesis I, II
- Heiga Zen:
Speaker and language adaptive training for HMM-based polyglot speech synthesis.
410-413

- Kai Yu, Heiga Zen, François Mairesse, Steve Young:
Context adaptive training with factorized decision trees for HMM-based speech synthesis.
414-417

- Junichi Yamagishi, Oliver Watts, Simon King, Bela Usabaev:
Roles of the average voice in speaker-adaptive HMM-based speech synthesis.
418-421

- Yao Qian, Zhi-Jie Yan, Yi-Jian Wu, Frank K. Soong, Xin Zhuang, Shengyi Kong:
An HMM trajectory tiling (HTT) approach to high quality TTS.
422-425

- Yining Chen, Zhi-Jie Yan, Frank K. Soong:
A perceptual study of acceleration parameters in HMM-based TTS.
426-429

- Shuji Yokomizo, Takashi Nose, Takao Kobayashi:
Evaluation of prosodic contextual factors for HMM-based speech synthesis.
430-433

- Slava Shechtman, Alexander Sorin:
Sinusoidal model parameterization for HMM-based TTS system.
805-808

- Yoshinori Shiga, Tomoki Toda, Shinsuke Sakai, Hisashi Kawai:
Improved training of excitation for HMM-based parametric speech synthesis.
809-812

- June Sig Sung, Doo Hwa Hong, Kyung Hwan Oh, Nam Soo Kim:
Excitation modeling based on waveform interpolation for HMM-based speech synthesis.
813-816

- Xin Zhuang, Yao Qian, Frank K. Soong, Yi-Jian Wu, Bo Zhang:
Formant-based frequency warping for improving speaker adaptation in HMM TTS.
817-820

- Hongwei Hu, Martin J. Russell:
Improved modelling of speech dynamics using non-linear formant trajectories for HMM-based speech synthesis.
821-824

- Zhen-Hua Ling, Yu Hu, Li-Rong Dai:
Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis.
825-828

- Matt Shannon, William Byrne:
Autoregressive clustering for HMM speech synthesis.
829-832

- Nicholas Pilkington, Heiga Zen:
An implementation of decision tree-based context clustering on graphics processing units.
833-836

- Alexander Gutkin, Xavi Gonzalvo, Stefan Breuer, Paul Taylor:
Quantized HMMs for low footprint text-to-speech synthesis.
837-840

- Oliver Watts, Junichi Yamagishi, Simon King:
The role of higher-level linguistic features in HMM-based speech synthesis.
841-844

- Ayami Mase, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
HMM-based singing voice synthesis system using pitch-shifted pseudo training data.
845-848

- Jinfu Ni, Hisashi Kawai:
An unsupervised approach to creating web audio contents-based HMM voices.
849-852

- Tomoki Koriyama, Takashi Nose, Takao Kobayashi:
Conversational spontaneous speech synthesis using average voice model.
853-856

Multi-Modal Signal Processing
- Jonas Hörnstein, José Santos-Victor:
Learning words and speech units through natural interactions.
434-437

- Qingju Liu, Wenwu Wang, Philip J. B. Jackson:
Bimodal coherence based scale ambiguity cancellation for target speech extraction and enhancement.
438-441

- Hiroaki Kawashima, Yu Horii, Takashi Matsuyama:
Speech estimation in non-stationary noise environments using timing structures between mouth movements and sound signals.
442-445

- Lijuan Wang, Xiaojun Qian, Wei Han, Frank K. Soong:
Synthesizing photo-real talking head via trajectory-guided sample selection.
446-449

- Victoria M. Florescu, Lise Crevier-Buchman, Bruce Denby, Thomas Hueber, Antonia Colazo-Simon, Claire Pillot-Loiseau, Pierre Roussel-Ragot, Cédric Gendrot, Sophie Quattrocchi:
Silent vs vocalized articulation for a portable ultrasound-based silent speech interface.
450-453

- Gregor Hofer, Korin Richmond:
Comparison of HMM and TMDN methods for lip synchronisation.
454-457

Paralanguage
- Florian Schiel, Christian Heinrich, Veronika Neumeyer:
Rhythm and formant features for automatic alcohol detection.
458-461

- Irena Yanushevskaya, Christer Gobl, John Kane, Ailbhe Ní Chasaide:
An exploration of voice source correlates of focus.
462-465

- James D. Harnsberger, Rahul Shrivastav, W. S. Brown Jr.:
Modeling perceived vocal age in American English.
466-469

- Marie-José Caraty, Claude Montacié:
Multivariate analysis of vocal fatigue in continuous reading.
470-473

- Alexander Kain, Jan P. H. van Santen:
Frequency-domain delexicalization using surrogate vowels.
474-477

- Florian Metze, Anton Batliner, Florian Eyben, Tim Polzehl, Björn Schuller, Stefan Steidl:
Emotion recognition using imperfect speech recognition.
478-481

- Gang Liu, Yun Lei, John H. L. Hansen:
A novel feature extraction strategy for multi-stream robust emotion identification.
482-485

- Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger:
Setup for acoustic-visual speech synthesis by concatenating bimodal units.
486-489

- Bart Jochems, Martha Larson, Roeland Ordelman, Ronald Poppe, Khiet P. Truong:
Towards affective state modeling in narrative and conversational settings.
490-493

- Narichika Nomoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi:
Detection of anger emotion in dialog speech using prosody feature and temporal relation of utterances.
494-497

- Benjamin Roustan, Marion Dohen:
Gesture and speech coordination: the influence of the relationship between manual gesture and speech.
498-501

- Hynek Boril, Seyed Omid Sadjadi, Tristan Kleinschmidt, John H. L. Hansen:
Analysis and detection of cognitive load and frustration in drivers' speech.
502-505

- Akira Sasou, Yasuharu Hashimoto, Katsuhiko Sakaue:
Acoustic-based recognition of head gestures accompanying speech.
506-509

- Sandro Castronovo, Angela Mahr, Margarita Pentcheva, Christian A. Müller:
Multimodal dialog in the car: combining speech and turn-and-push dial to control comfort functions.
510-513

- Danil Korchagin, Philip N. Garner, Petr Motlícek:
Hands free audio analysis from home entertainment.
514-517

- Shaikh Mostafa Al Masum, Antonio Rui Ferreira Rebordão, Keikichi Hirose:
Affective story teller: a TTS system for emotional expressivity.
518-521

ASR:
Speaker Adaptation, Robustness Against Reverberation
- Shweta Ghai, Rohit Sinha:
Enhancing children's speech recognition under mismatched condition by explicit acoustic normalization.
522-525

- Bo Li, Khe Chai Sim:
Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems.
526-529

- Ravichander Vipperla, Steve Renals, Joe Frankel:
Augmentation of adaptation data.
530-533

- Lukás Machlica, Zbynek Zajíc, Ludek Müller:
Discriminative adaptation based on fast combination of DMAP and dfMLLR.
534-537

- Doddipatla Rama Sanand, Ralf Schlüter, Hermann Ney:
Revisiting VTLN using linear transformation on conventional MFCC.
538-541

- Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
Speaker adaptation based on nonlinear spectral transform for speech recognition.
542-545

- Tetsuo Kosaka, Takashi Ito, Masaharu Katoh, Masaki Kohda:
Speaker adaptation based on system combination using speaker-class models.
546-549

- Yongwon Jeong, Young Rok Song, Hyung Soon Kim:
Speaker adaptation in transformation space using two-dimensional PCA.
550-553

- Jan Trmal, Jan Zelinka, Ludek Müller:
On speaker adaptive training of artificial neural networks.
554-557

- Yongjun He, Jiqing Han:
Model synthesis for band-limited speech recognition.
558-561

- Takahiro Fukumori, Masanori Morise, Takanobu Nishiura:
Performance estimation of reverberant speech recognition based on reverberant criteria RSR-dn with acoustic parameters.
562-565

- Armin Sehr, Christian Hofmann, Roland Maas, Walter Kellermann:
A novel approach for matched reverberant training of HMMs using data pairs.
566-569

- Hari Krishna Maganti, Marco Matassoni:
An auditory based modulation spectral feature for reverberant speech recognition.
570-573

- Martin Wolf, Climent Nadeu:
On the potential of channel selection for recognition of reverberated speech with multiple microphones.
574-577

- Randy Gomez, Tatsuya Kawahara:
An improved wavelet-based dereverberation for robust automatic speech recognition.
578-581

- Rico Petrick, Thomas Fehér, Masashi Unoki, Rüdiger Hoffmann:
Methods for robust speech recognition in reverberant environments: a comparison.
582-585

Language Learning, TTS, and Other Applications
- Masayuki Suzuki, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose:
Integration of multilayer regression analysis with structure-based pronunciation assessment.
586-589

- Joost van Doremalen, Catia Cucchiarini, Helmer Strik:
Using non-native error patterns to improve pronunciation verification.
590-593

- Dean Luo, Yu Qiao, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose:
Regularized-MLLR speaker adaptation for computer-assisted language learning system.
594-597

- Kuniaki Hirabayashi, Seiichi Nakagawa:
Automatic evaluation of English pronunciation by Japanese speakers using various acoustic features and pattern recognition techniques.
598-601

- Hsien-Cheng Liao, Jiang-Chun Chen, Sen-Chia Chang, Ying-Hua Guan, Chin-Hui Lee:
Decision tree based tone modeling with corrective feedbacks for automatic Mandarin tone assessment.
602-605

- Jingli Lu, Ruili Wang, Liyanage C. De Silva, Yang Gao, Jia Liu:
CASTLE: a computer-assisted stress teaching and learning environment for learners of English as a second language.
606-609

- Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu:
Automatic reference independent evaluation of prosody quality using multiple knowledge fusions.
610-613

- Su-Youn Yoon, Mark Hasegawa-Johnson, Richard Sproat:
Landmark-based automated pronunciation error detection.
614-617

- Zhiwei Shuang, Shiyin Kang, Yong Qin, Li-Rong Dai, Lianhong Cai:
HMM based TTS for mixed language text.
618-621

- Hui Liang, John Dines:
An analysis of language mismatch in HMM state mapping-based cross-lingual speaker adaptation.
622-625

- Tatsuya Kawahara, Norihiro Katsumaru, Yuya Akita, Shinsuke Mori:
Classroom note-taking system for hearing impaired students using automatic speech recognition adapted to lectures.
626-629

- Paul R. Dixon, Sadaoki Furui:
Exploring web-browser based runtimes engines for creating ubiquitous speech interfaces.
630-632

Pitch and Glottal-Waveform Estimation and Modeling I, II
- Xuejing Sun, Sameer Gadre:
Efficient three-stage pitch estimation for packet loss concealment.
633-636

- Keiichi Funaki:
On evaluation of the F0 estimation based on time-varying complex speech analysis.
637-640

- Feng Huang, Tan Lee:
Pitch estimation in noisy speech based on temporal accumulation of spectrum peaks.
641-644

- Tianyu T. Wang, Thomas F. Quatieri:
Multi-pitch estimation by a joint 2-D representation of pitch and pitch dynamics.
645-648

- Pirros Tsiakoulis, Alexandros Potamianos:
On the effect of fundamental frequency on amplitude and frequency modulation patterns in speech resonances.
649-652

- M. Shahidur Rahman, Tetsuya Shimamura:
Pitch determination using autocorrelation function in spectral domain.
653-656

- Thomas Drugman, Thierry Dutoit:
Chirp complex cepstrum-based decomposition for asynchronous glottal analysis.
657-660

- Alan Ó Cinnéide, David Dorran, Mikel Gainza, Eugene Coyle:
Exploiting glottal formant parameters for glottal inverse filtering and parameterization.
661-664

- Nicolas Sturmel, Christophe d'Alessandro, Boris Doval:
Glottal parameters estimation on speech using the zeros of the z-transform.
665-668

- Sri Harish Reddy Mallidi, Kishore Prahallad, Suryakanth V. Gangashetty, B. Yegnanarayana:
Significance of pitch synchronous analysis for speaker recognition using AANN models.
669-672

- Gang Chen, Xue Feng, Yen-Liang Shue, Abeer Alwan:
On using voice source measures in automatic gender classification of children's speech.
673-676

- Wei Chu, Abeer Alwan:
SAFE: a statistical algorithm for F0 estimation for both clean and noisy speech.
2590-2593

- Jung Ook Hong, Patrick J. Wolfe:
Robust and efficient pitch estimation using an iterative ARMA technique.
2594-2597

- Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Hidehisa Nagano, Kunio Kashino:
Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases.
2598-2601

- Martin Heckmann, Claudius Gläser, Frank Joublin, Kazuhiro Nakadai:
Applying geometric source separation for improved pitch extraction in human-robot interaction.
2602-2605

- John Kane, Mark Kane, Christer Gobl:
A spectral LF model based approach to voice source parameterisation.
2606-2609

- Thomas Drugman, Thierry Dutoit:
Glottal-based analysis of the Lombard effect.
2610-2613

Open Vocabulary Spoken Document Retrieval (Special Session)
- Yoshiaki Itoh, Hiromitsu Nishizaki, Xinhui Hu, Hiroaki Nanjo, Tomoyosi Akiba, Tatsuya Kawahara, Seiichi Nakagawa, Tomoko Matsui, Yoichi Yamashita, Kiyoaki Aikawa:
Constructing Japanese test collections for spoken term detection.
677-680

- Satoshi Natori, Hiromitsu Nishizaki, Yoshihiro Sekiguchi:
Japanese spoken term detection using syllable transition network derived from multiple speech recognizers' outputs.
681-684

- Sha Meng, Weiqiang Zhang, Jia Liu:
Combining Chinese spoken term detection systems via side-information conditioned linear logistic regression.
685-688

- Taisuke Kaneko, Tomoyosi Akiba:
Metric subspace indexing for fast spoken term detection.
689-692

- Chun-an Chan, Lin-Shan Lee:
Unsupervised spoken-term detection with spoken queries using segment-based dynamic time warping.
693-696

- Daniel Schneider, Timo Mertens, Martha Larson, Joachim Köhler:
Contextual verification for open vocabulary spoken term detection.
697-700

- Javier Tejedor, Doroteo Torre Toledano, Miguel Bautista, Simon King, Dong Wang, José Colás:
Augmented set of features for confidence estimation in spoken term detection.
701-704

- Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Cluster-based language model for spoken document retrieval using NMF-based document clustering.
705-708

Robust ASR
- Rogier C. van Dalen, Mark J. F. Gales:
Asymptotically exact noise-corrupted speech likelihoods.
709-712

- Ramón Fernandez Astudillo, Reinhold Orglmeister:
A MMSE estimator in mel-cepstral domain for robust large vocabulary automatic speech recognition using uncertainty propagation.
713-716

- Bhiksha Raj, Tuomas Virtanen, Sourish Chaudhuri, Rita Singh:
Non-negative matrix factorization based compensation of music for automatic speech recognition.
717-720

- Kris Demuynck, Xueru Zhang, Dirk Van Compernolle, Hugo Van Hamme:
Feature versus model based noise robustness.
721-724

- Ji Hun Park, Seon Man Kim, Jae Sam Yoon, Hong Kook Kim, Sung Joo Lee, Yunkeun Lee:
SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment.
725-728

- Chanwoo Kim, Richard M. Stern, Kiwan Eom, Jaewon Lee:
Automatic selection of thresholds for signal separation algorithms based on interaural delay.
729-732

Language and Dialect Identification
- Florian Verdet, Driss Matrouf, Jean-François Bonastre, Jean Hennebert:
Channel detectors for system fusion in the context of NIST LRE 2009.
733-736

- Rong Tong, Bin Ma, Haizhou Li, Engsiong Chng:
Selecting phonotactic features for language recognition.
737-740

- Abualsoud Hanani, Michael J. Carey, Martin J. Russell:
Improved language recognition using mixture components statistics.
741-744

- Mikel Peñagarikano, Amparo Varona, Luis Javier Rodríguez-Fuentes, Germán Bordel:
Using cross-decoder co-occurrences of phone n-grams in SVM-based phonotactic language recognition.
745-748

- Oscar Koller, Alberto Abad, Isabel Trancoso, Céu Viana:
Exploiting variety-dependent phones in Portuguese variety identification applied to broadcast news transcription.
749-752

- Fadi Biadsy, Julia Hirschberg, Michael Collins:
Dialect recognition using a phone-GMM-supervector-based SVM kernel.
753-756

Technologies for Learning and Education
- Xiaojun Qian, Frank K. Soong, Helen M. Meng:
Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT).
757-760

- Liang-Yu Chen, Jyh-Shing Roger Jang:
Automatic pronunciation scoring using learning to rank and DP-based score segmentation.
761-764

- Wai Kit Lo, Shuang Zhang, Helen M. Meng:
Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system.
765-768

- Minh Duong, Jack Mostow:
Adapting a duration synthesis model to rate children's oral reading prosody.
769-772

- Su-Youn Yoon, Lei Chen, Klaus Zechner:
Predicting word accuracy for the automatic speech recognition of non-native speech.
773-776

- Taotao Zhu, Dengfeng Ke, Zhenbiao Chen, Bo Xu:
A new approach for automatic tone error detection in strong accented Mandarin based on dominant set.
777-780

Emotional Speech
- S. R. Mahadeva Prasanna, D. Govind:
Analysis of excitation source information in emotional speech.
781-784

- Dongrui Wu, Thomas D. Parsons, Shrikanth S. Narayanan:
Acoustic feature analysis in speech emotion primitives estimation.
785-788

- Lan-Ying Yeh, Tai-Shih Chi:
Spectro-temporal modulations for robust speech emotion recognition.
789-792

- Chi-Chun Lee, Matthew Black, Athanasios Katsamanis, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Quantification of prosodic entrainment in affective spontaneous spoken interactions of married couples.
793-796

- Emily Mower, Kyu Jeong Han, Sungbok Lee, Shrikanth S. Narayanan:
A cluster-profile representation of emotion using agglomerative hierarchical clustering.
797-800

- Björn Schuller, Laurence Devillers:
Incremental acoustic valence recognition: an inter-corpus perspective on features, matching, and performance in a gating paradigm.
801-804

New Paradigms in ASR I, II
- Xiaodong Wang, Kunihiko Owa, Makoto Shozakai:
Mandarin digit recognition assisted by selective tone distinction.
857-860

- Kazuhiko Abe, Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Brazilian Portuguese acoustic model training based on data borrowing from other language.
861-864

- Ngoc Thang Vu, Tim Schlippe, Franziska Kraus, Tanja Schultz:
Rapid bootstrapping of five Eastern European languages using the Rapid Language Adaptation Toolkit.
865-868

- Houwei Cao, Tan Lee, P. C. Ching:
Cross-lingual speaker adaptation via Gaussian component mapping.
869-872

- Mohamed Elmahdy, Rainer Gruhn, Wolfgang Minker, Slim Abdennadher:
Cross-lingual acoustic modeling for dialectal Arabic speech recognition.
873-876

- Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Cross-lingual and multi-stream posterior features for low resource LVCSR systems.
877-880

- Shiva Sundaram, Jerome R. Bellegarda:
Latent perceptual mapping: a new acoustic modeling framework for speech recognition.
881-884

- Richard Dufour, Fethi Bougares, Yannick Estève, Paul Deléglise:
Unsupervised model adaptation on targeted speech segments for LVCSR system combination.
885-888

- Irene Ayllón Clemente, Martin Heckmann, Alexander Denecke, Britta Wrede, Christian Goerick:
Incremental word learning using large-margin discriminative training and variance floor estimation.
889-892

- Tuomas Virtanen, Jort F. Gemmeke, Antti Hurmalainen:
State-based labelling for a sparse representation of speech and its application to robust speech recognition.
893-896

- Mirko Hannemann, Stefan Kombrink, Martin Karafiát, Lukas Burget:
Similarity scoring for recognizing repeated out-of-vocabulary words.
897-900

- Dino Seppi, Dirk Van Compernolle:
Data pruning for template-based automatic speech recognition.
901-904

- Man-Hung Siu, Herbert Gish, Arthur Chan, William Belfield:
Improved topic classification and keyword discovery using an HMM-based speech recognizer trained without supervision.
2838-2841

- Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo:
An analysis of sparseness and regularization in exemplar-based methods for speech classification.
2842-2845

- Abdel-rahman Mohamed, Dong Yu, Li Deng:
Investigation of full-sequence training of deep belief networks for speech recognition.
2846-2849

- Yow-Bang Wang, Lin-Shan Lee:
Mandarin tone recognition using affine-invariant prosodic features and tone posteriorgram.
2850-2853

- Geoffrey Zweig, Patrick Nguyen, Jasha Droppo, Alex Acero:
Continuous speech recognition with a TF-IDF acoustic model.
2854-2857

- Geoffrey Zweig, Patrick Nguyen:
SCARF: a segmental conditional random field toolkit for speech recognition.
2858-2861

Speech Production:
Various Approaches
- Akiko Amano-Kusumoto, John-Paul Hosom, Alexander Kain:
Speaking style dependency of formant targets.
905-908

- Tatsuya Kitamura:
Similarity of effects of emotions on the speech organ configuration with and without speaking.
909-912

- Daniel Bone, Samuel Kim, Sungbok Lee, Shrikanth S. Narayanan:
A study of intra-speaker and inter-speaker affective variability using electroglottograph and inverse filtered glottal waveforms.
913-916

- Ken-Ichi Sakakibara, Hiroshi Imagawa, Miwako Kimura, Hisayuki Yokonishi, Niro Tayama:
Modal analysis of vocal fold vibrations using laryngotopography.
917-920

- Martti Vainio, Matti Airas, Juhani Järvikivi, Paavo Alku:
Laryngeal voice quality in the expression of focus.
921-924

- Masako Fujimoto, Kikuo Maekawa, Seiya Funatsu:
Laryngeal characteristics during the production of geminate consonants.
925-928

- Julien Cisonni, Kazunori Nozaki, Annemie Van Hirtum, Shigeo Wada:
Numerical study of turbulent flow-induced sound production in presence of a tooth-shaped obstacle: towards sibilant [s] physical modeling.
929-932

- Iris Hanique, Barbara Schuppler, Mirjam Ernestus:
Morphological and predictability effects on schwa reduction: the case of Dutch word-initial syllables.
933-936

- Samer Al Moubayed, G. Ananthakrishnan:
Acoustic-to-articulatory inversion based on local regression.
937-940

- Mirjam Broersma:
Korean lenis, fortis, and aspirated stops: effect of place of articulation on acoustic realization.
941-944

- Toru Nakashika, Ryuki Tachibana, Masafumi Nishimura, Tetsuya Takiguchi, Yasuo Ariki:
Speech synthesis by modeling harmonics structure with multiple function.
945-948

- Makoto Otani, Tatsuya Hirahara:
Physics of body-conducted silent speech - production, propagation and representation of non-audible murmur.
949-952

Speech Enhancement
- Subhojit Chakladar, Nam Soo Kim, Yu Gwang Jin, Tae Gyoon Kang:
Multichannel noise reduction using low order RTF estimate.
953-956

- Inho Lee, Jongsung Yoon, Yoonjae Lee, Hanseok Ko:
Reinforced blocking matrix with cross channel projection for speech enhancement.
957-960

- Ning Cheng, Wenju Liu, Lan Wang:
Masking property based microphone array post-filter design.
961-964

- Yusuke Sato, Tetsuya Hoya, Hovagim Bakardjian, Andrzej Cichocki:
Reduction of broadband noise in speech signals by multilinear subspace analysis.
965-968

- Jungpyo Hong, Seung Ho Han, Sangbae Jeong, Minsoo Hahn:
Novel probabilistic control of noise reduction for improved microphone array beamforming.
969-972

- Kai Li, Qiang Fu, Yonghong Yan:
Speech enhancement using improved generalized sidelobe canceller in frequency domain with multi-channel postfiltering.
973-976

- Jani Even, Carlos Toshinori Ishi, Hiroshi Saruwatari, Norihiro Hagita:
Close speaker cancellation for suppression of non-stationary background noise for hands-free speech interface.
977-980

- Ajay Srinivasamurthy, Thippur V. Sreenivas:
Multi-channel iterative dereverberation based on codebook constrained iterative multi-channel Wiener filter.
981-984

- Anand Joseph Xavier Medabalimi, Sri Harish Reddy Mallidi, B. Yegnanarayana:
Speaker-dependent mapping of source and system features for enhancement of throat microphone speech.
985-988

- Jun Cai, Stefano Marini, Pierre Malarme, Francis Grenez, Jean Schoentgen:
An analytic modeling approach to enhancing throat microphone speech commands for keyword spotting.
989-992

- Stephen So, Kamil K. Wójcicki, Kuldip K. Paliwal:
Single-channel speech enhancement using Kalman filtering in the modulation domain.
993-996

- Miao Yao, Weiqian Liang:
Integrated feedback and noise reduction algorithm in digital hearing aids via oscillation detection.
997-1000

- Charles Mercier, Roch Lefebvre:
A blind signal-to-noise ratio estimator for high noise speech recordings.
1001-1004

Special Session:
Fact and Replica of Speech Production
- Hiroshi Imagawa, Ken-Ichi Sakakibara, Isao T. Tokuda, Mamiko Otsuka, Niro Tayama:
Estimation of glottal area function using stereo-endoscopic high-speed digital imaging.
1005-1008

- Kazunori Nozaki, Youhei Ohnishi, Takashi Suda, Shigeo Wada, Shinji Shimojo:
Toward aero-acoustical analysis of the sibilant /s/: an oral cavity modeling.
1009-1012

- Kunitoshi Motoki:
Effects of wall impedance on transmission and attenuation of higher-order modes in vocal-tract model.
1013-1016

- Peter Birkholz, Bernd J. Kröger, Christiane Neuschaefer-Rube:
Articulatory synthesis and perception of plosive-vowel syllables with virtual consonant targets.
1017-1020

- Kotaro Fukui, Toshihiro Kusano, Yoshikazu Mukaeda, Yuto Suzuki, Atsuo Takanishi, Masaaki Honda:
Speech robot mimicking human articulatory motion.
1021-1024

- Takayuki Arai:
Mechanical vocal-tract models for speech dynamics.
1025-1028

- Michael C. Brady:
Prosodic timing analysis for articulatory re-synthesis using a bank of resonators with an adaptive oscillator.
1029-1032

ASR:
Language Modeling
- Ahmad Emami, Stanley F. Chen, Abraham Ittycheriah, Hagen Soltau, Bing Zhao:
Decoding with shrinkage-based language models.
1033-1036

- Stanley F. Chen, Stephen M. Chu:
Enhanced word classing for model M.
1037-1040

- Junho Park, Xunying Liu, Mark J. F. Gales, Philip C. Woodland:
Improved neural network based language modelling and adaptation.
1041-1044

- Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernocký, Sanjeev Khudanpur:
Recurrent neural network based language model.
1045-1048

- Preethi Jyothi, Eric Fosler-Lussier:
Discriminative language modeling using simulated ASR errors.
1049-1052

- Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara:
Learning a language model from continuous speech.
1053-1056

Single-Channel Speech Enhancement
Speech Synthesis:
Miscellaneous Topics
- Kalu U. Ogbureke, Peter Cahill, Julie Carson-Berndsen:
Hidden Markov models with context-sensitive observations for grapheme-to-phoneme conversion.
1105-1108

- Brian Langner, Stephan Vogel, Alan W. Black:
Evaluating a dialog language generation system: comparing the MOUNTAIN system to other NLG approaches.
1109-1112

- Wesley Mattheyses, Lukas Latacz, Werner Verhelst:
Active appearance models for photorealistic visual speech synthesis.
1113-1116

- Jerome R. Bellegarda:
Latent affective mapping: a novel framework for the data-driven analysis of emotion in text.
1117-1120

- Anna C. Janska, Robert A. J. Clark:
Native and non-native speaker judgements on the quality of synthesized speech.
1121-1124

- Dominic Espinosa, Michael White, Eric Fosler-Lussier, Chris Brew:
Machine learning for text selection with expressive unit-selection voices.
1125-1128

Prosody:
Basics & Applications
- Alexei V. Ivanov, Giuseppe Riccardi, Sucheta Ghosh, Sara Tonelli, Evgeny A. Stepanov:
Acoustic correlates of meaning structure in conversational speech.
1129-1132

- Nicolas Obin, Xavier Rodet, Anne Lacheret:
HMM-based prosodic structure model using rich linguistic context.
1133-1136

- Charlotte Wollermann, Bernhard Schröder, Ulrich Schade:
Audiovisual congruence and pragmatic focus marking.
1137-1140

- Margaret Zellers, Michele Gubian, Brechtje Post:
Redescribing intonational categories with functional data analysis.
1141-1144

- Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu:
Exploring goodness of prosody by diverse matching templates.
1145-1148

- Mickael Rouvier, Richard Dufour, Georges Linarès, Yannick Estève:
A language-identification inspired method for spontaneous speech detection.
1149-1152

- Gérard Bailly, Amélie Lelong:
Speech dominoes and phonetic convergence.
1153-1156

- Mátyás Brendel, Riccardo Zaccarelli, Laurence Devillers:
A quick sequential forward floating feature selection algorithm for emotion detection from speech.
1157-1160

- Géza Kiss, Jan P. H. van Santen:
Automated vocal emotion recognition using phoneme class specific features.
1161-1164

- Adrian Pass, Jianguo Zhang, Darryl Stewart:
Feature selection for pose invariant lip biometrics.
1165-1168

- Hussein Hussein, Rüdiger Hoffmann:
Signal-based accent and phrase marking using the Fujisaki model.
1169-1172

- Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan:
A study of interplay between articulatory movement and prosodic characteristics in emotional speech production.
1173-1176

ASR:
Feature Extraction I, II
- Shang-wen Li, Liang-Che Sun, Lin-Shan Lee:
Improved phoneme recognition by integrating evidence from spectro-temporal and cepstral features.
1177-1180

- Suman V. Ravuri, Nelson Morgan:
Using spectro-temporal features to improve AFE feature extraction for ASR.
1181-1184

- Ibon Saratxaga, Inma Hernáez, Igor Odriozola, Eva Navas, Iker Luengo, Daniel Erro:
Using harmonic phase information to improve ASR rate.
1185-1188

- Kazumasa Yamamoto, Eiichi Sueyoshi, Seiichi Nakagawa:
Speech recognition using long-term phase information.
1189-1192

- Jan Zelinka, Jan Trmal, Ludek Müller:
Low-dimensional space transforms of posteriors in speech recognition.
1193-1196

- Christian Plahl, Ralf Schlüter, Hermann Ney:
Hierarchical bottle neck features for LVCSR.
1197-1200

- Frantisek Grézl, Martin Karafiát:
Hierarchical neural net architectures for feature extraction in ASR.
1201-1204

- Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan:
Mutual information analysis for feature and sensor subset selection in surface electromyography based speech recognition.
1205-1208

- Bernd T. Meyer, Birger Kollmeier:
Learning from human errors: prediction of phoneme confusions based on modified ASR training.
1209-1212

- Bo Li, Khe Chai Sim:
Hidden logistic linear regression for support vector machine based phone verification.
2614-2617

- Tim Ng, Bing Zhang, Long Nguyen:
Jointly optimized discriminative features for speech recognition.
2618-2621

- Florian Müller, Alfred Mertins:
Invariant integration features combined with speaker-adaptation methods.
2622-2625

- Mark Raugas, Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan:
Multi resolution discriminative models for subvocalic speech recognition.
2626-2629

- Fabio Valente, Mathew Magimai-Doss, Christian Plahl, Suman V. Ravuri, Wen Wang:
A comparative large scale study of MLP features for Mandarin ASR.
2630-2633

- Cong-Thanh Do, Dominique Pastor, Gaël Le Lan, André Goalic:
Recognizing cochlear implant-like spectrally reduced speech with HMM-based ASR: experiments with MFCCs and PLP coefficients.
2634-2637

Speech Perception:
Cross Language and Age
- Kazuhiro Kondo, Takayuki Kanda, Yosuke Kobayashi, Hiroyuki Yagyu:
Speech intelligibility of diagonally localized speech with competing noise using bone-conduction headphones.
1213-1216

- Pierre L. Divenyi:
Masking of vowel-analog transitions by vowel-analog distracters.
1217-1220

- François Pellegrino, Emmanuel Ferragne, Fanny Meunier:
2010, a speech oddity: phonetic transcription of reversed speech.
1221-1224

- Hsin-Yi Lin, Janice Fon:
Perception on pitch reset at discourse boundaries.
1225-1228

- Marjorie Dole, Michel Hoen, Fanny Meunier:
Effect of spatial separation on speech-in-noise comprehension in dyslexic adults.
1229-1232

- Ellen Marklund, Francisco Lacerda, Anna Ericsson:
Speech categorization context effects in seven- to nine-month-old infants.
1233-1236

- Diane Kewley-Port, Larry E. Humes, Daniel Fogerty:
Changes in temporal processing of speech across the adult lifespan.
1237-1240

- Jared Bernstein, Jian Cheng, Masanori Suzuki:
Fluency and structural complexity as predictors of L2 oral proficiency.
1241-1244

- Marco van de Ven, Benjamin V. Tucker, Mirjam Ernestus:
Semantic facilitation in bilingual everyday speech comprehension.
1245-1248

- Bo-ren Hsieh, Ho-hsien Pan:
L2 experience and non-native vowel categorization of L1-Mandarin speakers.
1249-1252

- Mirjam Wester:
Cross-lingual talker discrimination.
1253-1256

- Takashi Otake:
Dajare is not the lowest form of wit.
1257-1260

SLP Systems
- Rafael Torres, Shota Takeuchi, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:
Comparison of methods for topic classification in a speech-oriented guidance system.
1261-1264

- Pere Comas, Jordi Turmo, Lluís Màrquez:
Using dependency parsing and machine learning for factoid question answering on spoken documents.
1265-1268

- Carolina Parada, Abhinav Sethy, Mark Dredze, Frederick Jelinek:
A spoken term detection framework for recovering out-of-vocabulary words using the web.
1269-1272

- Hung-yi Lee, Chia-Ping Chen, Ching-feng Yeh, Lin-Shan Lee:
Improved spoken term detection by discriminative training of acoustic models based on user relevance feedback.
1273-1276

- Sebastian Tschöpel, Daniel Schneider:
A lightweight keyword and tag-cloud retrieval algorithm for automatic speech recognition transcripts.
1277-1280

- Noboru Kanedera, Tetsuo Funada, Seiichi Nakagawa:
Lecture subtopic retrieval by retrieval keyword expansion using subordinate concept.
1281-1284

- Hiroaki Nanjo, Yusuke Iyonaga, Takehiko Yoshimi:
Spoken document retrieval for oral presentations integrating global document similarities into local document similarities.
1285-1288

- Joseph Polifroni, Stephanie Seneff:
Combining word-based features, statistical language models, and parsing for named entity recognition.
1289-1292

- Azeddine Zidouni, Sophie Rosset, Hervé Glotin:
Efficient combined approach for named entity recognition in spoken language.
1293-1296

- Sree Harsha Yella, Vasudeva Varma, Kishore Prahallad:
Prominence based scoring of speech segments for automatic speech-to-speech summarization.
1297-1300

- Zihan Liu, Lei Xie, Wei Feng:
Maximum lexical cohesion for fine-grained news story segmentation.
1301-1304

- Xiaoxuan Wang, Lei Xie, Bin Ma, Engsiong Chng, Haizhou Li:
Phoneme lattice based TextTiling towards multilingual story segmentation.
1305-1308

Quality of Experiencing Speech Services (Special Session)
- Anton Schlesinger, Marinus M. Boone:
The characterization of the relative information content by spectral features for the objective intelligibility assessment of nonlinearly processed speech.
1309-1312

- Marcel Wältermann, Alexander Raake, Sebastian Möller:
Analytical assessment and distance modeling of speech transmission quality.
1313-1316

- Nicolas Côté, Vincent Koehl, Valérie Gautier-Turbin, Alexander Raake, Sebastian Möller:
An intrusive super-wideband speech quality model: DIAL.
1317-1320

- Sebastian Egger, Raimund Schatz, Stefan Scherer:
It takes two to tango - assessing the impact of delay on conversational interactivity on perceived speech quality.
1321-1324

- Sebastian Möller, Florian Hinterleitner, Tiago H. Falk, Tim Polzehl:
Comparison of approaches for instrumentally predicting the quality of text-to-speech systems.
1325-1328

- Imre Kiss, Joseph Polifroni, Chao Wang, Ghinwa F. Choueiter, Mike Phillips:
A hybrid architecture for mobile voice user interfaces.
1329-1332

- Markku Turunen, Jaakko Hakulinen, Tomi Heimonen:
Assessment of spoken and multimodal applications: lessons learned from laboratory and field studies.
1333-1336

- Klaus-Peter Engelbrecht, Hamed Ketabdar, Sebastian Möller:
Improving cross database prediction of dialogue quality using mixture of experts.
1337-1340

Language Processing
Speech and Audio Segmentation
- Sarah Hoffmann, Beat Pfister:
Fully automatic segmentation for prosodic speech corpora.
1389-1392

- Vahid Khanagha, Khalid Daoudi, Oriol Pont, Hussein M. Yahia:
A novel text-independent phonetic segmentation algorithm based on the microcanonical multiscale formalism.
1393-1396

- You-Yu Lin, Yih-Ru Wang, Yuan-Fu Liao:
Phone boundary detection using sample-based acoustic parameters.
1397-1400

- Utpala Musti, Asterios Toutios, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger:
HMM-based automatic visual speech segmentation using facial data.
1401-1404

- David Wang, Robert Vogt, Sridha Sridharan:
Bayes factor based speaker segmentation for speaker diarization.
1405-1408

- Qiang Huang, Stephen J. Cox:
Using high-level information to detect key audio events in a tennis game.
1409-1412

Prosody:
Analysis
- Catherine Lai:
What do you mean, you're uncertain?: the interpretation of cue words and rising intonation in dialogue.
1413-1416

- Yi-Fen Liu, Shu-Chuan Tseng, Jyh-Shing Roger Jang, C.-H. Alvin Chen:
Coping imbalanced prosodic unit boundary detection with linguistically-motivated prosodic features.
1417-1420

- Zhigang Chen, Guoping Hu, Wei Jiang:
Improving prosodic phrase prediction by unsupervised adaptation and syntactic features extraction.
1421-1424

- Yujia Li, Tan Lee:
Perception-based automatic approximation of F0 contours in Cantonese speech.
1425-1428

- Raul Fernandez, Bhuvana Ramabhadran:
Discriminative training and unsupervised adaptation for labeling prosodic events with limited training data.
1429-1432

- Erin Cvejic, Jeesun Kim, Chris Davis, Guillaume Gibert:
Prosody for the eyes: quantifying visual prosody using guided principal component analysis.
1433-1436

Systems for LVCSR and Rich Transcription
- Naveen Parihar, Ralf Schlüter, David Rybach, Eric A. Hansen:
Parallel lexical-tree based LVCSR on multi-core processors.
1485-1488

- Jike Chong, Ekaterina Gonina, Kisun You, Kurt Keutzer:
Exploring recognition network representations for efficient speech inference on highly parallel platforms.
1489-1492

- Diamantino Caseiro:
WFST compression for automatic speech recognition.
1493-1496

- Ivan Bulyko:
Speech recognizer optimization under speed constraints.
1497-1500

- Florian Metze, Roger Hsiao, Qin Jin, Udhyakumar Nallasamy, Tanja Schultz:
The 2010 CMU GALE speech-to-text system.
1501-1504

- Tin Lay Nwe, Hanwu Sun, Bin Ma, Haizhou Li:
Speaker diarization in meeting audio for single distant microphone.
1505-1508

- Fernando Batista, Helena Moniz, Isabel Trancoso, Hugo Meinedo, Ana Isabel Mata, Nuno J. Mamede:
Extending the punctuation module for European Portuguese.
1509-1512

- Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Utilizing a noisy-channel approach for Korean LVCSR.
1513-1516

- Markus Nußbaum-Thom, Simon Wiesler, Martin Sundermeyer, Christian Plahl, Stefan Hahn, Ralf Schlüter, Hermann Ney:
The RWTH 2009 Quaero ASR evaluation system for English and German.
1517-1520

Phonetics
- Benjamin Munson, Renata Solum:
When is indexical information about speech activated? evidence from a cross-modal priming experiment.
1521-1524

- Benjamin Munson:
The influence of actual and perceived sexual orientation on diadochokinetic rate in women and men.
1525-1528

- Kristine M. Yu:
Laryngealization and features for Chinese tonal recognition.
1529-1532

- Viet Son Nguyen, Eric Castelli, René Carré:
Production and perception of Vietnamese short vowels in V1V2 context.
1533-1536

- Gertraud Fenk-Oczlon, August Fenk:
Measuring basic tempo across languages and some implications for speech rhythm.
1537-1540

- Yukari Hirata, Shigeaki Amano:
Durational structure of Japanese single/geminate stops in three- and four-mora words spoken at varied rates.
1541-1544

- Shin-ichiro Sano, Tomohiko Ooigawa:
Distribution and trichotomic realization of voiced velars in Japanese - an experimental study.
1545-1548

- Jagoda Sieczkowska, Bernd Möbius, Grzegorz Dogil:
Specification in context - devoicing processes in Polish, French, American English and German sonorants.
1549-1552

- Kuniko Nielsen:
Phonetic imitation of Japanese vowel devoicing.
1553-1556

- Mary Stevens, John Hajek:
Post-aspiration in standard Italian: some first cross-regional acoustic evidence.
1557-1560

- Mirko Grimaldi, Andrea Calabrese, Francesco Sigona, Luigina Garrapa, Bianca Sisinni:
Articulatory grounding of Southern Salentino harmony processes.
1561-1564

- Yuuki Tanida, Taiji Ueno, Satoru Saito, Matthew A. Lambon Ralph:
Effects of accent typicality and phonotactic frequency on nonword immediate serial recall performance in Japanese.
1565-1567

- Osamu Fujimura:
How abstract is phonetics?
1568-1571

Speech Production:
Vocal Tract Modeling and Imaging
- Adam C. Lammert, Michael I. Proctor, Shrikanth S. Narayanan:
Data-driven analysis of realtime vocal tract MRI using correlated image regions.
1572-1575

- Michael I. Proctor, Daniel Bone, Athanasios Katsamanis, Shrikanth S. Narayanan:
Rapid semi-automatic segmentation of real-time magnetic resonance images for parametric vocal tract analysis.
1576-1579

- Yoon-Chul Kim, Shrikanth S. Narayanan, Krishna S. Nayak:
Improved real-time MRI of oral-velar coordination using a golden-ratio spiral view order.
1580-1583

- Erik Bresch, Athanasios Katsamanis, Louis Goldstein, Shrikanth S. Narayanan:
Statistical multi-stream modeling of real-time MRI articulatory speech data.
1584-1587

- G. Ananthakrishnan, Pierre Badin, Julián Andrés Valdés Vargas, Olov Engwall:
Predicting unseen articulations from multi-speaker articulatory models.
1588-1591

- Chao Qin, Miguel Á. Carreira-Perpiñán:
Estimating missing data sequences in X-ray microbeam recordings.
1592-1595

- Chao Qin, Miguel Á. Carreira-Perpiñán, Mohsen Farhadloo:
Adaptation of a tongue shape model by local feature transformations.
1596-1599

- Sungbok Lee, Shrikanth S. Narayanan:
Vocal tract contour analysis of emotional speech by the functional data curve representation.
1600-1603

- Adam C. Lammert, Louis Goldstein, Khalil Iskarous:
Locally-weighted regression for estimating the forward kinematics of a geometric vocal tract model.
1604-1607

- Michael Reimer, Frank Rudzicz:
Identifying articulatory goals from kinematic data using principal differential analysis.
1608-1611

- Zuheng Ming, Denis Beautemps, Gang Feng, Sébastien Schmerber:
Estimation of speech lip features from discrete cosinus transform.
1612-1615

- Farzaneh Ahmadi, Ian Vince McLoughlin, Hamid R. Sharifzadeh:
Autoregressive modelling for linear prediction of ultrasonic speech.
1616-1619

Speech Intelligibility Enhancement for All Ages, Health Conditions and Environments (Special Session)
- Takayuki Arai, Nao Hodoshima:
Enhanced speech yielding higher intelligibility for all listeners and environments.
1620-1623

- Seyed Omid Sadjadi, Sanjay A. Patil, John H. L. Hansen:
Quality conversion of non-acoustic signals for facilitating human-to-human speech communication under harsh acoustic conditions.
1624-1627

- Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion.
1628-1631

- Gibak Kim, Philipos C. Loizou:
A new binary mask based on noise constraints for improved speech intelligibility.
1632-1635

- Yan Tang, Martin Cooke:
Energy reallocation strategies for speech enhancement in known noise conditions.
1636-1639

- Jing Chen, Thomas Baer, Brian C. J. Moore:
Effects of enhancement of spectral changes on speech quality and subjective speech intelligibility.
1640-1643

ASR:
Acoustic Model Adaptation
- Catherine Breslin, K. K. Chin, Mark J. F. Gales, Kate Knill, Haitian Xu:
Prior information for rapid speaker adaptation.
1644-1647

- Jonas Lööf, Ralf Schlüter, Hermann Ney:
Discriminative adaptation for log-linear acoustic models.
1648-1651

- Dimitra Vergyri, Lori Lamel, Jean-Luc Gauvain:
Automatic speech recognition of multiple accented English data.
1652-1655

- Jinyu Li, Yu Tsao, Chin-Hui Lee:
Shrinkage model adaptation in automatic speech recognition.
1656-1659

- Jinyu Li, Dong Yu, Yifan Gong, Li Deng:
Unscented transform with online distortion estimation for HMM adaptation.
1660-1663

- Michael L. Seltzer, Alex Acero:
HMM adaptation using linear spline interpolation with integrated spline parameter training for robust speech recognition.
1664-1667

SLP Systems for Information Extraction/Retrieval
- Dong Wang, Simon King, Nicholas W. D. Evans, Raphaël Troncy:
CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection.
1668-1671

- Chia-Ping Chen, Hung-yi Lee, Ching-feng Yeh, Lin-Shan Lee:
Improved spoken term detection by feature space pseudo-relevance feedback.
1672-1675

- Aren Jansen, Kenneth Church, Hynek Hermansky:
Towards spoken term discovery at scale with zero resources.
1676-1679

- Evandro B. Gouvêa, Tony Ezzat:
Vocabulary independent spoken query: a case for subword units.
1680-1683

- Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen:
Extractive speech summarization - from the view of decision theory.
1684-1687

- Gabriel Murray, Giuseppe Carenini, Raymond T. Ng:
The impact of ASR on abstractive vs. extractive meeting summaries.
1688-1691

Speech Representation
- Li Deng, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, Geoffrey E. Hinton:
Binary coding of speech spectrograms using a deep auto-encoder.
1692-1695

- Juhan Nam, Gautham J. Mysore, Joachim Ganseman, Kyogu Lee, Jonathan S. Abel:
A super-resolution spectrogram using coupled PLCA.
1696-1699

- Georgios Tzedakis, Yannis Pantazis, Olivier Rosec, Yannis Stylianou:
Fast least-squares solution for sinusoidal, harmonic and quasi-harmonic models.
1700-1703

- Afsaneh Asaei, Hervé Bourlard, Philip N. Garner:
Sparse component analysis for speech recognition in multi-speaker environment.
1704-1707

- Trond Skogstad, Torbjørn Svendsen:
Intra-frame variability as a predictor of frame classifiability.
1708-1711

- Tetsuya Shimamura, Ngoc Dinh Nguyen:
Autocorrelation and double autocorrelation based spectral representations for a noisy word recognition system.
1712-1715

Voice Conversion
- Elina Helander, Hanna Silén, Joaquín Míguez, Moncef Gabbouj:
Maximum a posteriori voice conversion using sequential Monte Carlo methods.
1716-1719

- Pierre Lanchantin, Xavier Rodet:
Dynamic model selection for spectral voice conversion.
1720-1723

- Takashi Nose, Takao Kobayashi:
Speaker-independent HMM-based voice conversion using quantized fundamental frequency.
1724-1727

- Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, Nobuaki Minematsu:
Probabilistic integration of joint density model and speaker model for voice conversion.
1728-1731

- Zhi-Zheng Wu, Tomi Kinnunen, Engsiong Chng, Haizhou Li:
Text-independent F0 transformation with non-parallel data for voice conversion.
1732-1735

- Xiaodan Zhuang, Lijuan Wang, Frank K. Soong, Mark Hasegawa-Johnson:
A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion.
1736-1739

Prosody:
Language-Specific Models
- Anastasia Karlsson, David House, Jan-Olof Svantesson, Damrong Tayanin:
Influence of lexical tones on intonation in Kammu.
1740-1743

- Satoshi Nambu, Yong-cheol Lee:
Phonetic realization of second occurrence focus in Japanese.
1744-1747

- Jianjing Kuang:
Prosodic grouping and relative clause disambiguation in Mandarin.
1748-1751

- Ya Li, Jianhua Tao, Meng Zhang, Shifeng Pan, Xiaoying Xu:
Text-based unstressed syllable prediction in Mandarin.
1752-1755

- Tomás Dubeda:
"flat pitch accents" in Czech.
1756-1759

- Tomás Dubeda:
Positional variability of pitch accents in Czech.
1760-1763

- Shyamal Das Mandal, Arup Saha, Tulika Basu, Keikichi Hirose, Hiroya Fujisaki:
Modeling of sentence-medial pauses in Bangla readout speech: occurrence and duration.
1764-1767

- Adrian Leemann, Lucy Zuberbühler:
Declarative sentence intonation patterns in 8 Swiss German dialects.
1768-1771

- Je Hun Jeon, Yang Liu:
Syllable-level prominence detection with acoustic evidence.
1772-1775

- Sankalan Prasad, Kalika Bali:
Prosody cues for classification of the discourse particle "hã" in Hindi.
1776-1779

- Yuan Jia, Aijun Li:
Interaction of syntax-marked focus and wh-question induced focus in standard Chinese.
1780-1783

- Samer Al Moubayed, Jonas Beskow:
Prominence detection in Swedish using syllable correlates.
1784-1787

- Na Zhi, Daniel Hirst, Pier Marco Bertinetto:
Automatic analysis of the intonation of a tone language. Applying the Momel algorithm to spontaneous standard Chinese (Beijing).
1788-1791

- Raymond W. M. Ng, Cheung-Chi Leung, Ville Hautamäki, Tan Lee, Bin Ma, Haizhou Li:
Towards long-range prosodic attribute modeling for language recognition.
1792-1795

- Robert Schubert, Oliver Jokisch, Diane Hirschfeld:
A modified parameterization of the Fujisaki model.
1796-1799

ASR:
Language Modeling and Speech Understanding I
- Saeedeh Momtazi, Friedrich Faubel, Dietrich Klakow:
Within and across sentence boundary language model.
1800-1803

- Ruhi Sarikaya, Stanley F. Chen, Abhinav Sethy, Bhuvana Ramabhadran:
Impact of word classing on shrinkage-based language models.
1804-1807

- Stanislas Oger, Vladimir Popescu, Georges Linarès:
Combination of probabilistic and possibilistic language models.
1808-1811

- Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk:
On-demand language model interpolation for mobile speech input.
1812-1815

- Tim Schlippe, Chenfei Zhu, Jan Gebhardt, Tanja Schultz:
Text normalization based on statistical machine translation and internet user support.
1816-1819

- Tanel Alumäe, Mikko Kurimo:
Efficient estimation of maximum entropy language models with n-gram features: an SRILM extension.
1820-1823

- Christian Gillot, Christophe Cerisara, David Langlois, Jean Paul Haton:
Similar n-gram language model.
1824-1827

- Markpong Jongtaveesataporn, Sadaoki Furui:
Topic and style-adapted language modeling for Thai broadcast news ASR.
1828-1831

- Ahmad Emami, Hong-Kwang Jeff Kuo, Imed Zitouni, Lidia Mangu:
Augmented context features for Arabic speech recognition.
1832-1835

- Lucía Ortega, Isabel Galiano, Lluís F. Hurtado, Emilio Sanchis, Encarna Segarra:
A statistical segment-based approach for spoken language understanding.
1836-1839

- Benjamin Lecouteux, Raphaël Rubino, Georges Linarès:
Improving back-off models with bag of words and hollow-grams.
2418-2421

- Ciprian Chelba, Thorsten Brants, Will Neveitt, Peng Xu:
Study on interaction between entropy pruning and Kneser-Ney smoothing.
2422-2425

- Hitoshi Yamamoto, Ken Hanazawa, Kiyokazu Miki, Koichi Shinoda:
Dynamic language model adaptation using keyword category classification.
2426-2429

- Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa:
Integration of cache-based model and topic dependent class model with soft clustering and soft voting.
2430-2433

- Frédéric Duvert, Renato de Mori:
Conditional models for detecting lambda-functions in a spoken language understanding system.
2434-2437

- Md. Akmal Haidar, Douglas D. O'Shaughnessy:
Novel weighting scheme for unsupervised language model adaptation using latent Dirichlet allocation.
2438-2441

- Qun Feng Tan, Kartik Audhkhasi, Panayiotis G. Georgiou, Emil Ettelaie, Shrikanth S. Narayanan:
Automatic speech recognition system channel modeling.
2442-2445

- Takanobu Oba, Takaaki Hori, Atsushi Nakamura:
Round-robin discrimination model for reranking ASR hypotheses.
2446-2449

- Hasim Sak, Murat Saraclar, Tunga Güngör:
On-the-fly lattice rescoring for real-time automatic speech recognition.
2450-2453

First and Second Language Acquisition
- Angela Cooper, Yue Wang:
Cantonese tone word learning by tone and non-tone language speakers.
1840-1843

- Anne Cutler, Janise Shanley:
Validation of a training method for L2 continuous-speech segmentation.
1844-1847

- Jiahong Yuan:
Linguistic rhythm in foreign accent.
1848-1849

- Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka:
The effect of a word embedded in a sentence and speaking rate variation on the perceptual training of geminate and singleton consonant distinction.
1850-1853

- Chiharu Tsurutani:
Foreign accent matters most when timing is wrong.
1854-1857

- Hyejin Hong, Jina Kim, Minhwa Chung:
Effects of Korean learners' consonant cluster reduction strategies on English speech recognition performance.
1858-1861

- June S. Levitt, William F. Katz:
The effects of EMA-based augmented visual feedback on the English speakers' acquisition of the Japanese flap: a perceptual study.
1862-1865

- Hinako Masuda, Takayuki Arai:
Perception of voiceless fricatives by Japanese listeners of advanced and intermediate level English proficiency.
1866-1869

- Lya Meister, Einar Meister:
Perception of Estonian vowel categories by native and non-native speakers.
1870-1873

- Qin Shi, Kun Li, Shilei Zhang, Stephen M. Chu, Ji Xiao, ZhiJian Ou:
Spoken English assessment system for non-native speakers using acoustic and prosodic features.
1874-1877

- Elena E. Lyakso, Olga V. Frolova, Anna V. Kurazhova, Julia S. Gaikova:
Russian infants and children's sounds and speech corpuses for language acquisition studies.
1878-1881

- Julia Monnin, Hélène Loevenbruck:
Language-specific influence on phoneme development: French and Drehu data.
1882-1885

- Jeffrey J. Holliday, Mary E. Beckman, Chanelle Mays:
Did you say susi or shushi? measuring the emergence of robust fricative contrasts in English- and Japanese-acquiring children.
1886-1889

Spoken Language Resources, Systems and Evaluation I, II
- Josef R. Novak, Paul R. Dixon, Sadaoki Furui:
An empirical comparison of the T3, Juicer, HDecode and Sphinx3 decoders.
1890-1893

- Philip N. Garner, John Dines:
Tracter: a lightweight dataflow framework.
1894-1897

- Marelie H. Davel, Febe de Wet:
Verifying pronunciation dictionaries using conflict analysis.
1898-1901

- Brandon Roy, Soroush Vosoughi, Deb Roy:
Automatic estimation of transcription accuracy and difficulty.
1902-1905

- Benjamin Lambert, Rita Singh, Bhiksha Raj:
Creating a linguistic plausibility dataset with non-expert annotators.
1906-1909

- Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:
Construction and evaluations of an annotated Chinese conversational corpus in travel domain for the language model of speech recognition.
1910-1913

- Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno, Mike LeBeau:
Building transcribed speech corpora quickly and cheaply for many languages.
1914-1917

- Heidi Christensen, Jon Barker, Ning Ma, Phil D. Green:
The CHiME corpus: a resource and a challenge for computational hearing in multisource environments.
1918-1921

- Wen Cao, Dongning Wang, Jinsong Zhang, Ziyu Xiong:
Developing a Chinese L2 speech database of Japanese learners with narrow-phonetic labels for computer assisted pronunciation training.
1922-1925

- Shogo Ishikawa, Shinya Kiriyama, Yoichi Takebayashi, Shigeyoshi Kitazawa:
How children acquire situation understanding skills?: a developmental analysis utilizing multimodal speech behavior corpus.
1926-1929

- Ina Wechsung, Stefan Schaffer, Robert Schleicher, Anja Naumann, Sebastian Möller:
The influence of expertise and efficiency on modality selection strategies and perceived mental effort.
1930-1933

- Christine Kühnel, Benjamin Weiss, Sebastian Möller:
Parameters describing multimodal interaction - definitions and three usage scenarios.
1934-1937

- Alexander Zgorzelski, Alexander Schmitt, Tobias Heinroth, Wolfgang Minker:
Repair strategies on trial: which error recovery do users like best?
1938-1941

- Maryam Kamvar, Doug Beeferman:
Say what? why users choose to speak their web queries.
1966-1969

- Jonathan Teutenberg, Catherine I. Watson:
The effect of audience familiarity on the perception of modified accent.
1970-1973

- Korin Richmond, Robert A. J. Clark, Susan Fitt:
On generating Combilex pronunciations via morphological analysis.
1974-1977

- Florian Gödde, Sebastian Möller:
Say it as you mean it - analyzing free user comments in the VOICE awards corpus.
1978-1981

- Viktor Rozgic, Bo Xiao, Athanasios Katsamanis, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
A new multichannel multi modal dyadic interaction database.
1982-1985

- Dau-Cheng Lyu, Tien Ping Tan, Engsiong Chng, Haizhou Li:
SEAME: a Mandarin-English code-switching speech corpus in South-East Asia.
1986-1989

Speech Production:
Analysis
- Daniel Felps, Christian Geng, Michael Berger, Korin Richmond, Ricardo Gutierrez-Osuna:
Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database.
1990-1993

- Vikram Ramanarayanan, Dani Byrd, Louis Goldstein, Shrikanth S. Narayanan:
Investigating articulatory setting - pauses, ready position, and rest - using real-time MRI.
1994-1997

- Chao Qin, Miguel Á. Carreira-Perpiñán:
Articulatory inversion of American English /ɹ/ by conditional density modes.
1998-2001

- Atef Ben Youssef, Pierre Badin, Gérard Bailly:
Can tongue be recovered from face? the answer of data-driven statistical models.
2002-2005

- Francisco Torreira, Mirjam Ernestus:
Phrase-medial vowel devoicing in spontaneous French.
2006-2009

- Chierh Cheng, Yi Xu, Michele Gubian:
Exploring the mechanism of tonal contraction in Taiwan Mandarin.
2010-2013

Paralanguage & Cognition
- Benjamin Weiss, Felix Burkhardt:
Voice attributes affecting likability perception.
2014-2017

- Kristiina Jokinen, Kazuaki Harada, Masafumi Nishida, Seiichi Yamamoto:
Turn-alignment using eye-gaze and speech in conversational interaction.
2018-2021

- Tet Fei Yap, Julien Epps, Eliathamby Ambikairajah, Eric H. C. Choi:
An investigation of formant frequencies for cognitive load classification.
2022-2025

- Martijn Goudbeek, Mirjam Broersma:
Language specific effects of emotion on phoneme duration.
2026-2029

- Matthew Black, Athanasios Katsamanis, Chi-Chun Lee, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Automatic classification of married couples' behavior using audio features.
2030-2033

- Gideon Kowadlo, Patrick Ye, Ingrid Zukerman:
Influence of gestural salience on the interpretation of spoken requests.
2034-2037

Robust ASR Against Noise
- Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson, Elliot Saltzman, Louis Goldstein:
Robust word recognition using articulatory trajectories and gestures.
2038-2041

- Takeshi Yamada, Tomohiro Nakajima, Nobuhiko Kitawaki, Shoji Makino:
Performance estimation of noisy speech recognition considering recognition task complexity.
2042-2045

- Friedrich Faubel, Dietrich Klakow:
Estimating noise from noisy speech features with a Monte Carlo variant of the expectation maximization algorithm.
2046-2049

- Satoshi Tamura, Eriko Hishikawa, Wataru Taguchi, Satoru Hayamizu:
Template-based spectral estimation using microphone array for speech recognition.
2050-2053

- Aleem Mushtaq, Yu Tsao, Chin-Hui Lee:
A particle filter feature compensation approach to robust speech recognition.
2054-2057

- Chanwoo Kim, Richard M. Stern:
Nonlinear enhancement of onset for robust speech recognition.
2058-2061

- Shirin Badiezadegan, Richard C. Rose:
Mask estimation in non-stationary noise environments for missing feature based robust speech recognition.
2062-2065

- Lae-Hoon Kim, Kyung-Tae Kim, Mark Hasegawa-Johnson:
Robust automatic speech recognition with decoder oriented ideal binary mask estimation.
2066-2069

- Gökhan Ince, Kazuhiro Nakadai, Tobias Rodemann, Hiroshi Tsujino, Jun-ichi Imura:
A robust speech recognition system against the ego noise of a robot.
2070-2073

- Kuo-Hao Wu, Chia-Ping Chen:
Empirical mode decomposition for noise-robust automatic speech recognition.
2074-2077

- Wooil Kim, Jun-Won Suh, John H. L. Hansen:
An effective feature compensation scheme tightly matched with speech recognizer employing SVM-based GMM generation.
2078-2081

- Jort F. Gemmeke, Tuomas Virtanen:
Artificial and online acquired noise dictionaries for noise robust ASR.
2082-2085

- Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:
Voice activity detection based on conditional random fields using multiple features.
2086-2089

- Yong Zhao, Biing-Hwang Juang:
A comparative study of noise estimation algorithms for VTS-based robust speech recognition.
2090-2093

- Frank Seide, Pei Zhao:
On using missing-feature theory with cepstral features - approximations to the multivariate integral.
2094-2097

- Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves:
Using a DBN to integrate sparse classification and GMM-based ASR.
2098-2101

Voice Conversion and Speech Synthesis
- Axel Röbel:
Shape-invariant speech transformation with the phase vocoder.
2146-2149

- Kayoko Yanagisawa, Mark Huckvale:
A phonetic alternative to cross-language voice conversion in a text-dependent context: evaluation of speaker identity.
2150-2153

- Esther Klabbers, Alexander Kain, Jan P. H. van Santen:
Evaluation of speaker mimic technology for personalizing SGD voices.
2154-2157

- Kumi Ohta, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:
Adaptive voice-quality control based on one-to-many eigenvoice conversion.
2158-2161

- Fernando Villavicencio, Jordi Bonada:
Applying voice conversion to concatenative singing-voice synthesis.
2162-2165

- Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, Nobuaki Minematsu:
Improved generation of fundamental frequency in HMM-based speech synthesis using generation process model.
2166-2169

- Ming Lei, Yi-Jian Wu, Frank K. Soong, Zhen-Hua Ling, Li-Rong Dai:
A hierarchical F0 modeling method for HMM-based speech synthesis.
2170-2173

- Javier Latorre, Mark J. F. Gales, Heiga Zen:
Training a parametric-based logF0 model with the minimum generation error criterion.
2174-2177

- Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu:
Improving Mandarin segmental duration prediction with automatically extracted syntax features.
2178-2181

- Daniel R. van Niekerk, Etienne Barnard:
An intonation model for TTS in Sepedi.
2182-2185

- Michael Pucher, Dietmar Schabus, Junichi Yamagishi:
Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners.
2186-2189

- Gabriel Webster, Sacha Krstulovic, Kate Knill:
A comparison of pronunciation modeling approaches for HMM-TTS.
2190-2193

- Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi:
HMM-based text-to-articulatory-movement prediction and analysis of critical articulators.
2194-2197

Detection, Classification, and Segmentation
- Jiaxing Ye, Takumi Kobayashi, Tetsuya Higuchi:
Audio-based sports highlight detection by Fourier local auto-correlations.
2198-2201

- Hynek Boril, Abhijeet Sangwan, Taufiq Hasan, John H. L. Hansen:
Automatic excitement-level detection for sports highlights generation.
2202-2205

- Jörg-Hendrik Bach, Jörn Anemüller:
Detecting novel objects in acoustic scenes through classifier incongruence.
2206-2209

- Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis:
A multidomain approach for automatic home environmental sound classification.
2210-2213

- Patrick Cardinal, Vishwa Gupta, Gilles Boulianne:
Content-based advertisement detection.
2214-2217

- Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis:
Identification of abnormal audio events based on probabilistic novelty detection.
2218-2221

- Norbert Braunschweiler, Mark J. F. Gales, Sabine Buchholz:
Lightly supervised recognition for automatic alignment of large coherent speech recordings.
2222-2225

- Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman:
Incremental diarization of telephone conversations.
2226-2229

- Srikanth Cherla, V. Ramasubramanian:
Audio analytics by template modeling and 1-pass DP based decoding.
2230-2233

- Mariusz Ziólko, Jakub Galka, Bartosz Ziólko, Tomasz Drwiega:
Perceptual wavelet decomposition for speech segmentation.
2234-2237

- Venkatesh Keri, Kishore Prahallad:
A comparative study of constrained and unconstrained approaches for segmentation of speech signal.
2238-2241

- Morgan Sonderegger, Joseph Keshet:
Automatic discriminative measurement of voice onset time.
2242-2245

- Yi Ren Leng, Tran Huy Dat, Norihide Kitaoka, Haizhou Li:
Selective gammatone filterbank feature for robust sound event recognition.
2246-2249

Compressive Sensing for Speech and Language Processing (Special Session)
- Allen Y. Yang, Zihan Zhou, Yi Ma, Shankar Sastry:
Towards a robust face recognition system using compressive sensing.
2250-2253

- Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Abhinav Sethy:
Sparse representation features for speech recognition.
2254-2257

- Abhinav Sethy, Tara N. Sainath, Bhuvana Ramabhadran, Dimitri Kanevsky:
Data selection for language modeling using sparse representations.
2258-2261

- Jort F. Gemmeke, Ulpu Remes, Kalle J. Palomäki:
Observation uncertainty measures for sparse imputation.
2262-2265

- Tara N. Sainath, Sameer Maskey, Dimitri Kanevsky, Bhuvana Ramabhadran, David Nahamoo, Julia Hirschberg:
Sparse representations for text categorization.
2266-2269

- Garimella S. V. S. Sivaram, Sriram Ganapathy, Hynek Hermansky:
Sparse auto-associative neural networks: theory and application to speech recognition.
2270-2273

ASR:
Lexical and Pronunciation Modeling
- Chi Hu, Xiaodan Zhuang, Mark Hasegawa-Johnson:
FSM-based pronunciation modeling using articulatory phonological code.
2274-2277

- Denis Jouvet, Dominique Fohr, Irina Illina:
Detailed pronunciation variant modeling for speech transcription.
2278-2281

- Line Adde, Bert Réveil, Jean-Pierre Martens, Torbjørn Svendsen:
A minimum classification error approach to pronunciation variation modeling of non-native proper names.
2282-2285

- Antoine Laurent, Sylvain Meignier, Téva Merlin, Paul Deléglise:
Acoustics-based phonetic transcription method for proper nouns.
2286-2289

- Tim Schlippe, Sebastian Ochs, Tanja Schultz:
Wiktionary as a source for automatic pronunciation extraction.
2290-2293

- Ibrahim Badr, Ian McGraw, James R. Glass:
Learning new word pronunciations from spoken examples.
2294-2297

Speaker Recognition and Diarization
- I-Fan Chen, Shih-Sian Cheng, Hsin-Min Wang:
Phonetic subspace mixture model for speaker diarization.
2298-2301

- Martin Zelenák, Carlos Segura, Javier Hernando:
Overlap detection for speaker diarization by fusing spectral and spatial features.
2302-2305

- Alfred Dielmann, Giulia Garau, Hervé Bourlard:
Floor holder detection and end of speaker turn prediction in meetings.
2306-2309

- Carlos Vaquero, Alfonso Ortega, Jesús A. Villalba, Antonio Miguel, Eduardo Lleida:
Confidence measures for speaker segmentation and their relation to speaker verification.
2310-2313

- Anthony Larcher, Christophe Lévy, Driss Matrouf, Jean-François Bonastre:
Decoupling session variability modelling and speaker characterisation.
2314-2317

- Cheung-Chi Leung, Donglai Zhu, Kong-Aik Lee, Bin Ma, Haizhou Li:
Incorporating MAP estimation and covariance transform for SVM based speaker recognition.
2318-2321

Speech and Audio Classification
- Stéphane Rossignol, Olivier Pietquin:
Single-speaker/multi-speaker co-channel speech classification.
2322-2325

- Oriol Vinyals, Gerald Friedland, Nelson Morgan:
Discriminative training for hierarchical clustering in speaker diarization.
2326-2329

- Jürgen T. Geiger, Frank Wallhoff, Gerhard Rigoll:
GMM-UBM based open-set online speaker diarization.
2330-2333

- Ladan Golipour, Douglas D. O'Shaughnessy:
A segment-based non-parametric approach for monophone recognition.
2334-2337

- Taras Butko, Climent Nadeu:
A fast one-pass-training feature selection technique for GMM-based acoustic event detection with audio-visual data.
2338-2341

- Nobuhide Yamakawa, Tetsuro Kitahara, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition.
2342-2345

Emotion Recognition
- Ling He, Margaret Lech, Nicholas Allen:
On the importance of glottal flow spectral energy for the recognition of emotions in speech.
2346-2349

- Laurence Devillers, Christophe Vaudable, Clément Chastagnol:
Real-life emotion-related states detection in call centers: a cross-corpora study.
2350-2353

- Ali Hassan, Robert I. Damper:
Multi-class and hierarchical SVMs for emotion recognition.
2354-2357

- David Philippou-Hübner, Bogdan Vlasenko, Tobias Grosser, Andreas Wendemuth:
Determining optimal features for emotion recognition from speech by applying an evolutionary algorithm.
2358-2361

- Martin Wöllmer, Angeliki Metallinou, Florian Eyben, Björn Schuller, Shrikanth S. Narayanan:
Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling.
2362-2365

- Kartik Audhkhasi, Shrikanth S. Narayanan:
Data-dependent evaluator modeling and its application to emotional valence classification from speech.
2366-2369

Speech Coding, Modeling, and Transmission
- Zhanyu Ma, Arne Leijon:
Modelling speech line spectral frequencies with Dirichlet mixture models.
2370-2373

- Zhanyu Ma, Arne Leijon:
PDF-optimized LSF vector quantization based on beta mixture models.
2374-2377

- José Enrique García Laínez, Alfonso Ortega, Antonio Miguel, Eduardo Lleida:
Non-linear predictive vector quantization of feature vectors for distributed speech recognition.
2378-2381

- Lasse Laaksonen, Mikko Tammi, Vladimir Malenovsky, Tommy Vaillancourt, Mi Suk Lee, Tomofumi Yamanashi, Masahiro Oshikiri, Claude Lamblin, Balázs Kövesi, Lei Miao, Deming Zhang, Jon Gibbs, Holly Francois:
Superwideband extension of G.718 and G.729.1 speech codecs.
2382-2385

- José L. Carmona, Angel M. Gomez, Antonio M. Peinado, José L. Pérez-Córdoba, José A. González:
A multipulse FEC scheme based on amplitude estimation for CELP codecs over packet networks.
2386-2389

- Anssi Rämö, Henri Toukomaa:
Voice quality evaluation of recent open source codecs.
2390-2393

- Bengt J. Borgström, Per Henrik Borgström, Abeer Alwan:
Efficient HMM-based estimation of missing features, with applications to packet loss concealment.
2394-2397

- Xiaoqiang Xiao, Robert M. Nickel:
Speech inventory based discriminative training for joint speech enhancement and low-rate speech coding.
2398-2401

- Qipeng Gong, Peter Kabal:
Quality-based playout buffering with FEC for conversational VoIP.
2402-2405

- Masatsune Tamura, Takehiko Kagoshima, Masami Akamine:
Sub-band basis spectrum model for pitch-synchronous log-spectrum and phase based on approximation of sparse coding.
2406-2409

- Sundar Harshavardhan, Chandra Sekhar Seelamantula, Thippur V. Sreenivas:
A multimodal density function estimation approach to formant tracking.
2410-2413

- Heikki Rasilo, Unto K. Laine, Okko Johannes Räsänen:
Estimation studies of vocal tract shape trajectory using a variable length and lossy Kelly-Lochbaum model.
2414-2417

Speech Perception:
Processing and Intelligibility
- Serajul Haque, Roberto Togneri:
A feature extraction method for automatic speech recognition based on the cochlear nucleus.
2454-2457

- Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky:
A phoneme recognition framework based on auditory spectro-temporal receptive fields.
2458-2461

- Amy V. Beeston, Guy J. Brown:
Perceptual compensation for effects of reverberation in speech identification: a computer model based on auditory efferent processing.
2462-2465

- Barbara Schuppler, Mirjam Ernestus, Wim A. van Dommelen, Jacques C. Koreman:
Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties.
2466-2469

- Matthew Robertson, Guy J. Brown, Wendy Lecluyse, Manasa Panda, Christine M. Tan:
A speech-in-noise test based on spoken digits: comparison of normal and impaired listeners using a computer model.
2470-2473

- Takayuki Kagomiya, Seiji Nakagawa:
Evaluation of bone-conducted ultrasonic hearing-aid regarding transmission of paralinguistic information: a comparison with cochlear implant simulator.
2474-2477

- Tim Jürgens, Stefan Fredelake, Ralf M. Meyer, Birger Kollmeier, Thomas Brand:
Challenging the speech intelligibility index: macroscopic vs. microscopic prediction of sentence recognition in normal and hearing-impaired listeners.
2478-2481

- Verena N. Uslar, Thomas Brand, Mirko Hanke, Rebecca Carroll, Esther Ruigendijk, Cornelia Hamann, Birger Kollmeier:
Does sentence complexity interfere with intelligibility in noise? evaluation of the Oldenburg linguistically and audiologically controlled sentence test (OLACS).
2482-2485

- Juan-Pablo Ramirez, Hamed Ketabdar, Alexander Raake:
Intelligibility predictions for speech against fluctuating masker.
2486-2489

- Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano:
An effect of formant amplitude in vowel perception.
2490-2493

- Christopher I. Petkov, Benjamin Wilson:
Functional imaging of brain regions sensitive to communication sounds in primates.
2494-2497

Spoken Language Understanding and Spoken Language Translation I, II
- Ye-Yi Wang:
Strategies for statistical spoken language understanding with small amount of data - an empirical study.
2498-2501

- Bassam Jabaian, Laurent Besacier, Fabrice Lefèvre:
Investigating multiple approaches for SLU portability to a new language.
2502-2505

- Anja Austermann, Seiji Yamada, Kotaro Funakoshi, Mikio Nakano:
Learning naturally spoken commands for a robot.
2506-2509

- Amparo Albalate, Aparna Suchindranath, David Suendermann, Wolfgang Minker:
A semi-supervised cluster-and-label approach for utterance classification.
2510-2513

- Silvia Quarteroni, Giuseppe Riccardi:
Classifying dialog acts in human-human and human-machine spoken conversations.
2514-2517

- Fei Liu, Yang Liu:
Exploring speaker characteristics for meeting summarization.
2518-2521

- Shasha Xie, Hui Lin, Yang Liu:
Semi-supervised extractive speech summarization via co-training algorithm.
2522-2525

- Asli Çelikyilmaz, Dilek Hakkani-Tür:
Extractive summarization using a latent variable model.
2526-2529

- Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Hierarchical classification for speech-to-speech translation.
2530-2533

- Matthias Paulik, Alex Waibel:
Rapid development of speech translation using consecutive interpretation.
2534-2537

- Sameer Maskey, Steven J. Rennie, Bowen Zhou:
Combining many alignments for speech to speech translation.
2538-2541

- Pierre Gotab, Géraldine Damnati, Frédéric Béchet, Lionel Delphin-Poulat:
Online SLU model adaptation with a partial oracle.
2862-2865

- Om Deshmukh, Harish Doddala, Ashish Verma, Karthik Visweswariah:
Role of language models in spoken fluency evaluation.
2866-2869

- Sibel Yaman, Dilek Hakkani-Tür, Gökhan Tür:
Social role discovery from spoken language using dynamic Bayesian networks.
2870-2873

- Michelle Hewlett Sanchez, Gökhan Tür, Luciana Ferrer, Dilek Hakkani-Tür:
Domain adaptation and compensation for emotion detection.
2874-2877

- Sankaranarayanan Ananthakrishnan, Rohit Prasad, Prem Natarajan:
Phrase alignment confidence for statistical machine translation.
2878-2881

- Ian R. Lane, Alex Waibel:
Named-entity projection and data-driven morphological decomposition for field maintainable speech-to-speech translation systems.
2882-2885

Social Signals in Speech (Special Session)
- Paul M. Brunet, Marcela Charfuelan, Roderick Cowie, Marc Schröder, Hastings Donnan, Ellen Douglas-Cowie:
Detecting politeness and efficiency in a cooperative social interaction.
2542-2545

- Nick Campbell, Stefan Scherer:
Comparing measures of synchrony and alignment in dialogue speech timing with respect to turn-taking activity.
2546-2549

- Emina Kurtic, Guy J. Brown, Bill Wells:
Resources for turn competition in overlap in multi-party conversations: speech rate, pausing and duration.
2550-2553

- Khiet P. Truong, Dirk Heylen:
Disambiguating the functions of conversational sounds with prosody: the case of 'yeah'.
2554-2557

- Marcela Charfuelan, Marc Schröder, Ingmar Steiner:
Prosody and voice quality of vocal social signals: the case of dominance in scenario meetings.
2558-2561

- Daniel Neiberg, Joakim Gustafson:
The prosody of Swedish conversational grunts.
2562-2565

Physiology and Pathology of Spoken Language
- Christophe Mertens, Francis Grenez, Lise Crevier-Buchman, Jean Schoentgen:
Reliable tracking based on speech sample salience of vocal cycle length perturbations.
2566-2569

- Hideki Kasuya, Hajime Yoshida, Satoshi Ebihara, Hiroki Mori:
Longitudinal changes of selected voice source parameters.
2570-2573

- Ali Alpan, Jean Schoentgen, Youri Maryn, Francis Grenez:
Automatic perceptual categorization of disordered connected speech.
2574-2577

- Heejin Kim, Panying Rong, Torrey M. Loucks, Mark Hasegawa-Johnson:
Kinematic analysis of tongue movement control in spastic dysarthria.
2578-2581

- Irene Jacobi, Lisette van der Molen, Maya van Rossum, Frans J. M. Hilgers:
Pre- and short-term posttreatment vocal functioning in patients with advanced head and neck cancer treated with concomitant chemoradiotherapy.
2582-2585

- Joan K. Y. Ma, Rüdiger Hoffmann:
Acoustic analysis of intonation in Parkinson's disease.
2586-2589

Speaker Diarization
- Carlos Vaquero, Oriol Vinyals, Gerald Friedland:
A hybrid approach to online speaker diarization.
2638-2641

- Simon Bozonnet, Nicholas W. D. Evans, Xavier Anguera, Oriol Vinyals, Gerald Friedland, Corinne Fredouille:
System output combination for improved speaker diarization.
2642-2645

- Simon Bozonnet, Nicholas W. D. Evans, Corinne Fredouille, Dong Wang, Raphaël Troncy:
An integrated top-down/bottom-up approach to speaker diarization.
2646-2649

- Deepu Vijayasenan, Fabio Valente, Hervé Bourlard:
Advances in fast multistream diarization based on the information bottleneck framework.
2650-2653

- Giulia Garau, Alfred Dielmann, Hervé Bourlard:
Audio-visual synchronisation for speaker diarisation.
2654-2657

- Kyu Jeong Han, Shrikanth S. Narayanan:
An improved cluster model selection method for agglomerative hierarchical speaker clustering using incremental Gaussian mixture models.
2658-2661

- Nigel G. Ward, Olac Fuentes, Alejandro Vega:
Dialog prediction for a general model of turn-taking.
2662-2665

- Tobias Herbig, Franz Gerl, Wolfgang Minker:
Speaker tracking in an unsupervised speech controlled system.
2666-2669

- Paula Lopez-Otero, Laura Docío Fernández, Carmen García-Mateo:
MultiBIC: an improved speaker segmentation technique for TV shows.
2670-2673

Multi-Modal ASR, Including Audio-Visual ASR
- John-Paul Hosom, Tom Jakobs, Allen Baker, Susan Fager:
Automatic speech recognition for assistive writing in speech supplemented word prediction.
2674-2677

- Alexey Karpov, Andrey Ronzhin, Konstantin Markov, Milos Zelezný:
Viseme-dependent weight optimization for CHMM-based audio-visual speech recognition.
2678-2681

- Louis H. Terry, Karen Livescu, Janet B. Pierrehumbert, Aggelos K. Katsaggelos:
Audio-visual anticipatory coarticulation modeling by human and machine.
2682-2685

- Matthias Janke, Michael Wand, Tanja Schultz:
Impact of lack of acoustic feedback in EMG-based silent speech recognition.
2686-2689

- Chong-Jia Ni, Wenju Liu, Bo Xu:
Using prosody to improve Mandarin automatic speech recognition.
2690-2693

- Satoshi Tamura, Masato Ishikawa, Takashi Hashiba, Shin'ichi Takeuchi, Satoru Hayamizu:
A robust audio-visual speech recognition using audio-visual voice activity detection.
2694-2697

- Dorothea Kolossa, Jike Chong, Steffen Zeiler, Kurt Keutzer:
Efficient manycore CHMM speech recognition for audiovisual and multistream data.
2698-2701

- Takami Yoshida, Kazuhiro Nakadai:
Two-layered audio-visual integration in voice activity detection and automatic speech recognition for robots.
2702-2705

- Panikos Heracleous, Norihiro Hagita:
Non-audible murmur recognition based on fusion of audio and visual streams.
2706-2709

Speaker and Language Recognition