Cosmin Munteanu
University of Toronto Mississauga, Canada
Gerald Penn
University of Toronto, Canada
HCI research has long been dedicated to better and more naturally facilitating information transfer between humans and machines. Unfortunately, our most natural form of communication, speech, is also one of the most difficult modalities to be understood by machines. Despite significant recent advances towards understanding speech, HCI has been relatively timid in embracing this modality as a central research focus - partly due to the relatively discouraging accuracy of speech understanding in some genres (exaggerated claims from the industry notwithstanding), but also due to the intrinsic difficulty of designing and evaluating speech and natural language interfaces. On the engineering side, improving speech technology with respect to largely arbitrary measures of performance has led to systems that deviate from user-centered design principles, and that fail to consider usability or usefulness.
The goal of this course is to inform the HCI community of the current state of speech and natural language research, to dispel some of the myths surrounding speech-based interaction, as well as to provide an opportunity for researchers and practitioners to learn more about how speech recognition and speech synthesis work, their limitations, and how they could be used to enhance current interaction paradigms.
Our approach is two-fold: presenting new concepts to the audience, and fostering discussion and the exchange of ideas. Slides are used to introduce the main points, while videos and audio clips are played to illustrate examples. After each main concept is presented, time is allocated for interaction with the audience.
Variations of this tutorial have been presented at HCII 2016, MobileHCI 2010-2015, CHI 2011-2017, and I/ITSEC 2010-2016.
The course will be beneficial to all HCI researchers and practitioners without strong expertise in automatic speech recognition (ASR) or text-to-speech synthesis (TTS) who still believe in fulfilling HCI's goal of developing methods and systems that allow humans to interact naturally with increasingly ubiquitous mobile technology, but who are disappointed with the lack of success in using speech and natural language to achieve this goal.
No prior technical experience is required of participants. The classroom activities will be conducted using the participants' smartphones (Android or iPhone), but only the built-in phone functions will be used; no software download will be required. Participants will work in small groups, ensuring that even participants without a smartphone are able to fully contribute.