Presentation by Dr. Emmanuel Dupoux
How do infants bootstrap into spoken language? Models and challenges
Abstract: Human infants spontaneously and effortlessly learn the language(s) spoken in their environment, despite the extraordinary complexity of the task. Here, I will present an overview of the early phases of language acquisition and focus on one area where a modeling approach is currently being pursued with tools from signal processing and automatic speech recognition: the unsupervised acquisition of phonetic categories. During their first year of life, infants construct a detailed representation of the phonemes of their native language and lose the ability to distinguish non-native phonemic contrasts. Unsupervised statistical clustering is not sufficient: it converges not on the inventory of phonemes but on contextual allophonic units or subunits. I will present an information-theoretic algorithm that groups allophonic variants together based on three sources of information that can be acquired independently: the statistical distribution of their contexts, the phonetic plausibility of the grouping, and the existence of lexical minimal pairs. The algorithm is tested on several corpora of natural speech. We find that these three sources of information are probably not specific to language; what is presumably unique to language is the way in which they are combined to optimize the emergence of linguistic categories.
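To make the three cues concrete, here is a minimal sketch, in Python, of how a pair of phone-like units might be scored for merging. It illustrates the general idea rather than the algorithm presented in the talk: the context cue is approximated by the Jensen-Shannon divergence between right-context distributions (allophones tend to occur in complementary contexts), phonetic plausibility is a user-supplied feature distance, and the existence of a lexical minimal pair counts as evidence against merging. All function names, weights, and the toy data below are assumptions made for illustration.

    from collections import Counter
    from math import log2

    def context_distribution(phone, corpus):
        """Distribution of the immediate right-hand contexts of `phone`
        over a corpus of phone-transcribed utterances (tuples of phones)."""
        counts = Counter(
            utt[i + 1]
            for utt in corpus
            for i, p in enumerate(utt[:-1])
            if p == phone
        )
        total = sum(counts.values())
        return {c: n / total for c, n in counts.items()} if total else {}

    def js_divergence(p, q):
        """Jensen-Shannon divergence (in bits, between 0 and 1). Allophones
        of one phoneme tend toward complementary distribution, so a high
        value is evidence FOR merging; true phonemes share contexts."""
        support = set(p) | set(q)
        m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in support}
        def kl(a, b):
            return sum(a[c] * log2(a[c] / b[c]) for c in a if a[c] > 0)
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def minimal_pair_exists(a, b, lexicon):
        """True if two lexical items differ only in a vs. b -- evidence
        that the two units contrast and should NOT be merged."""
        words = set(lexicon)
        return any(
            w[:i] + (b,) + w[i + 1:] in words
            for w in words
            for i, p in enumerate(w)
            if p == a
        )

    def merge_score(a, b, corpus, lexicon, feature_distance, w=(1.0, 1.0, 2.0)):
        """Combine the three cues; a higher score favors grouping a and b
        as allophones of a single phoneme. The linear combination and the
        weights are arbitrary choices for this sketch."""
        complementarity = js_divergence(
            context_distribution(a, corpus), context_distribution(b, corpus))
        plausibility = 1.0 - feature_distance(a, b)  # similar sounds may merge
        contrast = 1.0 if minimal_pair_exists(a, b, lexicon) else 0.0
        return w[0] * complementarity + w[1] * plausibility - w[2] * contrast

    if __name__ == "__main__":
        # Toy data: "dj" only occurs before "i", as under a palatalization
        # rule, so "d" and "dj" are in complementary distribution.
        corpus = [("d", "a"), ("d", "o"), ("d", "u"), ("dj", "i"), ("dj", "i")]
        lexicon = [("d", "a"), ("dj", "i")]
        feat = lambda x, y: 0.1 if {x, y} == {"d", "dj"} else 0.9
        print(merge_score("d", "dj", corpus, lexicon, feat))  # high -> merge

In the toy example, "d" and "dj" occur in complementary contexts and are phonetically close, and no minimal pair opposes them, so the combined score is high, as one would expect for allophones introduced by a palatalization rule.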
Bio: Dr. Emmanuel Dupoux is the director of the Laboratoire de Sciences Cognitives et Psycholinguistique in Paris. He conducts research on the early phases of language and social acquisition in human infants, using a mix of behavioral and brain-imaging techniques as well as computational modeling. He teaches at the École des Hautes Études en Sciences Sociales, where he has set up an interdisciplinary graduate program in Cognitive Science.