A Study Of Lip Movements During Spontaneous Dialog And Its Application To Voice Activity Detection
This paper presents a comprehensive quantitative study of a speaker’s lip movements in different speech/non-speech contexts, with a particular focus on silences (i.e., when the speaker produces no sound). The aim is to characterize the relationship between “lip activity” and “speech activity”, and then to use visual speech information as a Voice Activity Detector (VAD). To this end, an original audio-visual corpus was recorded with two speakers engaged in a spontaneous face-to-face dialog while located in separate rooms. Each speaker communicated with the other using a microphone, a camera, a screen, and headphones. This setup was used to capture separate audio stimuli for each speaker and to monitor each speaker’s lip movements in synchrony with the recorded sound. A comprehensive analysis was carried out on the lip shapes and lip movements corresponding to either silence sections or non-silence sections (i.e., speech + non-speech audible events). A single visual parameter, defined to characterize the lip movements, was