Date of Award
Master of Science (MS)
Ramazan S. Aygun
Daniel M. Rochowiak
Heggere S. Ranganath
Closed captions, Face clustering, Face recognition, Speaker diarization, Speaker identification, Television
Human face recognition (Computer science), Image processing--Digital techniques, Biometric identification, Television broadcasting of news--Technological innovations, Closed captioning
Cable, satellite, and broadcast television (TV) networks produce a tremendous amount of information every day. Identifying who is speaking at specific times throughout a video would be useful. Previous research has identified speakers in TV shows and movies using faces trained in advance. News videos are more challenging because new faces often appear. This thesis proposes to label speakers with an unsupervised clustering algorithm, using only the information available in the news video and no external information. Our proposed framework segments the audio by speaker, parses closed captions to identify possible speaker names, identifies talking persons, performs optical character recognition on text that appears while a person speaks, and checks whether a name appears on screen during a speaker's audio segments. The framework utilizes face detection, face recognition, face clustering, face landmarking, natural language processing tools, parsing rules, and speaker diarization. Our results indicate 63.6% accuracy in identifying speakers in CNN news videos.
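The final step described in the abstract, checking whether a name appears on screen during a speaker's audio segments, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the tuple layouts, the sample timestamps, and the rule of picking the name with the largest total time overlap are all assumptions made for illustration.

```python
from collections import defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_speakers(segments, name_sightings):
    """Assign each speaker cluster the name shown longest while it speaks.

    segments: list of (speaker_id, start, end), e.g. from speaker diarization.
    name_sightings: list of (name, start, end), e.g. from on-screen text OCR.
    Returns {speaker_id: name}; clusters with no overlapping name are omitted.
    (Hypothetical data layout, assumed for this sketch.)
    """
    votes = defaultdict(lambda: defaultdict(float))
    for speaker_id, s_start, s_end in segments:
        for name, n_start, n_end in name_sightings:
            shared = overlap(s_start, s_end, n_start, n_end)
            if shared > 0:
                votes[speaker_id][name] += shared
    # Pick, per speaker cluster, the name with the largest accumulated overlap.
    return {spk: max(names, key=names.get) for spk, names in votes.items()}


# Made-up example data: three speaker clusters, two on-screen names.
segments = [
    ("spk0", 0.0, 10.0),
    ("spk1", 10.0, 25.0),
    ("spk0", 25.0, 30.0),
    ("spk2", 30.0, 35.0),
]
name_sightings = [
    ("Anchor A", 2.0, 6.0),
    ("Reporter B", 12.0, 20.0),
    ("Anchor A", 24.0, 29.0),
]
labels = label_speakers(segments, name_sightings)
print(labels)
```

In this sketch a speaker cluster that never coincides with an on-screen name (here `spk2`) simply stays unlabeled, which is one plausible way to handle speakers the video never names.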
Woo, Daniel N., "Unsupervised speaker identification for TV news" (2014). Theses. 70.