Date of Award
2014
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
Committee Chair
Ramazan S. Aygun
Committee Member
Daniel M. Rochowiak
Committee Member
Heggere S. Ranganath
Committee Member
Keywords
Closed captions, Face clustering, Face recognition, Speaker diarization, Speaker identification, Television
Subject(s)
Human face recognition (Computer science), Image processing--Digital techniques, Biometric identification, Television broadcasting of news--Technological innovations, Closed captioning
Abstract
Cable, satellite, and broadcast television (TV) networks produce a tremendous amount of information every day, and identifying who is speaking at specific times throughout a video would be useful. Previous research has identified speakers in TV shows and movies using faces on which the system was trained in advance. News videos are more challenging because new faces appear often. This thesis proposes an unsupervised clustering approach that labels speakers using only the information available in the news video itself, without external sources. Our proposed framework segments the audio by speaker, parses closed captions to identify possible names of speakers, identifies talking persons, performs optical character recognition (OCR) on text that appears while a person speaks, and checks whether a name appears on screen during a speaker's audio segments. The framework uses face detection, face recognition, face clustering, face landmarking, natural language processing tools, parsing rules, and speaker diarization. Our results show 63.6% accuracy in identifying speakers in CNN news videos.
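The last step listed above, checking whether a name appears on screen during a speaker's audio segments, can be illustrated with a minimal sketch. It assumes that diarization and OCR have already produced time-stamped intervals. The `Segment` type, the `name_speakers` function, and the longest-overlap voting rule are hypothetical choices made for illustration only. They are not the thesis's actual method, which also draws on closed captions and face clustering.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """A labeled time interval in seconds (hypothetical structure)."""
    label: str
    start: float
    end: float


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def name_speakers(speech: list[Segment], names: list[Segment]) -> dict[str, Optional[str]]:
    """Assign each diarized speaker the on-screen name that overlaps its speech longest.

    Hypothetical rule for illustration: `speech` holds diarization output
    (label = anonymous speaker id), `names` holds OCR'd on-screen names
    (label = candidate name). Speakers that never overlap a name map to None.
    """
    # votes[speaker][name] = total seconds the name is on screen while speaker talks
    votes: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for s in speech:
        for n in names:
            d = overlap(s, n)
            if d > 0:
                votes[s.label][n.label] += d

    result: dict[str, Optional[str]] = {}
    for spk in sorted({s.label for s in speech}):
        tally = votes.get(spk)  # .get does not create an entry in the defaultdict
        result[spk] = max(tally, key=tally.get) if tally else None
    return result
```

For example, with speech segments `SPK0` (0 to 10 s and 25 to 30 s), `SPK1` (10 to 25 s), and `SPK2` (30 to 40 s), and the hypothetical on-screen names "Jane Doe" (2 to 6 s and 26 to 29 s) and "John Roe" (12 to 18 s), the sketch labels `SPK0` as "Jane Doe", `SPK1` as "John Roe", and leaves `SPK2` unnamed. Leaving unmatched speakers unlabeled, instead of forcing a guess, reflects the abstract's point that new faces often appear in news video without an identifying caption.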
Recommended Citation
Woo, Daniel N., "Unsupervised speaker identification for TV news" (2014). Theses. 70.
https://louis.uah.edu/uah-theses/70