Daniel N. Woo

Date of Award


Document Type


Degree Name

Master of Science (MS)


Computer Science

Committee Chair

Ramazan S. Aygun

Committee Member

Daniel M. Rochowiak

Committee Member

Heggere S. Ranganath

Committee Member

Closed captions, Face clustering, Face recognition, Speaker diarization, Speaker identification, Television


Human face recognition (Computer science), Image processing--Digital techniques, Biometric identification, Television broadcasting of news--Technological innovations, Closed captioning


Cable, satellite, and broadcast television (TV) networks produce a tremendous amount of information every day. Identifying who is speaking at specific times throughout a video would be useful. Previous research has identified speakers in TV shows and movies by using faces trained in advance. News videos are more challenging because new faces appear often. This paper proposes to label speakers with an unsupervised clustering algorithm, using only the information available in the news video and no external information. The proposed framework segments the audio by speaker, parses closed captions to identify possible speaker names, identifies talking persons, performs optical character recognition on text that appears while a person speaks, and checks whether a name appears on screen during a speaker's audio segments. The framework combines face detection, face recognition, face clustering, face landmarking, natural language processing tools, parsing rules, and speaker diarization. Our results indicate 63.6% accuracy in identifying speakers for CNN news.
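
The final step described above, checking whether a candidate name appears on screen during a speaker's audio segments, can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `label_speakers`, the input formats, the sample names, and the simple vote-counting rule are all assumptions made for illustration. It takes diarization segments, timestamped OCR text, and candidate names from closed captions, and assigns each speaker the name seen most often during that speaker's segments.

```python
from collections import Counter, defaultdict


def label_speakers(segments, ocr_hits, candidate_names):
    """Assign a candidate name to each diarized speaker (hypothetical helper).

    segments: list of (speaker_id, start_s, end_s) from speaker diarization
    ocr_hits: list of (time_s, text) from on-screen text recognition
    candidate_names: names extracted from the closed captions
    Returns {speaker_id: name or None}.
    """
    # Pair each name with a lowercase key for case-insensitive matching.
    names = [(name, name.lower()) for name in candidate_names]
    votes = defaultdict(Counter)

    for speaker_id, start, end in segments:
        for time_s, text in ocr_hits:
            # Only consider text shown while this speaker is talking.
            if start <= time_s <= end:
                lowered = text.lower()
                for name, key in names:
                    if key in lowered:
                        votes[speaker_id][name] += 1

    labels = {}
    for speaker_id, _, _ in segments:
        counter = votes.get(speaker_id)
        # Speakers with no matching on-screen name stay unlabeled.
        labels[speaker_id] = counter.most_common(1)[0][0] if counter else None
    return labels


# Made-up example data; the names are placeholders, not from the thesis.
segments = [("S0", 0.0, 9.0), ("S1", 10.0, 19.0), ("S0", 20.0, 29.0), ("S2", 30.0, 39.0)]
ocr_hits = [(2.0, "ALEX RIVERA | ANCHOR"), (12.5, "Jane Doe, Analyst"), (25.0, "Alex Rivera")]
labels = label_speakers(segments, ocr_hits, ["Alex Rivera", "Jane Doe"])
print(labels)
```

In this sketch a speaker whose segments never coincide with an on-screen candidate name is left as `None`. A fuller system would also need to tolerate OCR errors and to confirm that the person on screen is the one talking.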



