Author

Daniel N. Woo

Date of Award

2014

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

Committee Chair

Ramazan S. Aygun

Committee Member

Daniel M. Rochowiak

Committee Member

Heggere S. Ranganath

Keywords

Closed captions, Face clustering, Face recognition, Speaker diarization, Speaker identification, Television

Subject(s)

Human face recognition (Computer science), Image processing--Digital techniques, Biometric identification, Television broadcasting of news--Technological innovations, Closed captioning

Abstract

Cable, satellite, and broadcast television (TV) networks produce a tremendous amount of information every day. Identifying who is speaking at specific times throughout a video would be useful. Previous research has identified speakers in TV shows and movies using faces on which the system was pre-trained. News videos are more challenging because new faces appear often. This paper proposes an unsupervised clustering approach that labels speakers using only the information available in the news video itself, without external sources. Our proposed framework segments the audio by speaker, parses closed captions to identify possible speaker names, identifies talking persons, performs optical character recognition (OCR) on text that appears while a person speaks, and checks whether a name appears on screen during a speaker's audio segments. The framework uses face detection, face recognition, face clustering, face landmarking, natural language processing tools, parsing rules, and speaker diarization. Our results show 63.6% accuracy in identifying speakers in CNN news videos.
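The step of checking whether a name appears on screen during a speaker's audio segments can be illustrated with a small voting sketch. It assumes speaker diarization yields (speaker, start, end) segments in seconds and that OCR yields timestamped name strings. The names `Segment`, `NameSighting`, and `assign_names` are hypothetical, and majority voting is an assumed rule for illustration, not necessarily the exact method used in this thesis.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """One speaker-diarization segment (hypothetical format)."""
    speaker: str  # anonymous cluster label, e.g. "spk0"
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds (exclusive)


@dataclass
class NameSighting:
    """A candidate name observed at a moment in the video (hypothetical format)."""
    name: str     # e.g. text read by OCR from an on-screen caption
    time: float   # timestamp of the observation, in seconds


def assign_names(segments, sightings):
    """Label each diarized speaker with the name seen most often on screen
    while that speaker was talking. Speakers with no sightings stay unlabeled.
    Ties go to the name encountered first (Counter.most_common ordering)."""
    votes = defaultdict(Counter)
    for sighting in sightings:
        for seg in segments:
            # A sighting counts as a vote only if it falls inside the segment.
            if seg.start <= sighting.time < seg.end:
                votes[seg.speaker][sighting.name] += 1
    return {speaker: counter.most_common(1)[0][0]
            for speaker, counter in votes.items()}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the thesis.
    segments = [
        Segment("spk0", 0.0, 12.5),
        Segment("spk1", 12.5, 30.0),
        Segment("spk0", 30.0, 41.0),
    ]
    sightings = [
        NameSighting("JOHN DOE", 3.0),
        NameSighting("JANE ROE", 15.0),
        NameSighting("JANE ROE", 20.0),
        NameSighting("JOHN DOE", 35.0),
    ]
    print(assign_names(segments, sightings))
    # {'spk0': 'JOHN DOE', 'spk1': 'JANE ROE'}
```

Voting over every segment a speaker owns may tolerate an occasional OCR misread or an on-screen name that belongs to someone else. The framework described above also draws candidate names from closed captions and links speakers to face clusters, which this sketch does not model.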
