Date of Award


Document Type


Degree Name

Master of Science (MS)


Computer Science

Committee Chair

Ramazan S. Aygun

Committee Member

Daniel Rochowiak

Committee Member

Marc Pusey


Proteins--Analysis; Crystallization; Data integration (Computer science)


Many companies have developed commercial screen kits, each containing different combinations of chemicals, for protein crystallization trials. Scientists typically use screen kits from several companies to crystallize a single protein. Because these companies use different data representations and naming conventions, automated analysis of crystallization experiments is difficult and time-consuming: matching headers between the input and output screens must be identified, and the data must then be copied under the corresponding headers in the output file. To reduce the human effort this requires, we present an algorithm based on linguistic schema matching and data integration that automatically finds the matching elements between the schemas of two screen kits using three syntactic similarity measures and then transforms the input screen file into the required output screen format. The approach is tested on several commercial screens from different companies and evaluated using two metrics. The experiments showed an overall accuracy of 97% and an F-measure of 0.99, which were significantly better than the results of the two other matchers we compared against. Protein screen kits also name chemicals inconsistently, because there is no standard format for the names used in the screens, which further complicates analysis. The method proposed in this thesis therefore also produces an output file with consistent chemical names.
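The header-matching and transformation steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the three syntactic similarity measures, so the ones used here (character-sequence ratio, token Jaccard, and character-trigram Dice), the unweighted average, the 0.5 threshold, and all function and header names are assumptions chosen for illustration. This is not the thesis's implementation.

```python
from difflib import SequenceMatcher


def normalize(header):
    """Lowercase a header and turn punctuation/underscores into single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in header.lower())
    return " ".join(cleaned.split())


def sequence_similarity(a, b):
    """Character-sequence similarity in [0, 1] (assumed measure 1)."""
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a, b):
    """Jaccard overlap of whitespace-separated tokens (assumed measure 2)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def trigram_dice(a, b):
    """Dice coefficient over padded character trigrams (assumed measure 3)."""
    def grams(s):
        padded = f"  {s} "
        return {padded[i:i + 3] for i in range(len(padded) - 2)}

    ga, gb = grams(a), grams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def combined_score(a, b):
    """Unweighted mean of the three measures on normalized headers."""
    na, nb = normalize(a), normalize(b)
    return (sequence_similarity(na, nb)
            + token_jaccard(na, nb)
            + trigram_dice(na, nb)) / 3


def match_headers(input_headers, output_headers, threshold=0.5):
    """Map each output header to its best-scoring input header, or None."""
    mapping = {}
    for out in output_headers:
        best = max(input_headers, key=lambda src: combined_score(src, out))
        mapping[out] = best if combined_score(best, out) >= threshold else None
    return mapping


def transform_rows(rows, mapping):
    """Copy each row's values under the matched output headers."""
    return [
        {out: (row.get(src, "") if src is not None else "")
         for out, src in mapping.items()}
        for row in rows
    ]


# Hypothetical screen headers and one hypothetical row of data.
input_headers = ["Salt Conc.", "Buffer pH", "Precipitant"]
output_headers = ["salt_conc", "buffer_ph", "precipitant_name"]
mapping = match_headers(input_headers, output_headers)
rows = [{"Salt Conc.": "0.2 M", "Buffer pH": "7.5", "Precipitant": "PEG 4000"}]
converted = transform_rows(rows, mapping)
```

Averaging several string measures is one simple way to combine them; a weighted combination or a one-to-one assignment between headers would be equally plausible designs.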


