Author

Imren Dinc

Date of Award

2016

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

Committee Chair

Ramazan S. Aygun

Committee Member

Marc L. Pusey

Committee Member

Heggere Ranganath

Committee Member

Letha Etzkorn

Committee Member

Huaming Zhang

Subject(s)

Experimental design, Data mining, Proteins--Analysis

Abstract

Data analytics focuses on processing raw data to extract important information and involves data acquisition, data analysis, and visualization. To maximize the amount of information obtained, an efficient experimental design method is necessary. Many existing experimental design methods do not consider previous experimental results when constructing new trial conditions. It is quite difficult to employ supervision in this domain due to limited data size and skewed data distribution. Once the data is collected, proper analysis methods should be developed for making useful decisions regarding the type of data collected (e.g., numerical, time, visual, image analysis). Visualization that encompasses both spatial and temporal properties of data would guide researchers to make effective decisions. In this dissertation, we propose a data analytics framework called "Associative Data Analytics" (ADA). The ADA is a comprehensive framework consisting of five main stages: 1) designing experiments using a novel method called Associative Experimental Design (AED), 2) ranking the experiment samples and evaluating the ranking using our novel approach with a Bin-Recall metric, 3) data collection, 4) analyzing the object regions using our novel approach, super-thresholding, and 5) visualizing the experimental results using our Visual-X2 tool. Every stage introduces a new solution to a sub-problem in the experimental design and analysis domain. We evaluated the performance of the ADA on the protein crystallization problem. Our results show that the ADA yielded successful results and made a significant impact on this domain. Under the ADA framework, the AED generated novel conditions for difficult crystallizers, which did not have any results showing needles (or better) in the commercial screens. We analyzed the crystal regions in the protein images using super-thresholding. Super-thresholding improved accuracy by around 10% compared with the best single thresholding method. Final results are displayed using the Visual-X2. The numerical results and discussions of each ADA stage are provided in detail.
