Doctor of Philosophy (PhD)
Ramazan S. Aygun
Marc L. Pusey
Experimental design, Data mining, Proteins--Analysis
Data analytics focuses on processing raw data to extract important information; it involves data acquisition, data analysis, and visualization. Obtaining as much information as possible requires an efficient experimental design method. Many existing experimental design methods do not consider previous experimental results when constructing new trial conditions. Employing supervised methods in this domain is difficult because of the limited data size and skewed data distribution. Once the data is collected, appropriate analysis methods must be developed to support useful decisions for the type of data collected (e.g., numerical, temporal, visual, or image data). Visualization that encompasses both the spatial and temporal properties of the data helps researchers make effective decisions. In this dissertation, we propose a data analytics framework called "Associative Data Analytics (ADA)." ADA is a comprehensive framework consisting of five main stages: 1) designing experiments using a novel method called Associative Experimental Design (AED), 2) ranking the experiment samples and evaluating the ranking with our novel Bin-Recall metric, 3) collecting data, 4) analyzing object regions using our novel super-thresholding approach, and 5) visualizing the experimental results using our Visual-X2 tool. Each stage introduces a new solution to a sub-problem in the experimental design and analysis domain. We evaluated the performance of ADA on the protein crystallization problem. Our results show that ADA yielded successful results and made a significant impact on this domain. Under the ADA framework, AED generated novel conditions for difficult crystallizers, which had not produced any results showing needles (or better) in the commercial screens. We analyzed the crystal regions in the protein images using super-thresholding, which improved accuracy by around 10% compared with the best single thresholding method. Final results are displayed using Visual-X2. The numerical results and a discussion of each ADA stage are provided in detail.
Dinc, Imren, "Associative data analytics and its application to protein crystallization analysis" (2016). Dissertations. 96.