Evaluating Feature Correlation in Large Alzheimer's Disease Multi-Domain Datasets
Data science is increasingly used in research on Alzheimer's disease (AD). This makes it important to know which features are worth including in a dataset: which are uniquely informative, and which can be replaced by another feature to avoid redundancy. The task is complicated by the heterogeneity of AD and its effects across many domains, such as gene expression and brain morphology. As a result, the literature draws on an increasingly diverse set of biomarkers and assays to collect data for studying AD. In this work, we analyze the significance of MRI, microarray, and other phenotypic features to give AD researchers guidance on data cleaning. We do this by identifying which features across multiple datasets are redundant because they are highly correlated in both diseased cases and healthy controls.
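
The redundancy criterion described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: the choice of Pearson correlation, the 0.9 cutoff, the function names, and the toy feature names and values are all assumptions made for illustration. A pair of features is flagged only when its absolute correlation meets the cutoff in both groups.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / sqrt(var_x * var_y)


def redundant_pairs(cases, controls, threshold=0.9):
    """Return feature pairs with |r| >= threshold in BOTH groups.

    `cases` and `controls` map a feature name to its list of values.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    shared = sorted(set(cases) & set(controls))
    flagged = []
    for f, g in combinations(shared, 2):
        r_cases = pearson(cases[f], cases[g])
        r_controls = pearson(controls[f], controls[g])
        if abs(r_cases) >= threshold and abs(r_controls) >= threshold:
            flagged.append((f, g))
    return flagged


# Hypothetical toy data: feat_a and feat_b move together in both groups,
# while feat_c tracks feat_a only in the controls.
cases = {
    "feat_a": [1, 2, 3, 4, 5],
    "feat_b": [2, 4, 6, 8, 10.5],
    "feat_c": [5, 1, 4, 2, 3],
}
controls = {
    "feat_a": [2, 3, 5, 7, 8],
    "feat_b": [4, 6, 10, 14, 16.2],
    "feat_c": [2, 3, 5, 7, 8.1],
}

print(redundant_pairs(cases, controls))
```

In this toy example only the pair (feat_a, feat_b) is flagged. feat_c is strongly correlated with the other two features in the controls but not in the cases, so dropping it would lose information in the diseased group.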