Feature selection method reducing correlations among features by embedding domain knowledge

Date
2022-10-01
Journal Title
Acta Materialia
Publisher
Elsevier
Abstract
Selecting proper descriptors, also known as features, is one of the key problems in modeling materials properties with machine learning (ML). Redundant features reduce the accuracy of ML models, and the results of purely data-driven feature selection (FS) methods are often inconsistent with materials domain knowledge. Herein, an FS method embedded with materials domain knowledge, named NCOR-FS, is proposed to select higher-quality features. The method translates materials domain knowledge about highly correlated features into Non-Co-Occurrence Rules (NCORs). This makes it possible to quantify the degree to which a feature subset violates the NCORs and to design the optimization process of the FS method around a swarm intelligence algorithm. Experiments on seven datasets show that, compared with several other FS methods commonly used in materials science, NCOR-FS selects feature subsets containing a more appropriate number of highly correlated features, which improves the prediction accuracy and interpretability of the ML models. NCOR-FS can be applied to any materials system, and the idea of embedding domain knowledge into data-driven algorithms is expected to facilitate the construction of a wide range of ML models embedded with materials domain knowledge. © 2022 Acta Materialia Inc. Published by Elsevier Ltd.
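The abstract does not specify the exact form of the NCORs, the violation measure, or which swarm intelligence algorithm is used, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. Here an NCOR is assumed to be a pair of feature names that should not both be selected, the violation degree is assumed to be the fraction of rules a subset breaks, and a standard binary particle swarm optimization (BPSO) with a sigmoid transfer function stands in for the paper's optimizer. The function names `ncor_violation`, `make_fitness` and `binary_pso`, the `penalty` weight, the descriptor names, and `toy_error` are all hypothetical.

```python
import math
import random


def ncor_violation(subset, rules):
    """Fraction of NCORs violated by a feature subset (0.0 means none).

    Assumption: each rule is a pair (a, b) of feature names that domain
    knowledge says should not be selected together.
    """
    if not rules:
        return 0.0
    chosen = set(subset)
    violated = sum(1 for a, b in rules if a in chosen and b in chosen)
    return violated / len(rules)


def make_fitness(error_fn, rules, penalty=1.0):
    """Fitness to minimize: model error plus a weighted NCOR-violation term."""
    def fitness(subset):
        if not subset:
            return float("inf")  # an empty subset cannot train a model
        return error_fn(subset) + penalty * ncor_violation(subset, rules)
    return fitness


def binary_pso(features, fitness, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard binary PSO: each particle is a 0/1 mask over `features`."""
    rng = random.Random(seed)
    d = len(features)

    def decode(bits):
        return [f for f, b in zip(features, bits) if b]

    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))  # sigmoid transfer
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(decode(pos[i]))
            if f < pbest_fit[i]:  # update personal, then global, best
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest), gbest_fit


# Toy usage with hypothetical descriptors, rule and error function.
FEATURES = ["a_radius", "b_radius", "electronegativity", "volume", "density"]
RULES = [("volume", "density")]  # assumed to be highly correlated
GAINS = {"electronegativity": 0.3, "volume": 0.2, "density": 0.2}


def toy_error(subset):
    # Stand-in for a cross-validated model error; smaller is better.
    return 1.0 - sum(GAINS.get(f, 0.0) for f in subset) + 0.05 * len(subset)


best, best_fit = binary_pso(FEATURES, make_fitness(toy_error, RULES))
print(best, round(best_fit, 3))
```

In this sketch, raising `penalty` makes the search stricter about selecting features that an NCOR marks as co-occurring, while `penalty=0.0` reduces the fitness to a purely data-driven criterion, which is the contrast the abstract draws.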
Keywords
Correlations, Materials, Machine Learning, Data, Algorithms, Prediction equations
Citation
Liu, Y., Zou, X., Ma, S., Avdeev, M., & Shi, S. (2022). Feature selection method reducing correlations among features by embedding domain knowledge. Acta Materialia, 238, 118195. doi:10.1016/j.actamat.2022.118195