Feature selection method reducing correlations among features by embedding domain knowledge

dc.contributor.author: Liu, Y (en_AU)
dc.contributor.author: Zou, X (en_AU)
dc.contributor.author: Ma, S (en_AU)
dc.contributor.author: Avdeev, M (en_AU)
dc.contributor.author: Shi, SQ (en_AU)
dc.date.accessioned: 2024-12-05T21:27:53Z (en_AU)
dc.date.available: 2024-12-05T21:27:53Z (en_AU)
dc.date.issued: 2022-10-01 (en_AU)
dc.date.statistics: 2024-06-27 (en_AU)
dc.description.abstract: Selecting proper descriptors, also known as features, is one of the key problems in modeling for materials properties using machine learning models. Redundant features reduce accuracy of machine learning modeling, and results of purely data-driven feature selection methods are often inconsistent with materials domain knowledge. Herein, a feature selection method embedded with materials domain knowledge named NCOR-FS is proposed to select higher quality features. The method translates materials domain knowledge about highly correlated features into Non-Co-Occurrence Rules (NCORs), which allows to quantify the degree to which NCORs are violated by feature subsets and to design optimization process for FS method based on swarm intelligence algorithm. Experiments on seven datasets show that compared with multiple other FS methods commonly used in materials, NCOR-FS selects the feature subset with more appropriate number of highly correlated features, which improves the prediction accuracy and interpretability of the ML model. NCOR-FS can be applied to any materials systems, and the idea of embedding domain knowledge into data-driven algorithm is expected to facilitate constructing extensive machine learning models embedded with materials domain knowledge. © 2022 Acta Materialia Inc. Published by Elsevier Ltd. (en_AU)
dc.description.sponsorship: This work was supported by the National Natural Science Foundation of China (Grant No. 52073169), the National Key Research and Development Program of China (Grant No. 2021YFB3802100), and the State Key Program of National Natural Science Foundation of China (Grant No. 61936001). We appreciated the High Performance Computing Center of Shanghai University and Shanghai Engineering Research Center of Intelligent Computing System for providing the computing resources and technical support. (en_AU)
dc.identifier.articlenumber: 118195 (en_AU)
dc.identifier.citation: Liu, Y., Zou, X., Ma, S., Avdeev, M., & Shi, S. (2022). Feature selection method reducing correlations among features by embedding domain knowledge. Acta Materialia, 238, 118195. doi:10.1016/j.actamat.2022.118195 (en_AU)
dc.identifier.issn: 1359-6454 (en_AU)
dc.identifier.journaltitle: Acta Materialia (en_AU)
dc.identifier.uri: https://doi.org/10.1016/j.actamat.2022.118195 (en_AU)
dc.identifier.uri: https://apo.ansto.gov.au/handle/10238/15781 (en_AU)
dc.identifier.volume: 238 (en_AU)
dc.language: English (en_AU)
dc.language.iso: en (en_AU)
dc.publisher: Elsevier (en_AU)
dc.subject: Correlations (en_AU)
dc.subject: Materials (en_AU)
dc.subject: Machine Learning (en_AU)
dc.subject: Data (en_AU)
dc.subject: Algorithms (en_AU)
dc.subject: Prediction equations (en_AU)
dc.title: Feature selection method reducing correlations among features by embedding domain knowledge (en_AU)
dc.type: Journal Article (en_AU)