A general framework to govern machine learning oriented materials data quality

dc.contributor.author: Liu, Y [en_AU]
dc.contributor.author: Yang, ZW [en_AU]
dc.contributor.author: Zou, XX [en_AU]
dc.contributor.author: Lin, YX [en_AU]
dc.contributor.author: Ma, SC [en_AU]
dc.contributor.author: Zuo, W [en_AU]
dc.contributor.author: Zou, Z [en_AU]
dc.contributor.author: Wang, H [en_AU]
dc.contributor.author: Avdeev, M [en_AU]
dc.contributor.author: Shi, SQ [en_AU]
dc.date.accessioned: 2026-05-13T23:58:27Z [en_AU]
dc.date.issued: 2025-09 [en_AU]
dc.date.statistics: 2025-06-25 [en_AU]
dc.description.abstract: Machine learning (ML) is increasingly applied in materials discovery and property prediction, mainly due to its advantage of low-cost and efficient data analysis process. The materials data quality can heavily influence the performance of ML models. However, most current data quality improvement approaches are purely data-driven, neglecting materials domain knowledge and data quality issues latent in the entire process of ML modelling. Here, we address the definition of high-quality data and propose a general framework for ML-oriented MATerials Data Quality Governance incorporating domain knowledge (MAT-DQG), involving nine dimensions defining WHAT materials data quality should be evaluated, lifecycle models guiding WHEN to execute data governance activities in the entire process of ML modelling, and processing models guiding HOW to detect and address issues related to materials data quality. 60 datasets from materials ML studies are assembled to demonstrate potential utility and applications of MAT-DQG, including mining complicated structure-activity relationships in metals, inorganic non-metals, polymers, and composite materials. MAT-DQG identifies and resolves issues in 17 datasets and as a result prediction accuracy improvements of up to 49 % are achieved. Our work lays a foundation for governing ML-oriented materials data and ensuring its reusability and reliability, which advances the frontiers of materials discovery and design. © 2025 Elsevier B.V. All rights are reserved. [en_AU]
dc.identifier.articlenumber: 101050 [en_AU]
dc.identifier.citation: Liu, Y., Yang, Z., Zou, X., Lin, Y., Ma, S., Zuo, W., Zou, Z., Wang, H., Avdeev, M., & Shi, S. (2025). A general framework to govern machine learning oriented materials data quality. Materials Science and Engineering: R: Reports, 166, 101050. doi:10.1016/j.mser.2025.101050 [en_AU]
dc.identifier.issn: 0927-796X [en_AU]
dc.identifier.journaltitle: Materials Science and Engineering: R: Reports [en_AU]
dc.identifier.uri: https://doi.org/10.1016/j.mser.2025.101050 [en_AU]
dc.identifier.uri: https://apo.ansto.gov.au/handle/10238/17216 [en_AU]
dc.identifier.volume: 166 [en_AU]
dc.language: English [en_AU]
dc.language.iso: en [en_AU]
dc.publisher: Elsevier BV [en_AU]
dc.subject: Machine Learning [en_AU]
dc.subject: Materials [en_AU]
dc.subject: Data [en_AU]
dc.subject: Metals [en_AU]
dc.subject: Polymers [en_AU]
dc.subject: Composite materials [en_AU]
dc.subject: Knowledge management [en_AU]
dc.subject: Defects [en_AU]
dc.subject: l [en_AU]
dc.title: A general framework to govern machine learning oriented materials data quality [en_AU]
dc.type: Journal Article [en_AU]
