A general framework to govern machine learning oriented materials data quality

| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Liu, Y | en_AU |
| dc.contributor.author | Yang, ZW | en_AU |
| dc.contributor.author | Zou, XX | en_AU |
| dc.contributor.author | Lin, YX | en_AU |
| dc.contributor.author | Ma, SC | en_AU |
| dc.contributor.author | Zuo, W | en_AU |
| dc.contributor.author | Zou, Z | en_AU |
| dc.contributor.author | Wang, H | en_AU |
| dc.contributor.author | Avdeev, M | en_AU |
| dc.contributor.author | Shi, SQ | en_AU |
| dc.date.accessioned | 2026-05-13T23:58:27Z | en_AU |
| dc.date.issued | 2025-09 | en_AU |
| dc.date.statistics | 2025-06-25 | en_AU |
| dc.description.abstract | Machine learning (ML) is increasingly applied in materials discovery and property prediction, mainly because it offers low-cost, efficient data analysis. The quality of materials data can strongly influence the performance of ML models. However, most current approaches to improving data quality are purely data-driven, neglecting materials domain knowledge and the data quality issues latent throughout the ML modelling process. Here, we address the definition of high-quality data and propose a general framework for ML-oriented MATerials Data Quality Governance incorporating domain knowledge (MAT-DQG). It comprises nine dimensions defining WHAT aspects of materials data quality should be evaluated, lifecycle models guiding WHEN to execute data governance activities across the entire ML modelling process, and processing models guiding HOW to detect and address materials data quality issues. Sixty datasets from materials ML studies are assembled to demonstrate the potential utility and applications of MAT-DQG, including mining complicated structure-activity relationships in metals, inorganic non-metals, polymers, and composite materials. MAT-DQG identifies and resolves issues in 17 of these datasets, improving prediction accuracy by up to 49%. Our work lays a foundation for governing ML-oriented materials data and ensuring its reusability and reliability, which advances the frontiers of materials discovery and design. © 2025 Elsevier B.V. All rights are reserved. | en_AU |
| dc.identifier.articlenumber | 101050 | en_AU |
| dc.identifier.citation | Liu, Y., Yang, Z., Zou, X., Lin, Y., Ma, S., Zuo, W., Zou, Z., Wang, H., Avdeev, M., & Shi, S. (2025). A general framework to govern machine learning oriented materials data quality. Materials Science and Engineering: R: Reports, 166, 101050. doi:10.1016/j.mser.2025.101050 | en_AU |
| dc.identifier.issn | 0927-796X | en_AU |
| dc.identifier.journaltitle | Materials Science and Engineering: R: Reports | en_AU |
| dc.identifier.uri | https://doi.org/10.1016/j.mser.2025.101050 | en_AU |
| dc.identifier.uri | https://apo.ansto.gov.au/handle/10238/17216 | en_AU |
| dc.identifier.volume | 166 | en_AU |
| dc.language | English | en_AU |
| dc.language.iso | en | en_AU |
| dc.publisher | Elsevier BV | en_AU |
| dc.subject | Machine Learning | en_AU |
| dc.subject | Materials | en_AU |
| dc.subject | Data | en_AU |
| dc.subject | Metals | en_AU |
| dc.subject | Polymers | en_AU |
| dc.subject | Composite materials | en_AU |
| dc.subject | Knowledge management | en_AU |
| dc.subject | Defects | en_AU |
| dc.title | A general framework to govern machine learning oriented materials data quality | en_AU |
| dc.type | Journal Article | en_AU |