Data quantity governance for machine learning in materials science
dc.contributor.author | Liu, Y | en_AU |
dc.contributor.author | Yang, ZW | en_AU |
dc.contributor.author | Zou, XX | en_AU |
dc.contributor.author | Ma, S | en_AU |
dc.contributor.author | Liu, D | en_AU |
dc.contributor.author | Avdeev, M | en_AU |
dc.contributor.author | Shi, S | en_AU |
dc.date.accessioned | 2024-02-28T00:49:48Z | en_AU |
dc.date.available | 2024-02-28T00:49:48Z | en_AU |
dc.date.issued | 2023-05-31 | en_AU |
dc.date.statistics | 2024-02-28 | en_AU |
dc.description.abstract | Data-driven machine learning (ML) is widely employed in the analysis of materials structure–activity relationships, performance optimization and materials design due to its superior ability to reveal latent data patterns and make accurate predictions. However, because of the laborious process of materials data acquisition, ML models encounter a mismatch between a high-dimensional feature space and a small sample size (for traditional ML models) or a mismatch between the number of model parameters and the sample size (for deep-learning models), usually resulting in poor performance. Here, we review the efforts for tackling this issue via feature reduction, sample augmentation and specific ML approaches, and show that the balance between the number of samples and the number of features or model parameters deserves close attention during data quantity governance. Following this, we propose a synergistic data quantity governance flow with the incorporation of materials domain knowledge. After summarizing the approaches to incorporating materials domain knowledge into the process of ML, we provide examples of incorporating domain knowledge into governance schemes to demonstrate the advantages of the approach and applications. The work paves the way for obtaining the required high-quality data to accelerate materials design and discovery based on ML. © The Author(s) 2023. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. | en_AU |
dc.description.sponsorship | This work was supported in part by the National Natural Science Foundation of China (92270124 and 52073169), the National Key Research and Development Program of China (2021YFB3802101) and the Key Research Project of Zhejiang Laboratory (2021PE0AC02). | en_AU |
dc.format.medium | Electronic-eCollection | en_AU |
dc.identifier.articlenumber | nwad125 | en_AU |
dc.identifier.citation | Liu, Y., Yang, Z., Zou, X., Ma, S., Liu, D., Avdeev, M., & Shi, S. (2023). Data quantity governance for machine learning in materials science. National Science Review, 10(7), nwad125. doi:10.1093/nsr/nwad125 | en_AU |
dc.identifier.issn | 2095-5138 | en_AU |
dc.identifier.issn | 2053-714X | en_AU |
dc.identifier.issue | 7 | en_AU |
dc.identifier.journaltitle | National Science Review | en_AU |
dc.identifier.uri | http://dx.doi.org/10.1093/nsr/nwad125 | en_AU |
dc.identifier.uri | https://doi.org/10.1093/nsr/nwad125 | en_AU |
dc.identifier.uri | https://apo.ansto.gov.au/handle/10238/15465 | en_AU |
dc.identifier.volume | 10 | en_AU |
dc.language | English | en_AU |
dc.language.iso | en | en_AU |
dc.publisher | Oxford University Press (OUP) | en_AU |
dc.subject | Data | en_AU |
dc.subject | Machine Learning | en_AU |
dc.subject | Materials | en_AU |
dc.subject | Prediction equations | en_AU |
dc.subject | Augmentation | en_AU |
dc.subject | Composite materials | en_AU |
dc.subject | Sampling | en_AU |
dc.title | Data quantity governance for machine learning in materials science | en_AU |
dc.type | Journal Article | en_AU |
dcterms.dateAccepted | 2023-04-26 | en_AU |