Data quantity governance for machine learning in materials science

dc.contributor.author | Liu, Y | en_AU
dc.contributor.author | Yang, ZW | en_AU
dc.contributor.author | Zou, XX | en_AU
dc.contributor.author | Ma, S | en_AU
dc.contributor.author | Liu, D | en_AU
dc.contributor.author | Avdeev, M | en_AU
dc.contributor.author | Shi, S | en_AU
dc.date.accessioned | 2024-02-28T00:49:48Z | en_AU
dc.date.available | 2024-02-28T00:49:48Z | en_AU
dc.date.issued | 2023-05-31 | en_AU
dc.date.statistics | 2024-02-28 | en_AU
dc.description.abstract | Data-driven machine learning (ML) is widely employed in the analysis of materials structure–activity relationships, performance optimization and materials design due to its superior ability to reveal latent data patterns and make accurate prediction. However, because of the laborious process of materials data acquisition, ML models encounter the issue of the mismatch between a high dimension of feature space and a small sample size (for traditional ML models) or the mismatch between model parameters and sample size (for deep-learning models), usually resulting in terrible performance. Here, we review the efforts for tackling this issue via feature reduction, sample augmentation and specific ML approaches, and show that the balance between the number of samples and features or model parameters should attract great attention during data quantity governance. Following this, we propose a synergistic data quantity governance flow with the incorporation of materials domain knowledge. After summarizing the approaches to incorporating materials domain knowledge into the process of ML, we provide examples of incorporating domain knowledge into governance schemes to demonstrate the advantages of the approach and applications. The work paves the way for obtaining the required high-quality data to accelerate materials design and discovery based on ML. © The Author(s) 2023. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. | en_AU
dc.description.sponsorship | This work was supported in part by the National Natural Science Foundation of China (92270124 and 52073169), the National Key Research and Development Program of China (2021YFB3802101) and the Key Research Project of Zhejiang Laboratory (2021PE0AC02). | en_AU
dc.format.medium | Electronic-eCollection | en_AU
dc.identifier.articlenumber | nwad125 | en_AU
dc.identifier.citation | Liu, Y., Yang, Z., Zou, X., Ma, S., Liu, D., Avdeev, M., & Shi, S. (2023). Data quantity governance for machine learning in materials science. National Science Review, 10(7), nwad125. doi:10.1093/nsr/nwad125 | en_AU
dc.identifier.issn | 2095-5138 | en_AU
dc.identifier.issn | 2053-714X | en_AU
dc.identifier.issue | 7 | en_AU
dc.identifier.journaltitle | National Science Review | en_AU
dc.identifier.uri | http://dx.doi.org/10.1093/nsr/nwad125 | en_AU
dc.identifier.uri | https://doi.org/10.1093/nsr/nwad125 | en_AU
dc.identifier.uri | https://apo.ansto.gov.au/handle/10238/15465 | en_AU
dc.identifier.volume | 10 | en_AU
dc.language | English | en_AU
dc.language.iso | en | en_AU
dc.publisher | Oxford University Press (OUP) | en_AU
dc.subject | Data | en_AU
dc.subject | Machine Learning | en_AU
dc.subject | Materials | en_AU
dc.subject | Prediction equations | en_AU
dc.subject | Augmentation | en_AU
dc.subject | Composite materials | en_AU
dc.subject | Sampling | en_AU
dc.title | Data quantity governance for machine learning in materials science | en_AU
dc.type | Journal Article | en_AU
dcterms.dateAccepted | 2023-04-26 | en_AU
Files
Original bundle
Name: Data quantity governance for machine learning in materials science.pdf
Size: 498.76 KB
Format: Adobe Portable Document Format

Name: nwad125.pdf
Size: 501.76 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.66 KB
Format: Plain Text