Data quantity governance for machine learning in materials science

Liu, Y; Yang, ZW; Zou, XX; Ma, S; Liu, D; Avdeev, M; Shi, S

dc.contributor.author	Liu, Y	en_AU
dc.contributor.author	Yang, ZW	en_AU
dc.contributor.author	Zou, XX	en_AU
dc.contributor.author	Ma, S	en_AU
dc.contributor.author	Liu, D	en_AU
dc.contributor.author	Avdeev, M	en_AU
dc.contributor.author	Shi, S	en_AU
dc.date.accessioned	2024-02-28T00:49:48Z	en_AU
dc.date.available	2024-02-28T00:49:48Z	en_AU
dc.date.issued	2023-05-31	en_AU
dc.date.statistics	2024-02-28	en_AU
dc.description.abstract	Data-driven machine learning (ML) is widely employed in the analysis of materials structure–activity relationships, performance optimization and materials design due to its superior ability to reveal latent data patterns and make accurate predictions. However, because of the laborious process of materials data acquisition, ML models encounter a mismatch between a high-dimensional feature space and a small sample size (for traditional ML models) or a mismatch between the number of model parameters and the sample size (for deep-learning models), usually resulting in poor performance. Here, we review efforts to tackle this issue via feature reduction, sample augmentation and specific ML approaches, and show that the balance between the number of samples and the number of features or model parameters deserves close attention during data quantity governance. Following this, we propose a synergistic data quantity governance flow that incorporates materials domain knowledge. After summarizing the approaches to incorporating materials domain knowledge into the ML process, we provide examples of incorporating domain knowledge into governance schemes to demonstrate the advantages and applications of the approach. The work paves the way for obtaining the high-quality data required to accelerate ML-based materials design and discovery. © The Author(s) 2023. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.	en_AU
dc.description.sponsorship	This work was supported in part by the National Natural Science Foundation of China (92270124 and 52073169), the National Key Research and Development Program of China (2021YFB3802101) and the Key Research Project of Zhejiang Laboratory (2021PE0AC02).	en_AU
dc.format.medium	Electronic-eCollection	en_AU
dc.identifier.articlenumber	nwad125	en_AU
dc.identifier.citation	Liu, Y., Yang, Z., Zou, X., Ma, S., Liu, D., Avdeev, M., & Shi, S. (2023). Data quantity governance for machine learning in materials science. National Science Review, 10(7), nwad125. doi:10.1093/nsr/nwad125	en_AU
dc.identifier.issn	2095-5138	en_AU
dc.identifier.issn	2053-714X	en_AU
dc.identifier.issue	7	en_AU
dc.identifier.journaltitle	National Science Review	en_AU
dc.identifier.uri	http://dx.doi.org/10.1093/nsr/nwad125	en_AU
dc.identifier.uri	https://doi.org/10.1093/nsr/nwad125	en_AU
dc.identifier.uri	https://apo.ansto.gov.au/handle/10238/15465	en_AU
dc.identifier.volume	10	en_AU
dc.language	English	en_AU
dc.language.iso	en	en_AU
dc.publisher	Oxford University Press (OUP)	en_AU
dc.subject	Data	en_AU
dc.subject	Machine Learning	en_AU
dc.subject	Materials	en_AU
dc.subject	Prediction equations	en_AU
dc.subject	Augmentation	en_AU
dc.subject	Composite materials	en_AU
dc.subject	Sampling	en_AU
dc.title	Data quantity governance for machine learning in materials science	en_AU
dc.type	Journal Article	en_AU
dcterms.dateAccepted	2023-04-26	en_AU

Files

Original bundle

Name: Data quantity governance for machine learning in materials science.pdf
Size: 498.76 KB
Format: Adobe Portable Document Format

Name: nwad125.pdf
Size: 501.76 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.66 KB
Format: Plain Text

Collections

Journal Articles