An automatic descriptors recognizer customized for materials science literature

dc.contributor.author: Liu, Y [en_AU]
dc.contributor.author: Ge, X [en_AU]
dc.contributor.author: Yang, Z [en_AU]
dc.contributor.author: Sun, S [en_AU]
dc.contributor.author: Liu, D [en_AU]
dc.contributor.author: Avdeev, M [en_AU]
dc.contributor.author: Shi, SH [en_AU]
dc.date.accessioned: 2024-12-13T04:32:04Z [en_AU]
dc.date.available: 2024-12-13T04:32:04Z [en_AU]
dc.date.issued: 2022-10 [en_AU]
dc.date.statistics: 2024-04-11 [en_AU]
dc.description.abstract: Materials science literature contains domain knowledge about numerous descriptors, which play a critical role in data-driven materials design. However, automatically extracting descriptors from literature remains challenging. Here, we develop an automatic descriptors recognizer based on natural language processing (NLP) to mine latent descriptors. It consists of a conditional data augmentation model incorporating materials domain knowledge (cDA-DK) and coarse- and fine-grained descriptor subrecognizers (CGDR and FGDR). cDA-DK augments the training data of the text mining model, which can significantly reduce the cost of manual labeling and enhance the model's robustness. On this basis, CGDR recognizes coarse-grained descriptor entities automatically, and FGDR screens fine-grained descriptors related to a specific materials design task. The activation energy of NASICON-type solid electrolytes, which is influenced by complicated descriptors, is then taken as an example to demonstrate the potential utility of our recognizer. CGDR extracts 106896 descriptor entities from 1808 relevant articles with an accuracy (F1) of 0.87. Furthermore, with features from 408 descriptors screened by FGDR, six activation energy prediction models are constructed to perform experiments, achieving an optimal prediction performance (R²) of 0.96. This work provides important insight into structure-activity relationships, thus promoting materials design and discovery. © 2022 Elsevier B.V. [en_AU]
dc.description.sponsorship: This work was supported in part by the National Key Research and Development Program of China (Grant No. 2021YFB3802100), the National Natural Science Foundation of China (Grant No. 52073169), and the State Key Program of National Natural Science Foundation of China (Grant No. 61936001). We appreciated the High Performance Computing Center of Shanghai University and Shanghai Engineering Research Center of Intelligent Computing System for providing the computing resources and technical support. [en_AU]
dc.identifier.articlenumber: 231946 [en_AU]
dc.identifier.citation: Liu, Y., Ge, X., Yang, Z., Sun, S., Liu, D., Avdeev, M., & Shi, S. (2022). An automatic descriptors recognizer customized for materials science literature. Journal of Power Sources, 545, 231946. doi:10.1016/j.jpowsour.2022.231946 [en_AU]
dc.identifier.issn: 0378-7753 [en_AU]
dc.identifier.journaltitle: Journal of Power Sources [en_AU]
dc.identifier.pagination: 231946- [en_AU]
dc.identifier.uri: https://doi.org/10.1016/j.jpowsour.2022.231946 [en_AU]
dc.identifier.uri: https://apo.ansto.gov.au/handle/10238/15828 [en_AU]
dc.identifier.volume: 545 [en_AU]
dc.language: English [en_AU]
dc.language.iso: en [en_AU]
dc.publisher: Elsevier [en_AU]
dc.subject: Data [en_AU]
dc.subject: Materials [en_AU]
dc.subject: Numerical data [en_AU]
dc.subject: Programming languages [en_AU]
dc.subject: Text Devices [en_AU]
dc.subject: Design [en_AU]
dc.subject: Solid Electrolytes [en_AU]
dc.subject: Pattern recognition [en_AU]
dc.title: An automatic descriptors recognizer customized for materials science literature [en_AU]
dc.type: Journal Article [en_AU]