An automatic descriptors recognizer customized for materials science literature
dc.contributor.author | Liu, Y | en_AU |
dc.contributor.author | Ge, X | en_AU |
dc.contributor.author | Yang, Z | en_AU |
dc.contributor.author | Sun, S | en_AU |
dc.contributor.author | Liu, D | en_AU |
dc.contributor.author | Avdeev, M | en_AU |
dc.contributor.author | Shi, SH | en_AU |
dc.date.accessioned | 2024-12-13T04:32:04Z | en_AU |
dc.date.available | 2024-12-13T04:32:04Z | en_AU |
dc.date.issued | 2022-10 | en_AU |
dc.date.statistics | 2024-04-11 | en_AU |
dc.description.abstract | Materials science literature contains domain knowledge about numerous descriptors, which play a critical role in data-driven materials design. However, automatically extracting descriptors from literature remains challenging. Here, we develop an automatic descriptors recognizer based on natural language processing (NLP) to mine latent descriptors. It consists of a conditional data augmentation model incorporating materials domain knowledge (cDA-DK) and coarse- and fine-grained descriptor subrecognizers (CGDR and FGDR). cDA-DK augments the training data of the text mining model, which can significantly reduce the cost of manual labeling and enhance the model's robustness. On this basis, CGDR recognizes coarse-grained descriptor entities automatically, and FGDR screens fine-grained descriptors related to a specific materials design task. The activation energy of NASICON-type solid electrolytes, which is influenced by complicated descriptors, is then taken as an example to demonstrate the potential utility of our recognizer. CGDR extracts 106896 descriptor entities from 1808 relevant articles with an accuracy (F1) of 0.87. Furthermore, with features from 408 descriptors screened by FGDR, six activation energy prediction models are constructed, achieving an optimal prediction performance (R²) of 0.96. This work provides important insight towards the understanding of structure-activity relationships, thus promoting materials design and discovery. © 2022 Elsevier B.V. | en_AU |
dc.description.sponsorship | This work was supported in part by the National Key Research and Development Program of China (Grant No. 2021YFB3802100), the National Natural Science Foundation of China (Grant No. 52073169), and the State Key Program of National Natural Science Foundation of China (Grant No. 61936001). We thank the High Performance Computing Center of Shanghai University and the Shanghai Engineering Research Center of Intelligent Computing System for providing computing resources and technical support. | en_AU |
dc.identifier.articlenumber | 231946 | en_AU |
dc.identifier.citation | Liu, Y., Ge, X., Yang, Z., Sun, S., Liu, D., Avdeev, M., & Shi, S. (2022). An automatic descriptors recognizer customized for materials science literature. Journal of Power Sources, 545, 231946. doi:10.1016/j.jpowsour.2022.231946 | en_AU |
dc.identifier.issn | 0378-7753 | en_AU |
dc.identifier.journaltitle | Journal of Power Sources | en_AU |
dc.identifier.pagination | 231946- | en_AU |
dc.identifier.uri | https://doi.org/10.1016/j.jpowsour.2022.231946 | en_AU |
dc.identifier.uri | https://apo.ansto.gov.au/handle/10238/15828 | en_AU |
dc.identifier.volume | 545 | en_AU |
dc.language | English | en_AU |
dc.language.iso | en | en_AU |
dc.publisher | Elsevier | en_AU |
dc.subject | Data | en_AU |
dc.subject | Materials | en_AU |
dc.subject | Numerical data | en_AU |
dc.subject | Programming languages | en_AU |
dc.subject | Text Devices | en_AU |
dc.subject | Design | en_AU |
dc.subject | Solid Electrolytes | en_AU |
dc.subject | Pattern recognition | en_AU |
dc.title | An automatic descriptors recognizer customized for materials science literature | en_AU |
dc.type | Journal Article | en_AU |