ABSTRACTS OF ARTICLES OF THE JOURNAL "INFORMATION TECHNOLOGIES".
No. 6. Vol. 31. 2025
DOI: 10.17587/it.31.283-290
E. N. Kryuchkova, Ph.D., professor, E. V. Vopilova, Postgraduate Student,
Polzunov Altai State Technical University, Barnaul, Russia
Algorithms of Automatic Construction of Hierarchical Model of Scientific Field Based on Clustering of Semantic Graphs of Scientific Terminology
Received on 11.04.2024
Accepted on 21.05.2024
The article proposes a model of a scientific thesaurus, represented as a hierarchical domain graph that links a scientific field to its terminology. The training data for building the model are semi-structured scientific texts, including subject-specific scientific encyclopedias. We propose algorithms for calculating the significance of semantic relations between scientific terms. The semantic model of a scientific publication combines the knowledge stored in the thesaurus with statistics on term usage in the publication text. The experiments on analyzing scientific publications presented in this paper used the domain semantic graph "Mathematics", built by automatically processing the text of the five-volume Mathematical Encyclopedia.
Keywords: aspect-oriented analysis, scientific vocabulary, semantic graph, classification of scientific text, automatic processing of unstructured texts
P. 283-290