Kazakhstan has completed training the KazLLM large language model on 148 billion tokens in Kazakh, English, Russian, and Turkish, the Kapital.kz business information center reports, citing the press service of the Ministry of Digital Development, Innovations, and Aerospace Industry.
“The model was developed by the team at the Institute of Smart Systems and Artificial Intelligence (ISSAI) at Nazarbayev University with the support and coordination of the Ministry of Digital Development, Innovations, and Aerospace Industry of Kazakhstan and the Ministry of Education and Science of Kazakhstan. It will be accessible to a wide range of users, including the scientific community, startups, and large corporations. In line with the initiative of the head of state, KazLLM will serve as the foundation for a larger project, TurkLLM, aimed at advancing natural language processing technologies in the Turkic-speaking region. A corresponding agreement was signed at the recent summit of the Organization of Turkic States (OTS). The project will mark an important milestone in the creation of a national AI infrastructure and affirm Kazakhstan's status as a technological leader in the region,” the Ministry of Digital Development stated.
The project has not only produced an advanced artificial intelligence tool but also helped build expertise and develop human capital in the field of artificial intelligence.
Linguistic institutes and research and production organizations, including Til Kazyna, JSC “NIT”, and Maqsut Narikbayev University, contributed to the project.
“The launch of the open-source KazLLM model represents a significant step forward in the development of Kazakhstan's artificial intelligence ecosystem. This initiative reflects our commitment to supporting innovation and promoting scientific achievements that contribute to technological progress. I am confident that this advanced model will help overcome digital inequality by providing accessible and inclusive digital services for every Kazakhstani,” noted Minister Jaslan Madiyev.
The model was trained on a corpus of 148 billion tokens. Two versions have been created, with 8 billion and 70 billion parameters. According to the ministry, they serve as a foundation for developing new artificial intelligence products and outperform comparable models in quality and accuracy.
In the first phase, KazLLM will be available to developers, startups, and companies to encourage the creation of products and services based on it. Detailed instructions have been prepared to help them integrate the model into their projects quickly.
“This model reflects Kazakhstan's commitment to innovation, independence, and the growth of its technological ecosystem. Our team has prepared two versions of KazLLM, with 8 billion and 70 billion parameters, built on the Meta Llama architecture and optimized both for high-performance systems and for environments with limited resources. As a result, developers will be able to download and run our model on powerful servers as well as on laptops,” said Professor Huseyin Atakan Varol, Director of the Institute of Smart Systems and Artificial Intelligence (ISSAI) at Nazarbayev University.
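To give a sense of why the two sizes target such different hardware, the following is a rough back-of-the-envelope sketch in Python, not a figure from the developers. It estimates only the memory needed to store the weights, as parameter count multiplied by bytes per parameter. The precisions listed are common storage formats and are an assumption; the article does not say in which formats the models are distributed.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumptions (not from the article): the precisions below are common
# storage formats; runtime overhead (activations, KV cache) is ignored.

BYTES_PER_PARAM = {
    "16-bit": 2.0,
    "8-bit": 1.0,
    "4-bit": 0.5,
}

# Parameter counts reported for the two KazLLM versions.
MODEL_SIZES = {"8B": 8e9, "70B": 70e9}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight size in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def estimate_all() -> dict:
    """Return {(model, precision): size_in_gb} for every combination."""
    return {
        (model, precision): weight_memory_gb(params, nbytes)
        for model, params in MODEL_SIZES.items()
        for precision, nbytes in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for (model, precision), size_gb in estimate_all().items():
        print(f"{model} at {precision}: about {size_gb:.0f} GB of weights")
```

Under these assumptions, the 8-billion-parameter version needs roughly 16 GB for its weights at 16-bit precision and about 4 GB at 4-bit, which is within reach of a well-equipped laptop, while the 70-billion-parameter version needs about 140 GB at 16-bit precision and is realistically a server workload. Actual memory use will be higher once runtime overhead is included.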
Key partners in the creation of the national language model include Beeline Kazakhstan and its IT company QazCode. Drawing on their combined experience in developing language models such as Kaz-RoBERTa and in building AI solutions for small language groups in partnership with foreign organizations, these companies played an important role in creating an innovative and accessible model for Kazakhstani users. Their support in the form of computing power from 8 DGX H100 servers significantly accelerated the training process and expanded the model's capabilities.
For comparison: a regular computer takes several days to analyze an archive of 1 million photos, while the 8 DGX H100 servers used to train KazLLM can accomplish this task in just a few seconds.
“Our team was actively involved in the development and training of the KazLLM model. The complex process, which included building a model that accounts for the specifics of the Kazakh language and 50 days of computation, improved the model's understanding of context and ensured high-quality interaction with users. Testing showed that the model solves technical tasks effectively while taking cultural nuances into account. We are confident that KazLLM will become an important tool for all of Kazakhstan, helping to overcome the digital language barrier and improve the quality of digital services in the region,” commented QazCode CEO Alexey Sharavar.
KazLLM is a modern artificial intelligence language model designed for processing, analyzing, and generating texts in the Kazakh language. This unique development aims to promote the use of the Kazakh language in the digital space, supporting business, science, and society. It is capable of performing a wide range of tasks, from translation and document processing to automating communication.
The national model will enable businesses to build chatbots and customer support systems, automate document workflows, and analyze data. For instance, local banks will be able to speed up the processing of requests in the Kazakh language, while retailers will be able to improve the customer experience by integrating the model into their processes. Educational and scientific institutions will be able to create applications for learning the Kazakh language, as well as tools for analyzing scientific texts and assisting students. Media and content producers will be able to generate news, improve translation quality, and create writing tools.