Reference:
Shabrova, A.S., Knyazev, M.A., & Kolesnikov, A.V. (2025). Dynamic RACH-Slot Allocation for Collision Minimization in NB-IoT Networks Based on Reinforcement Learning Algorithms. Software Systems and Computational Methods, 2, 1–11. https://doi.org/10.7256/2454-0714.2025.2.73848
Dynamic RACH-Slot Allocation for Collision Minimization in NB-IoT Networks Based on Reinforcement Learning Algorithms
DOI: 10.7256/2454-0714.2025.2.73848
EDN: TPWJOS
Received: 27-03-2025
Published: 10-04-2025

Abstract: The subject of this research is adaptive management of access to Random Access Channels (RACH) in Narrowband Internet of Things (NB-IoT) networks, which frequently face congestion due to high device density and limited channel capacity. The study focuses on the practical application of Reinforcement Learning algorithms, specifically Q-learning and Deep Q-Network (DQN), to address this issue. The authors examine the problem of RACH overload and the resulting collisions, which cause data transmission delays and increased energy consumption in connected devices. The article analyzes the limitations of traditional static slot allocation methods and justifies the need for a dynamic, learning-based approach capable of adapting to constantly changing network conditions. The research aims to minimize collision rates, improve connection success rates, and reduce the overall energy consumption of NB-IoT devices. The methodology combines machine learning methods (Q-learning and DQN) with simulation modeling in the NS-3 environment, integrating a dedicated RL agent for dynamic, intelligent RACH slot allocation. The main conclusions highlight the demonstrated effectiveness of the adaptive RL-based approach for optimizing access to communication slots in NB-IoT networks. The scientific novelty lies in the development and integration of a specialized RL agent capable of dynamically managing slot distribution based on real-time network conditions. With the proposed approach, the number of collisions was reduced by 74%, the number of successful connections increased by 16%, and the energy efficiency of the devices improved by 15% in comparison with traditional static methods.
These results demonstrate the practical applicability and scalability of adaptive RL-based management techniques for enhancing both the performance and reliability of real-world NB-IoT networks.

Keywords: NB-IoT, Reinforcement Learning, Q-learning, DQN, collisions, NS-3, RACH, IoT, Internet of Things

INTRODUCTION

Modern trends in digital transformation have led to an intensive increase in the number of Internet of Things (IoT) devices, which are actively used in areas ranging from household appliances to industrial and infrastructure solutions. One of the most widely demanded technologies for connecting such devices is the Narrowband Internet of Things (NB-IoT) standard, characterized by low data transfer rates and high energy efficiency. However, growing device density has complicated the operation of the Random Access Channel (RACH) procedures that establish the initial connection between devices and base stations [4].

Traditional static methods of allocating the time slots used in RACH procedures no longer meet modern requirements because they adapt poorly to dynamic load changes and to the unpredictable behavior of large numbers of devices [5]. The consequence is a growing number of collisions, increased data transmission delays, and higher device power consumption, which together degrade overall network performance and reliability [3]. There is therefore a need for new adaptive mechanisms that can respond quickly to changing load conditions and effectively minimize the likelihood of collisions [6]. One promising direction is the use of Reinforcement Learning (RL) algorithms, which allow systems to make optimal decisions based on experience accumulated through interaction with the environment [8].
The purpose of this study is to develop and experimentally validate the use of RL algorithms (Q-learning and DQN) for dynamic control of RACH time slot allocation in NB-IoT networks. The paper compares the proposed approach against the traditional static method using operational metrics: the number of collisions, device connection success rate, data transmission delay, and energy consumption. The results can be used to optimize the operation of real NB-IoT networks, increase their performance and energy efficiency, and serve as a basis for further research in adaptive and intelligent management of IoT networks.
CURRENT STATE OF THE TECHNOLOGY

The Narrowband Internet of Things (NB-IoT) is a specialized standard developed by the 3rd Generation Partnership Project (3GPP) consortium, designed to transfer small amounts of data with high energy efficiency and low bandwidth requirements [9]. The technology can serve a large number of devices even under difficult coverage conditions, such as basements and remote rural areas, which makes the collision problem all the more pressing. Minimizing collisions in NB-IoT networks is actively studied by many researchers because of the growing number of devices connected to IoT networks [1].

In world practice, both algorithmic and technological methods are used to minimize collisions. One traditional approach modifies random access algorithms by integrating mechanisms for dynamically managing retry intervals and redistributing time slots. In recent years, however, special attention has been paid to machine learning methods, in particular Reinforcement Learning (RL), for solving the overload problem and minimizing collisions. RL is a machine learning method in which an agent learns by interacting with the environment, making decisions based on accumulated experience in order to maximize the reward received [8]. Popular algorithms in this area are Q-learning and Deep Q-Network (DQN), which make it possible to adapt network resource management strategies to the current state and load dynamics. The study "Transmission Control in NB-IoT With Model-Based Reinforcement Learning" confirms the effectiveness of using RL for adaptive control of access slots, significantly reducing the likelihood of collisions [7]. The paper "Analysis of the Effect of the Reliability of the NB-IoT Network on the Intelligent System" provides a detailed analysis and specific recommendations for reducing delays and collisions using similar approaches [2].
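The tabular Q-learning rule that these works build on updates a value table toward the observed reward plus the discounted best next-state value. A minimal sketch (the learning rate and discount values below are illustrative assumptions, not taken from the cited papers):

```python
# Minimal tabular Q-learning update (illustrative; ALPHA and GAMMA are assumed values).
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed)
GAMMA = 0.9   # discount factor (assumed)

Q = defaultdict(float)  # maps (state, action) -> estimated value

def q_update(state, action, reward, next_state, actions):
    """One Bellman update: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Example: a single update against a zero-initialized table.
q_update(state=0, action=1, reward=1.0, next_state=0, actions=[0, 1])
print(Q[(0, 1)])  # 0.1 * (1.0 + 0.9 * 0 - 0) = 0.1
```

DQN replaces the table Q with a neural network approximator, which is what makes the method usable when the state space (network load patterns) is too large to enumerate.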
A multi-agent approach based on distributed network resource management, where multiple agents jointly make decisions while adapting to load changes and environmental uncertainty, is also considered promising. Despite the variety of proposed methods, creating universal, scalable, and highly efficient solutions suitable for real-world operation of NB-IoT networks remains an open task. In this context, the RL-based approach presented in this study is a promising direction for effective, adaptive management of network resources and for minimizing the number of collisions.

TASKS AND TOOLS

To achieve the research goal, the following tasks were formulated and solved:
- analysis of the current state of NB-IoT technology and identification of the main problems associated with random access procedures;
- design of an NB-IoT network model consisting of three different types of devices reflecting the specifics of their application;
- simulation of network operation and device behavior using the NS-3 discrete event simulator;
- development of an RL agent for dynamic management of access time slot allocation;
- a series of experiments with and without the RL agent (i.e., with the traditional static approach);
- recording and processing of the experimental results;
- comparative analysis of the results in terms of the number of collisions, connection success, data transmission delay, and device power consumption;
- formulation of conclusions about the advantages and effectiveness of the proposed approach.

The study used the NS-3 discrete event network simulator, a widely used open-source platform written in C++ with support for Python module integration.
NS-3 made it possible to implement a detailed model of the NB-IoT network reflecting realistic operating conditions, including various device types and traffic generation patterns. Thanks to the high fidelity of the simulation, a series of runs was carried out to collect and analyze the data presented below.
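To illustrate the contention problem the simulation measures, consider a toy slotted random-access round. This is not the authors' NS-3 model; the device count, slot count, and uniform slot choice are assumptions chosen to mirror the scenario described later in the paper:

```python
# Toy slotted random access: N devices each pick one of K slots uniformly;
# any slot chosen by 2+ devices is a collision for all devices in it.
# Illustrative sketch only -- not the paper's NS-3 model.
import random
from collections import Counter

def simulate_round(n_devices=50, n_slots=5, rng=random.Random(42)):
    choices = [rng.randrange(n_slots) for _ in range(n_devices)]
    per_slot = Counter(choices)
    collided = sum(c for c in per_slot.values() if c > 1)
    return collided, n_devices - collided  # (collided, succeeded)

collided, ok = simulate_round()
print(f"collided={collided}, succeeded={ok}")
```

With far more devices than slots, nearly every attempt collides in a single round, which is why static allocation degrades at high density and why the paper adds backoff and, ultimately, learned slot assignment.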
NB-IOT NETWORK MODEL DEVELOPMENT AND RL AGENT INTEGRATION
To develop a detailed model of the NB-IoT network in the NS-3 environment, several successive modeling and configuration stages were performed. At the initial stage, a set of network nodes was formed: 50 independent nodes corresponding to NB-IoT devices were created, plus a separate node performing the functions of a base station. The nodes were combined into a single point-to-point network topology, in which each device interacts directly with the base station through a separate communication channel [11].

To verify the correctness of the model and to demonstrate the data exchange processes, the NetAnim visualization tool integrated into NS-3 was used. This tool displays the network topology and the locations of the devices and base station in real time, and shows the dynamic processes of node interaction and packet exchange (see Figure 1).

Figure 1 – Visualization of the network topology

The communication channels between the nodes were configured with a high bandwidth of 1 Gbit/s and a minimal data transmission delay of 1 ms. These parameters were chosen to minimize the influence of channel characteristics on the simulation results and to keep the focus on the RACH mechanism. After the channel configuration was complete, an IPv4 protocol stack was installed on all nodes, ensuring correct transmission of data packets over the network.

A specially developed DynamicSlotManager component was used to implement the RACH random access mechanism. This component simulates the actual behavior of NB-IoT devices when contending for slots. In the basic model, five RACH time slots of 10 ms each were allocated.
The devices attempted to access the slots and, in case of a collision, went into standby mode, retransmitting the packet after a random backoff interval. To model the behavior of NB-IoT devices more accurately, a specialized network application, NbIotDeviceApp, was developed as a separate class in the NS-3 simulator. This application simulates three types of devices that differ in the mode and frequency of data transmission:
- periodic devices that send data at fixed intervals (every 60 seconds);
- sporadic devices that send data packets randomly and with unpredictable frequency, typical of emergency sensors and event-driven devices;
- low-priority devices characterized by irregular transmission and significant delays.

The nodes were distributed across these types as follows: 30% periodic, 40% sporadic, and 30% low-priority devices. The data packets sent by the devices were 50 bytes in size. The energy cost of sending one data packet was modeled as 0.2 J per device, with an initial energy reserve of 10 J.

To integrate the reinforcement learning (RL) algorithm into the simulation model, an external Python module, rl_agent.py, was developed, implementing the RL logic using the Q-learning and Deep Q-Network (DQN) methods. Interaction between the NS-3 simulator and the external Python agent was provided through an integration layer on the NS-3 side built on the Python C API. This layer called the RL agent's functions [10], passed it the current network state (the number of collisions, the numbers of successful and unsuccessful connection attempts, the current slot allocation), and received back a new, optimized allocation of access slots.
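The interaction loop described above can be sketched as follows. This is a simplified stand-in for the paper's rl_agent.py: the state encoding, action set, reward shape, and all hyperparameter values are assumptions for illustration, not the authors' implementation:

```python
# Simplified epsilon-greedy Q-learning slot allocator (stand-in for rl_agent.py;
# state discretization, action meaning, and reward shape are all assumed).
import random
from collections import defaultdict

class SlotAgent:
    def __init__(self, n_slots=5, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
        self.n_slots = n_slots
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.Q = defaultdict(float)          # (state, action) -> value
        self.actions = list(range(n_slots))  # action = slot index to rebalance toward

    def encode_state(self, collisions, successes):
        # Coarse discretization of the metrics NS-3 reports each cycle.
        return (min(collisions // 5, 4), min(successes // 50, 4))

    def choose_action(self, state):
        if self.rng.random() < self.epsilon:                            # explore
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])      # exploit

    def update(self, state, action, reward, next_state):
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        td = reward + self.gamma * best_next - self.Q[(state, action)]
        self.Q[(state, action)] += self.alpha * td

# One control cycle: observe metrics, act, then learn from the outcome.
agent = SlotAgent()
s = agent.encode_state(collisions=7, successes=120)
a = agent.choose_action(s)
agent.update(s, a, reward=120 - 10 * 7, next_state=s)  # reward shape is assumed
```

In the paper's setup, `choose_action` would map to a new slot configuration returned to NS-3 through the Python C API layer, and the reward would be computed from the change in collision, success, and energy metrics.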
At the initialization stage, the agent set the initial learning parameters (learning rate, discount factor, probability of random action selection) and formed an initial uniform distribution of access slots across the devices. Every 30 seconds, the agent received up-to-date network status metrics from the NS-3 simulator and evaluated the current system state from them. Using the ε-greedy strategy, the agent chose an action aimed at improving the current situation (fewer collisions, higher connection success, better energy efficiency). After selecting an action, the RL agent computed a new slot allocation and transferred the resulting configuration back to the NS-3 simulator, where it was applied directly to the network devices. The agent then recorded the change in network state, calculated the reward received, and adjusted the Q-function values in preparation for the next cycle.

Experimental studies were conducted over an interval of 3600 seconds (one hour), during which network status metrics were recorded every 10 seconds for subsequent analysis and comparative evaluation of the proposed RL-based approach against the traditional static slot management method.

COMPARATIVE ANALYSIS OF THE RESULTS OF THE EXPERIMENTS

To evaluate the effectiveness of the integrated RL algorithm, two sets of experiments were conducted: the first using the traditional static allocation of RACH slots, and the second using the adaptive reinforcement learning algorithm. Each experiment ran for 3600 seconds with identical initial conditions and network parameters.

The analysis of the number and proportion of collisions showed a significant advantage of the RL algorithm. With the static approach, 50 collisions occurred over the entire simulation period, corresponding to 2.84% of all access attempts.
With the RL algorithm, the number of collisions dropped to 13, reducing the proportion of collisions to 0.65%. Thus, the adaptive approach reduced the number of collisions by 74% and the proportion of collisions by 77%.

The average delay for both approaches remained low, at approximately 0.01 seconds. Although average latency did not differ noticeably, the reduced number of collisions under the RL algorithm makes network operation more stable and reduces the need for repeated access attempts, which in the long run increases overall data transmission efficiency.

Device power consumption is an important indicator for NB-IoT networks. In the experiment with traditional static slot allocation, the average residual charge of the devices was 3.12 J, while with the adaptive RL approach it was 3.7 J. This corresponds to a 15% improvement in energy efficiency, achieved by cutting the number of retransmissions caused by collisions.

The number of successfully transmitted data packets also improved. With the static approach, 1710 successful connections were recorded; with the RL algorithm, this value increased to 1990, a 16% gain. This result confirms the effectiveness of the adaptive approach in increasing the success of data transmission.

A summary of the comparative analysis of the experimental results is presented below.

Table 1. Simulation results without the RL agent and with it

Metric                        | Static approach | RL agent
Collisions                    | 50              | 13
Collision share               | 2.84%           | 0.65%
Successful connections        | 1710            | 1990
Average delay, s              | ~0.01           | ~0.01
Average residual charge, J    | 3.12            | 3.7
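The headline percentages follow directly from the raw counts reported above; a quick arithmetic check:

```python
# Recompute the reported relative improvements from the raw experiment values.
static = {"collisions": 50, "collision_share": 2.84, "connections": 1710}
rl     = {"collisions": 13, "collision_share": 0.65, "connections": 1990}

collision_drop = (static["collisions"] - rl["collisions"]) / static["collisions"]
share_drop = (static["collision_share"] - rl["collision_share"]) / static["collision_share"]
conn_gain = (rl["connections"] - static["connections"]) / static["connections"]

print(f"{collision_drop:.0%} fewer collisions")        # 74%
print(f"{share_drop:.0%} lower collision share")       # 77%
print(f"{conn_gain:.0%} more successful connections")  # 16%
```

The 15% energy figure likewise corresponds to the relative difference between the two residual-charge values (0.58 J against 3.7 J).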
The results obtained demonstrate the advantages of the proposed adaptive approach based on reinforcement learning algorithms over traditional methods of allocating RACH slots in NB-IoT networks. The adaptability of the RL algorithm to the current network load makes it possible to significantly improve the main network performance indicators, confirming the possibility of practical application of this approach in real conditions.
RESEARCH RESULTS AND THEIR DISCUSSION

The results of the study confirm the high efficiency of reinforcement learning algorithms for dynamic control of access to RACH time slots in NB-IoT networks. Compared to the traditional static approach, the RL agent significantly improved the key performance indicators: the number of collisions was reduced by 74%, the share of collisions among all connection attempts decreased by 77%, the number of successful connections increased by 16%, and the energy efficiency of the devices improved by 15%. These data confirm the expediency of implementing adaptive, RL-based network resource management in real NB-IoT networks, as it significantly improves overall network performance and reliability.

Further research should scale the proposed approach to study the behavior of reinforcement learning algorithms as the number of devices grows and traffic generation scenarios become more complex. This will allow a more accurate assessment of the developed solutions under conditions close to real operation. Another promising direction is the study of multi-agent approaches, in which several RL agents simultaneously manage different aspects of network interaction; coordination between agents is expected to bring additional efficiency gains and further reduce the likelihood of collisions. An important task for future work is experimental verification of integrating the developed RL algorithms with real network equipment and testing them in existing networks, which will reveal the potential limitations and difficulties of moving from simulation to practical application.
The authors are convinced that the results of this work can serve as a basis for further research and contribute to the improvement of technologies for intelligent management of Internet of Things networks.

AUTHORS’ CONTRIBUTION

The authors contributed equally to the writing of the article.

References
1. Liu, Y., Deng, Y., Jiang, N., et al. (2021). Analysis of random access in NB-IoT networks with three coverage enhancement groups: A stochastic geometry approach. IEEE Transactions on Wireless Communications, 20(1), 549-563.
2. Jia, G., Zhu, Y., Li, Y., & Zhu, Z. (2019). Analysis of the effect of the reliability of the NB-IoT network on the intelligent system. Special Section on Innovation and Application of Internet of Things and Emerging Technologies in Smart Sensing, 7, 112809-112820.
3. Sahithya, R., Pouria, Z., Mohieddine, E. S., & Majid, N. (2019). Evaluation, modeling and optimization of coverage enhancement methods of NB-IoT. Electrical Engineering Department, 1, 1-17.
4. Chougrani, H., Kisseleff, S., Martins, W. A., & Chatzinotas, S. (2022). NB-IoT random access for nonterrestrial networks: Preamble detection and uplink synchronization. IEEE Internet of Things Journal, 9(16), 14913-14927. https://doi.org/10.1109/jiot.2021.3123376
5. Agiwal, M., Kumar, M. M., & Jin, H. (2019). Power efficient random access for massive NB-IoT connectivity. Sensors, 19, 1-24.
6. Jiang, N., Deng, Y., & Nallanathan, A. (2018). Deep reinforcement learning for real-time optimization in NB-IoT networks. School of Electronic Engineering and Computer Science, 1, 1-31.
7. Alcaraz, J., Losilla, F., & Gonzalez-Castaño, F.-J. (2023). Transmission control in NB-IoT with model-based reinforcement learning. IEEE Access, 11, 57991-58005. https://doi.org/10.1109/access.2023.3284990
8. Anbazhagan, S., & Mugelan, R. K. (2024). Next-gen resource optimization in NB-IoT networks: Harnessing soft actor-critic reinforcement learning. Computer Networks, 252, 110670-110684. https://doi.org/10.1016/j.comnet.2024.110670
9. Shorin, O. A., & Aslanyan, V. A. (2024). Approaches to integration of NB-IoT technology with 5G network. Economics and Quality of Communication Systems, 3, 56-62.
10. Namiot, D. E., & Ilyushin, E. A. (2025). Architecture of LLM agents. International Journal of Open Information Technologies, 13(1), 2307-8162.
11. Isaeva, O. S. (2023). Construction of a digital profile of Internet of Things devices. Information and Mathematical Technologies in Science and Management, 30(2), 36-44. https://doi.org/10.25729/ESI.2023.30.2.004