Reinforcement Learning for Adaptive Resource Management in High-Throughput Satellite Networks
Group Members:
- Earnest George
- Samuel Burt
- Colin Byrne
This project explores how reinforcement learning techniques can be applied to adaptive resource management in high-throughput satellite communication networks. The goal is to investigate whether reinforcement learning agents can dynamically allocate network resources to improve performance metrics such as latency and throughput under changing network conditions.
Reinforcement Learning Concepts
To prepare for implementing reinforcement learning algorithms in satellite communication systems, I studied the foundational theory from Reinforcement Learning: An Introduction by Sutton and Barto. The focus of this study included:
- Finite Markov Decision Processes (MDPs)
- Dynamic Programming
- Temporal Difference Learning
These concepts provide the mathematical framework used to model decision-making problems in reinforcement learning. In this framework, an agent interacts with an environment by observing states, selecting actions, and receiving rewards. Through repeated interaction, the agent learns a policy that maximizes the expected cumulative reward.
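As a small illustration of "expected cumulative reward", the quantity the agent maximizes is usually written as the discounted return of an episode. The sketch below computes that return for one recorded list of rewards. The function name and the default discount factor are illustrative assumptions, not values taken from this project.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_k gamma**k * rewards[k] for one episode.

    gamma is the discount factor (the default is an assumed, illustrative value).
    The sum is accumulated backward so each step costs one multiply and one add.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Example: three steps with reward 1 each, discounted by 0.5 per step.
example = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

A policy is then judged by the average of this value over many episodes, which is the "expected" part of the objective.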
Understanding these principles is essential for applying reinforcement learning to communication networks, where the system must continuously adapt to varying traffic loads, channel conditions, and network topology.
Reinforcement Learning Experiments in Python
To gain practical experience with reinforcement learning algorithms, I developed several implementations in Python using the Gymnasium reinforcement learning framework. These experiments included implementations of:
- Q-Learning
- Deep Q-Learning (DQN)
- Neural-network-based policy approximations
The goal of these experiments was to better understand how reinforcement learning agents interact with environments and how neural networks can be used to approximate optimal policies in complex state spaces.
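To make the agent-environment interaction concrete, here is a minimal sketch of tabular Q-learning. It is not the code used in these experiments: the `Corridor` environment is a hypothetical five-cell toy written for this example, its `step` method returns a simplified `(state, reward, done)` tuple rather than the Gymnasium signature, and all hyperparameters are assumed values.

```python
import random


class Corridor:
    """Hypothetical toy environment: start in cell 0, reach the last cell."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.n_actions = 2  # 0 = move left, 1 = move right
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 10.0 if done else -1.0  # step cost, bonus at the goal
        return self.state, reward, done


def train_q_learning(env, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda i: q[s][i])
            s2, r, done = env.step(a)
            # Q-learning target: bootstrap from the best next action
            # unless the episode has ended.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


q_table = train_q_learning(Corridor())
# Greedy action in each non-terminal cell after training.
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
```

After training, the greedy policy should prefer moving right in every non-terminal cell. The same update rule applies unchanged to larger discrete environments; only the table size changes.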
One of the primary environments used for experimentation was Taxi-v3, a benchmark reinforcement learning environment in which an agent must learn to pick up a passenger and transport them to a destination within a grid world.
Although simple compared to real communication systems, this environment provides a useful platform for validating reinforcement learning algorithms before applying them to more complex satellite network simulations.
Deep Q-Learning Training Results
A Deep Q-Network (DQN) agent was trained to maximize the total reward received within the Taxi environment. During training, the agent gradually improved its policy as it explored the environment and updated its Q-value estimates using experience replay and a target network.
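The two stabilizing components mentioned above can be sketched without any deep learning library. This is an illustrative outline under stated assumptions, not the training code from this project: `ReplayBuffer` and `td_targets` are hypothetical names, and `target_q` stands in for a frozen copy of the Q-network that maps a state to a list of Q-values.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """Compute r + gamma * max_a Q_target(s', a) for each transition.

    target_q is assumed to be a callable returning the Q-values of the
    frozen target network; terminal transitions do not bootstrap.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

In a full DQN loop, the online network is regressed toward these targets on each sampled batch, and the target network's weights are copied from the online network every fixed number of steps.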
The moving average of the reward per episode shows the learning progression of the agent. Initially, the agent performs poorly due to random exploration, but over time it converges toward an improved strategy that consistently achieves higher rewards.
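For reference, a moving average of episode rewards can be computed as in the sketch below. The window length used for the results here is not stated, so the default of 100 episodes is an assumption, as is the function name.

```python
def moving_average(values, window=100):
    """Trailing moving average; early entries average over the episodes seen so far."""
    averages = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            # Drop the value that just left the trailing window.
            running_sum -= values[i - window]
        averages.append(running_sum / min(i + 1, window))
    return averages


# Example with a short window so the effect is easy to check by hand.
smoothed = moving_average([1, 2, 3, 4], window=2)
```

Smoothing in this way hides the episode-to-episode noise caused by exploration and by the random start positions, which makes the overall trend easier to read.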
After approximately 700 episodes, the trained model reaches a stable performance level with rewards typically between 3 and 8 per episode.
Team Contributions
This project is being completed as a group effort in which each member focuses on a different aspect of reinforcement learning and its application to satellite communication systems. The goal is to combine these areas of research into a unified simulation and analysis of reinforcement learning techniques for adaptive resource management in high-throughput satellite networks.
Future Work
The next phase of this project will involve building a custom simulation environment that models resource management in satellite communication networks. This environment will allow reinforcement learning algorithms to be evaluated within a more realistic communication network scenario.
The simulation will focus on challenges commonly encountered in high-throughput satellite systems, including:
- Dynamic bandwidth allocation
- Traffic demand variation
- Latency optimization
- Adaptive routing and scheduling
By testing different reinforcement learning algorithms in this simulation environment, the project aims to evaluate how effectively reinforcement learning can improve resource allocation strategies in satellite networks compared to traditional rule-based approaches.
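One possible shape for such a comparison is sketched below. Everything in it is a hypothetical placeholder rather than the planned simulator: `ToyBeamEnv`, its demand model (uniform random demand per beam), its reward (negative unmet demand), and the two rule-based baselines are assumptions chosen only to show how a learned policy and a fixed rule could be scored on the same traffic trace. The `step` method returns a simplified `(demand, reward, done)` tuple, not the Gymnasium signature.

```python
import random


class ToyBeamEnv:
    """Hypothetical toy model: share a fixed capacity among beams at each step."""

    def __init__(self, n_beams=3, capacity=10, max_demand=8, horizon=20, seed=0):
        self.n_beams = n_beams
        self.capacity = capacity
        self.max_demand = max_demand
        self.horizon = horizon
        self.seed = seed

    def _draw_demand(self):
        return [self.rng.randint(0, self.max_demand) for _ in range(self.n_beams)]

    def reset(self):
        # Re-seeding on reset replays the same demand trace for every policy.
        self.rng = random.Random(self.seed)
        self.t = 0
        self.demand = self._draw_demand()
        return list(self.demand)

    def step(self, allocation):
        if len(allocation) != self.n_beams or min(allocation) < 0:
            raise ValueError("allocation needs one non-negative entry per beam")
        if sum(allocation) > self.capacity:
            raise ValueError("allocation exceeds total capacity")
        # Reward is the negative of the demand left unserved in this step.
        unmet = sum(max(d - a, 0) for d, a in zip(self.demand, allocation))
        self.t += 1
        done = self.t >= self.horizon
        self.demand = self._draw_demand()
        return list(self.demand), -float(unmet), done


def equal_split(demand, capacity):
    """Rule-based baseline: the same share for every beam, ignoring demand."""
    return [capacity // len(demand)] * len(demand)


def demand_aware(demand, capacity):
    """Rule-based baseline: serve beams in order until capacity runs out."""
    allocation, remaining = [], capacity
    for d in demand:
        granted = min(d, remaining)
        allocation.append(granted)
        remaining -= granted
    return allocation


def run_episode(env, rule):
    """Total reward of one episode when `rule` picks each allocation."""
    demand, total, done = env.reset(), 0.0, False
    while not done:
        demand, reward, done = env.step(rule(demand, env.capacity))
        total += reward
    return total
```

A trained agent would take the place of the `rule` argument, so its episode reward could be compared directly against both baselines on identical demand traces.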
Additional Reinforcement Learning Tests
Additional experiments and implementation tests were conducted while studying these reinforcement learning algorithms. They include early prototypes, algorithm comparisons, and training experiments using Python and the Gymnasium framework.
View Reinforcement Learning Experiments