Search Results (17)

Search Parameters:
Keywords = MAPPO

22 pages, 1406 KiB  
Article
Multi-Layer Energy Management and Strategy Learning for Microgrids: A Proximal Policy Optimization Approach
by Xiaohan Fang, Peng Hong, Shuping He, Yuhao Zhang and Di Tan
Energies 2024, 17(16), 3990; https://doi.org/10.3390/en17163990 - 12 Aug 2024
Abstract
An efficient energy management system (EMS) enhances microgrid performance in terms of stability, safety, and economy. Traditional centralized or decentralized energy management systems are unable to simultaneously meet the increasing demands for autonomous decision-making, privacy protection, global optimization, and rapid collaboration. This paper proposes a hierarchical multi-layer EMS for microgrids, comprising a supply layer, a demand layer, and a neutral scheduling layer. Additionally, common mathematical optimization methods struggle with the microgrid scheduling decision problem due to challenges in mechanism modeling, supply–demand uncertainty, and high real-time and autonomy requirements. Therefore, an improved proximal policy optimization (PPO) approach is proposed for the multi-layer EMS. Specifically, in the centrally managed supply layer, a centralized PPO algorithm is utilized to determine the optimal power generation strategy. In the decentralized demand layer, an auction market is established, and a multi-agent proximal policy optimization (MAPPO) algorithm with an action-guidance-based mechanism is employed for each consumer to implement its individual auction strategy. The neutral scheduling layer interacts with the other layers, manages information, and protects participant privacy. Numerical results validate the effectiveness of the proposed multi-layer EMS framework and the PPO-based optimization methods.
(This article belongs to the Section A1: Smart Grids and Microgrids)
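For readers unfamiliar with the PPO machinery referenced throughout these results, the sketch below shows the standard clipped surrogate objective that both centralized PPO and the MAPPO variants build on. It is a generic NumPy illustration, not the authors' implementation; the batch contents and clip value are assumed.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized).

    logp_new/logp_old: log-probabilities of the taken actions under the
    current and behavior policies; advantages: estimated advantages.
    """
    ratio = np.exp(logp_new - logp_old)              # importance ratio r_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))  # pessimistic bound

# Toy usage with made-up numbers.
rng = np.random.default_rng(0)
adv = rng.normal(size=128)
lp_old = rng.normal(-1.5, 0.3, size=128)
lp_new = lp_old + rng.normal(0, 0.05, size=128)
print(ppo_clip_loss(lp_new, lp_old, adv))
```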

19 pages, 9250 KiB  
Article
Multi-Agent Deep Reinforcement Learning Based Dynamic Task Offloading in a Device-to-Device Mobile-Edge Computing Network to Minimize Average Task Delay with Deadline Constraints
by Huaiwen He, Xiangdong Yang, Xin Mi, Hong Shen and Xuefeng Liao
Sensors 2024, 24(16), 5141; https://doi.org/10.3390/s24165141 - 8 Aug 2024
Abstract
Device-to-device (D2D) communication is a pivotal technology in the next generation of communication, allowing direct task offloading between mobile devices (MDs) to improve the utilization of idle resources. This paper proposes a novel algorithm for dynamic task offloading between active MDs and idle MDs in a D2D–MEC (mobile edge computing) system, deploying multi-agent deep reinforcement learning (DRL) to minimize the long-term average delay of delay-sensitive tasks under deadline constraints. Our core innovation is a dynamic partitioning scheme for idle and active devices in the D2D–MEC system, accounting for stochastic task arrivals and multi-time-slot task execution, which has been insufficiently explored in the existing literature. We adopt a queue-based system to formulate a dynamic task offloading optimization problem. To address the challenges of a large action space and the coupling of actions across time slots, we model the problem as a Markov decision process (MDP) and perform multi-agent DRL through multi-agent proximal policy optimization (MAPPO). We employ a centralized training with decentralized execution (CTDE) framework to enable each MD to make offloading decisions based solely on its local system state. Extensive simulations demonstrate the efficiency and fast convergence of our algorithm. In comparison to existing sub-optimal results deploying single-agent DRL, our algorithm reduces the average task completion delay by 11.0% and the ratio of dropped tasks by 17.0%. Our proposed algorithm is particularly pertinent to sensor networks, where mobile devices equipped with sensors generate a substantial volume of data that requires timely processing to ensure quality of experience (QoE) and meet the service-level agreements (SLAs) of delay-sensitive applications.
(This article belongs to the Section Communications)
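The CTDE pattern named in this abstract can be sketched as decentralized per-device actors that see only local observations, plus one centralized critic that sees the joint state during training. A minimal PyTorch sketch follows; the network sizes, agent count, and class names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

N_AGENTS, LOCAL_DIM, N_ACTIONS = 4, 8, 5  # illustrative sizes

class Actor(nn.Module):          # decentralized: sees local observation only
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LOCAL_DIM, 64), nn.Tanh(),
                                 nn.Linear(64, N_ACTIONS))
    def forward(self, local_obs):
        return torch.distributions.Categorical(logits=self.net(local_obs))

class CentralCritic(nn.Module):  # centralized: sees joint state at training time
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_AGENTS * LOCAL_DIM, 64), nn.Tanh(),
                                 nn.Linear(64, 1))
    def forward(self, joint_obs):
        return self.net(joint_obs).squeeze(-1)

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

obs = torch.randn(N_AGENTS, LOCAL_DIM)                        # one obs per device
actions = [a(obs[i]).sample() for i, a in enumerate(actors)]  # execution: local only
value = critic(obs.flatten())                                 # training: critic sees all
```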

26 pages, 6549 KiB  
Article
Reinforcement-Learning-Based Multi-UAV Cooperative Search for Moving Targets in 3D Scenarios
by Yifei Liu, Xiaoshuai Li, Jian Wang, Feiyu Wei and Junan Yang
Drones 2024, 8(8), 378; https://doi.org/10.3390/drones8080378 - 6 Aug 2024
Abstract
Most existing multi-UAV collaborative search methods only consider two-dimensional path planning or static target search. To come closer to the practical scenario, this paper proposes a path planning method based on an action-mask-based multi-agent proximal policy optimization (AM-MAPPO) algorithm for multiple UAVs searching for moving targets in three-dimensional (3D) environments. In particular, a multi-UAV high–low altitude collaborative search architecture is introduced that not only takes into account the extensive detection range of high-altitude UAVs but also leverages the superior detection quality of low-altitude UAVs. The optimization objective of the search task is to minimize the uncertainty of the search area while maximizing the number of captured moving targets. The path planning problem for moving target search in a 3D environment is formulated and addressed using the AM-MAPPO algorithm. The proposed method incorporates a state representation mechanism based on field-of-view encoding to handle dynamic changes in neural network input dimensions, and develops a rule-based target capture mechanism and an action-mask-based collision avoidance mechanism to enhance the AM-MAPPO algorithm's convergence speed. Experimental results demonstrate that the proposed algorithm significantly reduces regional uncertainty and increases the number of captured moving targets compared to other deep reinforcement learning methods. Ablation studies further indicate that the proposed action mask mechanism, target capture mechanism, and collision avoidance mechanism improve the algorithm's effectiveness, target capture capability, and UAVs' safety, respectively.
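The action-mask idea behind AM-MAPPO is commonly implemented by forcing the probability of disallowed actions (for example, maneuvers that would cause collisions) to zero before sampling. A minimal sketch under that assumption:

```python
import torch

def masked_categorical(logits, mask):
    """Sample only from allowed actions.

    logits: (n_actions,) raw policy outputs; mask: boolean, True = allowed.
    Disallowed actions get -inf logits, so their probability is exactly 0.
    """
    masked_logits = logits.masked_fill(~mask, float('-inf'))
    return torch.distributions.Categorical(logits=masked_logits)

logits = torch.randn(6)                                      # e.g., 6 candidate maneuvers
mask = torch.tensor([True, True, False, True, False, True])  # collision moves masked out
dist = masked_categorical(logits, mask)
print(dist.sample(), dist.probs)                             # masked actions have prob. 0
```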

18 pages, 715 KiB  
Article
Optimizing Drone Energy Use for Emergency Communications in Disasters via Deep Reinforcement Learning
by Wen Qiu, Xun Shao, Hiroshi Masui and William Liu
Future Internet 2024, 16(7), 245; https://doi.org/10.3390/fi16070245 - 11 Jul 2024
Abstract
For a communication control system in a disaster area where drones (also called unmanned aerial vehicles (UAVs)) are used as aerial base stations (ABSs), the reliability of communication is a key challenge in providing emergency communication services. However, the effective configuration of UAVs remains a major challenge due to limitations in their communication range and energy capacity. In addition, the relatively high cost of drones and the issue of mutual communication interference make it impractical to deploy an unlimited number of drones in a given area. To maximize the communication services provided by a limited number of drones to the ground user equipment (UE) within a certain time frame while minimizing drone energy consumption, we propose a multi-agent proximal policy optimization (MAPPO) algorithm. Considering the dynamic nature of the environment, we analyze diverse observation data structures and design novel objective functions to enhance drone performance. We find that, when drone energy consumption is used as a penalty term in the objective function, the drones, acting as agents, can identify the optimal trajectory that maximizes UE coverage while minimizing energy consumption. At the same time, the experimental results reveal that, setting aside the computing power and convergence time required for training, the proposed algorithm demonstrates better performance in communication coverage and energy saving than other methods. The average coverage performance is 1045% higher than that of the other three methods, and it can save up to 3% more energy.
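The abstract's central design choice, energy consumption as a penalty term in the objective, can be illustrated as a per-step reward of weighted coverage minus weighted energy. The function below is a hypothetical sketch; the weights and units are assumptions, not values from the paper.

```python
def drone_reward(covered_users, energy_used, coverage_weight=1.0, energy_weight=0.1):
    """Illustrative reward: reward coverage, penalize energy spent this step.

    covered_users: number of ground UEs served this step;
    energy_used: energy the drone consumed this step (arbitrary units).
    The weights are hypothetical tuning knobs, not values from the paper.
    """
    return coverage_weight * covered_users - energy_weight * energy_used

print(drone_reward(covered_users=12, energy_used=30.0))  # 12 - 3 = 9.0
```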

20 pages, 6487 KiB  
Article
UAV Swarm Cooperative Dynamic Target Search: A MAPPO-Based Discrete Optimal Control Method
by Dexing Wei, Lun Zhang, Quan Liu, Hao Chen and Jian Huang
Drones 2024, 8(6), 214; https://doi.org/10.3390/drones8060214 - 22 May 2024
Abstract
Unmanned aerial vehicles (UAVs) are commonly employed in pursuit and rescue missions, where the target's trajectory is unknown. Traditional methods, such as evolutionary algorithms and ant colony optimization, can generate a search route for a given scenario, but when the scene changes, the solution must be recalculated. In contrast, more advanced deep reinforcement learning methods can train an agent that can be applied directly to a similar task without recalculation. Nevertheless, several challenges arise when the agent learns to search for unknown dynamic targets. In this search task, the rewards are random and sparse, which makes learning difficult. In addition, because the agent must adapt to various scenario settings, it requires more interactions with the environment than typical reinforcement learning tasks. These challenges increase the difficulty of training agents. To address these issues, we propose the OC-MAPPO method, which combines optimal control (OC) and multi-agent proximal policy optimization (MAPPO) with GPU parallelization. The optimal control model provides the agent with continuous and stable rewards, and through parallelized models, the agent can interact with the environment and collect data more rapidly. Experimental results demonstrate that the proposed method helps the agent learn faster, achieving a 26.97% higher success rate than genetic algorithms.
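GPU parallelization of rollouts, as described for OC-MAPPO, amounts to stepping many environment copies as one batched array operation. The NumPy sketch below shows the batching idea on the CPU with toy dynamics; the real system would run analogous array ops on the GPU with the actual search environment.

```python
import numpy as np

N_ENVS, STATE_DIM = 1024, 4               # many environments stepped as one batch

states = np.zeros((N_ENVS, STATE_DIM))    # all search scenarios at once

def step_batch(states, actions):
    """Vectorized transition: one array op advances every environment.

    A toy dynamics/reward model stands in for the UAV search environment.
    """
    next_states = states + 0.1 * actions                 # hypothetical dynamics
    rewards = -np.linalg.norm(next_states, axis=1)       # hypothetical reward
    return next_states, rewards

actions = np.random.default_rng(0).normal(size=(N_ENVS, STATE_DIM))
states, rewards = step_batch(states, actions)
print(states.shape, rewards.shape)        # (1024, 4) (1024,)
```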

19 pages, 747 KiB  
Article
A Multi-Agent RL Algorithm for Dynamic Task Offloading in D2D-MEC Network with Energy Harvesting
by Xin Mi, Huaiwen He and Hong Shen
Sensors 2024, 24(9), 2779; https://doi.org/10.3390/s24092779 - 26 Apr 2024
Abstract
Delay-sensitive task offloading in a device-to-device assisted mobile edge computing (D2D-MEC) system with energy harvesting devices is a critical challenge due to the dynamic load level at edge nodes and the variability in harvested energy. In this paper, we propose a joint dynamic task offloading and CPU frequency control scheme for delay-sensitive tasks in a D2D-MEC system, taking into account the intricacies of multi-slot tasks, characterized by diverse processing speeds and data transmission rates. Our methodology models the task arrival and service processes using queuing systems, coupled with the strategic use of D2D communication to alleviate edge server load and effectively prevent network congestion. Central to our solution is the formulation of average task delay optimization as a challenging nonlinear integer programming problem, requiring intelligent decisions on task offloading for each generated task at active mobile devices and on CPU frequency adjustments at discrete time slots. To navigate the extensive discrete action space, we design an efficient multi-agent DRL algorithm named MAOC, based on MAPPO, which minimizes the average task delay by dynamically determining task-offloading decisions and CPU frequencies. MAOC operates within a centralized training with decentralized execution (CTDE) framework, empowering individual mobile devices to make decisions autonomously based on their unique system states. Experimental results demonstrate its swift convergence and operational efficiency, and it outperforms other baseline algorithms.
(This article belongs to the Section Communications)

21 pages, 7023 KiB  
Article
COLREGs-Based Path Planning for USVs Using the Deep Reinforcement Learning Strategy
by Naifeng Wen, Yundong Long, Rubo Zhang, Guanqun Liu, Wenjie Wan and Dian Jiao
J. Mar. Sci. Eng. 2023, 11(12), 2334; https://doi.org/10.3390/jmse11122334 - 11 Dec 2023
Abstract
This research introduces a two-stage deep reinforcement learning approach for the cooperative path planning of unmanned surface vehicles (USVs). The method is designed to address cooperative collision-avoidance path planning while adhering to the International Regulations for Preventing Collisions at Sea (COLREGs) and considering the collision-avoidance problem within the USV fleet and between USVs and target ships (TSs). To achieve this, the study presents a dual COLREGs-compliant action-selection strategy to effectively manage the vessel-avoidance problem. Firstly, we construct a COLREGs-compliant action-evaluation network that utilizes a deep learning network trained on pre-recorded trajectories of USVs avoiding TSs in compliance with COLREGs. Then, a COLREGs-compliant reward-function-based action-selection network is proposed that considers various TS-encounter scenarios. The outputs of the two networks are then fused to select actions during cooperative path planning. The path-planning model is established using the multi-agent proximal policy optimization (MAPPO) method, with the action space, observation space, and reward function tailored to the policy network. Additionally, a TS detection method is introduced to detect the motion intentions of TSs. Monte Carlo simulations demonstrate the strong performance of the planning method, and experiments focusing on COLREGs-based TS avoidance validate the feasibility of the approach. The proposed TS detection model exhibited robust performance within the defined task.
(This article belongs to the Special Issue Autonomous Marine Vehicle Operations—2nd Edition)
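The paper fuses the outputs of the COLREGs-compliant action-evaluation network and the reward-based action-selection network to pick actions. One generic way to fuse two per-action score vectors is a normalized weighted combination, sketched below; the fusion weight and normalization are assumptions, not the paper's rule.

```python
import numpy as np

def fuse_action_scores(eval_scores, policy_scores, w=0.5):
    """Combine two per-action score vectors and pick the best action.

    eval_scores: COLREGs-compliance scores from the action-evaluation network;
    policy_scores: scores from the reward-based action-selection network.
    Both are min-max normalized so neither dominates; w is an assumed weight.
    """
    def normalize(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)
    fused = w * normalize(eval_scores) + (1 - w) * normalize(policy_scores)
    return int(np.argmax(fused))

print(fuse_action_scores([0.2, 0.9, 0.4], [0.7, 0.3, 0.6]))  # index of best fused action
```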

20 pages, 1330 KiB  
Article
Multi-UAV Cooperative Searching and Tracking for Moving Targets Based on Multi-Agent Reinforcement Learning
by Kai Su and Feng Qian
Appl. Sci. 2023, 13(21), 11905; https://doi.org/10.3390/app132111905 - 31 Oct 2023
Abstract
In this paper, we propose a distributed multi-agent reinforcement learning (MARL) method to learn cooperative searching and tracking policies for multiple unmanned aerial vehicles (UAVs) with limited sensing range and communication ability. Firstly, we describe the system model for multi-UAV cooperative searching and tracking of moving targets and adopt average observation rate and average exploration rate as the metrics. Moreover, we propose information update and fusion mechanisms to enhance the environment perception ability of the multi-UAV system. Then, the details of our method are presented, including the observation and action space representation, the reward function design, and the training framework based on multi-agent proximal policy optimization (MAPPO). The simulation results show that our method converges well and outperforms other baseline algorithms in terms of average observation rate and average exploration rate.
(This article belongs to the Special Issue Advances in Unmanned Aerial Vehicle (UAV) System)
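The information update and fusion mechanisms for cooperative search can be pictured with a target-probability map: cells inside some UAV's sensing range are corrected toward the sensor reading, while unobserved cells drift back toward an uninformative prior. The update rule, decay rate, and fusion weights below are illustrative assumptions, not the paper's equations.

```python
import numpy as np

def update_belief(belief, observed_mask, detections, decay=0.02):
    """Toy probability-map update for cooperative target search.

    belief: (H, W) per-cell probability that a target is present;
    observed_mask: cells inside some UAV's sensing range this step;
    detections: sensor reading for those cells (1 = target seen).
    Unobserved cells drift back toward the uninformative prior 0.5.
    """
    belief = belief + decay * (0.5 - belief)             # uncertainty grows elsewhere
    belief[observed_mask] = 0.9 * detections[observed_mask] \
                          + 0.1 * belief[observed_mask]  # trust the fresh sensor reading
    return belief

belief = np.full((8, 8), 0.5)
mask = np.zeros((8, 8), dtype=bool); mask[2:4, 2:4] = True
det = np.zeros((8, 8)); det[2, 2] = 1.0
belief = update_belief(belief, mask, det)
print(belief[2, 2], belief[0, 0])   # observed cell jumps, far cell stays ~0.5
```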

16 pages, 2578 KiB  
Article
Research on Reinforcement-Learning-Based Truck Platooning Control Strategies in Highway On-Ramp Regions
by Jiajia Chen, Zheng Zhou, Yue Duan and Biao Yu
World Electr. Veh. J. 2023, 14(10), 273; https://doi.org/10.3390/wevj14100273 - 1 Oct 2023
Abstract
With the development of autonomous driving technology, truck platooning control has become a reality. Truck platooning can improve road capacity by maintaining a small headway, and platooning systems can significantly reduce fuel consumption and emissions, especially for trucks. In this study, we designed a Platoon-MAPPO algorithm to implement truck platooning control based on multi-agent reinforcement learning for a platoon facing an on-ramp scenario on a highway. A centralized training, decentralized execution algorithm was used. Each truck computes only its own actions, avoiding the data computation delay caused by centralized computation, and each truck considers the status of the trucks in front of and behind itself, maximizing the overall gain of the platoon and improving global operational efficiency. For performance evaluation, we used a traditional rule-based platoon-following model as a benchmark; to ensure fairness, it used the same network structure and traffic scenario as our proposed model. The simulation results show that the proposed algorithm performs well and improves the overall efficiency of the platoon while guaranteeing traffic safety: average energy consumption decreased by 14.8%, and the road occupancy rate decreased by 43.3%.
(This article belongs to the Special Issue Recent Advance in Intelligent Vehicle)

15 pages, 4673 KiB  
Article
Variable Speed Limit Control for the Motorway–Urban Merging Bottlenecks Using Multi-Agent Reinforcement Learning
by Xuan Fang, Tamás Péter and Tamás Tettamanti
Sustainability 2023, 15(14), 11464; https://doi.org/10.3390/su151411464 - 24 Jul 2023
Abstract
Traffic congestion is a typical phenomenon where motorways meet urban road networks. At this special location, the weaving area is a recurrent traffic bottleneck, and numerous research activities have been conducted to improve traffic efficiency and sustainability at bottleneck areas. Variable speed limit (VSL) control is one effective control strategy. The primary objective of this paper is twofold: on the one hand, to smooth turbulent traffic flow in the weaving area between motorways and urban roads using VSL control; on the other hand, to provide a second control method that tackles the network's carbon dioxide emission problem. For both control methods, a multi-agent reinforcement learning algorithm is used (MAPPO: multi-agent proximal policy optimization). The VSL control framework takes the real-time traffic state and the speed limit value of the last control step as the input of the optimization algorithm. Two reward functions are constructed to guide the algorithm to output the dynamic speed limit enforced within the VSL control area. The effectiveness of the proposed control framework is verified via microscopic traffic simulation using Simulation of Urban MObility (SUMO). The results show that the proposed control method shapes a more homogeneous traffic flow and reduces the total waiting time over the network by 15.8%. With the carbon dioxide minimization strategy, emissions can be reduced by 10.79% in the recurrent bottleneck area caused by the transition from motorways to urban roads.
(This article belongs to the Special Issue Control System for Sustainable Urban Mobility)
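In a VSL framework of this kind, the agent's discrete action is ultimately mapped to a speed limit enforced in the control area. The sketch below shows one plausible mapping with rate limiting between control steps; the candidate limit set and step bound are assumptions, not taken from the paper.

```python
# Hypothetical mapping from a discrete MAPPO action to a speed limit (km/h).
SPEED_LIMITS = [60, 70, 80, 90, 100, 110, 120]   # assumed candidate set

def apply_vsl_action(action_index, previous_limit, max_step=10):
    """Translate the agent's discrete choice into an enforceable limit,
    rate-limiting the change so successive limits stay driver-friendly."""
    target = SPEED_LIMITS[action_index]
    delta = max(-max_step, min(max_step, target - previous_limit))
    return previous_limit + delta

print(apply_vsl_action(action_index=0, previous_limit=100))  # 100 -> 90, not 60 at once
```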

18 pages, 1645 KiB  
Article
A Multi-Agent Deep Reinforcement Learning-Based Popular Content Distribution Scheme in Vehicular Networks
by Wenwei Chen, Xiujie Huang, Quanlong Guan and Shancheng Zhao
Entropy 2023, 25(5), 792; https://doi.org/10.3390/e25050792 - 12 May 2023
Abstract
The Internet of Vehicles (IoV) enables vehicular data services and applications through vehicle-to-everything (V2X) communications. One of the key services provided by IoV is popular content distribution (PCD), which aims to quickly deliver popular content that most vehicles request. However, it is challenging for vehicles to receive the complete popular content from roadside units (RSUs) due to their mobility and the RSUs' constrained coverage. The collaboration of vehicles via vehicle-to-vehicle (V2V) communications is an effective way to help more vehicles obtain the entire popular content at a lower time cost. To this end, we propose a multi-agent deep reinforcement learning (MADRL)-based popular content distribution scheme in vehicular networks, where each vehicle deploys an MADRL agent that learns to choose the appropriate data transmission policy. To reduce the complexity of the MADRL-based algorithm, a vehicle clustering algorithm based on spectral clustering divides all vehicles in the V2V phase into groups, so that only vehicles within the same group exchange data. The multi-agent proximal policy optimization (MAPPO) algorithm is then used to train the agents. We introduce a self-attention mechanism when constructing the neural network for the MADRL to help each agent accurately represent the environment and make decisions. Furthermore, an invalid action masking technique is utilized to prevent agents from taking invalid actions, accelerating training. Finally, experimental results and a comprehensive comparison demonstrate that our MADRL-PCD scheme outperforms both the coalition-game-based scheme and the greedy-strategy-based scheme, achieving higher PCD efficiency and lower transmission delay.
(This article belongs to the Section Information Theory, Probability and Statistics)
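The vehicle-grouping step can be sketched with off-the-shelf spectral clustering over a pairwise affinity built from inter-vehicle distance, so that nearby vehicles (with better V2V links) land in the same group. The kernel width and cluster count below are assumptions:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
positions = rng.uniform(0, 500, size=(12, 2))     # 12 vehicles on a road segment

# Affinity: closer vehicles (better V2V links) get higher similarity.
d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
affinity = np.exp(-(d / 100.0) ** 2)              # assumed RBF kernel width

groups = SpectralClustering(n_clusters=3, affinity='precomputed',
                            random_state=0).fit_predict(affinity)
print(groups)   # vehicles in the same group exchange data in the V2V phase
```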

19 pages, 8728 KiB  
Article
Cooperative Decision-Making for Mixed Traffic at an Unsignalized Intersection Based on Multi-Agent Reinforcement Learning
by Huanbiao Zhuang, Chaofan Lei, Yuanhang Chen and Xiaojun Tan
Appl. Sci. 2023, 13(8), 5018; https://doi.org/10.3390/app13085018 - 17 Apr 2023
Abstract
Despite rapid advances in vehicle intelligence and connectivity, there will still be a significant period of mixed traffic in which connected, automated vehicles and human-driven vehicles coexist. The behavioral uncertainty of human-driven vehicles makes decision-making a challenging task in an unsignalized intersection scenario. In this paper, a decentralized multi-agent proximal policy optimization algorithm based on attention representations (Attn-MAPPO) was developed to make joint decisions at an intersection, avoiding collisions and crossing the intersection effectively. To implement this framework, by exploiting the shared information, the system was modeled as a model-free, fully cooperative, multi-agent system. Each vehicle employed an attention module to extract the most valuable information from its neighbors. Based on the observations and traffic rules, a joint policy was identified to work more cooperatively based on the trajectory prediction of all the vehicles. To facilitate collaboration between the vehicles, a weighted reward assignment scheme was proposed to focus more on the vehicles approaching the intersection. The results presented the advantages of the Attn-MAPPO framework and validated the effectiveness of the designed reward function. Ultimately, comparative experiments demonstrated that the proposed approach is more adaptive and generalizable than a heuristic rule-based model, revealing the great potential of reinforcement learning for decision-making in autonomous driving.
(This article belongs to the Special Issue Autonomous Vehicles: Technology and Application)
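The attention module that lets each vehicle weigh its neighbors can be sketched with standard scaled dot-product self-attention over a set of per-vehicle feature vectors; the dimensions and random projections below are illustrative only.

```python
import torch
import torch.nn.functional as F

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a set of neighbor features.

    x: (n_vehicles, d) feature vector per surrounding vehicle. The output for
    each vehicle is a relevance-weighted mix of all vehicles' value vectors.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / (k.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
d = 16
x = torch.randn(5, d)                       # ego vehicle + 4 neighbors (assumed)
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)  # torch.Size([5, 16])
```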

18 pages, 686 KiB  
Article
Computational Offloading for MEC Networks with Energy Harvesting: A Hierarchical Multi-Agent Reinforcement Learning Approach
by Yu Sun and Qijie He
Electronics 2023, 12(6), 1304; https://doi.org/10.3390/electronics12061304 - 9 Mar 2023
Abstract
Multi-access edge computing (MEC) is a novel computing paradigm that leverages nearby MEC servers to augment the computational capabilities of users with limited computational resources. In this paper, we investigate the computational offloading problem in multi-user multi-server MEC systems with energy harvesting, aiming to minimize both system latency and energy consumption by optimizing the task offload location selection and the task offload ratio. We propose a hierarchical computational offloading strategy based on multi-agent reinforcement learning (MARL). The proposed strategy decomposes the computational offloading problem into two sub-problems: a high-level task-offloading location selection problem and a low-level task-offloading ratio problem, reducing the problem's complexity through decoupling. To address these sub-problems, we propose a computational offloading framework based on multi-agent proximal policy optimization (MAPPO), where each agent generates actions based on its observed private state to avoid the action space explosion caused by the increasing number of user devices. Simulation results show that the proposed HDMAPPO strategy outperforms other baseline algorithms in terms of average task latency, energy consumption, and discard rate.
(This article belongs to the Special Issue AI for Edge Computing)
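The high-level/low-level decomposition can be sketched as two policy heads queried in sequence: the high level picks an offload location, and the low level picks an offload ratio conditioned on that choice, which keeps each action space small. Names and sizes in this sketch are assumptions, not the paper's networks.

```python
import torch
import torch.nn as nn

STATE_DIM, N_SERVERS, N_RATIOS = 10, 4, 5   # illustrative sizes

high = nn.Linear(STATE_DIM, N_SERVERS)                # where to offload
low = nn.Linear(STATE_DIM + N_SERVERS, N_RATIOS)      # how much, given where

def hierarchical_decision(state):
    """Two-level offloading decision: location first, ratio second.

    Decoupling keeps each head's action space small instead of a single
    head over all N_SERVERS * N_RATIOS combinations.
    """
    loc = torch.distributions.Categorical(logits=high(state)).sample()
    loc_onehot = torch.nn.functional.one_hot(loc, N_SERVERS).float()
    ratio = torch.distributions.Categorical(
        logits=low(torch.cat([state, loc_onehot]))).sample()
    return loc.item(), ratio.item()

print(hierarchical_decision(torch.randn(STATE_DIM)))
```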

30 pages, 8215 KiB  
Article
An LEO Constellation Early Warning System Decision-Making Method Based on Hierarchical Reinforcement Learning
by Yu Cheng, Cheng Wei, Shengxin Sun, Bindi You and Yang Zhao
Sensors 2023, 23(4), 2225; https://doi.org/10.3390/s23042225 - 16 Feb 2023
Abstract
The cooperative positioning of hypersonic vehicles by LEO constellations is the focus of this study on space-based early warning systems. A hypersonic vehicle is highly maneuverable, and its trajectory is uncertain, posing new challenges for the cooperative positioning capability of the constellation. In recent years, breakthroughs in artificial intelligence technology have provided new avenues for collaborative multi-satellite intelligent autonomous decision-making technology. This paper addresses the problem of multi-satellite cooperative geometric positioning of hypersonic glide vehicles (HGVs) by an LEO-constellation-tracking system. To exploit the inherent advantages of hierarchical reinforcement learning in intelligent decision making while satisfying the constraints of cooperative observation, an autonomous intelligent decision-making algorithm for satellites that combines hierarchical proximal policy optimization with random hill climbing (MAPPO-RHC) is designed. Hierarchical decision making is used both to reduce the solution space and to maximize the global reward while distributing satellite resources uniformly. A single-satellite local search method, combining random hill climbing and heuristic methods, then improves the algorithm's ability to search the solution space, starting from the decisions of the hierarchical proximal policy optimization algorithm. Finally, the MAPPO-RHC algorithm's coverage and positioning accuracy are simulated and analyzed in two different scenarios and compared with four intelligent satellite decision-making algorithms studied in recent years. The simulation results show that MAPPO-RHC achieves more balanced resource allocations and higher geometric positioning accuracy. It is thus concluded that MAPPO-RHC provides a feasible solution for the real-time decision-making problem of an LEO constellation early warning system.
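The random-hill-climbing refinement can be illustrated as generic local search: start from the hierarchical policy's decision, try random single-satellite perturbations, and keep only improvements. The assignment encoding and toy scoring function below are placeholders, not the paper's objective.

```python
import random

def random_hill_climb(assignment, score, candidates, iters=200, seed=0):
    """Refine a satellite-task assignment by random local moves.

    assignment: list where assignment[i] is the task chosen for satellite i;
    score: callable returning the objective to maximize (e.g., positioning
    accuracy); candidates: allowed tasks per satellite. Placeholder encoding.
    """
    rng = random.Random(seed)
    best, best_val = list(assignment), score(assignment)
    for _ in range(iters):
        trial = list(best)
        i = rng.randrange(len(trial))              # perturb one satellite's choice
        trial[i] = rng.choice(candidates[i])
        val = score(trial)
        if val > best_val:                         # keep only improvements
            best, best_val = trial, val
    return best, best_val

# Toy objective: prefer spreading satellites across distinct tasks.
cands = [[0, 1, 2]] * 4
print(random_hill_climb([0, 0, 0, 0], lambda a: len(set(a)), cands))
```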

19 pages, 5286 KiB  
Article
A Multi-UCAV Cooperative Decision-Making Method Based on an MAPPO Algorithm for Beyond-Visual-Range Air Combat
by Xiaoxiong Liu, Yi Yin, Yuzhan Su and Ruichen Ming
Aerospace 2022, 9(10), 563; https://doi.org/10.3390/aerospace9100563 - 28 Sep 2022
Abstract
To solve the problems of autonomous decision making and the cooperative operation of multiple unmanned combat aerial vehicles (UCAVs) in beyond-visual-range air combat, this paper proposes an air combat decision-making method based on a multi-agent proximal policy optimization (MAPPO) algorithm. Firstly, the model of the unmanned combat aircraft is established on the simulation platform, and the corresponding maneuver library is designed. To simulate real beyond-visual-range air combat, a missile attack area model is established, and the damage probability is specified for both our side and the enemy. Secondly, to overcome the sparse-reward problem of traditional reinforcement learning, this paper designs a comprehensive reward function based on the angle, speed, altitude, and distance of the unmanned combat aircraft and the damage of the missile attack area. Finally, the idea of centralized training and distributed execution is adopted to improve the decision-making ability of the unmanned combat aircraft and the training efficiency of the algorithm. The simulation results show that this algorithm can carry out multi-aircraft air combat confrontation drills, form new tactical decisions during the drills, and provide new ideas for multi-UCAV air combat.
(This article belongs to the Special Issue Artificial Intelligence in Drone Applications)
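The comprehensive reward combining angle, speed, altitude, distance, and missile-attack-area terms can be sketched as a weighted sum plus a bonus when the enemy enters the attack area. The weights and normalization below are assumptions chosen only to show the composite structure.

```python
def combat_reward(angle_adv, speed_adv, alt_adv, dist_adv, in_attack_zone,
                  weights=(0.3, 0.2, 0.2, 0.3), zone_bonus=1.0):
    """Illustrative dense reward for beyond-visual-range air combat.

    Each *_adv term is a normalized advantage in [0, 1] relative to the
    opponent; a bonus fires when the enemy enters the missile attack area.
    Weights are hypothetical, chosen only to show the composite structure.
    """
    w_a, w_s, w_h, w_d = weights
    shaped = w_a * angle_adv + w_s * speed_adv + w_h * alt_adv + w_d * dist_adv
    return shaped + (zone_bonus if in_attack_zone else 0.0)

print(combat_reward(0.8, 0.5, 0.6, 0.7, in_attack_zone=True))  # 0.67 + 1.0
```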
