The current research into decision-making strategies for air combat focuses on the performance of algorithms, while the selection of actions is often ignored; the actions are typically fixed in amplitude and limited in number in order to improve convergence efficiency, which prevents the strategy from fully exploiting the maneuverability of the aircraft. In this paper, a decision-making strategy for close-range air combat based on reinforcement learning with variable-scale actions is proposed, where the actions are variable-scale virtual pursuit angles and speeds.

Unmanned Aircraft Systems (UAS) have the potential to perform many of the dangerous missions currently flown by manned aircraft. Yet the complexity of some tasks, such as air combat, has precluded UAS from successfully carrying out these missions autonomously. This paper presents a formulation of a level-flight, fixed-velocity, one-on-one air combat maneuvering problem and an approximate dynamic programming (ADP) approach for computing an efficient approximation of the optimal policy. In the version of the problem formulation considered, the aircraft learning the optimal policy is given a slight performance advantage. The ADP approach provides a fast response to a rapidly changing tactical situation, long planning horizons, and good performance without explicit coding of air combat tactics. The method's success is due to extensive feature development, reward shaping, and trajectory sampling. An accompanying fast and effective rollout-based policy extraction method is used to accomplish on-line implementation. Simulation results are provided that demonstrate the robustness of the method against an opponent beginning from both offensive and defensive situations. Flight results are also presented using micro-UAS flown at MIT's Real-time indoor Autonomous Vehicle test ENvironment (RAVEN).

Two digital computer programs synthesizing optimal maneuvers in one-on-one air-to-air combat situations are described. The method develops intelligently interactive maneuvers without relying on human pilot experience. One program drives one of the interacting aircraft in real time, replacing one of the human pilots on the NASA Langley Research Center's Differential Maneuvering Simulator; the other program operates in a normal batch-processing mode. Both programs use the same technique, which maps the physical situation of the two aircraft into a quantized, abstract situation space. The outcome in this situation space is predicted for several trial maneuvers, a value is associated with the outcome of each trial maneuver, and finally the maneuver with the highest predicted value is executed. These programs, operating with six degrees of freedom and realistic aerodynamic representations of both aircraft, provide a means for objective evaluation of weapons systems and pilot performance.
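As a rough sketch of the variable-scale action idea in the first abstract, the Python snippet below enumerates candidate (virtual pursuit angle, speed) commands at several amplitudes. The scale values, names, and action layout are illustrative assumptions, not taken from the paper.

```python
import itertools

# Hypothetical variable-scale action set for close-range air combat:
# each action is a (pursuit-angle offset, speed change) pair offered at
# several amplitudes ("scales"). Values below are illustrative only.
ANGLE_SCALES_DEG = [2.0, 10.0, 30.0]   # fine, medium, coarse angle steps
SPEED_SCALES_MPS = [5.0, 20.0]         # small and large speed changes

def build_action_set():
    """Enumerate variable-scale actions as (d_angle_deg, d_speed_mps)."""
    actions = [(0.0, 0.0)]  # hold current pursuit angle and speed
    for a, s in itertools.product(ANGLE_SCALES_DEG, SPEED_SCALES_MPS):
        for sign_a in (-1.0, 1.0):
            for sign_s in (-1.0, 1.0):
                actions.append((sign_a * a, sign_s * s))
    return actions

if __name__ == "__main__":
    acts = build_action_set()
    print(f"{len(acts)} candidate actions, e.g. {acts[:3]}")
```

Mixing fine and coarse scales in one action set is what lets a learned policy command both small corrections and aggressive maneuvers, rather than being stuck with a single fixed amplitude.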
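The rollout-based policy extraction described in the second abstract can be pictured as scoring each candidate maneuver by a short forward simulation plus a learned approximate value at the rollout's end. The sketch below is a minimal illustration under assumed interfaces: `simulate`, `reward`, and `approx_value` are hypothetical callables standing in for the paper's dynamics model, shaped reward, and ADP value approximation.

```python
def extract_policy_action(state, actions, simulate, reward, approx_value,
                          horizon=3):
    """Rollout-based action selection (illustrative sketch).

    For each candidate action, hold it over a short horizon (a
    simplification), accumulate the shaped reward along the rollout,
    add the approximate value of the terminal state, and return the
    action with the highest total score.
    """
    best_action, best_score = None, float("-inf")
    for a in actions:
        s, score = state, 0.0
        for _ in range(horizon):
            s = simulate(s, a)      # forward-propagate the dynamics
            score += reward(s)      # shaped reward along the rollout
        score += approx_value(s)    # learned value at the rollout end
        if score > best_score:
            best_action, best_score = a, score
    return best_action
```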
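The quantized, abstract situation space used by the two NASA programs in the last abstract can be illustrated with a simple binning of the relative geometry. The features (range, aspect angle, bearing) and bin edges below are assumptions chosen for illustration, not the programs' actual encoding.

```python
# Hypothetical quantization of two-aircraft geometry into a discrete
# situation tuple, loosely following the description above.
RANGE_BINS_M = [150.0, 500.0, 1500.0]  # close / short / medium / long
ANGLE_BIN_DEG = 45.0                   # 8 sectors for each angle

def quantize_situation(range_m, aspect_deg, bearing_deg):
    """Map (range, aspect angle, bearing) to a discrete situation tuple."""
    r_bin = sum(range_m > edge for edge in RANGE_BINS_M)
    a_bin = int((aspect_deg % 360.0) // ANGLE_BIN_DEG)
    b_bin = int((bearing_deg % 360.0) // ANGLE_BIN_DEG)
    return (r_bin, a_bin, b_bin)
```

With the situation discretized this way, predicted outcomes of trial maneuvers can be looked up and valued per cell, which is what makes the highest-value-maneuver selection tractable.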