The path planning method for an intelligent warehouse mobile robot comprises the following steps: S1: the mobile robot is first trained in a simulated environment;
a1: Set the target for the mobile robot's motion by randomly initializing the target point coordinates (xt, yt) and the target range Rm, where xt and yt are the X-axis and Y-axis coordinates of the center of the target point in the static map, and Rm is the square area of side length dmin centered on (xt, yt), within which the robot is counted as having reached the destination; set the current pose (x, y, θr) of the mobile robot, where x and y are the current position coordinates of the mobile robot and θr is the angle between its real-time direction of movement and the X axis; path planning is performed using the position (θ, d) of the target point in the polar coordinates of the mobile robot, with the robot driving forward at a fixed speed, where θ is the angle of the target point in the polar coordinates of the mobile robot and d is the distance of the target point from the center of the mobile robot;
a2: During navigation, the environmental data Li and the target position data Di detected by the laser sensor on the mobile robot are preprocessed, converted into features, and then fused into the environmental data Si;
a3: Use the deep deterministic policy gradient (DDPG) method to obtain the next action a, where a ∈ W indicates that the deflection angle of the mobile robot when performing the action lies within the range W;
a4: Determine whether the mobile robot has reached the target point (xt, yt); if not, return to a2 and continue navigation; if so, end navigation;
a5: After navigation is completed, update the parameters of the policy (actor) sub-network and the evaluation (critic) network in the deep deterministic policy gradient (DDPG) method according to the reward value, and save the network parameters of the method once the training success rate reaches the target success rate;
S2: In the actual environment, mobile robot navigation uses the deep deterministic policy gradient (DDPG) method with the network parameters saved in S1 to select the motion of the mobile robot.
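The geometric parts of steps a1 to a4 can be illustrated with a minimal Python sketch. It is not the claimed implementation: all function names, the laser normalization, the symmetric interpretation of W as [-w_max, w_max], the simple kinematic update, and the obstacle-free `sense` callback are assumptions made for illustration. In particular, `policy` is a hypothetical stand-in for the trained policy network, and no learning (step a5) is performed here.

```python
import math


def target_polar(x, y, theta_r, xt, yt):
    """Return (theta, d): bearing and distance of the target in the robot frame."""
    dx, dy = xt - x, yt - y
    d = math.hypot(dx, dy)
    # Bearing relative to the robot heading, wrapped into [-pi, pi).
    theta = (math.atan2(dy, dx) - theta_r + math.pi) % (2 * math.pi) - math.pi
    return theta, d


def reached(x, y, xt, yt, d_min):
    """True if (x, y) lies in the square of side d_min centered on (xt, yt)."""
    half = d_min / 2.0
    return abs(x - xt) <= half and abs(y - yt) <= half


def fuse_state(laser_ranges, theta, d, max_range=10.0):
    """Assumed preprocessing: clip and normalize laser data, append target data."""
    scan = [min(r, max_range) / max_range for r in laser_ranges]
    return scan + [theta / math.pi, min(d, max_range) / max_range]


def run_episode(policy, start_pose, target, d_min, w_max,
                speed=0.5, dt=0.1, max_steps=500,
                sense=lambda x, y, th: [10.0] * 8):
    """Loop over a2 to a4 until the target range is reached or steps run out."""
    x, y, theta_r = start_pose
    xt, yt = target
    for step in range(max_steps):
        if reached(x, y, xt, yt, d_min):           # a4: arrival check
            return True, step
        theta, d = target_polar(x, y, theta_r, xt, yt)
        state = fuse_state(sense(x, y, theta_r), theta, d)   # a2: fused state
        a = max(-w_max, min(w_max, policy(state)))  # a3: keep action inside W
        theta_r += a                                # apply the deflection angle
        x += speed * dt * math.cos(theta_r)         # drive forward at fixed speed
        y += speed * dt * math.sin(theta_r)
    return reached(x, y, xt, yt, d_min), max_steps


# Hypothetical stand-in policy: steer straight toward the target bearing.
def greedy_policy(state):
    return state[-2] * math.pi


ok, steps = run_episode(greedy_policy, (0.0, 0.0, 0.0), (2.0, 1.0),
                        d_min=0.4, w_max=0.5)
print(ok, steps)
```

In a training setup, `greedy_policy` would be replaced by the output of the policy network, and `sense` by the simulated laser sensor; the arrival check uses the square target range rather than a circular radius, following the definition of Rm in a1.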