MARG: MAstering Risky Gap Terrains for Legged Robots with Elevation Mapping

The University of Hong Kong
*Equal Contribution

Corresponding author

Abstract

Deep Reinforcement Learning (DRL) controllers for quadrupedal locomotion have demonstrated impressive performance on challenging terrains, allowing robots to execute complex skills such as climbing, running, and jumping. However, existing blind controllers often struggle to ensure safety and efficient traversal through risky gap terrains, which are typically highly complex, requiring robots to accurately perceive terrain information and select appropriate footholds during locomotion. Meanwhile, existing perception-based controllers still present several practical limitations, including complex multi-sensor deployment systems and expensive computing resource requirements. This paper proposes a DRL controller named MAstering Risky Gap Terrains (MARG), which integrates terrain maps and proprioception to dynamically adjust actions and enhance the robot's stability in these tasks. During the training phase, our controller accelerates policy optimization by selectively incorporating privileged information (e.g., center of mass, friction coefficients) that is available in simulation but cannot be measured directly in real-world deployments due to sensor limitations. We also design three foot-related rewards to encourage the robot to explore safe footholds. More importantly, a terrain map generation (TMG) model is proposed to reduce the drift present in mapping and provide accurate terrain maps using a single LiDAR, laying the foundation for zero-shot transfer of the learned policy. The experimental results indicate that MARG maintains stability across various risky terrain tasks.

Extreme Terrain


The extreme parkour terrain, including hurdle, step, single plank bridge, ramp, narrow balance beams, and inclined balance beams.

To adapt to a wider range of terrain and increase the difficulty of risky gap terrain, we designed a new parkour terrain, as illustrated in the figure above. This terrain encompasses 60 cm high hurdles, 77 cm high steps, a 10 cm wide single plank bridge, 43.5° ramps, 5 cm wide narrow balance beams, and 5 cm wide inclined balance beams. We concurrently train 4096 Unitree Go2 robots on these new terrains, and the video showcases the final results, demonstrating that our controller can successfully traverse these extreme terrains.
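The terrain dimensions listed above can be captured in a simple specification. The sketch below is purely illustrative (the names and structure are assumptions, not taken from the MARG codebase), showing how such a parkour terrain might be parameterized:

```python
# Hypothetical specification of the extreme parkour terrain described above.
# Dimensions are taken from the text; names are illustrative only.
import math

EXTREME_TERRAIN_SPEC = {
    "hurdle":                {"height_m": 0.60},
    "step":                  {"height_m": 0.77},
    "single_plank_bridge":   {"width_m": 0.10},
    "ramp":                  {"incline_deg": 43.5},
    "narrow_balance_beam":   {"width_m": 0.05},
    "inclined_balance_beam": {"width_m": 0.05},
}

def ramp_slope(incline_deg: float) -> float:
    """Return the slope (rise over run) of a ramp with the given incline."""
    return math.tan(math.radians(incline_deg))
```

For example, a 43.5° ramp corresponds to a rise of roughly 0.95 m per meter of horizontal run, which conveys how steep this obstacle is relative to typical quadruped benchmarks.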

Conventional Terrain


The conventional challenge terrain of 10 different levels, including slope, pyramid slope, stairs down, stairs up, and discrete obstacles.


The average rewards of six controllers on 4096 Unitree Go1 robots in conventional challenge terrain over 6000 episodes. Each curve's shaded region represents the standard deviation of reward values across 5 different random seeds, indicating the uncertainty in the results.

We generated a conventional challenging terrain, as shown in the figure above, following the setups used by state-of-the-art quadrupedal locomotion controllers (MorAL, Vanilla PPO, Concurrent, RMA, and DreamWaQ). We also utilized a game-inspired curriculum to ensure progressive locomotion policy learning over challenging terrains. As shown in the figure, the performance of these six controllers is highly consistent with their performance on the risky terrain tasks. For example, MARG still exhibits the best performance over challenging terrains, while MorAL and DreamWaQ achieve better average rewards than Concurrent, Vanilla PPO, and RMA. Overall, these experiments demonstrate that MARG has significant advantages on both risky and conventional challenging terrain. This is attributed to MARG acquiring not only the explicit state estimation but also the terrain map \(\boldsymbol{h}_{t}\) surrounding the robot, which helps the policy reason about the robot's states.
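A game-inspired terrain curriculum is commonly implemented by assigning each robot a difficulty level that is promoted when the robot traverses far enough in an episode and demoted when it fails early. The sketch below illustrates this idea under those assumptions; the thresholds and function names are hypothetical, not the exact scheme used here:

```python
# Minimal sketch of a game-inspired terrain curriculum (illustrative only).
# Each of the parallel robots holds a difficulty level in [0, NUM_LEVELS - 1];
# levels are promoted/demoted from per-episode traversal progress.
import numpy as np

NUM_LEVELS = 10  # matches the 10 difficulty levels described above

def update_terrain_levels(levels: np.ndarray, traversed_frac: np.ndarray) -> np.ndarray:
    """Return updated per-robot terrain levels.

    levels:         int array, current difficulty level per robot
    traversed_frac: float array, fraction of the terrain crossed this episode
    """
    levels = levels.copy()
    levels[traversed_frac > 0.8] += 1   # promote robots that nearly finished
    levels[traversed_frac < 0.4] -= 1   # demote robots that failed early
    return np.clip(levels, 0, NUM_LEVELS - 1)
```

With thousands of robots trained in parallel, this keeps each robot near the hardest terrain it can currently handle, which is what makes progressive policy learning over the 10 levels feasible.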

Success Rate


Real-world tests of MARG on Unitree Go2 in complex terrains, including narrow balance beams, outdoor single gap, narrow ditch, soft step, single plank bridge, indoor single gap, drainage ditch, and step.


Success Rate of MARG in Risky Terrain.

We deployed MARG on the Unitree Go2 robot and conducted multiple tests on various gap terrains. For each test, we recorded whether the robot successfully traversed the terrain; the results are presented in the table, which lists the number of tests, the number of successful traversals, and the resulting success rate for each terrain type. For instance, on the outdoor single gap, we conducted 5 tests with 4 successful traversals, a success rate of 80%. These results reflect the adaptability and stability of MARG on such gap terrains. However, on some extremely complex terrains, such as the 9 cm narrow balance beams, the success rate was relatively low, indicating that there is still room for improvement in the MARG algorithm.

To evaluate the state estimator, we computed the average squared linear velocity error for each random seed:

$$ \bar{v}^n_i = \frac{1}{n} \sum_{t=1}^{n} (\hat{v}_{i,t} - v_{i,t})^2 $$

where \(i=1,2,3\) denotes the random seed, and \(\bar{v}^n_i\), \(\hat{v}_{i,t}\), and \(v_{i,t}\) represent the average squared error of seed \(i\) over \(n\) time steps, the estimated linear velocity, and the real linear velocity, respectively. To evaluate the performance and robustness of the estimators across algorithms, we further calculated the mean \(\bar{V}\) and standard deviation \(\delta\) of the three per-seed means:

$$ \bar{V} = \frac{1}{3}\sum_{i = 1}^{3}\bar{v}^n_i, \qquad \delta = \sqrt{\frac{1}{2}\sum_{i = 1}^{3}(\bar{v}^n_i-\bar{V})^2} $$

The results are summarized in the table, which presents the average squared linear velocity errors and their standard deviations for each algorithm at different time steps. Notably, MARG consistently demonstrates the smallest error and the lowest standard deviation across all tested steps, highlighting that its estimator network achieves exceptional precision and robustness in gap terrain scenarios.
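The two metrics above translate directly into code. The sketch below computes the per-seed average squared error \(\bar{v}^n_i\) and then the across-seed mean \(\bar{V}\) and sample standard deviation \(\delta\) (note the divisor \(3-1=2\), matching the formula); variable names are illustrative, and the velocities would come from logged rollouts:

```python
# Sketch of the velocity-estimation metrics defined above (names illustrative).
import numpy as np

def seed_avg_sq_error(v_hat, v_true) -> float:
    """Per-seed average squared error over n steps: (1/n) * sum_t (v_hat - v)^2."""
    v_hat, v_true = np.asarray(v_hat, float), np.asarray(v_true, float)
    return float(np.mean((v_hat - v_true) ** 2))

def across_seed_stats(seed_errors):
    """Mean V-bar and sample std delta (divisor len-1 = 2 for 3 seeds)."""
    e = np.asarray(seed_errors, dtype=float)
    v_bar = float(e.mean())
    delta = float(np.sqrt(np.sum((e - v_bar) ** 2) / (len(e) - 1)))
    return v_bar, delta
```

Using the sample standard deviation (divisor 2 rather than 3) gives an unbiased estimate of the variance across the three seeds, which is what the \(\delta\) formula above specifies.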

Performance over gap terrains