Learning Human-Like Badminton Skills for Humanoid Robots

Realizing versatile and human-like performance in high-demand sports like badminton remains a formidable challenge for humanoid robotics. Unlike standard locomotion or static manipulation, this task demands a seamless integration of explosive whole-body coordination and precise, timing-critical interception. While recent advances have achieved lifelike motion mimicry, bridging the gap between kinematic imitation and functional, physics-aware striking without compromising stylistic naturalness is non-trivial. To address this, we propose Imitation-to-Interaction, a progressive reinforcement learning framework designed to evolve a robot from a “mimic” to a capable “striker.” Our approach establishes a robust motor prior from human data, distills it into a compact, model-based state representation, and stabilizes dynamics via adversarial priors. Crucially, to overcome the sparsity of expert demonstrations, we introduce a manifold expansion strategy that generalizes discrete strike points into a dense interaction volume. We validate our framework through the mastery of diverse skills, including lifts and drop shots, in simulation. Furthermore, we demonstrate the first zero-shot sim-to-real transfer of anthropomorphic badminton skills to a humanoid robot, successfully replicating the kinetic elegance and functional precision of human athletes in the physical world.

Overview of the Framework. The pipeline progressively transforms a kinematic imitator into a dynamic striker through four stages:

(Stage 1) Imitation: A teacher policy learns to robustly track human motions from MoCap data using proprioceptive (blue) and imitation goal (green) observations.

(Stage 2) Distillation: The teacher's capabilities are distilled into a student policy via DAgger. The student operates on a reduced observation space consisting of proprioception, task goals (yellow: target hit/recovery states), and time-to-hit (red), removing dependency on future motion trajectories.

(Stage 3) Stabilization: The student policy is fine-tuned using RL with an AMP discriminator to enforce stylistic plausibility (Style Reward) while minimizing tracking errors, stabilizing the motion against drift.

(Stage 4) Interaction: In the final physics-interactive environment, the policy undergoes refinement with simulated shuttlecock dynamics, generalizing to a dense spatio-temporal manifold to achieve precise, agile striking.
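Three ingredients of the stages above can be illustrated with small building blocks. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, observation layout, and the box-shaped sampling region are all hypothetical. The style reward uses the least-squares form from the original AMP formulation; the text above does not say which form this work uses.

```python
import numpy as np

def student_observation(proprio, hit_target, recovery_target, time_to_hit):
    """Stage 2 (assumed layout): compact observation with no future reference frames."""
    return np.concatenate([proprio, hit_target, recovery_target, [time_to_hit]])

def amp_style_reward(disc_score):
    """Stage 3 (assumed form): least-squares AMP reward max(0, 1 - 0.25 * (d - 1)^2)."""
    d = np.asarray(disc_score, dtype=float)
    return np.maximum(0.0, 1.0 - 0.25 * (d - 1.0) ** 2)

def expand_strike_points(strike_points, half_extent, n_samples, rng):
    """Stage 4 (assumed scheme): sample hit targets uniformly in axis-aligned
    boxes centred on the discrete strike points from the demonstrations."""
    strike_points = np.asarray(strike_points, dtype=float)
    # Pick a demonstrated strike point for each sample, then perturb it.
    idx = rng.integers(0, len(strike_points), size=n_samples)
    offsets = rng.uniform(-half_extent, half_extent, size=(n_samples, 3))
    return strike_points[idx] + offsets
```

In this sketch the expansion is purely spatial. A spatio-temporal variant would additionally perturb the time-to-hit value fed to `student_observation`.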

We capture expert badminton motions and employ an optimization-based retargeting scheme to physically map them onto the robot.
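To make the idea of optimization-based retargeting concrete, here is a toy sketch for a single frame. It assumes a planar two-joint arm, target keypoints already scaled into the robot's frame, and a damped Gauss-Newton solver with joint-limit clipping. None of these choices come from the text above; the actual retargeting scheme, robot model, and cost terms are not specified here.

```python
import numpy as np

def forward_kinematics(q, link_lengths):
    """Planar two-link chain: return [elbow_x, elbow_y, wrist_x, wrist_y]."""
    a1, a2 = q[0], q[0] + q[1]
    elbow = link_lengths[0] * np.array([np.cos(a1), np.sin(a1)])
    wrist = elbow + link_lengths[1] * np.array([np.cos(a2), np.sin(a2)])
    return np.concatenate([elbow, wrist])

def numerical_jacobian(q, link_lengths, eps=1e-6):
    """Central finite-difference Jacobian of the keypoint positions w.r.t. q."""
    jac = np.zeros((4, len(q)))
    for j in range(len(q)):
        step = np.zeros(len(q))
        step[j] = eps
        jac[:, j] = (forward_kinematics(q + step, link_lengths)
                     - forward_kinematics(q - step, link_lengths)) / (2 * eps)
    return jac

def retarget_frame(target_keypoints, link_lengths, q_init, q_min, q_max,
                   iters=50, damping=1e-4):
    """Minimize keypoint position error, clipping to joint limits each step."""
    q = np.clip(np.asarray(q_init, dtype=float), q_min, q_max)
    target = np.asarray(target_keypoints, dtype=float).ravel()
    for _ in range(iters):
        residual = forward_kinematics(q, link_lengths) - target
        jac = numerical_jacobian(q, link_lengths)
        # Damped Gauss-Newton step: (J^T J + lambda I) dq = -J^T r
        dq = np.linalg.solve(jac.T @ jac + damping * np.eye(len(q)),
                             -jac.T @ residual)
        q = np.clip(q + dq, q_min, q_max)
    return q
```

Clipping after each step is the simplest way to respect joint limits in a sketch like this. A full pipeline would more likely use a constrained solver and add terms for temporal smoothness and foot contact.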

Diverse Badminton Skills Learned via the Proposed Framework. The humanoid masters distinct striking techniques including backhand lifts, forehand lifts, and drop shots, with pink dots visualizing the shuttlecock trajectory.
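A shuttlecock trajectory like the one visualized here can be approximated with a point-mass model under gravity and quadratic drag. The sketch below is an assumption for illustration only. The drag model, the terminal speed of 6.8 m/s, and the semi-implicit Euler integrator are not taken from the text above, and the simulator used in this work may model the shuttlecock differently.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # m/s^2, z axis points up

def simulate_shuttlecock(p0, v0, terminal_speed=6.8, dt=1e-3, max_steps=20000):
    """Integrate a = g - (|g| / v_t^2) * |v| * v until the shuttle reaches z <= 0.

    terminal_speed is an illustrative value, not a parameter from the source.
    Returns arrays of positions and velocities, one row per time step.
    """
    p = np.asarray(p0, dtype=float).copy()
    v = np.asarray(v0, dtype=float).copy()
    drag_coeff = 9.81 / terminal_speed ** 2  # chosen so drag balances gravity at v_t
    positions, velocities = [p.copy()], [v.copy()]
    for _ in range(max_steps):
        accel = GRAVITY - drag_coeff * np.linalg.norm(v) * v
        v = v + dt * accel   # update velocity first (semi-implicit Euler)
        p = p + dt * v       # then advance position with the new velocity
        positions.append(p.copy())
        velocities.append(v.copy())
        if p[2] <= 0.0:
            break
    return np.array(positions), np.array(velocities)
```

For example, `simulate_shuttlecock([0, 0, 1], [10, 0, 12])` produces the steep, asymmetric arc typical of a lift, with a much shorter range than a drag-free projectile launched at the same velocity.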

