Abstract: RaceVLA is the first Vision-Language-Action (VLA) model specifically designed for racing drones. It processes first-person view (FPV) video streams alongside natural language commands to generate control signals: linear velocities (Vx, Vy, Vz) and yaw angular speed (w). The system enables drones to autonomously execute a wide range of flight tasks, including navigating novel scenarios in unfamiliar environments. By leveraging a purpose-built training dataset, RaceVLA exhibits robust generalization capabilities.