We are a robotics company developing general-purpose autonomous systems that integrate perception, reasoning, and physical interaction. Our flagship platform combines large-scale vision-language models with real-time robotic control so that robots can understand natural language instructions, interpret complex visual scenes, and execute precise actions in unstructured environments.
Key Responsibilities
• Model Architecture & Integration
◦ Design and train vision-language-action (VLA) models that fuse vision encoders (e.g., CLIP, DINOv2, SigLIP), language backbones (e.g., LLaMA-3, Qwen), and action heads (e.g., diffusion policies, ACT, RT-2-style action tokenizers).
◦ Implement efficient inference pipelines for real-time operation on edge hardware (e.g., NVIDIA Jetson, custom ASIC/FPGA accelerators).
• Data Engine & Synthetic Pipelines
◦ Build large-scale datasets combining real robot trajectories, teleoperated demos, and simulated data (Isaac Sim, MuJoCo, PyBullet).
◦ Develop data augmentation strategies: video infilling, language rephrasing, 3D scene randomization, and cross-embodiment transfer.
• Training & Scaling
◦ Scale pretraining on large-scale video, text, and action data (e.g., Ego4D, Something-Something, BridgeData).
◦ Fine-tune with imitation learning (BC), offline and online reinforcement learning (IQL, PPO), and preference-based methods (RLHF, DPO), using reward models for affordance, safety, and task success.
• Robot Deployment & Evaluation
◦ Deploy VLA policies on physical platforms (manipulators, mobile manipulators, humanoids).
◦ Create standardized benchmarks for zero-shot generalization, long-horizon task completion, sim-to-real transfer, and language ambiguity resolution.
• Safety & Reliability
◦ Implement guardrails: force/torque monitoring, collision prediction, verbal clarification loops, and reversible action primitives.
◦ Conduct failure mode analysis and iterative red-teaming.
Required Qualifications
• MS/PhD in Computer Science, Robotics, or related field (or equivalent experience).
• Experience in one or more of:
◦ Training/inference of vision-language models (CLIP, FLAVA, LLaVA, etc.).
◦ Robot learning from demonstration (ACT, Diffusion Policy, RT-1/RT-2).
◦ Reinforcement learning for manipulation (e.g., DrQ-v2) or pretrained visual representations for control (e.g., R3M, VIP).
• Strong Python skills; proficiency in PyTorch or JAX.
• Hands-on experience with ROS 2, Docker, and real robot hardware.
Preferred Skills
• Published work at CoRL, RSS, ICRA, NeurIPS, or CVPR on VLA, robot foundation models, or multimodal learning.
• Experience with 3D vision (NeRF, Gaussian splatting, point cloud processing).
• Experience with low-level control (impedance/admittance control, MPC, dynamics modeling).
• Distributed training on GPU clusters; familiarity with SLURM, Ray, or Kubernetes.