Under construction! More updates coming soon.

Our goal is to describe

  • What does it mean for a car to be self driving?
  • What sensors are used in self driving cars?
  • What are the software components: controller, perception, occupancy grid map, planner, maps, simulation?
  • What open source software tools are available to help in development?
  • What do ‘real time’ and ‘safety critical’ mean for self driving cars?
  • What are the engineering challenges?
  • How does self driving compare to robotics in general?

  • The components of a self driving car
  • The engineering needed to build it, incrementally and component by component. The best technology is built by evolving simpler components, from basic to complex.
  • The skills needed to build each component
  • How to find information about building the various components. Some good books describing general robotics or machine learning are available (Probabilistic Robotics, and Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow). However, no book seems to exist in print at this time that describes the full technological stack of a self driving car, from start to end.

At a glance

  • A self driving car has a planner, a controller, and computer vision sensors
  • [Insert architecture diagram]
    • The planner computes the path the car should follow. For example: drive along the 1st lane, stop at the intersection, wait for crossing traffic, then turn left
    • The controller will set the motors and actuators - i.e., set the gear to forward, set the speed, set the steering, or brake
    • The controller will report back the sensor data - i.e., GPS position (the car is typically equipped with dual GPS, to get both position and heading), plus current time, steering angle, speed, and gear.
      • The controller is also equipped with GPS-enhancing sensors, e.g. a gyro and an IMU, allowing it to estimate position when GPS is temporarily not available, e.g. because the car is in a tunnel. This also allows it to interpolate a more precise position between GPS fixes.
    • The GPS tick from the satellite comes at 1 Hz (1 PPS). The controller needs to report sensor data to perception and to the planner at a higher frequency - anywhere between 5 Hz and 50 Hz.
  • Frequency considerations
    • If the frequency is high, the system will do obstacle avoidance in a timely fashion.
    • However, the compute power necessary to handle high frequency is proportional to the frequency. Higher frequency means, proportionally, more CPU power and larger (or more) FPGAs.
  • The planner <-> controller cycle runs in a continuous loop, at high frequency (a minimal sketch of such a loop appears after this list).
  • However, vision sensors can’t maintain the same high frequency, due to the compute requirements.
  • Vision sensors will use a higher data bandwidth, and will run at a slower frequency, typically 5-50Hz (but closer to the low end of the range).
  • Vision sensors include one or more cameras, lidars, radars…
    • [Insert diagram here of vision sensors]
  • Data flow for vision is vision_sensor -> planner.
    • This is because vision sensors are passive sensors.
  • That being said, vision sensor data needs to be cleaned up, and synchronized. Ground points need to be marked.
  • Vision sensor data is not passed directly to the planner. In between, we have the detection nodes. The following types of detections are performed:
    • Auto-calibration of the sensors.
    • Accumulated occupancy grid. This is used by the planner to determine the drivable area. The occupancy grid is discretized into voxels. Voxels that are occupied across multiple frames are accumulated, and determine the accumulated occupancy grid (a minimal sketch appears after this list).
    • Synchronization of multiple sensors - so data can be processed in a synchronized fashion, not in interspersed chunks.
    • Lidar motion compensation, for rotational lidars - in case it is not performed by the lidar itself.
    • Segmentation - including ground segmentation.
    • SLAM - simultaneous localization and mapping
      • This has a different, simpler implementation when the vehicle moves in a closed circuit - e.g., on a warehouse floor, on a public transportation route, in a well-known campus, or in a parking lot.
      • For an over-the-road vehicle, the map infrastructure is a lot more complicated, and needs to be dynamically updated, requiring extra back-end infrastructure.
      • SLAM is an industry term for the family of algorithms that build a map of static obstacles from visual sensor data while simultaneously estimating the vehicle’s pose within it, improving the quality of localization beyond what GPS alone provides.
    • Static object detection - specialized for the types of static objects, for example traffic signage and parked vehicles. Treating static objects separately in the detection pipeline allows for more efficient algorithms specialized for static objects.
    • Dynamic object detection, including pedestrian detection.
    • Object tracking. Object prediction.
      • Sometimes, this is implemented as a single deep learning layer fused with the object detection layer.
      • Other times, this is implemented as separate layers.
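
To make the frequency discussion above concrete, here is a minimal sketch of the planner <-> controller cycle running at a fixed rate. The ControllerState fields and the plan_step/apply_command functions are hypothetical placeholders, not an actual vehicle API; the point is only the fixed-rate loop structure.

```python
import time
from dataclasses import dataclass

# Hypothetical snapshot of what the controller reports each cycle
# (fields mirror the list above: position, heading, time, steering, speed, gear).
@dataclass
class ControllerState:
    timestamp: float
    latitude: float
    longitude: float
    heading_deg: float
    steering_deg: float
    speed_mps: float
    gear: str

def read_controller_state() -> ControllerState:
    """Placeholder: in a real system this reads GPS/IMU/odometry from the controller."""
    return ControllerState(time.time(), 0.0, 0.0, 0.0, 0.0, 0.0, "forward")

def plan_step(state: ControllerState) -> dict:
    """Placeholder planner: returns the next command (gear, speed, steering)."""
    return {"gear": "forward", "speed_mps": 2.0, "steering_deg": 0.0}

def apply_command(command: dict) -> None:
    """Placeholder: in a real system this writes to the actuators."""
    pass

def control_loop(frequency_hz: float = 20.0, cycles: int = 100) -> None:
    period = 1.0 / frequency_hz
    for _ in range(cycles):
        start = time.time()
        state = read_controller_state()     # controller -> planner
        command = plan_step(state)          # planner decides
        apply_command(command)              # planner -> controller -> actuators
        # Sleep for whatever is left of the period, so the loop holds ~20 Hz.
        time.sleep(max(0.0, period - (time.time() - start)))

if __name__ == "__main__":
    control_loop()
```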
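
Below is a minimal sketch of the accumulated occupancy grid idea from the list above: obstacle points from each frame are binned into cells, per-cell hit counts are accumulated across frames, and cells seen occupied in enough frames are treated as non-drivable. A 2D grid is used here for brevity where the text says voxels; the grid size, resolution, and threshold are illustrative values.

```python
import numpy as np

class AccumulatedOccupancyGrid:
    """Accumulates per-cell occupancy counts over multiple sensor frames."""

    def __init__(self, size_m=100.0, resolution_m=0.5, min_hits=3):
        self.resolution = resolution_m
        self.cells = int(size_m / resolution_m)
        self.min_hits = min_hits            # frames a cell must be hit to count as occupied
        self.hits = np.zeros((self.cells, self.cells), dtype=np.int32)

    def add_frame(self, points_xy: np.ndarray) -> None:
        """points_xy: (N, 2) array of obstacle points in the grid frame, meters."""
        idx = np.floor(points_xy / self.resolution).astype(int) + self.cells // 2
        # Keep only points that fall inside the grid.
        ok = np.all((idx >= 0) & (idx < self.cells), axis=1)
        idx = idx[ok]
        # Count each cell at most once per frame.
        unique_idx = np.unique(idx, axis=0)
        self.hits[unique_idx[:, 0], unique_idx[:, 1]] += 1

    def occupied(self) -> np.ndarray:
        """Boolean mask of cells occupied in at least `min_hits` frames."""
        return self.hits >= self.min_hits

# Usage: feed obstacle points (e.g., non-ground lidar returns) frame by frame.
grid = AccumulatedOccupancyGrid(min_hits=2)
grid.add_frame(np.array([[1.0, 2.0], [1.1, 2.1], [30.0, -5.0]]))
grid.add_frame(np.array([[1.0, 2.0], [30.2, -5.1]]))
print(grid.occupied().sum(), "cells confirmed occupied")
```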

Robotics kinematics vs dynamics

  • Robot kinematics and dynamics are two fundamental branches of robotics that deal with different aspects of robot motion. Here’s a brief explanation of each:
  • Robot Kinematics:
    • Robot kinematics is the branch of robotics that focuses on the study of robot motion without considering the forces and torques involved. It involves analyzing the geometry and motion of a robot’s structure, such as its joints, links, and end effectors, to determine the position, orientation, and velocity of the robot’s various parts.
      • Forward kinematics: It deals with determining the position and orientation of the robot’s end effector (e.g., gripper or tool) given the joint angles or joint displacements. It answers the question, “Where is the end effector located and oriented in the workspace?”
      • Inverse kinematics: It involves solving for the joint angles or joint displacements that will position the robot’s end effector at a desired location and orientation in the workspace. It answers the question, “What joint angles are required to achieve a specific end effector position and orientation?”
  • Robot Dynamics:
    • Robot dynamics, on the other hand, is concerned with the study of robot motion while considering the forces and torques acting on the robot. It involves understanding how these forces and torques affect the motion of the robot’s joints and links, as well as the resulting motion of the end effector.
      • Inverse dynamics: It deals with determining the forces and torques required at the robot’s joints to generate a desired motion or trajectory. It answers the question, “What forces and torques should be applied at the joints to achieve a desired robot motion?”
      • Forward dynamics: It involves predicting the resulting motion of the robot’s joints and links when specific forces and torques are applied at the joints. It answers the question, “Given applied forces and torques at the joints, how will the robot move?”
    • Robot dynamics is particularly important for tasks such as robot control, trajectory planning, and collision avoidance, as it allows for the prediction and control of a robot’s motion while considering the physical constraints and interactions with the environment.
  • Both robot kinematics and dynamics play important roles in the development and operation of self-driving cars. Here’s how they are relevant:
    • Robot Kinematics:
      • Robot kinematics is crucial for self-driving cars in terms of understanding the vehicle’s position, orientation, and motion in the environment. It involves determining the vehicle’s pose (position and orientation) relative to a reference frame and its corresponding motion parameters, such as velocity and acceleration.
    • For self-driving cars, kinematics is employed in tasks such as:
      • Localization: Determining the vehicle’s position and orientation in a known map or global coordinate system.
      • Mapping: Creating and updating maps of the environment using sensor data and kinematic information.
      • Path Planning: Calculating optimal paths or trajectories for the vehicle to follow to reach a destination while considering constraints and avoiding obstacles.
      • Motion Control: Controlling the vehicle’s steering, acceleration, and braking based on kinematic models and desired trajectories (a minimal kinematic vehicle model is sketched after this list).
    • Robot Dynamics:
      • Robot dynamics becomes relevant in self-driving cars when considering the interaction of the vehicle with the physical world and the forces and torques involved. While kinematics focuses on the motion itself, dynamics deals with the forces and torques required to achieve and maintain that motion.
    • In the context of self-driving cars, dynamics is important for:
      • Vehicle Control: Determining the appropriate forces and torques to be applied to the steering, braking, and acceleration systems for maintaining stability, traction, and maneuverability.
      • Collision Avoidance: Predicting the dynamic behavior of other vehicles, pedestrians, and objects in the environment to plan and execute evasive maneuvers if necessary.
      • Ride Comfort and Safety: Analyzing the effects of vehicle dynamics on passenger comfort, stability, and safety during different driving conditions, such as cornering, braking, and acceleration.
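
As an illustration of vehicle kinematics, here is a minimal sketch of the kinematic bicycle model, a common simplification used in motion control and planning for car-like vehicles. It propagates the pose (x, y, heading) forward in time from speed and steering angle; the wheelbase and step sizes are illustrative values.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    x: float       # meters
    y: float       # meters
    theta: float   # heading, radians

def bicycle_step(pose: Pose, speed: float, steering: float,
                 wheelbase: float = 2.7, dt: float = 0.05) -> Pose:
    """One forward-kinematics step of the kinematic bicycle model.

    speed: longitudinal speed in m/s; steering: front-wheel angle in radians.
    The rear axle is taken as the reference point.
    """
    x = pose.x + speed * math.cos(pose.theta) * dt
    y = pose.y + speed * math.sin(pose.theta) * dt
    theta = pose.theta + (speed / wheelbase) * math.tan(steering) * dt
    return Pose(x, y, theta)

# Usage: predict where the car ends up after 2 seconds of gentle left steering.
pose = Pose(0.0, 0.0, 0.0)
for _ in range(40):                       # 40 steps * 0.05 s = 2 s
    pose = bicycle_step(pose, speed=5.0, steering=math.radians(5))
print(f"x={pose.x:.2f} m, y={pose.y:.2f} m, heading={math.degrees(pose.theta):.1f} deg")
```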

Development cycle

  • The underlying robotics platform needs to support individual developers who specialize in specific components: planner, controller, various perception components.
  • Each component needs to be able to run independently, in unit testing mode - with inputs replayed from recordings, so developers can do their work without having to bring the entire system up
  • Here is where the Robot Operating System (ROS) comes in handy. It is basically a pub/sub system where each node is an independent process, and messages between nodes can be recorded to disk and replayed.
  • Developers can pick a ROS node, start it manually, replay the input messages, and do development to ensure that the output messages are as expected (a minimal node sketch appears after this list)
  • ROS, however, is not able to do deterministic replays. Question: is there a similar middleware that can accomplish that?
  • Also, ROS is not safety-critical, and not even real time.
    • Real-time means that it can respond to a stimulus within a very short, quantifiable time interval - so upper bounds for processing latencies can be estimated, from individual components up to the entire system
    • Safety critical means that, additionally, the failure rate is very small, and quantifiable - so failure rates can be estimated from components to the entire system
  • One advantage of ROS is that nodes can be written in either C++ or Python. The overall system has some nodes implemented in Python, and some in C++. As development gets closer to production, all nodes need to be moved to C++ - and, ideally, multiple nodes need to be merged into one, to achieve higher performance.
  • Whatever the middleware may be, an orchestrator is used to control the start/stop of each component, as well as to monitor their health
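
As a concrete example of the node-per-component pattern, here is a minimal ROS 1 (rospy) node sketch: it subscribes to an input topic and republishes a processed result, so it can be exercised in isolation by replaying a recorded bag into its input topic. The topic names, parameter, and filtering logic are illustrative, not part of any particular stack.

```python
#!/usr/bin/env python
# Minimal rospy node: subscribe to an input topic, publish a processed output.
# It can be run standalone and fed recorded data, e.g.:
#   rosbag play my_recording.bag
import rospy
from std_msgs.msg import Float32

class SpeedFilterNode(object):
    """Illustrative component: smooths a raw speed signal before the planner sees it."""

    def __init__(self):
        self.alpha = rospy.get_param("~alpha", 0.2)   # smoothing factor
        self.filtered = None
        self.pub = rospy.Publisher("speed/filtered", Float32, queue_size=10)
        rospy.Subscriber("speed/raw", Float32, self.on_speed)

    def on_speed(self, msg):
        # Exponential moving average of the incoming speed samples.
        if self.filtered is None:
            self.filtered = msg.data
        else:
            self.filtered = self.alpha * msg.data + (1.0 - self.alpha) * self.filtered
        self.pub.publish(Float32(self.filtered))

if __name__ == "__main__":
    rospy.init_node("speed_filter")
    SpeedFilterNode()
    rospy.spin()
```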

Transforms

  • Different algorithms in the system - or different components - are more efficiently implemented in their own coordinate system
  • For example, in a robot arm:
    • Each moving part of the arm is most naturally described in its own local coordinate system.
    • Then, between any two adjacent parts of the arm there is a coordinate transform that changes as the joint moves - a dynamic coordinate transform.
    • Composing these dynamic coordinate transforms gives coordinate transforms between any two moving parts of the robot arm (see the sketch after this list)
  • In physics, these are called rigid-body transforms
  • In math, these are 3D affine transformations
    • Given by a translation (dx, dy, dz) followed by a rotation (roll, pitch, yaw).
      • The above 3D rotation coordinates are called Euler angles
      • The 3D rotation coordinates can also be expressed as quaternions q = (w, x, y, z)
      • Quaternions have several advantages over other representations, such as Euler angles, when it comes to representing rotations in robotics. They avoid the problem of gimbal lock, provide efficient interpolation between orientations, and have a compact representation.
  • ROS has great documentation regarding transforms. But non-ROS systems can also adopt the same transform architecture.
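
Here is a minimal sketch of composing rigid-body transforms, using 4x4 homogeneous matrices and scipy’s Rotation for the Euler/quaternion conversions. The two transforms (vehicle base to sensor mount, mount to lidar) are made-up examples; the point is that composition is just matrix multiplication.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def make_transform(dx, dy, dz, roll, pitch, yaw):
    """Build a 4x4 homogeneous rigid-body transform from a translation and Euler angles (radians)."""
    T = np.eye(4)
    T[:3, :3] = R.from_euler("xyz", [roll, pitch, yaw]).as_matrix()
    T[:3, 3] = [dx, dy, dz]
    return T

# Illustrative transforms: vehicle base -> sensor mount, sensor mount -> lidar.
base_to_mount = make_transform(1.2, 0.0, 1.5, 0.0, 0.0, 0.0)
mount_to_lidar = make_transform(0.0, 0.0, 0.1, 0.0, 0.0, np.deg2rad(90))

# Composition: a point measured in the lidar frame, expressed in the vehicle base frame.
base_to_lidar = base_to_mount @ mount_to_lidar
point_in_lidar = np.array([2.0, 0.0, 0.0, 1.0])          # homogeneous coordinates
print("point in base frame:", (base_to_lidar @ point_in_lidar)[:3])

# The same rotation expressed as a quaternion (x, y, z, w in scipy's convention).
print("quaternion:", R.from_matrix(base_to_lidar[:3, :3]).as_quat())
```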

Simulation

  • The system is difficult to test in its entirety every time. It’s not mere equipment on a rack in a lab - so different testing techniques are necessary
  • Simulation is run either as component-in-the-loop, software-in-the-loop, or hardware-in-the-loop (see the sketch below)
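
A minimal sketch of the software-in-the-loop idea: the software under test (here a toy planner) runs unchanged, but the real controller and vehicle are replaced by a simple simulated model (reusing the kinematic bicycle update from the earlier sketch). All names, gains, and the pass criterion are illustrative.

```python
import math

def simulate_vehicle(x, y, theta, speed, steering, wheelbase=2.7, dt=0.05):
    """Simulated vehicle standing in for the real controller (kinematic bicycle update)."""
    x += speed * math.cos(theta) * dt
    y += speed * math.sin(theta) * dt
    theta += (speed / wheelbase) * math.tan(steering) * dt
    return x, y, theta

def planner_under_test(x, y, theta, target_y=3.0):
    """The software under test: a toy planner steering toward a target lateral offset."""
    error = target_y - y
    steering = max(-0.4, min(0.4, 0.5 * error - 1.0 * theta))  # crude proportional steering
    return 5.0, steering                                        # (speed m/s, steering rad)

# Software-in-the-loop: planner and simulated vehicle in a closed loop, no hardware needed.
x, y, theta = 0.0, 0.0, 0.0
for step in range(200):                                         # 200 * 0.05 s = 10 s
    speed, steering = planner_under_test(x, y, theta)
    x, y, theta = simulate_vehicle(x, y, theta, speed, steering)
assert abs(y - 3.0) < 0.5, "planner failed to reach the target lane offset"
print(f"final y = {y:.2f} m after 10 s")
```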

Back-end cloud system

  • Recordings are uploaded to cloud blob storage
    • Could be just the sensor data - or the full recordings
    • This could result in tens of terabytes of recordings per vehicle per day.
    • Upload is either automated - or manual
  • These recordings become input to the Data Lake, where they are processed for
    • Offline segmentation and object detection annotations
    • Offline perception
  • Data needs to be curated into scenes, which are sequences of frames (a minimal sketch appears after this list)
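
A minimal sketch of the scene-curation step mentioned above: recorded frames (represented here only by their timestamps) are split into scenes wherever there is a gap in recording, so each scene is a contiguous sequence of frames. The gap threshold is an illustrative value.

```python
from typing import List

def split_into_scenes(frame_timestamps: List[float], max_gap_s: float = 1.0) -> List[List[float]]:
    """Group an ordered list of frame timestamps into scenes.

    A new scene starts whenever two consecutive frames are more than
    `max_gap_s` seconds apart (e.g., the recording was paused).
    """
    scenes: List[List[float]] = []
    current: List[float] = []
    for t in sorted(frame_timestamps):
        if current and t - current[-1] > max_gap_s:
            scenes.append(current)
            current = []
        current.append(t)
    if current:
        scenes.append(current)
    return scenes

# Usage: frames recorded at 10 Hz with a 5-second pause in the middle
# end up as two scenes.
timestamps = [i * 0.1 for i in range(50)] + [10.0 + i * 0.1 for i in range(50)]
scenes = split_into_scenes(timestamps)
print(len(scenes), "scenes:", [len(s) for s in scenes])
```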

What are the components of a self driving car?

The architecture discussed here is the modular architecture, as opposed to the end-to-end architecture described, for example, here

End-to-End Architecture