Tuesday, July 9, 2019

How to Build a Data Pipeline for Autonomous Driving

This time around I want to dig into how to use the data engineering and data science technologies I've been discussing to solve autonomous driving challenges. I'll explore methods for gathering data from test vehicles and how to build appropriate data pipelines to meet data needs throughout the development process.

Autonomous Driving Challenges


Many companies are offering increasingly sophisticated advanced driver assistance systems (ADAS) as stepping stones toward Level 4 autonomy and beyond. If you aren't familiar with the many players already competing in the self-driving space, Bloomberg has a recent summary.

Autonomous vehicle (AV) development projects face significant data challenges. Each vehicle deployed for R&D generates a mountain of data, which raises questions such as:

  • How do you build a pipeline to move data efficiently from vehicles in the field to your training cluster to train deep neural networks (DNNs)?
  • How do you efficiently prepare image and other sensor data, and label (annotate) it for DNN training?
  • How much storage and compute will you need to train your neural networks? Should your training cluster be on-premises or in the cloud?
  • How do you properly size infrastructure for your data pipelines and training clusters, including storage, network bandwidth, and compute capacity?
  • What other data flows do you need to consider?


Data Pipeline for Autonomous Vehicle Development


An autonomous vehicle development program has many components, each with its own data management needs. The volume and variety of the data create unique challenges in each of these areas. This post describes some of the specific data and computing challenges in several key areas:

  • Data collection from test vehicles with full sensor suites
  • Training DNNs using labeled data collected from test vehicles
  • Simulation to test the performance of DNNs and to create additional training data
  • Mapping to produce detailed representations of physical environments


Data Collection from Test Vehicles


During data collection, data must be ingested from every test vehicle in the fleet. The amount of data you actually collect per vehicle will vary depending on your sensor suite.

Guideline: Plan for 1-5TB per hour per vehicle during initial training, and refine your plan as you get actual results.
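To see what that guideline means for a whole fleet, here's a back-of-the-envelope calculation in Python. It is only a sketch: the fleet size, driving hours per day, and campaign length are hypothetical placeholders, so substitute your own numbers.

```python
# Rough raw-data estimate for a test fleet, bracketing the 1-5TB/hour/vehicle guideline.
# All inputs in the example below (fleet size, hours, days) are hypothetical.


def fleet_storage_tb(vehicles: int, hours_per_day: float, days: int,
                     tb_per_hour: float) -> float:
    """Total raw data (TB) the fleet collects over the campaign."""
    return vehicles * hours_per_day * days * tb_per_hour


def storage_range_tb(vehicles: int, hours_per_day: float, days: int,
                     low_tb_per_hour: float = 1.0,
                     high_tb_per_hour: float = 5.0) -> tuple[float, float]:
    """Low and high estimates using the ends of the guideline range."""
    return (fleet_storage_tb(vehicles, hours_per_day, days, low_tb_per_hour),
            fleet_storage_tb(vehicles, hours_per_day, days, high_tb_per_hour))


if __name__ == "__main__":
    # Hypothetical: 10 vehicles, 8 hours of driving per day, 20 driving days.
    low, high = storage_range_tb(vehicles=10, hours_per_day=8, days=20)
    print(f"{low:,.0f} TB to {high:,.0f} TB")  # 1,600 TB to 8,000 TB
```

Even a modest ten-car fleet lands in the 1.6-8PB range after a month of driving under these assumptions, which is why storage planning has to start early.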

You might find that data collection from test vehicles falls into two phases:

  1. Initial training. If you're training DNNs yourself, you will need to collect all driving data from your test cars.
  2. Transfer learning. Once your DNNs start to perform well, you may only want or need to collect data from situations in which the test cars fail or where safety drivers take control.
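The difference between the two phases comes down to which recorded segments you keep. Here is a minimal Python sketch, assuming each chunk of driving is tagged on the vehicle with hypothetical `model_failed` and `driver_takeover` flags; how those flags get set is up to your own stack.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    INITIAL_TRAINING = "initial_training"
    TRANSFER_LEARNING = "transfer_learning"


@dataclass
class DriveSegment:
    # Hypothetical metadata recorded on the vehicle for each chunk of driving.
    segment_id: str
    size_gb: float
    model_failed: bool     # the test car did not handle the situation correctly
    driver_takeover: bool  # a safety driver took control


def segments_to_upload(segments: list[DriveSegment], phase: Phase) -> list[DriveSegment]:
    """Pick the recorded segments worth moving off the vehicle in a given phase."""
    if phase is Phase.INITIAL_TRAINING:
        # Initial training: collect all driving data.
        return list(segments)
    # Transfer learning: keep only failures and safety-driver takeovers.
    return [s for s in segments if s.model_failed or s.driver_takeover]


if __name__ == "__main__":
    recorded = [
        DriveSegment("seg-a", 120.0, model_failed=False, driver_takeover=False),
        DriveSegment("seg-b", 95.0, model_failed=True, driver_takeover=False),
        DriveSegment("seg-c", 110.0, model_failed=False, driver_takeover=True),
    ]
    kept = segments_to_upload(recorded, Phase.TRANSFER_LEARNING)
    print([s.segment_id for s in kept])        # ['seg-b', 'seg-c']
    print(sum(s.size_gb for s in kept), "GB")  # 205.0 GB
```

The design choice is simply to filter at the edge: in the transfer learning phase, the uninteresting segments never leave the vehicle, which shrinks everything downstream.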


During initial training in particular, it's unlikely that you can transmit data from each vehicle over cellular networks because of both bandwidth limitations and cost. It's much more likely that you will store data on each vehicle and offload it periodically when the vehicle returns to a garage or depot.
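A quick transfer-time calculation shows why. This Python sketch compares a hypothetical 50 Mbit/s cellular uplink with a hypothetical 10 Gbit/s wired depot connection; both speeds are illustrative assumptions, not measurements, and real links will deliver less than their nominal rate.

```python
# How long does it take to move one hour of driving data over different links?
# Link speeds are hypothetical examples, not measurements.


def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours to move data_tb terabytes (10**12 bytes) over a link_mbps megabit/s link.

    efficiency scales the nominal link rate (1.0 = full nominal throughput).
    """
    bits = data_tb * 1e12 * 8
    bits_per_second = link_mbps * 1e6 * efficiency
    return bits / bits_per_second / 3600


if __name__ == "__main__":
    one_hour_of_driving_tb = 1.0  # low end of the 1-5TB/hour guideline
    for name, mbps in [("cellular (50 Mbit/s)", 50), ("depot (10 Gbit/s)", 10_000)]:
        print(f"{name}: {transfer_hours(one_hour_of_driving_tb, mbps):.1f} h")
    # cellular (50 Mbit/s): 44.4 h
    # depot (10 Gbit/s): 0.2 h
```

Under these assumptions, a single hour of driving at the low end of the guideline would take nearly two days to upload over cellular, but roughly 13 minutes at the depot.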

This requires data storage infrastructure in each test vehicle as well as at each depot location. As the test fleet expands to multiple metropolitan areas, you may want to add hub locations to aggregate data for each city. Because there is no one-size-fits-all solution, NetApp offers a variety of options to address data collection from test cars, including:

  • In-vehicle ruggedized data collector solutions
  • Storage options optimized for garage and hub locations to allow your AV operations to scale
  • NetApp cloud services for near-the-cloud and in-the-cloud storage to support both cloud ingest and burst-to-cloud needs
  • Data mule solutions for bulk data transport to overcome network limitations


NetApp solutions scale to meet your capacity and I/O performance needs as your program grows from petabytes to exabytes.
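The vehicle, depot, and hub tiers described above can be sized tier by tier. Here is a minimal Python sketch, assuming every vehicle offloads once a day at its depot and each city hub forwards its data onward within a fixed window; the `City` layout and all numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class City:
    # Hypothetical layout of one metropolitan test area.
    name: str
    depots: int
    vehicles_per_depot: int


def daily_tb_per_tier(city: City, tb_per_vehicle_day: float) -> dict[str, float]:
    """Data landing at each tier per day, assuming every vehicle offloads daily."""
    per_depot = city.vehicles_per_depot * tb_per_vehicle_day
    per_hub = city.depots * per_depot  # the city hub aggregates all of its depots
    return {"vehicle": tb_per_vehicle_day, "depot": per_depot, "hub": per_hub}


def hub_uplink_gbps(tb_per_day: float, window_hours: float = 24.0) -> float:
    """Sustained Gbit/s needed to forward a hub's daily data within window_hours."""
    return tb_per_day * 1e12 * 8 / (window_hours * 3600) / 1e9


if __name__ == "__main__":
    # Hypothetical: 2 depots x 5 vehicles, 8 driving hours/day at 1TB/hour.
    tiers = daily_tb_per_tier(City("city-a", depots=2, vehicles_per_depot=5), 8.0)
    print(tiers)  # {'vehicle': 8.0, 'depot': 40.0, 'hub': 80.0}
    print(f"{hub_uplink_gbps(tiers['hub']):.1f} Gbit/s")  # 7.4 Gbit/s
```

If a hub can't sustain the uplink rate this produces, physically shipping drives (the data mule option) becomes the practical alternative.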

Aggregating Data


As data is collected from test vehicles, it's typically aggregated into a data lake, either in a data center or in the cloud (or both). A data lake typically takes the form of a Hadoop deployment with HDFS, an object store, or a file store.
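Whichever of those forms the data lake takes, it helps to land files under a predictable layout. One common convention is Hive-style `key=value` partitioning, which works equally well as HDFS paths, object keys, or file store directories. The sketch below assumes a hypothetical `raw/city=/vehicle=/date=/hour=/sensor=` ordering; choose the partition order that matches how your jobs most often filter.

```python
from datetime import datetime, timezone


def lake_key(city: str, vehicle_id: str, sensor: str, start: datetime,
             filename: str) -> str:
    """Build a Hive-style partitioned path for one recorded file.

    The partition order (city, vehicle, date, hour, sensor) is an assumption;
    it lets jobs prune by location or time without listing the whole lake.
    `start` should be timezone-aware; it is normalized to UTC.
    """
    start = start.astimezone(timezone.utc)
    return (
        f"raw/city={city}/vehicle={vehicle_id}/"
        f"date={start:%Y-%m-%d}/hour={start:%H}/sensor={sensor}/{filename}"
    )


if __name__ == "__main__":
    t = datetime(2019, 7, 9, 14, 30, tzinfo=timezone.utc)
    print(lake_key("city-a", "veh-007", "lidar", t, "seg-0001.bin"))
    # raw/city=city-a/vehicle=veh-007/date=2019-07-09/hour=14/sensor=lidar/seg-0001.bin
```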

A poorly implemented data lake can become a bottleneck as data accumulates, so it's important to give its design careful consideration. Because of the volume of data being collected, the current best practice for autonomous vehicle development programs is to keep the data lake and training cluster on-premises, possibly with some parts of the work in the cloud as well.
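One way to check for that bottleneck early is to add up what the data lake must sustain: incoming data from the hubs plus the reads needed to keep the training cluster busy. This Python sketch is a simplified model under stated assumptions; the GPU count, per-GPU sample rate, sample size, and lake throughput in the example are all hypothetical.

```python
def tb_per_day_to_gbps(tb_per_day: float) -> float:
    """Convert a daily volume in TB (10**12 bytes) to a sustained rate in Gbit/s."""
    return tb_per_day * 1e12 * 8 / 86_400 / 1e9


def training_read_gbps(gpus: int, samples_per_sec_per_gpu: float,
                       mb_per_sample: float) -> float:
    """Read bandwidth (Gbit/s) needed to keep every GPU fed with samples."""
    return gpus * samples_per_sec_per_gpu * mb_per_sample * 1e6 * 8 / 1e9


def lake_is_bottleneck(lake_gbps: float, ingest_tb_per_day: float, gpus: int,
                       samples_per_sec_per_gpu: float, mb_per_sample: float) -> bool:
    """True if ingest plus training reads exceed what the data lake can deliver."""
    demand = tb_per_day_to_gbps(ingest_tb_per_day) + training_read_gbps(
        gpus, samples_per_sec_per_gpu, mb_per_sample)
    return demand > lake_gbps


if __name__ == "__main__":
    # Hypothetical: 80TB/day ingest, 64 GPUs each reading 500 x 1MB samples/s.
    print(lake_is_bottleneck(100.0, 80.0, 64, 500, 1.0))  # True
    print(lake_is_bottleneck(400.0, 80.0, 64, 500, 1.0))  # False
```

In this example, training reads (256 Gbit/s) dwarf ingest (about 7.4 Gbit/s), which illustrates why keeping the data lake close to the training cluster matters.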
