Use Case: Data-Driven Development

Note

Background: The data-driven development use case covers development processes that rely on large amounts of data which cannot be collected efficiently in the real world, which motivates the use of simulation. For many applications, simulated data is sufficiently accurate to be integrated into the data-driven development process. This includes training data for machine learning algorithms as well as closed-loop reinforcement learning. Data of interest includes raw sensor data as well as vehicle dynamics data, in wide variety and at large scale. Simulations additionally enable data generation beyond the physical limits of vehicle dynamics or sensor configurations. To accumulate large amounts of data, relevant simulation parameters can be sampled automatically along different dimensions. Automation and parallelization then enable cost-effective execution of many simulations, especially when established orchestration tools are used.

The following demonstration showcases rapid data-driven development and specifically addresses the following requirements:

high simulation fidelity
flexibility and containerization
automation and scalability

Getting Started

Requirements and Installation

Important

Make sure that all system requirements are fulfilled. Additionally, a Python installation is required on the host for this use case. We recommend using conda.

Install and activate the conda environment:

conda env create -f env/environment.yml
conda activate carlos_data_driven_development

Alternatively, you can use pip:

pip install -r requirements.txt

In the initial demo software-prototyping, we demonstrated the integration of a Function Under Test (FUT) with CARLOS, exploring its capabilities through practical experimentation. While these tests validated the general functionality of our image segmentation module, it became clear that there is considerable potential to improve its performance. Given that this module, like many AD functions, relies heavily on machine learning models trained with specific datasets, the quality and quantity of this training data are crucial.

Permutation-based Data Generation

Because the specific nature of the data is less critical here, the main objective is to generate as much diverse data as possible. This can be achieved effectively by permuting the simulation parameters, which ensures both quantity and diversity in the generated dataset.

Run the demo for permutation-based data generation:

# carlos/data-driven-development$
python ./data_generation.py --config data-driven-development-demo-image-segmentation.json

or use the top-level run-demo.sh script:

# carlos$
./run-demo.sh data-driven-development

Data is generated by creating all possible permutations from a set of configuration parameters, managed through a JSON configuration file. This results in different simulation runs in several parameter dimensions, which are simulated in sequence. A comprehensive example configuration is provided within the config folder. While the current implementation is limited to the settings specified below, the provided code is modular and can be easily customized to fit your requirements.

"simulation_configs":
    {
        "permutation_configs":
        {
            "num_executions": 1,
            "sensors_config_files": ["./config/sensors/rgb_segmentation_camera.json"],
            "spawn_point": ["1", "2", "3"],
            "town": ["Town01", "Town10HD"],
            "vehicle_occupancy": ["0.5"],
            "walker_number": ["50"],
            "weather": ["ClearSunset", "WetCloudyNoon"]
        }
    }

This example demo configures 12 different simulation runs by permuting the CARLA town, the weather settings, and the initial spawn point of the ego vehicle (2 towns × 2 weather settings × 3 spawn points). All simulations run for a maximum simulation time of 60 s, and all relevant image segmentation data topics are recorded in dedicated ROS bags.
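
The run count follows from the Cartesian product of the list-valued parameters in the example configuration. The following is a minimal sketch of that expansion, not the code used in data_generation.py: the helper name `expand_permutations` is hypothetical, and it assumes that every list-valued entry spans one permutation dimension and that each permutation is repeated `num_executions` times.

```python
import itertools

# Values copied from the example configuration above.
permutation_configs = {
    "num_executions": 1,
    "sensors_config_files": ["./config/sensors/rgb_segmentation_camera.json"],
    "spawn_point": ["1", "2", "3"],
    "town": ["Town01", "Town10HD"],
    "vehicle_occupancy": ["0.5"],
    "walker_number": ["50"],
    "weather": ["ClearSunset", "WetCloudyNoon"],
}


def expand_permutations(config):
    """Return one parameter dict per simulation run (hypothetical helper)."""
    # Assumption: only list-valued entries span a permutation dimension.
    dims = {key: value for key, value in config.items() if isinstance(value, list)}
    keys = list(dims)
    runs = [dict(zip(keys, combo)) for combo in itertools.product(*dims.values())]
    # Assumption: each permutation is repeated num_executions times.
    return runs * int(config.get("num_executions", 1))


runs = expand_permutations(permutation_configs)
print(len(runs))  # 12
```

Adding a third weather setting to the list would raise the count to 18, so the number of runs grows multiplicatively with each extended dimension.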

Data generation at large scale thus becomes possible and helps developers obtain diverse and useful data for any application.

Scenario-based Data Generation

Assuming we have improved our model, we now want to evaluate its performance in targeted, realistic scenarios. This requires data from such concrete scenarios, which the scenario-based data generation feature provides. In this example, we demonstrate how a list of OpenSCENARIO files can be integrated into the data generation pipeline to generate data under those specific conditions.

# carlos/data-driven-development$
python ./data_generation.py --config data-driven-development-demo-scenario-execution.json

All scenarios are executed sequentially, and data is generated analogously to the permutation-based demo. The respective configuration file mainly contains a list of paths to predefined OpenSCENARIO files:

"simulation_configs":
    {
        "scenario_configs": 
        {
            "num_executions": 1,
            "scenario_files": ["../utils/scenarios/town01.xosc", "../utils/scenarios/town10.xosc"],
            "sensors_config_files": ["./config/sensors/rgb_segmentation_camera.json"]
        }
    }
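
As an illustration of sequential scenario execution, the sketch below builds an ordered run list from the two lists in the example configuration. It assumes that every scenario file is paired with every sensor configuration file; the helper name `plan_scenario_runs` is hypothetical and is not part of data_generation.py.

```python
from itertools import product
from pathlib import PurePosixPath

# Values copied from the example configuration above.
scenario_files = ["../utils/scenarios/town01.xosc", "../utils/scenarios/town10.xosc"]
sensors_config_files = ["./config/sensors/rgb_segmentation_camera.json"]


def plan_scenario_runs(scenarios, sensor_configs):
    """Pair each scenario with each sensor config, keeping list order (hypothetical helper)."""
    return [
        {"name": PurePosixPath(scenario).stem, "scenario": scenario, "sensors": sensors}
        for scenario, sensors in product(scenarios, sensor_configs)
    ]


scenario_runs = plan_scenario_runs(scenario_files, sensors_config_files)
for run in scenario_runs:
    print(run["name"], "->", run["scenario"])
```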

Building on this initial scenario-based simulation approach, the third demo, automatic testing, focuses on the automatic execution and evaluation of scenarios at large scale. It also provides a full integration into a CI workflow.

Record Your Own Data

Follow these steps to record your own data:

Specify sensor configuration file(s) to provide sensor data within ROS.
Adjust the parameters in the data pipeline configuration file. A full list of supported parameters is given below.
Start the pipeline with ./data_generation.py --config <your-modified-config-file>.
Observe the recorded ROS 2 bag files for further postprocessing.

You may now adjust the configuration parameters to fit your specific use case. In addition, the pipeline code itself can be updated in the data_generation.py Python file.
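
A custom configuration file can also be written programmatically. The sketch below is one possible way to do so under stated assumptions: the service names are placeholders that must be replaced by the service names from docker-compose.yml, the helper `write_config` is hypothetical, and the file is written to a temporary directory only to keep the example free of side effects. Whether the pipeline accepts an arbitrary path or only a file name inside the config folder is not stated here, so adjust the location accordingly.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical custom configuration using parameters documented in this README.
custom_config = {
    "settings": {
        "max_simulation_time": 60,
        # Placeholders: names must match the services in docker-compose.yml.
        "simulation_services": ["<service-a>", "<service-b>"],
        "output_path": "./data/",
    },
    "simulation_configs": {
        "permutation_configs": {
            "sensors_config_files": ["./config/sensors/rgb_segmentation_camera.json"],
            "spawn_point": ["1"],
            "town": ["Town01"],
            "weather": ["ClearSunset"],
        }
    },
}


def write_config(config, path):
    """Serialize the configuration dict to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return path


config_path = write_config(custom_config, Path(tempfile.mkdtemp()) / "my-config.json")
print(f"python ./data_generation.py --config {config_path}")
```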

Configuration Parameters

The JSON configuration file for the data generation pipeline consists of two main sections: settings and simulation_configs. The settings section specifies general parameters. The simulation_configs section must either contain permutation_configs for permutation-based simulations or scenario_configs for scenario-based simulations.
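
The structural rule above can be checked before a run is started. The following sketch is not taken from data_generation.py; the function name `config_mode` is hypothetical, and it interprets "either ... or" as "exactly one of the two sections", which is an assumption.

```python
def config_mode(config):
    """Return 'permutation' or 'scenario' for a loaded configuration dict."""
    if "settings" not in config:
        raise ValueError("missing 'settings' section")
    simulation_configs = config.get("simulation_configs", {})
    has_permutation = "permutation_configs" in simulation_configs
    has_scenario = "scenario_configs" in simulation_configs
    # Assumption: exactly one of the two sections must be present.
    if has_permutation == has_scenario:
        raise ValueError(
            "simulation_configs must contain exactly one of "
            "'permutation_configs' or 'scenario_configs'"
        )
    return "permutation" if has_permutation else "scenario"


example = {
    "settings": {},
    "simulation_configs": {"scenario_configs": {"scenario_files": []}},
}
print(config_mode(example))  # scenario
```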

General Settings (`settings`)

Name	Description	Note	required	default
`max_simulation_time`	Maximum simulation-time duration in seconds of one simulation run before it is terminated		not required	300
`max_real_time`	Maximum real-time duration in seconds of one simulation run before it is terminated		not required	300
`simulation_services`	List of all Docker services that are executed during a simulation run	Names must match the service names in docker-compose.yml	required	-
`record_topics`	Dict of ROS 2 topics to be recorded		not required	-
`output_path`	Path for storing generated data		not required	`./data/`
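
The defaults and the single required entry from the table can be expressed as a small merge step. This is a sketch under the assumption that missing optional keys simply fall back to the documented defaults; the names `SETTINGS_DEFAULTS` and `resolve_settings` are hypothetical, and the service names are placeholders.

```python
# Defaults as documented in the table above.
SETTINGS_DEFAULTS = {
    "max_simulation_time": 300,
    "max_real_time": 300,
    "output_path": "./data/",
}


def resolve_settings(settings):
    """Fill in documented defaults and enforce the required entry."""
    if not settings.get("simulation_services"):
        raise ValueError("'simulation_services' is required and must not be empty")
    # User-provided values override the defaults.
    return {**SETTINGS_DEFAULTS, **settings}


resolved = resolve_settings(
    {"simulation_services": ["<service-a>"], "max_simulation_time": 60}
)
print(resolved["max_simulation_time"], resolved["max_real_time"])  # 60 300
```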

Permutation-based Settings (`permutation_configs`)

Name	Description	Note	required	default
`num_executions`	Number of times a simulation based on a single permutation is executed	Must be an integer	not required	1
`sensors_config_files`	List of sensor configuration files		not required	-
`sensors_config_folder`	List of directories containing sensor configuration files		not required	-
`spawn_point`	List of spawn points for the data-generating vehicle	Only numbers are allowed	required	-
`town`	List of towns for the simulation environment	Town list, Town10 = Town10HD	not required	Town01
`vehicle_number`	List of numbers that spawn a fixed number of vehicles	only used if `vehicle_occupancy` is not set, vehicles are spawned via generate_traffic.py	not required	-
`vehicle_occupancy`	List of numbers between 0 and 1 that spawn vehicles proportionally to the number of available spawn points	vehicles are spawned via generate_traffic.py	not required	-
`weather`	List of weather conditions	Weather conditions list	not required	depends on town, in general "ClearSunset"
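
The relationship between `vehicle_occupancy` and `vehicle_number` can be sketched as follows. This is not the logic of generate_traffic.py: the function name `vehicles_to_spawn` is hypothetical, rounding down and spawning no vehicles when neither parameter is set are assumptions, and the spawn point count in the usage lines is an arbitrary illustrative number.

```python
def vehicles_to_spawn(available_spawn_points, vehicle_occupancy=None, vehicle_number=None):
    """Resolve the number of vehicles for one simulation run."""
    if vehicle_occupancy is not None:
        occupancy = float(vehicle_occupancy)
        if not 0.0 <= occupancy <= 1.0:
            raise ValueError("vehicle_occupancy must be between 0 and 1")
        # Assumption: the proportional count is rounded down.
        return int(occupancy * available_spawn_points)
    if vehicle_number is not None:
        # Only used when vehicle_occupancy is not set.
        return int(vehicle_number)
    # Assumption: no traffic is spawned when neither parameter is given.
    return 0


print(vehicles_to_spawn(200, vehicle_occupancy="0.5"))  # 100
print(vehicles_to_spawn(200, vehicle_number="30"))  # 30
```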

Scenario-based Settings (`scenario_configs`)

Name	Description	Note	required	default
`num_executions`	Number of times a simulation based on a single scenario is executed	Must be an integer	not required	1
`scenario_files`	List of OpenSCENARIO files (.xosc)		not required	-
`scenario_folder`	List of directories containing OpenSCENARIO files (.xosc)		not required	-
`sensors_config_files`	List of sensor configuration files		not required	-
`sensors_config_folder`	List of directories containing sensor configuration files		not required	-
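
Explicit files and folders can be combined into a single list of scenarios. The sketch below shows one way to do this with pathlib; the helper name `collect_scenarios` is hypothetical, and it assumes that folders are searched non-recursively for `.xosc` files and that explicitly listed files come first. The temporary directory and file names exist only to make the example self-contained.

```python
import tempfile
from pathlib import Path


def collect_scenarios(scenario_files=(), scenario_folder=()):
    """Merge explicit .xosc files with all .xosc files found in the given folders."""
    paths = [Path(file) for file in scenario_files]
    for folder in scenario_folder:
        # Assumption: non-recursive search, sorted for a reproducible order.
        paths.extend(sorted(Path(folder).glob("*.xosc")))
    return paths


# Self-contained demonstration with a throwaway folder.
demo_folder = Path(tempfile.mkdtemp())
for name in ("b.xosc", "a.xosc", "notes.txt"):
    (demo_folder / name).touch()

scenarios = collect_scenarios(
    scenario_files=["../utils/scenarios/town01.xosc"],
    scenario_folder=[demo_folder],
)
print([path.name for path in scenarios])  # ['town01.xosc', 'a.xosc', 'b.xosc']
```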
