VeReMi NextGen

Code Dataset License

VeReMi NextGen is a dataset for evaluating Misbehavior Detection Systems (MDSs) in Vehicular Ad hoc Networks (VANETs). It provides simulated V2X message logs with ground-truth labels and a broad set of data manipulation attacks. Compared to previous VeReMi datasets, VeReMi NextGen introduces more realistic traffic scenarios, urban and highway environments, multiple driver profiles, sensor error models, predefined training/validation/test sets, and a broader range of attack types. VeReMi NextGen is part of an accpeted paper, submitted to the Vehicular Networking Conference (VNC) 2026.

This website provides a brief overview of the VeReMi NextGen dataset. For more detailed documentation, please refer to the documentation section of our GitHub repository.

Download

The code for reproducing the dataset, including a Docker container, is available in a GitHub repository. The VeReMi NextGen dataset itself is provided on Zenodo. The corresponding links are listed below:

Cite This Work

If you are using our dataset, please use the following citation:

@inproceedings{Hermann2026vereminextgen,
  author    = {Hermann, Artur and Remmers, Jan-Niklas and Eisermann, Dennis and Erb, Benjamin and Kargl, Frank},
  title     = {VeReMi {NextGen}: A {Dataset} for {Evaluating} {Misbehavior} {Detection} {Systems} in {VANETs}},
  booktitle = {2026 {IEEE} {Vehicular} {Networking} {Conference} ({VNC})},
  date      = {2026-06},
  location  = {Montreal, Canada}
}

Overview

VeReMi NextGen is a simulated dataset which stores the messages received by each simulated vehicle. The dataset targets the evaluation of Misbehavior Detection Systems, which aim to detect incorrect data in authentic V2X messages. VeReMi NextGen consists of:

  • urban and highway scenarios,
  • low- and high-density traffic conditions,
  • three driver profiles,
  • sensor error models,
  • pseudonym changes,
  • 15 attack types,
  • ground-truth labels,
  • predefined training, validation, and test sets,
  • a publicly available dataset generator.

VeReMi NextGen addresses several limitations of previous datasets such as VeReMi and VeReMi Extension.

Feature VeReMi VeReMi Extension VeReMi NextGen
Up-to-date traffic scenario
Multiple driver profiles
Multi-attribute attacks
Urban and highway scenarios
Sensor error models
Received Signal Strength Indicator
Ground-truth labels
Training/validation/test sets
Extensible design
Support for future VRU integration

Simulation Setup

It was generated using MOSAIC (Version 25.0), SUMO (Version 1.22.0), and OMNeT++ (Version 6.1). It was generated using the InTAS traffic scenario. InTAS represents the city of Ingolstadt and provides realistic traffic dynamics, different road types, public transport, traffic lights, and support for Vulnerable Road Users.

The dataset includes four scenario types:

Scenario Environment Density
Urban 2 AM Urban Low density
Urban 7 AM Urban High density
Highway 2 AM Highway Low density
Highway 7 AM Highway High density

Separate geographic areas are used for training/validation and test sets to avoid spatial overlap between model training and evaluation. The used geographic areas are shown in the figure below and have in total 10.2 km²:

Geographic areas used in the InTAS scenario

Figure 1: Geographic areas used for the training/validation and test sets in the InTAS scenario.

Dataset Structure

VeReMi NextGen is organized into training, validation, and test sets. Each set contains urban and highway scenarios with low and high vehicle densities. The complete dataset consists of 180 subsets:

  • 15 attack subsets,
  • 4 scenarios,
  • 3 dataset splits: training, validation, and test.

Each subset is organized as a directory containing one JSON file per receiving vehicle. Each file contains all messages received by that vehicle during the simulation. The dataset provides fixed training, validation, and test sets to support reproducible evaluation of machine-learning-based MDSs.

Scenario Training duration Validation duration Test duration
Urban 2 AM 9000 s 1800 s 7200 s
Urban 7 AM 375 s 75 s 300 s
Highway 2 AM 9000 s 1800 s 7200 s
Highway 7 AM 375 s 75 s 300 s

Attack Types

VeReMi NextGen includes 15 attack types affecting different message attributes.

Category Attack types
Time-related Time Delay Attack
Position-related Constant Position Offset, Random Position Offset, Position Mirroring
Speed-related Constant Speed Offset, Random Speed Offset, Zero Speed Report, Sudden Constant Speed
Heading-related Reversed Heading
Acceleration-related Feigned Braking, Acceleration Multiplication
Multi-parameter Sudden Stop, DoS Attack, Traffic Congestion Sybil, Data Replay

Six of these attacks were newly designed for VeReMi NextGen. The attacks cover a broader range of message attributes than previous datasets, including heading and acceleration.

The configured attacker density is 20%, meaning that 20% of the vehicles act as attackers. Each message contains an attacker field. A value of 1 indicates that the message contains a significant deviation in at least one attribute, while a value of 0 indicates a legitimate message or a deviation below the defined significance threshold.

Reproducing and Extending the Dataset

VeReMi NextGen was generated in two main steps:

  1. Simulation Execution
    The V2X simulation is executed using Eclipse MOSAIC and the InTAS scenario. All received CAMs are collected to create the Baseline dataset.

  2. Post-Processing
    Attacks are integrated into the Baseline dataset. This avoids rerunning expensive simulations for every attack and enables consistent attack generation across all receivers.

We provide a Docker container to create the Baseline dataset, as well as scripts for conducting the post-processing. This enables the reproduction of VeReMi NextGen and supports future extensions of the dataset, such as additional attacks, attributes, scenarios, or entities.

VeReMi NextGen Highlights

Advantage Description
More realistic traffic scenario Uses InTAS instead of LuST, providing more complex road layouts and more realistic traffic dynamics
Urban and highway coverage Includes both urban and highway scenarios with low and high vehicle densities
Heterogeneous driver behavior Introduces normal, cautious, and aggressive driver profiles
Broader attack diversity Provides 15 attack types affecting a wider range of message attributes
New attack types Includes six newly introduced attacks, such as position mirroring, zero speed report, reversed heading, feigned braking, and acceleration multiplication
ML-ready structure Provides predefined training, validation, and test sets
Easier evaluation Includes ground-truth labels directly in each message
Extensible design Provides a public dataset generator for adding new attacks, attributes, or entities
Future VRU support Based on InTAS, which supports Vulnerable Road User simulation and enables future dataset extensions
More challenging benchmark Evaluation results show that attacks in VeReMi NextGen are harder to detect than in VeReMi Extension

Acknowledgement

The dataset was primarily put together by Artur Hermann at the Institute of Distributed Systems, Ulm University.

This work was partially funded by the HORIZON CONNECT project under EU grant agreement no. 101069688 and the ConnRAD project under grant agreement no. 16KISR036.


This site uses Just the Docs, a documentation theme for Jekyll.