
Master Thesis - Falsification using Reinforcement Learning



One of the key challenges in the successful commercial deployment of autonomous driving (AD) technology is ensuring that AD systems are safe and do not malfunction. Assessing their correctness, that is, determining whether they behave as intended, is an important but difficult task that warrants rigorous methods. Falsification is a technique that systematically searches for counterexamples that refute a correctness specification of the system. Reinforcement learning (RL) is a subfield of machine learning in which an agent interacts with an environment to learn a policy that maximizes the expected future reward. Recent research has shown that falsification can be formulated as an RL problem. This project aims to investigate the feasibility of that approach in the context of AD.

Project Description 

Given an implementation of an AD decision-making algorithm and a correctness specification, the objective of this project is to use RL to learn a policy for an adversarial agent (e.g., a pedestrian or a lead vehicle) that could lead the AD algorithm to make incorrect decisions and thereby violate the specification. The learned policy is then used to generate adversarial behaviors for the safety verification of the AD algorithm. Students are expected to select a suitable RL algorithm (e.g., Q-learning, Deep Q-Network (DQN), or Soft Actor-Critic (SAC)), define the states and actions, shape the reward function, and learn a policy for the adversarial agent that produces behaviors falsifying the specification. The solution will then be evaluated to gain insights into its scalability and industrial applicability. Apart from applying RL to an industrially relevant problem, students will gain insights into ongoing R&D projects in academia and industry.
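To make the setup concrete, here is a minimal sketch of falsification with tabular Q-learning. It is an illustration only and not the project's actual method or code: the car-following scenario, the `ego_controller` stand-in for the AD algorithm, the specification "the gap to the lead vehicle stays positive", and all numeric values are assumptions chosen for the example. The adversary is the lead vehicle, the state is a coarse discretization of gap and speeds, the actions are lead-vehicle accelerations, and the reward grows as the gap shrinks.

```python
import random
from collections import defaultdict

# All constants below are illustrative assumptions, not values from the project.
DT = 0.5                           # simulation step [s]
ACTIONS = (-4.0, 0.0, 2.0)         # adversary (lead vehicle) accelerations [m/s^2]
HORIZON = 40                       # steps per episode
INITIAL_STATE = (30.0, 15.0, 15.0) # (gap [m], ego speed [m/s], lead speed [m/s])


def ego_controller(gap, ego_speed):
    """Hypothetical stand-in for the AD algorithm under test."""
    return -3.0 if gap < 1.5 * ego_speed else 1.0


def step(state, lead_acc):
    """Advance the toy simulation by one explicit Euler step."""
    gap, ego_v, lead_v = state
    ego_acc = ego_controller(gap, ego_v)
    new_gap = gap + (lead_v - ego_v) * DT
    new_ego_v = max(0.0, ego_v + ego_acc * DT)
    new_lead_v = max(0.0, lead_v + lead_acc * DT)
    return (new_gap, new_ego_v, new_lead_v)


def robustness(state):
    """Assumed specification: gap > 0 at all times. A value <= 0 is a violation."""
    return state[0]


def discretize(state):
    """Map the continuous state to a coarse grid cell for the Q-table."""
    gap, ego_v, lead_v = state
    return (int(max(gap, 0.0) // 5), int(ego_v // 2), int(lead_v // 2))


def greedy(q, s):
    """Index of the highest-valued action in grid cell s."""
    return max(range(len(ACTIONS)), key=lambda i: q[s][i])


def train(episodes=2000, alpha=0.2, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning for the adversary with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(ACTIONS))
    for _ in range(episodes):
        state = INITIAL_STATE
        for _ in range(HORIZON):
            s = discretize(state)
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(q, s)
            nxt = step(state, ACTIONS[a])
            violated = robustness(nxt) <= 0.0
            # Shaped reward: a bonus on violation, otherwise a penalty
            # proportional to the remaining gap.
            reward = 100.0 if violated else -0.01 * robustness(nxt)
            target = reward if violated else reward + gamma * max(q[discretize(nxt)])
            q[s][a] += alpha * (target - q[s][a])
            state = nxt
            if violated:
                break
    return q


def rollout(q):
    """Run the greedy adversary once and return the smallest robustness seen."""
    state = INITIAL_STATE
    min_rob = robustness(state)
    for _ in range(HORIZON):
        state = step(state, ACTIONS[greedy(q, discretize(state))])
        min_rob = min(min_rob, robustness(state))
        if min_rob <= 0.0:
            break
    return min_rob


if __name__ == "__main__":
    q_table = train()
    min_rob = rollout(q_table)
    verdict = "specification falsified" if min_rob <= 0.0 else "no counterexample found"
    print(f"minimum gap along greedy rollout: {min_rob:.2f} m ({verdict})")
```

Whether this toy adversary finds a counterexample depends entirely on the assumed controller and reward. In a realistic setting, the simulator, state and action spaces, and reward would be replaced by those of the AD algorithm under test, and a function approximator (as in DQN or SAC) would likely take the place of the Q-table.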


This thesis is suitable for two students with a background in mathematics, systems and control, computer science, or a similar field. Knowledge of learning algorithms and strong programming skills (MATLAB/Python) are meritorious.

Further information

Please send in individual applications with CV, motivational letter and grade transcripts. 

Planned start: January 2022, with some flexibility.

Final application date: 30 November 2021

Duration: 30 ECTS 

For questions regarding the project, please contact: yuvaraj.selvaraj@zenseact.com 

Additional information

  • Remote status

    Flexible remote


Gothenburg, Sweden

Lindholmspiren 2
417 56 Gothenburg, Sweden

