Paper - Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms

  • Metadata:
  • Essay:
    • System testing is one of the central points of discussion in the field of software engineering. As software systems become increasingly complex, the need for test cases that provide adequate coverage is naturally increasing. This issue becomes even more prevalent when observing the emerging field of software-assisted or fully autonomous driving. A setting in which the lives of people are potentially in danger calls for rigorous testing of all components involved. However, even though there are extensive tools for the simulation of the various aspects of these vehicular systems, finding suitable test cases inside these is still challenging. Due to many involved variables, the search space of possible inputs is complex and multidimensional, so guidance for test case generation is needed.
    • The paper "Testing vision-Based Control Systems Using Learnable Evolutionary Algorithms" authored by Raja Ben Abdessalem, Shiva Nejati, Lionel C. Briand, and Thomas Stiftler aims to combat the presented issue with the use of search-based methodologies. They focus on testing Advanced Driver Assistance Systems (ADAS) by developing a method of looking for "critical scenarios" (causing an accident). The authors utilize a multi-objective search to guide the discovery of these scenarios and gauge the direction of "critical" regions that contain a higher amount of these vital test cases.
    • In the following paragraphs, I will give a brief overview of their approach and how they evaluated it and will asses the contributions and strengths of the paper. Afterward, I will talk about some areas of possible improvement, highlight future work and will finally provide some personal thoughts on the paper in the context of the course.
    • The proposed approach

    • To accurately present an algorithm that deals with ADAS, the paper provides a formalization of these systems that characterizes ADAS based on their varying variables, objects, and constraints.
    • Building on that, they adapt the popular multi-objective search algorithm NSGA-II to their approach and describe how they adapted the different parts of the evolutionary algorithm.
    • However, the authors conclude that NSGA-II alone is not sufficient for the task at hand for two main reasons. 1)  Although search algorithms like NSGA-II scale well to more complex search spaces, even they struggle when the search space is overly multidimensional and may require additional guidance. 2) The authors aim to generate insightful representations of the critical regions, and the Pareto fronts alone might not be descriptive enough to gauge the properties of the landscape. The presented solution to these issues is to periodically generate decision trees based on the results of multiple NSGA-II iterations. These decision trees represent increasingly fine-grained partitionings/sub-settings of the test scenarios produced by NSGA-II.
    • By focusing on the leaves of the tree that contain more than 50% critical scenarios for the next iterations of the search, their algorithm automatically guides its exploration to the more relevant parts of the search space. Also, the analysis of these decision trees helps further characterize critical regions of the ADAS search space.
    • To summarize the proposed algorithms "NSGA-II-DT": After initializing a population, each iteration consists of multiple phases. First, several NSGA-II search iterations will be performed. After that, the best Pareto front so far will be computed out of the best solutions generated. From that, a decision tree will be built, the critical leaves of which will serve as the population or the next iteration of the algorithm.
    • Evaluation

    • To evaluate their approach, they perform a case study on a specific ADAS, an Automated Emergency Breaking (AEB) System, aimed at preventing collisions with pedestrian. They label scenarios as critical when a pedestrian is in front of the car (distance < 50cm), the AEB detects him with a high probability (>50%), and he is hit with a speed of over 30 km/h. With the case study, their evaluation aims to answer two main questions.
    • At first, they investigate if the decision tree-based approach makes the search more efficient, i.e., if it increases the quality and quantity of critical scenarios found. They compare NSGA-II-DT to vanilla NSGA-II on account of hypervolume, generational distance, and spread over a runtime duration of 24h. NSGA-II-DT beats its counterpart significantly in both SP and HV and manages to create 78% more unique critical test cases on top of that.
    • The second research question evaluates whether their approach helps the search to converge towards better characterizations of critical regions. They show that the critical regions created by their algorithm become increasingly smaller and improve the GoodnesOfFit metric with successive generations. As there is no baseline, they can compare their results only against the initial randomly selected population.
    • Included in the second question was an investigation regarding the helpfulness of their approach for real-world engineers. They conducted interviews with three engineers of IEE after presenting them with visualizations of trees created by the algorithm. The engineers claimed that the results gave them useful insight into the properties of critical regions and could aid in debugging ADAS systems.
    • Contributions & Strengths

    • The paper offers many strongpoints and makes a significant number of contributions to the field of search-based software engineering.
    • The proposed formalism for ADAS does not only help to specify the presented algorithm but can be used independently for further work using ADAS or even as an example when modeling similar systems. Having a (hopefully) unified mathematical definition of such systems could aid the scientific discussion about them. The fact that they aimed to provide a more general version of their algorithm aimed at different ADAS systems rather than just the examined case study enhances the applicability of their research and significantly widens the range of application.
    • Of course, the main contribution made was the proposition of NSGA-II-DT as a way to make the search more efficient and interpretable. As I will describe in the future works section, the tree-based approach the authors took with the algorithm could have further applications and is a solid contribution even outside the field of ADAS. On top of this, I found the presentation of the hybrid search approach in the paper to be surprisingly digestible, considering there are a lot of systems interacting. The sizable discussion about how they optimized their search algorithm via reuse was insightful as well.
    • When it comes to the evaluation, I think focusing on an approach designed for human understandability highlights an important aspect of search. Also, the paper was really cautious when it came to preserving the validity of its evaluation. The researchers adhered to many popular standards and made the results of their assessment easily accessible via public repositories. While this is to be expected of high-level research, I felt it was still noteworthy in a critique.
    • Finally, I think the paper once again proves the vast potential that search-based approaches can have in a software engineering setting. The authors applied a search to a crucial real-world problem and highlighted the strength of metaheuristic in selected environments. A complex and multidimensional search space like ADAS inputs and a high cost of evaluation (execution of the simulations being the main bottleneck) make traditional optimization impossible and present opportunities for search-based approaches to shine. The authors capitalized on this and even presented an approach that offers greater interpretability than common metaheuristics.
    • Shortcomings

    • Even though the paper presented its approach and the evaluation clearly and cohesively, there is still some criticism to give.
    • The experiment design might be one of the more significant weaknesses of the paper.
    • They use hyperparameters that are recommended by common literature. Still, seeing as their hybrid approach differs quite a bit from standard MOEAs with the inclusion of the decision trees, I think some custom tuning would have been appropriate or at least worth exploring.
    • The statement that the algorithm performs 30 search iterations in 24 hours is not useful without specifying the hardware the experiments were performed on. This omission is somewhat common in research papers, but including it would improve the reproducibility by a fair margin.
    • When evaluating for the usefulness of their approach to developers, having a bigger pool of test subjects could have been helpful to underline their findings further. While the small pool of participants is often out of the control of the researchers (limited resources), I think a smarter survey setup could have yielded more interpretable results: The fact that the assistance of the tool is "useful" to developers is not expressive on its own, as the bar to beeing "useful" can be rather low. If they would have asked for comparisons with other development tools the developers frequently use, the notion of how useful the provided insight is for the engineers would have been much clearer.
    • On a completely different note, the paper states that all of the variables included in the fitness function are to be either maximized or minimized. This would open the possibility of utilizing a single-objective search using weighted sums. While the design of the weights might be complicated and the benefit of interpretability might be reduced, the considerable simpler computations compared to NSGA-II could have made it a candidate to identify singular critical scenarios quickly. Like Professor Shin Yoo stated, the team did not consider the usage of a single-objective search, so I think it is fair to say that there could be unexplored potential in the approach.
    • Potential Future Work

    • Minor criticism aside, the paper lays the groundwork for various kinds of future works. The way they designed their algorithm to work on more general ADAS directly opens the possibility for more case studies like the presented one. Their system is instantly applicable to real-world projects, so that case study might very well be an actual testing environment of automotive developers. Secondly, the tree-based approach showed potential in a complex search problem like the one examined in the article, so it is not unthinkable that this hybrid approach could have many other applications for search problems beyond ADAS. Especially in a software engineering context, where a good understanding of the search space is often vital, the resulting trees could substantially aid developers in their decision making.
    • Personal thoughts in the context of AISWE

    • While this might now be a component of a classical paper critique, I would like to share how I perceived the paper in the context of taking the AISWE course.
    • After seeing multi-objective search as a very theoretical concept in the lecture, this paper proposed my first example of the real-world application of MOEA (other than the testing example). The three metrics used to evaluate Pareto fronts were especially insightful, as I learned about a new one (spread) and could see the discussed shortcomings of GD firsthand (it did not show the improvement the other metrics indicated). The notion of "better understanding the search-space with metaheuristics" came up multiple times in class, but this paper really showed how this concept could be used in a real-world setting.
    • Finally, the algorithms we discussed in the lecture were mostly standalone, but this paper highlights the value of looking at possible ways of combining metaheuristics and other concepts from cs (like search trees). When looking at future applications, this will remind me to not only look at how I could formulate the entire issue as a search problem but rather look into how incorporating search with other mechanisms could provide me with suitable solutions for the task at hand.