“Artificial Intelligence and Deep Learning for Autonomous Driving”

Bernhard Nessler | Johannes Kepler Universität

Summary of the lecture by Bernhard Nessler on “Artificial Intelligence and Deep Learning for Autonomous Driving”, given at the 11th Grazer Symposium Virtual Vehicle (GSVF), organized by VIRTUAL VEHICLE.

JKU is running a project in cooperation with Audi Electronic Ventures that applies deep learning methods to artificial intelligence (AI) for autonomous driving.

The world of robotic computing, and hence also that of autonomous driving, is divided into four levels: sensors, perception, planning and control. These levels need to be coordinated with each other as closely as possible and must operate in a failsafe manner: all objects and the vehicle’s own position must be clearly recognised, leading to a clear trajectory that is continuously optimised and executed without error. Today this is possible with good lighting and visibility, even in more complex traffic conditions, but it remains problematic under unfavourable boundary conditions such as rain or poor visibility.

So where do the challenges lie? Above all in the accurate recognition of road edges and exact lane boundaries, which still causes problems. Even the unequivocal evaluation and categorisation of stationary and moving objects creates difficulties for the vehicle’s artificial intelligence (AI) in certain situations. Just like a human, the AI needs to establish facts and make decisions based on them. Current research is therefore mainly focussed on improving route and trajectory planning together with object prediction.

The problem of social interaction:
What does the other person think that I am thinking that they are thinking …

The crux of the matter is the ability to predict human behaviour, since this is directly linked to the AI’s actions. The interaction between very different traffic participants can easily become a source of misunderstanding: how does the pedestrian think the vehicle will react? Which patterns of movement by other traffic participants does the vehicle’s AI register, and what does it conclude about their probable subsequent behaviour? Have all participants been identified unambiguously and have their intentions been correctly inferred? A few commonly occurring examples quickly demonstrate how difficult it is, from a model development perspective, to build a model that even approaches the complexity of these scenarios.

A safe, successful and accident-free passage must emerge from this heterogeneous collection of intentions.
Formal rulebooks such as road traffic regulations and informal principles of social interaction provide the basis for decisions and behaviour. If automated vehicles are to be fully integrated into the prevailing world of individual transportation, then the AI needs to communicate and behave like a human.

Artificial intelligence – The path to “self-driver zero”

How can human cognitive performance be transferred to autonomous driving using current AI technology?

The basis for this is a “Deep Neural Network”, whose “Neural Layers”, built from simple processing units, deliver the identification of a perceived image. The key question is how well a network generalises, that is, how well it extrapolates from correctly identified image content to an image with similar or, to the human eye, practically identical content. To their surprise, researchers recently[1] showed that minimally deviating pixel values can cause a previously correctly recognised object such as a school bus, a pyramid or a wild turkey to be mistaken for an ostrich.

From the viewpoint of human perception, such mistakes are incomprehensible. So why do they happen? The system can be deliberately deceived by selecting, from the high-dimensional space of possible deviations, those that lie as close as possible to the desired result, in this case an ostrich. The unsettling finding is that these points do not lie very far from each other.
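
The effect can be illustrated with a minimal sketch. It assumes a toy linear classifier instead of a deep network and uses a simple gradient-sign step, which is not the optimisation procedure used in [1]. The weights, the image, the labels and all function names are hypothetical and chosen only for illustration.

```python
import math

def sign(x):
    return (x > 0) - (x < 0)

def score(weights, pixels):
    """Toy linear classifier: positive score means 'school bus', negative means 'ostrich'."""
    return sum(w * p for w, p in zip(weights, pixels))

def perturb_towards_negative(weights, pixels, epsilon):
    """Shift every pixel by at most epsilon in the direction that lowers the score."""
    return [p - epsilon * sign(w) for w, p in zip(weights, pixels)]

n_pixels = 1024  # e.g. a 32 x 32 greyscale image (assumption)
weights = [math.sin(i) for i in range(n_pixels)]   # arbitrary fixed weights
image = [0.5 + 0.01 * sign(w) for w in weights]    # near mid-grey image, scored as 'school bus'
adversarial = perturb_towards_negative(weights, image, epsilon=0.02)

# No pixel moves by more than epsilon, yet the sign of the score flips.
largest_change = max(abs(a - p) for a, p in zip(adversarial, image))
print(score(weights, image) > 0, score(weights, adversarial) < 0, largest_change)
```

Each pixel changes by at most 0.02 on a 0 to 1 scale, but because many small changes add up across all 1024 inputs, the overall score changes sign. This is one common explanation for why the deceptive points lie so close to the original image.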

[1] C. Szegedy et al. – “Intriguing Properties of Neural Networks”

Combating misinterpretation by using “adversarial training”

What applies to images of objects can also be demonstrated with digits and with artificially generated images. The situation becomes dramatic, however, if a manipulated image provokes an action by the AI. For example, a completely harmless yet minimally changed road sign could cause the autonomous vehicle to make a wrong decision, or an inconspicuous but slightly modified advertisement could trigger emergency braking.

In order to eliminate such sources of error, the AI is deliberately exposed to hostile attacks. This “adversarial training” teaches networks to react correctly to such attacks. In a further step, two “hostile” neural networks are pitted against each other. One network, the generator, produces images from random input, and the other has to judge whether each image is real or fake. This “decider” (discriminator) learns from many real images and in turn provides the training signal for the generator, which then produces increasingly natural images. This mutual training increases the number of similar images and hence dramatically enlarges the system’s learning base.
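
The first step, adversarial training, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the project: the model is a two-feature logistic classifier rather than a deep network, the data set is a hypothetical symmetric toy set, and in every epoch each training point is replaced by its worst-case version within a distance epsilon before the gradient is computed.

```python
import math

def sign(x):
    return (x > 0) - (x < 0)

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def perturb(w, x, y, epsilon):
    """Worst-case shift of each feature by at most epsilon for a linear model."""
    return [xi - epsilon * y * sign(wi) for xi, wi in zip(x, w)]

def margin(w, b, x, y):
    """Positive margin means the point (x, y) is classified correctly."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def adversarial_train(data, epsilon, epochs=200, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            x_adv = perturb(w, x, y, epsilon)  # attack the current model
            # Gradient of the logistic loss, evaluated on the attacked input.
            g = -y * (1.0 - sigmoid(margin(w, b, x_adv, y)))
            grad_w = [gw + g * xi for gw, xi in zip(grad_w, x_adv)]
            grad_b += g
        w = [wi - lr * gw / len(data) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def robust_accuracy(w, b, data, epsilon):
    """Fraction of points still classified correctly after the worst-case shift."""
    hits = sum(margin(w, b, perturb(w, x, y, epsilon), y) > 0 for x, y in data)
    return hits / len(data)

# Hypothetical toy data: class +1 near (2, 2), class -1 near (-2, -2).
offsets = [(-0.5, 0.0), (0.0, 0.5), (0.5, -0.5), (0.0, 0.0)]
data = [([2 + dx, 2 + dy], 1) for dx, dy in offsets]
data += [([-2 - dx, -2 - dy], -1) for dx, dy in offsets]

w, b = adversarial_train(data, epsilon=0.5)
print(w, b, robust_accuracy(w, b, data, epsilon=0.5))
```

The design choice worth noting is that the attack is recomputed against the current weights in every epoch, so the model always trains on the perturbations that are currently most harmful to it.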

Humans and machines learn from mistakes

The large number of known optical illusions shows that humans, too, make mistakes and misinterpretations, and that even we can generalise falsely. Of particular relevance is the human sensitivity to context, which makes us susceptible to optical illusions. Both humans and machines make mistakes. The problem is that they make different mistakes. While human mistakes are transparent, traceable and understandable to us, we cannot explain how a neural network suddenly mistakes a school bus for an ostrich. An artificially trained system whose mistakes cannot be assessed is a potential source of danger.

However, if two networks learn in opposition to each other, this permanent competition and continuous learning produces reliable results faster, since potential misclassifications by humans are not transferred to the AI network: the AI trains itself autodidactically in an adversarial network. Such a system can improve itself exponentially.
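
The alternating competition between a generator and a discriminator can be sketched in a few lines. This is a deliberately tiny illustration under stated assumptions: the “images” are single numbers drawn around 4.0, the generator is a linear function of noise, the discriminator is a logistic unit, and all names and hyperparameters are hypothetical. Toy setups of this kind are known to oscillate rather than settle, so the sketch shows only the update scheme and makes no claim about convergence.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=500, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: fake = a * noise + b
    w, c = 0.0, 0.0  # discriminator: P(real) = sigmoid(w * x + c)
    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)   # stand-in for a natural image
        noise = rng.gauss(0.0, 1.0)
        fake = a * noise + b

        # Discriminator step: raise P(real) on the real sample, lower it on the fake.
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        grad_w = -(1.0 - d_real) * real + d_fake * fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: adjust a and b so the updated discriminator
        # rates the fake as more likely to be real (loss = -log D(fake)).
        d_fake = sigmoid(w * fake + c)
        grad_fake = -(1.0 - d_fake) * w
        a -= lr * grad_fake * noise
        b -= lr * grad_fake
    return (a, b), (w, c)

generator, discriminator = train_toy_gan()
print(generator, discriminator)
```

Neither network ever sees a human-provided label for the generated samples. The only training signal for the generator is the discriminator’s current judgement.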

The JKU/Audi deep learning project

The “Audi/JKU Deep Learning Project” for autonomous driving, run in cooperation with Nvidia, is concerned with the prediction of behaviour based on the LSTM technology developed by Professor Hochreiter (see info box). The project also focuses on the targeted selection of those observation areas that are of particular relevance for an automated vehicle. The satisfactory sensor fusion of GPS, radar and lidar represents an as yet unaddressed challenge. In addition, particularly rare and dangerous traffic situations are to be recognised and handled without error.

A huge remaining challenge is the inner understanding of AI systems and their perception of circumstances, which still differs dramatically from the way in which humans subjectively perceive their environment and draw conclusions. This is why AI systems today still make mistakes that humans have difficulty predicting or understanding. Researchers at the JKU under Professor Sepp Hochreiter are therefore working on ways to reduce this difference from both sides: on the one hand by increasing human understanding of the processes and decision making of AI systems, and on the other by making AI systems understand better which mistakes we intend to allow and which we do not.


Long Short-Term Memory (LSTM)

Completely autonomous driving will only be possible if the vehicle can autonomously recognise and assess all driving and environmental situations, even highly complex ones, and make decisions on the basis of these assessments. This ambitious target can only be reached with the aid of leading-edge methods of artificial intelligence.

Currently, Long Short-Term Memory (LSTM) shows the most promise when it comes to processing sequential inputs. LSTM is a neural network that can selectively store items, such as words in a sentence, and use them at a later time. This technology is intended for use in the area of autonomous driving.
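
The selective storing can be illustrated with a single LSTM cell in its commonly used form with input, forget and output gates. The following is a minimal sketch, not project code: the inputs are a number and a “store this” flag, and the gate parameters are hand-set, hypothetical values, whereas in a real LSTM they are learned from data.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Hand-set parameters (hypothetical); each tuple is (w_value, w_flag, w_hidden, bias).
PARAMS = {
    "input":     (0.0, 10.0, 0.0, -5.0),  # opens only when the flag is 1
    "forget":    (0.0, -10.0, 0.0, 5.0),  # keeps the memory unless the flag is 1
    "output":    (0.0, 0.0, 0.0, 5.0),    # always exposes the memory
    "candidate": (1.0, 0.0, 0.0, 0.0),    # proposes the current value
}

def pre_activation(name, value, flag, h):
    w_value, w_flag, w_hidden, bias = PARAMS[name]
    return w_value * value + w_flag * flag + w_hidden * h + bias

def lstm_step(value, flag, h, c):
    """One time step: returns the new hidden state h and cell state c."""
    i = sigmoid(pre_activation("input", value, flag, h))
    f = sigmoid(pre_activation("forget", value, flag, h))
    o = sigmoid(pre_activation("output", value, flag, h))
    g = math.tanh(pre_activation("candidate", value, flag, h))
    c_new = f * c + i * g         # keep old memory and/or write the candidate
    h_new = o * math.tanh(c_new)  # gated read-out of the memory
    return h_new, c_new

# Store 0.8 at the first step, then feed distracting values without the flag.
h, c = 0.0, 0.0
for value, flag in [(0.8, 1), (-0.3, 0), (0.5, 0), (0.1, 0)]:
    h, c = lstm_step(value, flag, h, c)
print(c, math.tanh(0.8))
```

After four steps the cell state is still close to tanh(0.8), the value written at the first step, because the input gate stays almost closed and the forget gate almost fully open while the flag is 0.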

In most current applications of autonomous driving, decisions are based on images taken at a single point in time. LSTM looks beyond this single point in time and combines information over a longer period, with the aim of more robust, reliable and faster decision making in autonomous driving.

It does this by continuously evaluating camera images and other sensor data with neural networks. It also aims to recognise and anticipate situations that develop over time and cannot be reliably recognised from a single image, for example another vehicle’s turning manoeuvre, how a pedestrian responds to a vehicle, or how two children absorbed in a conversation react to a vehicle. In addition, the LSTM4Drive project integrates the aspect of awareness. The goal is to make real driving decisions with the aid of LSTM.