Skip to content

Available Project Topics

Daniel Dietsch edited this page Mar 14, 2017 · 63 revisions

This is not a list of elaborated project/thesis proposals. It is rather a list of possible topics together with ideas how to realize them.

For students of the University of Freiburg: Most of these topics can also result in a Bachelor Project, a Bachelor Thesis, a Master Laboratory, a Master Project, as well as a Master Thesis. If you are interested please contact us and we will discuss together how to adapt the topic accordingly.

Minimization of Büchi Nested Word Automata by Lookahead Simulation

Contact: Christian

The classical approach to Büchi automata minimization is by computing simulation relations. Such a relation is usually obtained by reduction to a two-player game. The most efficient approach for standard Büchi automata is lookahead simulation (published in the paper Advanced Automata Minimization) and there exists an open-source implementation.

In a previous project we have extended some simulation-based approaches (direct, delayed, and fair simulation) to Büchi nested word automata. In this project we want to extend lookahead simulation to Büchi nested word automata.

This is ideally a long-term project with both theoretical elaboration and a practical implementation in our automata library. For a short project we would either restrict ourselves to the theoretical part or a reimplementation of the classical algorithm.

Automata Based Regression Verification

Contact: Matthias

Write interpolant automata to file. Use these interpolant automata for the verification of similar programs (e.g., the same program to which a patch was applied).

CDT interface for Ultimate

Contact: Daniel and Matthias

We already have a preliminary version of an Eclipse CDT interface that works with old versions of Eclipse. Adapt our interface to recent version of Eclipse and improve it.

Implementation of Tree Automata Operations

Contact: Alexander

In a previous project we have laid the grounds for tree automata in our automata library. In this project we want to review standard algorithms such as intersection, difference, determinization, etc. for tree automata from the literature and implement them. In particular, for efficiency reasons, we want to develop new on-the-fly algorithms which do not construct the whole automaton at once but only on demand.

Use Modulo Arithmetic to Represent Machine Integer Operations

Contact: Matthias

We have powerful tools for the logical reasoning over mathematical integers and hence we represent integer data types in computer programs by mathematical integers in our analyses. However, using mathematical integers is unsound in general since primitive data types usually have a finite number of values. E.g., an unsigned int in C can take only values from 0 to 4294967295. Our analysis can be made sound by using one of the following transformations.

  1. The Falke Transformation We transform a C program over primitive data types into a program over mathematical integers. After each arithmetic operation we apply a modulo operation. E.g., if x is an unsigned int the expression x + 1 is translated to (x + 1) % 4294967296.
  2. The Nutz Transformation We transform a C program over primitive data types into a program over mathematical integers. Before each comparison operation we apply a modulo operation. E.g., if x and y are unsigned ints, the expression x == y is translated to (x % 4294967296) == (y % 4294967296). Check the following examples for more details NutzTransformation01.c NutzTransformation02.c NutzTransformation03.c NutzTransformation04.c NutzTransformation05.c

The latter transformation is already implemented. Task: implement the first transformation and evaluate both.

A Sound but Efficient Extension of the Hoenicke-Lindenmann Memory Model

Contact: Matthias

Motivation

Our x86 computers store data bytewise. E.g., if a program writes the unsigned int 259 at the address ADDR, the computer will store only the least significant eight bit (here 3 in decimal representation) at ADDR and the next eight bit (here 1 in decimal representation) will be stored at ADDR+1.

In a naive model for the memory of the computer we read and store date bytewise. This is very costly, especially if we use integers to represent data. If the program writes the value of an unsigned int variable x at address ADDR we have to use modulo and division operations to get the value of each byte and we store x%256 at ADDR, we store (x/256)%256 at ADDR+1, ...

The Hoenicke-Lindenmann memory model circumvents this problem. Here, the memory is modeled as an array whose indices are addresses and whose values are integers. If the program writes the value of x to ADDR we just store its value at this address in our array. This is unsound, e.g., in the case were we read data from ADDR+1 (because we stored nothing here). See HoenickeLindenmannConcessions01.c for an example

Basic Idea

Do not store only the integer values in a data array. Use additionally an OrigAddress array and a TypeSize array. The indices of both arrays are addresses. The values of OrigAddress are non-positive integers, the values of TypeSize are positive integers. In the OrigAddress array we store how far the real data is displaced. E.g., if we store an int (bytesize four) at ADDR, we also store

OrigAddress[ADDR]:=0
OrigAddress[ADDR+1]:=-1
OrigAddress[ADDR+2]:=-2
OrigAddress[ADDR+3]:=-3

In the TypeSize array we store how many memory cells would be occupied by the naive memory model. E.g., if we store an int (bytesize four) at ADDR, we also store

TypeSize[ADDR]:=4

These three arrays will allow us to compute the exact data for each address in a way that does not require composition/decomposition to bytes in most cases.

Optimization

Take care that the control flow graph has two parallel edges for reading data. One for the simple case (OrigAddress[ADDR]==0) and one for the case where we have to compose bytes.

Take care that the control flow graph has two parallel edges for storing data. One for the simple case (OrigAddress[ADDR]==0 || size >=TypeSize[ADDR] ) and one for the case where we have to decompose into bytes.

Nice feature of traces that contain at least one simple-case-edge: Will be infeasible in most cases.

Nice feature of traces that contain only simple-case-edges: Additional arrays will be irrelevant in most cases and hence not occur in proofs.

In case the decomposition of integers to bytes would be too costly we could overapproximate data operations in the non-simple case edges. We would loose precision but we would still be sound.

Obstacles

We have to initialize the OrigAddress array with zeros. We can do this using the following assume statement at the beginning of the program.

forall i OrigAddress[i]==0

However, quantifiers are very expensive!

We have much more array operations than before.

Verification of Portable C Code

Contact: Matthias

Once, we had a C2Boogie translation that was independent of type sizes (e.g., sizeof(long) is not fixed to 4 bytes). We discontinued the support for this setting for good reasons:

  1. Unsound in the SV-COMP
  2. Introduces nonlinear arithmetic
  3. ... ?

Once, our plan was to check use exactly the semantics of the C standard, and not the semantics of common compilers/architectures (e.g., if some operation is implementation defined we should just havoc the resulting variable) We discontinued the support for this for good reasons: A lot of nondeterminism is expensive.

It seems to me that other tool try to become more implementation specific.

Let us don't play (only) where the elephants dance.

Furthermore, other tools might not check portable code, because they have problems that we don't have.

  • We can deal with nonlinear arithmetic.
  • Our C2Boogie translation is quite flexible.
  • Our model checker can deal with a lot of nondeterminism.

I think we can write a nice paper in which we explain how ultimate finds bug in code that should strictly follow the C standard and that was meant to be portable.

Parse and preprocess multiple C files

Contact: Daniel

Ultimate relies on the CDT infrastructure to parse C files. But Ultimate currently supports only programs that are contained in a single file and are already preprocessed.

Users expect program analysis tools to be able to analyze programs that are scattered over multiple files, e.g. source files and headers. Our target audience is usually familiar with command line tools and compilers, especially gcc. Our user interface should therefore be similar to gcc's command line interface (i.e., we may add options ff:or specifying files, headers and entry function separately). There are many examples of gcc's syntax for this, e.g. this one.

The main goal of this project is making Ultimate's C interface able to create one analysable representation (one control flow graph) from a C program distributed over several input files and headers.

We expect that the main (early) work will be research regarding the inner workings of CDT. You can see how we currently use CDT in the class CDTParser and there in the method parseAST(...). There, we create a wrapper node around an IASTTranslationUnit. The first question will be: can we create just one translation unit from the whole program (i.e., from multiple input files), or do we have to find another approach?

Verify real-world software

Contact: Daniel

As Ultimate matures, we are interested in gathering experience with the verification of real-world software written in C. To this end, we want to use existing tools (naturally, Ultimate among them) to verify properties of real-world software.

The properties may range from rather simple common runtime errors (e.g., nullpointer dereferences, division by zero, etc.) over finding over/underflows to functional properties specified by annotations.

Possible software sources are

  • Parts of Trezor, a hardware bitcoin wallet.
  • Newlib, an open-source math library
  • Closed-source examples
  • Own proposals

The project would involve selecting a suitable problem, trying to solve it with Ultimate and/or various other tools, and reporting the results.

Migrate to Eclipse e4

Contact: Daniel

Ultimate is still based on the old Eclipse 3.x API. We want to migrate completely to the Eclipse e4 API. Ths requires a comprehensive analysis of the current Eclipse API usage in Ultimate, the identification of necessary changes, and the implementation of those changes.

For an overview over the topic, see the description of RCP and a small migration tutorial.

Implement "Property driven reachability"

Contact: Daniel

Description follows.

Native access to SMT solvers using JavaSMT

Contact: Daniel

We are currently using IPC to comunicate with various external SMT solvers. We would like to explore whether using the Java bindings of various solvers would provide us with performance improvements or not.

To this end, we want to use JavaSMT, an SMTLib abstraction layer.

The project would involve integrating JavaSMT into Ultimate, implementing a wrapper from our own data structures to JavaSMT and back, and compare the performance with our IPC approach.

Clone this wiki locally