Skip to content

Available Project Topics

Matthias Heizmann edited this page Apr 2, 2019 · 63 revisions

This is not a list of elaborated project/thesis proposals. It is rather a list of possible topics together with ideas how to realize them.

For students of the University of Freiburg: Most of these topics can also result in a Bachelor Project, a Bachelor Thesis, a Master Laboratory, a Master Project, as well as a Master Thesis. If you are interested please contact us and we will discuss together how to adapt the topic accordingly.

Petri nets as language acceptors

Contact: Matthias

In Ultimate we use bounded Petri nets a acceptors of regular languages. For an introduction you can have a look at the Petri net seminar slides of Nico Bühler from the 2015/16 Automata theory seminar. Petri nets are very efficient on languages that occur while verifying concurrent programs. There are several topics available. The following list could (in principle) be extended by every Petri net operation that is not yet supported.

  • Petri net minimization
  • Fold the finite prefix of an branching process back to a Petri net.
  • On-demand construction of a difference.
  • Petri net with Büchi acceptance condition.

Automata Based Regression Verification

Contact: Matthias

Write interpolant automata to file. Use these interpolant automata for the verification of similar programs (e.g., the same program to which a patch was applied).

CDT interface for Ultimate

Contact: Daniel and Matthias

We already have a preliminary version of an Eclipse CDT interface that works with old versions of Eclipse. Adapt our interface to recent version of Eclipse and improve it.

Use Modulo Arithmetic to Represent Machine Integer Operations

Contact: Matthias

We have powerful tools for the logical reasoning over mathematical integers and hence we represent integer data types in computer programs by mathematical integers in our analyses. However, using mathematical integers is unsound in general since primitive data types usually have a finite number of values. E.g., an unsigned int in C can take only values from 0 to 4294967295. Our analysis can be made sound by using one of the following transformations.

  1. The Falke Transformation We transform a C program over primitive data types into a program over mathematical integers. After each arithmetic operation we apply a modulo operation. E.g., if x is an unsigned int the expression x + 1 is translated to (x + 1) % 4294967296.
  2. The Nutz Transformation We transform a C program over primitive data types into a program over mathematical integers. Before each comparison operation we apply a modulo operation. E.g., if x and y are unsigned ints, the expression x == y is translated to (x % 4294967296) == (y % 4294967296). Check the following examples for more details NutzTransformation01.c NutzTransformation02.c NutzTransformation03.c NutzTransformation04.c NutzTransformation05.c

The latter transformation is already implemented. Task: implement the first transformation and evaluate both.

A Sound but Efficient Extension of the Hoenicke-Lindenmann Memory Model

Contact: Matthias

Motivation

Our x86 computers store data bytewise. E.g., if a program writes the unsigned int 259 at the address ADDR, the computer will store only the least significant eight bit (here 3 in decimal representation) at ADDR and the next eight bit (here 1 in decimal representation) will be stored at ADDR+1.

In a naive model for the memory of the computer we read and store date bytewise. This is very costly, especially if we use integers to represent data. If the program writes the value of an unsigned int variable x at address ADDR we have to use modulo and division operations to get the value of each byte and we store x%256 at ADDR, we store (x/256)%256 at ADDR+1, ...

The Hoenicke-Lindenmann memory model circumvents this problem. Here, the memory is modeled as an array whose indices are addresses and whose values are integers. If the program writes the value of x to ADDR we just store its value at this address in our array. This is unsound, e.g., in the case were we read data from ADDR+1 (because we stored nothing here). See HoenickeLindenmannConcessions01.c for an example

Basic Idea

Do not store only the integer values in a data array. Use additionally an OrigAddress array and a TypeSize array. The indices of both arrays are addresses. The values of OrigAddress are non-positive integers, the values of TypeSize are positive integers. In the OrigAddress array we store how far the real data is displaced. E.g., if we store an int (bytesize four) at ADDR, we also store

OrigAddress[ADDR]:=0
OrigAddress[ADDR+1]:=-1
OrigAddress[ADDR+2]:=-2
OrigAddress[ADDR+3]:=-3

In the TypeSize array we store how many memory cells would be occupied by the naive memory model. E.g., if we store an int (bytesize four) at ADDR, we also store

TypeSize[ADDR]:=4

These three arrays will allow us to compute the exact data for each address in a way that does not require composition/decomposition to bytes in most cases.

Optimization

Take care that the control flow graph has two parallel edges for reading data. One for the simple case (OrigAddress[ADDR]==0) and one for the case where we have to compose bytes.

Take care that the control flow graph has two parallel edges for storing data. One for the simple case (OrigAddress[ADDR]==0 || size >=TypeSize[ADDR] ) and one for the case where we have to decompose into bytes.

Nice feature of traces that contain at least one simple-case-edge: Will be infeasible in most cases.

Nice feature of traces that contain only simple-case-edges: Additional arrays will be irrelevant in most cases and hence not occur in proofs.

In case the decomposition of integers to bytes would be too costly we could overapproximate data operations in the non-simple case edges. We would loose precision but we would still be sound.

Obstacles

We have to initialize the OrigAddress array with zeros. We can do this using the following assume statement at the beginning of the program.

forall i OrigAddress[i]==0

However, quantifiers are very expensive!

We have much more array operations than before.

Verification of Portable C Code

Contact: Matthias

Once, we had a C2Boogie translation that was independent of type sizes (e.g., sizeof(long) is not fixed to 4 bytes). We discontinued the support for this setting for good reasons:

  1. Unsound in the SV-COMP
  2. Introduces nonlinear arithmetic
  3. ... ?

Once, our plan was to check use exactly the semantics of the C standard, and not the semantics of common compilers/architectures (e.g., if some operation is implementation defined we should just havoc the resulting variable) We discontinued the support for this for good reasons: A lot of nondeterminism is expensive.

It seems to me that other tool try to become more implementation specific.

Let us don't play (only) where the elephants dance.

Furthermore, other tools might not check portable code, because they have problems that we don't have.

  • We can deal with nonlinear arithmetic.
  • Our C2Boogie translation is quite flexible.
  • Our model checker can deal with a lot of nondeterminism.

I think we can write a nice paper in which we explain how ultimate finds bug in code that should strictly follow the C standard and that was meant to be portable.

Migrate to Eclipse e4

Contact: Daniel

Ultimate is still based on the old Eclipse 3.x API. We want to migrate completely to the Eclipse e4 API. Ths requires a comprehensive analysis of the current Eclipse API usage in Ultimate, the identification of necessary changes, and the implementation of those changes.

For an overview over the topic, see the description of RCP and a small migration tutorial.

Native access to SMT solvers using JavaSMT

Contact: Daniel

We are currently using IPC to comunicate with various external SMT solvers. We would like to explore whether using the Java bindings of various solvers would provide us with performance improvements or not.

To this end, we want to use JavaSMT, an SMTLib abstraction layer.

The project would involve integrating JavaSMT into Ultimate, implementing a wrapper from our own data structures to JavaSMT and back, and compare the performance with our IPC approach.

Clone this wiki locally