-
Notifications
You must be signed in to change notification settings - Fork 41
Available Project Topics
This is not a list of elaborated project/thesis proposals. It is rather a list of possible topics together with ideas how to realize them.
For students of the University of Freiburg: Most of these topics can also result in a Bachelor Project, a Bachelor Thesis, a Master Laboratory, a Master Project, as well as a Master Thesis. If you are interested please contact us and we will discuss together how to adapt the topic accordingly.
Contact: Matthias
Write interpolant automata to file. Use these interpolant automata for the verification of similar programs (e.g., the same program to which a patch was applied).
We already have a preliminary version of an Eclipse CDT interface that works with old versions of Eclipse. Adapt our interface to recent version of Eclipse and improve it.
Contact: Alexander
In a previous project we have laid the grounds for tree automata in our automata library. In this project we want to review standard algorithms such as intersection, difference, determinization, etc. for tree automata from the literature and implement them. In particular, for efficiency reasons, we want to develop new on-the-fly algorithms which do not construct the whole automaton at once but only on demand.
Contact: Matthias
We have powerful tools for the logical reasoning over mathematical integers and hence we represent integer data types in computer programs by mathematical integers in our analyses.
However, using mathematical integers is unsound in general since primitive data types usually have a finite number of values. E.g., an unsigned int
in C can take only values from 0 to 4294967295. Our analysis can be made sound by using one of the following transformations.
-
The Falke Transformation
We transform a C program over primitive data types into a program over mathematical integers. After each arithmetic operation we apply a modulo operation. E.g., if x is an unsigned int the expression
x + 1
is translated to(x + 1) % 4294967296
. -
The Nutz Transformation
We transform a C program over primitive data types into a program over mathematical integers. Before each comparison operation we apply a modulo operation. E.g., if x and y are unsigned ints, the expression
x == y
is translated to(x % 4294967296) == (y % 4294967296)
. Check the following examples for more details NutzTransformation01.c NutzTransformation02.c NutzTransformation03.c NutzTransformation04.c NutzTransformation05.c
The latter transformation is already implemented. Task: implement the first transformation and evaluate both.
Contact: Matthias
Our x86 computers store data bytewise. E.g., if a program writes the unsigned int 259 at the address ADDR, the computer will store only the least significant eight bit (here 3 in decimal representation) at ADDR and the next eight bit (here 1 in decimal representation) will be stored at ADDR+1.
In a naive model for the memory of the computer we read and store date bytewise. This is very costly, especially if we use integers to represent data. If the program writes the value of an unsigned int variable x at address ADDR we have to use modulo and division operations to get the value of each byte and we store x%256 at ADDR, we store (x/256)%256 at ADDR+1, ...
The Hoenicke-Lindenmann memory model circumvents this problem. Here, the memory is modeled as an array whose indices are addresses and whose values are integers. If the program writes the value of x to ADDR we just store its value at this address in our array. This is unsound, e.g., in the case were we read data from ADDR+1 (because we stored nothing here). See HoenickeLindenmannConcessions01.c for an example
Do not store only the integer values in a data array. Use additionally an OrigAddress array and a TypeSize array. The indices of both arrays are addresses. The values of OrigAddress are non-positive integers, the values of TypeSize are positive integers. In the OrigAddress array we store how far the real data is displaced. E.g., if we store an int (bytesize four) at ADDR, we also store
OrigAddress[ADDR]:=0
OrigAddress[ADDR+1]:=-1
OrigAddress[ADDR+2]:=-2
OrigAddress[ADDR+3]:=-3
In the TypeSize array we store how many memory cells would be occupied by the naive memory model. E.g., if we store an int (bytesize four) at ADDR, we also store
TypeSize[ADDR]:=4
These three arrays will allow us to compute the exact data for each address in a way that does not require composition/decomposition to bytes in most cases.
Take care that the control flow graph has two parallel edges for reading data.
One for the simple case (OrigAddress[ADDR]==0)
and one for the case where we
have to compose bytes.
Take care that the control flow graph has two parallel edges for storing data.
One for the simple case (OrigAddress[ADDR]==0 || size >=TypeSize[ADDR] )
and
one for the case where we have to decompose into bytes.
Nice feature of traces that contain at least one simple-case-edge: Will be infeasible in most cases.
Nice feature of traces that contain only simple-case-edges: Additional arrays will be irrelevant in most cases and hence not occur in proofs.
In case the decomposition of integers to bytes would be too costly we could overapproximate data operations in the non-simple case edges. We would loose precision but we would still be sound.
We have to initialize the OrigAddress array with zeros. We can do this using the following assume statement at the beginning of the program.
forall i OrigAddress[i]==0
However, quantifiers are very expensive!
We have much more array operations than before.
Contact: Matthias
Once, we had a C2Boogie translation that was independent of type sizes (e.g., sizeof(long) is not fixed to 4 bytes). We discontinued the support for this setting for good reasons:
- Unsound in the SV-COMP
- Introduces nonlinear arithmetic
- ... ?
Once, our plan was to check use exactly the semantics of the C standard, and not the semantics of common compilers/architectures (e.g., if some operation is implementation defined we should just havoc the resulting variable) We discontinued the support for this for good reasons: A lot of nondeterminism is expensive.
It seems to me that other tool try to become more implementation specific.
Let us don't play (only) where the elephants dance.
Furthermore, other tools might not check portable code, because they have problems that we don't have.
- We can deal with nonlinear arithmetic.
- Our C2Boogie translation is quite flexible.
- Our model checker can deal with a lot of nondeterminism.
I think we can write a nice paper in which we explain how ultimate finds bug in code that should strictly follow the C standard and that was meant to be portable.
Contact: Daniel
Ultimate relies on the CDT infrastructure to parse C files. But Ultimate currently supports only programs that are contained in a single file and are already preprocessed.
Users expect program analysis tools to be able to analyze programs that are scattered over multiple files, e.g. source files and headers. Our target audience is usually familiar with command line tools and compilers, especially gcc. Our user interface should therefore be similar to gcc's command line interface (i.e., we may add options ff:or specifying files, headers and entry function separately). There are many examples of gcc's syntax for this, e.g. this one.
The main goal of this project is making Ultimate's C interface able to create one analysable representation (one control flow graph) from a C program distributed over several input files and headers.
We expect that the main (early) work will be research regarding the inner workings of CDT. You can see how we currently use CDT in the class CDTParser
and there in the method parseAST(...)
.
There, we create a wrapper node around an IASTTranslationUnit
. The first question will be: can we create just one translation unit from the whole program (i.e., from multiple input files), or do we have to find another approach?
Contact: Daniel
As Ultimate matures, we are interested in gathering experience with the verification of real-world software written in C. To this end, we want to use existing tools (naturally, Ultimate among them) to verify properties of real-world software.
The properties may range from rather simple common runtime errors (e.g., nullpointer dereferences, division by zero, etc.) over finding over/underflows to functional properties specified by annotations.
Possible software sources are
- Parts of Trezor, a hardware bitcoin wallet.
- Newlib, an open-source math library
- Closed-source examples
- Own proposals
The project would involve selecting a suitable problem, trying to solve it with Ultimate and/or various other tools, and reporting the results.
Contact: Daniel
Ultimate is still based on the old Eclipse 3.x API. We want to migrate completely to the Eclipse e4 API. Ths requires a comprehensive analysis of the current Eclipse API usage in Ultimate, the identification of necessary changes, and the implementation of those changes.
For an overview over the topic, see the description of RCP and a small migration tutorial.
Contact: Daniel
- Interpolating Property Directed Reachability
- Efficient implementation of property directed reachability
- IC3 - Flipping the E in ICE
Description follows.
Contact: Daniel
We are currently using IPC to comunicate with various external SMT solvers. We would like to explore whether using the Java bindings of various solvers would provide us with performance improvements or not.
To this end, we want to use JavaSMT, an SMTLib abstraction layer.
The project would involve integrating JavaSMT into Ultimate, implementing a wrapper from our own data structures to JavaSMT and back, and compare the performance with our IPC approach.
- Home
- Ultimate Development
- Ultimate Build System
- Documentation
- Project Topics