Stefan Mada
Languages, Compilers, and more!
Exploring Suboptimal Lattice Transfer Functions in LLVM with KnownBits
I was recently taking a look at LLVM's analysis passes, and to get a better understanding I was reading up on lattices and abstract domains. Lattices are the bread and butter of static program analysis, and are what allow passes such as constant propagation and live variable analysis to have well-understood properties. However, these lattices operate on an abstract domain rather than concrete values, and in doing so, some precision is lost in the process. If you're curious about any of these ideas, I'll explain them briefly below! For more detailed information, see the Resources section. Most of the information below is from Static Program Analysis by Anders Møller and Michael I. Schwartzbach, but the later LLVM details are my own.
What is a Lattice?
A lattice is a partially ordered set over a domain with certain properties that make it possible for transfer functions to run to a fixed point, with that fixed point being the same for a given input. More concretely, let's take a simple analysis such as sign analysis of variables. When we concretely execute a program, values can be arbitrary integers.
In contrast, for our analysis, we should consider an abstraction of the integer values by grouping them into three categories, or abstract values: positive ($+$), negative ($-$), and zero ($0$).
We should also be able to handle uncertain information when we don't know the sign of a value, so we need to add a special abstract value $\top$.
For this domain, we must decide which case we are interested in. Here, we only want definite information, so we should only report $+$ for an expression if we are sure it is positive.
For completeness, it's useful to have a value $\bot$ for expressions whose values are not numbers, such as in cases where the expression is a pointer or is unreachable.
So for instance, if a = 10 and b = 20, then c = input ? a - b : a + b would yield $a = +$ and $b = +$, but $c = \top$, as we are not sure of the sign of c.
This abstract domain consisting of the five abstract values can be arranged with the least precise information at the top, and most precise at the bottom, yielding a lattice:
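        ⊤
      / | \
     -  0  +
      \ | /
        ⊥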
This ordering reflects that $\bot$ is the empty set of integers while $\top$ is the set of all integers. The ordering also shows that any element lower in the diagram is a subset of any element above it.
The main use of a lattice is that given a set of transfer functions between abstract values (one such transfer function would say that $+ \,\hat{+}\, 0 = +$, i.e. a positive value plus a zero value will yield a positive value), and given that these transfer functions are monotone (more precise inputs do not yield less precise outputs), then if we apply all of our transfer functions repeatedly, there is a unique least fixed point for the final set of states.
This is a powerful result, as it tells us that equation systems over complete lattices always have solutions, and that a unique most precise solution exists, so the order in which we execute the transfer functions does not matter.
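To make this concrete, here is a minimal sketch of what a monotone transfer function for abstract addition in the sign domain could look like (the Sign enum and absAdd name are my own invention, not from the book):

enum class Sign { Bottom, Neg, Zero, Pos, Top };

// Abstract addition: return the most precise sign that covers every
// possible concrete sum of the two inputs.
Sign absAdd(Sign a, Sign b) {
  if (a == Sign::Bottom || b == Sign::Bottom)
    return Sign::Bottom;           // no possible values: stay at bottom
  if (a == Sign::Zero) return b;   // 0 + x has the sign of x
  if (b == Sign::Zero) return a;
  if (a == b) return a;            // pos + pos = pos, neg + neg = neg
  return Sign::Top;                // mixed signs (or top): unknown
}

Note that this function is monotone: replacing an input with a less precise one (say, Pos with Top) can only make the output less precise, never more.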
These lattices are the basis of analysis passes in any compiler, and they set the stage for a look at one of LLVM's analyses: Known Bits. To learn more about these lattices, you can read Chapter 4 of Static Program Analysis.
LLVM Known Bits
That was a lot of formal machinery, so let's take a look at how LLVM actually implements all of this in its Known Bits analysis.
Known Bits analysis tracks whether a given bit of a variable is a one, a zero, or unknown (so technically speaking, this is a semilattice, as LLVM has no way of representing $\bot$ here, but most of the nice properties from above still hold).
To store this abstract domain in code, LLVM has this struct:
struct KnownBits {
  APInt Zero;  // a set bit means that bit is known to be 0
  APInt One;   // a set bit means that bit is known to be 1
  ...
};
An APInt in LLVM is an arbitrary precision integer, so it can be any number of bits long. In a simple example, let's say we know for certain that the bits of a variable are 0110. In this case, the KnownBits analysis would yield this:
Zero = 1001b;
One = 0110b;
This means that for bit positions 0 and 3, we are certain that there is a zero there, which is why bits 0 and 3 are set in Zero. The same holds for One. If we are unsure whether a bit is a zero or a one, then both Zero and One will have a 0 in that bit position.
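As a quick sketch (assuming LLVM's APInt.h and KnownBits.h headers are available), the example above could be built by hand like this:

KnownBits KB(4);             // 4-bit value, nothing known yet
KB.Zero = APInt(4, 0b1001);  // bits 0 and 3 are known to be zero
KB.One  = APInt(4, 0b0110);  // bits 1 and 2 are known to be one
// All four bits are now known, so KB.isConstant() returns true
// and KB.getConstant() equals 0b0110.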
So that's how LLVM stores this abstract domain! When this pass is run for real, it starts with known information (such as constants, whose bits are all known) and applies specific transfer functions based on what operation was performed. For instance, the following C++ code will compile to the LLVM IR below it:
int src(int x) {
const int val = 7;
x <<= 2;
return val + x;
}
define i32 @src(i32 noundef %0) {
%2 = shl i32 %0, 2
%3 = add nsw i32 %2, 7
ret i32 %3
}
For the LLVM IR, we have fully known bits for 2 and 7, as they are constants. The input argument %0 is completely unknown, so its Zero and One would be fully zeroed out. However, we can then apply transfer functions to get the values of the variables %2 and %3. LLVM has implementations of transfer functions for shl and for add between two variables, and the KnownBits of %2 and %3 would be computed from these functions.
You can look at these transfer functions in KnownBits.cpp to see exactly how they are implemented.
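For a rough idea of how these could be invoked directly, here is a sketch that mirrors the IR above (treat it as illustrative: the exact signature of computeForAddSub has changed across LLVM versions):

#include "llvm/ADT/APInt.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

void example() {
  unsigned BW = 32;
  KnownBits X(BW);  // %0: nothing known, Zero and One are all zeroes
  KnownBits Two = KnownBits::makeConstant(APInt(BW, 2));
  KnownBits Seven = KnownBits::makeConstant(APInt(BW, 7));

  // %2 = shl i32 %0, 2: after the shift, the low two bits must be zero.
  KnownBits Shifted = KnownBits::shl(X, Two);

  // %3 = add nsw i32 %2, 7: adding 0b111 to a value ending in 00
  // makes the low two bits of the result known ones.
  KnownBits Sum = KnownBits::computeForAddSub(
      /*Add=*/true, /*NSW=*/true, /*NUW=*/false, Shifted, Seven);
}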
Now, let's move on to what's suboptimal about these!
Optimality
We have been talking pretty exclusively about an abstract domain above, but optimality takes concrete values into account. In essence, optimality for known bits would say that if we take any abstract known bits value x, enumerate every possible real integer x represents, apply the underlying function (such as a real shl) to each of them, and then turn the results back into an abstract known bits value, this would yield the same result as running the abstract transfer function version of shl on the original x.
Basically, if we were to brute force every possible value by exiting the abstract domain, doing the computation, and getting the most precise representation of all the possible values, then that gives the same result as just running the abstract transfer function.
This would be a wonderful property for your transfer functions to have, as it means that your analysis is as precise as it could possibly be, with no room for improvement.
I'll go more into concretization and abstraction functions below to get a better sense of how we can show that LLVM's transfer functions are not optimal.
Concretization and Abstraction
If we go back to the sign analysis example, we can more closely understand the relationship between concrete values, abstract values, and switching between them.
The abstract values are, as we already know: $\{\bot, -, 0, +, \top\}$.
The sets of concrete values are every possible combination of integer values, such as $\{1, 5, 7\}$. This is just the powerset over the integers: $\mathcal{P}(\mathbb{Z})$.
Now, an abstraction function takes in an element from the set of concrete values and gives the most precise abstract value it can. So for a set $D \in \mathcal{P}(\mathbb{Z})$, $\alpha(D)$ would be the abstraction function. If $D$ is nonempty and contains only positive integers, then $\alpha(D) = +$; if $D$ only contains 0, then $\alpha(D) = 0$; and so on.
This allows us to take concrete values into the abstract domain, but we can also do the opposite with a concretization function $\gamma$.
For instance, if we have an abstract value $a$, $\gamma(a)$ would be the concretization function. If $a = +$, then $\gamma(a) = \{1, 2, 3, \ldots\}$, and if $a = \top$, then $\gamma(a) = \mathbb{Z}$. This allows us to get the broadest set of concrete values that satisfies the abstract value.
Optimality: More Precisely
Now that we have the above concepts, we can define optimality with more precision.
Let $\alpha : C \to A$ be an abstraction function, where $C$ is the set of concrete values and $A$ is the abstract domain, and let $\gamma : A \to C$ be the concretization function. Now consider two functions $cf : C \to C$ and $af : A \to A$ that represent the concrete operation and the abstract transfer function, respectively. We say that $af$ is the optimal abstraction of $cf$ if

$$af = \alpha \circ cf \circ \gamma$$
For instance, for the analysis we will be doing in LLVM, we have $C = \mathcal{P}(\mathrm{APInt})$ (sets of fixed-width integers), $A$ = the KnownBits domain, $cf$ = APInt::sadd_sat (lifted to sets), and $af$ = KnownBits::sadd_sat.
You can read more about this in Chapter 12 of Static Program Analysis.
Suboptimal LLVM Saturated Signed Addition Function
Now that all the groundwork is laid, it's not too hard to show that LLVM's saturated signed addition transfer function in the known bits domain is suboptimal.
We basically just need to run the two sides of the equality above in code and see whether they agree. That is:
- Get all possible pairs of abstract values, then run KnownBits::sadd_sat(val1, val2) for each pair
- For the same pair of abstract values, enumerate all possible concrete APInt values the pair could represent, then run APInt::sadd_sat for all possible pairs, and generate the most precise KnownBits value we can from the results
- If the two results are equal, then KnownBits::sadd_sat was as precise as it could be. Otherwise, if the other path yielded a more precise result, then KnownBits::sadd_sat was suboptimal
You can see my implementation of this here; it's only about 200 lines of code to reproduce the results.
The basic code simply runs this:
// Enumerate every abstract value at this bitwidth, then check every pair.
auto fullKnownBits = enumerateAllValues(bitwidth);
const auto allAbstractPairs = getAllPairs(fullKnownBits);
for (const auto& [bits1, bits2] : allAbstractPairs) {
    // Abstract path: LLVM's transfer function.
    auto llvmKnownResults = KnownBits::sadd_sat(bits1, bits2);
    // Concrete path: enumerate gamma(bits1) x gamma(bits2), apply
    // APInt::sadd_sat to every pair, then abstract the results back.
    const auto concreteVals1 = concretization(bits1);
    const auto concreteVals2 = concretization(bits2);
    const auto concreteResults = computeAllConcretePairs(concreteVals1, concreteVals2);
    const auto naiveKnownResults = abstraction(concreteResults);
    // Compare precision in both directions.
    bool isNaiveMorePrecise = morePrecise(naiveKnownResults, llvmKnownResults);
    bool isNaiveLessPrecise = morePrecise(llvmKnownResults, naiveKnownResults);
    ...
}
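The helpers do roughly what their names suggest. As a hypothetical sketch (not the exact code from my repo), concretization and abstraction could look like this:

#include "llvm/ADT/APInt.h"
#include "llvm/Support/KnownBits.h"
#include <vector>
using namespace llvm;

// gamma: enumerate every concrete value consistent with the known bits.
std::vector<APInt> concretization(const KnownBits &KB) {
  std::vector<APInt> Result;
  unsigned BW = KB.getBitWidth();
  for (uint64_t V = 0; V < (1ULL << BW); ++V) {
    APInt Val(BW, V);
    // Keep Val only if it is 1 everywhere One demands
    // and 0 everywhere Zero demands.
    if ((Val & KB.One) == KB.One && (Val & KB.Zero) == 0)
      Result.push_back(Val);
  }
  return Result;
}

// alpha: the most precise KnownBits covering every value in the set.
KnownBits abstraction(const std::vector<APInt> &Vals) {
  unsigned BW = Vals.front().getBitWidth();  // assumes a nonempty set
  KnownBits KB(BW);
  KB.One = APInt::getAllOnes(BW);
  KB.Zero = APInt::getAllOnes(BW);
  for (const APInt &V : Vals) {
    KB.One &= V;    // a bit stays known-one only if set in every value
    KB.Zero &= ~V;  // a bit stays known-zero only if clear in every value
  }
  return KB;
}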
Results
I ran an evaluation for a bitwidth of 7. This yielded:
- Time taken: 63.91 seconds
- Total abstract values in this domain: 2,187
- Total pairs of abstract values in this domain: 2,392,578
- Total pairs of concrete values evaluated in this domain: 134,357,696
- Naive approach was more precise: 928,522 times
- Naive approach was less precise: 0 times
- Naive approach was equally precise: 1,464,056 times
- Naive approach was incomparable: 0 times
As you can see, this method of brute-forcing is terribly slow, and its cost grows exponentially with the bitwidth, so the time to evaluate quickly blows up. However, you can also see that LLVM is leaving precision on the table in roughly 38% of cases (928,522 of 2,392,578 pairs).
Could LLVM do better? Probably, but this is an area of diminishing returns. LLVM definitely won't have time to enumerate every possibility, so the compiler needs to find clever ways to do this in a time complexity likely no worse than polynomial with respect to the number of bits. Still, this shows that there is room for improvement in LLVM's transfer functions, and I think this is a nice unifying result that combines lattices, optimality, and the ideas of concretization and abstraction into a nice, cohesive bundle.
Please feel free to leave any thoughts below!