Stefan Mada

Languages, Compilers, and more!

Exploring Suboptimal Lattice Transfer Functions in LLVM with KnownBits

I was recently taking a look at LLVM's analysis passes, and to get a better understanding I read up on lattices and abstract domains. Lattices are the bread and butter of static program analysis, and are what allow passes such as constant propagation and live variable analysis to have well-understood properties. However, these lattices operate on an abstract domain rather than concrete values, and some precision is lost in the process. If you're not familiar with any of these concepts, I'll explain them briefly below! For more detailed information, see the Resources section. Most of the information below is from Static Program Analysis by Anders Møller and Michael I. Schwartzbach, but the later LLVM details are my own.

What is a Lattice?

A lattice is a partially ordered set with certain properties that make it possible for transfer functions to run to a fixed point, and for that fixed point to be the same for a given input. More concretely, let's take a simple analysis such as sign analysis of variables. When we concretely execute a program, values can be arbitrary integers.

In contrast, for our analysis, we consider an abstraction of the integer values by grouping them into three categories, or abstract values: positive ($+$), negative ($-$), and zero ($0$).

We should also be able to handle uncertain information when we don't know the sign of a value, so we need to add a special abstract value $\top$.

For this domain, we must decide what kind of information we are interested in. Here, we only want definite information, so we should only report $+$ for an expression if we are sure it is positive.

For completeness, it's useful to have a value $\bot$ for expressions whose values are not numbers, such as when the expression is a pointer or is unreachable.

So for instance, if a = 10 and b = 20, then c = input ? a - b : a + b would yield $a, b \in +$ and $c \in \top$, as we are not sure of the sign of c.

This abstract domain, consisting of the five abstract values $\{+,-,0,\top,\bot\}$, can be arranged with the least precise information at the top and the most precise at the bottom, yielding a lattice:

[Figure: the simple sign analysis lattice]

This ordering reflects that $\bot$ represents the empty set of integers while $\top$ represents the set of all integers. It also shows that any element lower in the diagram represents a subset of the integers represented by the elements above it in the ordering.

The main use of a lattice is this: given a set of transfer functions between abstract values (one such transfer function would say that $a \in + \land b \in 0 \implies (a+b) \in +$, i.e., a positive value plus a zero value yields a positive value), and provided these transfer functions are monotone (more precise inputs do not yield less precise outputs), repeatedly applying all of the transfer functions reaches a unique least fixed point for the final set of states.

This is a powerful result, as it tells us that equation systems over complete lattices always have solutions and that a unique most precise solution exists, so the order in which we apply the transfer functions does not matter.
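To make this less abstract, here is a minimal C++ sketch of the sign domain (the names here are my own for illustration, not from any real analysis framework), with a join (least upper bound) used when information from different paths merges, and a transfer function for addition:

#include <cassert>

// The five abstract values of the sign domain.
enum class Sign { Bot, Neg, Zero, Pos, Top };

// Least upper bound (join) of two abstract values on the lattice.
Sign join(Sign a, Sign b) {
  if (a == Sign::Bot) return b;
  if (b == Sign::Bot) return a;
  if (a == b) return a;
  return Sign::Top; // distinct non-bottom values only meet at the top
}

// Transfer function for '+': given the signs of the operands,
// return the most precise sign we can claim for the result.
Sign addTransfer(Sign a, Sign b) {
  if (a == Sign::Bot || b == Sign::Bot) return Sign::Bot;
  if (a == Sign::Zero) return b;           // 0 + x has the sign of x
  if (b == Sign::Zero) return a;
  if (a == b && a != Sign::Top) return a;  // pos + pos is pos, neg + neg is neg
  return Sign::Top;                        // pos + neg (or anything with Top) is unknown
}

int main() {
  assert(addTransfer(Sign::Pos, Sign::Zero) == Sign::Pos); // the rule from the text
  assert(addTransfer(Sign::Pos, Sign::Neg) == Sign::Top);  // sign of the sum is unknown
}

Both functions are monotone, which is what lets the fixed-point iteration described above settle on a unique least fixed point.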

These lattices are the basis of analysis passes in any compiler, and help set the stage for a look at one of LLVM's analyses: Known Bits. To learn more about these lattices, you can read Chapter 4 of Static Program Analysis.

LLVM Known Bits

That was a lot of formal groundwork, so let's take a look at how LLVM actually implements these ideas in its Known Bits analysis.

Known Bits analysis tracks whether a given bit of a variable is a one, a zero, or unknown (so technically speaking, this is a semi-lattice, as LLVM has no way of representing $\bot$ here, but most of the nice properties from above still hold).

To store this abstract domain in code, LLVM has this struct:

struct KnownBits {
  APInt Zero;
  APInt One;

  ...
};

An APInt in LLVM is an arbitrary precision integer, so it can be any number of bits long. As a simple example, let's say we know for certain that the bits of a variable are 0110. In this case, the KnownBits analysis would yield this:

Zero = 1001b;
One = 0110b;

This means that we are certain bit positions 0 and 3 hold a zero, which is why bits 0 and 3 are set in Zero, and similarly for One. If we are unsure whether a bit is a zero or a one, then both Zero and One will have a 0 in that bit position.
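As a quick illustration, here's a minimal sketch of building that abstract value by hand, assuming the current LLVM API where Zero and One are public APInt members:

#include "llvm/ADT/APInt.h"
#include "llvm/Support/KnownBits.h"

using namespace llvm;

// A 4-bit value whose bits are known to be exactly 0110.
KnownBits knownBitsFor0110() {
  KnownBits KB(4);
  KB.One = APInt(4, 0b0110);  // bits 1 and 2 are known ones
  KB.Zero = APInt(4, 0b1001); // bits 0 and 3 are known zeros
  // Every bit is known, so this abstract value represents the single
  // constant 6; KB.isConstant() would return true here.
  return KB;
}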

So that's how LLVM stores this abstract domain! When this analysis is run for real, it starts with known information (such as constants, whose bits are all known) and applies specific transfer functions based on what operation was performed. For instance, the following C++ code compiles to the LLVM IR below it:

int src(int x) {
  const int val = 7;
  x <<= 2;

  return val + x;
}

define i32 @src(i32 noundef %0) {
  %2 = shl i32 %0, 2
  %3 = add nsw i32 %2, 7
  ret i32 %3
}

For the LLVM IR, we have fully known bits for 2 and 7 as they are constants. The input argument %0 is completely unknown, so Zero and One would be fully zeroed out. However, we can then apply transfer functions to get the values of the variables %2 and %3. LLVM has implementations of transfer functions for shl and add between two variables, and the KnownBits of %2 and %3 would be computed from these functions.
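Working those transfer functions through by hand (writing only the low bits of the 32-bit values, in the same Zero/One notation as above), we would expect roughly:

%0: Zero = 0...000b; One = 0...000b   (nothing is known about the argument)
%2: Zero = 0...011b; One = 0...000b   (the shl by 2 makes the two low bits known zeros)
%3: Zero = 0...000b; One = 0...011b   (adding 7 = 0...0111b turns those into known ones; the bits above stay unknown, since %2's bits there are unknown)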

You can look at the transfer functions in KnownBits.cpp to see exactly how they are implemented.

Now, let's move on to what's suboptimal about these!

Optimality

We have been talking almost exclusively about the abstract domain above, but optimality takes concrete values into account. In essence, optimality for known bits says: take any abstract known bits value x, enumerate every possible concrete integer x represents, apply the underlying function (such as a real shl) to each of them, and turn the set of results back into an abstract known bits value; an optimal transfer function yields exactly this result when run directly on the original x.

Basically, if we were to brute force every possible value by exiting the abstract domain, doing the computation, and getting the most precise representation of all the possible values, then that gives the same result as just running the abstract transfer function.

This would be a wonderful feature to have for your transfer functions, as it means that your analysis is as precise as it could possibly be, with no room for improvement.

I'll go more into concretization and abstraction functions below to get a better sense of how we can show that LLVM's transfer functions are not optimal.

Concretization and Abstraction

If we go back to the sign analysis example, we can more closely understand the relationship between concrete values, abstract values, and switching between them.

The abstract values are as we already know: $\{+,-,0,\top,\bot\}$

The sets of concrete values are every possible set of integer values, such as $\{1\}$, $\{-421, 21\}$, or $\emptyset$. This is just the powerset of the integers: $\mathcal{P}(\mathbb{Z})$

Now, an abstraction function takes in a set of concrete values and gives the most precise abstract value it can. So for $D \in \mathcal{P}(\mathbb{Z})$, $\alpha(D)$ would be the abstraction function. If $D$ is nonempty and contains only positive integers, then $\alpha(D) = +$; if $D$ only contains 0, then $\alpha(D) = 0$; and so on.

This allows us to take concrete values into the abstract domain, but we can also go the other way with a concretization function.

For instance, if we have $s \in \{+,-,0,\top,\bot\}$, $\gamma(s)$ would be the concretization function. If $s = +$, then $\gamma(s) = \{1, 2, 3, ...\}$, and if $s = \top$, then $\gamma(s) = \mathbb{Z}$. This gives us the broadest set of concrete values that an abstract value represents.
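For example, $\alpha(\{3, 7\}) = +$ while $\alpha(\{-2, 5\}) = \top$, and round-tripping only ever loses precision, never soundness: $\gamma(\alpha(\{3, 7\})) = \{1, 2, 3, ...\} \supseteq \{3, 7\}$.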

Optimality: More Precisely

Now that we have the above concepts, we can define optimality with more precision.

Let $\alpha: L_1 \rightarrow L_2$ be an abstraction function, where $L_1$ is the concrete set of values and $L_2$ is the abstract domain, and let $\gamma: L_2 \rightarrow L_1$ be the concretization function. Now consider two functions $cf: L_1 \rightarrow L_1$ and $af: L_2 \rightarrow L_2$ that represent the concrete operation and the abstract transfer function respectively. We say that $af$ is the optimal abstraction of $cf$ if

$\forall b \in L_2: af(b) = \alpha(cf(\gamma(b)))$

For instance, for the analysis we will be doing in LLVM, we have $L_1$ = APInt, $L_2$ = KnownBits, $af$ = abstract saturated signed addition, and $cf$ = concrete saturated signed addition.
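To make the definition concrete in the sign domain, take concrete negation $cf(x) = -x$ and an abstract negation $af$ with $af(+) = -$, $af(-) = +$, $af(0) = 0$, $af(\top) = \top$, and $af(\bot) = \bot$. Then, for example, $\alpha(cf(\gamma(+))) = \alpha(\{-1, -2, -3, ...\}) = -$, which equals $af(+)$, and the same check goes through for the other four abstract values, so this $af$ is the optimal abstraction of negation. The saturated addition we look at next is a binary operation, but the same idea extends to it by concretizing both arguments and considering every pair.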

You can read more about this in Chapter 12 of Static Program Analysis.

Suboptimal LLVM Saturated Signed Addition Function

Now that all the groundwork is laid, it's not too hard to show that LLVM's saturated signed addition transfer function in the known bits domain is suboptimal.

We basically just need to run the two sides of the equality above in code and see when they match. That is:

  1. Get all possible pairs of abstract values, then run KnownBits::sadd_sat(val1, val2) for all pairs
  2. For the same pair of abstract values, enumerate all possible concrete APInt values that the values could be, then run APInt::sadd_sat for all possible pairs, and generate the most precise KnownBits value we can from the result
  3. If they are equal, then KnownBits::sadd_sat was as precise as it could be. Otherwise, if the other path yielded a more precise result, then KnownBits::sadd_sat was suboptimal

You can see my implementation of this here; it's only about 200 lines of code to get these results.

The basic code simply runs this:

auto fullKnownBits = enumerateAllValues(bitwidth);
const auto allAbstractPairs = getAllPairs(fullKnownBits);

for(const auto& [bits1, bits2] : allAbstractPairs) {
  auto llvmKnownResults = KnownBits::sadd_sat(bits1, bits2);

  const auto concreteVals1 = concretization(bits1);
  const auto concreteVals2 = concretization(bits2);
  const auto concreteResults = computeAllConcretePairs(concreteVals1, concreteVals2);
  const auto naiveKnownResults = abstraction(concreteResults);

  bool isNaiveMorePrecise = morePrecise(naiveKnownResults, llvmKnownResults);
  bool isNaiveLessPrecise = morePrecise(llvmKnownResults, naiveKnownResults);
  ...
}
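The concretization and abstraction helpers do the heavy lifting. Here's a rough sketch of what they might look like (hypothetical code, not the exact helpers from the linked implementation, and assuming a recent LLVM APInt/KnownBits API):

#include "llvm/ADT/APInt.h"
#include "llvm/Support/KnownBits.h"
#include <cstdint>
#include <vector>

using namespace llvm;

// Concretization: enumerate every APInt consistent with an abstract value.
// (Only practical for the small bitwidths we brute-force.)
std::vector<APInt> concretization(const KnownBits &KB) {
  std::vector<APInt> Values;
  unsigned BW = KB.getBitWidth();
  for (uint64_t V = 0; V < (uint64_t(1) << BW); ++V) {
    APInt Candidate(BW, V);
    // Keep the candidate only if it is 0 in every known-zero position
    // and 1 in every known-one position.
    if ((Candidate & KB.Zero).isZero() && (Candidate & KB.One) == KB.One)
      Values.push_back(Candidate);
  }
  return Values;
}

// Abstraction: the most precise KnownBits covering a set of concrete values.
KnownBits abstraction(const std::vector<APInt> &Values) {
  unsigned BW = Values.front().getBitWidth();
  KnownBits KB(BW);
  KB.Zero = APInt::getAllOnes(BW);
  KB.One = APInt::getAllOnes(BW);
  for (const APInt &V : Values) {
    KB.Zero &= ~V; // a bit stays "known zero" only if it is 0 in every value
    KB.One &= V;   // a bit stays "known one" only if it is 1 in every value
  }
  return KB;
}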

Results

I ran an evaluation for a bitwidth of 7.

This method of brute-forcing is terribly slow: the cost grows exponentially in the bitwidth on two fronts (the number of abstract pairs, and the number of concrete values each pair represents), so the time to evaluate quickly blows up. Even so, the results show that LLVM is leaving precision on the table in 38% of cases.

Could LLVM do better? Probably, but this is an area of diminishing returns. LLVM definitely doesn't have time to enumerate possibilities at compile time, so it needs clever ways to compute this in a time complexity likely no worse than $O(n)$ with respect to the number of bits. Still, this shows that there is room for improvement in LLVM's transfer functions, and I think it's a nice unifying result that ties lattices, optimality, and the ideas of concretization and abstraction into one cohesive bundle.

Please feel free to leave any thoughts below!

Resources

Static Program Analysis

My Optimality Checker Implementation

KnownBits.h

KnownBits.cpp shl

ValueTracking.cpp (more complex transfer functions)