Hash Collisions in Java
Recently I read an interesting article about hash collisions in Haskell’s Aeson library (used to parse JSON). Aeson uses HashMap
to deal with JSON Objects. Using many colliding keys leads to a DoS. Generating colliding keys in Java is quite easy. How does Java mitigate this kind of problem? And why could I break our program anyway?
While trying to reproduce the behaviour we discovered JEP 180, why it did not work for us, and how to make it work.
Breaking our Program: Generating Collisions
We are using openjdk 11.0.12 2021-07-20. All source code snippets are taken from this tar ball.
JDK 11 uses two implementations to calculate hashCode
for String
s. These are located in StringUTF16.java
and StringLatin1.java
.
We are looking at the latter:
// jdk-11.0.12-ga/src/java.base/share/classes/java/lang/StringLatin1.java:193
public static int hashCode(byte[] value) {
int h = 0;
for (byte v : value) {
h = 31 * h + (v & 0xff);
}
return h;
}
Let’s start by generating some colliding String
s. I’m using Go here (fast startup times). Let’s try brute forcing with a length of 3.
First let’s implement Java’s hashCode
:
func javaLatin1Hash(s string) int32 {
var result int32
for i := 0; i < len(s); i++ {
result = 31*result + int32((s[i] & 0xff))
}
return result
}
Now we generate all permutations of String
s with length 3 given an alphabet (ASCII only):
const (
n = 3.0
alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
)
func genStrings() []string {
wordsLen := int(math.Pow(float64(len(alphabet)), n))
words := make([]string, 0, wordsLen)
for i := 0; i < len(alphabet); i++ {
for j := 0; j < len(alphabet); j++ {
for k := 0; k < len(alphabet); k++ {
word := []byte{alphabet[i], alphabet[j], alphabet[k]}
words = append(words, string(word))
}
}
}
return words
}
Calculate the hash for each String
and find a hash with the maximum number of collisions:
func maxCollisions(words []string) []string {
m := make(map[int32][]string)
for _, w := range words {
h := javaLatin1Hash(w)
m[h] = append(m[h], w)
}
max := 0
var maxK int32
for k, v := range m {
l := len(v)
if l > max {
max = l
maxK = k
}
}
return m[maxK]
}
Executing this gives us some colliding String
s:
[1ot 1pU 1q6 2Pt 2QU 2R6 31t 32U 336]
Sadly these are far too few to DoS a HashMap
. My first solution would generate all String
s with a length of 1..n
and n
could be adjusted. This works for n
up to 4. When n
was increased to 5 my machine ran out of memory (64GB). We need another solution, so let’s look at the hash function again:
// jdk-11.0.12-ga/src/java.base/share/classes/java/lang/StringLatin1.java:193
public static int hashCode(byte[] value) {
int h = 0;
for (byte v : value) {
h = 31 * h + (v & 0xff);
}
return h;
}
We can see that the hash of a String
s, consisting of two substrings s1 and s2 is just hash(s)=31^(len(s2)) * hash(s1) + hash(s2)
. So if we concatenate two String
s with the same hash and same length we get the same result, regardless of order.
Let’s look at an example (31^2
means 31 squared):
"ABC".hashCode() == 31 * 31 * 'A' + 31 * 'B' + 'C'
"DE".hashCode() == 31 * 'D' + 'E'
"ABCDE".hashCode() == 31^4 * 'A' + 31^3 * 'B' + 31^2 * 'C' + 31 * 'D' + 'E'
== 31^2 * (31^2 * 'A' + 31 * 'B' + 'C') + (31 * 'D' + 'E')
== 31^len("DE") * "ABC".hashCode() + "DE".hashCode()
This allows us to generate more colliding String
s by just concatenating what we have. We can do this in a loop until we reach some minimum number of colliding String
s:
func genCollisions(minCollisions int) []string {
strs := genStrings()
someCollisions := maxCollisions(strs)
moreCollisions := make([]string, len(someCollisions))
copy(moreCollisions, someCollisions)
for len(moreCollisions) < minCollisions {
for _, s1 := range moreCollisions {
for _, s2 := range someCollisions {
moreCollisions = append(moreCollisions, s1+s2)
}
}
}
return moreCollisions
}
Now that we can generate many colliding String
s, let’s see what happens when we insert them into a HashMap
.
I used these keys to attack an endpoint of our software Cadenza. The attack was successful: all CPU cores at 100%, no termination in sight. Our program is broken.
Comparing a Minimal Proof of Concept to our Application
I started writing this blog post and told some of my friends about my findings. This piqued the interest of one friend in particular: Tom Ganz. He tried to reproduce this against just a HashMap<String, String>
and couldn’t.
Why is our application vulnerable?
The minimal proof of concept that Tom created boils down to:
Map<String, String> m = new HashMap<>();
for (var c : generateCollisions()) {
m.put(c, null);
}
This terminates even with 16e6 keys in a short amount of time. So what’s the difference to our application?
The endpoint under attack basically did something very similar but different. Instead of a String
key a small wrapper class around a String
was used (for improved type safety):
class FooName {
private final String name;
FooName(String name) { this.name = name; }
@Override
public boolean equals(Object o) {
// do the type equality thing
return name.equals(o.name);
}
@Override
public int hashCode() {
return name.hashCode();
}
}
Our first hunch was that the custom class plays a role, so Tom adjusted his proof of concept using such a wrapper class. The results were the same as with our application: inserting a large amount of colliding keys led to a basically unusable program.
So why are HashMap
s with String
keys not susceptible to collisions but wrapper classes are?
JEP 180 - Handle Frequent HashMap Collisions with Balanced Trees
The answer is quite simple: JEP 180 changed the HashMap
implementation to use a balanced tree instead of a linked list when there are many colliding keys.
This changes the complexity of searching a key in a bucket from O(n) to O(log n). The tree variant used is a Red-Black-Tree.
In order for this to work the class used for the key has to implement Comparable
. Otherwise we don’t know how to build a tree from the keys.
By adding a Comparable
implementation to the class FooName
we can take advantage of JEP 180. Thus making our program resilient against hash collisions again:
class FooName implements Comparable<FooName> {
// same as before
@Override
public int compareTo(FooName o) {
return this.name.compareTo(o.name);
}
}
Conclusion: When you put user defined, unrestricted inputs into a HashMap
ensure that the key implements Comparable
.
The title image by Nomen4Omen depicting a Red Black Tree is published under the CC BY-SA 4.0 license and was used without modifications.