Beware Of findFirst() And findAny()
After filtering a Java 8 Stream
it is common to use findFirst()
or findAny()
to get the element that survived the filter.
But that might not do what you really meant and subtle bugs can ensue.
What’s Wrong With Them?
As we can see from their Javadoc both methods return an arbitrary element from the stream—unless the stream has an encounter order, in which case findFirst()
returns the first element.
Easy.
A simple example looks like this:
public Optional<Customer> findCustomer(String customerId) {
return customers.stream()
.filter(customer -> customer.getId().equals(customerId))
.findFirst();
}
Of course this is just the fancy version of the good old for-each-loop:
public Optional<Customer> findCustomer(String customerId) {
for (Customer customer : customers)
if (customer.getId().equals(customerId))
return Optional.of(customer);
return Optional.empty();
}
But both variants contain the same potential bug: they are built on the implicit assumption that there can only be one customer with any given ID.
Now, this might be a very reasonable assumption. Maybe this is a known invariant, guarded by dedicated parts of the system, relied upon by others. In that case this is totally fine.
Often the code relies on a unique matching element but does nothing to assert this.
But in many cases I see out in the wild, it is not. Maybe the customers were just loaded from an external source that makes no guarantees about the uniqueness of their IDs. Maybe an existing bug allowed two books with the same ISBN. Maybe the search term allows surprisingly many unforeseen matches (did anyone say regular expressions?).
Often the code’s correctness relies on the assumption that there is a unique element matching the criteria but it does nothing to enforce or assert this.
Worse, the misbehavior is entirely data-driven, which might hide it during testing. Unless we have this scenario in mind, we might simply overlook it until it manifests in production.
Even worse, it fails silently! If the assumption that there is only one such element proves to be wrong, we won’t notice this directly. Instead the system will misbehave subtly for a while before the effects are observed and the cause can be identified.
So of course there is nothing inherently wrong with findFirst()
and findAny()
.
But it is easy to use them in a way that leads to bugs within the modeled domain logic.
Failing Fast
So let’s fix this! Say we’re pretty sure that there’s at most one matching element and we would like the code to fail fast if there isn’t. With a loop we have to manage some ugly state and it would look as follows:
public Optional<Customer> findOnlyCustomer(String customerId) {
boolean foundCustomer = false;
Customer resultCustomer = null;
for (Customer customer : customers)
if (customer.getId().equals(customerId))
if (!foundCustomer) {
foundCustomer = true;
resultCustomer = customer;
} else {
throw new DuplicateCustomerException();
}
return foundCustomer
? Optional.of(resultCustomer)
: Optional.empty();
}
Now, streams give us a much nicer way.
We can use the often neglected reduce
about which the documentation says:
Performs a reduction on the elements of this stream, using an associative accumulation function, and returns an Optional describing the reduced value, if any.
It even gives a code snippet to which reduce
is equivalent:
boolean foundAny = false;
T result = null;
for (T element : this stream) {
if (!foundAny) {
foundAny = true;
result = element;
}
else
result = accumulator.apply(result, element);
}
return foundAny ? Optional.of(result) : Optional.empty();
Doesn’t that look similar to our loop above?! Crazy coincidence…
So all we need is an accumulator that throws the desired exception as soon as it is called:
public Optional<Customer> findOnlyCustomerWithId_manualException(String customerId) {
return customers.stream()
.filter(customer -> customer.getId().equals(customerId))
.reduce((element, otherElement) -> {
throw new DuplicateCustomerException();
});
}
This looks a little strange but it does what we want. To make it more readable, we should put it into a Stream utility class and give it a nice name:
public static <T> BinaryOperator<T> toOnlyElement() {
return toOnlyElementThrowing(IllegalArgumentException::new);
}
public static <T, E extends RuntimeException> BinaryOperator<T>
toOnlyElementThrowing(Supplier<E> exception) {
return (element, otherElement) -> {
throw exception.get();
};
}
Now we can call it as follows:
// if a generic exception is fine
public Optional<Customer> findOnlyCustomer(String customerId) {
return customers.stream()
.filter(customer -> customer.getId().equals(customerId))
.reduce(toOnlyElement());
}
// if we want a specific exception
public Optional<Customer> findOnlyCustomer(String customerId) {
return customers.stream()
.filter(customer -> customer.getId().equals(customerId))
.reduce(toOnlyElementThrowing(
DuplicateCustomerException::new));
}
How is that for intention revealing code?
It should be noted that, unlike findFirst
and findAny
, this is of course no short-circuiting operation and will materialize the entire stream.
That is, if there is indeed only one element. The processing of course stops as soon as a second element is encountered.
Summary
We have seen how findFirst
and findAny
do not suffice to express the assumption that there is at most one element left in the stream.
If we want to express that assumption and make sure the code fails fast if it is violated, we need to reduce(toOnlyElement())
.
You can find the code on GitHub and use it as you like—it is in the public domain.
Thanks to Boris Terzic for making me aware of this intention mismatch in the first place. The title image Stand Out from The Crowd was published by Steven Depolo under CC-BY 2.0.