Debugging Deadlocks Using Java Synchronization Aids
This article explains how to isolate and test concurrent operations that lead to a deadlock and demonstrates how CountDownLatch and CyclicBarrier can help achieve this.
One of the most famous deadlocks is the one encountered in the well-known problem of the ‘dining philosophers.’ Briefly, ‘n’ philosophers sit at a round table, aiming to enjoy some Chinese food.
On the table there are ‘n’ chopsticks, one between every two philosophers. As the venue is a pleasant and productive one, they are not only eating but also thinking, alternating between the two activities. In order to eat, each philosopher first needs to acquire the two adjacent chopsticks, eat, then put them back on the table and return to thinking. Without going into further details, one can easily observe that in the situation where each philosopher grabs the chopstick to his right and then waits for the one on the left without releasing the former, the deadlock appears.
Moving to threads, the simplest deadlock example is when one thread holds a lock forever while the others, aiming to acquire the same lock, block waiting. Put more simply, for two threads: if the former holds lock A and aims for lock B, while the latter holds lock B and aims for lock A, both will wait forever.
Similarly, at the database level, deadlocks appear when two or more concurrent transactions are blocked, each waiting for the others to release a resource they need.
Abstract
A while ago, I was confronted with a database deadlock that unfortunately appeared in one of our production environments. The problem was identified and successfully fixed. Yet, what kept my attention during this investigation was the attempt to reproduce the issue in an isolated manner by trying to guarantee the concurrent execution of the deadlocking actions.
In this article, first a “straightforward” deadlock is fabricated, then the focus is set on how to execute two actions “as simultaneously as possible” by leveraging two Java synchronization aids: CyclicBarrier and CountDownLatch.
Set-Up
The proof of concept is built with:
- Java 21
- PostgreSQL 13
- PostgreSQL Driver version 42.7.5.
For simplicity, Spring Boot version 3.4.4, Flyway 10.20.1, and Apache Maven 3.9.9 are also used.
The Deadlock
Let Entity1 and Entity2 be two simple entities. They are similar and described only by a unique identifier (the primary key) and a text.
@Entity
@Table(name = "entity1")
public class Entity1 {

    @Id
    @Column(name = "id")
    private Long id;

    @Column(name = "text", nullable = false)
    private String text;

    ...
}

@Entity
@Table(name = "entity2")
public class Entity2 {

    @Id
    @Column(name = "id")
    private Long id;

    @Column(name = "text", nullable = false)
    private String text;

    ...
}
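The '...' above stands for the usual boilerplate, which is not part of the original listings. A plausible completion for Entity1 (Entity2 being analogous), assuming the no-args constructor required by JPA, an id-only convenience constructor used later in the tests, and a default value for the mandatory text column:

    // Hypothetical completion of the elided members of Entity1.
    protected Entity1() {
        // no-args constructor required by JPA
    }

    public Entity1(Long id) {
        this.id = id;
        this.text = "initial"; // assumed default, as the column is declared NOT NULL
    }

    public Long getId() {
        return id;
    }

    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }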
EntityProcessor is a component that defines two operations, each running in its own transaction but involving the same two entities, one of type Entity1 and one of type Entity2. Obviously, it is assumed the two entities already exist in the database.
1. process1():
- begins the transaction
- reads Entity1 from the database
- modifies its text
- reads Entity2 from the database
- modifies its text
- commits the transaction
2. process2():
- begins the transaction
- reads Entity2 from the database
- modifies its text
- reads Entity1 from the database
- modifies its text
- commits the transaction
@Service
public class EntityProcessor {

    private final Entity1Repository entity1Repo;
    private final Entity2Repository entity2Repo;

    public EntityProcessor(Entity1Repository entity1Repo, Entity2Repository entity2Repo) {
        this.entity1Repo = entity1Repo;
        this.entity2Repo = entity2Repo;
    }

    @Transactional
    public void process1(long entity1Id, long entity2Id) {
        final int index = 1;
        processEntity1(index, entity1Id);
        processEntity2(index, entity2Id);
    }

    @Transactional
    public void process2(long entity1Id, long entity2Id) {
        final int index = 2;
        processEntity2(index, entity2Id);
        processEntity1(index, entity1Id);
    }

    private void processEntity1(int index, long entityId) {
        Entity1 entity1 = entity1Repo.findById(entityId)
                .orElseThrow(() -> new RuntimeException("Entity1 not found"));
        entity1.setText("Set by process " + index);
    }

    private void processEntity2(int index, long entityId) {
        Entity2 entity2 = entity2Repo.findById(entityId)
                .orElseThrow(() -> new RuntimeException("Entity2 not found"));
        entity2.setText("Set by process " + index);
    }
}
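Entity1Repository and Entity2Repository are not listed in the article; presumably they are plain Spring Data JPA repositories, along the lines of the sketch below.

import org.springframework.data.jpa.repository.JpaRepository;

// Assumed repository definitions (each in its own file in the actual POC).
public interface Entity1Repository extends JpaRepository<Entity1, Long> {
}

public interface Entity2Repository extends JpaRepository<Entity2, Long> {
}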
As part of this POC, the “business” goal is to modify the text of the two entities twice, irrespective of their order, but in parallel. The above operations (transactions) lead to a deadlock if executed concurrently due to the opposite lock sequences. To resolve the problem, or at least to significantly reduce the risk of blocking, both should lock the two entities in the same order.
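For completeness, a minimal sketch of what that reordering could look like (the deadlocking version above is, of course, the one kept for the reproduction): process2() simply touches the entities in the same sequence as process1().

    @Transactional
    public void process2(long entity1Id, long entity2Id) {
        final int index = 2;

        // Same lock order as process1(): Entity1 first, then Entity2.
        processEntity1(index, entity1Id);
        processEntity2(index, entity2Id);
    }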
When the original process1() and process2() are run in parallel, the exception below is obtained, highlighting the deadlock.
[virtual-61] DEBUG SqlExceptionHelper#could not execute statement [update entity1 set text=? where id=?]
org.postgresql.util.PSQLException: ERROR: deadlock detected
Detail: Process 36964 waits for ShareLock on transaction 29998; blocked by process 48636.
Process 48636 waits for ShareLock on transaction 29999; blocked by process 36964.
Hint: See server log for query details.
Where: while updating tuple (0,7) in relation "entity1"
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2733)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2420)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:372)
...
at com.hcd.deadlock.service.EntityProcessor$$SpringCGLIB$$0.process2(<generated>)
The stack trace clearly outlines the issue.
The Parallel Execution
The deadlock was introduced in the previous section. The main goal of this article is to create the proper setup for guaranteeing the actual concurrent execution of the two transactions.
For this experiment, two Java synchronization aids are used: CyclicBarrier and CountDownLatch. A unit test is created for each, with the following common setup.
@SpringBootTest
@Rollback(false)
class Test {

    @Autowired
    private EntityProcessor entityProcessor;

    @Autowired
    private Entity1Repository entity1Repo;

    @Autowired
    private Entity2Repository entity2Repo;

    private Entity1 entity1;
    private Entity2 entity2;

    @BeforeEach
    void setUp() {
        entity1 = entity1Repo.save(new Entity1(1L));
        entity2 = entity2Repo.save(new Entity2(2L));
    }

    @AfterEach
    void tearDown() {
        entity1Repo.delete(entity1);
        entity2Repo.delete(entity2);
    }
}
First, Entity1 and Entity2 are created, then EntityProcessor#process1() and EntityProcessor#process2() are executed concurrently. Ultimately, the clean-up is done and the two entities are deleted. Thus, the state of the database remains unaltered; these tests are deliberately not transactional, in order to simulate real use cases as closely as possible.
Aid 1: The CountDownLatch
A CountDownLatch is “a synchronization aid that allows one or more threads to wait until a set of operations being performed in other threads completes” [1].
The test prepares and schedules two threads for executing EntityProcessor#process1() and EntityProcessor#process2() respectively, and constructs a CountDownLatch. The CountDownLatch is initialized with a count of 1 and thus acts as a toggle, a simple gate. The behavior of the two threads is similar and described by the ProcessTask Runnable: they first invoke latch.await() and consequently block in a waiting state until “the gate” is opened by the main thread, which invokes latch.countDown(), bringing the count to zero and starting the actual parallel processing.
@Test
void run() {
    CountDownLatch latch = new CountDownLatch(1);

    try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
        Future<?> future1 = exec.submit(new ProcessTask(latch,
                () -> entityProcessor.process1(entity1.getId(), entity2.getId())));
        Future<?> future2 = exec.submit(new ProcessTask(latch,
                () -> entityProcessor.process2(entity1.getId(), entity2.getId())));

        latch.countDown();

        future1.get();
        future2.get();
    } catch (ExecutionException | InterruptedException e) {
        throw new RuntimeException(e);
    }

    log.info("All processors completed.");
}

private record ProcessTask(CountDownLatch latch, Runnable runnable) implements Runnable {

    @Override
    public void run() {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }

        runnable.run();
    }
}
Aid 2: The CyclicBarrier
A CyclicBarrier “allows a set of threads to all wait for each other to reach a common barrier point” [2], in this case the start of the actual processing, i.e. the execution of EntityProcessor#process1() and EntityProcessor#process2().
Similarly to the previous test, this one prepares and schedules the two threads and constructs a CyclicBarrier with three parties (the two tasks and the main thread), then calls barrier.await() for the first time. The behavior of the threads is similar to the former and described by the ProcessTask Runnable: they first invoke barrier.await() as well (calls number two and three). Once all three parties have called barrier.await(), they all proceed together and start the actual parallel processing.
@Test
void run() {
    CyclicBarrier barrier = new CyclicBarrier(3);

    try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
        Future<?> future1 = exec.submit(new ProcessTask(barrier,
                () -> entityProcessor.process1(entity1.getId(), entity2.getId())));
        Future<?> future2 = exec.submit(new ProcessTask(barrier,
                () -> entityProcessor.process2(entity1.getId(), entity2.getId())));

        barrier.await();

        future1.get();
        future2.get();
    } catch (ExecutionException | InterruptedException | BrokenBarrierException e) {
        throw new RuntimeException(e);
    }

    log.info("All processors completed.");
}

private record ProcessTask(CyclicBarrier barrier, Runnable runnable) implements Runnable {

    @Override
    public void run() {
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            log.error("Could not await().", e);
        }

        runnable.run();
    }
}
Takeaways
In the case of the CountDownLatch, the processing threads run independently and in parallel, but start the actual processing (runnable.run()) only when released by the main thread, which decrements the latch to zero.
When using the CyclicBarrier, the situation is very similar: each of the parties waits at the barrier, and the last one to arrive releases them all.
Unlike a CountDownLatch, which is one-time use only, a CyclicBarrier may be reused multiple times, making it particularly useful when there are multiple phases in which a set of threads must wait for each other and synchronize repeatedly.
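For illustration, a small, self-contained sketch (independent of the POC) in which the same barrier instance synchronizes two workers over several consecutive phases; the phase count and the barrier action are arbitrary choices made for the example:

import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class BarrierReuse {

    public static void main(String[] args) {
        int workers = 2;
        int phases = 3;

        // The barrier is reused: it resets automatically each time all parties arrive.
        CyclicBarrier barrier = new CyclicBarrier(workers,
                () -> System.out.println("--- all workers reached the barrier ---"));

        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, workers).forEach(worker -> exec.submit(() -> {
                for (int phase = 1; phase <= phases; phase++) {
                    System.out.println("Worker " + worker + " working on phase " + phase);
                    try {
                        barrier.await(); // wait for the other worker before moving to the next phase
                    } catch (InterruptedException | BrokenBarrierException e) {
                        throw new RuntimeException(e);
                    }
                }
            }));
        }
    }
}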
This article presents two ways of testing deadlock situations in isolation, which might prove quite useful when such issues are detected.
When executing either of the above tests, the deadlock appears and the corresponding exception is thrown; a sketch of how this could be asserted explicitly follows below. Although the purpose here is not to describe how to resolve deadlocks in general, it is worth reiterating that acquiring concurrent locks in a consistent order resolves such problems or at least significantly reduces the risk of deadlocking.
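Should one want the test to assert the reproduction rather than let the rethrown exception fail it, the try/catch around the future.get() calls could be replaced with a check on the outcome of each task. Below is a sketch only, reusing the CountDownLatch variant and its ProcessTask, and assuming JUnit 5 assertions plus the PostgreSQL error message shown earlier; which transaction is chosen as the victim is up to the database, so only the number of failures and the root cause message are verified.

// Sketch: assumes static imports of org.junit.jupiter.api.Assertions.assertEquals/assertTrue
// and that java.util.List and java.util.ArrayList are imported in the test class.
@Test
void deadlockIsDetected() throws InterruptedException {
    CountDownLatch latch = new CountDownLatch(1);
    List<Throwable> failures = new ArrayList<>();

    try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
        Future<?> future1 = exec.submit(new ProcessTask(latch,
                () -> entityProcessor.process1(entity1.getId(), entity2.getId())));
        Future<?> future2 = exec.submit(new ProcessTask(latch,
                () -> entityProcessor.process2(entity1.getId(), entity2.getId())));

        latch.countDown();

        for (Future<?> future : List.of(future1, future2)) {
            try {
                future.get();
            } catch (ExecutionException e) {
                failures.add(e.getCause());
            }
        }
    }

    // PostgreSQL is expected to abort exactly one of the two transactions as the deadlock victim.
    assertEquals(1, failures.size());

    Throwable rootCause = failures.getFirst();
    while (rootCause.getCause() != null) {
        rootCause = rootCause.getCause();
    }
    assertTrue(rootCause.getMessage().contains("deadlock detected"));
}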
Resources
- [1], [2] – Java 21 Reference Documentation
- [3] – POC Source Code