Everything About Transactions in Java: From ACID to Distributed Systems
Transactions are invisible when they work correctly and impossible to explain when they fail. A transaction bug in a payment system can make money disappear, appear twice, or leave an order and its payment in an inconsistent state without throwing a single exception. Most bugs come not from a missing @Transactional annotation, but from not understanding what that annotation actually does, when it stops working, and why.
This post explains transactions from first principles, not from the API. When you understand why databases need transactions, why ACID is designed the way it is, and why Spring chose a proxy-based approach, every pitfall becomes obvious rather than mysterious.
Part 1: Why Transactions Exist
The root problem: concurrent access and partial failure
Start with the simplest case: transferring money between two accounts.
-- Transfer 500,000 from account A to account B
UPDATE accounts SET balance = balance - 500000 WHERE id = 'A';
UPDATE accounts SET balance = balance + 500000 WHERE id = 'B';
Two separate SQL statements. What happens if:
- The server crashes after the first statement? Account A loses 500,000 and account B receives nothing.
- The second statement fails due to a constraint violation? Same outcome.
- Two users simultaneously read A’s balance, both see enough funds, and both transfer? A is debited twice.
User 1: reads A.balance = 1,000,000 |
|
User 2: reads A.balance = 1,000,000 |
|
User 1: writes A.balance = 500,000 | <- deducts 500k
|
User 2: writes A.balance = 500,000 | <- deducts another 500k but read from 1,000,000!
|
Actual: A.balance = 500,000 | (should be 0 if both transfers were valid)
or A.balance = -500,000 | (if no check exists)
This is a lost update: one of several race conditions that transactions solve.
Why “just execute SQL” is not enough
Before transactions existed (and in systems designed incorrectly today), application-level locking was the solution: set a flag in the database, check the flag before writing, clear it afterward. The problems:
- The flag is not atomic with the write: the check and the write are two separate operations, leaving a race condition window.
- Crashes are not handled: if the application crashes after setting the flag but before clearing it, the flag is stuck permanently.
- Composition is impossible: two independent code paths do not know they are sharing the same resource.
- No portability: every team implements locking differently.
Transactions solve all of this at the database layer, where there is enough information and control to do it correctly.
E-commerce order creation: real failure modes
// Without transaction -- real code from a production legacy system:
public void placeOrder(OrderRequest request) {
// Step 1: Create order
long orderId = orderDao.insert(request);
// Step 2: Deduct inventory
for (OrderItem item : request.getItems()) {
inventoryDao.deduct(item.getProductId(), item.getQuantity());
// If the third deduct fails (out of stock) -> order created, 2 items already deducted
// Database is in an inconsistent state
}
// Step 3: Create payment record
paymentDao.createPending(orderId, request.getTotalAmount());
// If this fails -> order created, inventory deducted, no payment record
}
Each failure point leaves the system in a different broken state. There is no way to know which state the system is in after a failure. This is why transactions exist: group multiple operations into a single unit where either everything succeeds or everything is rolled back to the initial state.
Part 2: Understanding ACID Correctly
Atomicity: all or nothing
Why it exists: Partial updates are inconsistent states with no automatic recovery path.
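How the database implements it: the Write-Ahead Log (WAL). Before a changed data page may be written to disk, PostgreSQL writes a log record describing the change to the WAL buffer, and the WAL is flushed to disk before the commit is acknowledged. PostgreSQL's WAL is redo-only: crash recovery replays it, and the changes of transactions that never committed are not physically undone. They stay invisible because their transaction ID is never marked committed, and VACUUM removes them later. On crash:
Transaction running -> crash -> restart
PostgreSQL replays WAL from the last checkpoint
If it finds no COMMIT record for a transaction -> the transaction counts as aborted and its changes are never visible
If it finds a COMMIT record -> redo any changes that had not yet reached the data files
WAL entries (conceptual; real WAL records describe physical page changes, not SQL):
[BEGIN txn=1234]
[UPDATE accounts SET balance=500000 WHERE id='A']
[UPDATE accounts SET balance=1500000 WHERE id='B']
[COMMIT txn=1234]
Crash after COMMIT entry -> recovery redoes both UPDATEs
Crash before COMMIT entry -> both UPDATEs are treated as if they never happened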
Common misconception: Atomicity says nothing about transaction size or speed. A transaction with 10,000 UPDATE statements is still atomic: all 10,000 apply or none do.
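The all-or-nothing outcome of crash recovery can be sketched with a toy log. This is a minimal model, not PostgreSQL's actual WAL format: the `WalRecoveryDemo` class, its `LogRecord` type, and the idea of a checkpoint held in a map are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of commit-driven crash recovery; not PostgreSQL's real WAL format. */
final class WalRecoveryDemo {

    /** One log record; kind is "BEGIN", "UPDATE", or "COMMIT". */
    record LogRecord(String kind, int txn, String key, long newValue) {
        static LogRecord begin(int txn) { return new LogRecord("BEGIN", txn, null, 0); }
        static LogRecord update(int txn, String key, long v) { return new LogRecord("UPDATE", txn, key, v); }
        static LogRecord commit(int txn) { return new LogRecord("COMMIT", txn, null, 0); }
    }

    /** Rebuilds state from the log, applying only the updates of committed transactions. */
    static Map<String, Long> recover(Map<String, Long> lastCheckpoint, List<LogRecord> log) {
        // Pass 1: find every transaction that reached its COMMIT record.
        Set<Integer> committed = new HashSet<>();
        for (LogRecord r : log) {
            if (r.kind().equals("COMMIT")) {
                committed.add(r.txn());
            }
        }
        // Pass 2: redo updates in log order, skipping transactions without a COMMIT.
        Map<String, Long> state = new HashMap<>(lastCheckpoint);
        for (LogRecord r : log) {
            if (r.kind().equals("UPDATE") && committed.contains(r.txn())) {
                state.put(r.key(), r.newValue());
            }
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, Long> checkpoint = Map.of("A", 1_000_000L, "B", 1_000_000L);
        List<LogRecord> crashedBeforeCommit = List.of(
            LogRecord.begin(1234),
            LogRecord.update(1234, "A", 500_000L),
            LogRecord.update(1234, "B", 1_500_000L));
        // Neither account changes, because txn 1234 never committed.
        System.out.println(recover(checkpoint, crashedBeforeCommit));
    }
}
```

Appending `LogRecord.commit(1234)` to the same log makes recovery apply both updates, so the two balances always move together or not at all.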
Consistency: the database does not violate its own rules
Why it exists: Constraints (NOT NULL, FOREIGN KEY, CHECK) only have meaning if they are always true, not just usually true.
Important misconception: Consistency in ACID refers to database consistency, not consistency in distributed systems. It means database rules (constraints, triggers) are not violated after the transaction completes. This is the weakest property in ACID. Application code can produce data that is consistent from the database’s perspective but inconsistent from a business logic perspective.
-- Database "consistent" but business logic is wrong:
BEGIN;
UPDATE accounts SET balance = -500000 WHERE id = 'A'; -- Valid if no CHECK constraint
COMMIT;
-- Database is OK (no constraint violated), but a negative balance is wrong for business
Isolation: concurrent transactions do not see each other
Why it exists: Multiple transactions running concurrently must behave as if they run sequentially. Without isolation, every race condition is possible.
How PostgreSQL implements it: MVCC (Multi-Version Concurrency Control). Instead of locking data on reads, PostgreSQL keeps multiple versions of each row. Each transaction sees a snapshot of the database at the moment it started (or when the statement started, depending on isolation level).
Row "account A" in MVCC:
Version 1: balance=1,000,000 (created_by=txn_100, deleted_by=txn_200)
Version 2: balance=500,000 (created_by=txn_200, deleted_by=null)
Transaction txn_300 starts while txn_200 is running:
-> txn_300 sees Version 1 (txn_200 has not committed yet)
-> txn_300 reads balance=1,000,000
Transaction txn_400 starts after txn_200 commits:
-> txn_400 sees Version 2
-> txn_400 reads balance=500,000
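The visibility rule in the example above can be written down in a few lines. This is a simplified sketch: a snapshot is modeled as the set of transaction IDs that had committed when it was taken, and `MvccDemo`, `Version`, and the use of 0 for "not deleted" are illustrative assumptions rather than PostgreSQL internals.

```java
import java.util.List;
import java.util.Set;

final class MvccDemo {
    /** A row version; deletedBy == 0 means no transaction has superseded it. */
    record Version(long balance, int createdBy, int deletedBy) {}

    /** A version is visible if its creator committed before the snapshot and its deleter did not. */
    static boolean visible(Version v, Set<Integer> committedAtSnapshot) {
        boolean created = committedAtSnapshot.contains(v.createdBy());
        boolean deleted = v.deletedBy() != 0 && committedAtSnapshot.contains(v.deletedBy());
        return created && !deleted;
    }

    /** Returns the balance of the single version visible to the given snapshot. */
    static long read(List<Version> versions, Set<Integer> committedAtSnapshot) {
        return versions.stream()
            .filter(v -> visible(v, committedAtSnapshot))
            .findFirst()
            .orElseThrow()
            .balance();
    }

    public static void main(String[] args) {
        List<Version> accountA = List.of(
            new Version(1_000_000, 100, 200),
            new Version(500_000, 200, 0));
        System.out.println(read(accountA, Set.of(100)));      // txn_300's snapshot: prints 1000000
        System.out.println(read(accountA, Set.of(100, 200))); // txn_400's snapshot: prints 500000
    }
}
```

Readers never block here: each one simply picks the version that matches its own snapshot.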
Durability: committed data is not lost
Why it exists: RAM is volatile. After telling a user “transaction committed,” the database must guarantee the data survives a crash.
How it is implemented: the WAL is flushed to disk before the commit returns. The fsync() system call forces data sitting in the operating system's write cache onto stable storage. This is why a commit is slower than an ordinary write: it must wait for disk I/O.
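Trade-off: synchronous_commit = off in PostgreSQL increases throughput, but a crash may lose the most recently acknowledged commits. The PostgreSQL documentation bounds that window at three times wal_writer_delay, which is about 600 ms with default settings. The database itself stays consistent; only those last commits are missing. Acceptable for session logs, not acceptable for financial data.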
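The same "flush before you acknowledge" rule applies to any program that writes its own log. A minimal Java sketch, assuming a hypothetical `DurableLogDemo` helper and a throwaway temp file: `FileChannel.force(true)` plays the role of fsync().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DurableLogDemo {
    /** Appends one record and returns only after the bytes were forced to the storage device. */
    static void appendDurably(Path logFile, String record) {
        try (FileChannel ch = FileChannel.open(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf); // may still sit in the OS write cache after this call
            }
            ch.force(true); // analogous to fsync(): blocks until data and metadata are on disk
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("toy-wal", ".log");
        appendDurably(log, "COMMIT txn=1234");
        System.out.print(Files.readString(log));
    }
}
```

Skipping the `force` call is the file-level equivalent of synchronous_commit = off: faster, but an acknowledged record may not survive a crash.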
Transaction lifecycle: from BEGIN to COMMIT
Application PostgreSQL
| |
|-- BEGIN ---------->| PostgreSQL assigns a virtual transaction ID (a permanent XID follows at the first write)
| | Snapshot created (depending on isolation level)
| |
|-- UPDATE --------->| Change written to shared buffer (memory)
| | WAL entry written to WAL buffer
| | Lock acquired on affected rows
| |
|-- SELECT --------->| Read from snapshot, uncommitted changes not visible
| |
|-- COMMIT --------->| Commit record written to WAL, WAL flushed to disk (fsync)
| | XID marked as committed in pg_xact
| | Locks released
|<-- OK -------------|
| |
| | [Background] VACUUM cleans up old versions
Part 3: Concurrency Problems Transactions Solve
Lost update
Two transactions read the same value, both modify it, and both write back. One write overwrites the other.
Time T1 (User 1 votes) T2 (User 2 votes) DB
1 READ votes = 100
2 READ votes = 100
3 WRITE votes = 101
4 WRITE votes = 101 <- T1's update lost!
5 COMMIT
Result: votes = 101 (should be 102)
-- Reproduce in PostgreSQL (isolation: READ COMMITTED)
-- Session 1:
BEGIN;
SELECT vote_count FROM posts WHERE id = 1; -- 100
-- (pause)
UPDATE posts SET vote_count = 101 WHERE id = 1;
COMMIT;
-- Session 2 (running concurrently):
BEGIN;
SELECT vote_count FROM posts WHERE id = 1; -- 100
UPDATE posts SET vote_count = 101 WHERE id = 1; -- Overwrites Session 1!
COMMIT;
Fix: use an atomic update (no separate read before the write), optimistic locking, or SELECT ... FOR UPDATE.
-- Atomic update -- no race condition
UPDATE posts SET vote_count = vote_count + 1 WHERE id = 1;
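The difference between the two approaches can be replayed deterministically in plain Java. This is a single-threaded sketch that hard-codes the interleaving from the timeline above; `LostUpdateDemo` and its methods are illustrative names, and the in-memory field stands in for the `vote_count` column.

```java
final class LostUpdateDemo {
    private long voteCount;

    LostUpdateDemo(long initial) { this.voteCount = initial; }

    long read() { return voteCount; }

    void write(long value) { voteCount = value; }

    /** Models UPDATE ... SET vote_count = vote_count + 1: read and write form one indivisible step. */
    synchronized void incrementAtomically() { voteCount++; }

    /** Both sessions read before either writes, so the second write discards the first. */
    static long interleavedReadModifyWrite() {
        LostUpdateDemo row = new LostUpdateDemo(100);
        long seenBySession1 = row.read();
        long seenBySession2 = row.read();
        row.write(seenBySession1 + 1);
        row.write(seenBySession2 + 1); // overwrites session 1's write
        return row.read();
    }

    /** Each session increments in place, so neither depends on a stale read. */
    static long atomicIncrements() {
        LostUpdateDemo row = new LostUpdateDemo(100);
        row.incrementAtomically();
        row.incrementAtomically();
        return row.read();
    }

    public static void main(String[] args) {
        System.out.println(interleavedReadModifyWrite()); // prints 101
        System.out.println(atomicIncrements());           // prints 102
    }
}
```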
Dirty read
A transaction reads data written by a transaction that has not yet committed. If that other transaction rolls back, the data you read never actually existed.
Time T1 (Payment processing) T2 (Fraud check) DB
1 BEGIN
2 UPDATE payment SET status='PROCESSING'
3 BEGIN
4 READ payment status = 'PROCESSING' <- Dirty read!
5 (decides: not fraud, allow)
6 -- payment gateway fails
7 ROLLBACK
8 (fraud check based on data that never existed)
PostgreSQL never allows dirty reads. It accepts READ UNCOMMITTED syntactically but treats it as READ COMMITTED, so READ COMMITTED is the effective minimum. MySQL (InnoDB) does implement READ UNCOMMITTED, but it is almost never used.
Non-repeatable read
The same query within the same transaction, run twice, returns different results.
Time T1 (Report generation) T2 (Order update) DB
1 BEGIN
2 SELECT total FROM orders WHERE id=1 -> 500k
3 BEGIN
4 UPDATE orders SET total=600k WHERE id=1
5 COMMIT
6 SELECT total FROM orders WHERE id=1 -> 600k <- Different from first read!
7 COMMIT
T1 sees two different values for the same row within the same transaction.
Production impact: A report running inside a long transaction produces a final total that does not match the sum of its parts, because data changed while the report was running.
Phantom read
A query returns a different set of rows when run a second time, because another transaction inserted or deleted rows.
Time T1 (Inventory check) T2 (New reservation) DB
1 BEGIN
2 SELECT COUNT(*) FROM reservations
WHERE product_id=5 AND date='2026-06-07' -> 9 (max is 10)
3 BEGIN
4 INSERT INTO reservations (product_id, date)
VALUES (5, '2026-06-07')
5 COMMIT
6 SELECT COUNT(*) FROM reservations
WHERE product_id=5 AND date='2026-06-07' -> 10 <- Phantom row!
7 (T1 acts on its first count of 9 -> believes a slot is free -> books)
8 INSERT INTO reservations... <- Overbooking!
Phantom reads involve new rows. Non-repeatable reads involve existing rows being modified.
Part 4: Isolation Levels Deep Dive
The four isolation levels
| Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible |
| READ COMMITTED | No | Possible | Possible |
| REPEATABLE READ | No | No | Possible* |
| SERIALIZABLE | No | No | No |
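*PostgreSQL REPEATABLE READ actually prevents phantom reads, because the whole transaction reads from a single MVCC snapshot. It can still allow other anomalies such as write skew, which only SERIALIZABLE rules out.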
READ COMMITTED: PostgreSQL default
Each statement takes a fresh snapshot. The transaction sees all commits that happened before that statement started.
// Spring Boot -- default isolation (READ COMMITTED)
@Transactional
public ReportResult generateReport(Long merchantId) {
// Statement 1 takes a snapshot at t=100
long orderCount = orderRepo.countByMerchant(merchantId);
// Between the two queries, another transaction commits more orders
// Statement 2 takes a new snapshot at t=105
BigDecimal totalRevenue = orderRepo.sumRevenueByMerchant(merchantId);
// orderCount and totalRevenue may be inconsistent with each other!
return new ReportResult(orderCount, totalRevenue);
}
When it is acceptable: Most OLTP operations, short reads, single-row updates. When the transaction is brief, the window for inconsistency is small.
When it is not acceptable: Reports, audits, any operation that reads many rows and requires a consistent view of the entire dataset.
REPEATABLE READ
The entire transaction uses the same snapshot, taken at the moment the transaction started.
@Transactional(isolation = Isolation.REPEATABLE_READ)
public ReportResult generateReport(Long merchantId) {
// Both queries use the snapshot taken at BEGIN
long orderCount = orderRepo.countByMerchant(merchantId); // consistent
BigDecimal totalRevenue = orderRepo.sumRevenueByMerchant(merchantId); // consistent
// Guaranteed: orderCount and totalRevenue are consistent with each other
return new ReportResult(orderCount, totalRevenue);
}
Trade-off: While a transaction is open, VACUUM cannot remove any row version its snapshot might still need. A REPEATABLE READ transaction holds one snapshot for its entire duration, so a long-running one keeps dead rows alive across the whole database, makes autovacuum ineffective, and causes table bloat that grows over time.
PostgreSQL vs MySQL:
- PostgreSQL REPEATABLE READ: snapshot-based, does not use range locks, does not block concurrent writes.
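- MySQL InnoDB REPEATABLE READ: plain SELECTs read from a consistent snapshot, but locking reads and range writes (SELECT ... FOR UPDATE, UPDATE, DELETE) take gap locks and can block inserts from concurrent transactions.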
SERIALIZABLE: strongest and most expensive
Transactions behave as if they run completely sequentially, even if they are actually concurrent. PostgreSQL implements this with SSI (Serializable Snapshot Isolation), which detects serialization conflicts and aborts one transaction if needed.
@Transactional(isolation = Isolation.SERIALIZABLE)
public void bookLastTicket(Long eventId, Long userId) {
int available = ticketRepo.countAvailable(eventId);
if (available > 0) {
ticketRepo.createBooking(eventId, userId);
ticketRepo.decrementAvailable(eventId);
}
}
// If two users call this simultaneously:
// PostgreSQL detects a serialization conflict -> aborts one transaction
// Application receives a serialization failure (SQLSTATE 40001), surfaced by Spring as a ConcurrencyFailureException subclass -> must retry
When SERIALIZABLE is needed:
- Booking and reservation systems (prevent overbooking)
- Complex financial operations
- Any operation with complex read-then-write dependencies
Trade-offs:
- Throughput decreases due to conflict detection overhead
- Application must implement retry logic
- Abort rate increases under high contention
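Retry logic for aborted transactions can stay small. The sketch below is framework-free and makes two assumptions: `SerializationConflictException` is a hypothetical stand-in for whatever exception your data-access layer raises on a serialization failure, and the `Supplier` passed in runs one complete transaction per call.

```java
import java.util.function.Supplier;

final class SerializationRetry {
    /** Hypothetical stand-in for the exception raised on a serialization failure (SQLSTATE 40001). */
    static final class SerializationConflictException extends RuntimeException {
        SerializationConflictException(String message) { super(message); }
    }

    /** Re-runs the whole transaction body when it is aborted by a serialization conflict. */
    static <T> T runWithRetry(Supplier<T> transactionBody, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SerializationConflictException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return transactionBody.get();
            } catch (SerializationConflictException e) {
                last = e; // the aborted attempt committed nothing, so running it again is safe
            }
        }
        throw last; // every attempt conflicted: give up and report the last failure
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = runWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new SerializationConflictException("conflict");
            }
            return "booked";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "booked after 3 attempts"
    }
}
```

The retry has to wrap the entire transaction, including its reads: re-running only the failed statement would reuse decisions made on data that has since changed.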
Choosing the right isolation level in Spring
// No setting -> uses database default (PostgreSQL: READ COMMITTED)
@Transactional
public void defaultMethod() { ... }
// Explicit
@Transactional(isolation = Isolation.REPEATABLE_READ)
public void consistentReport() { ... }
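// Quarkus -- uses the JTA annotation; QuarkusTransaction is a programmatic API, not an annotation
@jakarta.transaction.Transactional
public void quarkusMethod() { ... }
// JTA's @Transactional has no isolation attribute:
// set the isolation level on the datasource or via raw JDBC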
Part 5: Spring Transaction Management
Why Spring created @Transactional
Before Spring, transaction management was written by hand:
// Before Spring -- JDBC boilerplate:
Connection conn = dataSource.getConnection();
conn.setAutoCommit(false);
try {
// business logic
conn.commit();
} catch (Exception e) {
conn.rollback();
throw e;
} finally {
conn.setAutoCommit(true);
conn.close();
}
The problem: this boilerplate must be repeated in every service method. If method A calls method B and both need a transaction, the Connection object must be threaded through the entire call chain. Not maintainable.
Spring solves this with AOP: the @Transactional annotation causes a Spring interceptor to automatically wrap the method in transaction management code.
The proxy mechanism: what actually happens
@Service
public class OrderService {
@Transactional
public void placeOrder(OrderRequest request) {
// Business logic
}
}
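// When Spring starts, it generates a proxy that behaves roughly like this
// (simplified: the real proxy delegates to the target bean and also rolls back on Error):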
class OrderService$$SpringProxy extends OrderService {
@Override
public void placeOrder(OrderRequest request) {
TransactionStatus status = transactionManager.getTransaction(txDef);
try {
super.placeOrder(request); // calls the real method
transactionManager.commit(status);
} catch (RuntimeException e) {
transactionManager.rollback(status);
throw e;
}
}
}
// The injected bean is the proxy, not OrderService directly:
@Autowired
private OrderService orderService; // Actually OrderService$$SpringProxy
Component A Spring Proxy OrderService
| | |
|-- placeOrder() -->| |
| |-- getTransaction() ------>| (TransactionManager)
| |<-- TransactionStatus -----|
| | |
| |-- super.placeOrder() ---->|
| | |-- (SQL operations)
| |<-- return ----------------|
| | |
| |-- commit() -------------->|
|<-- return --------| |
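The interception itself needs nothing Spring-specific. The following sketch uses a JDK dynamic proxy to bracket each call with begin, commit, or rollback; `TxProxyDemo`, `OrderOps`, and the event list are hypothetical, and recording strings stands in for a real transaction manager.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class TxProxyDemo {
    interface OrderOps {
        void placeOrder(String item);
    }

    /** Wraps target so every call through the returned proxy is bracketed by begin and commit/rollback. */
    static OrderOps transactional(OrderOps target, List<String> events) {
        return (OrderOps) Proxy.newProxyInstance(
            OrderOps.class.getClassLoader(),
            new Class<?>[] {OrderOps.class},
            (proxy, method, args) -> {
                events.add("begin");
                try {
                    Object result = method.invoke(target, args);
                    events.add("commit");
                    return result;
                } catch (InvocationTargetException e) {
                    events.add("rollback");
                    throw e.getCause(); // rethrow what the target actually threw
                }
            });
    }

    /** Calls placeOrder through the proxy and returns the recorded events. */
    static List<String> run(boolean fail) {
        List<String> events = new ArrayList<>();
        OrderOps real = item -> {
            events.add("placeOrder(" + item + ")");
            if (fail) {
                throw new IllegalStateException("out of stock");
            }
        };
        OrderOps proxied = transactional(real, events);
        try {
            proxied.placeOrder("book");
        } catch (IllegalStateException expected) {
            // the caller still sees the original exception
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [begin, placeOrder(book), commit]
        System.out.println(run(true));  // prints [begin, placeOrder(book), rollback]
    }
}
```

Unlike Spring's default rules, this toy rolls back on every exception the target throws.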
JpaTransactionManager vs DataSourceTransactionManager
// DataSourceTransactionManager -- used with plain JDBC
@Bean
public PlatformTransactionManager transactionManager(DataSource dataSource) {
return new DataSourceTransactionManager(dataSource);
}
// JpaTransactionManager -- used with Hibernate/JPA
// Spring Boot auto-configures this if JPA dependency is present
@Bean
public PlatformTransactionManager transactionManager(EntityManagerFactory emf) {
return new JpaTransactionManager(emf); // the constructor already sets the EntityManagerFactory
}
// JpaTransactionManager synchronizes the JPA EntityManager with the JDBC transaction
// -> Both JPA and JDBC operations in the same transaction use the same Connection
An often overlooked cost: JpaTransactionManager opens an EntityManager for every transaction, and each one carries its own persistence context, action queue, and related structures. The cost per transaction is small, but at thousands of transactions per second it is worth measuring.
Part 6: Common @Transactional Pitfalls
Pitfall 1: Self-invocation problem
@Service
public class OrderService {
@Transactional
public void placeOrder(OrderRequest request) {
createOrder(request);
reserveInventory(request);
}
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void reserveInventory(OrderRequest request) {
// Intended to run in a separate transaction for independent logging
// BUT: called from the same class -> bypasses the proxy -> no new transaction!
inventoryRepo.reserve(request);
auditLog.record("RESERVED", request);
}
}
// Why? Because this.reserveInventory() calls the method directly on the object,
// not through the Spring proxy. The proxy only intercepts calls from outside the class.
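The bypass is easy to reproduce without Spring. In this sketch a JDK dynamic proxy records every method it intercepts; `SelfInvocationDemo`, `Orders`, and `RealOrders` are hypothetical names, and the recording line marks where transaction advice would run.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class SelfInvocationDemo {
    interface Orders {
        void placeOrder();
        void reserveInventory();
    }

    static final class RealOrders implements Orders {
        @Override
        public void placeOrder() {
            reserveInventory(); // this.reserveInventory(): a plain Java call the proxy never sees
        }

        @Override
        public void reserveInventory() {
            // would need its own transaction, but no advice runs for the internal call
        }
    }

    /** Returns the names of the methods the proxy intercepted for one external placeOrder() call. */
    static List<String> interceptedCalls() {
        List<String> intercepted = new ArrayList<>();
        Orders target = new RealOrders();
        Orders proxy = (Orders) Proxy.newProxyInstance(
            Orders.class.getClassLoader(),
            new Class<?>[] {Orders.class},
            (p, method, args) -> {
                intercepted.add(method.getName()); // transaction advice would run here
                return method.invoke(target, args);
            });
        proxy.placeOrder();
        return intercepted;
    }

    public static void main(String[] args) {
        System.out.println(interceptedCalls()); // prints [placeOrder]
    }
}
```

Only the external call shows up in the list, which is exactly why the annotation on the inner method has no effect.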
Fix 1: Inject self
@Service
public class OrderService {
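@Autowired
@Lazy // self-injection can be rejected as a circular reference (the default since Spring Boot 2.6); @Lazy avoids that
private OrderService self; // inject the proxy of this bean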
@Transactional
public void placeOrder(OrderRequest request) {
createOrder(request);
self.reserveInventory(request); // call through proxy -> REQUIRES_NEW takes effect
}
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void reserveInventory(OrderRequest request) {
inventoryRepo.reserve(request);
}
}
Fix 2: Extract to a separate class (preferred)
@Service
public class OrderService {
@Autowired
private InventoryService inventoryService;
@Transactional
public void placeOrder(OrderRequest request) {
createOrder(request);
inventoryService.reserveInventory(request); // different bean -> goes through proxy
}
}
@Service
public class InventoryService {
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void reserveInventory(OrderRequest request) {
inventoryRepo.reserve(request);
}
}
Pitfall 2: Private methods do not have transactions
@Service
public class OrderService {
@Transactional // <- HAS NO EFFECT on private methods
private void createOrderInternal(OrderRequest request) {
// Spring proxy cannot override private methods
// -> no transaction
}
public void placeOrder(OrderRequest request) {
createOrderInternal(request); // Called directly, not through proxy
}
}
Spring proxy uses subclassing (CGLIB) or interface proxy (JDK dynamic proxy). Neither can override private methods. The compiler does not warn, the runtime does not throw, and the transaction annotation is silently ignored.
Fix: Methods that need transactions must be public (or at least protected with CGLIB).
Pitfall 3: Checked exceptions do not trigger rollback
@Service
public class PaymentService {
@Transactional
public void processPayment(PaymentRequest request) throws PaymentException {
chargeCard(request);
updateBalance(request);
// If chargeCard throws PaymentException (checked) -> NO rollback!
// Spring defaults to rolling back only for RuntimeException and Error
}
}
// Fix:
@Transactional(rollbackFor = PaymentException.class)
public void processPayment(PaymentRequest request) throws PaymentException {
chargeCard(request);
updateBalance(request);
}
// Or: rollback for all Exception types
@Transactional(rollbackFor = Exception.class)
public void processPayment(PaymentRequest request) throws Exception { ... }
Why does Spring not roll back for checked exceptions by default? It is a Spring design decision based on Java convention: checked exceptions represent expected, recoverable errors; unchecked exceptions represent unexpected, non-recoverable failures. In practice this convention is inconsistently applied, so always be explicit about rollbackFor.
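The default rule and the effect of rollbackFor fit in a few lines. This is a simplified sketch of the decision, not Spring's implementation (which also supports noRollbackFor and matching by class name); `RollbackRuleDemo` and the nested `PaymentException` are illustrative stand-ins.

```java
final class RollbackRuleDemo {
    /** Hypothetical checked exception, standing in for PaymentException. */
    static final class PaymentException extends Exception {
        PaymentException(String message) { super(message); }
    }

    /** The default rule described above: roll back on unchecked exceptions and errors only. */
    static boolean rollsBackByDefault(Throwable t) {
        return t instanceof RuntimeException || t instanceof Error;
    }

    /** The same rule, extended by an explicit list in the spirit of rollbackFor. */
    static boolean rollsBack(Throwable t, Class<?>... rollbackFor) {
        for (Class<?> type : rollbackFor) {
            if (type.isInstance(t)) {
                return true;
            }
        }
        return rollsBackByDefault(t);
    }

    public static void main(String[] args) {
        Throwable declined = new PaymentException("card declined");
        System.out.println(rollsBack(declined));                         // prints false
        System.out.println(rollsBack(declined, PaymentException.class)); // prints true
        System.out.println(rollsBack(new IllegalStateException("bug"))); // prints true
    }
}
```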
Pitfall 4: @Async and @Transactional do not share a transaction
@Service
public class NotificationService {
@Async
@Transactional
public void sendConfirmationEmail(Long orderId) {
Order order = orderRepo.findById(orderId).orElseThrow();
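// This runs on another thread, in its own transaction, possibly before the caller has committed:
// findById may not see the order yet, and a rollback in the caller cannot undo the email
// (without @Transactional here, order.getItems() would also risk LazyInitializationException)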
emailService.send(buildEmail(order));
}
}
// @Async runs on a separate thread pool -> the caller's transaction context is not propagated
// Spring binds the current transaction to the thread (TransactionSynchronizationManager); the new thread has no such binding
// -> @Transactional on the async method starts a new, unrelated transaction instead of joining the caller's
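The missing propagation follows directly from how thread-bound state works in Java. A minimal sketch, in which the `CURRENT_TX` ThreadLocal is a hypothetical stand-in for the holder a transaction manager uses:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadBoundContextDemo {
    /** Stand-in for the thread-bound holder of the current transaction. */
    static final ThreadLocal<String> CURRENT_TX = new ThreadLocal<>();

    /** Binds a transaction to the calling thread, then asks a worker thread what it can see. */
    static String txSeenByWorker() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CURRENT_TX.set("tx-1");
        try {
            Callable<String> readTx = CURRENT_TX::get; // executed on the worker thread
            return pool.submit(readTx).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            CURRENT_TX.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(txSeenByWorker()); // prints "null": the worker has no transaction bound
    }
}
```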
Fix: Do not mix @Async with @Transactional. Fetch data inside a transaction first, then pass a DTO into the async method.
@Service
public class OrderService {
@Transactional
public void confirmOrder(Long orderId) {
Order order = orderRepo.findById(orderId)
.orElseThrow();
order.confirm();
OrderConfirmedDto dto = OrderConfirmedDto.from(order); // Fetch all needed data
notificationService.sendConfirmationEmail(dto); // Pass DTO, not entity
}
}
@Service
public class NotificationService {
@Async // No @Transactional needed -- no DB operations
public void sendConfirmationEmail(OrderConfirmedDto dto) {
emailService.send(buildEmail(dto));
}
}
Pitfall 5: Long-running transactions hold locks
@Transactional // long transaction = connection and locks held for a long time
public OrderResult processLargeOrder(LargeOrderRequest request) {
// Step 1: Validate (fast, ~5ms)
validateRequest(request);
// Step 2: Call external pricing API (slow, ~200ms)
PricingResult pricing = externalPricingService.calculate(request); // External call!
// Step 3: Call inventory API (slow, ~150ms)
InventoryResult inventory = externalInventoryService.check(request); // External call!
// Step 4: Database operations (fast, ~10ms)
OrderResult result = createOrder(request, pricing, inventory);
// Total transaction time: ~365ms
// Connection held: 365ms
// Locks held: 365ms
// HikariCP pool exhaustion under concurrent requests!
return result;
}
Fix: Only wrap database operations in the transaction. External calls go outside.
public OrderResult processLargeOrder(LargeOrderRequest request) {
validateRequest(request);
// External calls OUTSIDE the transaction
PricingResult pricing = externalPricingService.calculate(request); // 200ms, no lock held
InventoryResult inventory = externalInventoryService.check(request); // 150ms, no lock held
// Only DB operations inside the transaction.
// orderWriter is a separate bean (hypothetical name), so the call goes through the Spring proxy.
// A private or self-invoked @Transactional method would be silently ignored (Pitfalls 1 and 2).
return orderWriter.createOrderTransactionally(request, pricing, inventory); // ~10ms
}
@Service
public class OrderWriter {
@Transactional
public OrderResult createOrderTransactionally(LargeOrderRequest request,
PricingResult pricing,
InventoryResult inventory) {
return createOrder(request, pricing, inventory);
}
}
Pitfall 6: Nested @Transactional misunderstanding
@Service
public class OrderService {
@Transactional
public void placeOrder(OrderRequest request) {
Order order = createOrder(request);
try {
auditService.logOrderCreated(order.getId()); // REQUIRES_NEW
} catch (Exception e) {
log.warn("Audit logging failed, continuing...");
// Assumes audit failure does not affect the order
}
// Order still commits because audit runs in a separate transaction
// This is CORRECT when using REQUIRES_NEW
}
}
@Service
public class AuditService {
@Transactional(propagation = Propagation.REQUIRED) // <- BUG: REQUIRED instead of REQUIRES_NEW
public void logOrderCreated(Long orderId) {
auditRepo.save(new AuditLog(orderId, "ORDER_CREATED"));
// If this rolls back -> it rolls back the caller's order transaction too!
}
}
// With REQUIRED: auditService runs in the SAME transaction as orderService
// If auditService throws and the caller catches -> the transaction is still marked "rollback-only"
// The whole transaction rolls back on commit
// Fix: REQUIRES_NEW for audit, because audit failure should not roll back the business operation
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void logOrderCreated(Long orderId) {
auditRepo.save(new AuditLog(orderId, "ORDER_CREATED"));
}
Part 7: Transaction Propagation
REQUIRED: default, join if present, create if not
@Transactional(propagation = Propagation.REQUIRED) // default
public void methodA() {
otherService.methodB(); // called through another bean's proxy: methodB joins methodA's transaction
}
@Transactional(propagation = Propagation.REQUIRED)
public void methodB() {
// If methodA has a transaction -> methodB uses the same one
// If there is no transaction -> methodB creates a new one
}
Caller has a transaction: Caller has no transaction:
[---methodA txn---] [---methodB txn (new)---]
[methodB joins] methodB
When to use: Default for service methods. Business operations must be atomic with their caller.
A subtle problem: If methodB rolls back (by throwing an unchecked exception), it rolls back the entire transaction including methodA’s work, even if methodA catches the exception. The transaction is marked rollback-only and cannot be committed afterward.
@Transactional
public void methodA() {
try {
otherService.methodB(); // REQUIRED, called through its proxy; it throws and marks the shared transaction rollback-only
} catch (Exception e) {
// Assumes the exception was handled and methodA continues
}
orderRepo.save(order); // runs, but can never be committed
// When methodA returns, the commit attempt throws UnexpectedRollbackException
// because the transaction was already marked rollback-only by methodB
}
REQUIRES_NEW: always create a new transaction
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void auditLog(String action) {
// Suspends the caller's transaction if present
// Creates a new, completely independent transaction
// Commit or rollback does not affect the caller's transaction
auditRepo.save(new AuditEntry(action));
}
[---caller txn (suspended)---]
[---new txn---]
[---commit or rollback---]
[---caller txn (resumed)------]
When to use:
- Audit logging (audit must not be lost when the business transaction rolls back)
- Sending notifications (send only after confirmation)
- Operations that must commit independently
Trade-off: Consumes an additional connection from the pool for the new transaction, because the old transaction is still holding its connection. With deeply nested REQUIRES_NEW calls, the connection pool can be exhausted.
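The exhaustion scenario can be modeled with a semaphore standing in for the connection pool. This is a toy sketch: `RequiresNewPoolDemo` is a hypothetical name, each permit represents one pooled connection, and the timeout plays the role of the pool's connection timeout.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class RequiresNewPoolDemo {
    /**
     * Models an outer transaction that holds one connection while an inner REQUIRES_NEW call
     * asks for a second one. Returns true if the inner call obtained a connection in time.
     */
    static boolean innerGetsConnection(int poolSize, long timeoutMillis) {
        Semaphore pool = new Semaphore(poolSize); // each permit is one pooled connection
        if (!pool.tryAcquire()) {
            return false; // the outer transaction could not even start
        }
        try {
            boolean acquired;
            try {
                // The outer transaction is suspended but still holds its connection.
                acquired = pool.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (acquired) {
                pool.release(); // inner transaction finished
            }
            return acquired;
        } finally {
            pool.release(); // outer transaction finished
        }
    }

    public static void main(String[] args) {
        System.out.println(innerGetsConnection(1, 50)); // prints false: the pool is exhausted
        System.out.println(innerGetsConnection(2, 50)); // prints true
    }
}
```

With N concurrent requests each needing two connections, a pool of N can deadlock in the same way: every request holds one connection and waits for a second.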
NESTED: savepoint within the parent transaction
@Transactional(propagation = Propagation.NESTED)
public void nestedOperation() {
// Creates a SAVEPOINT inside the parent transaction
// Rollback returns only to the SAVEPOINT, not the full transaction
// Commit only happens when the parent transaction commits
}
[---outer txn---]
SAVEPOINT sp1
[---nested operation---]
RELEASE SAVEPOINT sp1 (if successful)
or ROLLBACK TO SAVEPOINT sp1 (if failed)
[---outer txn commits or rolls back entirely---]
When to use: Partial retry in batch processing: if one item fails, roll back only that item and continue with the rest.
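Do not use when: True isolation is needed. NESTED still runs inside the outer transaction and does not create an independent transaction like REQUIRES_NEW. It also depends on JDBC savepoint support: DataSourceTransactionManager provides it, while JpaTransactionManager leaves nested transactions disabled by default, because rolling back to a savepoint does not reset the entities cached in the EntityManager.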
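The batch use case can be illustrated with a toy transaction in which a savepoint is simply a position in the list of pending writes. `SavepointDemo`, `ToyTransaction`, and the rule that items starting with "bad" fail are assumptions for the example; a real implementation would use JDBC savepoints.

```java
import java.util.ArrayList;
import java.util.List;

final class SavepointDemo {
    /** Toy transaction: pending writes are a list, a savepoint is a position in that list. */
    static final class ToyTransaction {
        private final List<String> pendingWrites = new ArrayList<>();

        int savepoint() { return pendingWrites.size(); }

        void write(String row) { pendingWrites.add(row); }

        void rollbackTo(int savepoint) {
            pendingWrites.subList(savepoint, pendingWrites.size()).clear();
        }

        List<String> commit() { return List.copyOf(pendingWrites); }
    }

    /** Processes each item under its own savepoint; items starting with "bad" fail halfway. */
    static List<String> processBatch(List<String> items) {
        ToyTransaction tx = new ToyTransaction();
        for (String item : items) {
            int sp = tx.savepoint();
            try {
                tx.write(item + ":reserved");
                if (item.startsWith("bad")) {
                    throw new IllegalStateException("cannot process " + item);
                }
                tx.write(item + ":billed");
            } catch (IllegalStateException e) {
                tx.rollbackTo(sp); // undo only this item's writes, keep the earlier ones
            }
        }
        return tx.commit(); // everything that survived commits together
    }

    public static void main(String[] args) {
        // prints [a:reserved, a:billed, c:reserved, c:billed]
        System.out.println(processBatch(List.of("a", "bad-b", "c")));
    }
}
```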
SUPPORTS, NOT_SUPPORTED, MANDATORY, NEVER
// SUPPORTS: Use a transaction if one exists, do not create one if not
@Transactional(propagation = Propagation.SUPPORTS)
public List<Order> readOrders() {
// Suitable for read operations that do not need a transaction
// but benefit from one if available
return orderRepo.findAll();
}
// NOT_SUPPORTED: Suspend the transaction if one exists, run without one
@Transactional(propagation = Propagation.NOT_SUPPORTED)
public void longRunningRead() {
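// Suspends the caller's transaction and runs this method non-transactionally
// (the suspended transaction still holds its own connection until it resumes and completes)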
// No transaction needed for a pure read
}
// MANDATORY: A transaction from the caller is required, throws if absent
@Transactional(propagation = Propagation.MANDATORY)
public void criticalDbWrite() {
// Ensures callers do not call this method outside a transaction
// Fail fast instead of data corruption
}
// NEVER: Must not have a transaction, throws if one is present
@Transactional(propagation = Propagation.NEVER)
public void cacheLoad() {
// Ensures this operation is not wrapped in a transaction
// Example: cache warming does not need (and should not have) transaction overhead
}
Part 8: Hibernate and JPA Transaction Internals
Persistence context: the center of everything
The persistence context is an identity map and change tracker for all entities within a unit of work. Every entity loaded in a session is tracked.
@Transactional
public void updateOrder(Long orderId) {
// Load entity -> Hibernate stores a reference in the persistence context
Order order = orderRepo.findById(orderId).orElseThrow();
// Hibernate also stores a snapshot of the initial state
// Modify the entity
order.setStatus("PROCESSING");
order.setProcessedAt(Instant.now());
// No save() needed -- Hibernate detects changes automatically on flush
// (dirty checking at flush time)
}
// When the @Transactional method returns -> flush -> commit
// Hibernate compares current state vs snapshot -> generates UPDATE SQL
// -> UPDATE orders SET status='PROCESSING', processed_at=... WHERE id=?
Entity states and transitions
[New / Transient]
| entityManager.persist(entity)
v
[Managed / Persistent] <--- entityManager.find() / JPQL query
| (inside an active session)
| entityManager.detach(entity)
| session closes
v
[Detached]
| entityManager.merge(detachedEntity)
v
[Managed / Persistent]
| entityManager.remove(entity)
v
[Removed]
| flush / commit
v
[Deleted from DB]
@Transactional
public void demonstrateStates(Long orderId) {
// MANAGED state -- Hibernate tracks all changes
Order managed = em.find(Order.class, orderId);
managed.setStatus("UPDATED"); // Would generate UPDATE on flush, but the entity is detached before any flush
// DETACHED -- Hibernate no longer tracks this instance
em.detach(managed);
managed.setStatus("DETACHED_CHANGE"); // No SQL generated
// MERGE -- Hibernate loads a fresh managed copy and copies the detached state onto it
Order reattached = em.merge(managed);
// Hibernate executes: SELECT * FROM orders WHERE id=?
// Then copies state from the detached instance onto reattached -> generates UPDATE on flush
// Note: reattached is the managed instance; the object named "managed" stays detached
}
Flush: when does SQL actually run?
This is the source of many confusing bugs: SQL does not run immediately when you call save() or persist().
@Transactional
public void bugScenario() {
Order order = new Order("PENDING");
orderRepo.save(order); // The INSERT has usually NOT run yet, only queued in the persistence context (IDENTITY ids force an immediate INSERT)
// If you query immediately using native SQL:
int count = jdbcTemplate.queryForObject(
"SELECT COUNT(*) FROM orders WHERE status='PENDING'", Integer.class);
// count may be 0 because the INSERT has not been flushed to the DB yet!
}
FlushMode:
- AUTO (default): flush before each query (so the query sees pending changes) and before commit
- COMMIT: flush only before commit, does not guarantee queries see the latest changes
- MANUAL: flush only when entityManager.flush() is called explicitly
// Manual flush for precise control:
@Transactional
public void batchInsert(List<OrderData> data) {
for (int i = 0; i < data.size(); i++) {
Order order = new Order(data.get(i));
em.persist(order);
if ((i + 1) % 50 == 0) { // after every 50th entity (i % 50 would also flush right after the first one)
em.flush(); // Flush batch to DB
em.clear(); // Clear persistence context to avoid memory bloat
}
}
}
Dirty checking: the hidden cost of convenience
Hibernate detects changes by comparing the current state with the snapshot stored when the entity was loaded. This comparison happens at every flush.
@Transactional
public void loadManyEntities() {
// Load 10,000 orders into the persistence context
List<Order> orders = orderRepo.findAll();
// Hibernate stores 10,000 snapshots
// ... some logic that does not modify orders ...
// On flush: Hibernate compares 10,000 current states vs 10,000 snapshots
// Even if nothing changed, the overhead still occurs
}
// Fix: @Immutable or StatelessSession for read-only bulk operations
@Entity
@org.hibernate.annotations.Immutable
public class ProductCatalog {
// Never modified -> Hibernate skips dirty checking
}
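To make the cost concrete, here is a minimal sketch of snapshot-based dirty checking. It is a simplified model under assumed names (`SnapshotDirtyChecker`, entities as field maps), not Hibernate's actual data structures. Note that `flush()` walks every loaded entity even when nothing changed, which is the overhead described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SnapshotDirtyChecker {
    // Current (mutable) state of every loaded entity, keyed by id
    private final Map<Long, Map<String, Object>> current = new LinkedHashMap<>();
    // Copy of each entity's state taken at load time
    private final Map<Long, Map<String, Object>> snapshots = new HashMap<>();

    void load(long id, Map<String, ?> fields) {
        current.put(id, new HashMap<String, Object>(fields));
        snapshots.put(id, new HashMap<String, Object>(fields));
    }

    void set(long id, String field, Object value) {
        current.get(id).put(field, value);
    }

    /** Compares every entity with its snapshot; returns the ids that would need an UPDATE. */
    List<Long> flush() {
        List<Long> dirty = new ArrayList<>();
        for (Map.Entry<Long, Map<String, Object>> entry : current.entrySet()) {
            if (!entry.getValue().equals(snapshots.get(entry.getKey()))) {
                dirty.add(entry.getKey());
                // Refresh the snapshot so the next flush starts from the new state
                snapshots.put(entry.getKey(), new HashMap<>(entry.getValue()));
            }
        }
        return dirty;
    }
}
```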
Part 9: Locking Strategies
Optimistic locking: assume no conflict, detect at commit
Optimistic locking does not lock a row on read. Instead it tracks the version of the data and fails the commit if the version has changed.
@Entity
public class Account {
@Id
private Long id;
private BigDecimal balance;
@Version // Hibernate automatically increments on update
private Long version;
}
// Thread 1 and Thread 2 both read the same account (version=5)
// Thread 1 updates first -> version becomes 6
// Thread 2 tries to update with WHERE id=? AND version=5 -> 0 rows affected
// -> OptimisticLockException
-- SQL generated by Hibernate:
UPDATE accounts
SET balance = 900000, version = 6
WHERE id = 123 AND version = 5; -- <- Optimistic check
-- If 0 rows affected -> OptimisticLockException
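The version check can be reproduced in plain Java to see why the second writer loses. This is an in-memory sketch with invented names (`VersionedAccountTable`, `VersionedRow`); the `update` method plays the role of the conditional UPDATE statement and returns the number of rows affected.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

class VersionedRow {
    final BigDecimal balance;
    final long version;

    VersionedRow(BigDecimal balance, long version) {
        this.balance = balance;
        this.version = version;
    }
}

class VersionedAccountTable {
    private final Map<Long, VersionedRow> rows = new HashMap<>();

    synchronized void insert(long id, BigDecimal balance, long version) {
        rows.put(id, new VersionedRow(balance, version));
    }

    synchronized VersionedRow read(long id) {
        return rows.get(id);
    }

    /** Models: UPDATE ... SET balance = ?, version = version + 1 WHERE id = ? AND version = ? */
    synchronized int update(long id, BigDecimal newBalance, long expectedVersion) {
        VersionedRow row = rows.get(id);
        if (row == null || row.version != expectedVersion) {
            return 0; // someone else updated the row first: 0 rows affected
        }
        rows.put(id, new VersionedRow(newBalance, expectedVersion + 1));
        return 1;
    }
}
```

Two callers that both read version 5 can both attempt an update, but only the first one matches the WHERE clause; the second gets 0 rows affected and must reload before trying again.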
Retry strategy:
@Service
public class AccountService {
@Transactional
@Retryable(
retryFor = OptimisticLockingFailureException.class,
maxAttempts = 3,
backoff = @Backoff(delay = 100, multiplier = 2)
)
public void transferFunds(Long fromId, Long toId, BigDecimal amount) {
Account from = accountRepo.findById(fromId).orElseThrow();
Account to = accountRepo.findById(toId).orElseThrow();
from.deduct(amount);
to.add(amount);
}
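// On conflict -> up to 3 attempts in total (waiting 100ms, then 200ms).
// The retry advice must wrap the transaction advice so that each attempt runs in a fresh transaction and re-reads the current versions.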
}
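For readers not using Spring Retry, the same policy can be sketched as a plain loop. The names here (`OptimisticRetry`, `VersionConflictException`) are hypothetical stand-ins; the important property is that the whole unit of work, including the read, is passed in and re-executed on every attempt.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an optimistic locking failure. */
class VersionConflictException extends RuntimeException {
    VersionConflictException(String message) {
        super(message);
    }
}

class OptimisticRetry {
    /** Runs the attempt up to maxAttempts times, sleeping with exponential backoff between attempts. */
    static <T> T withRetry(int maxAttempts, long initialDelayMs, double multiplier, Supplier<T> attempt) {
        long delayMs = initialDelayMs;
        for (int attemptNo = 1; ; attemptNo++) {
            try {
                return attempt.get(); // should open a new transaction and re-read the data
            } catch (VersionConflictException e) {
                if (attemptNo >= maxAttempts) {
                    throw e; // give up and surface the conflict
                }
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
                delayMs = (long) (delayMs * multiplier);
            }
        }
    }
}
```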
When to use optimistic locking:
- Conflicts are rare (low contention)
- Reads far outnumber writes (product catalog updates)
- User-facing forms (edit and save)
When not to use:
- High contention (many concurrent updates to the same rows)
- Retry is not appropriate for the use case
- When a conflict must never happen (bank accounts with many concurrent transfers)
Pessimistic locking: lock immediately on read
// PESSIMISTIC_WRITE: SELECT ... FOR UPDATE
@Lock(LockModeType.PESSIMISTIC_WRITE)
@Query("SELECT a FROM Account a WHERE a.id = :id")
Account findByIdForUpdate(@Param("id") Long id);
@Transactional
public void transferFunds(Long fromId, Long toId, BigDecimal amount) {
// Lock both accounts immediately on read
// Guarantees nobody else can modify them while we process
Long firstId = Math.min(fromId, toId); // Consistent lock ordering!
Long secondId = Math.max(fromId, toId); // Prevents deadlock
Account first = accountRepo.findByIdForUpdate(firstId);
Account second = accountRepo.findByIdForUpdate(secondId);
if (firstId.equals(fromId)) {
first.deduct(amount);
second.add(amount);
} else {
second.deduct(amount);
first.add(amount);
}
}
-- SQL generated:
SELECT * FROM accounts WHERE id = ? FOR UPDATE;
-- Row is locked until the transaction commits or rolls back
-- Concurrent transactions block at this statement
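The lock-ordering rule is independent of the database. As a rough in-memory analogy (with `ReentrantLock` standing in for row locks, and `LockedAccount` and `OrderedTransfer` as invented names), two transfers running in opposite directions cannot deadlock as long as both acquire the lower id first:

```java
import java.util.concurrent.locks.ReentrantLock;

class LockedAccount {
    final long id;
    long balance; // only touched while holding this account's lock
    final ReentrantLock lock = new ReentrantLock();

    LockedAccount(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }
}

class OrderedTransfer {
    static void transfer(LockedAccount from, LockedAccount to, long amount) {
        // Always lock the account with the smaller id first, regardless of direction
        LockedAccount first = from.id < to.id ? from : to;
        LockedAccount second = from.id < to.id ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                if (from.balance < amount) {
                    throw new IllegalStateException("insufficient funds");
                }
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

If each thread instead locked `from` before `to`, a transfer A to B and a transfer B to A could each hold one lock and wait for the other.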
PESSIMISTIC_READ vs PESSIMISTIC_WRITE:
// PESSIMISTIC_READ: SELECT ... FOR SHARE
// Allows concurrent reads, blocks writes
// Use when you need to ensure data does not change while you read it for calculation
@Lock(LockModeType.PESSIMISTIC_READ)
Account findByIdForShare(@Param("id") Long id);
// -> SELECT ... FOR SHARE (PostgreSQL)
// Multiple transactions can hold a SHARE lock simultaneously
// Only WRITE locks (FOR UPDATE) must wait
Deadlock: when two transactions wait for each other
T1 locks Account A -> waits for lock on Account B
T2 locks Account B -> waits for lock on Account A
-> Deadlock! Both wait for each other forever
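PostgreSQL detects deadlocks automatically: once a session has waited for a lock longer than deadlock_timeout (1 second by default), it checks the lock wait-for graph for a cycle and aborts one of the transactions involved: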
ERROR: deadlock detected
DETAIL: Process 1234 waits for ShareLock on transaction 5678; blocked by process 5678.
Process 5678 waits for ShareLock on transaction 1234; blocked by process 1234.
HINT: See server log for query details.
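Detecting a deadlock amounts to finding a cycle in the wait-for graph. The sketch below is a simplified illustration, not PostgreSQL's algorithm: it assumes each process waits for at most one other process, and `WaitForGraph` is an invented name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WaitForGraph {
    // waiter pid -> pid of the process holding the lock it is waiting for
    private final Map<Integer, Integer> waitsFor = new HashMap<>();

    void addWait(int waiterPid, int holderPid) {
        waitsFor.put(waiterPid, holderPid);
    }

    /** Follows the chain of waits from startPid; revisiting a process means there is a cycle. */
    boolean hasDeadlock(int startPid) {
        Set<Integer> visited = new HashSet<>();
        Integer current = startPid;
        while (current != null && visited.add(current)) {
            current = waitsFor.get(current);
        }
        // current == null: the chain ended at a process that is not waiting, so no deadlock
        return current != null;
    }
}
```

With the two processes from the error message above (1234 waits for 5678 and 5678 waits for 1234), the walk returns to a process it has already visited, so a deadlock is reported.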
Prevention:
// 1. Consistent lock ordering (shown above)
// 2. Reduce lock scope and duration
// 3. Timeout on lock acquisition
@Transactional
public void processWithTimeout(Long orderId) {
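// PostgreSQL: run SET LOCAL lock_timeout = '5s' before the query
// If the lock is not acquired within 5s, the statement fails with a lock timeout error
// instead of waiting indefinitely (deadlock detection still runs independently)
}
# application.properties
# 5 seconds, in milliseconds. Dialect support for this hint varies, so verify it on your database.
spring.jpa.properties.jakarta.persistence.lock.timeout=5000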
Part 10: Transactions in Microservices
Why ACID does not work across services
In a monolith, all operations use the same database. A single transaction covers everything. In microservices:
Order Service -> Order DB (PostgreSQL)
Payment Service -> Payment DB (PostgreSQL)
Inventory Service -> Inventory DB (PostgreSQL)
// No single local transaction can span all three DBs
// Each service has its own transaction and commits independently
Failure scenario without a distributed transaction:
1. Order Service: CREATE order -> commit OK
2. Payment Service: CHARGE payment -> commit OK
3. Inventory Service: DEDUCT inventory -> FAIL
State: Order exists, money charged, inventory not deducted
-> Inconsistent state with no automatic recovery
Two-Phase Commit (2PC): why it is not widely used
2PC is a protocol for coordinating a distributed transaction:
Phase 1 (Prepare):
Coordinator -> Order Service: "Prepare to commit"
Coordinator -> Payment Service: "Prepare to commit"
Coordinator -> Inventory Service: "Prepare to commit"
If all return "Ready":
Phase 2 (Commit):
Coordinator -> all: "Commit now"
Problems:
Blocking: During Phase 1, all services lock resources. If the coordinator crashes between Phase 1 and Phase 2, resources are locked indefinitely.
Coordinator single point of failure: If the coordinator crashes after Phase 1 but before Phase 2, participants do not know whether to commit or roll back.
Network partition: If the commit message does not reach one service, that service does not know what to do.
Performance: At least two network round-trips per participant (prepare and vote, then commit and acknowledge), which is four message delays per transaction. Each round-trip adds latency and holds locks longer.
2PC exists (JTA/XA in Java) but is almost never used in microservices because of these trade-offs.
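The happy path of the protocol fits in a few lines, which is part of why its failure modes are easy to overlook. The following sketch uses invented types (`Participant`, `RecordingParticipant`, `TwoPhaseCoordinator`) and deliberately omits the durable coordinator log, timeouts, and crash recovery, which is exactly where the problems listed above arise.

```java
import java.util.List;

interface Participant {
    boolean prepare();   // vote: true = ready to commit
    void commit();
    void rollback();
}

/** Test double that records the last state it was moved into. */
class RecordingParticipant implements Participant {
    private final boolean voteYes;
    String state = "INITIAL";

    RecordingParticipant(boolean voteYes) {
        this.voteYes = voteYes;
    }

    public boolean prepare() {
        state = voteYes ? "PREPARED" : "REFUSED";
        return voteYes;
    }

    public void commit() {
        state = "COMMITTED";
    }

    public void rollback() {
        state = "ROLLED_BACK";
    }
}

class TwoPhaseCoordinator {
    static boolean run(List<? extends Participant> participants) {
        // Phase 1: collect votes; a single "no" aborts the whole transaction
        for (Participant p : participants) {
            if (!p.prepare()) {
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        // Phase 2: everyone voted yes, so tell everyone to commit
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }
}
```

Between the two loops, every participant sits in the PREPARED state holding its locks. If the coordinator dies at that point, nothing in this code tells the participants what to do next.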
CAP theorem: why you must choose
CAP theorem: in a distributed system, at most two of these three properties can be guaranteed simultaneously:
- Consistency: Every read sees the most recent write
- Availability: Every request receives a response (not an error)
- Partition Tolerance: The system continues operating despite a network partition
Network partitions are unavoidable in production, so a choice between C and A is required. Microservices typically choose AP (availability and partition tolerance) and accept eventual consistency.
Part 11: The Saga Pattern
Why Saga exists
A Saga is a sequence of local transactions, one per service, with compensation transactions to undo work when a step fails.
Saga: Place Order
Step 1: Order Service -> CREATE order (local tx)
Step 2: Inventory Service -> RESERVE inventory (local tx)
Step 3: Payment Service -> CHARGE payment (local tx)
Step 4: Shipping Service -> SCHEDULE shipment (local tx)
If Shipping FAILS (steps 1 to 3 have already committed):
Compensation Step 3: Payment Service -> VOID charge
Compensation Step 2: Inventory Service -> RELEASE reservation
Compensation Step 1: Order Service -> CANCEL order
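The core mechanic, undoing completed steps in reverse order, can be sketched independently of any messaging or workflow framework. `SagaStep` and `SagaRunner` are illustrative names, and the steps are plain `Runnable`s rather than remote calls. A step that fails never completed, so it has nothing to compensate; only the steps before it are undone.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class SagaStep {
    final String name;
    final Runnable action;
    final Runnable compensation;

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

class SagaRunner {
    /** Runs steps in order; on failure, compensates the completed steps in reverse order. */
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step); // most recently completed step ends up on top
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

A production saga also has to persist its progress and retry compensations that fail, which this sketch leaves out.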
Choreography vs Orchestration
Choreography: Services coordinate through events:
Order Service: CREATE order -> publish OrderCreated event
|
Inventory Service: consume OrderCreated -> RESERVE -> publish InventoryReserved
|
Payment Service: consume InventoryReserved -> CHARGE -> publish PaymentCharged
|
Shipping Service: consume PaymentCharged -> SCHEDULE shipment
Failure:
Payment Service: CHARGE fails -> publish PaymentFailed
|
Inventory Service: consume PaymentFailed -> RELEASE reservation -> publish InventoryReleased
|
Order Service: consume InventoryReleased -> CANCEL order
// Choreography with Spring Events (in-process) or Kafka (cross-service)
@Service
public class PaymentService {
@Transactional
@KafkaListener(topics = "inventory.reserved")
public void handleInventoryReserved(InventoryReservedEvent event) {
try {
PaymentResult result = chargePayment(event.getOrderId(), event.getAmount());
kafkaTemplate.send("payment.charged",
new PaymentChargedEvent(event.getOrderId(), result.getTransactionId()));
} catch (PaymentException e) {
kafkaTemplate.send("payment.failed",
new PaymentFailedEvent(event.getOrderId(), e.getReason()));
}
}
}
Orchestration: A single Orchestrator coordinates the entire flow:
Order Orchestrator:
Step 1: call Order Service -> CREATE
Step 2: call Inventory Service -> RESERVE
Step 3: call Payment Service -> CHARGE
Step 4: call Shipping Service -> SCHEDULE
If Step 3 fails:
call Inventory Service -> RELEASE (compensation)
call Order Service -> CANCEL (compensation)
// Orchestration with Spring State Machine or Temporal
@Component
public class PlaceOrderSaga {
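@Autowired private OrderClient orderClient;
@Autowired private InventoryClient inventoryClient;
@Autowired private PaymentClient paymentClient;
@Autowired private SagaStateRepository sagaStateRepo; // repository type name is assumed
// Deliberately NOT @Transactional: each sagaStateRepo.save() must commit on its own,
// so saga progress survives a crash and is not rolled back when a step throws.
public void execute(PlaceOrderCommand command) {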
SagaState state = sagaStateRepo.create(command);
try {
// Step 1
state.setOrderId(orderClient.createOrder(command));
sagaStateRepo.save(state);
// Step 2
state.setReservationId(inventoryClient.reserve(state.getOrderId(), command.getItems()));
sagaStateRepo.save(state);
// Step 3
state.setPaymentId(paymentClient.charge(state.getOrderId(), command.getAmount()));
sagaStateRepo.save(state);
state.markCompleted();
sagaStateRepo.save(state);
} catch (Exception e) {
compensate(state);
throw e;
}
}
private void compensate(SagaState state) {
if (state.getPaymentId() != null) {
paymentClient.void_(state.getPaymentId());
}
if (state.getReservationId() != null) {
inventoryClient.release(state.getReservationId());
}
if (state.getOrderId() != null) {
orderClient.cancel(state.getOrderId());
}
}
}
Choreography vs Orchestration trade-offs:
| | Choreography | Orchestration |
|---|---|---|
| Coupling | Loose (events) | Tighter (direct calls) |
| Visibility | Hard to trace flow | Easy to trace (single place) |
| Debugging | Harder | Easier |
| Failure handling | Distributed, harder to guarantee | Centralized, easier to guarantee |
| Scalability | Good | Orchestrator is a potential bottleneck |
Idempotency is required in Sagas:
// Each step must be idempotent -- safe to retry
@Transactional
@KafkaListener(topics = "payment.charge.requested")
public void chargePayment(ChargePaymentCommand command) {
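// Check idempotency key before processing (back this with a unique index on idempotency_key so two concurrent deliveries cannot both pass the check)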
if (paymentRepo.existsByIdempotencyKey(command.getIdempotencyKey())) {
// Already processed, return existing result
return;
}
Payment payment = new Payment(command);
payment.setIdempotencyKey(command.getIdempotencyKey());
paymentRepo.save(payment);
// Process payment...
}
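The essence of an idempotent handler is an atomic "insert if absent" on the idempotency key. In the in-memory sketch below, `ConcurrentHashMap.computeIfAbsent` plays the role that a unique constraint plays in the database; `IdempotentChargeHandler` and the result format are made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class IdempotentChargeHandler {
    // idempotency key -> result of the first (and only) execution
    private final ConcurrentHashMap<String, String> resultsByKey = new ConcurrentHashMap<>();
    private final AtomicInteger chargesExecuted = new AtomicInteger();

    /** Executes the charge at most once per key; repeated calls return the stored result. */
    String handle(String idempotencyKey, long amount) {
        return resultsByKey.computeIfAbsent(idempotencyKey, key -> {
            int n = chargesExecuted.incrementAndGet(); // the side effect we must not repeat
            return "payment-" + n + ":" + amount;
        });
    }

    int chargesExecuted() {
        return chargesExecuted.get();
    }
}
```

Calling `handle` twice with the same key returns the same result and executes the charge once, which is what makes a redelivered message harmless.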
Part 12: Transactional Outbox Pattern
The classic problem: dual write
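// Note: with @Transactional on this method, the commit would only happen after the method returns,
// that is, after the send. TransactionTemplate makes the commit point explicit.
public void placeOrder(OrderRequest request) {
    // DB write: the transaction commits when the callback returns
    Order order = transactionTemplate.execute(status -> orderRepo.save(createOrder(request)));
    // COMMIT (DB OK)
    // Publish message OUTSIDE the transaction
    kafkaTemplate.send("order.placed", new OrderPlacedEvent(order.getId()));
    // If Kafka publish fails -> DB committed, message not sent
    // Order exists but downstream services do not know about it
}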
The reverse is equally broken:
// Publish first, commit after:
kafkaTemplate.send("order.placed", event);
// Kafka OK
// DB commit fails -> Order does not exist, but message was published
// Downstream services process an order that does not exist
There is no way to guarantee atomicity between a DB write and a Kafka message publish using plain Kafka.
Transactional Outbox: the solution
Instead of publishing directly, write the message to an outbox table in the same transaction as the business data. A separate process reads the outbox and publishes.
-- Schema
CREATE TABLE outbox_messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
aggregate_type VARCHAR(100) NOT NULL, -- 'ORDER'
aggregate_id VARCHAR(100) NOT NULL, -- order ID
event_type VARCHAR(100) NOT NULL, -- 'ORDER_PLACED'
payload JSONB NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
published_at TIMESTAMP WITH TIME ZONE, -- NULL = not yet published
retry_count INT DEFAULT 0
);
CREATE INDEX idx_outbox_unpublished ON outbox_messages (created_at)
WHERE published_at IS NULL;
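@Transactional(rollbackFor = Exception.class) // writeValueAsString throws a checked exception, which must also roll back the order
public void placeOrder(OrderRequest request) throws JsonProcessingException {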
Order order = createOrder(request);
orderRepo.save(order);
// Write outbox message in the SAME transaction
OutboxMessage message = OutboxMessage.builder()
.aggregateType("ORDER")
.aggregateId(order.getId().toString())
.eventType("ORDER_PLACED")
.payload(objectMapper.writeValueAsString(new OrderPlacedEvent(order)))
.build();
outboxRepo.save(message);
// Both order and outbox message are committed atomically
// Or both roll back -- no dual write problem
}
// Outbox poller -- runs as a separate component
@Component
public class OutboxPoller {
@Scheduled(fixedDelay = 1000) // Every second
@Transactional
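public void pollAndPublish() {
// With more than one poller instance, findUnpublished should lock the rows it returns
// (for example FOR UPDATE SKIP LOCKED) so that two pollers never publish the same message
List<OutboxMessage> pending = outboxRepo.findUnpublished(50);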
for (OutboxMessage msg : pending) {
try {
kafkaTemplate.send(
topicFor(msg.getEventType()),
msg.getAggregateId(),
msg.getPayload()
).get(5, TimeUnit.SECONDS); // Synchronous send with timeout
msg.markPublished();
outboxRepo.save(msg);
} catch (Exception e) {
msg.incrementRetry();
outboxRepo.save(msg);
log.error("Failed to publish outbox message {}", msg.getId(), e);
}
}
}
}
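Stripped of Spring, Kafka, and SQL, the pattern reduces to two rules: the business row and the outbox row are written together, and a message leaves the outbox only after the broker accepted it. The sketch below models that with in-memory lists; `InMemoryOutbox` is an invented name, the `synchronized` method stands in for a single local transaction, and the broker is any `Predicate` that returns true on success.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

class InMemoryOutbox {
    private final List<String> orders = new ArrayList<>();
    private final List<String> unpublished = new ArrayList<>();

    /** One "transaction": the order and its outbox message are written together. */
    synchronized void placeOrder(String orderId) {
        orders.add(orderId);
        unpublished.add("ORDER_PLACED:" + orderId);
    }

    /** Tries to publish every pending message; a message is removed only after a successful send. */
    synchronized int pollAndPublish(Predicate<String> broker) {
        int published = 0;
        Iterator<String> it = unpublished.iterator();
        while (it.hasNext()) {
            String message = it.next();
            if (broker.test(message)) {
                it.remove();
                published++;
            }
            // on failure the message simply stays in the outbox for the next poll
        }
        return published;
    }

    synchronized int pendingCount() {
        return unpublished.size();
    }
}
```

If the broker is down during one poll, the message is still there on the next poll. The price is possible duplicates when a send succeeds but the "mark published" step does not, which is why consumers need to be idempotent.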
Debezium: Change Data Capture instead of polling
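Debezium reads the PostgreSQL WAL (write-ahead log) and publishes changes to Kafka. No polling is needed, latency is lower (milliseconds instead of ~1 second), and the outbox table is not hit by repeated polling queries. The replication slot Debezium uses still needs monitoring so that unconsumed WAL does not pile up.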
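# Debezium connector configuration (abridged: connection credentials are omitted, and EventRouter's default column names differ from the outbox_messages schema above, so a real setup also needs its field-mapping options)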
{
"name": "outbox-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "postgres",
"database.dbname": "orders",
"table.include.list": "public.outbox_messages",
"transforms": "outbox",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter"
}
}
Why many companies use the Outbox Pattern:
- Amazon, Uber, and Netflix have all documented using this pattern
- Guarantees at-least-once delivery. Combined with idempotent consumers, duplicates have no effect, so each event is processed effectively once
- No distributed transaction or 2PC needed
- Auditable: the outbox table is a complete history of every event sent
Part 13: Production Debugging and Monitoring
Investigating lock contention and deadlocks
-- View all active locks and who is blocking whom
SELECT
blocked.pid AS blocked_pid,
blocked.usename AS blocked_user,
blocked.query AS blocked_query,
now() - blocked.query_start AS blocked_duration,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query,
blocking.state AS blocking_state
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
ORDER BY blocked_duration DESC;
-- View long-running transactions (longer than 1 minute)
SELECT
pid,
now() - xact_start AS txn_duration,
now() - query_start AS query_duration,
state,
wait_event_type,
wait_event,
left(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
AND now() - xact_start > INTERVAL '1 minute'
ORDER BY txn_duration DESC;
-- Kill a long-running transaction (emergency only)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE now() - xact_start > INTERVAL '10 minutes'
AND state != 'idle'
AND pid <> pg_backend_pid(); -- never terminate the session running this query
Connection exhaustion investigation
// HikariCP metrics via Micrometer (Spring Boot Actuator binds these automatically; wire them manually only when it is absent)
@Bean
public MeterRegistryCustomizer<MeterRegistry> hikariMetrics(DataSource dataSource) {
return registry -> {
if (dataSource instanceof HikariDataSource ds) {
ds.setMetricRegistry(registry);
}
};
}
// Metrics:
// hikaricp.connections.active -> connections currently in use
// hikaricp.connections.pending -> requests waiting for a connection (CRITICAL if > 0)
// hikaricp.connections.timeout -> count of connection acquisition timeouts
// hikaricp.connections.acquire -> histogram, time to acquire a connection
// Alert when pending > 5 for 30 seconds -> pool is exhausted
// Enable leak detection
# application.yml
spring:
datasource:
hikari:
leak-detection-threshold: 30000 # 30 seconds without release -> log warning
Hibernate statistics in production
// Warning: generate_statistics has overhead in production
// Enable only when debugging, disable afterward
@Scheduled(fixedRate = 60000) // Every minute
public void logHibernateStats() {
Statistics stats = sessionFactory.getStatistics();
long queryCount = stats.getQueryExecutionCount();
long maxQueryTimeMs = stats.getQueryExecutionMaxTime(); // slowest query since the last clear()
if (maxQueryTimeMs > 1000) { // > 1 second
log.warn("Slow query detected: {}ms, SQL: {}",
maxQueryTimeMs, stats.getQueryExecutionMaxTimeQueryString());
}
// Reset each interval to calculate rates
stats.clear();
}
Slow transaction detection with Micrometer
@Aspect
@Component
public class TransactionMonitoringAspect {
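private final MeterRegistry meterRegistry;
// Constructor injection: a final field must be initialized
public TransactionMonitoringAspect(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
}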
@Around("@annotation(transactional)")
public Object monitorTransaction(ProceedingJoinPoint pjp,
Transactional transactional) throws Throwable {
Timer.Sample sample = Timer.start(meterRegistry);
String methodName = pjp.getSignature().toShortString();
try {
Object result = pjp.proceed();
sample.stop(Timer.builder("transaction.duration")
.tag("method", methodName)
.tag("outcome", "success")
.register(meterRegistry));
return result;
} catch (Throwable t) { // also records Errors, not only Exceptions
sample.stop(Timer.builder("transaction.duration")
.tag("method", methodName)
.tag("outcome", "failure")
.register(meterRegistry));
throw t;
}
}
}
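Alert rules (Prometheus rule syntax):
# Transaction duration P99 > 5 seconds (needs percentile histograms enabled for the timer, otherwise no _bucket series exist)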
- alert: SlowTransactions
expr: histogram_quantile(0.99, rate(transaction_duration_seconds_bucket[5m])) > 5
for: 2m
labels:
severity: warning
# Connection pool pending > 0
- alert: ConnectionPoolPressure
expr: hikaricp_connections_pending > 0
for: 30s
labels:
severity: critical
Part 14: Real Production Incidents
Incident 1: Money transferred twice
Symptom: A customer reports being debited twice for a single transaction. The amounts are identical. Timestamps are 200ms apart.
Investigation:
-- Find duplicate payments within a short time window
SELECT
account_id,
amount,
COUNT(*) as count,
MIN(created_at) as first,
MAX(created_at) as last
FROM payment_transactions
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY account_id, amount
HAVING COUNT(*) > 1;
Root cause: Frontend retry logic combined with a non-idempotent backend.
User clicks "Pay" -> Request 1 sent
Network timeout (30s) -> User sees an error -> Clicks again
Request 1 is still processing (slow because of external payment gateway)
Request 2 is also processed
Both succeed -> Two payments
Fix:
@Transactional
public PaymentResult processPayment(PaymentRequest request) {
// Check idempotency key BEFORE processing
Optional<Payment> existing = paymentRepo.findByIdempotencyKey(
request.getIdempotencyKey()
);
if (existing.isPresent()) {
return PaymentResult.from(existing.get()); // Return previous result
}
Payment payment = createPayment(request);
payment.setIdempotencyKey(request.getIdempotencyKey());
paymentRepo.save(payment);
return PaymentResult.from(payment);
}
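// Database constraint enforces uniqueness even under race conditions (the losing concurrent request fails with a constraint violation instead of creating a second payment):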
// CREATE UNIQUE INDEX idx_payments_idempotency ON payments (idempotency_key);
Incident 2: Inventory drops below zero
Symptom: inventory_stock for product X is -47 after a flash sale. 147 orders succeeded for a product with only 100 in stock.
Root cause: Lost update caused by missing locks.
// Broken code:
@Transactional
public void reserveInventory(Long productId, int quantity) {
Product product = productRepo.findById(productId).orElseThrow();
// 100 concurrent requests all read stock = 100
if (product.getStock() < quantity) {
throw new InsufficientStockException();
}
product.setStock(product.getStock() - quantity); // Lost update!
// Each request sets stock = 100 - 1 = 99, but concurrent reads do not see each other's writes
productRepo.save(product);
}
Fix:
@Transactional
public void reserveInventory(Long productId, int quantity) {
// Atomic UPDATE with check -- no prior read needed
int updated = productRepo.decrementStockIfAvailable(productId, quantity);
if (updated == 0) {
throw new InsufficientStockException();
}
}
// Repository:
@Modifying
@Query("""
UPDATE Product p
SET p.stock = p.stock - :quantity
WHERE p.id = :productId AND p.stock >= :quantity
""")
int decrementStockIfAvailable(@Param("productId") Long productId,
@Param("quantity") int quantity);
// Atomic: check and update in a single SQL statement
// Database-level atomicity eliminates the race condition
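The same "check and update in one atomic step" idea exists in plain Java as a compare-and-set loop. This is an analogy rather than what the database does internally, and `StockCounter` is an invented name, but it makes the difference from read-then-write easy to test.

```java
import java.util.concurrent.atomic.AtomicInteger;

class StockCounter {
    private final AtomicInteger stock;

    StockCounter(int initialStock) {
        this.stock = new AtomicInteger(initialStock);
    }

    /** Decrements only if enough stock is left; check and update happen as one atomic step. */
    boolean decrementIfAvailable(int quantity) {
        while (true) {
            int current = stock.get();
            if (current < quantity) {
                return false; // like "0 rows updated"
            }
            // Succeeds only if nobody changed the value since we read it; otherwise loop and re-read
            if (stock.compareAndSet(current, current - quantity)) {
                return true;
            }
        }
    }

    int available() {
        return stock.get();
    }
}
```

With 147 concurrent requests against a stock of 100, exactly 100 calls succeed and the counter ends at 0 instead of -47.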
Incident 3: Order created with no payment
Symptom: About 50 orders per day are in state “CREATED” with no corresponding payment record.
Root cause: A checked exception did not trigger rollback.
@Transactional
public void placeOrder(OrderRequest request) throws PaymentException {
Order order = createAndSaveOrder(request);
// ORDER is saved inside the transaction
try {
paymentService.charge(order.getId(), request.getAmount());
// PaymentException (checked) is caught here
} catch (PaymentException e) {
log.error("Payment failed for order {}", order.getId(), e);
// Developer assumed the transaction would roll back
// BUT: checked exceptions do NOT trigger rollback by default!
throw e; // Rethrown, but the transaction did not roll back
}
}
// Result: ORDER commits successfully, payment fails -> inconsistent state
Fix:
@Transactional(rollbackFor = PaymentException.class) // Explicit rollback
public void placeOrder(OrderRequest request) throws PaymentException {
Order order = createAndSaveOrder(request);
paymentService.charge(order.getId(), request.getAmount());
// If PaymentException is thrown -> the entire order rolls back
}
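The default rule behind this incident is short enough to restate as plain Java. The sketch below is a simplified restatement, not Spring's code: it supports a single `rollbackFor` class, ignores `noRollbackFor`, and `RollbackRule` and `PaymentDeclinedException` are hypothetical names.

```java
/** Hypothetical checked exception, analogous to PaymentException above. */
class PaymentDeclinedException extends Exception {
    PaymentDeclinedException(String message) {
        super(message);
    }
}

class RollbackRule {
    /**
     * Returns true if a transaction should roll back for the given exception.
     * rollbackFor may be null, meaning "no explicit rule configured".
     */
    static boolean rollsBack(Throwable ex, Class<? extends Throwable> rollbackFor) {
        if (rollbackFor != null && rollbackFor.isInstance(ex)) {
            return true; // explicit rule matches
        }
        // Default: only unchecked exceptions and Errors trigger rollback
        return ex instanceof RuntimeException || ex instanceof Error;
    }
}
```

A `PaymentDeclinedException` with no explicit rule returns false (the transaction commits), while the same exception with `rollbackFor = Exception.class` returns true.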
Incident 4: Connection pool exhausted during peak hours
Symptom: Every day at 9:00 AM, all requests start timing out. Metrics show hikaricp_connections_pending spiking. Database CPU is low; the database is not busy.
Root cause: Transaction scope too wide, including email sending.
@Transactional
public RegistrationResult registerUser(RegistrationRequest request) {
User user = createUser(request);
userRepo.save(user);
sendWelcomeEmail(user); // SMTP call: 2 to 5 seconds, sometimes times out at 30 seconds!
// Connection is held for the entire SMTP call
return RegistrationResult.success(user.getId());
}
9 AM is peak registration time. Each registration holds a connection for 2 to 5 seconds. Pool size is 10, so only 10 concurrent registrations exhaust the pool. Every subsequent request waits and then times out, creating a cascade.
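A quick capacity estimate shows how little headroom that leaves. By Little's law, the maximum sustainable throughput of a pool is its size divided by how long each request holds a connection. The helper below (`PoolCapacity` is an invented name) does that arithmetic; the 20 ms figure in the note after the code is an assumed hold time for a database-only transaction, not a measurement from the incident.

```java
class PoolCapacity {
    /** Little's law: max sustainable throughput = connections / hold time per request. */
    static double maxTransactionsPerSecond(int poolSize, long holdMillis) {
        return poolSize * 1000.0 / holdMillis;
    }
}
```

With a pool of 10, a 2 second hold allows at most 5 registrations per second and a 5 second hold only 2 per second. If the transaction held the connection for an assumed 20 ms instead, the same pool would sustain 500 per second.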
Fix:
@Transactional
public RegistrationResult registerUser(RegistrationRequest request) {
User user = createUser(request);
userRepo.save(user);
return RegistrationResult.success(user.getId());
// Transaction commits here, connection released
}
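// Call email OUTSIDE the transaction, from a DIFFERENT bean:
// calling registerUser() from within the same class would bypass the proxy (self-invocation)
// userRegistrationService is an assumed name for the injected bean that owns registerUser()
public RegistrationResult register(RegistrationRequest request) {
RegistrationResult result = userRegistrationService.registerUser(request); // transaction
sendWelcomeEmail(result.getUserId()); // after transaction commits
return result;
}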
Part 15: Transaction Checklist
Before writing a transaction
[] Does this operation actually need a transaction?
- If it is a single SQL statement -> the DB is already atomic
- If it is a pure read -> @Transactional(readOnly=true) or no annotation needed
[] Which isolation level is appropriate?
- READ COMMITTED: PostgreSQL's default, fits most OLTP operations
- REPEATABLE READ: consistent snapshot (reports, audits)
- SERIALIZABLE: critical correctness (booking, financial)
[] Transaction scope -- include only:
- Database operations
- Exclude: external API calls, file I/O, email/SMS, heavy computation
[] Is idempotency needed?
- If the operation may be retried -> needs an idempotency key
During implementation
[] @Transactional(rollbackFor = Exception.class) when using checked exceptions
[] Propagation is chosen deliberately (do not blindly rely on REQUIRED)
[] No self-invocation (calling @Transactional methods within the same class)
[] Method is public (private methods are never intercepted; protected and package-private methods only with class-based proxies on Spring 6+)
[] @Async and @Transactional are not mixed
[] Locking strategy chosen correctly:
- Optimistic: low contention, retry acceptable
- Pessimistic: high contention, correctness critical
[] Lock ordering is consistent when locking multiple entities (prevents deadlock)
[] readOnly=true on all read-only service methods
Code review red flags
[] @Transactional on all methods without considering scope
[] External API calls inside a @Transactional method
[] try-catch swallowing exceptions without rethrowing -> may hide rollback behavior
[] @Transactional on a private method (silently a no-op)
[] Checked exceptions not declared in rollbackFor
[] EAGER fetching on OneToMany/ManyToMany
[] N+1 queries not detected (requires Hibernate statistics)
[] Outbox pattern missing for critical events
[] Missing idempotency key for payment/financial operations
Production deployment checklist
[] spring.jpa.open-in-view=false
[] HikariCP pool metrics exposed and alerted
[] Transaction duration metrics exposed and alerted (P99 > threshold)
[] Slow query log enabled (log_min_duration_statement = 100ms)
[] Deadlock detection: pg_stat_activity monitoring
[] Connection pool sized correctly:
pool_size = (core_count x 2) + 1 per instance
total = pool_size x number_of_instances < DB max_connections
[] Statement timeout configured (prevents hung transactions):
spring.jpa.properties.jakarta.persistence.query.timeout=30000
Incident response debugging checklist
Symptom: Application timeout / slow
[] Check hikaricp_connections_pending > 0?
-> Pool exhausted -> find long transactions
SELECT pid, now()-xact_start, query FROM pg_stat_activity
WHERE xact_start IS NOT NULL ORDER BY xact_start;
Symptom: High database CPU
[] Check pg_stat_statements for top CPU consumers
[] Check seq_scan in pg_stat_user_tables
[] Run EXPLAIN ANALYZE on slow queries
Symptom: Data inconsistency
[] Check transaction boundaries -- is the operation atomic?
[] Check rollbackFor -- do checked exceptions trigger rollback?
[] Check idempotency -- is there duplicate processing?
[] Check isolation level -- is a stale read causing inconsistency?
Symptom: Deadlock
[] Check the PostgreSQL server log for "deadlock detected"
[] Verify lock ordering is consistent
[] Set a lock_timeout and retry the aborted transaction
[] Reduce transaction scope to decrease lock hold time
Conclusion
Transactions are the abstraction that lets you write code as if you were the only user of the database, while thousands of other users are doing the same thing. That abstraction has costs (lock contention, connection hold time, rollback overhead), and it breaks down as soon as an operation crosses the boundary of a single database.
Three principles that prevent most transaction bugs:
1. Transaction scope equals database operations only. External calls, network I/O, and heavy computation do not belong inside a transaction. Every millisecond inside a transaction is a millisecond a connection and a lock are held.
2. Be explicit about rollback behavior. Do not assume Spring will roll back correctly. Use rollbackFor = Exception.class or be explicit about which exceptions trigger rollback. Test rollback paths, not just happy paths.
3. Idempotency is required, not optional. Any operation that can be retried (due to network timeout, a user clicking again, or a consumer retrying) must be safe to call multiple times. An idempotency key combined with a unique constraint is the simplest reliable pattern.
In microservices, accept that distributed consistency must be achieved differently: through the Saga pattern, the Outbox pattern, and eventual consistency, rather than trying to make ACID work across service boundaries. These patterns are more complex but honest about the fundamental trade-offs of distributed systems.
Transaction bugs rarely manifest immediately, rarely throw a clear exception, and are rarely easy to reproduce. Monitoring, observability, and understanding the underlying mechanisms are the only reliable ways to detect and fix them before they become a production incident at 2 AM.