Friday, March 21, 2008

An Implementation of Pessimistic Locking with JPA

I've worked with a Java application for several years that has relied upon pessimistic locking since before I was hired back in 1999. This was a time before Hibernate and certainly before the newer Java Persistence API. I have rewritten the data layer for some of our transactional processing a few times since then as the application changed from a dot-com to a B2B dot-com to a startup with a successful business, and now, to a department acquired by a large company. The data layer that existed when I joined was an interesting XML object database stored in SQL Server 7. All who touched it will not forget the powerful lesson we learned, so I will not talk anymore of that here. Actually, I do need to mention that it used pessimistic locking for doing transaction processing on those XML clobs. That is the last mention of it. At some point we got a bit smarter and decided that we ought to be using our relational database to store records structured in tables defined by a schema.

After moving to a more typical object-relational mapping in Java, our system still depended upon pessimistic locking to implement transaction processing. This was accomplished in SQL Server 7, and later SQL Server 2000, using REPEATABLE READ isolation for read-only connections and READ COMMITTED isolation for read-write connections. In a read-only connection the REPEATABLE READ isolation ensured that shared locks in the database were held for the duration of a database transaction. Read-write connections relied on doing an UPDATE on rows before reading them to get an exclusive lock that was held for the duration of the database transaction. This effectively allowed concurrent reads, but did not allow writes to be concurrent with reads or other writes. It provided consistent reading of a header object and its line items by locking the header row with either a shared read lock (REPEATABLE READ and a SELECT) or an exclusive write lock (READ COMMITTED and an UPDATE). It worked, but it meant that locks were held on data that was only being read.
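
To make that scheme concrete, here is a minimal JDBC sketch of the two kinds of connections. The table and column names (order_header, id, status) are placeholders for illustration, not our actual schema:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustrative sketch of the two-connection scheme described above.
public class LegacyLockingSketch {

    // Read-only path: REPEATABLE READ holds shared locks until commit,
    // so the header and its line items are read consistently.
    static void readConsistently(Connection readOnly, long headerId) throws Exception {
        readOnly.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
        readOnly.setAutoCommit(false);
        try (PreparedStatement ps = readOnly.prepareStatement(
                "SELECT id, status FROM order_header WHERE id = ?")) {
            ps.setLong(1, headerId);
            try (ResultSet rs = ps.executeQuery()) {
                // ... read the header, then its line items ...
            }
        }
        readOnly.commit(); // shared locks released here
    }

    // Read-write path: READ COMMITTED plus an up-front UPDATE takes an
    // exclusive lock on the header row before anything is read.
    static void lockForWrite(Connection readWrite, long headerId) throws Exception {
        readWrite.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        readWrite.setAutoCommit(false);
        try (PreparedStatement ps = readWrite.prepareStatement(
                // self-assignment changes nothing; it exists only to take the lock
                "UPDATE order_header SET status = status WHERE id = ?")) {
            ps.setLong(1, headerId);
            ps.executeUpdate(); // exclusive lock held until commit or rollback
        }
        // ... safe to read and modify the header and its line items now ...
        readWrite.commit();
    }
}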

SQL Server 2005 provides a feature called row-level versioning that is similar to multiversion concurrency control (MVCC) in PostgreSQL and other databases. SNAPSHOT isolation and READ COMMITTED isolation with row versioning are features of SQL Server 2005 that allow reading without taking any locks. SNAPSHOT isolation ensures that all of the data read in the transaction is unaffected by updates that occur during the read-only transaction. This means we no longer need to take a read lock in REPEATABLE READ isolation in order to have consistency. When writing we still just need to do an UPDATE in order to get an exclusive lock on the database row.
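
As a sketch of what that looks like from Java, assuming the Microsoft JDBC driver (which exposes SNAPSHOT isolation through the vendor constant SQLServerConnection.TRANSACTION_SNAPSHOT) and the same placeholder table as above:

import java.sql.Connection;
import java.sql.PreparedStatement;

// Sketch: lock-free reading with SQL Server 2005 row versioning.
// One-time database configuration, run as an administrator:
//   ALTER DATABASE mydb SET ALLOW_SNAPSHOT_ISOLATION ON;
//   ALTER DATABASE mydb SET READ_COMMITTED_SNAPSHOT ON;
public class SnapshotIsolationSketch {

    static void readWithoutLocks(Connection con, long headerId) throws Exception {
        // SNAPSHOT is a vendor-specific isolation level of the Microsoft driver.
        con.setTransactionIsolation(
                com.microsoft.sqlserver.jdbc.SQLServerConnection.TRANSACTION_SNAPSHOT);
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT id, status FROM order_header WHERE id = ?")) {
            ps.setLong(1, headerId);
            // Every read in this transaction sees data as of the transaction
            // start; no shared locks are taken and writers are not blocked.
            ps.executeQuery();
        }
        con.commit();
    }
}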

Microsoft has a lot of information on the row versioning concurrency features that are new in SQL Server 2005. When reading in SNAPSHOT isolation the database reads "all data that was committed before the start of each transaction." Updates are a little more complex:
Uses row versions to select rows to update. Tries to acquire an exclusive lock on the actual data row to be modified, and if the data has been modified by another transaction, an update conflict occurs and the snapshot transaction is terminated.
This is just what we want. (In PostgreSQL the SERIALIZABLE isolation is nearly identical to the SNAPSHOT isolation of SQL Server. The READ COMMITTED isolation of PostgreSQL is also nearly identical to that of SQL Server when SQL Server's READ_COMMITTED_SNAPSHOT setting is turned on.) It may not be obvious, even if you read the Microsoft link, that this SNAPSHOT mode is effectively optimistic locking implemented on the database server. The JPA specification (section 3.4.3) requires these behaviors of optimistic locking:

If transaction T1 calls lock(entity, LockModeType.READ) on a versioned object, the entity manager must ensure that neither of the following phenomena can occur:

  • P1 (Dirty read): Transaction T1 modifies a row. Another transaction T2 then reads that row and obtains the modified value, before T1 has committed or rolled back. Transaction T2 eventually commits successfully; it does not matter whether T1 commits or rolls back and whether it does so before or after T2 commits.

  • P2 (Non-repeatable read): Transaction T1 reads a row. Another transaction T2 then modifies or deletes that row, before T1 has committed. Both transactions eventually commit successfully.
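
For concreteness, a "versioned object" in the spec's sense is simply an entity with a @Version attribute that the persistence provider checks and increments on every update. A minimal sketch, with illustrative names:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class OrderHeader {
    @Id
    private Long id;

    @Version
    private long version; // maintained entirely by the persistence provider

    private String status;

    // getters and setters omitted
}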



Now I shall explain pessimistic locking using these tools and the Java Persistence API (JPA); it is really pretty simple. The JPA does not specify direct support for an atomic find-and-lock. It has separate find() and lock() methods on the EntityManager. One could do this:
public <T> T findAndLock(EntityManager em, Class<T> c, Object key) {
    T e = em.find(c, key);
    em.lock(e, LockModeType.WRITE); // lock() requires a javax.persistence.LockModeType
    return e;
}

This is pretty good, but it's only an approximation of an atomic find-and-lock method. In a system with high concurrency, we will encounter JPA OptimisticLockExceptions. Let us presume we have an application with a legacy data layer (not JPA) that has this high concurrency, and let us also presume that we cannot remove all of this concurrency. (Often I find the serenity prayer useful for constrained design.) The problem above is that find() loads the row before we lock() it. If another thread also tries to find() and then lock(), only one can succeed; the other will fail, either from the locking rules defined in the JPA specification or from the rules defined for SNAPSHOT isolation. The result of the failure is an exception, and the application code would have to retry the entire transaction, but that is one of the things we must accept we cannot change, not immediately anyway.
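
For illustration, the retry forced on the application might be wrapped like this. The names doWork and maxAttempts are assumptions of the sketch, and note that depending on the provider the OptimisticLockException may instead surface wrapped in another PersistenceException at commit time:

import javax.persistence.EntityManager;
import javax.persistence.EntityTransaction;
import javax.persistence.OptimisticLockException;

public class RetrySketch {

    // Runs the unit of work, retrying when a concurrent writer wins the race.
    static void runWithRetry(EntityManager em, Runnable doWork, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            EntityTransaction tx = em.getTransaction();
            try {
                tx.begin();
                doWork.run();
                tx.commit();
                return;
            } catch (OptimisticLockException e) {
                if (tx.isActive()) {
                    tx.rollback();
                }
                em.clear(); // discard stale managed state before retrying
                if (attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }
}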

There is something we can do to avoid this race condition: do the lock() first rather than the find() first. The lock() method takes an entity as a parameter, but we have a way out of the catch-22 because the entity need not be read from the database yet. The EntityManager.getReference() method gives us an object that looks like the entity we would get from find(), but getReference() may give us "an instance, whose state may be lazily fetched." So we use this code:
public <T> T findAndLock(EntityManager em, Class<T> c, Object key) {
    T e = em.getReference(c, key); // returns a reference; no database read yet
    em.lock(e, LockModeType.WRITE); // locks the row by primary key before loading it
    return e;
}

In my implementation, the reference returned is a proxy instance created with java.lang.reflect.Proxy.newProxyInstance(). The proxy can answer with just the primary key that was passed to getReference(), without loading any entity state. That way the lock() method can get the entity key and lock the row for the entity without first reading it. This removes the race possibility and gives us true pessimistic locking.
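
Here is a rough sketch of how such a reference proxy can answer for the key without touching the database. Because java.lang.reflect.Proxy can only proxy interfaces, the sketch assumes the entity is accessed through an interface with a getId() method; a provider working with concrete entity classes would generate a subclass instead:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public final class ReferenceProxySketch {

    @SuppressWarnings("unchecked")
    public static <T> T create(Class<T> entityInterface, final Object key) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) {
                if ("getId".equals(method.getName())) {
                    return key; // answered from the key; no database access
                }
                // A real proxy would lazily load the entity state here;
                // lock() never needs that, so this sketch stops short.
                throw new UnsupportedOperationException(method.getName());
            }
        };
        return (T) Proxy.newProxyInstance(
                entityInterface.getClassLoader(),
                new Class<?>[] { entityInterface },
                handler);
    }
}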

I hope that some JPA implementations will offer a solution for pessimistic locking without requiring special API extensions. This solution is one that I've implemented in my own partial JPA implementation, and it works very well. I have not used any other JPA implementations or even Hibernate, so there may be a better way to do this. I have only seen references to changing the transaction isolation level, and I hope what I have written explains why that is not a solution.

4 comments:

Unknown said...

Perhaps you can help me please. I am working on a program that uses JPA/Hibernate accessing SQL Server 2005. I want to do something like a SQL statement with "with(holdlock, rowlock)", the kind you would use to keep someone else from getting this record before I have a chance to update it. Would you use optimistic locking with JPA to obtain the same effect? Any ideas on how to go about doing this, or references on the web on how to do this? Thanks.

John Paul said...

One thing you can do is immediately lock the entity after you find it, just as the application using my implementation does.

You can call EntityManager.find() or EntityManager.getReference() to get the entity or its reference and then EntityManager.lock(entity, lockMode) to lock it. You may need to call EntityManager.flush() to force the SQL UPDATE from the lock() call if the implementation is, in my opinion at least, buggy.

However, unless the EntityManager implementation you are using does something similar to what mine does, you'll need to handle a PersistenceException from your call to lock(), since another part of the application may have updated the entity between your find() or getReference() call and your lock() call. An exception will be thrown if your application has high concurrency, with multiple threads or requests trying to lock the same entity simultaneously.
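
In outline, with OrderHeader standing in for your entity class:

import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

// Sketch only: OrderHeader is a placeholder entity with a Long key.
static OrderHeader findAndLockNow(EntityManager em, Long id) {
    OrderHeader header = em.find(OrderHeader.class, id); // or em.getReference(...)
    em.lock(header, LockModeType.WRITE); // fails if another transaction got there first
    em.flush(); // push the locking SQL UPDATE to the database immediately
    return header;
}

// The caller catches the race and retries the whole unit of work:
//   try {
//       OrderHeader h = findAndLockNow(em, id);
//   } catch (PersistenceException e) {
//       // roll back, then retry the transaction
//   }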

My experience with the JPA is only with my own partial implementation as I've integrated it into parts of an existing application. The Hibernate implementation you are using may have options for locking hints that can do true pessimistic locking in a nonstandard way. Check its documentation for more info about locking.

John Paul said...

Actually, I think my use of the term "buggy" above is wrong. As long as your EntityManager is not using FlushModeType.COMMIT, the SQL should be executed immediately. Be sure to read the JPA Javadocs.

Anonymous said...

I just ran across this, and I think you have a misconception of FlushModeType.AUTO (the only alternative to FlushModeType.COMMIT in EJB 3.0). JPA does not require any SQL to be executed immediately when the flush mode is FlushModeType.AUTO. Far from it. It requires in that case only that any pending inserts, updates, and deletes that could change the result of a query in the same transaction be performed before that query is run. Other interactions with the database can be scheduled at the discretion of the EntityManager, at any time up to a flush or transaction commit. FlushModeType.COMMIT removes the requirement for flushing any SQL to the database before an explicit flush or transaction commit. (That can be a significant performance optimization if many entities are being queried, because dirty checks don't need to be performed before queries.)
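
To illustrate the difference (the entity and query here are placeholders):

import javax.persistence.EntityManager;
import javax.persistence.FlushModeType;

// Sketch: OrderHeader is a placeholder entity with a status property.
static void flushModes(EntityManager em, OrderHeader header) {
    header.setStatus("SHIPPED"); // pending UPDATE, not yet sent to the database

    em.setFlushMode(FlushModeType.AUTO);
    // AUTO: the pending UPDATE must be flushed before this query runs,
    // because its result could depend on the change.
    em.createQuery("SELECT h FROM OrderHeader h WHERE h.status = 'SHIPPED'")
            .getResultList();

    em.setFlushMode(FlushModeType.COMMIT);
    // COMMIT: the provider may leave the UPDATE pending and run the query
    // against stale data; nothing need be flushed before commit or flush().
    em.createQuery("SELECT h FROM OrderHeader h WHERE h.status = 'SHIPPED'")
            .getResultList();
}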

Furthermore, the JPA spec explicitly provides that if EntityManager.lock() is implemented by taking out a DB lock, that may be deferred. The lock may be taken out as late as transaction commit time. (Java Persistence API specification, section 3.4.) Individual persistence implementations may choose to take out the lock immediately, but you depend on that behavior at your peril. The behavior of EntityManager.lock() is defined in terms of preventing dirty and non-repeatable reads from happening silently. It does not require them to be altogether prevented. The persistence implementation has the option to detect them after the fact and prevent the transaction from committing successfully.

If you genuinely need a pessimistic lock, then your most reliable option is to execute the appropriate SQL for that as a native query.
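
For example, against SQL Server something like this hypothetical snippet takes an update lock that is held for the rest of the transaction (the table and column names are illustrative):

import javax.persistence.EntityManager;

// Sketch only: must run inside an active transaction for the lock to matter.
static void lockRowNatively(EntityManager em, long headerId) {
    em.createNativeQuery(
            "SELECT id FROM order_header WITH (UPDLOCK, ROWLOCK) WHERE id = ?")
            .setParameter(1, headerId)
            .getSingleResult();
    // The row is now locked against other writers until commit or rollback,
    // so it can be read and updated without racing them.
}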