Understanding effective data fetching with JPA Entity Graphs (Part-1)

JPA Sep 18, 2020

Part-1: The problem

JPA provides two fetching strategies for entities that are related to each other (through OneToOne, OneToMany, and so on):

  • FetchType.LAZY
  • FetchType.EAGER

This configuration alone is criticised for being declared statically and therefore applied to every fetch call. An example makes clear what exactly that means. Consider the following entities:

Entity-Relation Diagram

City Entity:

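import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity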
public class City {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String name;

    // getters and setters

}

User Entity:

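import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity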
@Table(name = "users")
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String name;
    private String email;
    private String password;
    private String phone;

    // getters and setters
}

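Address Entity:

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.OneToOne;

@Entity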
public class Address {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String title;
    private String street;
    private String flat;
    private String postalCode;

    @OneToOne
    @JoinColumn(name = "city_id")
    private City city;

    @ManyToOne
    @JoinColumn(name = "user_id")
    private User user;

    // getters and setters
}

I have inserted sample data into the database: three addresses (A1, A2, A3) that reference two cities (C1, C2) and two users (U1, U2).

Keep in mind that Address is the owning side of both relationships because its table holds the foreign keys; it is also called the child entity.

Now, if I invoke the find-all method of the address repository, addressRepository.findAll(), the following queries are executed:

-- Hibernate:
    select
        address0_.id as id1_0_,
        address0_.city_id as city_id6_0_,
        address0_.flat as flat2_0_,
        address0_.postal_code as postal_c3_0_,
        address0_.street as street4_0_,
        address0_.title as title5_0_,
        address0_.user_id as user_id7_0_
    from
        address address0_
-- Hibernate:
    select
        city0_.id as id1_1_0_,
        city0_.name as name2_1_0_
    from
        city city0_
    where
        city0_.id=?
-- Hibernate:
    select
        user0_.id as id1_2_0_,
        user0_.email as email2_2_0_,
        user0_.name as name3_2_0_,
        user0_.password as password4_2_0_,
        user0_.phone as phone5_2_0_
    from
        users user0_
    where
        user0_.id=?
-- Hibernate:
    select
        user0_.id as id1_2_0_,
        user0_.email as email2_2_0_,
        user0_.name as name3_2_0_,
        user0_.password as password4_2_0_,
        user0_.phone as phone5_2_0_
    from
        users user0_
    where
        user0_.id=?
-- Hibernate:
    select
        city0_.id as id1_1_0_,
        city0_.name as name2_1_0_
    from
        city city0_
    where
        city0_.id=?

Step by step, this is what happens:

  • Find all addresses -> returns A1, A2, A3
  • For A1, find the referenced city -> returns C1
  • For A1, find the referenced user -> returns U1
  • For A2, the referenced city is already in the persistence context, so it is not retrieved again; find the referenced user -> returns U2
  • For A3, the referenced user is already in the persistence context, so it is not retrieved again; find the referenced city -> returns C2

This behaviour is thanks to the first-level cache. Once an entity is in the managed state, the EntityManager keeps it in the persistence context, so it is not retrieved from the database again.
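The effect of the first-level cache on the number of queries can be imitated in plain Java. The sketch below is only a simulation, not Hibernate code: AddressRow, findAllEagerly and the "select *" strings are hypothetical names made up for illustration, and the ids are assumed so that A1 and A2 share a city and A3 reuses the user of A1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java simulation of the first-level cache; this is not Hibernate code. */
class FirstLevelCacheSketch {

    /** Hypothetical stand-in for one address row; only the foreign keys matter here. */
    static final class AddressRow {
        final long id;
        final long cityId;
        final long userId;

        AddressRow(long id, long cityId, long userId) {
            this.id = id;
            this.cityId = cityId;
            this.userId = userId;
        }
    }

    /** Simulates findAll() with EAGER to-one associations and returns the SELECTs issued. */
    static List<String> findAllEagerly(List<AddressRow> rows) {
        List<String> statements = new ArrayList<>();
        Set<String> persistenceContext = new HashSet<>(); // keys of entities already managed
        statements.add("select * from address");
        for (AddressRow row : rows) {
            load("city", row.cityId, persistenceContext, statements);
            load("users", row.userId, persistenceContext, statements);
        }
        return statements;
    }

    /** Issues a simulated SELECT only when the entity is not yet in the persistence context. */
    private static void load(String table, long id, Set<String> context, List<String> statements) {
        // Set.add returns false if the key is already present, i.e. a cache hit.
        if (context.add(table + "#" + id)) {
            statements.add("select * from " + table + " where id=" + id);
        }
    }

    public static void main(String[] args) {
        // Assumed ids: A1 -> (C1, U1), A2 -> (C1, U2), A3 -> (C2, U1).
        List<AddressRow> rows = Arrays.asList(
                new AddressRow(1, 1, 1),
                new AddressRow(2, 1, 2),
                new AddressRow(3, 2, 1));
        for (String statement : findAllEagerly(rows)) {
            System.out.println(statement);
        }
    }
}
```

With the assumed data the simulation issues five statements: one for the addresses, two for cities and two for users, in the same order as the Hibernate log above.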

But why are there so many calls? The problem lies in the fetching strategy. The to-one annotations on the child entity (OneToOne, ManyToOne) use FetchType.EAGER by default, whereas OneToMany and ManyToMany default to FetchType.LAZY. So when an address is retrieved from the database, JPA immediately loads its parents as well.

If each table had a very large number of rows, the list of separate calls would be very long as well, which is obviously bad for performance. Eager fetching causes this issue, known as the N+1 problem: we invoke 1 select query, and it triggers up to N additional selects for the associated entities. That is why lazy fetching is recommended. So if I update my Address entity to use lazy fetching:

@OneToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "city_id")
private City city;

@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "user_id")
private User user;

and invoke the find-all method again, the Hibernate log shows only 1 select query:

-- Hibernate:
    select
        address0_.id as id1_0_,
        address0_.city_id as city_id6_0_,
        address0_.flat as flat2_0_,
        address0_.postal_code as postal_c3_0_,
        address0_.street as street4_0_,
        address0_.title as title5_0_,
        address0_.user_id as user_id7_0_
    from
        address address0_

The retrieved address objects contain proxies to the parent objects rather than the actual entities. If I access the city or user of a retrieved address within the transaction, it is loaded from the database with an additional query. And if I do this in a loop over the addresses, the N+1 problem appears again.
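How lazy access in a loop brings the extra queries back can also be imitated in plain Java. This is again a simulation under stated assumptions, not Hibernate's proxy mechanism: LazyRef, Address and findAll are hypothetical names, a counter stands in for the SQL log, and every address is assumed to point at a different city so that no load is saved by the persistence context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Plain-Java simulation of lazy loading; this is not Hibernate's proxy implementation. */
class LazyLoadingSketch {

    /** Stand-in for a proxy: runs the loader on first access only. */
    static final class LazyRef<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        LazyRef(Supplier<T> loader) {
            this.loader = loader;
        }

        T get() {
            if (!loaded) {
                value = loader.get(); // the simulated SELECT happens here
                loaded = true;
            }
            return value;
        }
    }

    /** Hypothetical address holding a lazy reference to its city name. */
    static final class Address {
        final long id;
        final LazyRef<String> city;

        Address(long id, LazyRef<String> city) {
            this.id = id;
            this.city = city;
        }
    }

    /** Simulates findAll() with LAZY associations: one SELECT, proxies only. */
    static List<Address> findAll(int count, AtomicInteger selects) {
        selects.incrementAndGet(); // select ... from address
        List<Address> result = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            final long cityId = i; // assumption: each address has its own city
            result.add(new Address(i, new LazyRef<String>(() -> {
                selects.incrementAndGet(); // select ... from city where id=?
                return "City-" + cityId;
            })));
        }
        return result;
    }

    public static void main(String[] args) {
        AtomicInteger selects = new AtomicInteger();
        List<Address> addresses = findAll(3, selects);
        System.out.println("selects after findAll: " + selects.get());
        for (Address address : addresses) {
            address.city.get(); // touching the proxy triggers one more select
        }
        System.out.println("selects after loop: " + selects.get());
    }
}
```

For three addresses the counter goes from 1 after findAll to 4 after the loop, which is the 1 + N pattern. Iterating a second time adds nothing, because each reference is already initialized.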

Another problem is that sometimes I would like to retrieve the addresses lazily, and sometimes together with their user and/or city information. Setting the fetch type on the entity affects every retrieval and gives no flexibility to use different fetch types for different calls. That's why we can use entity graphs to define multiple fetch plans for different purposes.

Possible solutions, entity graphs in particular, are discussed in Part-2, so keep reading.
