Hibernate Batch/Bulk Insert/Update

How It Works

Hibernate normally executes each statement with PreparedStatement#executeUpdate(); in batch mode it simply calls the standard JDBC method PreparedStatement#addBatch() instead.
Calling session.flush() appears to be what triggers Statement#executeBatch().
In general, addBatch/executeBatch improves insert performance. See: What You Didn't Know About JDBC Batch
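
As a rough illustration of the plain JDBC calls involved, the sketch below queues rows with addBatch() and sends them with executeBatch() every batchSize rows. The customer table, its name column, and the class and method names are assumptions made for this example; this is not Hibernate's internal code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class JdbcBatchSketch {

    /** Number of executeBatch() calls needed to send rowCount rows. */
    static int roundTrips(int rowCount, int batchSize) {
        return (rowCount + batchSize - 1) / batchSize;
    }

    /**
     * Inserts the given names using JDBC batching and returns how many
     * times executeBatch() was called. Table and column are hypothetical.
     */
    static int insertNames(Connection conn, List<String> names, int batchSize)
            throws SQLException {
        String sql = "insert into customer (name) values (?)";
        int flushes = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();              // queue the row, nothing is sent yet
                if (++pending == batchSize) {
                    ps.executeBatch();      // send the queued rows together
                    pending = 0;
                    flushes++;
                }
            }
            if (pending > 0) {              // send the remainder
                ps.executeBatch();
                flushes++;
            }
        }
        return flushes;
    }
}
```

With a batch size of 20, inserting 100 rows this way calls executeBatch() five times instead of executing 100 separate statements.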

Caveats for @Id

Batch insert does not work for IDs that use GenerationType.IDENTITY, apparently because the generated ID has to be returned and set on the entity object after each insert.
- How do persist and merge work in JPA
- Hibernate ORM 5.4.18.Final User Guide
So for batch insert to work properly, use a strategy where the PK value is known in advance, such as @GeneratedValue(strategy = GenerationType.SEQUENCE).

For IDENTITY columns, Hibernate cannot delay the INSERT statement until flush time because the identifier value can only be generated by executing the statement.
For this reason, Hibernate disables JDBC batch inserts for entities using the IDENTITY generator strategy.

Things to Remember

The batch option does not make Hibernate merge insert statements into the form insert into xxx (…) values (…), (…), …. It only calls addBatch.
MySQL
- MySQL batch insert performance can be improved by rewriting multiple insert statements into insert into xxx (…) values (…), (…), (…).
- With the MySQL JDBC driver, add the rewriteBatchedStatements=true option to the JDBC URL.
- For MySQL, the SQL actually generated can be checked in the log by adding the options logger=com.mysql.jdbc.log.Slf4JLogger&profileSQL=true (in Connector/J 8.x the logger class is com.mysql.cj.log.Slf4JLogger).
- hibernate-batch-size-test: a MySQL-based batch example
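
To make the multi-row form concrete, here is a small sketch that builds the kind of statement described above from a single-row insert. It only illustrates the shape of the SQL; it is not the MySQL driver's actual rewriting code, and the host, schema, and class names are assumptions.

```java
import java.util.Locale;

final class MultiRowRewriteSketch {

    // Hypothetical host and schema; only the option name comes from the notes above.
    static final String JDBC_URL =
            "jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true";

    /** Repeats the values tuple of a single-row insert `rows` times. */
    static String rewrite(String singleRowInsert, int rows) {
        String keyword = "values";
        int idx = singleRowInsert.toLowerCase(Locale.ROOT).lastIndexOf(keyword);
        if (idx < 0 || rows < 1) {
            throw new IllegalArgumentException("expected an insert ... values (...) statement");
        }
        String head = singleRowInsert.substring(0, idx + keyword.length());
        String tuple = singleRowInsert.substring(idx + keyword.length()).trim();
        StringBuilder sql = new StringBuilder(head).append(' ');
        for (int i = 0; i < rows; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(tuple);
        }
        return sql.toString();
    }
}
```

For example, rewrite("insert into customer (name) values (?)", 3) yields insert into customer (name) values (?), (?), (?).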

Basic Configuration

Hibernate Property

<prop key="hibernate.jdbc.batch_size">[number of inserts per batch]</prop>

After processing the configured number of entities, either end the transaction or call session.flush(); session.clear();
session.clear() is needed because the inserted entities remain in the first-level cache and could otherwise exhaust memory.
To verify that batching happens, check the logs written by org.hibernate.jdbc.AbstractBatcher, or by org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl and org.hibernate.engine.jdbc.batch.internal.BatchingBatch. The logger differs between Hibernate versions.
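
The same setting can also be expressed as a plain java.util.Properties entry, together with the "flush every N entities" rule described above. This is a minimal sketch; the class and method names are made up for illustration, and how the properties reach Hibernate depends on your bootstrap code.

```java
import java.util.Properties;

final class HibernateBatchSettings {

    /** Builds the batch-size property as a plain Properties object. */
    static Properties batchProperties(int batchSize) {
        Properties props = new Properties();
        props.setProperty("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        return props;
    }

    /**
     * True when the entity at zero-based position `index` completes a batch,
     * which is the point to call session.flush() and session.clear().
     */
    static boolean shouldFlush(int index, int batchSize) {
        return (index + 1) % batchSize == 0;
    }
}
```

Keeping the flush interval equal to hibernate.jdbc.batch_size means each flush sends exactly one full JDBC batch.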

Batch Insert with Associations

Hibernate Batch Processing – Why you may not be using it. (Even if you think you are): caveats when using Hibernate batching

Even with batching configured, the following code does not insert in batches.

public void doBatch() {
    Session session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    for ( int i=0; i<100000; i++ ) {
        Customer customer = new Customer(.....);
        Cart cart = new Cart(...);
        // note we are adding the cart to the customer, so this object
        // needs to be persisted as well
        customer.setCart(cart);
        session.save(customer);
        if ( (i + 1) % 20 == 0 ) { //20, same as the JDBC batch size
            //flush a batch of inserts and release memory:
            session.flush();
            session.clear();
        }
    }
    tx.commit();
    session.close();
}

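The reason is that Customer and Cart are associated: each Customer insert is immediately followed by its Cart insert, and then by the next Customer insert. A JDBC batch can only accumulate repetitions of the same statement, so Customer inserts are never grouped with other Customer inserts, nor Cart inserts with other Cart inserts.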

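With the following settings, Hibernate sorts the insert and update statements so that statements of the same kind are grouped and batched together.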

<prop key="hibernate.order_inserts">true</prop>
<prop key="hibernate.order_updates">true</prop>
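
The effect of interleaving can be seen with a small simulation. It assumes a batch has to be sent whenever the target table changes or the batch is full, and it groups statements by table in order of first appearance. This is a simplified model written for this note, not Hibernate's actual ordering algorithm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InsertOrderingSketch {

    /** Counts batches when a batch ends on every table change or when full. */
    static int countBatches(List<String> tables, int batchSize) {
        int batches = 0;
        int pending = 0;
        String current = null;
        for (String table : tables) {
            if (!table.equals(current) || pending == batchSize) {
                if (pending > 0) {
                    batches++;      // the previous batch has to be sent now
                }
                current = table;
                pending = 0;
            }
            pending++;
        }
        if (pending > 0) {
            batches++;              // send whatever is left
        }
        return batches;
    }

    /** Groups statements by table, keeping the order in which tables first appear. */
    static List<String> groupByTable(List<String> tables) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String table : tables) {
            counts.merge(table, 1, Integer::sum);
        }
        List<String> grouped = new ArrayList<>();
        counts.forEach((table, n) -> {
            for (int i = 0; i < n; i++) {
                grouped.add(table);
            }
        });
        return grouped;
    }
}
```

In this model, three interleaved customer/cart pairs need six batches of one statement each, while the grouped order needs only two.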

Versioned Data

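Setting hibernate.jdbc.batch_versioned_data=true allows batching for entities that have a @Version column.
These statements carry a condition on the @Version field in their where clause, and Hibernate relies on the row counts returned by the batch to detect stale data.
Depending on the Hibernate version it may already be enabled (it defaults to true since Hibernate 5.0), but setting it to true explicitly is generally the safer choice.
Note, however, that although batch update works with hibernate.jdbc.batch_versioned_data=true, some JDBC drivers reportedly return incorrect update counts for batched statements. In that case the setting must be changed to false, and batch update will then not work for entities with @Version.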
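
To show why the driver's row counts matter, the sketch below classifies the values that Statement#executeBatch() can return for one versioned update. The Outcome enum and the class name are invented for this example, and the logic is a simplified model rather than Hibernate's actual check.

```java
import java.sql.Statement;

final class BatchRowCountCheck {

    enum Outcome { OK, UNKNOWN, FAILED, STALE }

    /**
     * Interprets one element of the int[] returned by executeBatch()
     * for an update that is expected to touch exactly one row.
     */
    static Outcome check(int updateCount) {
        if (updateCount == 1) {
            return Outcome.OK;          // the row with the expected version was updated
        }
        if (updateCount == Statement.SUCCESS_NO_INFO) {
            return Outcome.UNKNOWN;     // the driver did not report a row count
        }
        if (updateCount == Statement.EXECUTE_FAILED) {
            return Outcome.FAILED;      // the statement itself failed
        }
        return Outcome.STALE;           // e.g. 0 rows: the version did not match
    }
}
```

A driver that reports SUCCESS_NO_INFO or a wrong count leaves the optimistic-lock check unable to tell a successful update from a stale one, which is the situation the false setting exists for.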

권남
