JPA2: Case-insensitive like matching anywhere

Java, Criteria, EclipseLink, JPA 2.0, Hibernate Criteria

Java Problem Overview


I have been using Hibernate Restrictions with JPA 1.0 (Hibernate as the provider). It defines Restrictions.ilike("column", "keyword", MatchMode.ANYWHERE), which tests case-insensitively whether the keyword matches anywhere in the column.

Now I am using JPA 2.0 with EclipseLink as the provider, so I have to use the "Restrictions" built into JPA 2.0. I found CriteriaBuilder and its like method, and I also found out how to make it match anywhere (although it is awful and manual), but I still haven't figured out how to make it case-insensitive.

Here is my current awful solution:

CriteriaBuilder builder = em.getCriteriaBuilder();
CriteriaQuery<User> query = builder.createQuery(User.class);
EntityType<User> type = em.getMetamodel().entity(User.class);
Root<User> root = query.from(User.class);

// Where   
// important passage of code for question  
query.where(builder.or(builder.like(root.get(type.getDeclaredSingularAttribute("username", String.class)), "%" + keyword + "%"),
        builder.like(root.get(type.getDeclaredSingularAttribute("firstname", String.class)), "%" + keyword + "%"),
        builder.like(root.get(type.getDeclaredSingularAttribute("lastname", String.class)), "%" + keyword + "%")
        ));

// Order By
query.orderBy(builder.asc(root.get("lastname")),
            builder.asc(root.get("firstname")));

// Execute
return em.createQuery(query).
            setMaxResults(PAGE_SIZE + 1).
            setFirstResult((page - 1) * PAGE_SIZE).
            getResultList();

Questions:

Is there any function like the one in the Hibernate driver?

Am I using the JPA 2.0 criteria correctly? This is an awkward and uncomfortable solution compared to Hibernate Restrictions.

Or can anybody help me change my solution to be case-insensitive, please?

Thanks a lot.

Java Solutions


Solution 1 - Java

It may seem a little awkward at first, but it is type-safe. Queries built from strings are not, so their errors only show up at runtime instead of at compile time. You can make the queries more readable by using indentation or by taking each step separately, instead of writing an entire WHERE clause in a single line.

To make your query case-insensitive, convert both your keyword and the compared field to lower case:

query.where(
    builder.or(
        builder.like(
            builder.lower(
                root.get(
                    type.getDeclaredSingularAttribute("username", String.class)
                )
            ), "%" + keyword.toLowerCase() + "%"
        ), 
        builder.like(
            builder.lower(
                root.get(
                    type.getDeclaredSingularAttribute("firstname", String.class)
                )
            ), "%" + keyword.toLowerCase() + "%"
        ), 
        builder.like(
            builder.lower(
                root.get(
                    type.getDeclaredSingularAttribute("lastname", String.class)
                )
            ), "%" + keyword.toLowerCase() + "%"
        )
    )
);
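One detail of the snippet above is worth spelling out: keyword.toLowerCase() with no argument uses the JVM's default locale, so the pattern sent to the database can vary with the machine's settings. The following is a minimal sketch of that effect in plain Java, with no JPA involved. The names KeywordLowerCaseDemo, lowerIn and containsPattern are hypothetical helpers invented for this illustration.

```java
import java.util.Locale;

// Hypothetical helper, not part of JPA: shows that lower-casing depends on the locale.
final class KeywordLowerCaseDemo {

    // Lower-cases the keyword using an explicitly chosen locale.
    static String lowerIn(String keyword, Locale locale) {
        return keyword.toLowerCase(locale);
    }

    // Builds a "match anywhere" LIKE pattern with locale-independent lower-casing.
    static String containsPattern(String keyword) {
        return "%" + lowerIn(keyword, Locale.ROOT) + "%";
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Under Turkish rules 'I' lowers to the dotless i (U+0131), not to 'i'.
        System.out.println(lowerIn("TITLE", turkish));
        // Locale.ROOT gives the locale-independent result.
        System.out.println(lowerIn("TITLE", Locale.ROOT));
        System.out.println(containsPattern("TITLE"));
    }
}
```

Passing Locale.ROOT (or another fixed locale) keeps the generated pattern stable across machines. Whether it agrees with what the database's lower() produces is a separate question.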

Solution 2 - Java

As I commented on the (currently) accepted answer, there is a pitfall in using the DBMS's lower() function on one hand and Java's String.toLowerCase() on the other: the two methods are not guaranteed to produce the same output for the same input string.

I finally found a much safer (yet not bullet-proof) solution, which is to let the DBMS do all the lowering by using a literal expression:

builder.lower(builder.literal("%" + keyword + "%"))

So the complete solution would look like:

query.where(
    builder.or(
        builder.like(
            builder.lower(
                root.get(
                    type.getDeclaredSingularAttribute("username", String.class)
                )
            ), builder.lower(builder.literal("%" + keyword + "%"))
        ), 
        builder.like(
            builder.lower(
                root.get(
                    type.getDeclaredSingularAttribute("firstname", String.class)
                )
            ), builder.lower(builder.literal("%" + keyword + "%"))
        ), 
        builder.like(
            builder.lower(
                root.get(
                    type.getDeclaredSingularAttribute("lastname", String.class)
                )
            ), builder.lower(builder.literal("%" + keyword + "%"))
        )
    )
);

Edit:
As @cavpollo asked me for an example, I had to think twice about my solution and realized it is not that much safer than the accepted answer:

DB value* | keyword | accepted answer | my answer
------------------------------------------------
elie     | ELIE    | match           | match
Élie     | Élie    | no match        | match
Élie     | élie    | no match        | no match
élie     | Élie    | match           | no match

Still, I prefer my solution, as it does not compare the output of two different functions that are merely supposed to work alike. I apply the very same function to both strings, so the comparison becomes more "stable".

A bullet-proof solution would involve the locale, so that SQL's lower() can correctly lower accented characters. (But this goes beyond my humble knowledge.)

*DB value with PostgreSQL 9.5.1 and the 'C' locale
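The table can be reproduced without a database. The sketch below rests on one loud assumption: asciiLower is a hypothetical stand-in for SQL lower() under the 'C' locale, modeled as lowering only the ASCII letters A to Z. All class and method names are invented for this illustration, and the accented letters are written as Unicode escapes (\u00C9 is the upper-case E with acute, \u00E9 the lower-case one).

```java
import java.util.Locale;

// Simulation of the comparison table, without a database.
final class LowerCaseMismatchDemo {

    // ASSUMPTION: stands in for SQL lower() under the 'C' locale,
    // modeled here as lowering only the ASCII letters A-Z.
    static String asciiLower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(c >= 'A' && c <= 'Z' ? (char) (c + 32) : c);
        }
        return sb.toString();
    }

    // Accepted answer: the DBMS lowers the column, Java lowers the keyword.
    static boolean mixedMatch(String dbValue, String keyword) {
        return asciiLower(dbValue).contains(keyword.toLowerCase(Locale.ROOT));
    }

    // This answer: the DBMS lowers both the column and the keyword.
    static boolean dbOnlyMatch(String dbValue, String keyword) {
        return asciiLower(dbValue).contains(asciiLower(keyword));
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"elie", "ELIE"},
            {"\u00C9lie", "\u00C9lie"},
            {"\u00C9lie", "\u00E9lie"},
            {"\u00E9lie", "\u00C9lie"},
        };
        // One line per table row: DB value | keyword | accepted answer | my answer
        for (String[] r : rows) {
            System.out.println(r[0] + " | " + r[1] + " | "
                    + mixedMatch(r[0], r[1]) + " | " + dbOnlyMatch(r[0], r[1]));
        }
    }
}
```

Under that assumption, the two approaches disagree exactly on the rows where the keyword contains the upper-case accented letter, which is what the table shows.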

Solution 3 - Java

This works for me:

CriteriaBuilder critBuilder = em.getCriteriaBuilder();

CriteriaQuery<Users> critQ = critBuilder.createQuery(Users.class);
Root<Users> root = critQ.from(Users.class);

// Upper-case both the column and the search term so the comparison is case-insensitive.
// stringToFind is a placeholder for the text being searched for.
Expression<String> path = root.get("lastName");
Expression<String> upper = critBuilder.upper(path);
Predicate ctfPredicate = critBuilder.like(upper, "%" + stringToFind.toUpperCase() + "%");
critQ.where(critBuilder.and(ctfPredicate));
em.createQuery(critQ.select(root)).getResultList();

Solution 4 - Java

It is easier and more efficient to enforce case insensitivity within the database than in JPA.

  1. Under the SQL 2003, 2006, 2008 standards, you can do this by adding COLLATE SQL_Latin1_General_CP1_CI_AS or COLLATE latin1_general_ci to the following:
  2. In Oracle, you can set NLS session/configuration parameters:

      SQL> ALTER SESSION SET NLS_COMP=LINGUISTIC;
      SQL> ALTER SESSION SET NLS_SORT=BINARY_CI;
      SQL> SELECT ename FROM emp1 WHERE ename LIKE 'McC%e';
    
      ENAME
      ----------------------
      McCoye
      Mccathye
    

    Or, in init.ora (or OS-specific name for initialization parameter file):

     NLS_COMP=LINGUISTIC
     NLS_SORT=BINARY_CI
    

    Binary sorts can be case-insensitive or accent-insensitive. When you specify BINARY_CI as a value for NLS_SORT, it designates a sort that is accent-sensitive and case-insensitive. BINARY_AI designates an accent-insensitive and case-insensitive binary sort. You may want to use a binary sort if the binary sort order of the character set is appropriate for the character set you are using. Use the NLS_SORT session parameter to specify a case-insensitive or accent-insensitive sort:

     Append _CI to a sort name for a case-insensitive sort.
     Append _AI to a sort name for an accent-insensitive and case-insensitive sort. 
    

    For example, you can set NLS_SORT to the following types of values:

     FRENCH_M_AI
     XGERMAN_CI
    

Setting NLS_SORT to anything other than BINARY [with optional _CI or _AI] causes a sort to use a full table scan, regardless of the path chosen by the optimizer. BINARY is the exception because indexes are built according to a binary order of keys. Thus the optimizer can use an index to satisfy the ORDER BY clause when NLS_SORT is set to BINARY. If NLS_SORT is set to any linguistic sort, the optimizer must include a full table scan and a full sort in the execution plan.

Or, if NLS_COMP is set to LINGUISTIC, as above, then sort settings can be applied locally to indexed columns, rather than globally across the database:

    CREATE INDEX emp_ci_index ON emp (NLSSORT(emp_name, 'NLS_SORT=BINARY_CI'));

References: ORA 11g Linguistic Sorting and String Searching; ORA 11g Setting Up a Globalization Support Environment

Solution 5 - Java

If you are using a database such as Postgres, which supports ilike and where ilike performs much better than the lower() function, none of the provided solutions solves the issue properly.

One solution is a custom function.

The HQL query you are writing is:

SELECT * FROM User WHERE (function('caseInSensitiveMatching', name, '%test%')) = true

Here caseInSensitiveMatching is the name of our custom function, name is the path to the property you want to compare, and %test% is the pattern you want to match it against.

The goal is to convert the HQL query into the following SQL query:

SELECT * FROM User WHERE (name ilike '%test%') = true

To achieve this we have to implement our own dialect with our custom function registered:

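    // Imports assume the Hibernate 4.x/5.x package layout.
    import java.util.List;

    import org.hibernate.QueryException;
    import org.hibernate.dialect.PostgreSQL9Dialect;
    import org.hibernate.dialect.function.SQLFunction;
    import org.hibernate.engine.spi.Mapping;
    import org.hibernate.engine.spi.SessionFactoryImplementor;
    import org.hibernate.type.StandardBasicTypes;
    import org.hibernate.type.Type;

    public class CustomPostgreSQL9Dialect extends PostgreSQL9Dialect {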
    	/**
    	 * Default constructor.
    	 */
    	public CustomPostgreSQL9Dialect() {
    		super();
    		registerFunction("caseInSensitiveMatching", new CaseInSensitiveMatchingSqlFunction());
    	}
    
    	private class CaseInSensitiveMatchingSqlFunction implements SQLFunction {
    
    		@Override
    		public boolean hasArguments() {
    			return true;
    		}
    
    		@Override
    		public boolean hasParenthesesIfNoArguments() {
    			return true;
    		}
    
    		@Override
    		public Type getReturnType(Type firstArgumentType, Mapping mapping) throws QueryException {
    			return StandardBasicTypes.BOOLEAN;
    		}
    
    		@Override
    		public String render(Type firstArgumentType, @SuppressWarnings("rawtypes") List arguments,
    				SessionFactoryImplementor factory) throws QueryException {
    
    			if (arguments.size() != 2) {
    				throw new IllegalStateException(
    						"The 'caseInSensitiveMatching' function requires exactly two arguments.");
    			}
    
    			StringBuilder buffer = new StringBuilder();
    
    			buffer.append("(").append(arguments.get(0)).append(" ilike ").append(arguments.get(1)).append(")");
    
    			return buffer.toString();
    		}
    
    	}
    
    }

In our situation, this optimization improved performance by a factor of about 40 compared to the version using the lower function, because Postgres could leverage the index on the corresponding column. The query execution time dropped from 4.5 seconds to 100 ms.

Wrapping the column in lower() prevents efficient use of the index, which is why that version is much slower.

Solution 6 - Java

A desperate workaround for OpenJPA 2.3.0 and PostgreSQL:

public class OpenJPAPostgresqlDictionaryPatch extends PostgresDictionary {

  @Override
  public SQLBuffer toOperation(String op, SQLBuffer selects, SQLBuffer from, SQLBuffer where, SQLBuffer group, SQLBuffer having, SQLBuffer order, boolean distinct, long start, long end, String forUpdateClause, boolean subselect) {
    String whereSQL = where.getSQL();
    int p = whereSQL.indexOf("LIKE");
    int offset = 0;
    while (p != -1) {
      where.replaceSqlString(p + offset, p + offset + 4, "ILIKE");
      p = whereSQL.indexOf("LIKE", p + 1);
      offset++;
    }
    return super.toOperation(op, selects, from, where, group, having, order, distinct, start, end, forUpdateClause, subselect);
  }

}

This is a fragile and ugly workaround for doing a case-insensitive LIKE operation with OpenJPA and a PostgreSQL database. It replaces the LIKE operator with the ILIKE operator in the generated SQL.

Unfortunately, OpenJPA's DBDictionary does not allow operator names to be changed.
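The substitution itself can be tried out on a plain string, independently of OpenJPA. This is only a sketch under stated assumptions: LikeToILikeRewriter and rewrite are hypothetical names, the code works on a String rather than on OpenJPA's SQLBuffer, and it uses a word-boundary regex instead of indexOf so that an existing ILIKE or an identifier that merely contains LIKE is left alone.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the LIKE -> ILIKE substitution on a plain SQL string.
final class LikeToILikeRewriter {

    // Matches LIKE only as a standalone word, so ILIKE and identifiers
    // such as LIKES_COUNT are not matched.
    private static final Pattern LIKE_KEYWORD = Pattern.compile("\\bLIKE\\b");

    // Returns the WHERE fragment with every standalone LIKE replaced by ILIKE.
    static String rewrite(String whereSql) {
        return LIKE_KEYWORD.matcher(whereSql).replaceAll("ILIKE");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("t0.username LIKE ? OR t0.lastname LIKE ?"));
        System.out.println(rewrite("t0.LIKES_COUNT > 1 AND t0.firstname ILIKE ?"));
    }
}
```

It remains fragile in the same way as the workaround: a LIKE that appears inside a quoted SQL literal would still be rewritten.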

Solution 7 - Java

To use Thomas Hunziker's approach with Hibernate's criteria builder, you can provide a dedicated predicate implementation like the following:

public class ILikePredicate extends AbstractSimplePredicate implements Serializable {

    private final Expression<String> matchExpression;

    private final Expression<String> pattern;

    public ILikePredicate(
        CriteriaBuilderImpl criteriaBuilder,
        Expression<String> matchExpression,
        Expression<String> pattern) {
        super(criteriaBuilder);
        this.matchExpression = matchExpression;
        this.pattern = pattern;
    }

    public ILikePredicate(
        CriteriaBuilderImpl criteriaBuilder,
        Expression<String> matchExpression,
        String pattern) {
        this(criteriaBuilder, matchExpression, new LiteralExpression<>(criteriaBuilder, pattern));
    }

    public Expression<String> getMatchExpression() {
        return matchExpression;
    }

    public Expression<String> getPattern() {
        return pattern;
    }

    @Override
    public void registerParameters(ParameterRegistry registry) {
        Helper.possibleParameter(getMatchExpression(), registry);
        Helper.possibleParameter(getPattern(), registry);
    }

    @Override
    public String render(boolean isNegated, RenderingContext renderingContext) {
        String match = ((Renderable) getMatchExpression()).render(renderingContext);
        String pattern = ((Renderable) getPattern()).render(renderingContext);
        return String.format("function('caseInSensitiveMatching', %s, %s) = %s", match, pattern, !isNegated);
    }
}

Solution 8 - Java

Following weltraumpirat's answer: in short, for each desired field on your Root, add the following Predicate to your predicate list:

criteriaBuilder.like(criteriaBuilder.lower(root.get(<desired field on your root>)), "%" + text.toLowerCase(Locale.ROOT) + "%")

Then obtain the TypedQuery, combining the predicates with OR or AND as desired:

entityManager.createQuery(criteriaQuery.where(criteriaBuilder.and(predicateList.toArray(new Predicate[]{}))));

Solution 9 - Java

Please consider using

CriteriaBuilder.like(Expression<String> x, Expression<String> pattern, char escapeChar);

for matching anywhere.
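An escape character matters when the keyword itself may contain the LIKE wildcards % or _. Below is a minimal sketch of building such a pattern in plain Java. LikePatterns, ESCAPE, escape and containsPattern are hypothetical names, and the choice of '!' as the escape character is an assumption made only for this example.

```java
import java.util.Locale;

// Hypothetical helper for building an escaped "match anywhere" LIKE pattern.
final class LikePatterns {

    // ASSUMPTION: '!' is used as the escape character; any character that
    // is passed consistently to the like(...) call would do.
    static final char ESCAPE = '!';

    // Prefixes %, _ and the escape character itself with the escape character.
    static String escape(String keyword) {
        StringBuilder sb = new StringBuilder(keyword.length() + 4);
        for (int i = 0; i < keyword.length(); i++) {
            char c = keyword.charAt(i);
            if (c == '%' || c == '_' || c == ESCAPE) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Lower-cases the keyword, escapes it, and wraps it in % on both sides.
    static String containsPattern(String keyword) {
        return "%" + escape(keyword.toLowerCase(Locale.ROOT)) + "%";
    }

    public static void main(String[] args) {
        System.out.println(containsPattern("50%_Off"));
    }
}
```

The resulting string could then be passed, together with LikePatterns.ESCAPE, to the CriteriaBuilder.like overload that takes a String pattern and an escape character, for example builder.like(builder.lower(path), LikePatterns.containsPattern(keyword), LikePatterns.ESCAPE), where path stands for the attribute expression being searched.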

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      | Original Author
------------------------------------
Question          | Gaim
Solution 1 - Java | weltraumpirat
Solution 2 - Java | Ghurdyl
Solution 3 - Java | alpanko
Solution 4 - Java | Glen Best
Solution 5 - Java | Thomas Hunziker
Solution 6 - Java | mnesarco
Solution 7 - Java | Danny Gräf
Solution 8 - Java | Lunatic
Solution 9 - Java | Phuong Tran