While working on a Symfony project using Doctrine and MySQL, I discover that our application was relying on a doctrine method:
Doctrine query that could be translated like this:
SELECT * FROM User WHERE email COLLATE utf8mb4_0900_ai_ci = <?php echo strtolower('firstname.lastname@example.org'); ?>
How MYSQL manage this:
Assuming that “MySQL is insensitive” is a shortcut that you can easily spot when you check mysql documentation : the official documentation refer to the “default character set” and “collation” settings to define the case sensitivity. MySQL is talking about sensitivity which make it work like if the field was insensitive (default one is
utf8mb4_0900_ai_ci). So, when I hear "MySQL is key insensitive", I "agree" but it is not magic, it's about the way we've setup the database and our table/columns definition.
How PHP manage it:
strtolower function, even if it doesn't take any options or settings it can be impacted by the locale as mentioned in the documentation. (It would be strongly recommended to use
mb_strtolower instead when it come to manage string with accents)
At this point, we have:
MySQL have is own way and settings to manage case-sensitive string that may change depending the used version.
PHP have his own way and configuration to manage how he “lower” strings.
Example could be given on how they both manage differently the “lower” function:
Our issue with that…
Some sensitive features, including our identification process (the
getByEmail()), rely on the fact that MySQL settings and PHP settings must be aligned on how they are managing Case Sensitivity. That's something we mustn't rely on for many reasons (thinking about depending on the release version of both of them). We can have an idea on what could happen with a different configuration with our actual functionals and integration tests that are using SQLite which is "case sensitive" and already have differents behavior. Why using doctrine ORM if our code only work for MySQL ?
How to fix this ?
Using the couple
LOWER(user.email) = LOWER(:email) make our code more reliable and less coupling between our storage engine, his configuration and our PHP setting.
It will ensure that we’re comparing the same thing and could also perfectly work with SQLite, Redshift and other Doctrine ORM supported database engine.
Why it should be fixed:
Trusting Mysql case sensitivity settings for managing some sensitive feature like an identification process, which is not explicit in the code and could be improved in the code too, is not a good idea.
That’s exactly the value of using an ORM: making some queries database agnostic and abstract our database and their settings.
Performance issue with old version of MySQL:
And thats normal: asking mysql engine to identify similar lowercase string make it lowering the value on all the rows. Best alternative is to create an index using expression key to have a functionnal index (introduced in mysql 8.0.13)
CREATE INDEX IDX_userloweremail on User ( (lower(email)) )
With this new index, performances are better and so you don’t have any excuse to continue to use php strtolower function.
Let’s go !