Post by Bruce Momjian
Post by Peter Geoghegan
I have long advocated adopting ICU as our de facto standard "collation
provider", primarily so that we can directly control collations and
collation versioning. I think that doing this would solve many
problems. Besides, even SQLite has optional ICU support. PostgreSQL is
the only major database system that I'm aware of that relies on
operating system collations exclusively.
I am hopeful ICU has improved enough since we last researched it that
support for it will soon be added.
There is a patch available that is not ready to be submitted, and
doesn't have a real advocate, but is at least enough to convince me
that it's very doable. Performance is certainly no impediment to
adopting ICU, even without considering that it effectively
re-introduces abbreviated keys for text when the C collation is not
used.
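
For anyone unfamiliar with the term, here is a rough sketch of the
abbreviated-key idea in Python. It is only an illustration, not the
PostgreSQL or ICU implementation: full_key() is a hypothetical stand-in
for an expensive collation sort key (the kind strxfrm() or ICU would
produce), and the 8-byte prefix length is an assumption made for the
example. The sort compares the cheap fixed-size prefixes first and falls
back to the full key only when the prefixes tie.

```python
import functools

ABBREV_LEN = 8  # assumed prefix size for this sketch


def full_key(s: str) -> bytes:
    """Hypothetical stand-in for an expensive collation sort key."""
    return s.casefold().encode("utf-8")


def abbreviate(key: bytes) -> bytes:
    """Keep a fixed-size, zero-padded prefix of the full sort key."""
    return key[:ABBREV_LEN].ljust(ABBREV_LEN, b"\x00")


def sort_with_abbreviated_keys(strings):
    # Compute both keys once per string rather than once per comparison.
    decorated = []
    for s in strings:
        key = full_key(s)
        decorated.append((abbreviate(key), key, s))

    def compare(a, b):
        # Cheap path: the fixed-size prefixes usually decide the order.
        if a[0] != b[0]:
            return -1 if a[0] < b[0] else 1
        # Tie on the prefix: fall back to the authoritative full key.
        if a[1] != b[1]:
            return -1 if a[1] < b[1] else 1
        return 0

    decorated.sort(key=functools.cmp_to_key(compare))
    return [s for _, _, s in decorated]


words = ["banana", "Apple", "cherry", "apple pie", "apple"]
result = sort_with_abbreviated_keys(words)
```

The prefix comparison is only safe because the sort key is a byte
string whose binary order matches the collation order, so the whole
scheme depends on the collation provider producing keys that are
consistent with its own comparisons.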
The best argument for ICU is the evidently lax attitude that the glibc
people have towards the correctness and consistency of their
collations:
https://bugzilla.redhat.com/show_bug.cgi?id=1320356#c3
Here, Carlos O'Donell, a glibc committer, says "Regarding (b), the
collations in glibc may change from build to build depending on
changes in the algorithms or locales. You cannot rely on the collation
stay the same once the process exits (nor can you rely upon it via a
shared memory mapping to another process sorting strings in memory)".
Frankly, we have no excuse for not heeding his warning.
I'm not annoyed at the glibc people for taking this position. There
is, quite simply, a misalignment of incentives. For the glibc people,
the assumption is that any problem with collations leads only to
slight annoyance from end users, as when the GUI produces subtly wrong
ordering. Whereas, for us, any inconsistency is an extremely serious
problem. Here we have the maintainers of glibc telling us that they
consider it acceptable for such inconsistencies to appear at any time.
Surely that isn't good enough.
ICU as a project has every incentive to see things the same way as we
do. The library explicitly decouples collation rule versions from
algorithm versions. All of this is carefully considered, for the
benefit of the numerous major database systems that use ICU.
--
Peter Geoghegan