Discussion:
[DOCS] Pg_upgrade and collation
Bruce Momjian
2016-06-17 15:43:11 UTC
The attached patch documents that pg_upgrade requires old/new servers to
use compatible collation library versions as well.

I would like to apply this to all PG branches.
--
Bruce Momjian <***@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Alvaro Herrera
2016-06-17 21:51:54 UTC
Post by Bruce Momjian
The attached patch documents that pg_upgrade requires old/new servers to
use compatible collation library versions as well.
--- 61,68 ----
checking for compatible compile-time settings, including 32/64-bit
binaries. It is important that
any external modules are also binary compatible, though this cannot
! be checked by <application>pg_upgrade</>. Compatible collation
! library versions must also be used.
</para>
I think it would be useful to indicate what to do if they are not
compatible.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-docs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs
Bruce Momjian
2016-06-17 22:01:59 UTC
Post by Alvaro Herrera
Post by Bruce Momjian
The attached patch documents that pg_upgrade requires old/new servers to
use compatible collation library versions as well.
Well, this is a much larger issue than pg_upgrade, e.g. moving a data
directory from one cluster to another with a different collation library
version could also cause problems, and I don't know that is documented
at all.

If we want to go larger, we have to do this in a more central location.
Post by Alvaro Herrera
Post by Bruce Momjian
--- 61,68 ----
checking for compatible compile-time settings, including 32/64-bit
binaries. It is important that
any external modules are also binary compatible, though this cannot
! be checked by <application>pg_upgrade</>. Compatible collation
! library versions must also be used.
</para>
I think it would be useful to indicate what to do if they are not
compatible.
The indexes don't work reliably. We don't document what happens if
shared objects don't match either, but again, if we want to clarify
this, we need to do it more centrally. Ideas?
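To make the failure mode concrete, here is a minimal sketch of why an index built under one collation stops working under another. It is a toy model, not PostgreSQL code: the two sort-key functions are hypothetical stand-ins for two collation library versions, and the "index" is just a sorted list searched with binary search.

```python
import bisect

# Two hypothetical collation "versions", expressed as sort-key functions.
# They are illustrative stand-ins, not real glibc or ICU behaviour.
def sort_key_v1(s):
    # Version 1: plain code-point order.
    return s

def sort_key_v2(s):
    # Version 2: hyphens are ignored when comparing.
    return s.replace("-", "")

def build_index(values, key):
    """Return the values sorted under the given collation (a toy index)."""
    return sorted(values, key=key)

def index_contains(index, value, key):
    """Binary-search the index, trusting that it is ordered by `key`."""
    keys = [key(v) for v in index]
    wanted = key(value)
    pos = bisect.bisect_left(keys, wanted)
    # Walk over entries that compare equal under this collation.
    while pos < len(index) and keys[pos] == wanted:
        if index[pos] == value:
            return True
        pos += 1
    return False

values = ["a-b", "aa", "ab"]
old_index = build_index(values, sort_key_v1)

# Lookup with the collation the index was built under: the row is found.
found_before = index_contains(old_index, "aa", sort_key_v1)
# Same index, new collation: the search goes to the wrong place.
found_after = index_contains(old_index, "aa", sort_key_v2)
# Rebuilding the index under the new collation restores correct lookups.
rebuilt = build_index(values, sort_key_v2)
found_rebuilt = index_contains(rebuilt, "aa", sort_key_v2)
```

In this model the row is still physically present after the collation change; the search simply cannot find it, which is the sense in which the indexes "don't work reliably".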
Bruce Momjian
2016-06-17 22:09:35 UTC
Post by Bruce Momjian
Post by Bruce Momjian
The attached patch documents that pg_upgrade requires old/new servers to
use compatible collation library versions as well.
Well, this is a much larger issue than pg_upgrade, e.g. moving a data
directory from one cluster to another with a different collation library
version could also cause problems, and I don't know that is documented
at all.
If we want to go larger, we have to do this in a more central location.
Frankly, pg_upgrade is, by definition, upgrading on the same server, so
I don't even see how they could have mismatched collation library
versions, but it seemed good to document it. The larger issue of moving
clusters is a separate issue that needs documentation somewhere else.
Alvaro Herrera
2016-06-17 22:11:58 UTC
Post by Bruce Momjian
Post by Bruce Momjian
Post by Bruce Momjian
The attached patch documents that pg_upgrade requires old/new servers to
use compatible collation library versions as well.
Well, this is a much larger issue than pg_upgrade, e.g. moving a data
directory from one cluster to another with a different collation library
version could also cause problems, and I don't know that is documented
at all.
If we want to go larger, we have to do this in a more central location.
Frankly, pg_upgrade is, by definition, upgrading on the same server, so
I don't even see how they could have mismatched collation library
versions, but it seemed good to document it.
By this argument, the proposed patch seems pointless to me.
Post by Bruce Momjian
The larger issue of moving clusters is a separate issue that needs
documentation somewhere else.
Sure.
Bruce Momjian
2016-06-20 15:16:36 UTC
Post by Alvaro Herrera
Post by Bruce Momjian
Frankly, pg_upgrade is, by definition, upgrading on the same server, so
I don't even see how they could have mismatched collation library
versions, but it seemed good to document it.
By this argument, the proposed patch seems pointless to me.
Post by Bruce Momjian
The larger issue of moving clusters is a separate issue that needs
documentation somewhere else.
Sure.
In looking at the docs, it seems it would go in the Backup section
somewhere:

https://www.postgresql.org/docs/9.6/static/backup.html

Seems it would apply to both of these backup sections:

24.2. File System Level Backup
24.3. Continuous Archiving and Point-in-Time Recovery (PITR)

and also here:

25.2. Log-Shipping Standby Servers

It seems odd to put it in all of these places, but where can we
centrally put it?
Peter Geoghegan
2016-06-28 21:58:58 UTC
On Fri, Jun 17, 2016 at 2:51 PM, Alvaro Herrera
Post by Bruce Momjian
--- 61,68 ----
checking for compatible compile-time settings, including 32/64-bit
binaries. It is important that
any external modules are also binary compatible, though this cannot
! be checked by <application>pg_upgrade</>. Compatible collation
! library versions must also be used.
</para>
Unfortunately, the reality is that as things stand, there is no way to
test compatibility on all platforms. Glibc does have a notion of
collation versioning, though [1].
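As a rough illustration of what a platform-dependent check could look like, the sketch below records the C library name and version with Python's standard `platform.libc_ver()` and compares two such fingerprints. The function names are hypothetical, and the library version is only a coarse proxy: matching versions do not prove that the collations match, and on platforms where the C library cannot be identified the check has to give up.

```python
import platform

def libc_fingerprint():
    """Return e.g. 'glibc-2.31', or 'unknown' if the C library is not identified."""
    name, version = platform.libc_ver()
    if not name or not version:
        return "unknown"
    return f"{name}-{version}"

def may_share_data_directory(old_fp, new_fp):
    """Conservative check: only identical, known fingerprints pass.

    Passing is a hint, not a guarantee, that collations sort the same way.
    """
    if "unknown" in (old_fp, new_fp):
        return False
    return old_fp == new_fp

# Fingerprint of the machine this runs on; the value depends on the platform.
local_fp = libc_fingerprint()
```

A fingerprint recorded when the cluster was created could then be compared with the one on the machine that is about to open the data directory.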

I have long advocated adopting ICU as our de facto standard "collation
provider", primarily so that we can directly control collations and
collation versioning. I think that doing this would solve many
problems. Besides, even SQLite has optional ICU support. PostgreSQL is
the only major database system that I'm aware of that relies on
operating system collations exclusively.

I've avoided committing to work on it because I'm concerned that it
would not be well received.

[1] https://www.gnu.org/software/autoconf/manual/autoconf-2.63/html_node/Special-Shell-Variables.html
--
Peter Geoghegan
Bruce Momjian
2016-06-28 22:20:11 UTC
Post by Peter Geoghegan
On Fri, Jun 17, 2016 at 2:51 PM, Alvaro Herrera
Post by Bruce Momjian
--- 61,68 ----
checking for compatible compile-time settings, including 32/64-bit
binaries. It is important that
any external modules are also binary compatible, though this cannot
! be checked by <application>pg_upgrade</>. Compatible collation
! library versions must also be used.
</para>
Unfortunately, the reality is that as things stand, there is no way to
test compatibility on all platforms. Glibc does have a notion of
collation versioning, though [1].
Yes, the patch text is clearly weasel words, in that we can't explain how
to detect incompatibility.
Post by Peter Geoghegan
I have long advocated adopting ICU as our de facto standard "collation
provider", primarily so that we can directly control collations and
collation versioning. I think that doing this would solve many
problems. Besides, even SQLite has optional ICU support. PostgreSQL is
the only major database system that I'm aware of that relies on
operating system collations exclusively.
I am hopeful ICU has improved enough since we last researched that
support for it will soon be added.
Peter Geoghegan
2016-06-28 22:43:18 UTC
Post by Bruce Momjian
Post by Peter Geoghegan
I have long advocated adopting ICU as our de facto standard "collation
provider", primarily so that we can directly control collations and
collation versioning. I think that doing this would solve many
problems. Besides, even SQLite has optional ICU support. PostgreSQL is
the only major database system that I'm aware of that relies on
operating system collations exclusively.
I am hopeful ICU has improved enough since we last researched that
support for it will soon be added.
There is a patch available that is not ready to be submitted, and
doesn't have a real advocate, but is at least enough to convince me
that it's very doable. Performance is certainly no impediment to
adopting ICU, even without considering that it effectively
re-introduces abbreviated keys for text when the C collation is not
used.

The best argument for ICU is the evidently lax attitude that the glibc
people have towards the correctness and consistency of their
collations:

https://bugzilla.redhat.com/show_bug.cgi?id=1320356#c3

Here, Carlos O'Donnell, a glibc committer, says "Regarding (b), the
collations in glibc may change from build to build depending on
changes in the algorithms or locales. You cannot rely on the collation
stay the same once the process exits (nor can you rely upon it via a
shared memory mapping to another process sorting strings in memory)".
Frankly, we have no excuse for not heeding his warning.

I'm not annoyed at the glibc people for taking this position. There
is, quite simply, a misalignment of incentives. For the glibc people,
the assumption is that any problem with collations leads only to
slight annoyance from end users, as when the GUI produces subtly wrong
ordering. Whereas, for us, any inconsistency is an extremely serious
problem. Here we have the maintainers of glibc telling us that they
feel like it's okay that that can happen at any time. Surely that
isn't good enough.

ICU as a project has every incentive to see things the same way as we
do. The library explicitly decouples collation rule versions from
algorithm versions. All of this is carefully considered, for the
benefit of the numerous major database systems that use ICU.
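One way a database could exploit separately versioned rules and algorithms is to record both next to each index and compare them when the server starts. The sketch below is purely illustrative: the `CollationVersion` record, its field names, and the version strings are assumptions made for the example, not ICU's or PostgreSQL's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class CollationVersion:
    # Hypothetical record stored with each index when it is built.
    provider: str            # e.g. "icu"
    rules_version: str       # version of the locale's collation rules
    algorithm_version: str   # version of the collation algorithm itself

def indexes_needing_rebuild(
    recorded: Dict[str, CollationVersion],
    current: CollationVersion,
) -> List[str]:
    """Return names of indexes built under a different collation version."""
    return sorted(name for name, built in recorded.items() if built != current)

# Example catalog contents; all names and version strings are made up.
current = CollationVersion("icu", "rules-2", "algo-1")
recorded = {
    "users_name_idx": CollationVersion("icu", "rules-1", "algo-1"),
    "orders_ref_idx": CollationVersion("icu", "rules-2", "algo-1"),
}
stale = indexes_needing_rebuild(recorded, current)
```

With a scheme like this, a change in either component marks only the affected indexes as stale instead of silently returning wrong results.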
Alvaro Herrera
2016-06-28 22:50:15 UTC
Post by Peter Geoghegan
The best argument for ICU is the evidently lax attitude that the glibc
people have towards the correctness and consistency of their
collations:
https://bugzilla.redhat.com/show_bug.cgi?id=1320356#c3
Here, Carlos O'Donnell, a glibc committer, says "Regarding (b), the
collations in glibc may change from build to build depending on
changes in the algorithms or locales. You cannot rely on the collation
stay the same once the process exits (nor can you rely upon it via a
shared memory mapping to another process sorting strings in memory)".
Frankly, we have no excuse for not heeding his warning.
I'm not annoyed at the glibc people for taking this position. There
is, quite simply, a misalignment of incentives. For the glibc people,
the assumption is that any problem with collations leads only to
slight annoyance from end users, as when the GUI produces subtly wrong
ordering. Whereas, for us, any inconsistency is an extremely serious
problem. Here we have the maintainers of glibc telling us that they
feel like it's okay that that can happen at any time. Surely that
isn't good enough.
Uhmm. Until now I saw all this ICU thing as having fringe benefit on
strange platforms only, but it is seeming more and more like we need to
take it seriously. I'm not prepared to spend effort on it myself,
though.
Peter Geoghegan
2016-06-28 23:06:49 UTC
On Tue, Jun 28, 2016 at 3:50 PM, Alvaro Herrera
Post by Alvaro Herrera
Uhmm. Until now I saw all this ICU thing as having fringe benefit on
strange platforms only, but it is seeming more and more like we need to
take it seriously. I'm not prepared to spend effort on it myself,
though.
Let me put it this way: If we lived in a world where
internationalization was a new idea, and someone proposed collation
support that relied on the OS today, the patch would be rejected in
about 2 minutes. The author would be pointed in the direction of
"Notes to Operator Class Implementors" within the nbtree README.

There are numerous user-visible benefits to ICU support, too, like:

* Case-insensitive collations become possible (with work in other
areas). No more contrib/citext hack. This is something that we seem to
want to work towards.

* Abbreviated keys in indexes with collated text become possible.
(Already mentioned that abbreviated keys for collated text + sorting
are effectively reintroduced.)

* More useful collations available for certain languages, such as
Japanese. Apparently, the JIS X 4061 algorithm produces results that
Japanese people find more useful, but glibc doesn't support it, and
never will.

* We might be able to document WAL compatibility usefully, now. The
documentation never gets around to explaining which two instances are
compatible for the purposes of physical replication. I can't think of
any other factor that prevents us from locking that down.

* Upgrade major OS versions without difficulty.

* User-defined collations, where you can mix and match certain facets
of how text is sorted as you please. Basically, ICU offers rich
functionality that we can bubble up to our users without too much
effort, as other database systems have.
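For readers unfamiliar with abbreviated keys, the general idea can be sketched as follows: keep a short, fixed-size prefix of each value's binary sort key, compare those prefixes first, and fall back to the full comparison only on a tie. This is a toy model under stated assumptions: `full_sort_key` is a made-up stand-in for the sort key a collation library would produce, and `ABBREV_LEN` is an arbitrary size. The technique is only safe if bytewise order of the sort keys always agrees with the collation's comparison function.

```python
from functools import cmp_to_key

ABBREV_LEN = 4  # bytes kept in the abbreviated key (arbitrary for this sketch)

def full_sort_key(s):
    # Stand-in for a collation library's binary sort key: case-insensitive
    # order first, with the original bytes as a tie-breaker.
    return s.casefold().encode("utf-8") + b"\x00" + s.encode("utf-8")

def abbreviated_key(s):
    # Fixed-size prefix of the full key, cheap to store and compare.
    return full_sort_key(s)[:ABBREV_LEN]

def sort_with_abbreviated_keys(values):
    # Compute each abbreviated key once, up front.
    decorated = [(abbreviated_key(v), v) for v in values]

    def compare(x, y):
        # Fast path: differing prefixes already decide the order.
        if x[0] != y[0]:
            return -1 if x[0] < y[0] else 1
        # Tie on the prefix: fall back to the authoritative full comparison.
        fx, fy = full_sort_key(x[1]), full_sort_key(y[1])
        return (fx > fy) - (fx < fy)

    return [v for _, v in sorted(decorated, key=cmp_to_key(compare))]

words = ["banana", "Apple", "apple", "apricot", "Banana"]
fast = sort_with_abbreviated_keys(words)
reference = sorted(words, key=full_sort_key)
```

The two results agree here because the prefix is taken from the same key that defines the order; if the library's sort keys and its comparison function ever disagree, the fast path gives wrong answers.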
Bruce Momjian
2016-07-02 15:23:23 UTC
Post by Bruce Momjian
In looking at the docs, it seems it would go in the Backup section
https://www.postgresql.org/docs/9.6/static/backup.html
24.2. File System Level Backup
24.3. Continuous Archiving and Point-in-Time Recovery (PITR)
25.2. Log-Shipping Standby Servers
It seems odd to put it in all of these places, but where can we
centrally put it?
In looking at the docs, I found the section "Creating a Database
Cluster", which covers initdb and collations, to be the best place to put
this warning. Patch attached.
Patch applied and backpatched.
Peter Eisentraut
2016-07-09 14:02:24 UTC
Post by Peter Geoghegan
I have long advocated adopting ICU as our de facto standard "collation
provider", primarily so that we can directly control collations and
collation versioning. I think that doing this would solve many
problems.
I plan to submit a patch for ICU support for September.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Geoghegan
2016-07-09 21:48:04 UTC
On Sat, Jul 9, 2016 at 7:02 AM, Peter Eisentraut
Post by Peter Eisentraut
Post by Peter Geoghegan
I have long advocated adopting ICU as our de facto standard "collation
provider", primarily so that we can directly control collations and
collation versioning. I think that doing this would solve many
problems.
I plan to submit a patch for ICU support for September.
That's fantastic news! Your knowledge of packaging will be useful
here. I will review your patch.