Database reports the following warning after an OS upgrade from RHEL7 to RHEL8 or to RHEL9:
WARNING: database "test" has a collation version mismatch
DETAIL: The database was created using collation version 2.17, but the operating system provides version 2.28.
HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE cmds REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
Postgres
Greenplum 6.x
RedHat EL8
RedHat EL9
Refer to Note in the PostgreSQL documentation for ALTER COLLATION
When using the GNU C library for collations, the C library's version is used as a proxy for the collation version. Many Linux distributions change collation definitions only when upgrading the C library, but this approach is imperfect as maintainers are free to back-port newer collation definitions to older C library releases.
Reason for the error is described in Greenplum documentation Important Upgrade Considerations:
The GNU C Library, commonly known as glibc, is the GNU Project's implementation of the C standard library. Between EL 7 and 8, the version of glibc changes from 2.17 to 2.28, and between EL 7 and EL 9, the version of glibc changes from 2.17 to 2.34. These are major changes that impact many languages and their collations. The collation of a database specifies how to sort and compare strings of character data. A change in sorting for common languages can have a significant impact on PostgreSQL and Greenplum databases.
The following is for PostgreSQL databases. For Greenplum databases see Migrating VMware Greenplum from Enterprise Linux 7 to 8 or 9
The following query can be used to identify all collations in the current database that need to be refreshed and the objects that depend on them:
SELECT pg_describe_object(refclassid, refobjid, refobjsubid) AS "Collation",
pg_describe_object(classid, objid, objsubid) AS "Object"
FROM pg_depend d JOIN pg_collation c
ON refclassid = 'pg_collation'::regclass AND refobjid = c.oid
WHERE c.collversion <> pg_collation_actual_version(c.oid)
ORDER BY 1, 2;
Also below query can be helpful to specifically identify indexes for collation version refresh :
SELECT DISTINCT indrelid::regclass::text, indexrelid::regclass::text, collname, pg_get_indexdef(indexrelid)
FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index, generate_subscripts(indcollation, 1) g(i)) s
JOIN pg_collation c ON coll=c.oid
WHERE collprovider IN ('d', 'c') AND collname NOT IN ('C', 'POSIX');
The indexes listed by the query would need to be rebuilt with REINDEX.
Refer to the PostgreSQL ALTER COLLATION notes:
A change in collation definitions can lead to corrupt indexes and other problems because the database system relies on stored objects having a certain sort order. Generally, this should be avoided, but it can happen in legitimate circumstances, such as when upgrading the operating system to a new major version or when using pg_upgrade to upgrade to server binaries linked with a newer version of ICU. When this happens, all objects depending on the collation should be rebuilt, for example, using REINDEX. When that is done, the collation version can be refreshed using the command ALTER COLLATION ... REFRESH VERSION. This will update the system catalog to record the current collation version and will make the warning go away. Note that this does not actually check whether all affected objects have been rebuilt correctly.
There is no risk of data corruption running the "ALTER COLLATION dbname REFRESH VERSION" once all the indexes have been rebuilt as the command does not do any read/write of the user tables in the database.