Collation Challenges: Sorting it Out
Background: "libc" is common shorthand for the "standard C library", a library of standard functions available to all C programs. glibc is the GNU C Library implementation, which is used on all major Linux distributions (e.g. Amazon Linux, RHEL, Debian/Ubuntu, SUSE). The glibc library, libc.so, provides most of the foundational C routines such as open, read, write, malloc, printf, and thousands more. It also provides the interface to the Linux kernel via syscalls.
For the purposes of this talk, the facility of interest is the locale functionality, and more specifically the functions that sort strings according to localized collation rules. For PostgreSQL to work durably and correctly, sort order must be deterministic and immutable. Because glibc implements the sort order, any change to that order from one glibc version to the next breaks the contract with PostgreSQL and can thereby cause data corruption. Indexes that have been persisted to storage may now hold the data in the wrong order according to the currently installed version of glibc.
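The failure mode can be illustrated with a small, self-contained sketch. It does not call glibc: `old_key` and `new_key` are hypothetical stand-ins for the collation rules of two library versions, and the "index" is simply a sorted list searched by bisection. The point is only that a structure built under one ordering silently fails lookups when searched under another.

```python
import bisect


def old_key(s):
    # Hypothetical "old" collation: plain code point order.
    return s


def new_key(s):
    # Hypothetical "new" collation: case-insensitive first,
    # with ties broken by code point order.
    return (s.casefold(), s)


def build_index(values, key):
    # Stand-in for a persisted index: values stored in sorted order.
    return sorted(values, key=key)


def index_lookup(index, value, key):
    # Binary search that assumes the index is sorted according to `key`.
    keys = [key(v) for v in index]
    pos = bisect.bisect_left(keys, key(value))
    return pos < len(index) and index[pos] == value


if __name__ == "__main__":
    words = ["apple", "Banana", "cherry", "Date"]
    index = build_index(words, old_key)  # built before the "upgrade"
    print(index_lookup(index, "apple", old_key))  # search with the old rules
    print(index_lookup(index, "apple", new_key))  # search with the new rules
    print(build_index(words, new_key))            # order after a rebuild
```

In this sketch the value is still present in the stored list, yet the search under the new rules misses it. Rebuilding the index under the new rules restores correct lookups, which is analogous to reindexing after a collation change.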
Proposed Solution: This talk outlines a method for building a collation compatibility library on a system with a very specific glibc base version. That library may then be used on another Linux system to provide stable collation, and thus avoid breakage due to glibc and/or OS upgrades.