
Manual page for COLLDEF(8)

colldef - convert collation sequence source definition


/usr/etc/colldef filename


colldef converts a collation sequence source definition into a format usable by the strxfrm(3) and strcoll(3) functions. It is used to define the many ways in which strings can be ordered and collated. strxfrm() transforms its second argument and places the result in its first argument. The transformed string is such that it can be correctly ordered with other transformed strings by using strcmp(), strncmp(), or memcmp() (see string(3) and memory(3)). strcoll() transforms its arguments and compares them.
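The relationship between strxfrm() and strcoll() can be illustrated with a short standard C sketch. It transforms two strings with strxfrm() and compares the results with strcmp(). For the same locale, the sign of that comparison should match the sign of strcoll(). The function names xfrm_compare and sign are hypothetical. The sketch also assumes the ANSI C signature strxfrm(s1, s2, n), which writes the transformed form of s2 into s1.

```c
#include <stdlib.h>
#include <string.h>

/* Sign of an int: -1, 0, or 1. */
static int sign(int v) { return (v > 0) - (v < 0); }

/* Hypothetical helper: compare a and b by transforming both with
 * strxfrm() and comparing the results with strcmp(). The sign should
 * agree with strcoll(a, b) in the current locale.
 * Returns -1, 0, or 1, or -2 if memory allocation fails. */
int xfrm_compare(const char *a, const char *b)
{
    /* With n == 0 the destination may be NULL; the return value is
     * the length of the transformed string, excluding the NUL. */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *ta = malloc(na);
    char *tb = malloc(nb);
    int r = -2;

    if (ta != NULL && tb != NULL) {
        strxfrm(ta, a, na);   /* source is the second argument */
        strxfrm(tb, b, nb);
        r = sign(strcmp(ta, tb));
    }
    free(ta);
    free(tb);
    return r;
}
```

A program would normally call setlocale(LC_COLLATE, "") first, so that the collation order of the user's locale is used. Transforming each string once with strxfrm() is worthwhile when the same strings are compared many times, as in sorting.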

colldef reads the collation sequence source definition from the standard input and stores the converted definition in filename. The output file produced contains the database with collating sequence information in a form usable by system commands and routines.

The collation sequence definition specifies a set of collating elements and the rules that determine how strings containing them are ordered. This is most useful for describing the sorting conventions of different languages.

The colldef command can support languages whose mapping and collating sequences can be described by the following cases:


The specification file can contain three kinds of statements: charmap, substitute, and order. Of these, only the order statement is required. When charmap or substitute statements are supplied, they must appear in the order listed above. Any statements after the order statement are ignored.

Lines in the specification file beginning with a # are treated as comments and are ignored. Blank lines are also ignored.
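The rules for statement order can be sketched in C as follows, assuming the file has already been split into lines. The sketch skips comment lines and blank lines. It requires the keywords to appear in the order charmap, substitute, order, and it stops at the order statement. The names check_statement_order and keyword_rank are hypothetical, and this is an illustration only, not colldef's own parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rank of the statement keyword that starts line:
 * 0 = charmap, 1 = substitute, 2 = order, -1 = anything else. */
static int keyword_rank(const char *line)
{
    static const char *const keywords[] = { "charmap", "substitute", "order" };
    for (int i = 0; i < 3; i++) {
        size_t len = strlen(keywords[i]);
        if (strncmp(line, keywords[i], len) == 0 &&
            (line[len] == '\0' || isspace((unsigned char)line[len])))
            return i;
    }
    return -1;
}

/* Sketch only: returns 0 if the statements appear in the order charmap,
 * substitute, order (the first two optional), and -1 otherwise.
 * Comment and blank lines are skipped; lines after order are ignored. */
int check_statement_order(const char *const lines[], size_t n)
{
    int last = -1;
    for (size_t i = 0; i < n; i++) {
        const char *p = lines[i];
        if (p[0] == '#')
            continue;                       /* comment line */
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            continue;                       /* blank line */
        int rank = keyword_rank(p);
        if (rank < 0 || rank < last)
            return -1;                      /* unknown or out of order */
        if (rank == 2)
            return 0;                       /* order found; rest ignored */
        last = rank;
    }
    return -1;                              /* order statement missing */
}
```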

charmap charmapfile
charmap names the file, charmapfile, that maps character and collating-element symbols to the actual character encoding. The charmapfile name cannot be a keyword (for example, substitute, order, or with) or a special symbol (for example, ..., ;, <, >, or ,).

The format of charmapfile is shown below. Symbol names are separated from their values by TAB or SPACE characters. symbol-value can be specified in a hexadecimal (\x??) or octal (\???) representation, and can be only one character in length.

	symbol-name1	symbol-value1
	symbol-name2	symbol-value2

The following sample charmapfile maps the symbol names, c, h, H, and A-grave, to their respective symbol values.

	c	\x63
	h 	\x68
	H	\110
	A-grave	\300
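The following C sketch parses one charmapfile line into a symbol name and a one-byte value, using the two escape forms described above. The function name parse_charmap_line is hypothetical. Accepting one to two hexadecimal digits or one to three octal digits is also an assumption, not documented colldef behavior.

```c
#include <stddef.h>
#include <string.h>

/* Value of hex or octal digit c in the given base, or -1. */
static int digit_value(int c, int base)
{
    int d;
    if (c >= '0' && c <= '9')
        d = c - '0';
    else if (c >= 'a' && c <= 'f')
        d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        d = c - 'A' + 10;
    else
        return -1;
    return d < base ? d : -1;
}

/* Hypothetical parser for "symbol-name<TAB or SPACE>value", where value
 * is \x?? (hex) or \??? (octal). Stores the name and the one-byte value;
 * returns 0 on success, -1 on a malformed line. */
int parse_charmap_line(const char *line, char *name, size_t namesz,
                       unsigned char *value)
{
    size_t len = strcspn(line, " \t");           /* end of symbol name */
    if (len == 0 || len >= namesz)
        return -1;
    memcpy(name, line, len);
    name[len] = '\0';

    const char *p = line + len;
    p += strspn(p, " \t");                       /* separator blanks */
    if (*p++ != '\\')
        return -1;
    int base = 8, maxdigits = 3;                 /* \??? is octal */
    if (*p == 'x') {
        base = 16;                               /* \x?? is hexadecimal */
        maxdigits = 2;
        p++;
    }
    unsigned code = 0;
    int ndigits = 0, d;
    while (ndigits < maxdigits &&
           (d = digit_value((unsigned char)*p, base)) >= 0) {
        code = code * (unsigned)base + (unsigned)d;
        p++;
        ndigits++;
    }
    p += strspn(p, " \t\r\n");                   /* allow trailing blanks */
    if (ndigits == 0 || code > 0xFF || *p != '\0')
        return -1;                               /* one character only */
    *value = (unsigned char)code;
    return 0;
}
```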

The symbol names defined in charmapfile can be used in order statements by enclosing the symbol name in angle brackets, <symbol-name>. For example,

	order	(a, <A-grave>);b;<c>;...;<h>;<H>;i;...;z

This statement is equivalent to:

	order	(a, A`);b;c;...;h;H;i;...;z
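Replacing symbol names with their values works like a simple text expansion. The hypothetical C sketch below replaces each <symbol-name> in an order list with the byte that the charmap assigns to it. The names struct charmap_entry and expand_symbols are illustrative. The sketch uses the sample mapping above and assumes a single-byte encoding such as ISO 8859-1, in which A-grave is \300.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical charmap entry: a symbol name and the byte it stands for. */
struct charmap_entry {
    const char *name;
    unsigned char value;
};

/* Copy the order list in to out, replacing every <symbol-name> with the
 * byte the charmap assigns to it. Returns 0 on success, -1 if a name is
 * unterminated or unknown, or if out is too small. */
int expand_symbols(const char *in, char *out, size_t outsz,
                   const struct charmap_entry *map, size_t nmap)
{
    size_t o = 0;
    if (outsz == 0)
        return -1;
    while (*in != '\0') {
        unsigned char c = (unsigned char)*in++;
        if (c == '<') {
            const char *close = strchr(in, '>');
            if (close == NULL)
                return -1;                   /* unterminated name */
            size_t len = (size_t)(close - in), i;
            for (i = 0; i < nmap; i++)
                if (strlen(map[i].name) == len &&
                    strncmp(map[i].name, in, len) == 0)
                    break;
            if (i == nmap)
                return -1;                   /* unknown symbol name */
            c = map[i].value;
            in = close + 1;                  /* continue after '>' */
        }
        if (o + 1 >= outsz)
            return -1;                       /* no room for c and NUL */
        out[o++] = (char)c;
    }
    out[o] = '\0';
    return 0;
}
```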

Symbol names cannot be used in the fields of a substitute statement. Symbol names also cannot be combined with any other representation, as in <c>h, c<h>, <c>\x68, or <c><h>. Symbol names can be used with primary and secondary ordering, as in the following example:

	order  a;b;c;(<c>,<h>);d;...;z;\

The charmap statement is optional.

substitute char with repl

The substitute statement substitutes the character char with the string repl.

The simplest use of the substitute statement replaces a single character with two characters, as in German, where ß is replaced with ss:

substitute "ß" with "ss"

This statement can also be used to specify characters to be ignored by mapping them to the null string.

substitute "m" with ""

This is convenient for simplifying order statements. When this substitution is combined with the statement below, lower-case m is ignored, even though it is implicitly included in the order statement.

	order a;...;z

Without the null-string mapping above, this would be specified as:

	order a;...;l;n;...;z
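The effect of substitute statements on a string can be sketched in C as a character-by-character rewrite that runs before collation. The names struct substitution and apply_substitutions are hypothetical. The sketch assumes a single-byte encoding in which ß is the byte \xdf, as in ISO 8859-1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical form of one substitute statement: from -> to. */
struct substitution {
    char from;
    const char *to;   /* "" means the character is ignored */
};

/* Apply the substitutions to in, writing the result to out. Characters
 * without a substitution are copied unchanged. Returns 0 on success,
 * -1 if out (of size outsz) is too small. */
int apply_substitutions(const char *in, char *out, size_t outsz,
                        const struct substitution *subs, size_t nsubs)
{
    size_t o = 0;
    if (outsz == 0)
        return -1;
    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };
        const char *rep = single;            /* default: keep the char */
        for (size_t i = 0; i < nsubs; i++)
            if (subs[i].from == *in) {
                rep = subs[i].to;
                break;
            }
        size_t len = strlen(rep);
        if (o + len >= outsz)
            return -1;
        memcpy(out + o, rep, len);
        o += len;
    }
    out[o] = '\0';
    return 0;
}
```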

The substitute statement is optional.

order order_list

order_list is a list of symbols, separated by semicolons, that defines the collating sequence. The special symbol ... is shorthand for the symbols that lie, in machine code order, between the symbols on either side of it. The following example specifies the list of lower-case letters:

order a;b;c;d;...;x;y;z

Of course, this could be further compressed to just a;...;z.
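The ... shorthand can be understood as an expansion of the order list. This C sketch uses the hypothetical name expand_ellipses. It handles only single-character symbols on both sides of ..., and it relies on the letters being contiguous in machine code order, as they are in ASCII. It does not handle parenthesized groups.

```c
#include <stddef.h>
#include <string.h>

/* Sketch only: expand each "..." in a semicolon-separated order list
 * into the single-character symbols lying strictly between its two
 * neighbours in machine code order ("a;...;e" -> "a;b;c;d;e").
 * Returns 0 on success, -1 on a malformed "..." or a short buffer. */
int expand_ellipses(const char *in, char *out, size_t outsz)
{
    size_t o = 0;
    const char *prev = NULL;      /* previous (non-ellipsis) symbol */
    size_t prevlen = 0;
    const char *p = in;

    if (outsz == 0)
        return -1;
    for (;;) {
        size_t len = strcspn(p, ";");
        if (len == 3 && strncmp(p, "...", 3) == 0) {
            const char *next = p + 4;           /* symbol after "...;" */
            if (p[3] != ';' || prev == NULL || prevlen != 1 ||
                strcspn(next, ";") != 1 ||
                (unsigned char)*next <= (unsigned char)*prev)
                return -1;
            for (unsigned c = (unsigned char)*prev + 1u;
                 c < (unsigned char)*next; c++) {
                if (o + 2 >= outsz)
                    return -1;
                out[o++] = (char)c;
                out[o++] = ';';
            }
        } else {
            if (o + len + 1 >= outsz)
                return -1;
            memcpy(out + o, p, len);            /* copy symbol as-is */
            o += len;
            prev = p;
            prevlen = len;
            if (p[len] == ';')
                out[o++] = ';';
        }
        if (p[len] == '\0')
            break;
        p += len + 1;
    }
    out[o] = '\0';
    return 0;
}
```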

A symbol can be up to two characters in length and can be represented in any one of the following ways:

Any combination of these may be used as well.

The backslash character, \, is used for line continuation. No characters are permitted after a continuation backslash on the same line.
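Continuation handling can be sketched in C as follows. The sketch assumes the input has already been split into lines without their newline characters. A trailing backslash is removed and the next line is appended. A backslash followed only by blanks is rejected. join_continuation is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Sketch only: join physical lines that end in a backslash into one
 * logical line in out. Returns the number of physical lines consumed,
 * or 0 if a backslash is followed by blanks, the last line still ends
 * in a backslash, or out is too small. */
size_t join_continuation(const char *const lines[], size_t n,
                         char *out, size_t outsz)
{
    size_t o = 0;
    if (outsz == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        const char *line = lines[i];
        size_t len = strlen(line), end = len;
        while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t'))
            end--;
        if (end > 0 && end < len && line[end - 1] == '\\')
            return 0;                     /* characters after the backslash */
        int cont = len > 0 && line[len - 1] == '\\';
        if (cont)
            len--;                        /* drop the continuation backslash */
        if (o + len >= outsz)
            return 0;
        memcpy(out + o, line, len);
        o += len;
        if (!cont) {
            out[o] = '\0';
            return i + 1;
        }
    }
    return 0;                             /* input ended mid-continuation */
}
```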

Symbols enclosed in parentheses are assigned the same primary ordering but different secondary ordering. Symbols enclosed in curly brackets are assigned only the same primary ordering. For example,

	order a;b;c;ch;d;(e,e`);f;...;z;\

In the above example, e and e` are assigned the same primary ordering and different secondary ordering, and digits 1 through 9 are assigned the same primary ordering and no secondary ordering. Note that the ellipsis cannot be used within curly brackets. Only primary ordering is assigned to the remaining symbols. Note also how a two-character symbol can be specified in the collating sequence: ch sorts between c and d.

If a character is not included in the order statement, it is excluded from the ordering and ignored during sorting.
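The difference between primary and secondary ordering can be illustrated with a generic two-level comparison in C. Strings are compared by their primary weights first, and secondary weights only break ties. The weight table is written by hand for a small assumed order: a;b;c;d;(e,e`);f, with the digits 1, 2, and 3 sharing one primary weight as they would in a curly-bracket group. The byte \xe8 (è in ISO 8859-1) stands in for e`. Symbols that have only a primary ordering are assumed to have secondary weight 0. Characters missing from the table are skipped. This illustrates the idea only. It is not colldef's output format or the algorithm strcoll() uses.

```c
#include <stddef.h>

/* Hypothetical weight table for the assumed order described above. */
struct weight {
    char ch;
    int primary;
    int secondary;   /* 0 = no secondary ordering assigned */
};

static const struct weight table[] = {
    { 'a', 1, 0 }, { 'b', 2, 0 }, { 'c', 3, 0 }, { 'd', 4, 0 },
    { 'e', 5, 1 }, { '\xe8', 5, 2 },           /* (e,e`): same primary */
    { 'f', 6, 0 },
    { '1', 7, 0 }, { '2', 7, 0 }, { '3', 7, 0 } /* same primary only */
};

static const struct weight *lookup(char c)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].ch == c)
            return &table[i];
    return NULL;   /* not in the order list: ignored */
}

/* Compare a and b at one level (1 = primary, 2 = secondary). */
static int compare_level(const char *a, const char *b, int level)
{
    for (;;) {
        const struct weight *wa = NULL, *wb = NULL;
        while (*a != '\0' && (wa = lookup(*a)) == NULL)
            a++;                               /* skip ignored characters */
        while (*b != '\0' && (wb = lookup(*b)) == NULL)
            b++;
        if (wa == NULL || wb == NULL)          /* one or both strings ended */
            return (wa != NULL) - (wb != NULL);
        int va = level == 1 ? wa->primary : wa->secondary;
        int vb = level == 1 ? wb->primary : wb->secondary;
        if (va != vb)
            return va < vb ? -1 : 1;
        a++;
        b++;
    }
}

/* Primary weights decide first; secondary weights only break ties. */
int collate_compare(const char *a, const char *b)
{
    int r = compare_level(a, b, 1);
    return r != 0 ? r : compare_level(a, b, 2);
}
```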


The following example shows the collation specification required to support a hypothetical telephone book sorting sequence.

The sorting sequence is defined by the following rules:

The input specification file for this example contains:

substitute "0" with "zero"
substitute "1" with "one"
substitute "2" with "two"
substitute "3" with "three"
substitute "4" with "four"
substitute "5" with "five"
substitute "6" with "six"
substitute "7" with "seven"
substitute "8" with "eight"
substitute "9" with "nine"

order A;a;B;b;C;c;CH;Ch;ch;D;d;E;e;F;f;\
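The telephone-book example can be approximated in C. The order line above is cut off, so this sketch makes several assumptions. Every letter is assumed to follow the A;a;B;b;... pattern through Z;z. Two-character elements such as CH are left out. Any character that is not a letter is skipped, in line with the rule that characters missing from an order statement are ignored. The sketch also assumes the default C locale for isalpha() and isdigit(). The names phonebook_compare, spell_digits, and letter_weight are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The digit substitutions from the example. */
static const char *const digit_words[10] = {
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine"
};

/* Copy s to out with every digit spelled out; -1 if out is too small. */
static int spell_digits(const char *s, char *out, size_t outsz)
{
    size_t o = 0;
    if (outsz == 0)
        return -1;
    for (; *s != '\0'; s++) {
        char one[2] = { *s, '\0' };
        const char *rep = isdigit((unsigned char)*s) ? digit_words[*s - '0'] : one;
        size_t len = strlen(rep);
        if (o + len >= outsz)
            return -1;
        memcpy(out + o, rep, len);
        o += len;
    }
    out[o] = '\0';
    return 0;
}

/* Assumed order A;a;B;b;...;Z;z. Returns -1 for characters not in it. */
static int letter_weight(unsigned char c)
{
    if (!isalpha(c))
        return -1;
    return 2 * (tolower(c) - 'a') + (islower(c) ? 1 : 0);
}

/* Returns -1, 0, or 1; -2 if an input is too long for the buffers. */
int phonebook_compare(const char *a, const char *b)
{
    char ta[256], tb[256];
    if (spell_digits(a, ta, sizeof ta) != 0 ||
        spell_digits(b, tb, sizeof tb) != 0)
        return -2;
    const char *p = ta, *q = tb;
    for (;;) {
        while (*p != '\0' && letter_weight((unsigned char)*p) < 0)
            p++;                              /* ignore non-letters */
        while (*q != '\0' && letter_weight((unsigned char)*q) < 0)
            q++;
        if (*p == '\0' || *q == '\0')
            return (*p != '\0') - (*q != '\0');
        int wp = letter_weight((unsigned char)*p);
        int wq = letter_weight((unsigned char)*q);
        if (wp != wq)
            return wp < wq ? -1 : 1;
        p++;
        q++;
    }
}
```

With these assumptions, "1 Stop" sorts as "one Stop" and so comes before "Zoo", while "9 West" sorts as "nine West" and comes after "Apex".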


colldef exits with the following values:

No errors were found and the output was successfully created.
Errors were found.


standard private location for collation orders under the locale locale
standard shared location for collation orders under the locale locale


memory(3), strcoll(3), string(3)

[a manual with the abbreviation SSO]


Created by unroff & hp-tools. © by Hans-Peter Bischof. All Rights Reserved (1997).

Last modified 21/April/97