
Manual page for CHRTBL(8)

chrtbl - generate character classification table


/usr/etc/chrtbl [ filename ]


chrtbl converts a source description of a character classification table into a form that can be used by the character classification functions and multibyte functions (see ctype.3v and mblen.3). The source description is found in filename. If filename is not given, or is given as `-', chrtbl reads its source description from the standard input.

chrtbl creates one or two output files; the second file is created only if the model token is specified. By default, these files are created in the current working directory. The first file, named by the chrclass token, is always produced and contains the character classification information for all single-byte (7-bit and 8-bit) character code-sets described by one setting of the LC_CTYPE category of locale. The second file contains information about the width and structure of the coded character set being defined. It is named by appending `.ci' to the value specified by the chrclass token.

The first output file contains a binary form of the character classification information described in filename. It is structured in such a way that it can be used at run-time to replace the active version of the ctype[] array in the C-library. For it to be understood at run-time, the output file must be moved to the /usr/share/lib/locale/LC_CTYPE or /etc/locale directory (see FILES below) by the super-user or a member of group bin. This file must be readable by user, group, and other; no other permission should be set.
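The installation step described above can be sketched in a few lines of Python. This is only an illustration and not part of chrtbl: the function name install_table is hypothetical, the destination directory is supplied by the caller, and mode 0444 is one reading of "readable by user, group, and other; no other permission".

```python
import os
import shutil
import stat


def install_table(source, directory):
    """Move a generated table into `directory` and make it read-only for all.

    Hypothetical helper: chrtbl itself does not install its output.
    """
    target = os.path.join(directory, os.path.basename(source))
    shutil.move(source, target)
    # Read permission for user, group, and other; nothing else (0444).
    os.chmod(target, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

On a real system the caller would need the privileges mentioned above (super-user or group bin) to write into the locale directory.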

filename contains a sequence of tokens in any order after the chrclass token, each separated by one or more NEWLINE characters or comment lines. The tokens recognized by chrtbl are as follows:

chrclass name
name is the filename or pathname of the character classification file. This is a mandatory token. It must be the first token to be defined, and is usually given the name that relates to a valid setting of the LC_CTYPE category of locale.
model name,args
This optional token chooses the type of character code-set announcement mechanism associated with the character classification table generated by chrtbl. The file created by this token takes the name specified by the chrclass token with `.ci' appended. The arguments to model must be one of the following:
euc x,y,z
The model file contains information describing the required setting for the Extended Unix code-set announcement mechanism. x,y,z relate to the storage widths (in bytes) of EUC code-sets 1, 2 and 3 respectively.
The model file contains information describing the Xerox Character Code Standard (XC1-3-3-0) announcement mechanism. There are no additional arguments required.
iso2022 g0,g1,g2,g3 x
The model file contains information describing a generative version of the ISO-2022 code set announcement mechanism. The multibyte functions driven by this model are capable of handling the standard one or more byte escape sequences as well as all of the standard shift functions. The four arguments g0,g1,g2,g3 define the default width (in bytes) of the four designations (respectively) available under ISO-2022. The maximum integer value of any of these arguments is 2. The final argument x is mandatory and must be set to either 7 or 8. It selects the default bit-width of each byte on input to and output from the multibyte functions.

If the model token is declared without arguments, then it is assumed that there is a set of user-defined rules for character code-set announcement. This is noted in the output file and is later used to fold user-defined code into the multibyte functions in the C-library (see mblen.3).

isupper
Character codes to be classified as upper-case letters.
islower
Character codes to be classified as lower-case letters.
isdigit
Character codes to be classified as numeric.
isspace
Character codes to be classified as a spacing (delimiter) character.
ispunct
Character codes to be classified as a punctuation character.
iscntrl
Character codes to be classified as a control character.
isblank
Character code for the space character.
isxdigit
Character codes to be classified as hexadecimal digits.
ul
Relationship between upper- and lower-case characters.
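The constraints on the iso2022 model arguments given earlier in this list can be expressed as a small check. The following Python sketch is illustrative only: the function name check_iso2022_args is hypothetical, and it assumes the four widths arrive as one comma-separated field and the bit-width as a separate field. Only the limits stated in this page are enforced (no width greater than 2, bit-width 7 or 8); no lower bound on the widths is given here, so none is checked.

```python
def check_iso2022_args(widths_field, bits_field):
    """Validate 'g0,g1,g2,g3' and 'x' as described for the iso2022 model."""
    widths = [int(w) for w in widths_field.split(",")]
    if len(widths) != 4:
        raise ValueError("expected four widths: g0,g1,g2,g3")
    if any(w > 2 for w in widths):
        raise ValueError("the maximum value of any width is 2")
    bits = int(bits_field)
    if bits not in (7, 8):
        raise ValueError("the final argument must be 7 or 8")
    return widths, bits
```

For example, check_iso2022_args("1,1,2,2", "7") is accepted, while a width of 3 or a bit-width of 6 raises ValueError.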

Any lines with the number sign (#) in the first column are treated as comments and are ignored. Blank lines are also ignored.

A character can be represented as a hexadecimal or octal constant (for example, the letter a can be represented as 0x61 in hexadecimal or 0141 in octal). Hexadecimal and octal constants may be separated by one or more space and tab characters.

The dash (-) may be used to indicate a range of consecutive numbers. Zero or more space characters may be used for separating the dash character from the numbers.
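The constant and range notation can be illustrated with a short Python sketch. It is not the parser used by chrtbl; the names parse_constant and parse_codes are hypothetical, and the sketch assumes that any constant without a 0x prefix is octal, since only hexadecimal and octal forms are described here.

```python
import re


def parse_constant(token):
    """Convert one hexadecimal (0x61) or octal (0141) constant to an int."""
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 8)  # assumption: everything else is read as octal


def parse_codes(text):
    """Expand a value list such as '0x20 0x9 - 0xd' into a sorted list."""
    # Drop the optional blanks around each dash so a range is one token.
    compact = re.sub(r"[ \t]*-[ \t]*", "-", text.strip())
    codes = set()
    for item in compact.split():
        first, dash, last = item.partition("-")
        low = parse_constant(first)
        high = parse_constant(last) if dash else low
        codes.update(range(low, high + 1))
    return sorted(codes)
```

With these definitions, parse_codes("0x41 - 0x5a") yields the 26 codes of the upper-case ASCII letters, and parse_constant("0141") and parse_constant("0x61") both give the code for the letter a.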

The backslash character (\) is used for line continuation. Only a RETURN is permitted after the backslash character.
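The comment, blank-line, and continuation rules can be combined into one pre-processing pass. The Python sketch below is an illustration under stated assumptions, not chrtbl's own code: the name logical_lines is hypothetical, and it assumes that comment and blank lines are only recognized outside a continued line.

```python
def logical_lines(lines):
    """Join backslash-continued lines; skip comments and blank lines."""
    result = []
    pending = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if not pending and (line.startswith("#") or not line.strip()):
            continue  # '#' in the first column, or a blank line
        if line.endswith("\\"):
            # Keep the text before the backslash and continue on the next line.
            pending += line[:-1] + " "
        else:
            result.append(pending + line)
            pending = ""
    if pending:
        result.append(pending)
    return result
```

Each returned string is one complete token definition that can then be split on white space.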

The relationship between upper- and lower-case letters (ul) is expressed as ordered pairs of octal or hexadecimal constants:

<upper-case_character lower-case_character>

These two constants may be separated by one or more space characters. Zero or more space characters may be used for separating the angle brackets (<>) from the numbers.
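The pair notation can be read with a regular expression, as in the following Python sketch. The names parse_ul and to_int are hypothetical, and constants without a 0x prefix are assumed to be octal; the sketch shows the syntax only and makes no claim about how chrtbl stores the mapping.

```python
import re

# "<", optional blanks, two constants separated by blanks, optional blanks, ">"
PAIR = re.compile(r"<\s*([0-9A-Za-z]+)\s+([0-9A-Za-z]+)\s*>")


def to_int(token):
    """Hexadecimal if prefixed with 0x, otherwise octal (assumption)."""
    return int(token, 16) if token.lower().startswith("0x") else int(token, 8)


def parse_ul(text):
    """Return a dict mapping each upper-case code to its lower-case code."""
    return {to_int(up): to_int(low) for up, low in PAIR.findall(text)}
```

For instance, parse_ul("<0x41 0x61> < 0102 0142 >") maps code 0x41 to 0x61 and code 0x42 to 0x62.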


The following is an example of an input file used to create the ASCII code set definition table in a file named ascii.

chrclass	ascii
isupper	0x41 - 0x5a
islower	0x61 - 0x7a
isdigit	0x30 - 0x39
isspace	0x20 0x9 - 0xd
ispunct	0x21 - 0x2f 0x3a - 0x40  \
		0x5b - 0x60 0x7b - 0x7e
iscntrl	0x0 - 0x1f 0x7f
isblank	0x20
isxdigit	0x30 - 0x39 0x61 - 0x66  \
		0x41 - 0x46

ul	<0x41 0x61> <0x42 0x62> <0x43 0x63> \
	<0x44 0x64> <0x45 0x65> <0x46 0x66> \
	<0x47 0x67> <0x48 0x68> <0x49 0x69> \
	<0x4a 0x6a> <0x4b 0x6b> <0x4c 0x6c> \
	<0x4d 0x6d> <0x4e 0x6e> <0x4f 0x6f> \
	<0x50 0x70> <0x51 0x71> <0x52 0x72> \
	<0x53 0x73> <0x54 0x74> <0x55 0x75> \
	<0x56 0x76> <0x57 0x77> <0x58 0x78> \
	<0x59 0x79> <0x5a 0x7a>


run-time location of the character classification tables generated by chrtbl
location for private versions of the classification tables generated by chrtbl


ctype.3v environ.5v


The error messages produced by chrtbl are intended to be self-explanatory. They indicate input errors in the command line or syntactic errors encountered within the input file.


Created by unroff & hp-tools. © by Hans-Peter Bischof. All Rights Reserved (1997).

Last modified 21/April/97