
Workbook 11. Supplements

by Red Hat, Inc.
Copyright 2003-2005 Red Hat, Inc.
Revision History
Revision rha030-2.0-2003_11_12-en    2003-11-12    First Revision
Revision rha030-3.0-0-en-2005-08-17T07:23:17-0400    2005-08-17    First Revision
Red Hat, Red Hat Network, the Red Hat "Shadow Man" logo, RPM, the RPM logo, PowerTools, and all Red Hat-based trademarks and logos are
trademarks or registered trademarks of Red Hat, Inc. in the United States and other countries.
Linux is a registered trademark of Linus Torvalds.
Motif and UNIX are registered trademarks of The Open Group.
Windows is a registered trademark of Microsoft Corporation.
Intel and Pentium are registered trademarks of Intel Corporation. Itanium and Celeron are trademarks of Intel Corporation.
SSH and Secure Shell are trademarks of SSH Communications Security, Inc.
All other trademarks and copyrights referred to are the property of their respective owners.

Published 2005-08-17

Table of Contents
1. Advanced Shell Scripting
    Discussion
        Shell Scripting
            Branches: if ... then ... [else ...] fi
            Loops: for ... in ... do ... done
    Examples
        Example 1. A Script for "Packing" Directories
    Online Exercises
        Specification
        Deliverables
    Questions
2. Character Encoding and Internationalization
    Files
        What are Files?
        What is a Byte?
        Data Encoding
    Text Encoding
        ASCII
        ISO 8859 and Other Character Sets
        Unicode (UCS)
        Unicode Transformation Format (UTF-8)
        Text Encoding and the Open Source Community
    Internationalization (i18n)
        The LANG environment variable
        Do I Really Have to Know All of This?
3. The RPM Package Manager
    Discussion
        RPM: The Red Hat Package Manager
        RPM Components
        Querying the RPM database
    Online Exercises
        Specification
        Deliverables
    Questions

Chapter 1. Advanced Shell Scripting


Key Concepts

Linux uses a general scripting mechanism, where executable text script files can be executed by an
interpreter specified on the initial line.

Within a bash script, any arguments provided when the script was invoked are available as positional
parameters (i.e., the variables $1, $2, ...).

The read builtin command can be used to read input from the keyboard ("standard in").

The bash shell uses an if ... then ... [else ...] fi syntax to implement conditional
branches.

The test command is often used as the conditional command in if ... then branches.

The bash shell uses a for ... in ... do ... done syntax to implement loops.
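
As a minimal sketch of the first three concepts, the following script prints two positional parameters and then reads a line from standard in. The function name greet and the variable ANSWER are made up for illustration. Inside a function, the positional parameters refer to the function's arguments, just as they refer to the script's arguments at the top level of a script.

```shell
#!/bin/bash
# greet: hypothetical helper that echoes its positional parameters,
# then reads one line from standard in with the read builtin.
greet() {
    echo "first argument: $1"
    echo "second argument: $2"
    read ANSWER
    echo "you typed: $ANSWER"
}

# Supply the "keyboard" input through a pipe so the example is not interactive.
echo "blue" | greet hound dog
```

Run interactively (without the pipe), the read command would instead wait for a line to be typed at the keyboard.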

Discussion
Shell Scripting
Earlier chapters of this workbook discussed the creation of simple shell scripts. These scripts did little
more than execute a series of commands, optionally accepting user input to define variables.
However, shell scripts are capable of much, much more than this. This chapter will add some valuable tools
to your arsenal, allowing your scripts to make basic if/then/else decisions and to repeat a set of actions
over a list of items.

Branches: if ... then ... [else ...] fi


Bash syntax for branches
In programming, branches allow programs to choose between two (or more) alternative execution
paths. The bash shell, like most programming languages, uses the word if to signify a branch. More
formally, bash uses the following syntax.
if condition
then
    command(s)
    ...
fi

or

if condition
then
    command(s)
    ...
else
    command(s)
    ...
fi

The commands in the then stanza are executed if the condition "succeeds". The commands in the else
stanza are executed if the condition "fails".

When using this syntax, line breaks are important (i.e., the if and then must occur on separate
lines, or be separated by a semicolon), but indentation is not.
What does bash expect as a condition? Unlike most programming languages, bash has no internal
syntax for making comparisons (such as $A == apple, or $B > 25). Instead, bash focuses on what shells
were designed to do: run commands. Any command can be used for the condition. The bash shell will
execute the command, and examine its return value. If the command "succeeds" (returns a return value of
0), the first stanza of commands is executed. If the command fails (returns a return value not equal to
zero), the second stanza of commands is executed (if any).
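
The following minimal sketch shows the return value deciding which stanza runs. It relies on the standard true and false commands, which do nothing except succeed and fail, respectively, and on the shell variable $?, which holds the return value of the most recently executed command. The function name report is made up for illustration.

```shell
#!/bin/bash
# report: hypothetical helper that runs whatever command it is given
# as the condition of an if ... then ... else ... fi branch.
report() {
    if "$@"
    then
        echo "succeeded, so the first stanza ran"
    else
        # At this point $? still holds the return value of the condition.
        echo "failed with return value $?, so the second stanza ran"
    fi
}

report true
report false
```

Any other command could be passed to report in place of true or false.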
The following modification to elvis's script shut serves as an example.
[elvis@station elvis]$ ls
example.sh  shut
[elvis@station elvis]$ cat shut
#!/bin/bash
# the first argument should be the name of the file to shut.
if ls $1
then
    chmod 600 $1
else
    echo "The file $1 does not exist."
fi
[elvis@station elvis]$ ./shut example.sh
example.sh
[elvis@station elvis]$ ./shut foo
ls: foo: No such file or directory
The file foo does not exist.

In the first case, the ls command "succeeds" (because the file example.sh exists, the return value from
the ls command is 0). As a result, the first stanza of the if ... then ... else ... fi clause is
executed. In the second case, the file foo does not exist, so the second (else) stanza of the clause is
executed.

The test Command


The previous example is admittedly awkward. The ls command is being forced to perform a task for
which it wasn't designed: testing for the existence of a file. While it can serve this function, the messages

rha030-3.0-0-en-2005-08-17T07:23:17-0400

Copyright (c) 2003-2005 Red Hat, Inc. All rights reserved. For use only by a student enrolled in a Red Hat Academy course taught at a Red Hat Academy. Any other
use is a violation of U.S. and international copyrights. No part of this publication may be photocopied, duplicated, stored in a retrieval system, or otherwise
duplicated whether in electronic or print format without prior written consent of Red Hat, Inc. If you believe Red Hat course materials are being used, copied, or
otherwise improperly distributed please email training@redhat.com or phone toll-free (USA) +1 866 626 2994 or +1 (919) 754 3700.

it prints are distracting. If the specified file exists, its name is printed to the screen. If the file does not
exist, then an error message is printed. If only there was a command which would check for the existence
of a file, and return the appropriate return code, without emitting any other messages...
There is. The command is called test. The test command was designed for exactly this purpose: to be used as
the conditional command in bash if ... then statements. More specifically, the test command is
designed to compare strings, integers, and file attributes. It never generates output, but instead
communicates using its return value. The test command returns 0 if the expression it evaluates is true,
and a nonzero value otherwise.
Most of test's arguments look similar to command line switches. For example, -e tests for the existence
of a file. The above script could be improved by replacing ls with test -e.
[elvis@station elvis]$ cat shut
#!/bin/bash
# the first argument should be the name of the file to shut.
if test -e $1
then
    chmod 600 $1
else
    echo "The file $1 does not exist."
fi
[elvis@station elvis]$ ./shut example.sh
[elvis@station elvis]$ ./shut foo
The file foo does not exist.

Notice that the test command tests for the existence of the file, but does not generate any messages to
distract the user.
The following table lists some of the more commonly used switches for testing file attributes.
Table 1-1. test Expressions for Examining File Attributes

Expression          Condition
-d FILE             FILE exists, and is a directory.
-e FILE             FILE exists.
-f FILE             FILE exists, and is a regular file.
-r FILE             FILE exists, and is readable.
-w FILE             FILE exists, and is writable.
-x FILE             FILE exists, and is executable.
FILE1 -nt FILE2     FILE1 is newer than FILE2.
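
A minimal sketch of how the -d and -f switches might be used together follows. The function name describe and the variable WORKDIR are made up for illustration, and mktemp is used only so that the example does not disturb existing files. The double quotes around $1 are an addition that guards against filenames containing spaces.

```shell
#!/bin/bash
# describe: hypothetical helper that classifies its first argument
# using two of the file attribute switches.
describe() {
    if test -d "$1"
    then
        echo "directory"
    else
        if test -f "$1"
        then
            echo "regular file"
        else
            echo "missing or special"
        fi
    fi
}

WORKDIR=$(mktemp -d)            # scratch directory for the demonstration
touch "$WORKDIR/example.sh"

describe "$WORKDIR"
describe "$WORKDIR/example.sh"
describe "$WORKDIR/foo"

rm -r "$WORKDIR"
```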

Table 1-2. test Expressions for Comparing Strings

Expression            Condition
[-n] STRING           The length of STRING is greater than zero.
-z STRING             The length of STRING is zero.
STRING1 = STRING2     STRING1 and STRING2 are equal.
STRING1 != STRING2    STRING1 and STRING2 are not equal.
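
The string expressions might be used as in the following sketch. The function name classify and the string apple are made up for illustration. The double quotes around $1 matter here: if $1 were empty and unquoted, test would see no argument at all in its place.

```shell
#!/bin/bash
# classify: hypothetical helper that examines its first argument
# using test's string expressions.
classify() {
    if test -z "$1"
    then
        echo "empty string"
    else
        if test "$1" = "apple"
        then
            echo "an apple"
        else
            echo "not an apple"
        fi
    fi
}

classify ""
classify apple
classify pear
```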

Lastly, the following table lists expressions that allow the test command to use compound logic.
Table 1-3. Logic Expressions for the test Command

Expression                    Condition
EXPRESSION1 -a EXPRESSION2    Both EXPRESSION1 and EXPRESSION2 are true.
EXPRESSION1 -o EXPRESSION2    Either EXPRESSION1 or EXPRESSION2 is true.
! EXPRESSION                  EXPRESSION is false.

These tables are meant to provide the student with a usable working set of expressions. For a complete
listing, consult the test(1) man page.
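
As a hedged illustration of compound logic, the sketch below combines two file tests with -a and inverts one with !. The function names can_compress and is_missing and the variable SCRATCH are made up for illustration, and mktemp is used only to obtain a file that is known to exist.

```shell
#!/bin/bash
# can_compress: hypothetical check that a file is both readable (-r)
# and a regular file (-f), joined with -a.
can_compress() {
    if test -r "$1" -a -f "$1"
    then
        echo "yes"
    else
        echo "no"
    fi
}

# is_missing: hypothetical check that uses ! to invert the result of -e.
is_missing() {
    if test ! -e "$1"
    then
        echo "yes"
    else
        echo "no"
    fi
}

SCRATCH=$(mktemp)          # a freshly created, readable regular file
can_compress "$SCRATCH"
is_missing "$SCRATCH"
rm "$SCRATCH"
is_missing "$SCRATCH"
```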

Alternate expression for test: [ expression ]


The test command is so commonly used in bash scripting that a shorter syntax has been developed. The
following two expressions are equivalent.
test expression
[ expression ]

As an example, elvis's shut script could be rewritten as the following.

[elvis@station elvis]$ cat shut
#!/bin/bash
# the first argument should be the name of the file to shut.
if [ -e $1 ]
then
    chmod 600 $1
else
    echo "The file $1 does not exist."
fi

Notice that the test command has been replaced with the alternate [ ... ] syntax.

When using the alternate syntax, care must be taken to include a space after the opening bracket, and
before the closing bracket (see note 1). For example, the following two constructions of the test command are
wrong.
[-e foo.sh ]
[ -e foo.sh]

The next construction is really wrong.


[-e foo.sh]
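
The effect of a missing space can be observed directly. The sketch below assumes that no command named [-e exists on the system, which should be the case on a typical installation. The error message that the shell would normally print is discarded with 2>/dev/null so that only the echo output appears.

```shell
#!/bin/bash
# With the spaces in place, the shell runs the command "[" with the
# arguments: -e / ]
if [ -e / ]
then
    echo "the root directory exists"
fi

# Without the space, the shell looks for a command literally named "[-e".
# Assuming no such command exists, the condition fails and the else
# stanza runs.
if [-e / ] 2>/dev/null
then
    echo "not reached on a typical system"
else
    echo "[-e is not a command"
fi
```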

Loops: for ... in ... do ... done


Bash syntax for loops
Loops are perhaps the most useful programming structure for automating mundane tasks. Loops allow a
series of commands to be repeated, usually with slight variations in each iteration. These
variations are typically implemented using a variable referred to as an iterator. For each iteration of the loop, the
variable takes on a different value. For example, elvis could use the following script to affirm his
affection for his collection of household pets.
[elvis@station elvis]$ cat nice
#!/bin/bash
for PET in kitty doggy gerbil newt
do
    echo "nice $PET."
done
[elvis@station elvis]$ ./nice
nice kitty.
nice doggy.
nice gerbil.
nice newt.

In this script, the shell variable PET is used as the iterator. With each iteration of the loop, the variable
takes on a different value.
More formally, for ... in ... do ... done loops in bash use the following syntax.
for iterator in list
do
    command(s)
    ...
done

For each repetition of the loop, the variable iterator takes on, in turn, the value of the next word in the
expression list.
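
To make the word-by-word behavior concrete, the sketch below numbers each iteration. The function name list_pets and the variable COUNT are made up for illustration, and the counter uses arithmetic expansion, $(( ... )), which this chapter does not otherwise cover. Quoting "guinea pig" makes the shell treat it as a single word in the list.

```shell
#!/bin/bash
# list_pets: hypothetical helper; the loop body runs once per word in the list.
list_pets() {
    COUNT=0
    for PET in kitty doggy "guinea pig"
    do
        COUNT=$((COUNT + 1))      # add one to the counter on each iteration
        echo "$COUNT: nice $PET."
    done
}

list_pets
```

The list contains three words, so the body of the loop runs three times.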
For a more practical example, we revisit elvis's script shut. The user elvis would now like to modify his
script, so that he can specify multiple files on the command line. To implement this change, he
essentially takes his previous script, and wraps it inside a for ... in ... do ... done loop. Rather
than using the first positional parameter ($1) directly, elvis uses an iterator to iterate through all
arguments supplied on the command line.
[elvis@station elvis]$ cat shut
#!/bin/bash
# each argument should be the name of a file to shut.
for FILE in $*
do
    if [ -e $FILE ]
    then
        chmod 600 $FILE
    else
        echo "The file $FILE does not exist."
    fi
done

In the following, elvis uses the script to modify the permissions on the files example.sh and nice.
[elvis@station elvis]$ ls -l
total 12
-rwxr-xr-x    1 elvis    elvis         212 Sep  3 10:56 example.sh
-rwxr-xr-x    1 elvis    elvis          77 Sep  4 12:16 nice
-rwxrwxr-x    1 elvis    elvis         188 Sep  4 12:31 shut
[elvis@station elvis]$ ./shut example.sh biz nice baz
The file biz does not exist.
The file baz does not exist.
[elvis@station elvis]$ ls -l
total 12
-rw-------    1 elvis    elvis         212 Sep  3 10:56 example.sh
-rw-------    1 elvis    elvis          77 Sep  4 12:16 nice
-rwxrwxr-x    1 elvis    elvis         188 Sep  4 12:31 shut

Notice the use of the $* variable to generate the list. The following table suggests other commonly used
tricks of the trade.
Table 1-4. Common Techniques for Generating Iteration Lists

When you use...          ... $i iterates through...
for i in $*              the script's command line arguments.
for i in /etc/*.conf     the files matched by the glob /etc/*.conf.
for i in $(command)      the words returned by the command command.
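
The three techniques in the table can be tried side by side. In the sketch below, the function name show_args, the variable SCRATCH, and the file names are made up for illustration, and a scratch directory created with mktemp stands in for /etc so that the output does not depend on the system.

```shell
#!/bin/bash
# show_args: hypothetical helper that loops over its own arguments.
show_args() {
    for i in $*
    do
        echo "argument: $i"
    done
}

show_args example.sh nice

SCRATCH=$(mktemp -d)
touch "$SCRATCH/a.conf" "$SCRATCH/b.conf" "$SCRATCH/notes.txt"

# Loop over the files matched by a glob.
for i in "$SCRATCH"/*.conf
do
    echo "config file: $(basename "$i")"
done

# Loop over the words printed by a command (here, ls).
for i in $(ls "$SCRATCH")
do
    echo "entry: $i"
done

rm -r "$SCRATCH"
```

One caveat: because $* is split into words, an argument containing a space is seen as two separate items. Writing "$@" (with the double quotes) in place of $* keeps each argument intact.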

Examples
Example 1. A Script for "Packing" Directories
The user elvis finds that he is often "tarring up" (archiving) directories he is not actively using. He
decides to create a script called pack which will help him archive directories more quickly.
The pack script expects one or more directories to be listed as arguments. For each directory, the script
will create an archive named after the directory, with the extension .tgz appended. If, and only if, the
creation of the archive is successful, the script will then remove the original directory.
As elvis thinks through the directories that users could specify, he realizes that the directories . and ..
could cause problems (why?), so he adds an exclusion for them.
[elvis@station elvis]$ cat pack
#!/bin/bash
for DIR in $*; do
    if [ -d $DIR ]
    then
        if [ "$DIR" == "." -o "$DIR" == ".." ]
        then
            echo "skipping directory $DIR"
        else
            tar cvzf $DIR.tgz $DIR && rm -fr $DIR
        fi
    else
        echo "skipping non directory $DIR"
    fi
done

The for line loops through all of the provided command line arguments.

The outer if confirms that the argument exists, and that it refers to a directory.

The inner if checks that the user has not specified the directory . or .. as an argument. In practice,
there are still some directory names that can cause problems. Can you think of any?

Finally, the tar line does the hard work. Notice that the original directory is removed only if
the tar command succeeds.
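
The behavior of && can be seen in isolation with the sketch below. The function name pack_step is made up for illustration, and the standard true and false commands stand in for a tar command that succeeds and one that fails. The sketch also uses ||, which runs the following command only if the preceding one fails.

```shell
#!/bin/bash
# pack_step: hypothetical stand-in for the "tar ... && rm ..." line.
# Its arguments are run as the "archive" command; the message after &&
# is printed only if that command succeeds (returns 0).
pack_step() {
    "$@" && echo "archive ok, removing original"
}

pack_step true
pack_step false || echo "archive failed, original kept"
```

In the pack script, this is what protects the original directory: rm is never reached when tar returns a nonzero value.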

He tries out his script on two test directories.


[elvis@station elvis]$ mkdir test{1,2}
[elvis@station elvis]$ touch test{1,2}/{one,two,three,four}
[elvis@station elvis]$ ls -R
.:
pack  test1  test2

./test1:
four  one  three  two

./test2:
four  one  three  two
[elvis@station elvis]$ ./pack test1 test2
test1/
test1/one
test1/two
test1/three
test1/four
test2/
test2/one
test2/two
test2/three
test2/four
[elvis@station elvis]$ ls -R
.:
pack  test1.tgz  test2.tgz

The test directories have been "pack"ed.

Online Exercises
Lab Exercise
Objective: Use shell scripting to automate the rotation of images.
Estimated Time: 30 mins.

Specification
Create a script called rotate_cw, which can be used to rotate images 90 degrees. In order to perform
the rotation, you should use the convert command (examine the convert(1) man page, paying particular
attention to the rotate option). The following provides an example of using the convert command to
rotate an image.
[elvis@station elvis]$ cp /usr/share/pixmaps/redhat-main-menu.png .
[elvis@station elvis]$ convert -rotate 90 redhat-main-menu.png /tmp/redhat-main-menu.png

The file /tmp/redhat-main-menu.png is the same image as redhat-main-menu.png, rotated 90
degrees.
The script should expect as arguments multiple filenames of the images to be rotated. You may assume
that the filenames will not contain directory components (i.e., they will refer to files in the current
directory). The script should generate a new file containing the rotated image, and if the new image is
successfully generated, replace the original image with the rotated image (giving the appearance of rotating the
images "in place").
Leave the script in your home directory, and ensure that it has executable permissions.

Deliverables

1. An executable bash script called ~/rotate_cw, which will rotate images in the local directory whose
filenames (without directory components) are passed as arguments.

Questions
1. In order to execute a script, what permission(s) must a user have?
( ) a. read permission
( ) b. write permission
( ) c. execute permission
( ) d. A and C
( ) e. All of the above
2. In order to write an executable bash script, what must the first word of the first line look like?
( ) a. ##bash
( ) b. #!bash
( ) c. !!/bin/bash
( ) d. crunch-bang /bin/bash
( ) e. None of the above
3. When using the Linux scripting mechanism, what may be used as an interpreter?
( ) a. Any executable file within the /usr/bin/ directory
( ) b. Any executable file
( ) c. Only executable files which are listed in the file /etc/interpreters
( ) d. Only executables that ignore lines beginning with #.
( ) e. Only files that meet conditions C and D
4. Which of the following are mechanisms for passing information into shell scripts?
( ) a. Invoking the script with command line arguments.
( ) b. Configuring environment variables before invoking the script.
( ) c. Designing the script to read input from the keyboard (standard in).
( ) d. A and B
( ) e. All of the above

[elvis@station elvis]$ cat script
#!/bin/bash
for i in $*
do
    if [ -r $i -a -f $i ]
    then
        gzip $i
    else
        echo "cannot compress $i"
done
fi
[elvis@station elvis]$ ./script rotate_cw
./script: line 10: syntax error near unexpected token `done'
./script: line 10: `done'

5. Which of the following lines could replace the line starting with if in the script above, with no effect on script execution?
( ) a. if test -r $i -a -f $i
( ) b. test -r $i -a -f $i
( ) c. if [ -r $i -o -f $i ]
( ) d. if [ -e $i ]
( ) e. None of the above
6. What syntax error exists in the script?
( ) a. The words for and do must occur on the same line.
( ) b. There must be no spaces between [ and -r on the line starting if.
( ) c. The gzip command must be specified using an absolute reference.
( ) d. The last line contains the misspelled word fi.
( ) e. The last two lines (containing done and fi) need to be transposed.
7. What does the variable i iterate through (assuming the syntax error mentioned above is fixed)?
( ) a. All (non-hidden) files in the local directory
( ) b. All files in the local directory
( ) c. All files which were previously defined in the environment variable named *.
( ) d. All of the command line arguments provided when the script was invoked.
( ) e. None of the above
The following text is found in the file /etc/bashrc.
if [ "`id -gn`" = "`id -un`" -a `id -u` -gt 99 ]; then
    umask 002
else
    umask 022
fi

8. Which of the following best describes the execution of this text?
( ) a. If the current user is a member of more than one group, then set the current shell's umask to 002,
otherwise set it to 022.
( ) b. If the current user's group name is the same as the user's username, or the user has a userid greater than
99, set the shell's umask to 002, otherwise set it to 022.
( ) c. If the user is a member of more than 99 groups, set the shell's umask to 002, otherwise set it to 022.
( ) d. If the current user is the root user, set the current shell's umask to 002, otherwise set it to 022.
( ) e. If the current user's group name is the same as the user's username, and the user has a userid greater than
99, set the shell's umask to 002, otherwise set it to 022.
The following text is found in the file /etc/profile.
for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        . $i
    fi
done

9. Which of the following best describes the execution of this text?


( ) a. For every file in the /etc/profile.d directory that ends .sh, if it is readable, source it.
( ) b. For every file in the /etc/profile.d directory that ends .sh, if it is readable, execute it.
( ) c. For every file in the /etc/profile.d directory that ends .sh, if it is executable, execute it.
( ) d. For every file in the /etc/profile.d directory that ends .sh, if it is not a directory, source it.
( ) e. None of the above.
10. What innocent action could a system administrator take which would cause an error when this section of script is
executed?
( ) a. The administrator could place a file that does not end with .sh in the /etc/profile.d directory.
( ) b. The administrator could place a file that does not have executable permissions in the /etc/profile.d
directory.
( ) c. The administrator could place a file that does not have read permissions in the /etc/profile.d directory.
( ) d. The administrator could place a file that does not have write permissions in the /etc/profile.d
directory.
( ) e. The administrator could place a file titled source me.sh in the /etc/profile.d directory.


Notes
1. In order to explore why this is the case, note that there is actually a file called /usr/bin/[. What
does this imply?


Chapter 2. Character Encoding and Internationalization

Key Concepts

When storing text, computers transform characters into a numeric representation. This process is
referred to as encoding the text.

In order to accommodate the demands of a variety of languages, several different encoding techniques
have been developed. These techniques are represented by a variety of character sets.

The most sophisticated encoding technique is known as the Universal Character Set (UCS), or
Unicode.

The default encoding technique in Red Hat Enterprise Linux is referred to as UTF-8, which allows the
flexibility of Unicode but retains ASCII compatibility.

The LANG environment variable is used to specify a user's preferred language and character encoding.

Files
What are Files?
Linux, like most operating systems, stores information that needs to be preserved outside of the context
of any individual process in files. (In this context, and for most of this Workbook, the term file is meant in
the sense of regular file). Linux (and Unix) files store information using a simple model: information is
stored as a single, ordered array of bytes, starting at the first and ending at the last. The number of bytes
in the array is the length of the file. 1
What type of information is stored in files? Here are but a few examples.

The characters that compose the book report you want to store until you can come back and finish it
tomorrow are stored in a file called (say) ~/bookreport.txt.

The individual colors that make up the picture you took with your digital camera are stored in the file
(say) /mnt/camera/dcim/100nikon/dscn1203.jpg.

The characters which define the usernames of users on a Linux system (and their home directories,
etc.) are stored in the file /etc/passwd.

The specific instructions which tell an x86 compatible CPU how to use the Linux kernel to list the files
in a given directory are stored in the file /bin/ls.


What is a Byte?
At the lowest level, computers can only answer one type of question: is it on or off? What is it? When
dealing with disks, it is a magnetic domain which is oriented up or down. When dealing with memory
chips, it is a transistor which either has current or doesn't. Both of these are too difficult to mentally
picture, so we will speak in terms of light switches that can either be on or off. To your computer, the
contents of your file are reduced to what can be thought of as an array of (perhaps millions of) light
switches. Each light switch can be used to store one bit of information (is it on, or is it off).
Using a single light switch, you cannot store much information. To be more useful, an early convention
was established: group the light switches into bunches of 8. Each series of 8 light switches (or magnetic
domains, or transistors, ...) is a byte. More formally, a byte consists of 8 bits. Each permutation of ons and
offs for a group of 8 switches can be assigned a number. All switches off, we'll assign 0. Only the first
switch on, we'll assign 1; only the second switch on, 2; the first and second switch on, 3; and so on. How
many numbers will it take to label each possible permutation for 8 light switches? A mathematician will
quickly tell you the answer is 2^8, or 256. After grouping the light switches into groups of eight, your
computer views the contents of your file as an array of bytes, each with a value ranging from 0 to 255.
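The counting argument above can be checked with shell arithmetic. The following is a minimal sketch; the function name patterns is our own invention for illustration, not a standard command.

```shell
# Number of distinct on/off patterns that a group of N switches (bits) can hold.
# Shifting 1 left by N places is the same as computing 2 to the power N.
patterns() {
    echo $((1 << $1))
}

patterns 1    # 2: a single switch is either off or on
patterns 8    # 256: a byte can hold any value from 0 through 255
```

Each additional switch doubles the number of patterns, which is why 8 switches yield 256 values rather than 16.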

Data Encoding
In order to store information as a series of bytes, the information must be somehow converted into a
series of values ranging from 0 to 255. Converting information into such a format is called data encoding.
What's the best way to do it? There is no single best way that works for all situations. Developing the
right technique to encode data, which balances the goals of simplicity, efficiency (in terms of CPU
performance and on disk storage), resilience to corruption, etc., is much of the art of computer science.
As one example, consider the picture taken by a digital camera mentioned above. One encoding
technique would divide the picture into pixels (dots), and for each pixel, record three bytes of
information: the pixel's "redness", "greenness", and "blueness", each on a scale of 0 to 255. The first
three bytes of the file would record the information for the first pixel, the second three bytes the second
pixel, and so on. A picture format known as "PNM" does just this (plus some header information, such as
how many pixels are in a row). Many other encoding techniques for images exist, some just as simple,
many much more complex.
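As a rough illustration of the three-bytes-per-pixel idea, the sketch below writes a tiny image using the raw "PPM" member of the PNM family. The function name ppm_demo and the choice of a 2 by 1 pixel image are assumptions made for this example only.

```shell
# Emit a 2x1 pixel image in raw PPM form: a short text header,
# followed by three bytes (red, green, blue) for each pixel.
ppm_demo() {
    printf 'P6\n2 1\n255\n'     # header: format tag, width and height, maximum value
    printf '\377\000\000'       # pixel 1: pure red  (255, 0, 0)
    printf '\000\000\377'       # pixel 2: pure blue (0, 0, 255)
}

ppm_demo | wc -c    # 17 bytes: an 11 byte header plus 2 pixels of 3 bytes each
```

Redirecting the output to a file (for example, ppm_demo > tiny.ppm) should give a file that most image viewers can open.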

Text Encoding
Perhaps the most common type of data which computers are asked to store is text. As computers have
developed, a variety of techniques for encoding text have been developed, from the simple in concept
(which could encode only the Latin alphabet used in Western languages) to complicated but powerful
techniques that attempt to encode all forms of human written communication, even attempting to include
historical languages such as Egyptian hieroglyphics. The following sections discuss many of the
encoding techniques commonly used in Red Hat Enterprise Linux.


ASCII
One of the oldest, and still most commonly used, techniques for encoding text is called ASCII encoding.
ASCII encoding simply takes the 26 lowercase and 26 uppercase letters which compose the Latin
alphabet, 10 digits, and common English punctuation characters (those found on a keyboard), and maps
each of them to an integer between 0 and 127, as outlined in the following table.
Table 2-1. ASCII Encoding of Printable Characters

Integer Range    Character
33-47            Punctuation: !"#$%&'()*+,-./
48-57            The digits 0 through 9
58-64            Punctuation: :;<=>?@
65-90            Capital letters A through Z
91-96            Punctuation: [\]^_`
97-122           Lowercase letters a through z
123-126          Punctuation: {|}~
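The mapping in the table can be explored from the command line. The sketch below uses od to show the integer stored for a character, and printf to go the other way. The helper names ascii_code and ascii_char are hypothetical, and the helpers are only intended for single ASCII characters.

```shell
# Print the integer that encodes a single ASCII character.
ascii_code() {
    printf '%s' "$1" | od -An -tu1 | tr -d ' \n'
    echo
}

# Print the character encoded by an integer, by way of an octal escape.
ascii_char() {
    printf "\\$(printf '%03o' "$1")"
    echo
}

ascii_code A     # 65
ascii_code a     # 97
ascii_code 0     # 48
ascii_char 122   # z
```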

What about the integers 0 - 32? These integers are mapped to special keys on early teletypes, many of
which have to do with manipulating the spacing on the page being typed on. The following characters are
commonly called "whitespace" characters.
Table 2-2. ASCII Encoding of Whitespace Characters

Integer    Character    Common Name        Common Representation
8          BS           Backspace          \b
9          HT           Tab                \t
10         LF           Line Feed          \n
12         FF           Form Feed          \f
13         CR           Carriage Return    \r
32         SPACE        Space Bar
127        DEL          Delete

Others of the first 32 integers are mapped to keys which did not directly influence the "printed page", but
instead sent "out of band" control signals between two teletypes. Many of these control signals have
special interpretations within Linux (and Unix).
Table 2-3. ASCII Encoding of Control Signals

Integer    Character    Common Name              Common Representation
4          EOT          End of Transmission
7          BEL          Audible Terminal Bell    \a
27         ESC          Escape



Generating Control Characters from the Keyboard: Control and whitespace characters can be
generated from the terminal keyboard directly using the CTRL key. For example, an audible bell can
be generated using CTRL-G, while a backspace can be sent using CTRL-H, and we have already
mentioned that CTRL-D is used to generate an "End of File" (or "End of Transmission"). Can you
determine how the whitespace and control characters are mapped to the various CTRL key
combinations? For example, what CTRL key combination generates a tab? What does CTRL-J
generate? As you explore various control sequences, remember that the reset command will restore
your terminal to sane behavior, if necessary.

What about the values 128-255? ASCII encoding does not use them. The ASCII standard only defines
the first 128 values of a byte, leaving the remaining 128 values to be defined by other schemes.

ISO 8859 and Other Character Sets


Other standard encoding schemes have been developed, which map various glyphs (such as the symbols
for the Yen and Euro), diacritical marks found in many European languages, and non-Latin alphabets to
the latter 128 values of a byte which the ASCII standard leaves undefined. These standard encoding
schemes are referred to as character sets. The following table lists some character sets which are
supported in Linux, including their informal name, formal name, and a brief description.
Table 2-4. Some ISO 8859 Character Sets supported in Linux

Informal Name    Formal Name    Description
Latin-1          ISO 8859-1     West European languages
Latin-2          ISO 8859-2     Central and East European languages
Arabic           ISO 8859-6     Latin/Arabic
Greek            ISO 8859-7     Latin/Greek
Latin-9          ISO 8859-15    West European languages

All of these character encoding schemes use a common technique. They preserve the first 128 values of a
byte to encode traditional ASCII, and use the remaining 128 values to encode glyphs unique to the
particular encoding. For example, ISO 8859-1 (Latin-1) uses the value 196 to encode a Latin capital A
with an umlaut (Ä), while ISO 8859-7 (Greek) uses the value 196 to encode the Greek capital letter
Delta (Δ), but both use the value 101 to encode a Latin lowercase e.
Notice a couple of implications about ISO 8859 encoding.
1. Each of the alternate encodings maps a single glyph to a single byte, so that the number of letters
encoded in a file equals the number of bytes which are required to encode them.
2. Choosing a particular character set extends the range of characters that can be encoded, but you
cannot encode characters from different character sets simultaneously. For example, you could not
encode both a Latin capital A with a grave and a Greek letter Delta simultaneously.
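The way a single byte value changes meaning from one character set to another can be seen with the iconv command. The sketch below is only illustrative: the helper name decode_byte is hypothetical, and it assumes that your iconv supports the ISO-8859-1 and ISO-8859-7 character set names. The result is printed as the hexadecimal bytes of the UTF-8 form, so that it does not depend on how your terminal is configured.

```shell
# Interpret one byte (given in octal) under the named character set, and
# print the hexadecimal bytes of the resulting character in UTF-8.
decode_byte() {
    printf "\\$1" | iconv -f "$2" -t UTF-8 | od -An -tx1 | tr -d ' \n'
    echo
}

decode_byte 304 ISO-8859-1    # value 196 as Latin-1: c384, the UTF-8 form of Ä
decode_byte 304 ISO-8859-7    # value 196 as Greek:   ce94, the UTF-8 form of Δ
decode_byte 145 ISO-8859-1    # value 101 as Latin-1: 65, a plain ASCII e
decode_byte 145 ISO-8859-7    # value 101 as Greek:   65, the same ASCII e
```

The same byte, 196, yields two different characters, while the ASCII value 101 is untouched by the choice of character set.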


Unicode (UCS)
In order to overcome the limitations of ASCII and ISO 8859 based encoding techniques, a Universal
Character Set has been developed, commonly referred to as UCS, or Unicode. The Unicode standard
acknowledges the fact that one byte of information, with its ability to encode 256 different values, is
simply not enough to encode the variety of glyphs found in human communication. Instead, in its
simplest fixed width form (known as UCS-4), the Unicode standard uses 4 bytes to encode each
character. Think of 4 bytes as 32 light switches. If we were to again label each permutation of on and off
for 32 switches with integers, the mathematician would tell you that you would need 4,294,967,296
(over 4 billion) integers. Thus, Unicode can encode over 4 billion glyphs (nearly enough for every person
on the earth to have their own unique glyph; the user prince would approve).
What are some of the features and drawbacks of Unicode encoding?
Scale
The Unicode standard will easily be able to encode the variety of glyphs used in human
communication for a long time to come.
Simplicity
The Unicode standard does have the simplicity of a sledgehammer. The number of bytes required to
encode a set of characters is simply the number of characters multiplied by 4.
Waste
While the Unicode standard is simple in concept, it is also very wasteful. The ability to encode 4
billion glyphs is nice, but in reality, much of the communication that occurs today uses less than a
few hundred glyphs. Of the 32 bits (light switches) used to encode each character, the first 20 or so
would always be "off".
ASCII Non-compatibility
For better or for worse, a huge amount of existing data is already ASCII encoded. In order to convert
fully to Unicode, that data, and the programs that expect to read it, would have to be converted.
The Unicode standard is an effective standard in principle, but in many respects it is ahead of its time,
and perhaps forever will be. In practice, other techniques have been developed which attempt to preserve
the scale and versatility of Unicode, while minimizing waste and maintaining ASCII compatibility. What
must be sacrificed? Simplicity.
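The fixed width scheme discussed above, and the waste it implies, can be observed by asking iconv for a 4 byte encoding. This is a minimal sketch: the helper name ucs4_hex is hypothetical, and it assumes your iconv recognizes the encoding name UCS-4BE (4 bytes per character, most significant byte first).

```shell
# Show the bytes (in hexadecimal) used to store ASCII text at 4 bytes per character.
ucs4_hex() {
    printf '%s' "$1" | iconv -f ASCII -t UCS-4BE | od -An -tx1 | tr -d ' \n'
    echo
}

ucs4_hex A      # 00000041: three zero bytes, then the ASCII value of A
ucs4_hex abc    # 000000610000006200000063: 12 bytes for 3 characters
```

For plain English text, three out of every four bytes carry no information.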

Unicode Transformation Format (UTF-8)


UTF-8 encoding attempts to balance the flexibility of Unicode and the practicality and pervasiveness of
ASCII, with a significant sacrifice: variable length encoding. With variable length encoding, each
character is no longer encoded using simply 1 byte, or simply 4 bytes. Instead, the traditional 128 ASCII
characters are encoded using 1 byte (and, in fact, are identical to the existing ASCII standard). The next
most commonly used 2000 or so characters are encoded using two bytes. The next 63000 or so
characters are encoded using three bytes, and the more esoteric characters may be encoded using from
four to six bytes (the current UTF-8 standard, RFC 3629, stops at four). Details of the encoding technique
can be found in the utf-8(7) man page. With full backwards compatibility to ASCII, and the same
functional range of pure Unicode, what is there to lose? ISO 8859 (and similar) character set compatibility.
UTF-8 attempts to bridge the gap between ASCII, which can be viewed as the primitive days of text
encoding, and Unicode, which can be viewed as the utopia to aspire toward. Unfortunately, the
"intermediate" methods, the ISO 8859 and other alternate character sets, are as incompatible with UTF-8
as they are with each other.
Additionally, the simple relationship between the number of characters that are being stored and the
amount of space (measured in bytes) it takes to store them is lost. How much space will it take to store
879 printed characters? If they are pure ASCII, the answer is 879. If they are Greek or Cyrillic, the
answer is closer to twice that much.
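The changing relationship between characters and bytes can be measured with wc -c, which always counts bytes. In the sketch below the non-ASCII characters are written as octal escapes of their UTF-8 bytes, so that the example does not depend on your editor or terminal settings. The helper name byte_count is hypothetical.

```shell
# Count the bytes produced by a printf format string.
byte_count() {
    printf "$1" | wc -c | tr -d ' '
}

byte_count 'abc'                         # 3: three ASCII characters, one byte each
byte_count '\316\261\316\262\316\263'    # 6: three Greek letters (alpha, beta, gamma), two bytes each
byte_count '\342\202\254'                # 3: a single Euro sign takes three bytes
```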

Text Encoding and the Open Source Community


In the traditional development of operating systems, decisions such as what type of character encoding to
use can be made centrally, with the possible disadvantage that the decision is wrong for some community
of the operating system's users. In contrast, in the open source development model, these types of
decisions are generally made by individuals and small groups of contributors. The advantage of the open
source model is a flexible system which can accommodate a wide variety of encoding formats. The
disadvantage is that users must often be educated and made aware of the issues involved with character
encoding, because some parts of the assembled system use one technique while other parts use another.
The library of man pages is an excellent example.
When contributors to the open source community are faced with decisions involving potentially
incompatible formats, they generally balance local needs with an appreciation for adhering to widely
accepted standards where appropriate. The UTF-8 encoding format seems to be evolving as an accepted
standard, and in recent releases has become the default for Red Hat Enterprise Linux.
The following paragraph, extracted from the utf-8(7) man page, says it well:
It can be hoped that in the foreseeable future, UTF-8 will replace
ASCII and ISO 8859 at all levels as the common character encoding on
POSIX systems, leading to a significantly richer environment for handling plain text.

Internationalization (i18n)
As this Workbook continues to discuss many tools and techniques for searching, sorting, and
manipulating text, the topic of internationalization cannot be avoided. In the open source community,
internationalization is often abbreviated as i18n, a shorthand for saying "i-n with 18 letters in between".
Applications which have been internationalized take into account different languages. In the Linux (and
Unix) community, most applications look for the LANG environment variable to determine which
language to use.
At the simplest, this implies that programs will emit messages in the user's native language.
[elvis@station elvis]$ echo $LANG

en_US.UTF-8
[elvis@station elvis]$ chmod 666 /etc/passwd

chmod: changing permissions of /etc/passwd: Operation not permitted


[elvis@station elvis]$ export LANG=de_DE.utf8
[elvis@station elvis]$ chmod 666 /etc/passwd

chmod: Beim Setzen der Zugriffsrechte für /etc/passwd: Die Operation ist nicht erlaubt

More subtly, the choice of a particular language has implications for sorting orders, numeric formats, text
encoding, and other issues.

The LANG environment variable


The LANG environment variable is used to define a user's language, and possibly the default encoding
technique as well. The variable is expected to be set to a string using the following syntax:
LL_CC.enc

The variable's content consists of the following three components.


Table 2-5. Components of LANG environment variable

Component    Role
LL           Two letter ISO 639 Language Code
CC           (Optional) Two letter ISO 3166 Country Code
enc          (Optional) Character Encoding Code Set

The locale command can be used to examine your current configuration (as can echo $LANG), while
locale -a will list all settings currently supported by your system. The extent of the support for any given
language will vary.
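To make the LL_CC.enc syntax concrete, the following sketch splits a setting into its three components using only shell parameter expansion. The function name parse_lang is hypothetical, and the sketch deliberately ignores anything beyond the three components described above (some systems append further modifiers).

```shell
# Split a locale string of the form LL_CC.enc into its parts.
# CC and enc are optional, so either may come back empty.
parse_lang() {
    value=$1
    enc=
    case $value in
        *.*) enc=${value#*.}; value=${value%%.*} ;;    # text after the dot
    esac
    cc=
    case $value in
        *_*) cc=${value#*_}; value=${value%%_*} ;;     # text after the underscore
    esac
    printf 'language=%s country=%s encoding=%s\n' "$value" "$cc" "$enc"
}

parse_lang en_US.UTF-8    # language=en country=US encoding=UTF-8
parse_lang de_DE.utf8     # language=de country=DE encoding=utf8
parse_lang fr             # language=fr country= encoding=
```

Running parse_lang "$LANG" applies the same split to your own setting.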
The following tables list some selected language codes, country codes, and code set specifications.
Table 2-6. Selected ISO 639 Language Codes

Code    Language
de      German
el      Greek
en      English
es      Spanish
fr      French
ja      Japanese
zh      Chinese

Table 2-7. Selected ISO 3166 Country Codes

Code    Country
CA      Canada
CN      China
DE      Germany
ES      Spain
FR      France
GB      Britain (UK)
GR      Greece
JP      Japan
NG      Nigeria
US      United States

Table 2-8. Selected Character Encoding Code Sets

Code         Character Set
utf8         UTF-8
iso88591     ISO 8859-1 (Latin 1)
iso885915    ISO 8859-15 (Latin 9)
iso88596     ISO 8859-6 (Arabic)
iso88592     ISO 8859-2 (Latin 2)

See the gettext info pages (info gettext, or pinfo gettext) for a complete listing.

Do I Really Have to Know All of This?


We have tried to introduce the major concepts and components which affect how text is encoded and
stored within Linux. After reading about character sets and language codes, one might be led to wonder,
do I really need to know about all of this? If you are using simple text, restricted to the Latin alphabet of
26 characters, the answer is no. If you are asking the question 10 years from now, the answer will
hopefully be no. If you do not fit into one of these two categories, however, you should have at least an
acquaintance with the concept of internationalization, character sets, and the role of the LANG
environment variable.
Hopefully, as the open source community converges on a single encoding technique (currently UTF-8
seems the most likely), most of these issues will disappear. Until then, these are some key points to
remember.
1. An ASCII file is already valid in any of the ISO 8859 character sets.
2. An ASCII file is already valid in UTF-8.
3. A file encoded in one of the ISO 8859 character sets (and using characters beyond ASCII) is not valid
in UTF-8, and must be converted.
4. Using UTF-8, there is a one to one mapping between characters and bytes if and only if all of the
characters are pure ASCII characters.



If you are interested in more information, several man pages provide a more detailed introduction to the
concepts outlined above. Start with charsets(7), and then follow with ascii(7), iso_8859_1(7),
unicode(7) and utf-8(7). Additionally, the iconv command can be used to convert text files from one
form of encoding to another.
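As a small sketch of how iconv fits in, the example below checks whether some text is valid UTF-8 and converts Latin-1 text to UTF-8. It relies on iconv exiting with a non-zero status when it meets a byte sequence that is illegal in the source encoding. The helper name is_valid_utf8 and the sample word are our own choices for illustration.

```shell
# Succeed if standard input is valid UTF-8, fail otherwise.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Pure ASCII text is already valid UTF-8.
printf 'plain ascii\n' | is_valid_utf8 && echo "ascii: valid UTF-8"

# "cafe" with an accented e stored as the single Latin-1 byte 233 (octal 351).
printf 'caf\351\n' | is_valid_utf8 || echo "latin-1: not valid UTF-8"

# After conversion the text is valid, and one byte longer (6 bytes instead of 5).
printf 'caf\351\n' | iconv -f ISO-8859-1 -t UTF-8 | is_valid_utf8 && echo "converted: valid UTF-8"
printf 'caf\351\n' | iconv -f ISO-8859-1 -t UTF-8 | wc -c
```

To convert a file rather than a stream, name it as an argument and redirect the output, as in iconv -f ISO-8859-1 -t UTF-8 old.txt > new.txt (both filenames are placeholders).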

Notes
1. While this may seem an obvious way to do things, some operating systems take more elaborate
approaches. The Macintosh operating system, for example, stores files using two arrays of
information, a data fork and a resource fork.


Chapter 3. The RPM Package Manager


Key Concepts

The rpm command is used to add or remove software from your system.

You must have root privileges to add and remove software with rpm.

Anyone can query installed packages, or package files.

Discussion
RPM: The Red Hat Package Manager
The Red Hat Package Manager is probably the element that most defines the Red Hat Enterprise Linux
distribution. The package manager allows developers a way to build and distribute software,
administrators a way to install and maintain software, and all users a way to query for information about
and verify the integrity of installed software.

The Need for Package Management


Before package management became a mainstream concept, software was primarily distributed as
"tarballs" (i.e., compressed tar archives), and often only in source form. Anyone using an open source
product would need to unpack the tarball, compile the executables (hoping their system had the right
libraries to support it), and install the product on their system, often with components in multiple
directories (such as /etc, /var/lib, /usr/bin/, and /usr/lib).
Usually, scripts which are distributed with open source software make each of the above steps easier, but
even they do not address two fundamental problems. The first is the dependency problem. Often, open
source products reuse code which is distributed in the form of a library. If the appropriate library is not
already installed on the system, the application which depends on it will be useless. The second
fundamental problem is maintenance. What happens when, six months down the road, a new version of
the product is released, or someone decides they do not want a particular product anymore? The
individual files of the product would need to be tracked down and removed or replaced.
RPM is designed to help resolve both of these fundamental problems.

RPM Components
When people speak of RPM, they are speaking of three components collectively: the RPM database, the
rpm executable, and package files.


The RPM database


The RPM database is the heart of the product, and provides the answer for both problems stated above.
Whenever software is installed by RPM, database entries representing every file installed are created.
Because of the database, the files can be easily removed from the system at a later time.
Additionally, the database maintains a list of dependencies required and provided by various packages.
When installing an application which requires a library, for example, the library can be listed as a specific
dependency, and the database can unambiguously vouch for the presence (or absence) of the library.
The RPM database resides in the directory /var/lib/rpm, but other than knowing that it exists, there is
little need for standard users (or even administrators) to access the directory directly.

The rpm Executable


Users interact with the RPM database through the front end rpm command. The rpm executable is used
by administrators to install, upgrade, or remove software packages, and by any user to answer questions
about installed packages or verify their integrity. We will discuss RPM queries in detail below.

Package Files
Package files are the means by which software is distributed. Package files are generally named using the
following convention.
name-version-release.arch.rpm

For example, Red Hat's first release of the package file for version 4.0.7 of the open source application
zsh compiled for the Intel x86 (and compatible) architecture would conventionally be named
zsh-4.0.7-1.i386.rpm.
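The naming convention can be taken apart with shell parameter expansion, as in the sketch below. The function name parse_rpm_filename and the second filename are hypothetical. Note that the convention is only a convention: the authoritative name, version, and release are recorded in the package header, not the filename.

```shell
# Split a conventionally named package file into name, version, release, and arch.
# The pieces are peeled off from the right, because the name may itself contain hyphens.
parse_rpm_filename() {
    base=${1##*/}          # drop any leading directory
    base=${base%.rpm}      # drop the .rpm suffix
    arch=${base##*.}       # text after the last dot
    base=${base%.*}
    release=${base##*-}    # text after the last hyphen
    base=${base%-*}
    version=${base##*-}    # text after the next hyphen
    name=${base%-*}        # whatever remains
    printf 'name=%s version=%s release=%s arch=%s\n' "$name" "$version" "$release" "$arch"
}

parse_rpm_filename zsh-4.0.7-1.i386.rpm
# name=zsh version=4.0.7 release=1 arch=i386
parse_rpm_filename my-tool-1.2-3.noarch.rpm
# name=my-tool version=1.2 release=3 arch=noarch
```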
Package files are essentially tar archives (though they more closely resemble less familiar cpio archives)
combined with header information which names, versions, and states dependencies for the package.
When people refer to the Red Hat distribution, they are generally referring to the collection of RPM
package files which compose the software installed on a machine.

Querying the RPM database


When invoked with -q as its first command line switch, the rpm command will perform queries against
the RPM database (or sometimes RPM package files directly).

Formulating RPM queries


When first introduced to rpm, the syntax associated with queries can be a little overwhelming. It helps to
think of every query as being composed of two questions: (1) What packages am I querying? (2) What
question am I asking? Each of the rpm command's query-related switches will fall into one of the two
categories.



Table 3-1. RPM Query Options for Specifying Packages

Option                  Specification
-a                      all installed packages
package-name            the package name
-f filename             the package that owns the file filename
-p package-file-name    query the package file package-file-name directly. This option is
                        fundamentally different, as all other options query the RPM
                        database of installed packages.

Table 3-2. RPM Query Options for Specifying Information

Option               Specification
(default)            package name and version
-i                   package information header
-l                   list of files owned by the package
--queryformat str    list information specified in format string str

By choosing one option from the first table, and zero or more options from the second table, users can
formulate specific questions for the RPM database.

Query Examples
General Queries
For example, the -a command line switch performs a query against all installed packages. If no other
question is asked, by default rpm returns the package name. Thus rpm -qa will return a list of all
installed packages.
[prince@station prince]$ rpm -qa

basesystem-8.0-2
expat-1.95.5-2
libacl-2.2.3-1
popt-1.8-0.69
rootfiles-7.2-6
cpio-2.5-3
gzip-1.3.3-9
...

If a package name is specified, rpm will query only that package. What information is returned?
Again, by default, the package name (along with its version and release).
[prince@station prince]$ rpm -q bash

bash-2.05b-20.1



While perhaps not the most informative of queries, prince at least receives confirmation that the package
is installed, and a version number. Usually, when querying a package name, more information is
requested. For example, adding -i will generate an information header.
[prince@station prince]$ rpm -qi bash

Name        : bash                          Relocations: /usr
Version     : 2.05b                              Vendor: Red Hat, Inc.
Release     : 20.1                           Build Date: Wed 09 Apr 2003 09:02:36 AM EDT
Install Date: Tue 08 Jul 2003 09:29:33 AM EDT       Build Host: stripples.devel.redhat.com
Group       : System Environment/Shells      Source RPM: bash-2.05b-20.1.src.rpm
Size        : 1619204                           License: GPL
Signature   : DSA/SHA1, Mon 09 Jun 2003 06:45:19 PM EDT, Key ID 219180cddb42a60e
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : The GNU Bourne Again shell (bash).
Description :
The GNU project Bourne Again shell (bash) is a shell or command
language interpreter that is compatible with the Bourne shell
(sh). Bash incorporates useful features from the Korn shell (ksh) and
the C shell (csh) and most sh scripts can be run by bash without
modification. Bash is the default shell for Red Hat Linux.

Or, by adding -l, a list of all files installed by the package is generated.


[prince@station prince]$ rpm -ql bash

/bin/bash
/bin/bash2
/bin/sh
/etc/skel/.bash_logout
/etc/skel/.bash_profile
/etc/skel/.bashrc
/usr/bin/bashbug
/usr/lib/bash
/usr/share/doc/bash-2.05b
/usr/share/doc/bash-2.05b/CHANGES
...

Combining the two, as in rpm -qil bash, generates the information header followed by the file list.
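In scripts, the exit status of rpm -q is often more useful than its output. The following is a minimal sketch, assuming that rpm -q exits with a non-zero status when the named package is not installed. The function name check_packages and the QUERY_CMD variable are hypothetical names introduced only for this illustration.

```shell
# Sketch: report whether each named package is installed.
# Relies on "rpm -q NAME" exiting with a non-zero status when NAME is
# not installed. QUERY_CMD is a hypothetical override, not an rpm feature;
# it exists only so the loop can be tried on a machine without rpm.
QUERY_CMD=${QUERY_CMD:-"rpm -q"}

check_packages() {
    for pkg in "$@"; do
        # $QUERY_CMD is deliberately unquoted so "rpm -q" splits into two words
        if $QUERY_CMD "$pkg" > /dev/null 2>&1; then
            echo "$pkg: installed"
        else
            echo "$pkg: not installed"
        fi
    done
}
```

With the function defined, a command line such as check_packages bash xsnow prints one line per package name given.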


Investigating an Unfamiliar Package
What if you have come across a file in the filesystem and want to know to which package the file
belongs? Use rpm -qf, followed by the name of the file.
[prince@station etc]$ rpm -qf /etc/aep.conf

hwcrypto-1.0-14

Want to know more about the package? Add a -i.


[prince@station etc]$ rpm -qf /etc/aep.conf -i

Name        : hwcrypto                      Relocations: (not relocateable)
Version     : 1.0                                Vendor: Red Hat, Inc.
Release     : 14                             Build Date: Tue 04 Feb 2003 06:20:37 AM EST
Install Date: Tue 01 Apr 2003 11:27:43 AM EST      Build Host: sylvester.devel.redhat.com
Group       : System Environment/Base        Source RPM: hwcrypto-1.0-14.src.rpm
Size        : 711506                            License: GPL
Signature   : DSA/SHA1, Mon 24 Feb 2003 01:25:46 AM EST, Key ID 219180cddb42a60ePackager
Summary     : Hardware cryptographic accelerator support.
Description :
This package contains the shared libraries used to interface with
hardware cryptographic accelerators under Linux.

Is there any documentation on the system that could tell you more about it? Add a -l to list all of the
files in the package that owns /etc/aep.conf.
[prince@station etc]$ rpm -qf /etc/aep.conf -l

/etc/aep
/etc/aep.conf
/etc/aep/aeptarg.bin
/etc/aeplog.conf
...
/usr/sbin/aepversion
/usr/share/doc/hwcrypto-1.0
/usr/share/doc/hwcrypto-1.0/hwcrypto.txt
/usr/share/doc/hwcrypto-1.0/readme.snmp
/usr/share/snmp/mibs/cnStatTrap.mib

In this case, not much, but maybe /usr/share/doc/hwcrypto-1.0/hwcrypto.txt will provide some help.
Many packages include man pages that can be read, or info pages that can be browsed. At the least, you
can locate some configuration files you might want to peruse to find out more.
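When a package owns many files, the documentation can be picked out of the file list with grep. The following is a minimal sketch; the function name doc_paths is hypothetical, and the three directory prefixes are an assumption about where documentation conventionally lives, not something read from the package.

```shell
# Sketch: keep only the paths that look like documentation.
# The prefixes below are an assumed convention (doc, man, and info
# directories under /usr/share); adjust them for your system.
doc_paths() {
    grep -E '^/usr/share/(doc|man|info)/'
}

# Sample input copied from the hwcrypto file list shown above.
doc_paths <<'EOF'
/etc/aep.conf
/usr/sbin/aepversion
/usr/share/doc/hwcrypto-1.0
/usr/share/doc/hwcrypto-1.0/hwcrypto.txt
/usr/share/doc/hwcrypto-1.0/readme.snmp
/usr/share/snmp/mibs/cnStatTrap.mib
EOF
```

On a system with rpm, the same filter could be applied directly, as in rpm -qf /etc/aep.conf -l | doc_paths. The rpm command also provides a -d (--docfiles) query option, which lists the files the package itself marks as documentation.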
Investigating an Unfamiliar Package File
What if you come across a package file which is not yet installed on your system? The rpm command
allows package files to be queried directly with the -p command line switch.
[prince@station RPMS]$ rpm -qil -p xsri-2.1.0-5.i386.rpm

Name        : xsri                          Relocations: (not relocateable)
Version     : 2.1.0                              Vendor: Red Hat, Inc.
Release     : 5                              Build Date: Sat 25 Jan 2003 03:37:15 AM EST
Install Date: (not installed)                Build Host: porky.devel.redhat.com
Group       : Amusements/Graphics            Source RPM: xsri-2.1.0-5.src.rpm
Size        : 27190                             License: GPL
Signature   : DSA/SHA1, Mon 24 Feb 2003 12:40:17 AM EST, Key ID 219180cddb42a60ePackager
Summary     : A program for displaying images on the background for X.
Description :
The xsri program allows the display of text, patterns, and images in
the root window, so users can customize the XDM style login screen
and/or the normal X background.
Install xsri if you would like to change the look of your X login
screen and/or X background. It is also used to display the default
background (Red Hat logo).
/usr/bin/xsri
/usr/share/doc/xsri-2.1.0
/usr/share/doc/xsri-2.1.0/README

As mentioned in the table above, this is a fundamentally different type of query. The package file, which
might or might not be installed, is providing the information, not the RPM database.
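The distinction between the two kinds of query can be made explicit in a script. The following is a minimal sketch that only prints the command it would run; the function names rpm_query_mode and show_info_command are hypothetical, and treating any existing file as a package file is an assumption made for illustration.

```shell
# Sketch: choose between a database query and a package file query.
# An argument naming an existing file is treated as a package file (-p);
# anything else is treated as the name of an installed package.
rpm_query_mode() {
    if [ -f "$1" ]; then
        echo "-qp"
    else
        echo "-q"
    fi
}

# Print (rather than run) the information-header query for each argument.
show_info_command() {
    for target in "$@"; do
        echo "rpm $(rpm_query_mode "$target") -i $target"
    done
}
```

Run in a directory that contains xsri-2.1.0-5.i386.rpm, show_info_command xsri-2.1.0-5.i386.rpm bash would print an rpm -qp command for the package file and an rpm -q command for the installed package.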
Formatting Specific Information
What if you would like to generate a list of the 10 largest packages installed on your system? With
rpm -qai, the information header for every package would be displayed, which could be grepped down for
just the sizes, but then you'd need names. You could add names, but then the name and size would be on
separate lines. You get the idea.
Fortunately, the rpm command allows users to compose very specific questions by specifying a query
format string. The string is composed of any ASCII text, but tokens of the form %{fieldname} will be
replaced with the value of the relevant field. What can be used as field names? For starters, any field found in
a package's information header, but there's more. The command rpm --querytags will return a complete
(and intimidating) list of available fields.
For the specific task at hand, prince performs the following query. (Note that he needs to explicitly
specify a newline with \n.)
[prince@station RPMS]$ rpm -qa --queryformat "%{size} %{name}\n"

0 basesystem
156498 expat
19248 libacl
111647 popt
1966 rootfiles
67679 cpio
162449 gzip
...

Just the information prince wanted. With a syntax of %width{fieldname}, an optional field width can be
specified. Using this to clean up his output, and piping to sort and head, prince generates a list of the 10
largest packages on his system fairly easily.
[prince@station RPMS]$ rpm -qa --queryformat "%10{size} %{name}\n" | sort -rn | head

 170890527 kernel-source
 131431309 openoffice-i18n
 100436356 openoffice-libs
  84371104 gnucash
  80018678 openoffice
  75838208 rpmdb-redhat
  55166532 Omni
  54674111 tetex-doc
  41939971 glibc-common
  36762653 xorg
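Byte counts of this size are hard to read at a glance. The following is a minimal sketch that applies the same sort and head steps and then uses awk to print sizes in megabytes. The function name largest_packages, its optional count argument, and the choice of 1048576 bytes per megabyte are illustrative assumptions, not part of rpm.

```shell
# Sketch: turn "size name" lines into a largest-first list in megabytes.
# Input is expected in the form produced by a query format of
# "%{size} %{name}\n". The first argument is how many lines to keep
# (10 if omitted).
largest_packages() {
    count=${1:-10}
    sort -rn | head -n "$count" |
        awk '{ printf "%8.1f MB  %s\n", $1 / 1048576, $2 }'
}

# Sample input taken from the listings above.
largest_packages 3 <<'EOF'
156498 expat
170890527 kernel-source
1966 rootfiles
131431309 openoffice-i18n
162449 gzip
EOF
```

On a system with rpm, the function could be fed by the query shown above, as in rpm -qa --queryformat "%{size} %{name}\n" | largest_packages 10.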


Online Exercises
Lab Exercise
Objective: Become familiar with RPM queries
Estimated Time: 15 mins.

Specification
1. Create the file ~/bash_files, which contains a list of all files which belong to the bash package,
listing one file per line using absolute references.
2. Create the file ~/sshd_man, which lists the three files which contain man pages associated with the
openssh-server package, one file per line using absolute references.
3. In the file ~/whatis_libcap, include the single word which best completes the following
sentence: The /lib/libcap.so.1.* library is used for getting and setting POSIX.1e __________.
(Do not be concerned if you do not fully understand the answer).
4. Create the file ~/license_counts, which tabulates the number of packages distributed under each
license, for the 5 most commonly used licenses, sorted in numerically
descending order. If performed correctly, your file should be formatted similarly to the following.
(Do not be concerned if the actual counts or license names are different. Also, you might notice
logically similar licenses, such as LGPL/GPL and GPL/LGPL. Do not make any attempt to combine
them into a single entry.)
[prince@station prince]$ cat license_counts

    355 GPL
    147 LGPL
     53 BSD
     47 distributable
     18 xorg

Deliverables

1. The file ~/bash_files, which contains a list of all files owned by the bash package, one file per line, using
absolute references.
2. The file ~/sshd_man, which contains a list of the three files which provide man pages for the openssh-server
package, one file per line, using absolute references.
3. The file ~/whatis_libcap, which contains the one word answer for what the library gets and sets.
4. The file ~/license_counts, which tabulates the various licenses under which packages are distributed,
preceded by the number of packages to which the license applies, sorted in numerically descending order.

Questions
1. How does almost every RPM query command line begin?
( ) a. rpmquery ...
( ) b. rpm -q ...
( ) c. qpackage ...
( ) d. lsrpm ...
( ) e. None of the above
2. Where is the RPM database located?
( ) a. /tmp/.rpmdb
( ) b. /usr/share/rpm
( ) c. At http://rpmdb.redhat.com
( ) d. /var/lib/rpm
( ) e. None of the above
3. What would be the conventional name of the package file for release 7 of version 2.0.8 of the bash package
compiled for the x86 architecture?
( ) a. bash.i386-2.0.8.7.rpm
( ) b. rpm-bash-2.0.8-7.i386
( ) c. bash-2.0.8-7.i386.rpm
( ) d. bash-2.0.8-i386.rpm
( ) e. None of the above
4. Which of the following command lines would list the package names for all installed packages?
( ) a. rpm -q --dump
( ) b. rpm -qa
( ) c. lsrpm -a
( ) d. rpm -q --name
( ) e. None of the above


5. Which of the following would generate an information header and file list for (only) the package xsnow?
( ) a. rpm -q --list -l xsnow
( ) b. rpm -qa -i -l
( ) c. rpm -q -i xsnow
( ) d. rpm -qil xsnow
( ) e. None of the above
6. Which of the following command lines would list all files which are contained in the same package as
/etc/pwdb.conf?
( ) a. rpm -ql /etc/pwdb.conf
( ) b. rpm -fql /etc/pwdb.conf
( ) c. rpm -qlf /etc/pwdb.conf
( ) d. rpm -qif /etc/pwdb.conf
( ) e. None of the above
7. Which of the following command lines would query the xsane-0.89-3.i386.rpm package file for a list of files
that it contains?
( ) a. rpm -q -p xsane-0.89-3.i386.rpm -l
( ) b. rpm -ql xsane-0.89-3.i386.rpm
( ) c. rpm -qp xsane-0.89-3.i386.rpm
( ) d. rpm -qip xsane-0.89-3.i386.rpm
( ) e. None of the above
8. Which of the following could be used to determine how much disk space the xscreensaver package consumes?
( ) a. rpm -q -i xscreensaver
( ) b. rpm -q -s xscreensaver
( ) c. rpm -qa xscreensaver
( ) d. rpm -q -l xscreensaver
( ) e. None of the above



9. Which of the following commands could be used to determine which version of the up2date package is installed?
( ) a. rpm -q up2date
( ) b. rpm -qi up2date
( ) c. rpm -q --queryformat "%{Version}\n" up2date
( ) d. B and C Only
( ) e. A, B, and C
10. Which of the following would generate a table of installed sizes and names for all package files in the
local directory?
( ) a. rpm --queryformat "%10{Size} %{Name}\n" -p *.rpm
( ) b. rpm -q --queryformat "%10{Size} %{Name}\n" *.rpm
( ) c. rpm -q --queryformat "%10{Size} %{Name}\n" -p *.rpm
( ) d. rpm -qf "%10{Size} %{Name}\n" -p *.rpm
( ) e. None of the above
