
SPSS MR Utilities

User's Guide
QUT000223DU

COPYRIGHT 2000 BY SPSS LTD

All rights reserved as an unpublished work, and the existence of this notice shall not be construed as an admission or presumption that publication has occurred. No part of the materials may be used, reproduced, disclosed or transmitted to others in any form or by any means except under license by SPSS Ltd. or its authorized distributors.
SPSS Ltd

Maygrove House 67 Maygrove Road


LONDON NW6 2EG

England Please address any comments or queries about this manual to the Support Department at the above address, or via e-mail to: support-uk@spssmr.spss.com All trademarks acknowledged.

Contents
About this Guide
    Typographical conventions

1 Reading non-Quantum data files
    1.1 Which program to use
    1.2 Using rcolbin
    1.3 Using mtr
        Data format to read
        Input and output files
        Record length and block size
        Files with one record per block
        Reading only a given number of records
        Byte swapping
    1.4 Using mtread
    1.5 Restrictions
    1.6 How to read a mystery data file

2 Converting Quantum data to foreign formats
    2.1 Which program to use
    2.2 Using wcolbin
    2.3 Using mtw
        Data format to write
        Input and output files
        Record length and block size
        Writing only a given number of records
    2.4 Using mtwrite
    2.5 Restrictions

3 Checking for corrupt Quantum data files

4 Editing Quantum data
    4.1 Using ded
    4.2 Record-editing commands
    4.3 Card-editing commands
    4.4 Restrictions
    4.5 Diagnostics

5 Replacing text with sequential numeric values
    5.1 Preparing the text file
    5.2 Using mc

6 Printing selected fields from a file
    6.1 Which columns and fields to print
    6.2 Text and column separators in the output
    6.3 Dealing with blank or short records
    6.4 Line numbers
    6.5 Restrictions

7 Sorting files

8 ANSI carriage control sequences in files
    8.1 Adding ANSI control sequences
    8.2 Removing ANSI control sequences

About this Guide


The SPSS MR Utilities User's Guide describes a set of useful programs that are distributed to users of Quantum and Quancept. These programs are designed to overcome some of the problems and restrictions that come with the standard MS-DOS and Unix tools. Unless noted otherwise, all the programs documented in this manual work on MS-DOS and the Unix operating systems on which Quantum and Quancept are available.

Typographical conventions
The following typographical conventions have been used in this manual:

Bold text is used in syntax statements to show words that you must type exactly as they are shown.

Italic text is used in syntax statements to show words where you must substitute information of your own. For example, the word filename indicates that the program requires a filename and that you should enter the name of a file in place of the filename parameter. Italic text is also used in the main body of the text to refer to variable parameters from the command line and also to show MS-DOS or Unix commands.

Fixed width type is used to show examples.

In statements of syntax, [square brackets] are used to show optional parameters.


1 Reading non-Quantum data files


Data for Quantum runs does not always start off in Quantum format. Many market research data files are created in 360 or 1130 column binary and need converting if they are to be used with Quantum. The programs that do this are rcolbin, mtr and mtread, and the foreign formats they know about are:

ASCII text
EBCDIC
360/370 column binary
1130 column binary
Quantum internal (binary) format. (This uses the 12 lower-order bits (0FFF in hexadecimal) of a 16-bit word to represent the codes &-0123456789 in that order; that is, &=0800 and 9=0001.)
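As a rough illustration of the Quantum internal format described above, the following Python sketch packs a set of codes into the 12 low-order bits of a 16-bit word and unpacks them again. It assumes that the twelve codes run & - 0 1 2 ... 9 from the highest bit to the lowest, which is the only ordering consistent with &=0800 and 9=0001. The function names are hypothetical and are not part of any SPSS MR program.

```python
# Sketch only: assumes the 12 codes are ordered & - 0 1 2 ... 9,
# highest bit first, so that '&' is 0x0800 and '9' is 0x0001.
CODES = "&-0123456789"


def codes_to_word(codes: str) -> int:
    """Return the 16-bit word whose low 12 bits represent the given codes."""
    word = 0
    for code in codes:
        position = CODES.index(code)              # 0 for '&', 11 for '9'
        word |= 1 << (len(CODES) - 1 - position)  # set the bit for this code
    return word & 0x0FFF


def word_to_codes(word: int) -> str:
    """Return the codes present in the low 12 bits of a word."""
    return "".join(
        code
        for position, code in enumerate(CODES)
        if word & (1 << (len(CODES) - 1 - position))
    )


print(f"{codes_to_word('&'):04X}")   # 0800
print(f"{codes_to_word('9'):04X}")   # 0001
print(word_to_codes(0x0801))         # &9
```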

Normally, you will be reading foreign data files directly from a tape, but the instructions in this document apply to any input device as long as the records are of a fixed length. 

SPSS MR has no utilities for reading variable length records. If you receive a tape or file in this format, you should ask the person who created the tape or file to create another version in fixed-record format.

1.1 Which program to use


mtr is the main data conversion program. rcolbin and mtread are shell programs/batch files that run mtr with specific settings. They exist to make it easier for beginners or nontechnical users to convert data files into Quantum format.

rcolbin converts a complete 360/370 column binary file into Quantum format. Each record is 160 characters long and records are read in blocks of 1,600 characters. If your data is in this format, use rcolbin. If you want to do anything else, or the data you want to convert is not 80-column card images, use either mtr or mtread.

mtread is simply an interactive interface to mtr. Instead of assuming that you will provide all the information about the foreign data format on the command line, mtread prompts you for it as it is running. The one advantage that mtr has over mtread is that it allows you to convert only part of a data file, whereas mtread always converts the whole file.


1.2 Using rcolbin


rcolbin reads a 360/370 column binary data file with a record length of 160 characters and a block size of 1,600 characters. To use it, type:
rcolbin foreign_file quantum_file
where foreign_file is the name of the 360/370 column binary data file you wish to convert and quantum_file is the name of the Quantum data file you wish to create. Normally, you will be reading the data file directly from a tape, so foreign_file will be the device name of the tape drive you are using.

Here are some examples. The first one creates a file called qtdata by converting the file on a tape in a SCSI tape drive called /dev/rst1:
rcolbin /dev/rst1 qtdata

The next example reads a 360/370 column binary file called cbdata from the current directory and creates from it a Quantum data file called qtdata:
rcolbin cbdata qtdata

If you want to convert a 360/370 column binary file that has a different record length and/or block size, or you want to convert a certain number of records only, or you want to convert data from a different format, use mtr or mtread.

1.3 Using mtr


The full syntax of the mtr command is:
mtr -format -rrecord_length -bblock_size -iinput -ooutput [other_options]
The order of parameters in the command line is unimportant.

Data format to read


The format parameter defines the type of data you are trying to read and may be one of:
asc           ASCII text
ebc           EBCDIC using the IBM translation
col or 360    360 or 370 column binary
1130          1130 column binary
qin           Quantum internal (binary) format

If you omit the format option from the command line, mtr prompts for each data type in turn, asking whether the tape is of the particular format in question. For example:
Is this an EBCDIC tape?
Is this an ASCII tape?

Type y and press ENTER at the appropriate prompt. If you answer n to all prompts, mtr displays the message "Don't know how to read this tape" and stops.

Input and output files


You define the name of the foreign file you wish to read with the option -ifilename. Most times, you will be reading data from a tape, so the input filename will be the name of the tape device you are using. For example, if you are using a SCSI tape drive called /dev/rst0, you would enter this on the command line as:
mtr -360 -i/dev/rst0

The next command assumes that the data is being read from a magnetic tape drive (/dev/rmt1):
mtr -360 -i/dev/rmt1

The next command names an input file rather than an input device, so mtr will search for a file called efile in the current directory:
mtr -1130 -iefile

If you forget to enter the name of the input file or device, mtr does not prompt you for it. Instead, it waits for you to type in the data; to cancel mtr and re-enter the command with a filename, press CTRL+D.

If a tape contains more than one data file and you want to read any file after the first file, you need to tell mtr to skip over the preceding files on the tape. To do this, add the option -fnumber to the command line, where number is the number of files to skip. For example, if you want to read the third file on the tape, your command could be:
mtr -c -f2 -i/dev/rst0


You enter the name of the Quantum data file you wish to create in a similar way using the option -ofilename. For example:
mtr -360 -i/dev/rst0 -oqtdata

This example uses a simple filename as the name of the Quantum file so mtr will create the file in your current directory. If you want to create the file in a different directory, you may enter a full pathname here instead. In both cases, mtr overwrites the file if it already exists. If you omit the output filename from the command, mtr displays the data on the screen as it reads it.

Record length and block size


The -r and -b parameters tell mtr about the overall structure of the data on the tape.

The record length is the number of characters in each record of the data file. You define it as -rnn. If the incoming data is in a binary format, the record length must be an even number because each column of data is held as two characters (bytes). A common length for column binary records is 160 characters. This is left over from the days when market research data was held on IBM punched cards, which had 80 columns per card, but it is not a requirement.

The block size is the number of characters in each block. This must be a multiple of the record length, for example, 1600 if the record length is 160 characters. If you are reading data from a raw character device such as a tape drive, the block size must be equal to or greater than the size of the largest block to be read. You define the block size as -bnn.

For example, to convert a data file that is in 360 column binary format with a record length of 160 characters and a block size of 1,600 characters, you would type:
mtr -360 -r160 -b1600 -i/dev/rmt1 -oqtdata

To convert an EBCDIC data file that has a record length of 160 characters and a block size of 3,200 characters, you would type:
mtr -ebc -r160 -b3200 -iefile -oqtdata
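The two rules above (an even record length for binary formats, and a block size that is a multiple of the record length) can be checked before a conversion is attempted. The following Python sketch is only an illustration of those rules; check_lengths is a hypothetical helper, not something supplied with mtr, and treating qin as a binary format is an assumption based on its description as "Quantum internal (binary) format".

```python
# Hypothetical pre-flight check for the record length and block size rules.
# Assumption: the column binary formats and qin all count as "binary".
BINARY_FORMATS = {"360", "col", "1130", "qin"}


def check_lengths(data_format, record_length, block_size):
    """Return a list of rule violations (an empty list means none found)."""
    problems = []
    if data_format in BINARY_FORMATS and record_length % 2 != 0:
        problems.append("record length must be even for binary formats")
    if block_size % record_length != 0:
        problems.append("block size must be a multiple of the record length")
    return problems


print(check_lengths("360", 160, 1600))   # []
print(check_lengths("360", 161, 1600))   # both rules are broken
```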

Files with one record per block


In some files, each record is written as a separate block, so the record length and block size are the same. In this case, you can either enter the same value with both -r and -b, or you can enter just the record length or block size and use a new option, -v, instead. For example:
mtr -360 -r160 -v -iefile -oqtdata

reads a 360/370 column binary data file in which the record length and block size are both 160 characters.


It is the same as typing either:


mtr -360 -b160 -v -iefile -oqtdata

or
mtr -360 -r160 -b160 -iefile -oqtdata

If you omit the record length from your command but have defined the block size, mtr asks whether the incoming data has variable length records. If records are written one per block, type y and the record length is set to the same as the block size. The same is true, in reverse, if you define a record length without a block size.

Using the -v option or answering y to the question about variable length records does not mean that the incoming data file contains records of different lengths. mtr expects all records in a file to be the same length.

Reading only a given number of records


You do not have to read or convert a whole file if you do not want to. To read a given number of records from a file, add the option -mnn to the command line, where nn is the number of records you wish to read. For example:
mtr -360 -r160 -b1600 -m50 -i/dev/rst1 -oqtdata

This command reads and converts 50 records from SCSI tape drive 1. It assumes that records have a record length of 160 characters, a block size of 1,600 characters and are in 360/370 column binary format. The Quantum data will be written to a file called qtdata in the current directory.

Byte swapping
Some computers hold the two bytes (characters) that make up a column in the opposite order to that used by the majority of computers. This does not affect text formats such as ASCII and EBCDIC, but with binary data the Quantum data will be incorrect if the bytes are not swapped as they are read in. To have mtr swap bytes before it converts to Quantum format, add the option -s to the command line.
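To make the idea of byte swapping concrete, the Python sketch below exchanges the two bytes of every column in a binary record. It only illustrates the concept; it is not how mtr is implemented, and swap_byte_pairs is a hypothetical name.

```python
def swap_byte_pairs(data: bytes) -> bytes:
    """Exchange the two bytes of each 2-byte column in a binary record."""
    if len(data) % 2 != 0:
        raise ValueError("binary records must contain an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]   # second byte of each pair moves first
    swapped[1::2] = data[0::2]   # first byte of each pair moves second
    return bytes(swapped)


# Two columns held as 08 00 and 00 01 become 00 08 and 01 00.
print(swap_byte_pairs(b"\x08\x00\x00\x01").hex())   # 00080100
```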


1.4 Using mtread


mtread is an interactive version of mtr which prompts you for the record length and block size of the incoming data file. Running mtread is exactly the same as running mtr with only the -i and -o parameters on the command line. To use it, type:
mtread foreign_file quantum_file

Do not use mtread if you want to convert a few records only or if the data needs to be byte swapped before conversion.

1.5 Restrictions
Invalid binary data is corrected and converted without warning. If mtr reaches the end of the file midway through a record, the partial record is ignored and cannot be converted. A message to this effect is issued:
Bad block size: Block 58 expected 160 got 100 assuming 80

1.6 How to read a mystery data file


Sometimes, you will be confronted with a data file whose format is a mystery to you. Data files are often transferred from machine to machine in such a way as to lose any clues as to which type of data they contain. You may have used mtr with several different options and have created output that is definitely not a data file. Here are some clues to help you cope.

You should have on your machine a program that lets you look at a file on the screen. The standard MS-DOS program is called type, but this does not provide much flexibility. Other programs that you may have on a PC are browse and more. Under Unix, the program you want is more.

You would also benefit from using a program that can show files in hexadecimal notation as well as in regular text. Hexadecimal (or hex) is arithmetic in base 16. It allows the value of a byte to be written in two digits rather than the three that decimal needs. A byte can hold 256 different values, and this is exactly 16 times 16. The numerals in this notation are 0 to 9 and A to F. So, 17 in decimal is 11 in hex (that is, 1 times 16, plus 1) and 255 in decimal is FF in hex. Note that each of the 256 characters (0 to 255 in decimal, 0 to FF in hex) takes up only one byte. You can look at files in hex in MS-DOS by using debug, or any number of other utilities. In Unix, the utility is od.
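If neither debug nor od is to hand, a few lines of Python can stand in for them. The sketch below shows the decimal-to-hex arithmetic from the paragraph above and a minimal hex dump; hex_dump is a hypothetical helper written for this illustration, not one of the SPSS MR utilities.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values and printable text, one row per width bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:06X}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)


print(f"{17:X}")    # 11  (1 times 16, plus 1)
print(f"{255:X}")   # FF  (15 times 16, plus 15)
print(hex_dump(b"0001 \x7f&1"))
```

To inspect a real file, read the first few hundred bytes in binary mode and pass them to hex_dump.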


It would also be useful to know how many respondents should be in the data file, and how many cards each respondent should have. Even if this information is approximate, it can give you a hint as to whether the data file is close to the correct size or not. When looking in hex at a file which may be column binary, you may find that every 160 characters or so you will see a repeated pattern, or a similar pattern. When you are dealing with an 80-column record, the serial number will be repeated in approximately the same place on each record, every 160 bytes. When looking at a file as a regular text file, you will see that most of the file consists of blanks, interspersed with blocks, triangles and other graphics-style characters. 

Keep in mind that while much of column binary is record length 160 (two bytes per column), this is not a requirement and any even number may be used.

EBCDIC is different. EBCDIC in regular text mode looks mostly like line-drawing characters. There are few spaces. However, the best way to tell is to look at the file in hex. The hex codes mostly are numbers of the form Fn, where n is the data code. For example, a code 1 would be a hex F1, a code 2 would be a hex F2, and so on.
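The Fn pattern lends itself to a quick test. The Python sketch below counts how many bytes in a sample fall in the range F0 to F9, and decodes the sample with Python's cp037 codec. Both the function name and the choice of cp037 are assumptions made for illustration: cp037 is one common EBCDIC code page and may not match the IBM translation that mtr uses for every character.

```python
def ebcdic_digit_ratio(data: bytes) -> float:
    """Return the fraction of bytes that look like EBCDIC digits (hex F0 to F9)."""
    if not data:
        return 0.0
    digits = sum(1 for byte in data if 0xF0 <= byte <= 0xF9)
    return digits / len(data)


# Hex F0 F0 F0 F1 40 F2: the digits 0001, an EBCDIC space (hex 40) and a 2.
sample = bytes([0xF0, 0xF0, 0xF0, 0xF1, 0x40, 0xF2])
print(ebcdic_digit_ratio(sample))
print(sample.decode("cp037"))   # 0001 2
```

A high ratio on a file of mostly numeric data suggests EBCDIC; an ASCII file of the same data would score close to zero, because ASCII digits are hex 30 to 39.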

Most market research data files you will be converting are in the 80-column format, no matter which type of record you are converting from. So, think 80, 160 and the like when trying to discover the record length.

When you are converting, and are not sure of the record length, do not use the -m option to read only a few records. With this option, mtr does not tell you if there is anything wrong with the conversion factors you used. It just reports how many records it found and returns you to the system prompt. If you let mtr convert the whole file, it will issue an error message if it found extra characters left over at the end of the operation. For example:
Bad block size: Block 69 expected 1600 got 1437 assuming 1280

This means that the last record was not completely filled before the end of the file was reached. This is the pointer that suggests that your mtr command was not correct for the data file you are converting.
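The figures in the message can be checked by hand. In the example above, a short block of 1437 characters holds eight complete 160-character records (1280 characters), with 157 characters left over. The Python sketch below reproduces that arithmetic and, for a file held on disk, lists which candidate record lengths divide the file size exactly. Reading the "assuming" figure as the largest whole number of records in the short block is an inference from the sample messages in this chapter, not documented behaviour, and both function names are hypothetical.

```python
def usable_bytes(got: int, record_length: int) -> int:
    """Size of the largest whole number of records that fits in a short block."""
    return (got // record_length) * record_length


def plausible_record_lengths(file_size: int, candidates=(80, 160, 164, 320)):
    """Return the candidate record lengths that divide the file size exactly."""
    return [length for length in candidates if file_size % length == 0]


print(usable_bytes(1437, 160))          # 1280
print(1437 - usable_bytes(1437, 160))   # 157
print(plausible_record_lengths(1640))   # [164]
```

The candidate list is only a starting point; extend it with any other even record lengths you suspect.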


Once you have tried to convert the file and have received a message similar to the one shown above, look at the output file using, say, type in MS-DOS or more in Unix. If you have the type of file correct, but the block size, record length, or both, is incorrect, you will see a file where you have numbers but they will seem to march diagonally across the screen, displaced out of their correct spots. For example:
00019595695969696797979879695949392692393459697695954493932932939339393933933
00 02409824098724t0989873245097309874098247124098712340987123409871240983408
00 03 244098723409874t3450987345098723098723409871234098712340987123409871234
0004 309873087959569596969679392692393459697695954493932932939339393933933
000594 0486467638362526474746253939393939 ....

Notice how the serial numbers 0001, 0002, 0003, 0004 and 0005 seem to move diagonally across the page. The record length used to convert this file was 160 when it should have been 164.

When you have converted a multicoded file into Quantum format, you may see a five-sided box symbol or, if you look at the file with vi, the characters ^?, followed by a list of letters, numbers and symbols. This special symbol is the decimal code 127, or hex 7F, that separates the end of the data from the multicodes in each record. In the body of the record, each multicode is shown as an asterisk. If a record contains no multicodes, you will not see this special symbol in that record.

For a complete description of Quantum data format, see Appendix C of the Quantum User's Guide Volume 4.


2 Converting Quantum data to foreign formats


From time to time, you may need to give your client a copy of the data file on tape. Not every computer can read data in Quantum format and you may need to convert the data as you create the tape. The programs that do this are wcolbin, mtw and mtwrite, and the foreign formats they know about are:
ASCII text
EBCDIC
360/370 column binary
1130 column binary
Quantum internal (binary) format. (This uses the 12 lower-order bits (0FFF in hexadecimal) of a 16-bit word to represent the codes &-0123456789 in that order; that is, &=0800 and 9=0001.)

2.1 Which program to use


mtw is the main data conversion program. wcolbin and mtwrite are shell programs/batch files that use mtw. They exist to make it easier for beginners or nontechnical users to convert Quantum data files.

wcolbin converts a complete Quantum data file to 360/370 column binary. Each record is 160 characters long and records are written in blocks of 1,600 characters. If this is what you want to do, use wcolbin. If you want to do anything else, or the data you want to convert is not 80-column card images, use either mtw or mtwrite.

mtwrite is simply an interactive interface to mtw. Instead of assuming that you will provide all the information about the foreign data format on the command line, mtwrite prompts you for it as it is running. The one advantage that mtw has over mtwrite is that it allows you to convert only part of a data file, whereas mtwrite always converts the whole file.


2.2 Using wcolbin


wcolbin creates a 360/370 column binary data file with a record length of 160 characters and a block size of 1,600 characters. To use it, type:
wcolbin quantum_file foreign_file
where quantum_file is the name of the Quantum data file and foreign_file is the name of the 360/370 column binary data file you wish to create. Normally, you will be writing the data file directly to a tape, so foreign_file will be the device name of the tape drive you are using.

Here are some examples. The first writes to a tape in a SCSI tape drive called /dev/rst1:
wcolbin qtdata /dev/rst1

The next example creates a 360/370 column binary file called cbdata in the current directory:
wcolbin qtdata cbdata

If you want to create a 360/370 column binary file with a different record length and/or block size, or you want to create a certain number of records only, or you want to create data in a different format, use mtw or mtwrite.

2.3 Using mtw


The full syntax of the mtw command is:
mtw -format -rrecord_length -bblock_size [-mmax_recs] -iinput -ooutput
The order of parameters in the command line is unimportant. Parameters in square brackets are optional.

Data format to write


The format parameter defines the type of data you are trying to write and may be one of:
asc           ASCII text
ebc           EBCDIC using the IBM translation
col or 360    360 or 370 column binary
1130          1130 column binary
qin           Quantum internal (binary) format

In conversions to ASCII format, any multicodes in the Quantum data that do not correspond to standard ASCII characters are written out as asterisks. For example, a multicode of &1 corresponds to the letter A and will be written out as such, whereas a multicode of 123 has no ASCII equivalent, and will therefore be written out as an asterisk.

In conversions to EBCDIC, the Quantum data is converted first to ASCII and then from ASCII into EBCDIC. The notes for conversions to ASCII therefore apply.
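The rule for ASCII conversions can be pictured as a table lookup with an asterisk as the fallback. The Python sketch below builds such a table under two loudly stated assumptions: that letters follow the standard punched-card (Hollerith) zones, which agrees with the &1 = A example above, and that a single code or a blank column is written out as itself. The table actually used by mtw may contain other entries, and column_to_ascii is a hypothetical name.

```python
# Assumption: letters follow the standard Hollerith zones
# (& with 1-9 = A-I, - with 1-9 = J-R, 0 with 2-9 = S-Z) and a single
# code stands for itself. The real table used by mtw may differ.
TABLE = {code: code for code in "&-0123456789"}
TABLE.update({"&" + str(n): chr(ord("A") + n - 1) for n in range(1, 10)})
TABLE.update({"-" + str(n): chr(ord("J") + n - 1) for n in range(1, 10)})
TABLE.update({"0" + str(n): chr(ord("S") + n - 2) for n in range(2, 10)})


def column_to_ascii(codes: str) -> str:
    """Return the character for one column, or '*' if it has no equivalent."""
    if codes == "":
        return " "                    # assumption: a blank column stays blank
    return TABLE.get(codes, "*")


print(column_to_ascii("&1"))    # A
print(column_to_ascii("123"))   # *
```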

If you omit the format option from the command line, mtw prompts for each data type in turn, asking whether the conversion is of the particular format in question. For example:
Is this an EBCDIC conversion?
Is this an ASCII conversion?

Type y and press ENTER at the appropriate prompt. If you answer n to all prompts, mtw displays the message "Don't know how to do this conversion" and stops.

Input and output files


You define the name of the Quantum data file you wish to convert with the option -ifilename. For example:
mtw -360 -iqtdata

This example uses a simple filename as the name of the Quantum data file so mtw will look for the file in your current directory. If you want to convert a file in a different directory, you may enter a pathname here instead.

If you forget to enter the name of the input file, mtw does not prompt you for it. Instead, it waits for you to type in the data; to cancel mtw and re-enter the command with a filename, press CTRL+D.

You enter the name of the foreign data file you wish to create in a similar way using the option -ofilename. Most times, you will be writing data to a tape, so the output filename will be the name of the tape device you are using. For example, if you are using a SCSI tape drive called /dev/rst0 and are converting the file called qtdata, you would enter this on the command line as:
mtw -360 -iqtdata -o/dev/rst0

The next command assumes that you are writing to a magnetic tape drive (/dev/rmt1):
mtw -360 -iqtdata -o/dev/rmt1


The next command names an output file rather than an output device, so mtw will create a file called efile in the current directory:
mtw -1130 -iqtdata -oefile

If you omit the output filename or device name from the command, mtw displays the data on the screen as it converts it. You may find this useful if you want to use the converted data as the input to another program, since it means that you can pipe the converted data directly from mtw into the second program. There is no need to store an intermediate data file unless you wish to do so.

Record length and block size


The -r and -b parameters tell mtw how to write the data on the tape.

The record length is the number of characters in each record of the foreign data file. You define it as -rnn. If you are writing data in a binary format, the record length must be an even number because each column of data is held as two characters (bytes). A common length for column binary records is 160 characters. This is left over from the days when market research data was held on IBM punched cards, which had 80 columns per card, but it is not a requirement.

The block size is the number of characters in each block. This must be a multiple of the record length, for example, 1600 if the record length is 160 characters. If you are writing data to a raw character device such as a tape drive, records are grouped into blocks of the given size before being written out. You define the block size as -bnn.

For example, to convert a Quantum data file into 360 column binary format with a record length of 160 characters and a block size of 1,600 characters, and write it to a magnetic tape, you would type:
mtw -360 -r160 -b1600 -iqtdata -o/dev/rmt1

To create an EBCDIC data file that has a record length of 160 characters and a block size of 3,200 characters, you would type:
mtw -ebc -r160 -b3200 -iqtdata -oefile

If you omit either the record length or the block size from your command, mtw prompts you for them.


Writing only a given number of records


You do not have to convert or write out a whole file if you do not want to. To write out a given number of records, add the option -mnn, where nn is the number of records you wish to write. For example:
mtw -360 -r160 -b1600 -m50 -iqtdata -o/dev/rst1

This command converts and writes out 50 records to SCSI tape drive 1. Records will have a record length of 160 characters, a block size of 1,600 characters and will be in 360 column binary format.

mtw does not skip to the end of the input file before stopping. If you are reading data from a tape, the tape stops in the middle of the Quantum data file and should be repositioned or rewound manually.

2.4 Using mtwrite


mtwrite converts a whole Quantum data file into a format of your choice. It prompts you for the record length and block size of the file you wish to create. Running mtwrite is exactly the same as running mtw with only the -i and -o parameters on the command line. To use it, type:
mtwrite quantum_file foreign_file

2.5 Restrictions
wcolbin, mtw and mtwrite have no facilities for positioning the tape before writing to it. If you wish to write more than one file to a tape, you must either use a non-rewinding tape drive or reposition the tape at the end of the last file before writing the second file to the tape.


3 Checking for corrupt Quantum data files


Quantum data files hold information about multicodes in a special format. Each multicoded column whose codes cannot be shown as a letter or other symbol is shown as an asterisk. The codes that make up the multicodes are stored at the end of the line, separated from the end of the data with a special character. If you list a multicoded data file under Unix, you will see this character displayed as ^?, whereas, under MS-DOS it appears as a small, five-sided box. Each multicode is represented by two characters, so you would expect to see an even number of characters after the multicode symbol. Data in which the number of character pairs at the end of the lines does not match the number of asterisks in the earlier part of the line is corrupt and will be rejected by Quantum. 

For further information on Quantum data formats, see Appendix C of the Quantum Users Guide Volume 4.

Tab characters have no meaning in Quantum data files and will cause a run to fail. To check for tabs and corrupt multicodes, use the badata program. To run badata, type:
badata [-v] [-x] [-o output_file] [input_files]

input_files is a list of one or more filenames separated by spaces. A hyphen instead of a filename tells badata to read from the standard input (that is, data you type on your keyboard) rather than from a file. The -v option displays the program version number, and -x displays a summary of usage. Errors are normally displayed on the screen, but you can use the -o option to redirect the output to a file. For example:
badata -o errors data1 data2

badata issues two types of error messages. If it finds a tab character, it reports:
Line number: Tab character in data

If it finds too few or too many characters after the multicode symbol, it reports:
Line number: corrupt record (x multi-punched columns, y character codes)
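
The two checks described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the badata source: the function names are hypothetical, the separator is assumed to be the DEL character (the character Unix displays as ^?), every asterisk in the data part is assumed to mark a multicoded column, and y in the message is assumed to be the raw count of characters after the separator.

```python
# Assumption: the multicode separator is DEL (0x7F), which Unix displays as ^?.
MULTICODE_SEP = "\x7f"


def check_line(line_number, line):
    """Return badata-style messages for one data line (empty list if clean)."""
    messages = []
    record = line.rstrip("\r\n")
    if "\t" in record:
        messages.append(f"Line {line_number}: Tab character in data")
    # Everything before the separator is data; everything after is multicodes.
    data, _, codes = record.partition(MULTICODE_SEP)
    multicoded = data.count("*")
    # Each multicode is stored as two characters, one pair per asterisk.
    if len(codes) != 2 * multicoded:
        messages.append(
            f"Line {line_number}: corrupt record "
            f"({multicoded} multi-punched columns, {len(codes)} character codes)"
        )
    return messages


def check_lines(lines):
    """Check every line, numbering lines from 1."""
    messages = []
    for number, line in enumerate(lines, start=1):
        messages.extend(check_line(number, line))
    return messages
```

A line with two asterisks therefore passes only if exactly four characters follow the separator.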


4 Editing Quantum data


ded is a data editor designed for handling Quantum data files. You can edit single-card and multicard data files on a record by record basis, where each record is treated as a separate line. With multicard records, you can also edit the individual cards that make up a record.

4.1 Using ded


To edit a Quantum data file, type:
ded filename [ser=s1,s2] [crd=c1[,c2]]

ser= defines the start and end columns of the serial number and crd= defines the start and end columns of the card type (if the card type is held in a single column, you may omit the end column). These parameters are optional with single-card records, but must be used with multicard records. For example, if your data has two cards per record, with the serial number in columns 1 to 5 and the card type in column 6, you would type:
ded qtdata ser=1,5 crd=6

This example separates the start and end columns of each field with commas, but ded accepts any character except a space, a digit or a tab character as a separator.

Editing commands are divided into record-editing commands and card-editing commands. The prompt for record-editing commands is a colon, and for card-editing commands it is c:.

4.2 Record-editing commands


When you are working in record-editing mode, ded prompts for commands with a colon. Record-editing commands are as follows:

prompt
Switches the colon prompt on and off. pro is an abbreviation for prompt.

pb
Switches brief-printing mode on. For each record displayed, ded prints the record serial number and the number of cards in the record. If you named the card type columns on the command line, ded tells you which cards these are. For example:
Record 156 has 5 cards, these are 1,2,2,3,4

This is the default printing mode if you name the serial number field on the command line.

pc
Switches character-printing mode on. The contents of each record are displayed with multicodes shown as asterisks. For example:
0015616267*7575*16231204521321** 001562 9438 21232* &- *23

This is the default if you do not name the serial number field on the command line.

pp
Switches punch (code) printing mode on. The punches in a multicoded column are displayed vertically in that column. Ranges of consecutive codes are shown using the notation start/end (for example, 1/5 for codes 1 to 5 inclusive). For example:
00156162671757541623120452132113 3 / 25 7 9&

p
Prints the record in the current printing mode. In a multicard record, all cards are printed at once.

ruler
Displays an 80-column ruler above each record printed in pp or pc mode. Entering this command when a ruler is already displayed removes the ruler.

eol
Prints cards in a multicard record double-spaced. To switch off double-spacing, re-enter the eol (end of line) command.

n
Prints the nth record in the file (for example, 156).

m,n
Prints records m to n in the file (for example, 156,160).

+/-[n]
Prints the next (+) or previous (-) record in the file. If you enter a number after the + or - sign, ded skips forward or back that number of records and prints the record at that position. For example, typing -5 at record 156 prints record 151. The printed record becomes the current record, to which any changes will apply.

$
Goes to the last record in the file and prints it.

(sernum)
Locates the record with the given serial number. If the serial number is shorter than the serial number field width, ded pads it on the left with zeros. If you type (156), for example, and the serial number field is five columns wide, ded searches for a record whose serial number is 00156. This command is only valid if you defined the serial number field on the command line.

g(sernum)
Locates and displays all records with the given serial number. At the end of the search, the pointer is left at the end of the file. This is useful for looking at records with duplicate serial numbers with a view to deciding which one, if any, should be deleted.

1,$
Lists the whole file.

=
Reports the number of records in the file.


/data/
Searches for a record containing the given data and displays that record.

s col=data
Overwrites the contents of column/field col with the given data. As in Quantum, the column specification for multicard records must define both the card type and column numbers. Punches must be enclosed in single quotes and strings must be enclosed in dollar signs. Here are some examples:
s 15='&'
s 45,50=$123456$
s 252='156&'

A space is an alternative to the = sign for separating the column and code specifications.

a
Appends a new record after the current record. Type the data on a new line. Each character you type goes into a new column unless you enclose a string of codes in single quotes. In this case, those codes are treated as a multicode in the current column. At the end of the data, press ENTER and then type a dot on a line by itself to terminate the record. For example:
0015718462'137&'123

generates a record with 14 columns of data; column 11 is multicoded.

i
Inserts a new record before the current record. Rules for data entry are as described for the append command.

d
Deletes the current record. In a multicard record, all cards are deleted.

sort
Sorts the data by serial number (you must have defined the serial number field on the command line).

gsort
Sorts the data by card type within serial number (you must have defined the serial number and card type fields on the command line).

merge
Merges adjacent records with identical serial numbers. Cards in the resultant records are not sorted, nor are duplicate card types merged into a single card. When used with the insert or append commands, this is a useful facility for dealing with missing cards. For example, if card 5 is missing from record 156, you could enter the data for this card by appending it after record 156. If you then run the merge command, the new card 5 will be merged with the rest of the data for respondent 156.

cs sernum
Changes the serial number to the given number. If the number you give is shorter than the serial number field, it will be padded on the left with zeros. To force blank padding, precede the number with a colon followed by the required number of blanks. For example:
cs : 156

e
Switches into card-editing mode for the current record. Type q to revert to record-editing mode.

w
Writes out the data file, saving any changes. To write out a range of records, type the first and last record numbers in the range at the start of the command. To write data out to a different file, type the new filename at the end of the command. Here are some variations of the w command:
1,100 w
w newdata
1,100 w newdata

q
Leaves the data editor.

!command
Executes the given MS-DOS/Unix command without terminating the editing session. When the command finishes, you are returned to the editor.

shell
Starts a subshell in which you can run MS-DOS/Unix commands. When you close the subshell, you are returned to the editor.

4.3 Card-editing commands


To switch from record-editing mode into card-editing mode, type e. The prompt changes to c:. In this mode you may use any record-editing command apart from pb, (sernum), merge and w to refer to individual cards in the current record rather than to the current record as a whole. For example, if you are looking at card 4 of record 156 and you type d, ded deletes card 4 while leaving the rest of record 156 intact.

Card-editing commands are as follows:

rc
Displays the serial number of the current record. This is useful if the serial number is coded somewhere other than at the start of the card.

ct
Lists the card type numbers present in the record.

cs sernum
Changes the serial number using the same rules as described above. ded searches through the rest of the data for a record with the given serial number and, if one is found, moves the current card to the end of that record. If no such record is found, ded converts the card to a new record with the given serial number, but leaves it in its current position in the data file. In both cases, the original card is removed from the current record.


For example, suppose you are working on card 3 of record 156 and you type cs 200. If there is already a record with serial number 200, the data on the current card is copied into record 200 as a card 3 and the current card is deleted from record 156. However, if there is not already a record 200 in the file, ded copies the data from the current card into a new record with serial number 200, and places the new record immediately after record 156. Card 3 is then deleted from record 156.

q
Returns to record-editing mode.

4.4 Restrictions
ded does not recognize the Quantum notation /& meaning all 12 punches. The dot notation used in the Unix ed, ex and vi editors for referring to the current line has not been implemented in ded. It is not possible to delete specific punches from a column, nor to emit new punches into a column. Use set commands to name the exact punches required in each column.

4.5 Diagnostics
Various error messages are displayed, mainly to do with buffer errors while reading or writing data. If the file is too large to handle, ded advises you to use the Unix split command to make it smaller.


5 Replacing text with sequential numeric values


The mc program replaces all occurrences of a given character with a numeric value. The numeric value is different for each replacement made, but the increment between each value and the previous replacement value is the same. For example, you could choose to replace all occurrences of the # symbol with the numbers 1, 3, 5, 7, and so on. The first occurrence of # would be replaced by the number 1, the second occurrence of # would be replaced by the number 3, and so on.

When replacing a character in this way, you choose:
- The start value for the first replacement.
- The incremental value used to calculate subsequent replacement values.
- The width of the replacement field if it is wider than the replacement value.
- What to do with blank columns in replacement fields that are longer than the replacement value.
- The character to be replaced.

5.1 Preparing the text file


Each mc command performs replacements using one character only. The default character that signals a replacement is the @ sign, but you can use any character you like, as long as every replacement point handled by a given mc command is marked with the same character.

5.2 Using mc
To use mc, type:
mc start_value increment field_width[format] [text] [input_file] [output_file]

In the mc command, start_value is the numeric value for the first replacement and increment is the incremental value for each subsequent replacement. The default for both values is 1. field_width is the width of the replacement field. This may be between 1 and 11 columns; the default is 5 columns.


format is a single character defining what to do when the replacement value is shorter than the field width. The default is to right-justify the replacement value in the field and pad the field on the left with zeroes. You may choose to use blanks instead of zeroes or to suppress them altogether. To do this, enter a format character immediately after the field width. Valid format characters are:

B
Pad the field on the left with blanks.

S
Suppress leading zeroes. The width of the replacement field then depends on the number of digits in the replacement value.

Z
Pad the field on the left with zeroes (the default).

text is the text to be replaced. The default is the @ symbol.

You can also use mc from within the ex or vi editors. To do this, edit the text file containing the special replacement symbols using one or other of these editors. Then, type:
:lines!mc start_value increment field_width[format] [text]

lines is any ex or vi syntax that is valid for referring to lines in the file. Examples are 1,$ and % for all lines in the file, and 10,50 for lines 10 to 50 only.

Here is an example. Suppose you have a large file containing a list of magazines. From time to time you want to extract various titles from the list and number them sequentially from 1. Here is part of the master magazine list:
Gardeners World         @
Amateur Gardening       @
Garden News             @
Practical Gardening     @
Amateur Photographer    @
Photography             @
Practical Photography   @
Practical Woodworking   @
Practical Householder   @
Do It Yourself          @

Suppose you want to extract a list of all photography magazines and number them sequentially from 1. Here are the steps you would take:

1. Edit the file with ex or vi and delete all magazines that are not to do with photography. It is helpful if all titles to do with a particular topic are grouped, as in the above example, but this is not necessary.

2. Type:
:1,$!mc 1 1 3B


This tells mc to replace all @ symbols (the default text is used because no other text is defined with mc) with numbers starting at 1 and incremented by 1. The replacement field is 3 characters wide and is padded on the left with blanks.

3. Save your work in a new file (:w filename) and quit.

Here is the result of running these commands on the example file:
Photography               1
Practical Photography     2
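
The replacement rule used in this chapter can be modelled with a short Python sketch. It is an illustration only, not the mc program: the function name and argument names are hypothetical, values are assumed to be non-negative, and a value wider than the field is simply written in full, which may not match what mc itself does.

```python
def mc(text, start=1, increment=1, width=5, fmt="Z", marker="@"):
    """Replace each marker in text with start, start+increment, and so on."""
    if fmt not in ("B", "S", "Z"):
        raise ValueError("format must be B, S or Z")
    pieces = text.split(marker)
    result = [pieces[0]]
    value = start
    for piece in pieces[1:]:
        digits = str(value)
        if fmt == "Z":
            field = digits.rjust(width, "0")   # pad on the left with zeroes
        elif fmt == "B":
            field = digits.rjust(width, " ")   # pad on the left with blanks
        else:
            field = digits                     # S: no padding at all
        result.append(field)
        result.append(piece)
        value += increment
    return "".join(result)
```

For instance, mc("# # #", 1, 2, 2, "Z", "#") numbers the three markers 01, 03 and 05, mirroring the 1, 3, 5 sequence described at the start of the chapter.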


6 Printing selected fields from a file


MS-DOS does not provide any programs for extracting and printing information from lines in a file.

The cut, paste and join utilities available with Unix provide some of this functionality, but are generally used in shell scripts rather than on the command line. An alternative is to use awk which is, again, standard on Unix systems. However, although it is extremely flexible, awk is more a programming language than a simple one-line utility.

Another option is to use bycol. This easy-to-use program reads a file and prints selected columns or fields from each line. You may also define additional texts to be printed as part of the output. To use bycol, type:
bycol [-anpx] [-sseparator] [what_to_print] [filename]

To see a reminder of the command syntax, type:
bycol -x

6.1 Which columns and fields to print


The what_to_print parameter is a list of one or more column or field references defining the parts of the lines to print. To print a single column, just type its number. To print a field, type the start and end column numbers separated by a comma. If the start column of a field reference is higher than the end column, the field is read from right to left and its columns are printed in that order. All references in the list must be separated by spaces. For example:
bycol 1,2 15,10 4 9 1 29 myfile

prints columns 1 to 2, 15 to 10, 4, 9, 1 and 29 of myfile in that order. The contents of these columns are printed as a single string with no spaces in between them. The $ symbol represents the end of the line, so the notation 1,$ prints the whole line. However, $ often has a special meaning to the shell so it is advisable to enclose references of this kind in single quotes (Unix) or double quotes (MS-DOS). For example, under Unix:
bycol '1,$' myfile

prints the whole of myfile and is the same as typing cat myfile under Unix. Under MS-DOS, the command is:
bycol "1,$" myfile

and is the same as typing type myfile under MS-DOS.
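
The column and field rules in this section can be illustrated with a small Python sketch. It is a model of the selection logic only, not the bycol source: select_columns and its arguments are hypothetical names, columns are numbered from 1, the string "$" stands for the end of the line, and the pad flag is assumed to behave like the padding option described later in this chapter.

```python
def select_columns(line, references, separator="", pad=False):
    """Pick columns/fields from line.

    references is a list of column numbers or (start, end) pairs;
    "$" may be used in place of a number to mean the last column.
    """
    def resolve(column):
        return len(line) if column == "$" else column

    spans = []
    for ref in references:
        if isinstance(ref, tuple):
            spans.append((resolve(ref[0]), resolve(ref[1])))
        else:
            spans.append((resolve(ref), resolve(ref)))

    if pad and spans:
        # Pad short lines with blanks up to the highest column named.
        highest = max(max(start, end) for start, end in spans)
        line = line.ljust(highest)

    parts = []
    for start, end in spans:
        if start <= end:
            parts.append(line[start - 1:end])
        else:
            # Start higher than end: read the field from right to left.
            parts.append(line[end - 1:start][::-1])
    return separator.join(parts)
```

With the references [(1, 2), (15, 10), 4, 9, 1, 29], the call reproduces the order of the first bycol example; on a line shorter than the highest column, it returns as many columns as exist unless pad is set.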



bycol displays its output on the screen. To write the output to a file, end the line with >filename. For example:
bycol 1,2 15,10 4 9 1 29 myfile > opfile

6.2 Text and column separators in the output


You can print other things besides the contents of the columns you have chosen. You can define additional texts of your own, and you can define a character or string to be used as a separator between each column, field or text printed.

To print a text, type it as part of the what_to_print parameter at the point in the string where you want it to appear, preceded by a + sign. If the text contains spaces, enclose it in single quotes (Unix) or double quotes (MS-DOS). For example:
Unix      bycol +'Before: ' 25 +' After: ' 26 myfile
MS-DOS    bycol +"Before: " 25 +" After: " 26 myfile

prints the word Before, then the contents of column 25, then the word After, and finally the contents of column 26.

If you wish, you can define column separators as texts. Here is the very first example again, this time with spaces used as separators:
Unix      bycol 1,2 ' ' 15,10 ' ' 4 ' ' 9 ' ' 1 ' ' 29 myfile
MS-DOS    bycol 1,2 " " 15,10 " " 4 " " 9 " " 1 " " 29 myfile

If you want to use the same separator across the whole line, it is quicker to define it once using the -s option. You could rewrite the previous example to produce the same output by typing:
Unix      bycol -s' ' 1,2 15,10 4 9 1 29 myfile
MS-DOS    bycol -s" " 1,2 15,10 4 9 1 29 myfile

The characters you use in text strings or as separators are not restricted to letters and numbers. You can use other characters from the list below, but be sure to enclose them in single quotes:
To print a         Type    Or the octal value
New line           \n      12
Carriage return    \r      15
Backspace          \b      10
Tab                \t      11
Formfeed           \f      14
Backslash          \\      134

Here is an example that uses the tab character as the column separator:
Unix      bycol -s'\t' 1,4 10,12 15 56,62 data
MS-DOS    bycol -s"\t" 1,4 10,12 15 56,62 data

6.3 Dealing with blank or short records


bycol normally ignores blank records. If you want to include blank records in your output, use the -a option on the command line.

It may happen that records in your files are not all the same length and that some of the columns named on the command line do not exist in some records. If this happens, bycol prints as many columns as it can. If you are printing data in columns with text in the last column, this could mean that the text appears in the wrong column on short records. If you would like all records that are shorter than the highest column named on the command line to be padded with blanks to this length before being printed, include the -p option in your command.

6.4 Line numbers


Use the -n option to print a line number at the start of each line.

6.5 Restrictions
bycol cannot output lines longer than 1,024 characters. The maximum number of column references, field references, and texts in a command is 1,024.


7 Sorting files
The standard ASCII file sorting programs provided with MS-DOS and Unix have shortcomings when used to sort files based on the contents of more than one field in the line. Under MS-DOS, lines are sorted based on the contents of column 1 or a single column that you choose. Sorting based on fields of columns is not possible. The Unix sorting program is much more sophisticated and allows sorting based on the contents of one or more fields, but the syntax for specifying the fields is not straightforward.

asort is a utility that overcomes these limitations and makes it easy to specify sorts using any number of columns and fields. To use asort under Unix, type:
asort [options] input_file output_file start1 end1 [... startn endn]

To use asort under MS-DOS, type:
asort input_file output_file start1 end1

where input_file is the name of the unsorted input file, output_file is the name of the sorted output file, and start1 and end1 are the start and end positions of the first field you want to sort on. To sort on a single column, enter the same value for the start and end columns.

Under Unix, you can sort on more than one field by entering the pairs of start and end positions in order of importance, most important first. This is not possible under MS-DOS, and asort will issue an error message if you specify more than one field. The options under Unix are:
Option Explanation

Call sort using the old method of specifying the sort key. This has been provided for backwards compatibility. You may find this option useful in the unlikely event that asort now gives different results from previous versions. By using this option, you should get the same results as you did using the previous version of asort. The default is that this option is off.

Call sort in verbose mode. The default is that this option is off.

Here is an example for Unix systems:


asort unsort.txt sort.txt 1 5 10 12 80 80

This command produces a sorted version of unsort.txt in the file sort.txt. Lines are sorted first on the contents of columns 1 to 5 (the highest sort level) and within that on columns 10 to 12. The lowest level of sorting is on column 80 within columns 10 to 12.
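
The ordering that this example produces can be modelled in Python as a sort on a tuple of fields. This is a sketch of the idea rather than of asort itself: asort_lines is a hypothetical name, fields are compared as plain character strings, and the real program hands the work to the system sort, whose collating rules may differ.

```python
def asort_lines(lines, *fields):
    """Sort lines on (start, end) column pairs, most important first.

    Columns are numbered from 1 and both ends are inclusive.
    """
    def key(line):
        # One string per field; tuples compare field by field, left to right.
        return tuple(line[start - 1:end] for start, end in fields)

    return sorted(lines, key=key)
```

The Unix example above would correspond to asort_lines(lines, (1, 5), (10, 12), (80, 80)). Python's sort is stable, so lines whose fields are all equal keep their original relative order; whether asort guarantees this is not stated.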

8 ANSI carriage control sequences in files


Nowadays, computer systems and printers print text files as they appear on your screen. The exceptions are that a CTRL+L character (ASCII Formfeed) normally starts a new page, a CTRL+J character (ASCII Linefeed) normally starts a new line, and a CTRL+M character (ASCII Carriage return) normally returns to the start of the current line.

In the past, many systems adopted the ANSI standard for formatting text output. This specifies that the text to be printed on each line starts in position two on the line. The first character position is reserved for control characters that determine how the text is to be printed. These control characters are:

1
Print this line on a new page (equates to ASCII CTRL+L).

+
Start this line at the beginning of the current line (equates to ASCII CTRL+M).

0
Start this line at the beginning of the next but one line (equates to two consecutive ASCII CTRL+J characters).

Anything else means print this text at the beginning of the next line, which equates to the ASCII CTRL+J character. The accepted character to use in ANSI files is the space character. If you have a file with printer controls marked in ANSI format, you can convert it into ASCII format by running the program deftn. Similarly, if you have a file in ASCII format that needs to be converted into ANSI format, you can convert the file using ftnise.
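
The mapping from ANSI control characters to ASCII control codes can be sketched in Python as follows. This illustrates the table above and is not the deftn source: ansi_to_ascii is a hypothetical name, input lines are assumed to have had their line terminators removed, and the handling of the very first line and of the final newline are assumptions.

```python
def ansi_to_ascii(lines):
    """Convert lines with a leading ANSI control character to ASCII text."""
    output = []
    for index, line in enumerate(lines):
        control, text = line[:1], line[1:]
        if control == "1":
            prefix = "\f"        # new page (CTRL+L)
        elif control == "+":
            prefix = "\r"        # back to start of current line (CTRL+M)
        elif control == "0":
            prefix = "\n\n"      # next but one line (two CTRL+J)
        else:
            prefix = "\n"        # anything else: start of next line
        if index == 0 and prefix == "\n":
            prefix = ""          # assumption: no blank line before the first line
        output.append(prefix + text)
    # Assumption: finish the last line with a single newline.
    return "".join(output) + ("\n" if output else "")
```

Because each control character describes what happens before its line is printed, the sketch emits the control code ahead of the text rather than after it.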

8.1 Adding ANSI control sequences


To add ANSI control sequences to a file, type:
ftnise -o output_file existing_file

For example:
ftnise -o list1.ans list1

to create a file called list1.ans by adding ANSI carriage control characters to the lines in the file called list1.


8.2 Removing ANSI control sequences


To remove ANSI carriage control sequences from a file, type:
deftn existing_file new_file

For example:
deftn list1.ans list1.unx

to remove the ANSI carriage control sequences from the file list1.ans and to save the results in a file called list1.unx.
