Digitisation of Paper
Records
Queensland State Archives: Guideline for the Digitisation of Paper Records
1: Introduction
Digitisation is the process of converting any physical or analogue item into an electronic
representation¹. In the context of this guideline, digitisation refers to the creation of digital
images from paper documents by such means as scanning or digital photography.
Queensland State Archives has produced this guideline to provide information to public
authorities about digitisation, to recommend suitable digitisation parameters and to raise
awareness of the recordkeeping factors associated with digitisation.
Many Queensland public authorities have implemented or are considering implementing
systems to digitise their paper records. In most cases, these projects are undertaken with
the goals of:
• achieving faster retrieval of information;
• improving access to information;
• enabling greater sharing of information; and
• reducing the storage space required for paper records.
There are many aspects of digitisation. While the acquisition of scanners and associated
computer hardware may be the initial action that comes to mind when digitisation is
discussed, successful digitisation requires several components, including procedures,
standards, computer software, and appropriately skilled staff.
1.1 Authority
This guideline is issued under Section 25 of the Public Records Act 2002 (the Act). The
guideline is a resource for Queensland public authorities to help them achieve best
practice recordkeeping and information management. This publication is intended to serve
as a guide to public authorities undertaking or considering undertaking digitisation.
Information and recommendations provided in this guideline are considered to be accurate
at the time of publication. Queensland State Archives reserves the right to withdraw,
amend or replace this guideline at any time as technology and the needs of public
authorities change.
1.2 Scope
Paper records exist in a variety of formats including maps, plans, photographs and other
documents of various colours, paper types and sizes.
This guideline provides digitisation recommendations that are broad enough to apply to the
majority of paper records applicable to most public authorities. In some cases, particular
characteristics of different types of paper records may call for different technical
parameters and approaches from those included here. Public authorities should combine
the guidance provided in this document and advice from digitisation-related computer
hardware and software vendors with their own testing to determine the optimum
parameters for their organisation.
This guideline also provides information on how public authorities can apply for
authorisation for the early destruction of certain temporary records that have been
digitised, following the Digitisation Disposal Policy: Policy on the authorisation of the early
disposal of original paper records after digitisation.
¹ Tanner, S. From Vision to Implementation – strategic and management issues for digital collections. 2000. The Electronic Library –
strategic, policy and management issues seminar. Accessed March 2005 at http://heds.herts.ac.uk/resources/papers/Lboro2000.pdf
Key technical terms have been explained and illustrated with examples and a
comprehensive glossary has been included in Appendix 1: Glossary of Terms and
Acronyms. Definitions of records management terms can be found in Queensland State
Archives’ Glossary of Archival and Recordkeeping Terms².
1.3 Exclusions
The conversion of other analogue records, such as video or audio recordings, into a digital
form is outside the scope of this document. Likewise, the management of information that
originates in a digital form, such as word processing documents, e-mails, and other born-
digital items is not included.
This guideline does not provide advice on high-quality digitisation of historical documents
for preservation purposes.³
While advice will be provided on some generic features that should be possessed by
computer hardware and software used in the digitisation process, this guideline will not
provide recommendations for particular models of computer equipment or software titles.
Queensland State Archives is unable to provide advice on systems and network
architecture issues relating to digitisation. Public authorities should refer to their existing
computer systems administration and implementation procedures for technical and
systems issues.
1.4 Acknowledgments
Queensland State Archives would like to acknowledge the public authorities which
participated in the development of the Guideline for the Digitisation of Paper Records.
² Glossary of Archival and Recordkeeping Terms. 2004. Queensland State Archives. Accessed March 2005 at
http://www.archives.qld.gov.au/downloads/GlossaryOfArchivalRKTerms.pdf
³ For information on preservation digitisation of archival or historical documents, please contact Queensland State Archives.
format. Public authorities should also assess the need to maintain records in their original
form for legal purposes and should seek legal advice if unsure of any requirement. By
scanning original records to a digital format, and retaining only the digital version, public
authorities may be disadvantaged if called upon to authenticate certain records. The
Digitisation Disposal Policy sets out what records are eligible for authorisation for early
disposal and under what conditions. For more information on the authorisation process,
see section 5 of these guidelines.
Day batching
Some public authorities have adopted the practice of day batching, which involves filing
the paper originals of imaged records in batches based on date received or scanned.
Batching places a heavy reliance on the system used to manage the digitised records and
introduces a number of issues including:
• the risk of losing vital contextual information about the business the records document
and their relationship with other records,
• the inability to effectively implement a disposal program, since records batched
together may have different retention periods, and
• the refusal of Queensland State Archives to accept for transfer into its custody
temporary records contained in a batch also holding permanent public records.
Batching is usually associated with the digitisation of new records. It should be noted that
any records which are removed from structured files for any purpose, including digitisation,
must be returned to the file from which they were removed. Further information can be
found in the Public Records Alert, Day batching of records⁴.
2.3 Management of imaged records
Imaged records require careful management. There is a high risk of technical
obsolescence of hardware and software needed to retrieve information from electronic
storage media. A public authority needs to ensure that its recordkeeping system can
maintain authentic, accurate, complete and accessible imaged records for as long as they
need to be retained. A management plan dealing with the procedures for migration of data
is required to cater for systems being replaced and equipment becoming obsolete.
Some general principles apply to the retention of original paper records and their digital
copies:
• The paper original should be kept for the full period specified in an approved retention
and disposal schedule, unless an early disposal authorisation is granted (see section 5:
Authorisation for early disposal).
• If the image becomes part of a file with other records, for example, in an eDRMS
environment, it should be kept in accordance with the retention and disposal period
given to the parent file. Records should never be removed or ‘culled’ from files.
• An image made purely for access or reference purposes can be destroyed when
reference ceases in accordance with the General Retention and Disposal Schedule
(GRDS) for Administrative Records, class 6.1.2 for duplicate copies of records.
There are some exceptions to these general principles. For example, if key business
decisions, approvals or comments are closely associated with the image copy of a record,
such as in a workflow system, the image and the associated information should be kept for
the full retention period.
⁴ Public Records Alert No 1/05: Day batching of records. 2005. Queensland State Archives. Accessed March 2005 at
http://www.archives.qld.gov.au/publications/PublicRecordsAlert/PRA105.pdf
The paper original may only be disposed of before the digitised image with the explicit
permission of the State Archivist through an approved retention and disposal schedule.
It should be noted that retention and disposal schedules set minimum periods for retention.
Public authorities occasionally have a need to keep records for longer than the approved
retention period. In this situation, if authorisation has not been given for the early
destruction of the original records, the principle of not disposing of the paper original
before the digitised image still applies.
Information Standard 40: Recordkeeping and the Australian Standard AS ISO 15489:
Information and Documentation: Records Management should be consulted for general
advice on the principles and practices for the management of scanned images as a public
record.
Authenticity
Authentic records are those that can be proven and trusted to be what they purport to be and
to have been created, used, transmitted or held by an agency or person to whom these
actions have been attributed⁵. Public authorities will need to be able to verify the
authenticity and accuracy of the images of business transactions captured by scanning.
The original records must remain readily accessible long enough to allow for verification
that procedures related to the capture of records have been followed.
Measures should be in place to protect the authenticity of the scanned records throughout
their lifecycle. Information about the scanning processes should be maintained, including
documentation about the business processes and the maintenance of systems, to
demonstrate that public records were created and captured in the normal course of
business with reliable systems and equipment. Documentation about the records that are
scanned should be maintained to describe the structure and content of the records and the
business context in which they are created and captured.
Copyright, Intellectual Property and Privacy
Most public authorities will digitise records for ease of information sharing within the
organisation. However, once digitised, information is in a form that makes it easier to
distribute to a wider audience. Any public authority intending to make information
available to a broader audience, for example, publishing images to a website, should be
aware of any copyright, intellectual property and privacy implications.
⁵ Glossary of Archival and Recordkeeping Terms. 2004. Queensland State Archives. Accessed March 2005 at
http://www.archives.qld.gov.au/downloads/GlossaryOfArchivalRKTerms.pdf
Decisions on whether or not to digitise a particular record will need to be made on a
day-to-day basis once digitisation is available to the organisation. A number of questions should
be asked when selecting records for possible digitisation:
• Is there a benefit in digitising this record?
• Is the original suitable for digitising?
• Is the equipment able to fully capture the content of the record?
• Is the original part of a series that also needs to be digitised?
• Are there any special characteristics of the record, e.g. colour, double sided, faded?
Some records are physically less suitable for scanning than others. For example, large
format records, bound volumes, photographs, plans and maps, records with reflective
surfaces or fragile material require specialised scanning equipment and techniques.
Other records, such as those handwritten in coloured ink, on coloured paper, or double
sided paper may be accommodated by the available scanning equipment, but need to be
separated and scanned in a different batch with modified settings.
There may be a business decision made not to scan some records. Deciding to digitise
existing paper records in addition to new records can be a large undertaking, and careful
analysis should be conducted to gauge the benefit of doing this. Ideally, existing paper
records that are frequently accessed should be digitised, maximising the benefits of
digitisation. Some records may have such short retention periods that the expense of
digitising them is not warranted.
It is important that staff are made aware of what has been digitised and what has not, so that
they will benefit from the convenience of faster access to digitised records without
searching in vain for digitised copies of records that have not been scanned.
3.3 How will digitisation integrate into the existing workflow?
It is essential to the success of a digitisation program that the existing records
management procedures are investigated prior to the introduction of new techniques.
Decisions need to be made on a number of aspects including when the records will be
scanned, how end users will be presented with the records, and how the original paper
records will be managed after scanning.
A good understanding of existing practices will not only present the opportunity to integrate
digitisation at the most appropriate stage, but will also provide a point of reference to
measure the performance of digitisation. The introduction of digitisation may also provide
the impetus to streamline business processes around digitised records.
3.4 How will the digitised records be managed?
As described later in section 4: Components of a Digitisation Program, a system to
manage the digitised records is arguably the most important component of the digitisation
system. It is essential that a system is in place that enables access by appropriate
authorised personnel, allows digitised records to be easily found, includes measures to
preserve the authenticity of the records, and provides information about the record and its
context. This descriptive information, known as metadata, is discussed further in section
7: Metadata.
Time related factors encountered in records management, such as record retention
periods, and destruction dates also apply to digitised records. Additionally, the
obsolescence of technology and the deterioration of storage media are time related factors
that are introduced by digitisation and should be addressed in management plans.
⁶ Adapted from Western States Digital Imaging Best Practices Version 1.0. 2003. Western States Digital Standards Group. Accessed
March 2005 at http://www.cdpheritage.org/resource/scanning/documents/WSDIBP_v1.pdf
once nature of some optical media is an inexpensive means of doing this; however, as the
amount of digitised material grows, the slow access times and handling required to use
removable media may not be suitable. In these situations, software security solutions,
such as those included in many eDRMS, should be implemented to provide a similar
assurance of integrity for files stored online using hard drives and file servers.
Section 8: Storage and media options, covers these and other related issues in detail.
4.2 Computer Software
A system for describing and managing the digitised records
A system to manage the digitised records is arguably the most crucial component of a
digitisation program. The successful implementation of this software should ideally be
completed prior to the commencement of scanning of paper records and the acquisition of
such a system should be a high priority task. There is little use in commencing the
scanning of paper records if there is no established method of recording what has been
scanned or storing descriptions of those scanned records.
For the management of digital images as records, systems that are designed to fulfil
records management needs should ideally be used. The effort and resources required to
adapt a system not designed for the management of digital records should be analysed
and a business decision made as to whether the introduction of a new system with the
required capabilities would be a more effective use of resources.
If new systems, such as eDRMS, are to be introduced into an organisation to manage
digitised paper records, the system should comply with the guidance provided in this
document in such areas as metadata and file formats, and integrate into the organisation’s
existing ICT infrastructure. Resources will need to be allocated to training of staff and
technical support of any systems that are introduced.
Existing systems employed by public authorities to manage paper records may also have
the capability to link to digital images of the records they describe. If this is not a
feature of the available records management software, provided that the software is able
to provide descriptions of individual items within a file, a file path to the digital images and
other information could be recorded as comments.
As an entry level solution, or as a temporary measure pending the introduction of a
specialised system, a record of what has been scanned, where the digital copies can be
found, and relevant details of the scanning process can be stored in a spreadsheet or
small database.
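As an illustration of such an entry level register, the following is a minimal sketch only. It assumes a CSV file (which any spreadsheet application can open) and a set of hypothetical column names; the actual details recorded should follow the metadata guidance in this guideline and the needs of the public authority.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ScanEntry:
    """One row of the register. All field names are hypothetical."""
    record_id: str       # identifier of the paper record that was scanned
    image_path: str      # where the digital copy can be found
    scanned_on: str      # date of scanning, e.g. "2005-03-14"
    operator: str        # who carried out the scan
    resolution_dpi: int  # one example of a scanning-process detail
    notes: str = ""      # e.g. "original faded, rescanned in colour"


FIELDNAMES = [f.name for f in fields(ScanEntry)]


def write_register(entries, stream):
    """Write the entries to a CSV stream that a spreadsheet can open."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


def find_image(stream, record_id):
    """Return the image path recorded for record_id, or None if not scanned."""
    for row in csv.DictReader(stream):
        if row["record_id"] == record_id:
            return row["image_path"]
    return None


# Example: an in-memory register standing in for a CSV file on disk.
register = io.StringIO(newline="")
write_register(
    [ScanEntry("2005/0142", "images/2005/0142.tif", "2005-03-14", "jsmith", 300)],
    register,
)
register.seek(0)
print(find_image(register, "2005/0142"))
```

A register of this kind records what has been scanned and where the copies are, but it provides none of the access control or audit features of a recordkeeping system, which is why it is suited only to interim use.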
There are also many free or low cost image cataloguing applications designed principally
for management of digital photographs that could be used for managing digital copies of
paper records. The features of such systems should be closely examined prior to
implementation, even if only as a temporary or interim measure. A large effort may be
required to adapt non-recordkeeping systems to document scanned records, and much of
this will need to be repeated when a more appropriate longer term solution is employed.
Imaging software for capture and manipulation
Scanners are bundled with software (known as a driver) that is required for the controlling
computer and the scanner to communicate. Additional software is typically included with
the scanner which allows scanning, calibration and some post scanning image processing
operations to be performed. This software will usually be thoroughly tested by the scanner
manufacturer to work optimally with their hardware, with features appropriate to the type of
scanner purchased. For example, software bundled with high speed sheet fed scanners
would likely include features which would allow a choice to be made between single sided
and duplex scanning, and recognition of barcodes while the software that comes with slide
scanners would typically include features for magnifying the originals and reversing the
colours of negatives.
If the bundled software does not fulfil a public authority’s scanning needs it may need to be
supplemented or replaced by software which is purchased separately. Additional image
processing software may also be used for such tasks as the conversion of file formats,
deriving of related files, and modification or enhancement of images. Some degree of
interoperability usually exists between most image processing and scanning software so
that images can be acquired directly into the image processing software. To enable full
text searching of scanned paper documents, optical character recognition (OCR) software
would also be required. This is described and discussed further in section 6.6: Master files
and derivatives.
Security and access control
Just as the access to many paper records is monitored and typically restricted to
authorised staff, access to digitised copies should also be controlled in a secure digital
environment. As it would be a relatively straightforward exercise for a skilled operator
using free or inexpensive image processing software to change the appearance of
digitised records, security measures should be in place to prevent this type of
unauthorised tampering.
As described above and detailed in section 8: Storage and media options, some computer
storage scenarios, such as the use of write-once optical media, inherently prevent
modification of the image files once they have been stored. However, if other types of
computer storage are used, additional security provided by software with such features as
encryption, access control, and auditing should be employed.
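One way software can help detect unauthorised modification is to record a checksum of each image file at the time of capture and compare it again later. This technique is not prescribed by this guideline; the sketch below is an assumption-laden illustration using SHA-256 from the Python standard library, and the function names are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path, chunk_size=65536):
    """Checksum a file in chunks so large image files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(data: bytes, recorded_checksum: str) -> bool:
    """True if the data still matches the checksum recorded at capture."""
    return sha256_of(data) == recorded_checksum
```

A checksum only reveals that a file has changed; it does not prevent the change or identify who made it, so it complements rather than replaces access control and auditing. The recorded checksums themselves would need to be stored where they cannot be altered along with the images.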
For large digitisation programs that may be widely distributed throughout an organisation,
where several staff need to add to and modify the collection of digitised records, a system
such as an eDRMS may be used to manage access to the information and to provide an
audit of system access and modification. In small scale digitising implementations,
security and access control may be provided through the use of a password protected
system by a single operator, with other authorised staff given read-only access. This could
be accomplished using the built-in security features of most current computer operating
systems.
4.3 Procedures and Standards
It is important to fully document decisions made about the digitisation process, including
technical, procedural and quality considerations. This is particularly important when
seeking authorisation for early disposal. The Digitisation Disposal Policy provides
information on the particular procedures required in this situation.
A method to identify which records are to be digitised
It is unlikely that all paper records within a public authority will be digitised. Identifying
those that are to be digitised prior to the implementation of digitisation may assist in setting
the requirements for a digitisation program and also determine the parameters for
subsequent digitisation. Internal policies and procedures should be developed and
relevant staff made aware of the criteria for deciding what paper records will be digitised.
allocated annually to maintenance, upgrades and training⁷. Allowance for this should be
made as part of the planning for digitisation.
4.4 Staff
Project management
Staff with business analysis and project management skills will be required to determine
the need for digitisation. They should also examine the workflow of current and new
processes to ensure that benefits of digitising are realised with minimal interruption to
business and effective communication and change management. Staff with these skills
may also manage the financial resources, negotiate with equipment and service suppliers
and prepare for the continued support, maintenance, and lifecycle management.
Technical experts
Digitisation involves the integration of computer hardware, imaging equipment and various
software packages to produce a managed collection of digitised records. Staff with
technical skills will need to investigate the various hardware and software options, and
work to bring the aims of the project to reality within time and budget constraints. These
staff may need to liaise with vendors, test different combinations and configurations of
equipment, and be responsible for acquisition, support, integration and maintenance of
equipment.
If an organisation’s IT help desk is expected to provide ongoing support to the digitisation
program, they should be made aware of any non-standard configurations required for
computers and contact details for the vendors of digitisation specific equipment.
Records management
Recordkeeping best practice applies to all records, whether in digital or paper
format. Recordkeeping controls and processes such as registration, classification or
profiling and appraisal and disposal will have to be applied to digitised records. Records
managers should be involved in the development of a digitisation program to ensure these
matters are addressed. On a routine basis once digitisation is underway, records
management staff will be involved in profiling and sentencing digitised images and also in
the retrieval and storage of paper records which are being digitised.
Records management staff without previous experience in managing digitised or other
technology dependent records should consult contemporary information management
research and guidelines to examine how digitisation will affect the management of records
within their organisation.
Equipment / computer operators
Personnel will be required to obtain the source paper records, operate the scanners, carry
out quality checks on scanned records and add metadata / profile information for the
digitised material. These staff should have a clear understanding of their task and a
workflow should be developed so that digitisation is regular and routine and meets
appropriate standards.
⁷ Revised Digital Imaging Guidelines for State of Ohio Executive Agencies and Local Governments. 2003. Ohio Electronic Records
Committee. Accessed March 2004 at http://www.ohiojunction.net/erc/imagingrevision/revisedimaging2003.html
⁸ A ‘disposal trigger’ is the event or action, specified in a Retention and Disposal Schedule, from which the
disposal date is calculated. Common disposal triggers include ‘after last action’, ‘after contract / agreement
• the amount of time which the record has to be retained after a disposal trigger.
It is important to note that the ten-year retention period restriction on authorisation for early
disposal applies to this total period, not simply the number of years after the disposal
trigger occurs.
A retention and disposal schedule clearly indicates the disposal trigger and the period of
time a record needs to be retained after the trigger. However, to determine the total
retention period for the purpose of authorisation under the Digitisation Disposal Policy, it
will also be necessary to determine the average period that elapses before the disposal
trigger occurs.
Determining the total retention period may involve discussions with the relevant business
areas to determine how long records remain active before the disposal trigger is activated.
As many disposal classes use ‘after last action’ as a disposal trigger, it would be
necessary to determine for how long a file is usually active. For example:
• if files relating to business planning are routinely closed at the end of the financial year
in accordance with the planning cycle, and
• the retention period is five years after last action,
• then these records would be eligible for authorisation, as the total retention period
would be six years.
In contrast, if a record is usually active for three years, and the retention period is ten
years after last action, then this record would not be eligible for early destruction.
For example:
• An application was made for a licence, and approved.
• The disposal action for approved licence records is retain for five years after
expiration of licence.
• Licences are valid for two years.
The total retention period would be approximately seven years (assuming that it is
only a short time from receipt of application to approval of licence) and this class of
records would therefore be eligible for authorisation.
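The arithmetic behind the examples above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the sketch assumes that a total retention period of ten years or less is eligible. The examples here do not show how a total of exactly ten years is treated, so the Digitisation Disposal Policy should be checked for the precise rule.

```python
# Assumed limit, taken from the "ten-year retention period restriction".
EARLY_DISPOSAL_LIMIT_YEARS = 10


def total_retention_years(years_until_trigger, years_after_trigger):
    """Total period = time before the disposal trigger + time retained after it."""
    return years_until_trigger + years_after_trigger


def is_eligible(years_until_trigger, years_after_trigger,
                limit_years=EARLY_DISPOSAL_LIMIT_YEARS):
    """Assumption: eligible when the total period does not exceed the limit."""
    total = total_retention_years(years_until_trigger, years_after_trigger)
    return total <= limit_years


# Business planning files: closed after 1 year, kept 5 years after last action.
print(total_retention_years(1, 5), is_eligible(1, 5))
# Record active for 3 years, kept 10 years after last action.
print(total_retention_years(3, 10), is_eligible(3, 10))
# Licence valid for 2 years, kept 5 years after expiry.
print(total_retention_years(2, 5), is_eligible(2, 5))
```

The time before the trigger is an average estimated with the business area, so the result is an estimate for the record class rather than a figure for any individual record.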
Queensland State Archives acknowledges that determining the total retention period for
some record classes will rest on the ‘balance of probabilities’, as particular records within a
record class may be retained for longer. In addition, some records change class during
their life span. For example, if some records within a class are subject to a Freedom of
Information request, a different record class also applies and their retention period is
potentially extended.
Please note: if it is difficult to apply a record class with any certainty to some types of
records at creation or early in the life of the record, then these records are not eligible for
early destruction.
expires’ or ‘after end of financial year’. A disposal trigger may occur many years after the creation of a
record; for example, a trigger ‘after sale of building’ may occur more than 50 years after the creation of a
building purchase record.
Queensland State Archives’ Appraisal Archivists can assist in determining the total
retention period and will review these determinations in assessing any application for
authorisation.
Risk assessment
Once it has been determined that the class of records meets the retention criteria, a risk
assessment should be undertaken which examines the likelihood of the records in each
class being proposed for early destruction, required for legal proceedings and challenged
in court. In undertaking this assessment, a review of the agency’s litigation history and
consultation with legal staff is essential.
For example:
• An agency may process 1000 claims files a year.
• Of these, 2% are subject to dispute and of this only 10% proceed to full litigation.
• In the agency’s experience of litigation, the validity of the records or parts of them
has not been challenged.
Therefore, there may be a low risk of the original records being required and the Chief
Executive may be happy to propose this record class for authorisation for early
destruction.
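The figures in this example work out to about 20 disputed files and 2 fully litigated files a year, or 0.2% of all claims files. A minimal sketch of the calculation follows; the function name is hypothetical, and the result is only a rough indicator to be weighed alongside the agency's litigation history and legal advice.

```python
from fractions import Fraction


def expected_litigated_files(files_per_year, dispute_rate, litigation_rate):
    """Files per year expected to proceed to full litigation.

    Rates are exact fractions to avoid floating-point rounding.
    """
    disputed = files_per_year * dispute_rate
    return disputed * litigation_rate


files_per_year = 1000
dispute_rate = Fraction(2, 100)      # 2% of files are subject to dispute
litigation_rate = Fraction(10, 100)  # 10% of disputes proceed to full litigation

litigated = expected_litigated_files(files_per_year, dispute_rate, litigation_rate)
share_of_all_files = litigated / files_per_year

print(litigated)                  # files per year reaching full litigation
print(float(share_of_all_files))  # as a proportion of all claims files
```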
In addition to assessing the risk of the original record being required in legal proceedings,
there may be other factors to consider, such as whether it is a vital record or whether there
are any business needs for the original format. In many cases these other risks may be
appropriately treated and minimised through good digitisation procedures and appropriate
management of the image. Table 1 (below) includes examples of some risks and potential
actions to minimise risk.
Risk: Electronic copy not legible
Mitigation / Treatment:
• Develop and implement quality assurance procedures.
• Procedure for retention of original and inclusion of metadata / explanatory notes if poor quality original means that a legible image cannot be generated.

Risk: Loss of hand-written annotations on originals
Mitigation / Treatment:
• Quality assurance procedures.
• Raise awareness of staff that if they make extensive annotations on a print-out of an image, the annotated copy should be rescanned as a new record.

Risk: Electronic copy cannot be found / lost
Mitigation / Treatment:
• Capture and management of image in recordkeeping system.
• Regular backups and other system maintenance.
Formal risk assessment processes should be used to identify risks and plan mitigation
strategies. These may be either agency-endorsed procedures or AS 4360: Risk
Management. Appendix 11 of the DIRKS (Designing and Implementing Recordkeeping
Systems) manual provides advice on how the AS 4360 risk management process can be
adapted to recordkeeping risks.⁹
Format-specific retention requirements
Records that are subject to format-specific retention requirements which are not
overridden by the Electronic Transactions Act 2001 are not eligible for early destruction
after digitisation. Format-specific requirements relate to the need to retain a record in its
original, paper form and commonly relate to witnessed or signed documents.
Many format-specific requirements in legislation were overridden by the Electronic
Transactions Act 2001. However some requirements were specifically excluded from the
coverage of the Act and these exclusions are noted in Schedule 1 of the Act. Other
requirements may be found in regulations or standards, for example the Financial
Management Standard 1997 requires financial information to be kept in its original form for
one year after the date of the audit report for the financial year.
Other format-specific requirements may require electronic forms of documents to be
retained on a particular storage device. If this is the case, the systems and procedures for
digitisation (see next section) should comply with these requirements.
Consultation with legal staff should be undertaken to determine the extent of format-
specific requirements affecting the records of an agency.
5.4 Ensuring appropriate systems and procedures
Section 2.3 of the policy statement specifies a range of conditions public authorities must
meet before authorisation can be granted. Table 2 (below) provides references to advice
on meeting these requirements. Many of these requirements are discussed further in these
guidelines.
9
National Archives of Australia (2003) The DIRKS Manual: A Strategic Approach to Managing Business Information. Available online:
http://www.naa.gov.au/recordkeeping/dirks/dirksman/dirks.html.
Where the records are covered by an agency-specific retention and disposal schedule:
Reference | Description of records | Status | Disposal Action
Class number | Original licence records which have been digitised | Temporary | Retain for 2 months after digitisation and completion of quality checks
Class number | Digitised images of licence records | Temporary | Retain for 5 years after last action
Where the records are covered by General Retention or Disposal Schedule or sector-specific schedule:
Reference | Description of records | Status | Disposal Action
Class number | Original records sentenced under classes [insert class numbers and summary description] of the [name, number and version of schedule], where full and accurate digitised images are retained for the authorised retention period. | Temporary | Retain for 2 months after digitisation and completion of quality checks
6: Technical considerations
Prior to implementing a digitisation program, there should be a high level of understanding
of the technical aspects of scanning within the organisation. Whether an organisation
outsources its digitisation or performs the work in house, familiarity with key technical
aspects of digitising will assist relevant staff to gain an understanding of the process.
Most software and hardware that will be used in a digitisation program will provide a range
of variable parameters such as image resolution and output file format, and informed
choices need to be made on each of these. Establishing appropriate technical standards
for digitisation before implementation will promote consistency and accountability.
As detailed in section 3: Issues to consider before commencing digitisation, there are a
number of business considerations that need to be assessed by your organisation prior to
implementing a digitisation program and also when deciding which records will be
digitised. The key technical considerations of:
• resolution,
• bit depth,
• compression, and
• file format
are described in detail in the following sections with recommendations provided where
warranted. A summary table of recommendations is provided in appendix 3: Table of
technical recommendations. Also included in this chapter is a discussion of the quality
control procedures that can be put into place to check that the image files created meet
the specified standards.
6.1 Resolution
Picture elements, or pixels, can be considered the building blocks of all digital images.
They are square cells of a single colour or shade that, when arranged in a regular grid
pattern, form the digital image. The resolution of a digital image is the density of pixels
that make up the image. Pixels per inch (PPI) is used to describe image resolution.
Figure 1 shows a piece of text scanned at various resolutions.
Figure 1: 100PPI, 200PPI and 300PPI examples showing the effect of resolution on image clarity
For example, the image produced by scanning an A4 (8.27” x 11.69”) page at 100 PPI
would have 827 pixels in the horizontal by 1169 pixels in the vertical direction, or a total of
966,763 pixels. If the same A4 image were scanned at a resolution of 300 PPI, it would be
made up of 2481 x 3507 pixels or a total of 8,700,867 pixels (8.7 megapixels). Similarly, a
4” x 6” photograph digitised at 300 PPI would result in a 1200 x 1800 pixel image with a
total of 2.16 megapixels. As seen in these examples, the pixel density combined with the
dimensions of the source material provides an accurate assessment of the total number of
pixels that will make up the resultant image.
Occasionally, an image will be described using its pixel dimensions rather than pixel
density. For example, images intended only for viewing on a computer screen may be
described as “800 x 600 pixels”, “1024 x 768 pixels”, etc.
By determining the source material dimensions in inches and using the provided horizontal
and vertical pixel totals, the pixel density of the image can be discovered. For example, a
1024 x 768 image displayed full screen on a 17” monitor (viewing size 13” x 10”) has a
resolution of approximately 80 PPI.
What about dots per inch (DPI)?
DPI is a measure of printing resolution, in particular the number of individual dots of ink
a printer or toner can produce within a linear one-inch space.
Due to the similarity with other measurements of graphical resolution, the DPI measurement
is frequently misused, for instance, to specify a scanner's sampling resolution or the number
of pixels per inch in a computer display.
Using DPI measurement in these cases is generally considered to be inaccurate and
misleading, though the intended meaning is usually clear based on context. In these
cases, a measure given in DPI can be taken as the number of pixels per inch.
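The pixel arithmetic used in the worked examples of this section can be reproduced with a short calculation. The following is a minimal sketch only; the function names are illustrative and are not part of this guideline.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Return (width_px, height_px) for a page measured in inches."""
    return round(width_in * ppi), round(height_in * ppi)


def pixel_density(pixels, inches):
    """Return the pixels per inch along one axis of a displayed image."""
    return pixels / inches


# A4 page (8.27" x 11.69") scanned at 100 PPI and at 300 PPI
for ppi in (100, 300):
    width_px, height_px = pixel_dimensions(8.27, 11.69, ppi)
    print(f"A4 at {ppi} PPI: {width_px} x {height_px} = {width_px * height_px:,} pixels")

# 1024 x 768 image shown on a viewing area of 13" x 10"
print(round(pixel_density(1024, 13)), "PPI horizontally")
print(round(pixel_density(768, 10)), "PPI vertically")
```

The horizontal and vertical densities of the monitor example come out slightly different (about 79 and 77 PPI), which is why the figure is quoted as approximately 80 PPI.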
10
General Guidelines for Scanning. 1999. Colorado Digitization Project. Accessed March 2005 at
http://www.cdpheritage.org/resource/scanning/documents/std_scanning.pdf
11
Digital Imaging for Archival Preservation and Online Presentation: Best Practices. 2001. Michigan State University. Accessed March
2004 at http://www.historicalvoices.org/papers/image_digitization2.pdf
Mode of use
How the digitised documents will be used needs to be considered when making a decision
about resolution. The resolution of the typical output should be considered. As a general
guide, source documents that are generally magnified for viewing and printing require
digitising at a higher resolution, while source documents that are reduced for viewing and
printing can be digitised at lower resolutions.
In the case of large documents, the intended viewing or reproduction size needs to be
considered, but there can be logistical and practical difficulties if using too high a resolution
for large documents. For example, digitisation of an A0-sized (33”x47”) poster at 300 PPI
could produce a file over 400Mb in size. While the storage of this sized file may be
accommodated, the processing power required to view and print such files is beyond many
systems.
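The size of an uncompressed image can be estimated from the page dimensions, the resolution and the bit depth. The sketch below is illustrative only: the function name is hypothetical, and 24-bit colour is an assumption, as the poster example does not state a bit depth.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bit_depth):
    """Estimate the uncompressed image size in bytes (no file header included)."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8   # round up to a whole number of bytes


# A0 poster (33" x 47") at 300 PPI, assuming 24-bit colour
size = uncompressed_size_bytes(33, 47, 300, 24)
print(f"{size:,} bytes")
print(f"{size / (1024 * 1024):.0f} Mb using 1024-based units")
```

Under these assumptions the poster occupies roughly 419 million bytes, which is in line with the figure of about 400Mb quoted above. The same function applied to an A4 page at 300 PPI and 24-bit colour gives approximately 25 Mb.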
For a large map or plan that is to be only ever viewed as an A4-sized image, the reduced
size of the output means that the input resolution may be quite low. However, a high
resolution may be required to legibly capture the fine line work and small text that is often
present on large format maps and plans. The resolution selected to digitise documents
may be a compromise between detail and file size.
Source documents that are typically enlarged for viewing or are of a small size and require
magnification for use should be digitised at high resolutions. The best illustration of this
would be the digitisation of a slide, microfilm or photographic negative which would
normally be viewed at several times its actual size. It is common practice for these types
of originals to be digitised at many thousands of pixels per inch to produce useable output
at viewing size.
On the other hand, considering that even the most modern computer monitors typically
have resolutions less than 100 PPI, if a document is digitised purely for on screen viewing
at the original scale, digitising at high resolutions will not provide any benefit12.
Recommended Resolutions
Table 4 shows the minimum recommended PPI resolutions for digitising paper records.
Document Type | Page Size | Resolution
Standard text documents | Up to A3 | 200 PPI
Oversized documents, e.g. maps | Larger than A3 | 200 PPI
Photographs | 6”x4” | 600 PPI
Photographs | 7”x5” | 430 PPI
Photographs | 9”x6” | 300 PPI
Table 4: Resolution recommendations
12
Scanning Tips and Techniques. Jasc Software Inc. 1999. Accessed October 2004 at http://www.jasc.com/tutorials/scantip.asp
13
Creating and Managing Digital Content – Glossary. 2002. Canadian Heritage Information Network. Accessed March 2005 at
http://www.chin.gc.ca/English/Digital_Content/Small_Museum/glossary.html#c
14
Standard 4-bit colours are black, dark red, dark green, dark yellow, dark blue, dark purple, dark cyan, pale grey, mid grey, red, green,
yellow, blue, magenta, cyan and white.
15
Technical Recommendations for Digital Imaging Projects. 1997. Image Quality Working Group of ArchivesCom. Accessed March
2005 at http://www.columbia.edu/acis/dl/imagespec.html
True colour depths are recommended for any materials with colour where colour conveys
essential information. For colour photographs the minimum recommended bit depth is 24-
bit (true colour). With current viewing and printing equipment, 48-bit colour does not
provide any meaningful advantages over 24-bit colour. However, if documents are
captured using a 48-bit capture device now, the benefits may be able to be exploited in the
future as technology develops.
32-bit colour is 24-bit colour with an additional 8-bit channel providing 256 levels of
transparency and is used mainly for digital video and animation applications.
Selecting an appropriate bit-depth
The nature of the documents being digitised should be the main factor dictating the bit
depth used for the images produced. For the digitisation of black and white text
documents, bi-tonal colour depth will usually capture the information most efficiently.
However, for documents that contain greyscales or colours, a bi-tonal image will not
capture all of the information and may produce an illegible image. Palettised colour depth
is typically suitable for line drawings, colour documents and diagrams, while continuous
tone images, such as photographs, are best captured in true colour.
Figure 3: Greyscale text captured in 24-bit colour showing that using a higher than recommended colour depth may introduce extra
colours into the image
Capturing a document at a lower than recommended bit depth will possibly result in an
image that is visibly different from the original record. In some situations this visible
difference and loss of information will be acceptable – for example when digitising a
document with black and white content, but a coloured letterhead, the loss of colour in the
letterhead may be acceptable. Choosing a higher than recommended colour depth, such
as 24-bit colour for a black and white document, will not provide any benefits, but will result
in an increase in the file size of the image produced and may even introduce small areas
of extra colours not present in the original document.
The conversion of a colour drawing, such as a simple business graphic, into a 24-bit colour
image would not only result in an inefficient file size but also introduce many extra colours
into the image. For example, the original document digitised to produce the image shown
in Figure 3 had three colours – black, white and grey. However, during the process of
scanning this document as a 24-bit image, 17,898 colours including pixels with shades of
brown and pink were generated! If the image was printed using a monochrome printer, the
general appearance may be similar to the original, however, the introduction of additional
colours may affect post-digitisation image processing operations. In this case using 4- or
8-bit grey for the output image would be more appropriate.
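The effect described above can be illustrated by counting the distinct colour values in an image before and after reducing it to a small number of grey levels. This is a minimal sketch: the pixel values are invented for illustration, the function names are hypothetical, and the simple averaging used to derive a grey value is an assumption rather than the method used by any particular scanner.

```python
def count_colours(pixels):
    """Count the distinct values in a sequence of pixels."""
    return len(set(pixels))


def to_grey_levels(pixels, bits=4):
    """Map each (r, g, b) pixel to one of 2**bits evenly spaced grey levels."""
    levels = 2 ** bits
    result = []
    for r, g, b in pixels:
        grey = (r + g + b) / 3                      # simple average of the channels
        result.append(round(grey / 255 * (levels - 1)))
    return result


# Hypothetical 24-bit scan of a black, white and grey page with slight colour noise
scan = [(0, 0, 0), (2, 1, 0), (255, 255, 255), (254, 252, 255),
        (128, 128, 128), (131, 128, 127), (129, 130, 128)]

print(count_colours(scan))                    # distinct 24-bit colours in the scan
print(count_colours(to_grey_levels(scan)))    # distinct 4-bit grey levels
```

In this example the seven slightly different 24-bit colours collapse to three grey levels, matching the three tones of the hypothetical original.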
As is the case when determining the resolution to use, the mode of use of the digital
images should be considered when deciding upon an appropriate bit-depth. If imaged
pages will most often be viewed on computer screens, then the use of a higher than
normal bit-depth may be warranted. As seen in Figure 4, increasing the colour depth may
enhance the on-screen readability of a low resolution image. If, however, digitised copies
of records will only ever be made available as monochrome print outs, then the use of
colour could be considered superfluous.
Figure 4: 8-bit and 1-bit versions of a 72ppi image, showing how an increase in bit depth allows anti-aliasing which may improve
readability
Capturing a document that contains a watermark, highlighting, or hand written annotations
into a bi-tonal image may cause text to be obscured leading to a loss of information. An
example of this information loss is shown in Figure 5. Once again, a palettised grey or
palettised colour output image would capture the text of the document as well as the extra
information in the watermark or annotations.
Halftones
In printing, halftones are evenly spaced spots of varying diameter used to produce apparent
shades of grey with a single colour ink. The darker the shade at a particular point in the
image, the larger the corresponding spot in the printed halftone. In traditional publishing,
halftones are created by photographing an image through a screen. In order to simulate
variable-sized halftone dots in digital imaging, dithering is used, which creates clusters of
pixels in a "halftone cell". The more black pixels in the “cell”, the darker the grey.
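The idea of a halftone cell can be sketched with a small threshold matrix. The example below is illustrative only: the 4 x 4 matrix is a hypothetical clustered-dot pattern chosen for this sketch, and scanning software may use different cell sizes and patterns.

```python
# 4 x 4 threshold matrix. Low values sit near the centre of the cell, so the
# cluster of black pixels grows outwards as the grey being simulated gets darker.
CELL = [
    [12,  5,  6, 13],
    [ 4,  0,  1,  7],
    [11,  3,  2,  8],
    [15, 10,  9, 14],
]


def halftone(grey_rows):
    """Convert rows of 8-bit grey values (0 = black, 255 = white) to 1-bit.

    Returns rows of 0/1 values, where 1 represents a black pixel.
    """
    size = len(CELL)
    output = []
    for y, row in enumerate(grey_rows):
        out_row = []
        for x, grey in enumerate(row):
            # Scale darkness to the range 0 (white) .. 16 (black)
            darkness = (255 - grey) / 255 * size * size
            out_row.append(1 if darkness > CELL[y % size][x % size] else 0)
        output.append(out_row)
    return output


# One cell of mid grey: about half of the 16 pixels become black
mid_grey = [[128] * 4 for _ in range(4)]
for row in halftone(mid_grey):
    print("".join("#" if pixel else "." for pixel in row))
```

A white cell produces no black pixels and a black cell produces sixteen, with intermediate greys producing proportionally sized clusters.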
Bi-tonal images utilising halftones may be considered as an alternative to using 4- or 8-bit
grey to represent greyscales on digitised documents. This technique may provide some
advantages over using palettised images including wider format compatibility and reduced
file size.
Figure 5: 8-bit vs. 1-bit vs. 1-bit with halftone for watermarked documents
However, use of halftones may also introduce a speckled effect to areas of the image that
should be white. At too low a resolution, halftones will not be beneficial, and halftones at
high resolutions may produce a large number of halftone pixels where there should be
white space. Some other image processing, notably optical character recognition (refer to
section 6.6: Master files and derivatives for more information), may also be negatively
affected if using halftones in text documents. Public authorities considering using
halftones for digitised records should carry out thorough testing to ensure the end results
are suitable.
When paper documents that contain halftone images are digitised, a distracting pattern of
lines called "Moire" is often produced. To avoid this unwanted effect, most scanning
systems have a “de-screen” function to remove the Moire during the scanning process.
Post-capture image processing software can also be used to correct these images.
Alternatively, halftones may be captured by scanning the source document at a high
enough resolution to isolate each of the dots making up the halftone, typically 600 PPI or
above, and then using software to reduce the image to the standard resolution16.
The table below shows the recommended bit depth for digitising paper records.
Document type | Bit Depth
Black and white text only | 1-bit bi-tonal
Text with some colour | 8-bit colour
Text with shades of grey | 8-bit grey
Colour drawings / presentations / graphics | 8-bit colour
Black and white photographs | 8-bit grey
Colour photographs | 24-bit colour
Table 5: Bit depth recommendations
If a document containing a mix of the above is being imaged, the highest colour depth
should be used to capture it. For example, an otherwise black and white page which
includes a colour photograph should be captured in 24-bit colour.
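The rule for mixed documents can be expressed as a simple lookup over the recommendations in Table 5. The following is a sketch only: the content labels and function name are hypothetical, and ranking 8-bit colour above 8-bit grey where both apply is an assumption made for this example.

```python
# Recommendations from Table 5 as (bits, is_colour, label), keyed by
# hypothetical content labels.
RECOMMENDED = {
    "bw_text": (1, False, "1-bit bi-tonal"),
    "text_with_colour": (8, True, "8-bit colour"),
    "text_with_grey": (8, False, "8-bit grey"),
    "colour_graphics": (8, True, "8-bit colour"),
    "bw_photograph": (8, False, "8-bit grey"),
    "colour_photograph": (24, True, "24-bit colour"),
}


def select_bit_depth(content_types):
    """Return the highest recommended bit depth for a page with mixed content."""
    candidates = [RECOMMENDED[content] for content in content_types]
    bits, is_colour, label = max(candidates)   # compares bits first, then colour
    return label


print(select_bit_depth(["bw_text", "colour_photograph"]))
print(select_bit_depth(["text_with_grey", "colour_graphics"]))
```

For an otherwise black and white page that includes a colour photograph, the function returns 24-bit colour, as in the example above.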
Public authorities should combine reference to these guidelines with their own testing
on typically digitised documents prior to selecting which bit depths to use.
16
How To Fix Bad Scans. 2004. Dixie State College of Utah. Accessed March 2005 at http://cit.dixie.edu/vt/vt2600/bad_scans.asp
17
1 byte contains 8 bits. 1024 bytes = 1Kb. 1024Kb = 1Mb
Table 6: Uncompressed file sizes for an A4 page digitised at different pixel depths and resolutions
Compression
Compression reduces storage space requirements, saves on backup and transfer media,
lessens the impact on the network of accessing image files and provides shorter file
transfer times. Mainstream compression techniques in widespread use today are tried and
tested and can be used with the confidence that images will continue to be accessible
once compressed. Compression used for images can be categorised into lossless and
lossy compression.
Lossless compressions reduce the size of a file without discarding any information. An
example of a lossless compression technique is substitution. As a very simplistic example,
if the A4 page of text described in Table 6 consists of 90% white space and 10% black
text, then by simply substituting a 4-bit symbol for each white pixel’s 24-bit RGB value, the
image size would reduce from approximately 25 Mb to around 6Mb. The substitution table
is stored within the image file, allowing the exact image to be viewed and printed, while still
having a small file size.
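The substitution example can be checked with a short calculation, and the lossless property can be shown by a round trip. This is a simplified sketch under stated assumptions: the function names and the "W" symbol are hypothetical, and the space taken by the substitution table itself is ignored.

```python
WHITE = (255, 255, 255)


def substituted_size(original_size, white_fraction, symbol_bits=4, pixel_bits=24):
    """Estimate the size after replacing each white pixel with a short symbol."""
    average_bits = white_fraction * symbol_bits + (1 - white_fraction) * pixel_bits
    return original_size * average_bits / pixel_bits


def encode(pixels):
    """Replace every white pixel with the short symbol "W"."""
    return ["W" if pixel == WHITE else pixel for pixel in pixels]


def decode(symbols):
    """Restore the original pixels from the substituted sequence."""
    return [WHITE if symbol == "W" else symbol for symbol in symbols]


# 25 Mb image with 90% white pixels
print(f"{substituted_size(25, 0.9):.2f} Mb")

# Round trip: decoding the encoded pixels gives back exactly the original
page = [WHITE, WHITE, (0, 0, 0), WHITE, (0, 0, 0)]
print(decode(encode(page)) == page)
```

The estimate of 6.25 Mb agrees with the figure of around 6Mb given above, and the round trip confirms that no information is discarded.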
Lossy compressions, however, are irreversible; file information is lost when a lossy
compression process is applied. When the file is viewed or printed, the resultant image
will therefore be different from the original. The degree of difference between the original
and compressed files is sometimes related to the amount of compression required.
When lossy compression is applied appropriately, the human eye should not be able to
readily differentiate between the original file and the compressed version.
One of the most commonly used lossy compression processes is known as quantisation.
Colour values are simplified and rounded - discarding real information. The extent of
compression is variable with the level of output quality specified governing how much
simplification occurs. Greater simplification leads to a smaller file size, but with greater
loss of information.
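The rounding of colour values can be illustrated by discarding the low-order bits of each colour channel. This is a deliberately simplified sketch with a hypothetical function name; formats such as JPEG apply quantisation to transformed image data rather than directly to pixel values, so it should not be read as a description of any specific format.

```python
def quantise(value, bits):
    """Keep only the top `bits` bits of an 8-bit channel value (0 to 255)."""
    shift = 8 - bits
    return (value >> shift) << shift


pixel = (200, 117, 34)   # hypothetical 24-bit RGB value
for bits in (6, 4, 2):
    simplified = tuple(quantise(channel, bits) for channel in pixel)
    print(f"{bits} bits per channel: {simplified}")
```

Fewer bits per channel mean fewer possible values to store, but many different original values map to the same result, so the original cannot be recovered from the simplified value.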
The effects of file compression can depend on the file format, the file contents and the
compression method used. There is not a fixed file size reduction that can be expected
from every image that is compressed. For example, the commonly used JPEG
compression works well on colour photographic images, but poorly compresses images
containing drawings, letters or simple graphics. Therefore, if compression is to be applied,
a method appropriate to the digital image and its intended use needs to be selected.
Recommended Compression
Some form of compression should be applied to digitised records to enable storage and
access in an efficient manner.
Lossless compression provides file size reduction while being able to reproduce an
exact, true and accurate digital copy of the image created at time of digitisation. Where
possible, lossless compression should be employed.
Lossy compression is not suitable when original paper records are authorised for early
disposal as the accuracy of the image may be called into question. However, when
originals are being retained, the additional file size reduction that lossy compression
provides can mean that a small, perhaps indistinguishable, loss of data may be
acceptable for some file types. When employing lossy compression techniques, the
resulting image should not appear noticeably different from the original paper record.
18
Brown A. Digital Preservation Guidance Note 1: Selecting File Formats for Long-Term Preservation. 2003. National Archives (UK).
Accessed March 2005 at http://www.nationalarchives.gov.uk/preservation/advice/pdf/selecting_file_formats.pdf
19
Horton S. Web Style Guide 2nd Edition: JPEG Graphics. 2004. Lynch and Horton. Accessed March 2005 at
http://www.webstyleguide.com/graphics/jpegs.html
20
Mendham S. JPEG 2000. 2005. IDG Communications. Accessed March 2005 at
http://www.pcworld.idg.com.au/index.php/id;1170029196;fp;2;fpid;1585691688
compatible with many mainstream image processing, scanning, and viewing programs or
web browsers. These compatibility issues may be alleviated once the format is more
established in the sector and software and hardware vendors have assessed the format
and the market’s demand for its use.
Tagged Image File Format (TIFF)
The TIFF format was developed in 1986 by Microsoft and Aldus and is currently
maintained by Adobe21. Despite being an older file format, TIFF is widely supported and is
seen by many as a de facto standard for image files.
TIFF files are commonly used in desktop publishing, faxing, 3-D applications and medical
imaging applications. There are several sub-formats within the TIFF specification. TIFF
CCITT22 Group 3 and Group 4 are the most widely used formats in document imaging –
most fax transmissions are in TIFF Group 3 format. Other sub formats of TIFF support
greyscale, colour depths of up to 64-bit and offer compression choices including
uncompressed, lossless LZW, and run length compression23.
The most recent release, TIFF 6.0, was launched in 1992. While the baseline version of
TIFF 6.0 is fully compatible with applications designed to read earlier TIFF images, a
number of additional features were added that require software to be specifically tailored to
support the newer version. JPEG compression was included in the TIFF 6.0
specifications, and despite a technical revision in 1995 to overcome serious design flaws24,
there still remain problems with the use of this lossy compression within TIFF files, and this
option is not widely used. The TIFF version 7.0 specification, which appeared in draft
format in 1997 but is still to be released, is expected to feature a more stable
implementation of JPEG compression amongst other new features.
Various extensions to the TIFF specification have been implemented for specialised
purposes. Care should be taken when using these extended versions of TIFF, as the
application support for viewing and manipulating them may be limited.
Graphics Interchange Format (GIF)
Graphics Interchange Format (GIF) is a widely used image format introduced in 1987 by
CompuServe. In the early years of the WWW, developers adopted GIF for its efficiency
and widespread familiarity. A large proportion of the images on the Web are presented in
GIF format, and virtually all Web browsers that support graphics can display GIF files.
The GIF format supports a maximum 256 palettised colours or shades of grey so is most
suited to discrete images such as illustrations, black and white images, logos and line
drawings rather than photographs. GIF files are compressed using a lossless
compression technique, LZW. Although GIF has a free and open specification, LZW is
patented by the Unisys Corporation and its commercial use may require licensing and royalty
payments25. While the generation and use of GIF files can generally be done without
requiring a licence, and many of the patents that relate to GIF have expired, or are soon to
expire, the royalty free PNG format (outlined below) which was developed largely because
of this patent issue has taken over from GIF in many applications.
21
TIFF Revision 6.0. 1992. Adobe Systems Inc. Accessed March 2005 at http://partners.adobe.com/asn/developer/pdfs/tn/TIFF6.pdf.
22
Comite Consultatif International Telegraphique et Telephonique (International Telegraph and Telephone Consultative Committee)
23
Leurs L. The TIFF file format. 2001 Laurens Leurs. Accessed March 2005 at http://www.prepressure.com/formats/tiff/fileformat.htm
24
JPEG Image Coding Standard. 1998. Centre for Telecommunications and Information Engineering, Monash University. Accessed
March 2005 at http://www.ctie.monash.edu.au/EMERGE/multimedia/JPEG/COMM03.HTM
25
LZW Patent Information. 2005. Unisys Corporation. Accessed March 2005 at http://www.unisys.com/about__unisys/lzw/
26
PNG (Portable Network Graphics). 2004. World Wide Web Consortium. Accessed March 2005 at http://www.w3.org/Graphics/PNG/
27
Horton S. Web Style Guide 2nd Edition: PNG Graphics. 2004. Lynch and Horton. Accessed March 2004 at
http://www.webstyleguide.com/graphics/pngs.html
28
File Formats and Compression. 2004. Technical Advisory Service for Images. Accessed March 2005 at
http://www.tasi.ac.uk/advice/creating/fformat.html#ff2
29
Adobe PDF. 2005. Adobe Systems Inc. Accessed March 2005 at http://www.adobe.com/products/acrobat/adobepdf.html
The five formats described in this guideline are widely used for a number of digital imaging
applications.
JPEG and PNG are non-proprietary formats while TIFF and PDF are proprietary formats
which have freely available specifications. This provides system developers with a cost
effective and readily available means to incorporate support for these formats. If a system
was adopted which required the use of another format, especially a system-specific
proprietary format, the ability of that system to convert images into at least one of the
common formats discussed here is paramount.
As an encapsulation format, it is difficult to provide a direct comparison between PDF and
the image formats described here. Scanning or imaging systems that provide PDF as an
output format option will always need to use a true image format for an intermediary,
usually temporary file, that is converted into PDF. Therefore, the characteristics of this
intermediary file need to be considered prior to implementation.
To take advantage of the characteristics of different file formats, multiple copies of a
digitised document may be stored. Refer to section 6.6: Master files and derivatives,
where derived files that may form part of a digitisation program are discussed in more
detail, along with suggested file formats for the different types of derived files.
Considerations for digitising multi-page documents
TIFF is the only image format described here that is able to capture more than one image
in a single file. This enables storage of individually scanned pages of a multi-page
document into a single file.
The other image formats described here can only deliver a single image per file. If these
formats are used for digitisation of multi-page documents, image management software or
other systems, such as an eDRMS, are required to provide the linkage and sequencing
required to represent multiple images making up the pages of a digitised document as a
single entity.
Some systems, including widely used eDRMS and high volume scanning software, require
that a multi-page format, typically TIFF or PDF, be used for image storage so that a single
file is capable of representing several scanned pages of a document. This should be
viewed as a software limitation rather than best practice, and if organisations are forced to
use TIFF or PDF solely due to their support for multiple paged documents, they should do
so with caution after thoroughly investigating all options.
While TIFF is widely regarded as the standard file format to use when capturing
documents as bi-tonal images, it lacks the compression and bit depth combinations to suit
other document types, particularly greyscale and colour documents. If no compression or
the inefficient packbits compression is used to capture multiple page greyscale or colour
documents as a single TIFF file, file sizes can become very large, affecting the
accessibility and storage of the file.
To overcome this file size issue, some vendors have chosen to implement JPEG
compression within the TIFF file format, providing a higher rate of compression, but with
the data loss inherent in this compression scheme. As mentioned previously in this
section, there is no agreed standard for the implementation of JPEG compression within
the TIFF format. Using non-standard formats for the storage of digitised records may
create compatibility problems with other software, perhaps preventing the images from
being viewed or printed.
Name and Current Version | Extension | Bit-depth(s) | Compression
TIFF 6.0 | .tif | 1-bit bi-tonal; 4- or 8-bit greyscale or palette colour; up to 64-bit colour | Uncompressed; Lossless: CCITT G3/G4, LZW, Packbits; Lossy: JPEG
GIF 89a | .gif | 1-8 bit bi-tonal, greyscale, or colour | Lossless: LZW
JPEG JFIF | .jpg | 8-bit greyscale; 24-bit colour | Lossy: JPEG
PNG 1.2 | .png | 1/2/4/8-bit palette colour or greyscale; 16-bit greyscale; 24/48-bit true colour | Lossless: Deflate, an LZ77 derivative
PDF 1.4 | .pdf | 4- or 8-bit greyscale or palette colour; up to 64-bit colour support | Uncompressed; Lossless: CCITT, LZW, JBIG; Lossy: JPEG
TIFF has many sub-formats that have been developed from the original specification,
baseline TIFF. It is not uncommon for software to be branded as capable of viewing TIFF
files, but actually only be able to view baseline TIFF. As a “baseline TIFF reader is not
required to read any images beyond the first one”30 many image viewing applications are
unable to view multi-page TIFF files, and instead show only the first page.
Public authorities who decide to use any non-standard image format should investigate
carefully the impact this may have on the longevity of the digitised records. Migrating non-
standard images to standard formats will require additional planning and resources. In
addition public authorities risk being locked into using products from the same vendor and
also risk the continued accessibility of their imaged records if the vendor leaves the
marketplace.
PDF may be considered to be an alternative to TIFF for the storage of multi-page files in a
single document. However, as PDF is an encapsulation format, images encapsulated
into a PDF file cannot be manipulated, easily extracted, or have other image files
derived from them. As an alternative means of accessing images, PDF is an appropriate
format. However, careful consideration should be given to the likely need to manipulate,
extract, or derive information from the original image before retaining only a PDF version
of a digitised record.
Public authorities should combine reference to these guidelines with their own
testing and refer to the specification of the systems used to manage imaged
records prior to determining which file formats are compatible and most suited for
their use.
30
TIFF Revision 6.0. 1992. Adobe Systems Inc. Accessed March 2005 at http://partners.adobe.com/asn/developer/pdfs/tn/TIFF6.pdf
31
Quality Assurance. 2004. Technical Advisory Service for Images. Accessed March 2005 at
http://www.tasi.ac.uk/advice/creating/quality.html
32
Moving Theory into Practice: Digital Imaging Tutorial. 2003. Cornell University Library/Research Department. Accessed March 2005
at http://www.library.cornell.edu/preservation/tutorial/quality/quality-01.html
33
Frey F. Guides to Quality in Visual Resource Imaging: 4. Measuring Quality Of Digital Masters. 2000. Council on Library and
Information Resources. Accessed March 2005 at http://www.rlg.org/visguides/visguide4.html#4.1
The calibration settings for some equipment may need to be checked and recorded at the
beginning and end of a day’s digitisation work. Other equipment may not have any
calibration settings that are user-adjustable, and may only need calibration following
servicing or maintenance. Exact parameters and suggested intervals for calibration should
be determined with input from the hardware and software suppliers, and should be
documented with other quality controls.
To establish acceptable levels of quality for digital image capture, the scanning hardware
system should be tested by the use of scanner test targets or charts such as those shown
in Figure 6. These can contain a wide range of material which provides the ability to judge
output in carefully measured increments for such aspects as resolution, text, fonts, line
widths, colour, tonal range, handwriting, and halftone.
Figure 6: Standard “targets” can be used to test the functionality of digitising equipment
Environment
A controlled environment is required to consistently apply quality baselines. In an
uncontrolled environment, for example with excessive glare, reflections or using an
improperly set up computer system, a high quality image may be incorrectly deemed to
have not met quality baselines34. While calibration of hardware and software is one part of
ensuring a controlled environment that is necessary to evaluate digital images, viewing
conditions also need to be considered, as the optimal level for viewing a computer monitor
is in lower light conditions than for a paper based record. The size of the viewing screen,
plus the speed, processing power and memory of the computer need to be considered to
enable the retrieval and manipulation of large image files.
Workstation monitors used for scanning or quality control should be set at appropriate
colour depth, gamma and colour temperature settings and a high refresh rate to avoid a
flickering display. These settings will need to be set for each workstation, and also within
any image manipulation software where the option to adjust settings is available.
A monitor adjustment target, such as one shown in Figure 7, can be displayed on screen
when brightness and contrast adjustments are made, so that all the relevant shades and
steps in the target are distinguishable from the adjacent similar shades.
User-perceived image quality will depend on the capabilities of the display hardware being
used, including the screen size and the pixel dimensions supported. Common pixel dimensions
supported by monitors range from a low of 640 x 480 to a high of 1600 x 1200, referring to the
number of horizontal and vertical pixels displayed on the screen.
What area of an image can be seen on a monitor depends on the image pixel dimensions
and the desktop resolution. The area of an image displayed can be increased by
increasing the screen resolution or by decreasing the image resolution. When viewing
digitised text documents, typical checks can be made by examining the image at actual
34
Moving Theory into Practice: Digital Imaging Tutorial. 2003. Cornell University Library/Research Department. Accessed March 2005
at http://www.library.cornell.edu/preservation/tutorial/quality/quality-02.html
size35. However, it may be necessary to enlarge other types of digitised records such as
photographs and maps to ensure details have been captured appropriately. As the
number of pixels displayed increases, more of the image area can be viewed, but without
also increasing the size of the monitor, details may be too small to see without zooming or
magnifying.
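The relationship between image pixel dimensions and desktop resolution can be illustrated with a short calculation. The sketch below assumes an A4 page scanned at 300 pixels per inch and an image shown pixel-for-pixel (100% zoom); the function names and the chosen resolution are assumptions for illustration, not recommendations.

```python
MM_PER_INCH = 25.4

def pixel_dimensions(width_mm, height_mm, ppi):
    """Pixel dimensions of a page scanned at the given resolution."""
    return (round(width_mm / MM_PER_INCH * ppi),
            round(height_mm / MM_PER_INCH * ppi))

def visible_fraction(image_px, screen_px):
    """Fraction of the image area visible when one image pixel maps to one screen pixel."""
    img_w, img_h = image_px
    scr_w, scr_h = screen_px
    return (min(img_w, scr_w) / img_w) * (min(img_h, scr_h) / img_h)

a4_300 = pixel_dimensions(210, 297, 300)   # A4 is 210 mm x 297 mm
print(a4_300)
for desktop in ((640, 480), (1600, 1200)):
    share = visible_fraction(a4_300, desktop)
    print(f"{desktop[0]} x {desktop[1]} desktop shows {share:.0%} of the image")
```

Raising the desktop resolution or lowering the image resolution both increase the visible share, as described above.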
Figure 7: A monitor adjustment target can be used to ensure shades of grey can be distinguished
35
A 21” CRT monitor with a display resolution of 1280 x 1024 is able to display an A4 sized page at 1:1 scale.
36
Western States Digital Imaging Best Practices Version 1.0. 2003. Western States Digital Standards Group. Accessed March 2005 at
http://www.cdpheritage.org/resource/scanning/documents/WSDIBP_v1.pdf
make images appear grainy37. Software can be used to measure the level of noise in
images, to check that it is minimised to an acceptable level.
Measuring technical aspects gives a consistent and repeatable measure of image quality.
For instance, if noise in an image is measured twice, the same levels should be detected.
However, software tools to measure all aspects of quality and image appearance may not
be widely used or available. Instead, some quality control relies on human judgement.
Human judgement is often subjective and therefore results of visual inspections may vary
from person to person. Ideally, if a number of staff are responsible for visual inspections,
training should be provided to communicate qualitative information effectively38.
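As an illustration of a repeatable technical measurement, noise can be approximated by the spread of grey levels within an area of the image that should be uniform, such as a flat grey patch on a test target. The sketch below uses the population standard deviation for this purpose; this is one simple approach assumed for illustration, and the sample values are hypothetical.

```python
from statistics import pstdev

def patch_noise(pixels):
    """Estimate noise as the standard deviation of grey levels in a flat patch."""
    return pstdev(pixels)

# Hypothetical 8-bit grey levels sampled from a uniform grey area of a test target.
patch = [128, 130, 126, 128, 129, 127, 128, 128]
first = patch_noise(patch)
second = patch_noise(patch)
print(f"noise estimate: {first:.2f} grey levels")
print("repeatable:", first == second)
```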
Digital images should be inspected and checked against the attributes listed below, in
addition to standard records management quality checks.
Has a true and accurate copy been made of the original record? Attributes to check
include:
• image size
• image resolution
• bit depth: bitonal, greyscale or appropriate colour depth
• too light or too dark
• too low or too high contrast
• lack of sharpness
• too much sharpening, unnatural appearance and halos around dark edges
• image orientation
• skewed or not centred
• images cropped or incomplete
• missing pixels or scan lines
• poor quality dithering
• obvious use of lossy compression
Has the image file been stored correctly? Issues to check include:
• file format
• file size
• incomplete or incorrect profile information / metadata
• appropriate security applied
• necessary derivatives produced
Have procedures for the disposal of the original paper record been followed?
Has the equipment been calibrated correctly?
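Some of the attributes above, such as image size, resolution, bit depth and file format, can be compared automatically against a project specification, leaving visual attributes such as skew and sharpness to human inspection. The following sketch assumes the properties have already been read from the image file by other software; the class, field names and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ImageProperties:
    width_px: int
    height_px: int
    ppi: int
    bit_depth: int
    file_format: str

def check_image(actual, expected):
    """Return a list of failures; an empty list means all compared attributes match."""
    failures = []
    for field in ("width_px", "height_px", "ppi", "bit_depth", "file_format"):
        got, want = getattr(actual, field), getattr(expected, field)
        if got != want:
            failures.append(f"{field}: expected {want}, found {got}")
    return failures

# Hypothetical project specification and the reported properties of one scan.
spec = ImageProperties(2480, 3508, 300, 8, "TIFF")
scan = ImageProperties(2480, 3508, 300, 1, "TIFF")
for problem in check_image(scan, spec):
    print(problem)
```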
37
Sharma A. Digital Noise, Film Grain. 2001. Digital Photo Techniques. Accessed March 2005 at
http://www.phototechmag.com/sample/sharma.pdf
38
Moving Theory into Practice: Digital Imaging Tutorial. 2003. Cornell University Library/Research Department. Accessed March 2005
at http://www.library.cornell.edu/preservation/tutorial/quality/quality-02.html
as locating the paper file, obtaining it from storage, separating the pages, and
reassembling and storing the paper file after capture. The time taken to actually scan the
record to the highest quality that the device allows and save into a lossless compression
format may pale in significance when compared to the time taken to handle the paper.
Usually, if digitisation is undertaken at a high level of quality with good quality control
measures, the original paper file may not need to be routinely accessed.
A single file format may not be appropriate for all intended uses of a digitised document.
Consequently, a master file can be complemented by a number of derivatives to meet
business and user needs.
As outlined in section 6.4: File formats, different file formats have different characteristics
and applications. TIFF files, although capturing an image without compromise, may not be
suited for use on the internet. GIF and PNG files are both commonly used on the internet,
but may not be suitable for high-quality printing of certain types of images. JPEG, GIF,
and PNG files are often of a small enough size to be deliverable via a network, but the
network delivery of TIFF and PDF files depends on their content and intended use.
Master Files
The goal of a master image file is to provide a high-quality, unedited, information-rich copy
and to prevent the need for re-digitisation in the future39. A master file should capture as
much information as possible from the original document, and should be of the highest
quality possible. The quality available to derived files depends entirely on the quality of the
master files – a poor quality master file can only result in poor quality images derived from
it. The creation, use and storage of master files should be subject to strict quality control.
The image resolution is the main variable factor in controlling the quality of images from
scanning equipment. Using higher resolutions than those recommended in this document
will provide a higher quality image. However, as described earlier, colour depth should be
selected based on the characteristics of the paper record, and there is no real benefit in
increasing the colour depth beyond the recommendations provided earlier in this chapter
when creating a master file.
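Higher resolution and greater colour depth both increase the storage required for a master file. The calculation below is a rough sketch of the uncompressed size of the pixel data only (file headers and metadata are ignored); the pixel dimensions used are an assumed example of an A4 page at about 300 pixels per inch.

```python
def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Size of the raw pixel data, with each row padded to a whole number of bytes."""
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8
    return bytes_per_row * height_px

# Assumed example: an A4 page is roughly 2480 x 3508 pixels at 300 ppi.
for label, bits in (("bitonal", 1), ("8-bit greyscale", 8), ("24-bit colour", 24)):
    size = uncompressed_bytes(2480, 3508, bits)
    print(f"{label}: {size / 1_000_000:.1f} million bytes")

# Doubling the resolution doubles both pixel dimensions, so the size quadruples.
ratio = uncompressed_bytes(4960, 7016, 24) / uncompressed_bytes(2480, 3508, 24)
print("size ratio after doubling resolution:", ratio)
```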
Master files should be protected from damage through excessive handling or overuse of
media and also kept secure from deliberate change or deletion. In many ways, the
traditional process of microfilming records is similar to this aspect of the digitisation
process. Just as several copies of microfilm are made for a variety of purposes, several
copies of an imaged record may be made to preserve the image while providing access.
Derived files
Several types of files that can be derived from the master image file are described below.
The process of creating these derivatives will vary, but should be implemented as a routine
part of the digitisation process. Derived files typically have a smaller file size than the
original images and the storage of these extra files should be considered when
determining the system requirements for storage of digitised records.
It is possible for derived files to be generated by the system when they are requested. For
example, the Microsoft Windows XP operating system automatically generates thumbnails
when a user views a directory of images. However, for a multi-user system such as an
eDRMS, the one time generation and then storage of derived files will probably be more
efficient. The relatively small amount of computer storage required for the storage of
39
Western States Digital Imaging Best Practices Version 1.0. 2003. Western States Digital Standards Group. Accessed March 2005 at
http://www.cdpheritage.org/resource/scanning/documents/WSDIBP_v1.pdf
Thumbnail images are very small images designed to display instantaneously on web
pages and file management software, allowing users to determine whether they want to
view an access image. Thumbnails are best used when dealing with a collection of
pictorial images, but they are not very useful for images of text documents due to the
difficulty in determining the textual content within a very small image40.
• Text
Using optical character recognition (OCR), the text depicted in a scanned paper document
can be extracted as a text file or word processor document. OCR software is required to
recognise the text contained in the image and usually provides search and export
capabilities. OCR is rarely a fully automated process and may require operator
intervention to assist in obtaining an accurate transcription of the scanned record’s text.
Documents containing handwriting, serif fonts41, halftones, and background text or images
or those that are damaged or dirty may not be suited to the OCR process.
As with other derived files, if OCR is used to derive a text document from an imaged
record, this additional file should be managed by the system.
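Where a manually verified transcription is available for a sample of pages, the OCR output can be compared with it to decide whether operator intervention is needed. The sketch below uses the standard library difflib module to produce a similarity ratio; the sample text and the review threshold are hypothetical, and a ratio of this kind is only a rough indicator of OCR accuracy.

```python
from difflib import SequenceMatcher

def similarity(ocr_text, reference_text):
    """Ratio between 0 and 1 describing how closely the OCR text matches the reference."""
    return SequenceMatcher(None, ocr_text, reference_text).ratio()

REVIEW_THRESHOLD = 0.98                      # hypothetical value set by the project

reference = "Queensland State Archives"
ocr_output = "Queens1and State Archlves"     # hypothetical misreads of 'l' and 'i'
score = similarity(ocr_output, reference)
if score < REVIEW_THRESHOLD:
    print(f"similarity {score:.2f}: refer page to an operator for correction")
```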
Recommended Derivatives
Public authorities digitising records to improve access and integrate into other business
systems should aim to create images that meet the technical recommendations
provided earlier in this chapter.
Those organisations with additional capacity, wishing to take advantage of future
technical advances without having to rescan paper records, should consider creating a
high quality master copy in addition to an access copy. Any modification of the
appearance of the image should be performed using a copy, with an unmodified original
also retained.
The systems used to manage the digitised paper records should be tested for their
ability to manage multiple images if the decision to create derived versions of imaged
records is made.
40
Western States Digital Imaging Best Practices Version 1.0. 2003. Western States Digital Standards Group. Accessed March 2005 at
http://www.cdpheritage.org/resource/scanning/documents/WSDIBP_v1.pdf
41
A small decorative line added as embellishment to the basic form of a character. Typefaces are often described as being serif or
sans serif (without serifs). The most common serif typeface is Times New Roman. A common sans serif typeface is Arial.
7: Metadata
Metadata is often described simply as 'data about data'. A more useful definition of
metadata is “structured information that describes and/or allows us to find, manage,
control, understand or preserve other information over time”42. While metadata has always
been utilised in the recordkeeping and archival professions, it has only been described as
'metadata' for the past decade. Many routine operations such as the profiling of records
and files, cataloguing of library resources, and describing archival items can all be
described as metadata collection.
Metadata describes information objects or resources and may be used for many purposes,
including the management, control and discovery of records43. As outlined in Information
Standard 34 – Metadata (IS34), metadata must be used to maintain the context, content
and structure of records in electronic environments. The creation, retention and
preservation of metadata is integral to the concept of records as evidence44. Ideally the
collection of metadata for digitised documents will be part of an agency-wide metadata
strategy which is consistent with the requirements of IS34.
A metadata standard (also known as a schema) provides a list of the elements that define
the individual pieces of information that should be captured to describe the record. Use of
a metadata standard as opposed to a locally developed set of metadata elements will:
• encourage best practice;
• assist the end users;
• avoid redoing work that has already been done elsewhere;
• provide system vendors with certainty; and
• support interoperability between applications.45
System developers, vendors and records management staff should have a good
understanding of metadata standards to facilitate their implementation in an organisation.
Vendors and system developers providing business information systems and applications
used to manage digitised records of public authorities should ensure the capability exists
to record metadata to the appropriate standard.46 Records managers should be involved
in liaising with these parties to certify that appropriate metadata is able to be captured,
stored and managed in the system they are purchasing.
Tools such as templates and data entry forms which facilitate the entry of metadata in a
user friendly manner may be supplied by the vendor or developed in house. Additionally,
records managers should develop in-house metadata procedures and policies to suit their
particular business requirements.
Users of locally developed systems, or those who manually record metadata in a
spreadsheet or records ledger, will need to be aware of the elements that make up the
relevant metadata standards and aim to record them correctly.
42
DIRKS – Glossary. 2001. National Archives of Australia. Accessed March 2005 at
http://www.naa.gov.au/recordkeeping/dirks/dirksman/glossary.html
43
Glossary of Archival and Recordkeeping Terms. 2004. Queensland State Archives. Accessed March 2005 at
http://www.archives.qld.gov.au/downloads/GlossaryOfArchivalRKTerms.pdf
44
Information Standard 34, Metadata. 2004. Office of Government ICT. Accessed March 2005 at
http://www.governmentict.qld.gov.au/02_infostand/standards/is34.htm
45
Cunningham A. Metadata Standards in Australia – An Overview. 2005. Presentation at Queensland State Archives March 2005.
National Archives of Australia.
46
Overview of Classification Tools for Records Management. 2003. National Archives of Australia. Accessed March 2005 at
http://www.naa.gov.au/recordkeeping/control/tools.pdf
Capturing and maintaining accurate metadata for digitised paper records is essential, as
information cannot be effectively managed or used without metadata. The scope of this
document does not extend to providing a full description of recordkeeping metadata and its
use, which applies to all records and recordkeeping systems. Instead the focus will be
placed on the recording and management of specific metadata pertaining to digitised
records.
7.1 Metadata Types
There are many types of metadata tailored to data types including geographic information,
financial applications, and rights management. Three main metadata types apply to the
digitisation process and subsequent management of records, namely resource discovery,
recordkeeping, and technical imaging metadata.
Resource Discovery Metadata
Resource discovery metadata contributes to enabling the discovery and management of
online information resources. Information Standard 34: Metadata mandates the use of the
Australian Government Locator Service (AGLS) metadata standard, or other standards
compatible with it, for all information resources, including records.47
Information Standard 34: Metadata requires that public authorities must, at a minimum,
adopt metadata schemes that are interoperable with the AGLS Metadata Element set and
are consistent with the Queensland Government AGLS Element Implementation Standard.
AGLS is a resource discovery standard and does not include elements required for
records management processes, such as disposal.
Recordkeeping Metadata
Information Standard 40, Recordkeeping, recommends the use of the National Archives of
Australia’s Recordkeeping Metadata Standard for Commonwealth Agencies. This standard
is compatible with AGLS and extends AGLS by including elements required for managing
records.
The recording and maintenance of recordkeeping metadata for digitised records can assist
with:
• A means of searching and identification,
• Authentication,
• Preservation of the content and context,
• Information on retention and disposal,
• Auditing and restriction of use, and
• Interoperability with other systems.48
An Australian recordkeeping metadata standard is currently under development. It is likely
that the National Archives of Australia’s metadata standard will be revised in the future to
align with the national standard. It is expected this national standard will include guidance
on implementation. Further information on recordkeeping metadata can be found in the
Public Records Alert, Understanding and Applying Recordkeeping Metadata49.
47
For further information on AGLS including the list of elements see National Archives of Australia’s AGLS implementation manual at
http://www.naa.gov.au/recordkeeping/gov_online/agls/cim/cim_examples.html
48
Cunningham A. Metadata Standards in Australia – An Overview. 2005. Presentation at Queensland State Archives March 2005.
National Archives of Australia.
49
Public Records Alert No 2/05: Understanding and Applying Recordkeeping Metadata. 2005. Queensland State Archives. Accessed
March 2005 at http://www.archives.qld.gov.au/publications/PublicRecordsAlert/PRA205.pdf
50
Data Dictionary—Technical Metadata for Digital Still Images. 2003. National Information Standards Organization and
AIIM International. Accessed March 2005 at http://www.niso.org/standards/resources/Z39_87_trial_use.pdf
51
Digital Standard 1 – Cataloguing and Metadata for Digital Images. 2003. State Library of Queensland. Accessed March 2005 at
http://www.slq.qld.gov.au/__data/assets/file/5449/sd1_meta_v1.2.doc
52
Metadata and Digital Images. 2004. Technical Advisory Service for Images. Accessed March 2005 at
http://tasi.ac.uk/advice/delivering/metadata.html and Suggested Technical Metadata Elements. 2004. Indiana Digital Library. Accessed
March 2005 at www.statelib.lib.in.us/www/isl/diglibin/techmeta.pdf.
elements will create a metadata database that is very large. A large number of manually
entered metadata elements will also burden staff who are required to enter the information,
potentially leading to a lack of attention to detail and a poor quality collection. Selecting
the metadata elements to record, and determining which of these are mandatory or
optional to record should be part of the early phases of the digitisation project.
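Once the mandatory and optional elements have been decided, each metadata record can be checked against that list at the point of entry. The following is a minimal sketch; the element names are hypothetical placeholders and are not drawn from AGLS or any other standard.

```python
MANDATORY = {"title", "creator", "date_created"}   # hypothetical element names
OPTIONAL = {"description", "comments"}             # hypothetical element names

def validate(record):
    """Return (missing mandatory elements, unrecognised elements) for one record."""
    # An element counts as present only if it has a non-blank value.
    present = {name for name, value in record.items() if str(value).strip()}
    missing = sorted(MANDATORY - present)
    unknown = sorted(set(record) - MANDATORY - OPTIONAL)
    return missing, unknown

record = {"title": "Correspondence file 12", "creator": "", "colour": "blue"}
missing, unknown = validate(record)
print("missing:", missing)
print("unrecognised:", unknown)
```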
7.2 Capturing Metadata
As the manual collection and entry of metadata is a mundane task, automating the entry of
as many elements as possible should be a priority. Some metadata, such as titles and
comments, needs to be manually collected and linked to the digitised file. Other metadata,
such as recording the technical properties of a scanned image, can be collected
automatically. Table 10 shows a small sample of the metadata elements that can be
automatically captured.
Element | Source | Comment
Operator name | Computer operating system | Name can be extracted from the login account of the user
Capture device | Computer operating system | Including hardware, software, driver versions
Date of capture | Computer operating system |
Device calibration results | Device driver |
Time since calibration | Device driver / Computer operating system |
Image resolution, colour depth, compression, file format & sub-formats | Imaging software |
Image file name | Imaging software |
Image’s parent collection details | Imaging software | Entered initially by the operator and retained for subsequent images until changed
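Several of the elements listed in Table 10 can be read from the operating system without any input from the operator. The sketch below gathers a few of them using the Python standard library; the dictionary keys and the example file path are hypothetical, and the capture date is simply the time at which the function is called, which assumes it runs at the moment of scanning.

```python
import getpass
import platform
from datetime import datetime, timezone
from pathlib import Path

def automatic_metadata(image_path):
    """Collect elements that can be read from the operating system without operator input."""
    try:
        operator = getpass.getuser()         # login account of the current user
    except Exception:
        operator = "unknown"                 # no account name available in this environment
    return {
        "operator_name": operator,
        "capture_workstation": platform.node(),
        "operating_system": platform.platform(),
        "date_of_capture": datetime.now(timezone.utc).isoformat(),
        "image_file_name": Path(image_path).name,
    }

for element, value in automatic_metadata("scans/file_0001.tif").items():
    print(f"{element}: {value}")
```

Elements such as calibration results and image resolution would come from the device driver or imaging software, and are outside the scope of this sketch.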
53
Thornely J. The How of Metadata: Metadata Creation and Standards. 1999. 13th National Cataloguing Conference, October 1999.
Accessed March 2005 at http://www.slq.qld.gov.au/__data/assets/file/6289/How_of_Metadata.doc
54
Electronic Records Management Guidelines: File Naming. 2004. Minnesota State Archives. Accessed March 2005 at
http://www.mnhs.org/preserve/records/electronicrecords/erfnaming.html
55
Electronic Records Management Guidelines: File Naming. 2004. Minnesota State Archives. Accessed March 2005 at
http://www.mnhs.org/preserve/records/electronicrecords/erfnaming.html
56
Digital Standard 2 – Digital capture, format & preservation. 2003. State Library of Queensland Accessed March 2005 at
http://www.slq.qld.gov.au/__data/assets/word_doc/12788/sd2_digcapture1.doc
Recommended Metadata
57
Technical Guidelines for Digitizing Archival Materials for Electronic Access. 2004. National Archives and Records Administration
(US). Accessed March 2005 at http://www.archives.gov/research_room/arc/arc_info/techguide_raster_june2004.pdf
58
Guidelines for management, appraisal and preservation of electronic records. 1999. Public Record Office, The National Archives
(UK). Accessed March 2005 at http://www.nationalarchives.gov.uk/electronicrecords/advice/pdf/procedures2.pdf
59
Frey F. Guides to Quality in Visual Resource Imaging: 5. File Formats for Digital Masters. 2000. Council on Library and Information
Resources. Accessed March 2005 at http://www.rlg.org/visguides/visguide5.html
manual location and loading60. Access to files is normally accompanied by delays as the
files are located and the sequential nature of tape storage adds further delays.
8.2 Media Types
Magnetic Media
Magnetic media includes magnetic tapes and magnetic disks such as hard drives and
floppy drives61. Magnetic media usually has a lifespan of 10 to 20 years, although the
lifespan of magnetic media can be extended if appropriate storage conditions are used62.
Magnetic media can be used to provide offline, near-line or on-line access to files.
Magnetic tapes have a relatively low cost with large storage capacities of up to several
hundred gigabytes per tape. Tapes are sequential media and as such cannot be
considered as an alternative to random access media such as disks for routine access to
information. Instead magnetic tapes are widely used for long term, offline file storage or
backup63, where their low cost per megabyte is most appropriate. Robotically controlled
tape libraries can provide for a huge volume of information to be available near line.
Magnetic disks have a higher cost than magnetic tape, but with the benefit of providing
faster access to information. Personal computer hard disks and server disk arrays are
examples of magnetic disks. Magnetic disks are generally used for online storage;
however, it is becoming commonplace for cheaper, lower performance disks to be used as
an online backup or spare in a near line capacity.
Optical Media
Optical media use lasers to read data from a metallic coating on a disc. Optical media
include Compact Discs (CDs) and Digital Versatile Discs (DVDs). CDs and DVDs may be
read only (e.g. CD-ROM), writeable once (e.g. CD-R) or writeable many times (e.g.
CD-RW, DVD+RW). Optical discs are a common medium for digital file storage,
transportation and publication.
The main advantage that optical media have over magnetic media is that their life
expectancy is more predictable, as their longevity is determined by the properties of the
optical material rather than wear and tear on the media64. Optical media can provide
near-line or off-line storage.
It is not possible to alter or delete information from write once, read many (WORM) optical
media. This provides an assurance that the imaged records that they store have not been
altered or deleted. Conversely, the implementation of disposal decisions for digitised
records stored on WORM media can be complicated if imaged records of differing
retention periods are stored on the same disc. The nature of this type of media means that
it cannot be reused.
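One possible way to reduce this complication is to group records by the year in which their disposal falls due before writing them to WORM media, so that each disc can be destroyed as a whole. The sketch below illustrates the grouping only; the record identifiers and years are hypothetical, and disc capacity is not considered.

```python
from collections import defaultdict

def plan_discs(records):
    """Group record identifiers by disposal year so each disc holds one retention period."""
    discs = defaultdict(list)
    for record_id, disposal_year in records:
        discs[disposal_year].append(record_id)
    return dict(discs)

# Hypothetical (record identifier, year in which disposal falls due) pairs.
records = [("A-001", 2012), ("A-002", 2030), ("A-003", 2012)]
for year, ids in sorted(plan_discs(records).items()):
    print(year, ids)
```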
60
Creating and Managing Digital Content – Capture Your Collections. 2002. Canadian Heritage Information Network. Accessed March
2005 at http://www.chin.gc.ca/English/Digital_Content/Capture_Collections/maintenance.html
61
The Preservation Management of Digital Material Handbook, Chapter 5: Media and Formats. 2002. Digital Preservation Coalition.
Accessed March 2005 at http://www.dpconline.org/graphics/medfor/media.html
62
Frey F. Guides to Quality in Visual Resource Imaging: 5. File Formats for Digital Masters. 2000. Council on Library and Information
Resources. Accessed March 2005 at http://www.rlg.org/visguides/visguide5.html
63
Electronic Records Management Guidelines: Digital Media. 2004. Minnesota State Archives. Accessed March 2005 at
http://www.mnhs.org/preserve/records/electronicrecords/erdigital.html
64
Frey F. Guides to Quality in Visual Resource Imaging: 5. File Formats for Digital Masters. 2000. Council on Library and Information
Resources. Accessed March 2005 at http://www.rlg.org/visguides/visguide5.html
Optical media is normally a cost effective option for storage. Media may have a unit cost
of only a few cents when purchased in reasonable quantities. However, recording and
retrieving files from optical media can often be slow, and locating the disk that a file is
stored on may complicate access to digitised documents. Optical media work well as a
storage option for small projects. Using optical media in large projects may lead to high
storage and retrieval costs65.
8.3 Media Lifecycle
A range of media is available for storing digital files in on-, near- or off-line capacities.
Choosing an appropriate medium for storage of digital files is important to ensure on-going
accessibility to the file. The rate of media obsolescence and reliance on hardware and
software for access to media requires that careful consideration is given to the media used
in digitisation projects.
Media types may lose popularity and be difficult to
read due to lack of available equipment over a period
of time. For example, reading Beta video cassettes or
5.25” disks is problematic, because the hardware
required is no longer readily available. Digitised
documents will need to be copied to new media if they
are to remain accessible. Generally, the lifetime of
hardware and software is shorter than the lifetime of
digital media. A five year timeframe has been
suggested for data refreshing (copying of files to a
new medium)66.
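A refresh schedule based on the suggested five year timeframe can be derived from the date on which each piece of media was written. The sketch below is illustrative only; the function name and the example date are assumptions, and an organisation may choose a different interval.

```python
from datetime import date

REFRESH_INTERVAL_YEARS = 5   # the suggested timeframe cited above

def refresh_due(written_on, years=REFRESH_INTERVAL_YEARS):
    """Date by which files written on `written_on` should be copied to new media."""
    try:
        return written_on.replace(year=written_on.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February.
        return written_on.replace(year=written_on.year + years, day=28)

print(refresh_due(date(2005, 3, 14)))
```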
Media Life and Deterioration
Unlike paper documents, digital files cannot be easily
examined to determine if the file is still legible. Most
digital media becomes obsolete or loses information
faster than words produced on paper. Hardware and
software are also required to interpret and display a
digital file so that the file’s legibility can be checked.
Hence the storage of digital files requires on-going,
regular maintenance to ensure that files remain
readable by contemporary hardware and software,
and to ensure that the media the files are stored on
does not decay.
Different media have varying life expectations. For
example, microfilm is expected to have a shelf life of
500 years, whereas Compact Disc (CD) life may be as
short as 2 years. Recently some manufacturers have
released “archive quality” CD and Digital Versatile Disc (DVD) media which are said to
have a shelf life of up to 50 years, but many CDs, DVDs and tapes lose data within a very
short period after their creation (2-30 years67).
Figure 9: Media Life Expectancy. From http://www.caps-project.org/cache/DigitalMediaLifeExpectancyAndCare.html
65
Western States Digital Imaging Best Practices Version 1.0. 2003. Western States Digital Standards Group. Accessed March 2005 at
http://www.cdpheritage.org/resource/scanning/documents/WSDIBP_v1.pdf
66
Rothenberg, J. Ensuring the Longevity of Digital Information. 1999. Council on Library and Information Resources. Accessed March
2005 at http://www.clir.org/pubs/archives/ensuring.pdf
67
The Preservation Management of Digital Material Handbook, Chapter 5: Media and Formats. 2002. Digital Preservation Coalition.
Accessed March 2005 at http://www.dpconline.org/graphics/medfor/media.html
Often, longevity of storage media is less important than adequate plans to migrate and
refresh files for compatibility on contemporary hardware and software68.
Media Refreshing
To ensure the continued accessibility of digitised records and to prevent information loss, a
testing and re-mastering schedule should also be implemented and a strategy drawn up
for migrating the images and metadata to new media and new formats when necessary.69
Known as refreshing, this may involve copying the contents from one type of technology to
another in order to prevent records from being left on media which can no longer be read.
Alternatively, refreshing may be from one piece of media to another of the same
technology ensuring that pieces of media are replaced before they fail.70 The records
should be verified following the refresh process.
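One way to verify records after a refresh is to compare a checksum of each file on the old media with a checksum of its copy on the new media. The following is a minimal sketch only, assuming that matching SHA-256 checksums are accepted as evidence of an exact copy. The function names and the directory layout are hypothetical and are not prescribed by this guideline.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_and_verify(old_media: Path, new_media: Path) -> list[str]:
    """Copy every file to the new media; return the files that failed verification."""
    failures = []
    for source in sorted(p for p in old_media.rglob("*") if p.is_file()):
        target = new_media / source.relative_to(old_media)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies content and file timestamps
        if sha256_of(source) != sha256_of(target):
            failures.append(str(source.relative_to(old_media)))
    return failures
```

An empty list returned by refresh_and_verify indicates that every copy matched its source. Any file names returned should be re-copied and checked again before the old media is retired.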
Expungement of digitised records
When the retention period of a record has been reached and it is scheduled for
destruction, any digital copies should also be destroyed at the same time as the paper
record. Care should be taken to destroy, overwrite, or carry out secure deletion on
computer storage media and devices used in the storage of records. As discussed above,
there can be some difficulty in achieving this when using WORM media – in this case files
to be retained could be copied onto new media, with the old disc destroyed.
Organisational IT policies, such as system backup, should also be examined to ensure
that digital copies of records are no longer preserved following their destruction.
68
Western States Digital Imaging Best Practices Version 1.0. 2003. Western States Digital Standards Group. Accessed March 2005 at
http://www.cdpheritage.org/resource/scanning/documents/WSDIBP_v1.pdf
69
Digital Preservation and Storage. 2004. Technical Advisory Service for Images. Accessed March 2005 at
http://www.tasi.ac.uk/advice/delivering/digital.html
Acronyms
AGLS Australian Government Locator Service
BMP Bitmap
CD Compact Disc
DPI Dots Per Inch
DVD Digital Versatile Disc
GIF Graphics Interchange Format
JFIF JPEG File Interchange Format
JPEG Joint Photographic Experts Group
PCX PC Paintbrush Format
PDF Portable Document Format
PNG Portable Network Graphics
PPI Pixels Per Inch
RAM Random Access Memory
ROM Read Only Memory
TIFF Tagged Image File Format
WORM Write Once Read Many
Glossary
Anti-Aliasing Improves the appearance of greyscale images by adding grey pixels at
the border of black and white areas, smoothing the transition from black
to white. Also used in colour images to smooth transitions between
colours.
Bit Depth The number of bits used to describe the colour of each pixel. Greater bit
depth allows more colours to be used in the colour palette for the image.
Bi-tonal Images containing only black and white pixels. Bi-tonal images are often
used to represent modern, non-illustrated text documents.
Colour Depth The colour or bit depth of an image refers to the number of bits used to
describe the colour of each pixel. Greater bit depth allows more colours to
be displayed in an image. Colour depths can range from 1 bit per pixel
for bi-tonal images to 24 bits per pixel or greater in high quality colour
images.
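As a worked illustration of this relationship, the number of distinct colours a pixel can take is 2 raised to the power of the bit depth. The short sketch below is illustrative only, and the function name is an assumption made for the example.

```python
def colours_available(bits_per_pixel: int) -> int:
    """Number of distinct colour values a pixel can take at a given bit depth."""
    return 2 ** bits_per_pixel


# Bi-tonal (1 bit), 8-bit greyscale and 24-bit colour images.
for bits in (1, 8, 24):
    print(f"{bits:>2} bits per pixel: {colours_available(bits):,} colours")
```

A 1-bit image can therefore hold 2 values (black and white), an 8-bit image 256 values, and a 24-bit image 16,777,216 values.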
Continuous Colour An image, such as an original photographic transparency or print, in
which the tones or colours blend smoothly from one to another.
Continuous colour images have a virtually unlimited range of colours or
shades of grey.
Discrete Colour Instances when the colours in an image are separate and distinct.
Discrete colour images do not blend smoothly from one colour to the next
and lack the many shades of colour seen in photographs.
70
VERS Advice 10: System Requirements for Preserving Electronic Records. 2004. Public Record Office Victoria. Accessed March
2005 at http://www.prov.vic.gov.au/vers/standard/advice_10/3-8.htm
Dithering The computer graphics equivalent to printed halftones, this technique
creates the illusion of colour depth in images with a limited colour palette.
This is done by interspersing pixels of different colours over the required
area to give the appearance of a third colour. For example, white and
black pixels allocated over an area will provide a grey appearance to that
area.
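One common way of interspersing pixels is ordered dithering, in which each pixel is compared against a repeating threshold pattern. The sketch below is illustrative only. It assumes a greyscale image held as rows of values from 0 (black) to 255 (white) and uses a 2x2 Bayer threshold matrix, which is one of several dithering methods and is not prescribed by this guideline.

```python
# 2x2 Bayer matrix: the order in which pixels in each 2x2 tile turn white
# as the grey value increases.
BAYER_2X2 = [[0, 2],
             [3, 1]]


def ordered_dither(grey_rows):
    """Convert rows of 0-255 grey values to rows of 0 (black) or 1 (white)."""
    result = []
    for y, row in enumerate(grey_rows):
        out_row = []
        for x, value in enumerate(row):
            # Scale the matrix entry to a threshold within the 0-255 range.
            threshold = (BAYER_2X2[y % 2][x % 2] + 0.5) * 255 / 4
            out_row.append(1 if value > threshold else 0)
        result.append(out_row)
    return result


# A flat mid-grey area becomes an alternating pattern of black and white pixels.
for row in ordered_dither([[128] * 4 for _ in range(4)]):
    print("".join("#" if pixel == 0 else "." for pixel in row))
```

With a mid-grey input, half of the output pixels are white and half are black, which the eye reads as grey at normal viewing distance.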
Dots per Inch A measure of the resolution of a printer. It refers to the number of dots
the printer is able to place in a linear one-inch space. The more dots per
inch, the higher the resolution and the higher the printing quality.
File format The specific way that data is arranged in a file. Some file formats can be
used by a range of applications (such as text files or some image files)
while others may only be used by a specific application (usually the same
application used to create the file).
Most applications can save documents in one or more standard formats
as well as in their native format (e.g. a document produced in Microsoft
Word can be saved as a Word document, or in rich text format, or in
WordPerfect format). File formats may be proprietary or non-proprietary.
Greyscale Greyscale images use only black, white and a range of shades of grey.
The number of grey shades available depends on the colour depth of the
image.
Half-tone A printed image in which the density and pattern of black and white dots
are varied, giving the appearance of a continuous tone image when
viewed from an appropriate distance. Half-tone images are used
extensively in magazines and newspapers.
Lossless compression The compression of data that guarantees the original data can be
restored exactly. A file that is compressed using a lossless method and
then retrieved is exactly the same as the original, uncompressed file.
Lossy compression The compression of data that may result in some data being changed or
lost. A file that is compressed using a lossy method and then retrieved
may be different from the original file, but is "close enough" to be useful in
some way.
LZW Lempel-Ziv-Welch. A lossless data-compression algorithm developed by
Abraham Lempel, Jacob Ziv and Terry Welch, and used in GIF files.
Unisys Corporation held patents on the LZW algorithm; the United States
patent expired in 2003.
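The dictionary-building idea behind LZW, and the meaning of "lossless", can be illustrated with a short sketch. This is a simplified teaching version only: it returns a plain list of integer codes and omits the variable-width code packing and dictionary resets that real GIF and TIFF encoders apply. The function names are assumptions made for the example.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW dictionary codes."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate  # keep extending the known sequence
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # learn the new sequence
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the sequence that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


sample = b"TOBEORNOTTOBEORTOBEORNOT" * 10
codes = lzw_compress(sample)
print(len(sample), "bytes encoded as", len(codes), "codes")
print("restored exactly:", lzw_decompress(codes) == sample)
```

Because the decoder rebuilds the same dictionary as the encoder, the restored data is identical to the input, which is the defining property of lossless compression.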
Naming conventions A standardised approach to naming computer files.
Near-line storage Storage of files, normally on magnetic or optical media, so that files can
be accessed if needed. Unlike off-line storage, accessing files in near-line
storage should not require human intervention, but access will usually be
slower than for on-line storage. Robotically controlled tape libraries and
CD/DVD jukeboxes are applications of near-line storage.
Non-proprietary Refers to a technological design or architecture whose configuration is
available for use by the public. Use of non-proprietary technology is not
restricted by licences or patents. Software is considered non-proprietary
once it is released with a licence that would permit others to modify the
Slide scanner – Designed specifically for digitising transparent materials such as slides
and negatives. This scanner type typically provides a higher throughput and improved
quality over a flatbed scanner with a transparent media adaptor, particularly for scanning
high volumes of slides and negatives.
Drum scanner – Used in graphic design and publishing, these expensive scanners use
different technology from the other scanner types described here to produce a higher
quality image. The page being scanned is attached to a high speed rotating drum, which
makes this type of scanner unsuitable for scanning fragile documents or large volumes of
records.
Digital Camera – It is possible for a standard digital camera to capture a digital copy of a
paper record. Digital cameras should be used in macro mode when photographing objects
that are close to the camera. A stable mount should be used to ensure the camera is steady
enough to accurately capture the object. It should be noted that photographic effects
such as barrel distortion and fall-off, which affect the edges of objects captured by a
camera lens at close range, will be present in records captured using a digital camera.
Notes:
1. Resolution may be reduced for images only used for on-screen viewing and should be
increased for documents that require enlargement for use. For documents larger than
A3, a resolution of 200 PPI is generally accepted to provide a reasonable file size. The
clarity of fine line work and small text at this reduced resolution should be assessed.
2. For storing multi-page documents as a single file, TIFF may be considered as an
alternative.
3. The ability of software to manage any licensing required for LZW compression should
be checked.
4. The compression ratio used for JPEG compressed images should not exceed 10:1.
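The effect of resolution on file size (Note 1) and the compression limit (Note 4) can be estimated with simple arithmetic. The sketch below is illustrative only. It assumes an A3 page of 297 mm by 420 mm scanned in 24-bit colour, and takes the compression ratio to be the uncompressed size divided by the compressed size. The function names are assumptions made for the example.

```python
MM_PER_INCH = 25.4


def uncompressed_bytes(width_mm: float, height_mm: float, ppi: int,
                       bits_per_pixel: int) -> int:
    """Estimate the uncompressed size in bytes of a scanned page."""
    width_px = round(width_mm / MM_PER_INCH * ppi)
    height_px = round(height_mm / MM_PER_INCH * ppi)
    return width_px * height_px * bits_per_pixel // 8


def within_jpeg_limit(original_bytes: int, compressed_bytes: int,
                      max_ratio: float = 10.0) -> bool:
    """True if the compression ratio does not exceed max_ratio:1."""
    return original_bytes / compressed_bytes <= max_ratio


# Compare an A3 page in 24-bit colour at two scanning resolutions.
for ppi in (200, 300):
    size = uncompressed_bytes(297, 420, ppi, 24)
    print(f"A3 at {ppi} PPI: about {size / 1_000_000:.1f} MB uncompressed")
```

For example, within_jpeg_limit(10_000_000, 1_000_000) returns True because the ratio is exactly 10:1, whereas a 900,000 byte result from the same original would exceed the limit.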
Other Documents
Adobe PDF. 2005. Adobe Systems Inc. Accessed March 2005 at
http://www.adobe.com/products/acrobat/adobepdf.html
Brown A. Digital Preservation Guidance Note 1: Selecting File Formats for Long-Term
Preservation. 2003. National Archives (UK). Accessed March 2005 at
http://www.nationalarchives.gov.uk/preservation/advice/pdf/selecting_file_formats.pdf
Cunningham A. Metadata Standards in Australia – An Overview. 2005. Presentation at
Queensland State Archives March 2005. National Archives of Australia.
Creating and Managing Digital Content. 2002. Canadian Heritage Information Network.
Accessed March 2005 at http://www.chin.gc.ca/English/Digital_Content
Data Dictionary—Technical Metadata for Digital Still Images. 2003. National Information
Standards Organization and AIIM International. Accessed March 2005 at
http://www.niso.org/standards/resources/Z39_87_trial_use.pdf
Digital Imaging for Archival Preservation and Online Presentation: Best Practices. 2001.
Michigan State University. Accessed March 2004 at
http://www.historicalvoices.org/papers/image_digitization2.pdf
Digital Preservation and Storage. 2004. Technical Advisory Service for Images. Accessed
March 2005 at http://www.tasi.ac.uk/advice/delivering/digital.html
Digital Standard 1 – Cataloguing and Metadata for Digital Images. 2003. State Library of
Queensland. Accessed March 2005 at
http://www.slq.qld.gov.au/__data/assets/file/5449/sd1_meta_v1.2.doc
Digital Standard 2 – Digital capture, format & preservation. 2003. State Library of
Queensland. Accessed March 2005 at
http://www.slq.qld.gov.au/__data/assets/word_doc/32645/sd2_current.doc
The DIRKS Manual: A Strategic Approach to Managing Business Information. 2003.
National Archives of Australia. Accessed December 2005 at
http://www.naa.gov.au/recordkeeping/dirks/dirksman/dirks.html.
Moving Theory into Practice: Digital Imaging Tutorial. 2003. Cornell University
Library/Research Department. Accessed March 2005 at
http://www.library.cornell.edu/preservation/tutorial/quality/quality-01.html
Moving Theory into Practice: Digital Imaging Tutorial. 2003. Cornell University
Library/Research Department. Accessed March 2005 at
http://www.library.cornell.edu/preservation/tutorial/quality/quality-02.html
PNG (Portable Network Graphics). 2004. World Wide Web Consortium. Accessed March
2005 at http://www.w3.org/Graphics/PNG/
The Preservation Management of Digital Material Handbook, Chapter 5: Media and
Formats. 2002. Digital Preservation Coalition. Accessed March 2005 at
http://www.dpconline.org/graphics/medfor/media.html
Quality Assurance. 2004. Technical Advisory Service for Images. Accessed March 2005 at
http://www.tasi.ac.uk/advice/creating/quality.html
Recordkeeping in Brief No. 11: Digital Imaging and Recordkeeping. 2003. State Records
New South Wales. Accessed March 2005 at
www.records.nsw.gov.au/publicsector/rk/rib/rib11.htm
Revised Digital Imaging Guidelines for State of Ohio Executive Agencies and Local
Governments. 2003. Ohio Electronic Records Committee. Accessed March 2004 at
http://www.ohiojunction.net/erc/imagingrevision/revisedimaging2003.html
Roelofs G. Multiple-image Network Graphics. 2005. Greg Roelofs. Accessed March 2005
at http://www.libpng.org/pub/mng
Rothenberg, J. Ensuring the Longevity of Digital Information. 1999. Council on Library and
Information Resources. Accessed March 2005 at
http://www.clir.org/PUBS/archives/ensuring.pdf.
Scanning Tips and Techniques. 1999. Jasc Software Inc. Accessed October 2004 at
http://www.jasc.com/tutorials/scantip.asp
Sharma A. Digital Noise, Film Grain. 2001. Digital Photo Techniques. Accessed March
2005 at http://www.phototechmag.com/sample/sharma.pdf
Suggested Technical Metadata Elements. 2004. Indiana Digital Library. Accessed March
2005 at www.statelib.lib.in.us/www/isl/diglibin/techmeta.pdf.
Tanner, S. From Vision to Implementation – strategic and management issues for digital
collections. 2000. The Electronic Library – strategic, policy and management issues
seminar. Accessed March 2005 at http://heds.herts.ac.uk/resources/papers/Lboro2000.pdf
Technical Guidelines for Digitizing Archival Materials for Electronic Access. 2004. National
Archives and Records Administration (US). Accessed March 2005 at
http://www.archives.gov/research_room/arc/arc_info/techguide_raster_june2004.pdf
Technical Recommendations for Digital Imaging Projects. 1997. Image Quality Working
Group of ArchivesCom. Accessed March 2005 at
http://www.columbia.edu/acis/dl/imagespec.html
Thornely J. The How of Metadata: Metadata Creation and Standards. 1999. 13th National
Cataloguing Conference, October 1999, Accessed March 2005 at
http://www.slq.qld.gov.au/__data/assets/file/6289/How_of_Metadata.doc
TIFF Revision 6.0. 1992. Adobe Systems Inc. Accessed March 2005 at
http://partners.adobe.com/asn/developer/pdfs/tn/TIFF6.pdf