Selection
and assessment of material It was important that the digitisation project
paid due regard to preservation concerns. Conservation staff were employed exclusively
for the project, and had a major role to play in advising on the selection of
equipment to be used. They assessed the physical state of the volumes and warrants
(original wills and inventories in loose leaf format) in advance of digitisation.
This process was undertaken sufficiently far in advance to ensure an adequate
body of material for the digitisation to proceed without delays. They carried
out conservation on documents where necessary and employed the principle that
intervention would only be required where either the image would be significantly
enhanced - for example if the pages were very dirty - or where without conservation
input, the digitisation process could cause damage to the manuscript. Once
the digital images were created the conservation staff created custom boxes for
the proper housing of the volumes and these were then placed in good storage conditions. Further
information about the conservation input to the project together with recommendations
can be found in a published
report. Type of Material to be Digitised Most of the material
we captured was in the form of bound volumes. Some material was loose leaf but
the bound material represents the overwhelming majority of the material to be
captured. Preparation and pagination There were additional staff
resources for conservation of the material (before and after digitisation) and
several approaches to loose leaf and bound material were developed. An important
part of the preparation process was ensuring that each page to be digitised had
an accurate number. This was then incorporated into the document reference to
form the file name. Conservation staff paginated all the early material up to
1750, but the later material was paginated by our team of volunteer camera operators
according to guidelines laid down by the project archivists and conservators.
The pagination process helped to define the file name for the digital image but
it was also an important indicator that the camera operators used to ensure that
- all pages were captured
- no pages were duplicated
-
no images were missed
The accuracy of the page number was one of the
key checks carried out by the quality control operators. Proper handling
by trained camera operators The conservation staff also established handling
guidelines that all the camera operators were required to follow and also gave
operators training in handling the documents to minimise damage and to recognise
where further conservation input might be required. The requirements to undertake
training and abide by the handling guidelines were an important part of the contractual
relationship with the GSU. Image capture software that minimised operator
intervention We needed to develop a system which would allow staff, many
of whom had little or no ICT experience, to concentrate on their task of capturing
accurate, good quality images and to do so at a good rate of throughput. We therefore
looked to simplify the steps involved so that, once a volume had been set up and
the metadata for the volume entered, the camera operator had a one button approach
to capture each of the images that followed. This therefore included automatically
naming the file and storing them away. Images were cropped automatically, if required,
and checks were made on the colour to highlight anomalies. Image capture
itself was quick. We used a greyscale camera and attached a computer controlled
filter to it. The camera took three pictures with the red, green and blue filters
and then combined them to display a composite colour image on screen for the operator
to check. Once the final image had been captured the system would start to save
the image and also released the book cradle to allow the operator to turn the
next page. Each image would take about 3.5 seconds so a full colour image, with
three takes, would take around 11 seconds. Allowing for the operator to check
the image and turn the page this means a full cycle time of around 15 seconds
per colour image. Images were captured as colour tiff images onto the hard
disk of the local PC. This minimised any network traffic and meant we could invest
in fast disks with a large capacity for each of the six camera PCs we had purchased.
As we saved the images only in TIFF format we had no overhead at the pint of capture
for the creation of any other file formats. This operation took lace once the
camera operators had completed their work for the day and we would run the image
format program overnight. In order to manage the large number of images produced
we kept to a naming convention based on the original file reference plus the page
number. This file reference was also used to create a directory on the server
to store all the images for a particular volume and meant that it was straightforward
to name and find an image for any page for any volume. Image Quality
- Fit for Purpose We used digital cameras rather than scanners for the
digital capture. Digital cameras operate by focusing the image on a light sensitive
chip called a CCD (Charged Couple Device). The CCD has a fixed capacity and for
the two cameras we operated for this project the arrays were the following sizes
Camera
type | CCD Size | Total Available
Pixels | Kodak Megaplus 6.3i | 3072
x 2048 | 6291456 | Atmel
Camelia | 3500 x 2300 | 8050000 |
So
regardless of the size of the document being digitised we are limited by this
capacity. Line scanners operate differently and move a line array CCD across the
document to a fixed size. The optical resolution is therefore normally expressed
as dots per inch (dpi). With a fixed CCD capacity then the resolution would be
different depending on the size of the document being digitised. To achieve an
equivalent resolution of 300 dpi would mean restricting documents to less than
10 inches by 7 inches. In order to meet our requirement the image quality had
to be "fit for purpose". Our purpose was to make the documents legible
on screen or on printout. We needed a different metric that demonstrated
sufficient quality but was suitable to the various sizes of documents we had to
digitise. We agreed on a standard whereby the pen strokes of the handwriting were
examined. The number of distinct pixels for different types of line thicknesses
was measured and we concluded that if we had 4 pixels for each line then, regardless
of the use of image on screen or on a printout, that we had captured sufficient
information to represent the image accurately. This conclusion meant that we could
capture images of an open volume rather than having to take images of each page
on either side of the volume. This obviously increased the throughput but also
halved the strain on the documents that would have been required if we had taken
each page individually. The images were tested by our user group and found
to be very acceptable and judged to be of a high quality and sufficient for their
needs. Formal quality control procedures Quality control was undertaken
in a separate programme. Once images had been converted to jpeg format, which
happened overnight to minimise capture times, quality control was carried out
by another operator. Once a volume had been checked the results were recorded.
This means that we can ascertain whether an image was examined (and by which operator)
or whether it was approved as part of a larger batch. Once complete the quality
control program produces a summary printout. We started the project with a 100%
check of every image but the most effective results obtained from this program
were found to be from a 30% random selection of images per volume. Software
for data back ups We retained copies of the colour tiff images on the hard
drive of the machine that produced them until the quality control program was
complete and any necessary retakes were completed. Once this was done we had simple
procedures in place to let operators identify material that had been completed,
how much space they would take up on the tape and then write them to tape and
also record the information about the tape and starting block on a database. Resilience On
site image storage is both online and on tape. The online storage (approximately
1.5 terabytes) makes all the jpeg images available. The online storage is protected
by RAID 5 and also has a hot spare to immediately fix any disk problems. Tape
storage includes both uncompressed (TIFF) and compressed (JPEG) colour images.
Additional resilience comes from having uncompressed greyscale images written
to tape and stored "off-continent" in Salt Lake City. Procedures
for creating links between the finding aid and the images Once images
have been created we needed to provide access to them. A volume of images may
include over a thousand pages so giving access to a whole volume would be little
help. We didn't have a comprehensive index to all the testaments so had to create
one form all the different sources that were available. This included the digital
transcription of some published indexes, transcription of index pages from some
individual volumes and the creation of indexes where none previously existed.
This gives a direct link between the index and the images referred to in the index.
This can only be achieved successfully by accurate pagination of the original
document corresponding exactly with the image numbers. Provision has to be made
for linking index entries where there is more than one testament per page. This
is more common in the pre-18th century registers. Additional Metadata
for the Images We maintain a database that contains a record for each of
the images we have created. This does not include the indexing information used
to identify the content of the record but describes information about the processes
used to create the image. During the image capture process we automatically
create an entry in a logfile for each image. Attributes captured at this time
include the camera id, particular camera settings, date and time of capture, operator
id, volume description and indicators whether the image is of a blank page or
whether it is a retake. When we process the images to create our derivatives we
record the information against the image in our image logfile database. When the
images are quality controlled we record the QC information relevant to each image.
Once the images are written to tape we record information about the backup device
they are stored upon. Taken altogether this gives us a full picture of the creation
of the image and is also a key management information tool. The great advantage
of our system is that this wealth of information is collected with very little
intervention from the operators and causes very little overhead to collect. This
information can be exported to a simple text file for single images or for full
volumes. Website for access to the index and images including e-commerce While
we were still capturing the images and linking them to the index, we had planned
our e-commerce site to provide remote access. The index would be accessible free
of charge, along with a whole range of other supporting information. After undertaking
a marketing evaluation we decided that a fixed fee would be suitable, regardless
of the number of individual pages that a testament covered. After payment the
customer can view or download all of the images relating to a testament and we
will retain information about the customer order to allow them to come back to
our site and view again the images they had purchased.
This site (www.ScottishDocuments.com)
proved a very effective means for promoting access to the images. Since June 2005
the images have been incorporated into the www.ScotlandsPeople.gov.uk
website and the original website was suspended until we are ready to launch e-commerce
access to the Kirk Session records.
The digitisation process described
above has proved very successful at digitising large quantities of original manuscript
material in bound volumes. This has led us to undertake even larger projects and
we are currently digitising an estimated 8 million pages of Church records. In
addition we are considering the modification necessary to allow us to use the
same processes to digitise documents on demand.
|