QUAD-G – Automated Georeferencing Project
Jim Burt, Jeremy White, and A-Xing Zhu, Geography Department, Univ. of Wisconsin-Madison
Greg Allord, United States Geological Survey, Madison, WI
What is this project about?
As part of the new National Map, the United States Geological Survey intends to provide digital images of all its historical topographic quadrangles. Users will have access to 180,000 7.5-minute (1:24,000) quadrangles and another 120,000 maps at other scales. Contractors are creating these images by scanning paper maps at 500-600 dots per inch, so the scans are essentially perfect facsimiles of the existing printed archive.
Full utility of the scanned images requires that they be "georeferenced", that is, the images must be tied to a known coordinate system. For example, without georeferencing there would be no way to overlay the maps on other layers comprising the National Map, nor would it be possible to assemble a seamless image from adjoining scans.
Existing georeferencing software employs what is essentially a manual procedure, requiring users to digitize known locations on the screen and enter text in dialog boxes. At about 20 minutes per scan, georeferencing the entire set of roughly 300,000 scanned images would take more than 45 person-years. This far exceeds available resources.
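The person-year figure follows from simple arithmetic on the numbers above. The sketch below assumes a person-year of 2,000 working hours (50 weeks of 40 hours), a figure the text does not state; under that assumption the roughly 300,000 scans come to about 50 person-years, consistent with "more than 45".

```python
# Back-of-the-envelope check of the manual-effort estimate.
# ASSUMPTION: one person-year = 2,000 working hours (50 weeks x 40 hours);
# the project text does not give a figure.
HOURS_PER_PERSON_YEAR = 2000.0


def manual_person_years(n_scans: int, minutes_per_scan: float,
                        hours_per_year: float = HOURS_PER_PERSON_YEAR) -> float:
    """Person-years needed to georeference n_scans by hand."""
    total_hours = n_scans * minutes_per_scan / 60.0
    return total_hours / hours_per_year


if __name__ == "__main__":
    n_scans = 180_000 + 120_000  # 7.5-minute quads plus maps at other scales
    print(f"{manual_person_years(n_scans, 20):.1f} person-years")  # 50.0 person-years
```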
Our project aims to replace the manual process with an automated procedure that can process large numbers of scans with no operator supervision.
A related project addressed the georeferencing problem for small-scale maps that may or may not be quadrangles. You can download that software (Thema-G) from a link at the right. (Sorry, work on that is ongoing and there is no user manual.)
With support from the USGS, we have developed software for automatically georeferencing scanned maps. The software converts an image from the scanner's coordinate system to the geographic (latitude, longitude) coordinate system implicit in the original map. The software, known as QUAD-G, operates in much the same way as other georeferencing tools: a small number of points are located in both image coordinates and geographic coordinates. These so-called control points are used to establish a relationship between the image and the world, which defines a mapping between the two systems that is then applied to the scanned image. The result is a new image whose pixels form a grid of squares in latitude and longitude aligned with the cardinal directions. The output raster file is in GeoTIFF format.
QUAD-G differs from standard tools in that the control points are found automatically rather than provided by a user through on-screen digitizing. Thus the software can process an arbitrarily large batch of scanned images without operator supervision. QUAD-G has dramatic advantages over standard tools whenever one has more than a few maps in a series that require georeferencing. It also has a semi-manual mode that uses pattern search to find control points without excessive panning and zooming. It therefore can be advantageous for processing even a single scan.
The default QUAD-G setup is for topographic quadrangles and similar map sheets, but it can be used with scans of large-scale maps from any source. The program is distributed as a Microsoft Windows executable.
QUAD-G assumes the scanned image contains a rectangular map quadrangle. In other words, the mapped area represents a rectangle in latitude and longitude. It is further assumed that the map contains a regular grid of control marks of known latitude and longitude. The four corners of the map area comprise the minimum required set of control points, but use of interior control points is strongly recommended. A standard set of shapes is assumed for the control marks as seen in the screen capture above. Note that interior, corner and edge marks each have a distinct shape. These shapes are built into QUAD-G in the orientations shown. Thus an interior mark is always a "+" shape, and a mark in the middle of a top edge is always a "T" shape. No assumptions are made about the number or arrangement of control marks other than that they comprise a regular grid and that all are black in color. The distribution version of QUAD-G assumes the quadrangle is bounded by a black neatline, and that the area outside the neatline is white. The program can be easily modified if these assumptions are not met.
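Because each mark's shape depends only on where it sits in the grid, the expected shape can be looked up from the mark's row and column. The function below is a hypothetical sketch, not QUAD-G's code: it numbers rows from the top (north) edge and columns from the left (west) edge, and since the text names only the interior "+" and top-edge "T" shapes, the other kinds are returned as labels rather than shapes.

```python
from collections import Counter


def mark_kind(row: int, col: int, n_rows: int, n_cols: int) -> str:
    """Classify a control mark by its position in an n_rows x n_cols grid.

    Row 0 is the top (north) edge and column 0 the left (west) edge.
    Returns e.g. "top-left corner", "top edge" or "interior".
    """
    vert = "top" if row == 0 else "bottom" if row == n_rows - 1 else ""
    horiz = "left" if col == 0 else "right" if col == n_cols - 1 else ""
    if vert and horiz:
        return f"{vert}-{horiz} corner"
    if vert or horiz:
        return f"{vert or horiz} edge"
    return "interior"


# Shapes stated in the text; templates for the other kinds would be added here.
KNOWN_SHAPES = {"interior": "+", "top edge": "T"}

if __name__ == "__main__":
    # A 1:250,000 sheet: 5 rows x 9 columns = 45 marks.
    kinds = Counter(mark_kind(r, c, 5, 9).split()[-1]
                    for r in range(5) for c in range(9))
    print(kinds)  # counts: corner 4, edge 20, interior 21
```

The same lookup works for any regular grid, which matches the assumption that nothing is known about the mark layout beyond its being a regular grid.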
As is true for standard georeferencing tools, we assume a low-degree polynomial can adequately represent the transformation from image to geographic coordinates. In practice, this amounts to assuming the map scale is so large that the map projection is essentially undetectable (e.g., great circles appear straight on the map). Small-scale maps whose geodesics depart drastically from straight lines are outside the design parameters of this project. Thus, while we have found good results for maps at scales of 1:250,000 and larger, QUAD-G is not recommended for global or even continental maps. Error statistics computed by the program will alert the user to situations where the polynomial model is inadequate.
Finally, we assume the GDAL package of open-source raster tools is installed. This package can be obtained from www.gdal.org.
QUAD-G gathers data about each scan from an input file in XML format or from a user dialog (manual mode). The required data are:
- Scan file name. The program has native support for tif and ppm formats. Other image formats are converted automatically using a GDAL helper routine.
- Latitude and longitude of the lower-right quad corner. These can be provided in decimal degrees or in degrees, minutes and seconds.
- Number of rows and columns of control marks. For example, a 1:250,000 USGS map has 5 rows and 9 columns of control marks for a total of 45 marks. A 7.5 minute map has 16 marks in a 4x4 grid.
- Control mark spacing in degrees of latitude or longitude. For example, the control marks on a 1:250,000 sheet are 15 minutes apart, whereas a 7.5-minute sheet has marks every 2.5 minutes.
- Scan resolution in dots per inch.
- Control mark size. This can be given in inches, in pixels, or as a fraction of the scan.
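Together, the lower-right corner, the grid dimensions and the mark spacing determine the latitude and longitude of every control mark. A minimal sketch, assuming every mark (including the four corners) lies on the regular grid, a single spacing applies in both directions, and longitudes are signed negative west of Greenwich; `dms_to_decimal` and `control_mark_grid` are illustrative names, not QUAD-G functions.

```python
def dms_to_decimal(degrees: float, minutes: float = 0.0, seconds: float = 0.0,
                   negative: bool = False) -> float:
    """Convert degrees/minutes/seconds to decimal degrees.

    Pass negative=True for south latitudes or west longitudes (this also
    covers values such as 0 deg 30' W, where the degrees part is zero).
    """
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if (negative or degrees < 0) else value


def control_mark_grid(lat_se: float, lon_se: float, n_rows: int, n_cols: int,
                      spacing_deg: float) -> dict:
    """Map (row, col) -> (lat, lon) for every control mark.

    Row 0 is the north edge and column 0 the west edge, so the lower-right
    (south-east) corner is mark (n_rows - 1, n_cols - 1).
    """
    grid = {}
    for r in range(n_rows):
        for c in range(n_cols):
            lat = lat_se + (n_rows - 1 - r) * spacing_deg
            lon = lon_se - (n_cols - 1 - c) * spacing_deg
            grid[(r, c)] = (lat, lon)
    return grid


if __name__ == "__main__":
    # Illustrative 7.5-minute quad with its lower-right corner at 43 N, 89 22' 30" W.
    lon_se = dms_to_decimal(89, 22, 30, negative=True)
    marks = control_mark_grid(43.0, lon_se, 4, 4, 2.5 / 60.0)
    print(len(marks), marks[(0, 0)])  # 16 marks; (0, 0) is the north-west corner
```

As a sanity check, a 5 x 9 grid with 15-minute (0.25 degree) spacing spans 1 degree of latitude by 2 degrees of longitude, the extent of a 1:250,000 sheet.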
In addition, there are several optional fields that can be supplied:
- Map scale. This improves the search for control marks.
- Datum. NAD27 is assumed if this is not provided.
- Output file name. A modified version of the input scan name is used if this is not provided.
- Flag to indicate all control marks lie on a quadrangle boundary, with no interior marks.
Typically the program is used as follows. The user opens an input file containing information for all scans in a batch. QUAD-G performs the following steps for every scan in the batch:
- Corner Identification. QUAD-G first tries to find the four corners of the quadrangle. The image file is searched for lines that might be part of the quad boundary. Each possible edge is intersected with other possibilities to define a candidate quadrangle. All candidate quadrangles are evaluated against the known aspect ratio for the map quadrangle, and the closest match is selected. (If the map scale has been provided, the pixel extent of candidate quadrangles is used instead of relying merely on the aspect ratio.)
The intersections of the edges of the best-fit quadrangle are used as the locations of the quad corners.
- Find Search Windows. The four corners are used with the grid layout to establish a search window within the image for each control mark.
- Search for Marks. A pattern search is used within each window to find the corresponding control mark.
- Fit Image Transformation. A least-squares polynomial is fit to the set of image coordinates and latitude/longitude pairs. (If there are at least 7 control marks, a second-degree polynomial is used; otherwise a linear transformation is used.) The least-squares model is used to predict the location of each control point, and residuals between predicted and observed locations are calculated for error diagnosis.
Additionally, a suite of least-squares fits is made for cross-validation, with each member of the suite excluding one control mark. The resulting cross-validation errors are particularly useful for identifying control marks that were not successfully found in the pattern search.
- Georeference Image. If the transformation error is below a user-defined threshold, the polynomial is used to produce a georeferenced output file consisting of pixels on a uniform grid in latitude and longitude. For every output location, the polynomial is used to find the surrounding 4 image pixels, and the colors of those pixels are interpolated bilinearly to the output location. Transformation error statistics are written to a log file. If the transformation does not pass the threshold, that scan's input data are written to an input error file. This file can be loaded later so that a user can manually address the scans that failed during the automated batch process. There is also an option for excluding areas outside the quadrangle.
- Quality Analysis. The output file is validated by looking for control marks at their expected locations. Deviations between expected and actual locations are noted, and for each mark an image is prepared showing the mark as found relative to its correct location.
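The corner-identification step picks, among candidate quadrangles, the one whose shape best matches the map. The following is a minimal sketch of the aspect-ratio test only. It assumes a spherical Earth, so that a quadrangle's ground width scales with the cosine of its mid-latitude. It omits candidate generation (line detection and intersection) and the map-scale variant. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]          # (x, y) in image pixels
Quad = Sequence[Point]               # corners in order: TL, TR, BR, BL


def expected_aspect(lat_s: float, lat_n: float, lon_w: float, lon_e: float) -> float:
    """Ground width / height of a latitude-longitude quadrangle.

    ASSUMPTION: spherical Earth; east-west extent shrinks with cos(mid-latitude).
    """
    mid_lat = math.radians((lat_s + lat_n) / 2.0)
    return (lon_e - lon_w) * math.cos(mid_lat) / (lat_n - lat_s)


def quad_width_height(quad: Quad) -> Tuple[float, float]:
    """Average edge lengths of a (possibly slightly skewed) quadrilateral."""
    tl, tr, br, bl = quad
    width = (math.dist(tl, tr) + math.dist(bl, br)) / 2.0
    height = (math.dist(tl, bl) + math.dist(tr, br)) / 2.0
    return width, height


def best_candidate(candidates: List[Quad], target_aspect: float) -> Quad:
    """Pick the candidate whose width/height ratio is closest to the target.

    The absolute log ratio treats "too wide" and "too tall" symmetrically.
    """
    def mismatch(quad: Quad) -> float:
        w, h = quad_width_height(quad)
        return abs(math.log((w / h) / target_aspect))
    return min(candidates, key=mismatch)


if __name__ == "__main__":
    target = expected_aspect(43.0, 43.125, -89.5, -89.375)  # 7.5-minute quad, ~0.73
    candidates = [
        [(0, 0), (4000, 0), (4000, 4000), (0, 4000)],          # square
        [(150, 220), (3150, 210), (3160, 4320), (160, 4330)],  # plausible neatline
        [(0, 0), (3000, 0), (3000, 2000), (0, 2000)],          # too wide
    ]
    print(best_candidate(candidates, target))
```

When the map scale is known, the same selection could instead compare each candidate's pixel width and height with the extent implied by the scale and scan resolution.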
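The Find Search Windows and Search for Marks steps can be sketched as follows. Each mark's pixel position is estimated by bilinear interpolation between the four detected corners, a square window is opened around it, and the mark is sought inside that window. The text does not describe QUAD-G's actual pattern-search method. This sketch substitutes plain normalized cross-correlation against a synthetic "+" template. The window half-size, template size and function names are hypothetical.

```python
import numpy as np


def search_windows(corners, n_rows, n_cols, half_size):
    """Square search windows centred on the expected pixel position of each mark.

    corners: (x, y) pixel positions of the TL, TR, BR, BL quad corners.
    Returns {(row, col): (x_min, y_min, x_max, y_max)} (floats, unclipped).
    Requires n_rows >= 2 and n_cols >= 2.
    """
    tl, tr, br, bl = (np.asarray(p, float) for p in corners)
    windows = {}
    for r in range(n_rows):
        v = r / (n_rows - 1)              # 0 at the top edge, 1 at the bottom
        for c in range(n_cols):
            u = c / (n_cols - 1)          # 0 at the left edge, 1 at the right
            x, y = ((1 - u) * (1 - v) * tl + u * (1 - v) * tr
                    + u * v * br + (1 - u) * v * bl)
            windows[(r, c)] = (x - half_size, y - half_size,
                               x + half_size, y + half_size)
    return windows


def cross_template(size, thickness=1):
    """Synthetic '+' mark: 0 = black ink, 1 = white paper."""
    t = np.ones((size, size))
    start = size // 2 - thickness // 2
    t[start:start + thickness, :] = 0.0
    t[:, start:start + thickness] = 0.0
    return t


def ncc_search(window, template):
    """Exhaustive normalized cross-correlation; returns ((row, col), score).

    (row, col) is the top-left corner of the best-matching patch.
    """
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(window.shape[0] - th + 1):
        for c in range(window.shape[1] - tw + 1):
            p = window[r:r + th, c:c + tw]
            p = p - p.mean()
            denom = np.sqrt((p * p).sum()) * t_norm
            if denom == 0:
                continue                   # flat patch: correlation undefined
            score = float((p * t).sum() / denom)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


if __name__ == "__main__":
    wins = search_windows([(100, 200), (3100, 200), (3100, 4200), (100, 4200)],
                          4, 4, half_size=50)
    print(wins[(0, 0)])                    # window around the top-left corner mark
    # Toy window: white background with one '+' whose patch starts at row 12, col 20.
    win = np.ones((40, 40))
    win[12:23, 20:31] = cross_template(11)
    print(ncc_search(win, cross_template(11)))
```

Bilinear interpolation of the corners tolerates a slightly rotated or skewed scan. The exhaustive loop is chosen for clarity, not speed, and a production search would use an FFT-based or library correlation routine.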
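The Fit Image Transformation step and its cross-validation can be sketched with NumPy's least-squares solver. This illustrative version is not QUAD-G's C# code. It fits image (x, y) as a polynomial in (longitude, latitude), the direction needed when each output location is mapped back to image pixels, and it applies the degree rule quoted in that step (second degree with at least 7 marks, otherwise linear). Centring and scaling the coordinates for numerical stability is our own choice. Leave-one-out errors are computed by refitting without each mark in turn, and `PolyTransform`, `fit_residuals` and `loo_errors` are hypothetical names.

```python
import numpy as np


class PolyTransform:
    """Least-squares polynomial mapping (lon, lat) -> image (x, y)."""

    def __init__(self, lon, lat, x, y, degree=None):
        lon, lat = np.asarray(lon, float), np.asarray(lat, float)
        # Degree rule from the text: 2nd degree with >= 7 marks, otherwise linear.
        self.degree = degree if degree is not None else (2 if lon.size >= 7 else 1)
        # Centre and scale inputs so the powers of lon/lat are well conditioned.
        self.lon0, self.lat0 = lon.mean(), lat.mean()
        self.scale = max(np.ptp(lon), np.ptp(lat))
        a = self._design(lon, lat)
        b = np.column_stack([x, y]).astype(float)
        self.coef, *_ = np.linalg.lstsq(a, b, rcond=None)

    def _design(self, lon, lat):
        u = (np.asarray(lon, float) - self.lon0) / self.scale
        v = (np.asarray(lat, float) - self.lat0) / self.scale
        cols = [np.ones_like(u), u, v]
        if self.degree == 2:
            cols += [u * u, u * v, v * v]
        return np.column_stack(cols)

    def predict(self, lon, lat):
        """Predicted image positions as an (n, 2) array of (x, y)."""
        return self._design(lon, lat) @ self.coef


def fit_residuals(lon, lat, x, y):
    """Distance in pixels between fitted and observed mark positions."""
    pred = PolyTransform(lon, lat, x, y).predict(lon, lat)
    return np.hypot(pred[:, 0] - np.asarray(x, float),
                    pred[:, 1] - np.asarray(y, float))


def loo_errors(lon, lat, x, y):
    """Leave-one-out error for each mark, using the full set's polynomial degree."""
    lon, lat, x, y = (np.asarray(a, float) for a in (lon, lat, x, y))
    n = lon.size
    degree = 2 if n >= 7 else 1
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        t = PolyTransform(lon[keep], lat[keep], x[keep], y[keep], degree=degree)
        px, py = t.predict(lon[i:i + 1], lat[i:i + 1])[0]
        errors[i] = np.hypot(px - x[i], py - y[i])
    return errors


if __name__ == "__main__":
    step = 2.5 / 60.0
    lon = np.array([-89.5 + c * step for r in range(4) for c in range(4)])
    lat = np.array([43.125 - r * step for r in range(4) for c in range(4)])
    x = 100 + (lon + 89.5) * 24000.0
    y = 200 + (43.125 - lat) * 32000.0
    x[5] += 40.0
    y[5] -= 30.0                           # simulate a mis-located mark
    print(np.round(loo_errors(lon, lat, x, y), 1))
```

With the simulated error, the leave-one-out error for mark 5 equals its full 50-pixel displacement, because the remaining 15 marks fit exactly. Its ordinary residual is smaller, since the outlier pulls the fit toward itself. A batch run could then accept a scan only when, for example, the largest error falls below the user-defined threshold.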
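In the Georeference Image step, each output pixel on a uniform latitude-longitude grid is mapped back into the scan, and its color is interpolated from the four surrounding pixels. This is bilinear interpolation. The sketch below assumes a hypothetical `to_image_xy(lon, lat)` callable that returns pixel coordinates (for example, one built from a fitted polynomial), and it clamps positions outside the scan to its border. It also returns a geotransform in GDAL's six-element convention for the output grid. Writing the GeoTIFF itself, for example with GDAL, is not shown.

```python
import numpy as np


def bilinear_sample(img, x, y):
    """Bilinearly interpolate img at fractional positions (x = column, y = row).

    Works for (H, W) grayscale or (H, W, C) color arrays with H, W >= 2.
    Positions outside the image are clamped to the border.
    """
    img = np.asarray(img, float)
    h, w = img.shape[:2]
    x = np.clip(np.asarray(x, float), 0, w - 1)
    y = np.clip(np.asarray(y, float), 0, h - 1)
    x0 = np.minimum(np.floor(x).astype(int), w - 2)   # left neighbour column
    y0 = np.minimum(np.floor(y).astype(int), h - 2)   # upper neighbour row
    fx, fy = x - x0, y - y0
    if img.ndim == 3:                                  # broadcast over color bands
        fx, fy = fx[..., None], fy[..., None]
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bottom = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def resample(img, to_image_xy, lon_w, lat_n, step_deg, n_cols, n_rows):
    """Build an output raster of square step_deg x step_deg pixels in lon/lat.

    Returns (raster, geotransform); the geotransform follows GDAL's
    (origin_x, pixel_width, 0, origin_y, 0, -pixel_height) convention.
    """
    # Pixel centres sit half a step in from the north-west origin.
    lon = lon_w + (np.arange(n_cols) + 0.5) * step_deg
    lat = lat_n - (np.arange(n_rows) + 0.5) * step_deg
    lon_g, lat_g = np.meshgrid(lon, lat)
    x, y = to_image_xy(lon_g.ravel(), lat_g.ravel())
    out = bilinear_sample(img, x, y)
    raster = out.reshape((n_rows, n_cols) + np.shape(img)[2:])
    geotransform = (lon_w, step_deg, 0.0, lat_n, 0.0, -step_deg)
    return raster, geotransform
```

For example, on a 3 x 4 test image whose values equal 4 * row + column, sampling at x = 1.5, y = 0.5 averages the four neighbouring pixels and returns 3.5.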
In addition to the batch procedure described above, QUAD-G has a manual mode suitable for processing individual scans under operator supervision. Map data can be read from an XML file or supplied by the user via a dialog box. The program provides facilities for manually locating quad corners and/or individual control marks, thereby overriding the automated pattern search.
We have been using the program on files ranging from 500 MB to 1.3 GB, in batches of several hundred files. We find that about 2 minutes are needed per quad per CPU core, with most of the time spent applying the transformation (step 5 above). The first four steps typically take 15-20 seconds.
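The timing figures translate directly into batch throughput. The sketch assumes quads are processed independently, one per core at a time, at the quoted 2 minutes each; the 300-quad, 8-core batch in the example and the name `batch_minutes` are hypothetical. At that rate the roughly 300,000 historical scans would need about 600,000 core-minutes (10,000 core-hours) of unattended processing, compared with about 20 minutes of operator time per scan for the manual procedure.

```python
import math


def batch_minutes(n_quads: int, n_cores: int, minutes_per_quad: float = 2.0) -> float:
    """Approximate wall-clock minutes for a batch run.

    ASSUMPTION: each core handles one quad at a time and all quads take the
    same time, so the batch finishes after ceil(n_quads / n_cores) rounds.
    """
    rounds = math.ceil(n_quads / n_cores)
    return rounds * minutes_per_quad


if __name__ == "__main__":
    print(batch_minutes(300, 8))  # 38 rounds of 2 minutes -> 76.0
```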
In manual mode the time required depends on how many marks are manually placed by the user. Our experience is that even manual mode provides substantial improvements over existing software packages, mainly because it avoids much of the manual panning and text entry.
These links provide the executable, a user manual, sample input data and images, and the corresponding output files. We recommend verifying the installation using the sample files.
| File | Size | Description |
|---|---|---|
| Quad-GVer2.10.exe | 150KB | Windows executable file. No installation necessary; it can be run from any directory. |
| Quad-GSampleFiles.zip | 185MB | Sample input files: 2 image scans and corresponding batch XML and preference files. The images are low-resolution (254 DPI) versions of HQSP scans. |
| Quad-GSampleVerify.zip | 244MB | Sample output files (GeoTIFF and log files). |
| Quad-Gver2.10Source.zip | 869KB | Visual C# source program. |
At a minimum, you need the executable and the manual. To test the program, download the sample zip file and extract it. Then:
- Start Quad-G
- Open the batch xml file (File->Open->SampleBatch.xml)
- Open the preferences file (Preferences->Open->SamplePreferences.xml)
- Click on "Automatic"
- Click on "Run Batch"
This will generate the output files found in the verify zip file.