CSIRO Mathematics, Informatics and Statistics
Predicting Salinity in the Upper Kent River Catchment
A report from the LWRRDC project
"Integrating Remotely Sensed Data With Other Spatial
Data Sets to Predict Areas at Risk from Salinity"
F. H. Evans¹, P. A. Caccetta², R. Ferdowsian³, H. T. Kiiveri¹ and N. A. Campbell¹
¹CSIRO Division of Mathematics & Statistics
²Curtin University School of Computing Science
³Department of Agriculture Western Australia
This interim report summarises the work and findings to date of a project funded by the Land and Water Resources Research and Development Corporation on "Integrating Remotely Sensed Data With Other Spatial Data Sets to Predict Areas at Risk from Salinity."
The aim of the study is to evaluate methods for predicting areas at risk from salinity. This will be done by:
The outputs from the project are:
The results show that historical and present salinity maps, and maps showing areas at risk of future salinity, may be produced using remotely sensed data integrated with several computer-derived terrain attributes. These terrain attributes can be easily derived from digital elevation data.
Rule-based classifiers and probabilistic networks are used to produce salinity and salinity risk maps. The probabilistic networks can also be used as an interactive decision support tool to assess the effects of what-if scenarios.
Secondary salinity caused by rising saline groundwater due to the clearing of land for agriculture has caused serious loss of productive land in Western Australia. In 1989, the Australian Bureau of Statistics (ABS) conducted a survey of farmers which reported that 443 441 ha or 2.83 percent of the 15.7 million hectares of cleared land in south-western Australia was saline (George 1990). The 12 shires of the Lower Great Southern Division reported a total of 63 194 ha of salt-affected land, which represents 2.6% of arable area. The Kent reported 10 614 ha of salt-affected land, with an average area per farm of 145 ha.
Farmer-based surveys such as these are likely to underestimate the problem. For instance, Ferdowsian and Greenham (1992) estimated that more than 12% of the Upper Denmark Catchment was salt-affected, while farmers in that area estimated that no salinity was present. These surveys give an estimate of the amount of salt-affected land in a shire, but do not provide maps showing where the salinity is, where it is spreading, or the rate of spread of salinity.
A concurrent study has developed procedures for discriminating between land in persistently poor condition due to salinity and land exhibiting short-term poor productivity caused by factors other than salinity. Combining land condition data from two or more seasons allows more accurate mapping of marginal sites. Combining Landsat TM data with terrain information further improved mapping accuracies.
In this study, satellite imagery has been integrated with other spatial data sets to produce maps of salinity, and to establish a method for predicting areas at risk of salinity.
2 The Study Area
The study was conducted in the Upper Kent River Catchment, which lies in a high rainfall (500-750mm) area west of Mt Barker, approximately 350km south east of Perth. This is one of the five focal catchments in the National Dryland Salinity R, D and E Programme.
Figure 1: The Upper Kent River Catchment
The physiography of the Upper Denmark and the Upper Kent River Catchments was described by Ferdowsian and Greenham (1992). That report described the formation of an east-west axis (the Perrilup axis) that forms the southern boundary of the catchment. This uplifting stopped the southward-flowing ancient rivers, and forced them to flow westwards. The Kent River later cut through this axis and captured the Upper Kent catchment. The uplifting, and later river capture, resulted in the Upper Kent River Catchment having three distinct hydrological zones:
Extensive pastoral activities in the Upper Kent River Catchment began in the second half of the last century. Allocation of land for farming and the first clearing of small areas occurred along the Albany Highway and later along Muir Highway early this century. Prescott and Bull described the vegetation association of the area in 1931. This work was the basis of the soil surveys that followed during the 1940s and the major allocation of land after the Second World War. Clearing of land for agriculture was very slow and by 1948 only isolated pockets of land had been cleared. In March 1948, the Minister for Lands introduced an act in the Legislative Assembly authorising the government to clear land for agriculture. During the 1950s, farmers did not have the financial resources to clear large areas of land; the areas with low forest and scrubland were cleared. Mass clearing of land occurred in the 1960s. By 1973, approximately 62% of the catchment was cleared. This was almost all of the alienated land that was suitable for pasture. Isolated areas (approximately 4 000 ha) were cleared between 1973 and 1978.
Wood (1924) explained the increase of salt in soils and streams following the destruction of native vegetation. From 1930 to 1950, concerns about soil and stream salinity became dormant and increasing salinity of land and water resources was ignored. In 1956 and 1957, the allocation of land in water catchment areas was stopped. An amendment to the Country Area Water Supply Act was passed by the State Parliament to control clearing in the Wellington Dam Catchment Area and became effective in November 1976. A further amendment to this act was made in the Kent and Denmark catchments. By 1978, 64% of this area was cleared. There has been no significant clearing in the Upper Kent River Catchment since 1978. Between 1978 and 1988, a few farmers planted trees on limited areas to combat increasing soil salinity. As a result, by 1988 the cleared area had reduced to 61%. Since that year, some areas have been planted with commercial tree crops, reducing the total cleared area to 58% by 1994.
The Upper Kent River Catchment is atypical of the Western Australian wheatbelt. Unlike lower rainfall areas of the wheatbelt, remnant vegetation and degraded pastures survive in slightly and moderately salt-affected areas, making it harder to detect salinity using remotely sensed data alone (see section 5.1).
Another difference is that agricultural land use in the Upper Kent River Catchment is predominantly grazing. Previous studies by the CSIRO W.A. Remote Sensing Group have found that salt-affected pasture is less easily mapped using satellite data than are salt-affected cereal crops. Thus the Kent offers challenges for both the mapping and predicting of salinity.
3 Extraction of Current Knowledge
A workshop was held in Bunbury, W.A. on 20-21 June 1994 to provide a forum to quantify current knowledge in explicit rule-based form. The following factors were identified as indicators of salinity risk:
4 Primary Data Sets
Landsat TM data and historical Landsat MSS data have been assembled. Landsat data are acquired every 16 days and an archive of TM data from 1988 is held at ACRES. The earlier MSS image was used to provide historical data on salinity. Spring images were selected, as experience suggests that they contain the most information about crop and pasture growth. The dates of imagery used in this project are listed below:
Figure 2: August 1994 Landsat TM Bands 4, 5, 7 in red, green, blue
Digital height data were obtained from the Department of Land Administration (DOLA), in the form of 5m contours.
Both current and historical air photos are held by the WA Department of Agriculture in Albany. Stereoscopic pairs are available for the years 1973/1974, 1988 and 1994.
5 Derived Data Sets
The primary data sets were used to generate data sets relating to the factors governing salinity which were identified in Section 3.
5.1 Classification of Landsat TM data
The Landsat TM data were processed to produce maps showing broad cover classes (water, remnant vegetation, bare salt, bare soil, waterside remnant vegetation, crop, pasture) for the different years. The processing steps were:
Remnant vegetation maps were derived from the classification maps. These include vegetation along stream-lines and roadside verges, as well as isolated pockets of trees within paddocks.
Figure 3: 1988 Landsat TM classification into broad cover classes - blue = water, green = remnant vegetation, yellow = bare soil, red = bare salt, cyan = waterside remnant vegetation and grey = agricultural land
The broad cover classes represent the cover classes which are spectrally separable. Mis-classifications can be seen along roadside verges and other areas where we would expect to find pixels comprising more than one ground cover type. It is important to note that salinity is present in classes other than the bare salt class. For instance, 88% of the waterside remnant vegetation class for 1993 is salt-affected over the training data; however, the salt-affected areas within this class are spectrally similar to the remainder of the class.
360 sites were classified according to their degree of soil salinity using Geonics EM38 measurements and field observation. Spectral analyses of these sites showed that the salt-affected sites could not be spectrally separated into slightly affected, moderately affected and strongly affected classes. Furthermore, salt-affected sites could not be separated from non-affected sites using the spectral data alone.
5.2 The Digital Elevation Model
The 5m contour data were gridded using spline interpolation to produce a Digital Elevation Model (DEM) for the catchment. Cross-validation techniques were employed to choose the optimal parameters for the gridding procedure, and hence the most realistic DEM.
Figure 4: The Digital Elevation Model - yellow shows the lowest elevations through to red showing the highest
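The idea of choosing gridding parameters by cross-validation can be sketched as follows. This is a minimal illustration only: inverse-distance weighting is substituted for the spline interpolation used in the study, purely to keep the example short, and the function names, the candidate parameter values and the sample points are all hypothetical.

```python
import math

def idw(points, x, y, power):
    """Inverse-distance-weighted height at (x, y) from (x, y, z) samples.
    Stand-in interpolator (assumption): the study used splines."""
    num = den = 0.0
    for px, py, pz in points:
        d = math.hypot(x - px, y - py)
        if d == 0:
            return pz  # exactly on a sample point
        w = d ** -power
        num += w * pz
        den += w
    return num / den

def loo_rmse(points, power):
    """Leave-one-out RMSE: predict each sample from all the others."""
    sq = 0.0
    for i, (x, y, z) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        sq += (idw(rest, x, y, power) - z) ** 2
    return math.sqrt(sq / len(points))

def choose_power(points, candidates):
    """Pick the candidate parameter with the lowest cross-validated error."""
    return min(candidates, key=lambda p: loo_rmse(points, p))

# Hypothetical height samples (x, y, z) taken along contour lines.
samples = [(0, 0, 10.0), (10, 0, 12.0), (0, 10, 14.0), (10, 10, 16.0), (5, 5, 13.0)]
best = choose_power(samples, [1, 2, 4])
```

The same loop applies to any interpolator with tunable parameters: only `idw` would need to be replaced by the gridding routine actually in use.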
5.3 DEM-Derived Variables
5.3.1 Slope, Aspect, Curvatures
Slope, aspect, profile curvature, tangential curvature and mean curvature were derived from the DEM, on a per-pixel basis. Curvature attributes for large flat regions were also derived.
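As an illustration of per-pixel terrain attributes, the sketch below computes slope and aspect from a gridded DEM using central differences. It is one common formulation and is not necessarily the one used in this study; the function name, the assumption that row 0 is the northern edge of the grid, and the treatment of edge pixels are all choices made for the example.

```python
import math

def slope_aspect(dem, cell_size):
    """Return (slope, aspect) grids in degrees for a DEM given as a list of rows.
    Edge pixels and flat pixels (for aspect) are left as None."""
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    aspect = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences; x increases eastwards, y increases northwards
            # (assumes row 0 is the northern edge of the grid).
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
            if dz_dx != 0 or dz_dy != 0:
                # Aspect is the compass bearing of the downslope direction.
                aspect[r][c] = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect
```

For a surface that rises steadily towards the east, the centre pixel faces west, so its aspect is a bearing of 270 degrees.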
5.3.2 Water Accumulation / Upslope Contributing Area
Water accumulation algorithms simulate a rainfall event and measure the subsequent flow of water across the landscape. The algorithms make use of the DEM and work on the simple principle that water flows downhill. In the resulting map, each pixel is assigned a value which represents the amount of water which flowed through it. Since this value includes all of the water passing through areas upslope of a pixel, water accumulation maps also provide a measure of upslope area.
The simplest model assumes that all of the water flows in the steepest downhill direction. Multiple-direction water accumulation models distribute the flow amongst all downhill locations. The relative drop in elevation is used to determine the amount of water that each location receives. These models are more realistic than the single flow path model, especially in areas with low relief and when the DEM has been derived from relatively coarse contour data. A multiple-direction water accumulation map has been produced for the Upper Kent River Catchment.
Figure 5: The water accumulation map - yellow = low accumulation, red = high accumulation
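The difference between the single-direction and multiple-direction models can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithm used to produce Figure 5: every pixel receives one unit of rainfall, flow is shared among the eight neighbours in proportion to the drop in elevation, and the function and parameter names are hypothetical.

```python
def flow_accumulation(dem, multiple=True):
    """Accumulate one unit of water per pixel down a DEM (list of rows).
    multiple=True shares flow among all lower neighbours in proportion to
    the elevation drop; multiple=False sends it all to the largest drop."""
    rows, cols = len(dem), len(dem[0])
    acc = [[1.0] * cols for _ in range(rows)]
    # Visit pixels from highest to lowest so that each pixel's total is
    # final before it is passed on downhill.
    order = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)
    for z, r, c in order:
        lower = []
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols and dem[nr][nc] < z:
                    lower.append((z - dem[nr][nc], nr, nc))
        if not lower:
            continue  # pit or outlet: water stops here
        if not multiple:
            lower = [max(lower)]  # single flow path: largest drop only
        total_drop = sum(d for d, _, _ in lower)
        for d, nr, nc in lower:
            acc[nr][nc] += acc[r][c] * d / total_drop
    return acc
```

On a small grid such as `[[2, 1], [1, 0]]`, both models deliver all of the water to the lowest pixel, but the multiple-direction model routes part of it through the two intermediate pixels, which is the behaviour that matters in areas of low relief.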
5.3.3 Flow Slope
The slope map derived from the DEM using conventional methods consists of a map showing the steepest slope in any direction for each pixel, irrespective of whether the steepest slope pixel is above or below the current location. Since salinity at any pixel is influenced by its drainage, a map showing the aggregated local downhill slope was generated. This slope has been termed flow slope as it represents slope in the direction of flow.
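The report does not state how the local downhill slopes were aggregated. The sketch below assumes one plausible choice, an average of the slopes to all lower neighbours weighted by the elevation drop; the function name and this weighting are assumptions made for illustration.

```python
import math

def flow_slope(dem, cell_size):
    """Drop-weighted mean slope (rise over run) towards lower neighbours.
    Pixels with no lower neighbour get 0.0. The weighting is an assumption."""
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            drops, slopes = [], []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        drop = dem[r][c] - dem[nr][nc]
                        if drop > 0:
                            # Diagonal neighbours are further away.
                            dist = cell_size * math.hypot(dr, dc)
                            drops.append(drop)
                            slopes.append(drop / dist)
            if drops:
                out[r][c] = sum(d * s for d, s in zip(drops, slopes)) / sum(drops)
    return out
```

Unlike a conventional slope map, a pixel at the foot of a steep hill with no lower neighbour receives a flow slope of zero, reflecting its poor drainage.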
5.3.4 Drainage Density
Drainage density refers to the relative number of defined stream lines within a region and may be used as an indicator of historical flushing and hence regional groundwater salinity. Maps of drainage density were produced using drainage network data supplied by DAWA.
5.3.5 Upslope Clearing Maps
The Landsat TM classifications provide remnant vegetation maps for the different years. These have been incorporated into the water accumulation algorithms to produce maps of total upslope cleared area and percentage upslope cleared area for the years 1977, 1988 and 1994. In this way, the clearing of remnant vegetation and the effect of tree plantations can be considered.
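A minimal sketch of how a clearing map can be carried through an accumulation algorithm is given below. For brevity it assumes a single-direction model in which each pixel drains to its lowest neighbour, and it counts each pixel as part of its own upslope area; the study itself used its water accumulation algorithms, and the function and variable names here are hypothetical.

```python
def upslope_cleared(dem, cleared):
    """Return (cleared_area, percent_cleared) grids in pixel units.
    dem and cleared are lists of rows; cleared holds 1 for cleared pixels
    and 0 for remnant vegetation."""
    rows, cols = len(dem), len(dem[0])
    total = [[1.0] * cols for _ in range(rows)]
    clr = [[float(cleared[r][c]) for c in range(cols)] for r in range(rows)]
    order = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)
    for z, r, c in order:
        best = None
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols and dem[nr][nc] < z:
                    if best is None or dem[nr][nc] < dem[best[0]][best[1]]:
                        best = (nr, nc)
        if best is not None:
            # Pass both the total and the cleared upslope area downhill.
            total[best[0]][best[1]] += total[r][c]
            clr[best[0]][best[1]] += clr[r][c]
    pct = [[100.0 * clr[r][c] / total[r][c] for c in range(cols)]
           for r in range(rows)]
    return clr, pct
```

Running the function with the remnant vegetation map for each year gives a separate pair of maps per year, so replanting upslope of a pixel lowers its percentage cleared.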
5.3.6 Convergent Flow Points
Points of convergent flow are areas in which two or more significant flow paths meet. A map showing points of convergent flow was generated using the flow paths identified in the water accumulation map.
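One way to locate such points is sketched below, assuming that a "significant" flow path is any pixel whose accumulation exceeds a chosen threshold and that each such pixel drains to its lowest neighbour. The threshold, the function name and the single-direction drainage rule are assumptions made for the example.

```python
def convergent_points(dem, acc, threshold):
    """Return sorted (row, col) pixels that receive flow from two or more
    pixels whose accumulation is at least `threshold`."""
    rows, cols = len(dem), len(dem[0])
    inflow = {}
    for r in range(rows):
        for c in range(cols):
            if acc[r][c] < threshold:
                continue  # not on a significant flow path
            best = None
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols \
                            and dem[nr][nc] < dem[r][c]:
                        if best is None or dem[nr][nc] < dem[best[0]][best[1]]:
                            best = (nr, nc)
            if best is not None:
                inflow[best] = inflow.get(best, 0) + 1
    return sorted(cell for cell, n in inflow.items() if n >= 2)
```

With a Y-shaped valley, where two tributaries join a main channel, only the junction pixel is returned.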
5.4 Shear Zone Map
The Upper Kent River Catchment is characterised by three types of shears: east-west longitudinal shears with high salt storages but low hydraulic conductivity; north-south shears, which occur relatively infrequently; and a conjugate set of oblique shears, which are very conductive and facilitate the movement of groundwater (Ferdowsian and Greenham, 1992). Shear zones for the Upper Kent River Catchment have been identified by interpretation of aerial photographs.
5.5 Landform Pattern Mapping
"Landforms are the product of the underlying geology, weathering history and erosive processes. A landform pattern (LFP) is a toposequence described by its relief, modal slope and component landform elements (LFEs). Landform patterns are differentiated by their attributes that are assessed within a circle of about 300m radius" (McDonald et al 1984).
Landform patterns have been mapped for the Upper Kent River Catchment by the Department of Agriculture, WA.
5.6 Training and Test Data - Areas of Known Salinity Status
Several sub-areas of the catchment were selected to represent the various landform elements and landform patterns that are representative of the Kent Catchment. These areas are shown in Figure 6.
Figure 6: Study area locations.
Training data were obtained in two forms over these areas:
An independent set of validation data was produced to determine the accuracies of the predictions. The data were independent in the sense that the expert had not seen the predictions and the modellers had not seen the validation data. The data were obtained in two forms:
6 Methodology for Mapping and Predicting Salinity
Several methods for integrating remotely sensed data with the primary and secondary data sets have been examined. These include rule-based classifiers and knowledge-based systems. The results of the different methods are presented separately.
6.1 Rule-based Classifiers
Rule-based classifiers induce a classification model from the training data. Rules are derived from the training data, and then applied to the remainder of the data. Decision tree classifiers and neural networks are two examples of rule-based classifiers.
The success of any rule-based classifier depends upon the data which are supplied to the classifier. In the first instance, it is necessary that the training data contain sufficient information for generalisation. Secondly, all of the information about the classes (in this case salt-affected, not salt-affected and potentially salt-affected) must be able to be expressed in terms of a fixed collection of data layers or attributes.
It is important to select a relatively small group of attributes which contain all of the relevant information about the movement of salinity within the landscape, in order to minimise computational time and data collection costs. The primary and derived data sets mentioned previously comprise the initial sources of data for mapping salinity and salinity risk. The necessary attributes, and those that are redundant, have been determined using the decision tree program c4.5.
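To give a sense of how a decision tree program ranks attributes, the sketch below computes the gain ratio, the split criterion described for C4.5 by Quinlan (1992), for a single categorical attribute. It is a simplified illustration rather than the c4.5 program itself: continuous attributes, missing values and pruning are ignored, and the attribute and class names in the example are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of class labels or values."""
    n = len(items)
    return -sum((k / n) * math.log2(k / n) for k in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of splitting `labels` by `values`, divided by the
    entropy of the split itself. Returns 0.0 for a single-valued attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical training pixels: landform position against salinity status.
position = ["valley", "valley", "slope", "slope"]
status = ["salt", "salt", "not salt", "not salt"]
ratio = gain_ratio(position, status)
```

An attribute that separates the classes perfectly scores highest, while one whose values are unrelated to the classes scores zero and is a candidate for removal.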
Successive iterations of the c4.5 program were run using different combinations of attributes. The accuracy of each decision tree was examined, so that redundant attributes could be eliminated from the classification process. The following attributes were chosen as final inputs to the salinity mapping and prediction process:
A series of salt maps has been produced using the c4.5 decision tree classifier for the years 1977, 1988 and 1994, using the attributes listed above. These have been produced so that historical changes in salinity may be examined.
The processing steps were:
Two examples of the decision tree outputs from the c4.5 program are contained in Appendix VI. It is also possible to extract decision rules from the trees. The decision trees and derived rulesets can be examined to determine the relationships between the attributes.
The ground truth data were assigned to training and test sets according to the following sampling strategy: a regular grid (30 pixels by 30 pixels) was constructed to fit over the catchment. Two thirds of those cells in the grid which contained ground truth data were randomly selected as training data and the remainder were assigned to the test set. The decision trees were derived from the training data and their accuracies were assessed over the independent test data.
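The sampling strategy described above can be sketched as follows. The function name, the fixed random seed and the representation of ground truth as (row, column) pixel coordinates are assumptions made for the example.

```python
import random

def split_by_grid(truth_pixels, cell=30, train_fraction=2 / 3, seed=0):
    """Assign ground truth pixels to training and test sets by grid cell.
    truth_pixels is an iterable of (row, col); whole cells of `cell` by
    `cell` pixels go to one set or the other."""
    cells = {}
    for r, c in truth_pixels:
        cells.setdefault((r // cell, c // cell), []).append((r, c))
    keys = sorted(cells)          # only cells that contain ground truth
    rng = random.Random(seed)     # fixed seed for a repeatable split
    rng.shuffle(keys)
    n_train = round(len(keys) * train_fraction)
    train = [p for k in keys[:n_train] for p in cells[k]]
    test = [p for k in keys[n_train:] for p in cells[k]]
    return train, test
```

Splitting by grid cell rather than by individual pixel keeps neighbouring, spatially correlated pixels in the same set, so that the test accuracies are not inflated by near-duplicates of the training data.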
Table 1: C4.5 Salinity Mapping Accuracies over the Test Data
Field validation and visual examination of the salinity maps show that the locations of saline areas have been successfully identified; however, their extent is not accurately mapped. This results partly from the spectral similarity between saline and non-saline areas in the Landsat TM scenes, especially within pasture paddocks which comprise the larger part of the catchment.
The salinity mapping accuracy for saline areas increases over time, with the inclusion of more historical data in the mapping process (via the "distance to known salinity" attribute).
The salinity mapping accuracy for different components of the landscape has been examined, using the water accumulation map to divide the landscape into seven landform types: hilltops, ridges and upper slopes, slopes, lower slopes, foothills, valleys and broad valleys. The results (presented in Table 2) show that salinity is mapped less accurately over the slopes and lower slopes, where there are fewer training data. This suggests that further training data are required over these areas, particularly since these areas are typically changing in salinity status over time.
Table 2: Salinity Mapping Accuracies over the Training Data by Landform Type
Lower accuracies over the slopes and lower slopes may also be caused by inadequacies in the DEM. Because the DEM has been derived from 5m contour data, small changes in height between contour lines cannot be modelled accurately. In addition, "break of slope" positions cannot be accurately located from interpolated contour data, making saline hillside seeps difficult to locate.
A further series of maps has been produced using c4.5, to establish its predictive capabilities. These take the form of between-year predictions, using historical data to predict current salinity for any year. That is, 1977 data have been used to predict salinity in 1988, 1988 data have been used to predict salinity in 1994, and 1994 data have been used to predict potential salinity using the air-photo interpretation of areas at risk as training data.
Table 3 summarises the accuracies achieved by the c4.5 predictions over the test data. The prediction accuracies are similar to the salinity mapping accuracies.
Table 3: C4.5 Salinity Prediction Accuracies over the Test Data
The yearly salinity maps and the future salinity prediction map have been combined to form a salinity change map for the region.
Figure 7: c4.5 change map overlaid on the August 1994 Landsat TM Band 4 - green areas were mapped as saline in 1977, cyan in 1988, blue in 1994, and pink areas are at risk of future salinity
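The combination of yearly maps into a change map can be sketched as below: each pixel is labelled with the first year in which it was mapped as saline, then with future risk, and otherwise as not at risk. The function name, the label strings and the binary map representation are assumptions made for the example.

```python
def change_map(saline_by_year, at_risk):
    """Combine yearly binary salinity maps and a binary risk map.
    saline_by_year maps a year to a grid (list of rows) of 0/1 values;
    at_risk is a grid of 0/1 values with the same shape."""
    years = sorted(saline_by_year)
    rows, cols = len(at_risk), len(at_risk[0])
    out = [["not at risk"] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Earliest year in which this pixel was mapped as saline.
            first = next((y for y in years if saline_by_year[y][r][c]), None)
            if first is not None:
                out[r][c] = f"saline {first}"
            elif at_risk[r][c]:
                out[r][c] = "at risk"
    return out
```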
The regional statistics for the Upper Kent River Catchment obtained from the c4.5 maps are presented in Table 4.
Table 4: C4.5 Regional Results
The regional statistics reflect the fact that most of the remaining remnant vegetation in the Upper Kent River catchment is located in the stagnant flats of the central zone, which have high historical salt storages. Some of these areas are misclassified as saline because of their position in the landscape; if these areas were cleared, salinity would emerge within several years. Other vegetated areas located in the stagnant flats are already affected by saline groundwater which has been discharged from agricultural land higher in the catchment.
6.2 Probabilistic Networks - A Knowledge-Based Approach
Probabilistic networks provide a means of combining multiple sources of data under the guidance of human expert knowledge. In this approach, rules obtained from experts are formed into a probabilistic network which can then be used to interpret the data by producing maps (see Figure 8), or used as an interactive probabilistic decision support system to answer what-if scenarios.
A key aspect of the approach is the ability to reason with data which have a degree of uncertainty associated with them. Another aspect is that all data are interpreted at the same time, which makes it possible to construct an interactive decision support system.
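The way such a network reasons with uncertain evidence can be illustrated with a deliberately small example: one salinity node with two evidence nodes that are assumed to be conditionally independent given the salinity state. The structure, node names and every probability below are invented for illustration and are not taken from the network built in this study.

```python
# Hypothetical prior and conditional probabilities (assumptions, not results).
PRIOR = {"saline": 0.1, "not saline": 0.9}
LIKELIHOOD = {
    "position": {
        "valley": {"saline": 0.7, "not saline": 0.2},
        "slope": {"saline": 0.3, "not saline": 0.8},
    },
    "upslope_cleared": {
        "high": {"saline": 0.8, "not saline": 0.4},
        "low": {"saline": 0.2, "not saline": 0.6},
    },
}

def posterior(evidence):
    """Posterior probability of each salinity state given a dict of
    observed evidence, e.g. {"position": "valley"}. Unobserved nodes are
    simply left out, which is what allows interactive what-if queries."""
    scores = dict(PRIOR)
    for node, value in evidence.items():
        for state in scores:
            scores[state] *= LIKELIHOOD[node][value][state]
    total = sum(scores.values())
    return {state: s / total for state, s in scores.items()}

# What-if query: a valley pixel before and after upslope replanting.
before = posterior({"position": "valley", "upslope_cleared": "high"})
after = posterior({"position": "valley", "upslope_cleared": "low"})
```

Because the output is a probability rather than a hard class, each mapped pixel carries its own measure of uncertainty, and changing one input shows immediately how the assessed risk responds.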
Using this method, an expert system has been constructed and used to produce salinity maps for the years 1977, 1988 and 1994. For each year, the maps depict areas that are already saline at that time, areas that are at risk of future salinity, and areas that are not at risk.
Data sets interpreted by the model were:
It should be noted that the percentage upslope cleared area, upslope cleared area and water accumulation maps are functionally related. Thus, despite being chosen subjectively, the data sets interpreted by the network are similar to those identified by c4.5.
The approach taken was to:
The outputs of the probabilistic network consisted of a series of maps of salinity for the years 1977, 1988 and 1994, and a series of between-year predictions, using historical data to predict current salinity for any year. In addition, a map of future salinity risk has been produced.
Table 5 summarises the accuracies for salinity mapping achieved by the probabilistic network over the independent validation data.
Table 5: Probabilistic Network Salinity Mapping Accuracies over the Validation Data
Table 6 (below) shows the salinity mapping accuracy for different components of the landscape. The results are similar to those obtained using decision tree classifiers - lower accuracies correspond to areas where less training data were available.
Table 6: Salinity Mapping Accuracies over the Validation Data by Landform Type.
Table 7 shows the accuracies achieved by the network predictions. Given data from 1977, the extent of salinity in 1988 was predicted, and given data no later than 1988, the extent of salinity in 1994 was predicted.
Table 7: Probabilistic Network Salinity Prediction Accuracies
The prediction for 1988 has under-estimated the extent of salinity. This is in part due to the relatively poor resolution of the Landsat MSS data which were used as an input to the prediction, and partly because of the clearing history of the catchment.
The accuracies for mapping and predicting salinity were affected by a number of factors. The saline and potentially saline areas showed poor spectral discrimination (see Section 5.1). As a result, saline areas and areas which became saline within the ten-year time intervals were difficult to separate, hence the poor accuracies for salinity mapping in 1977 and 1988.
The poor discrimination between potentially saline and saline classes may be due to the inherent difficulties in the definition of these classes. Some of the training areas mapped as salt-affected still supported pastures and remnant vegetation, making it difficult to determine the boundaries between salt-affected and productive land (which may be at risk of future salinity). In addition, the historical boundaries were derived from aerial photographs, so a further degree of uncertainty associated with the class boundaries in the training data must be assumed.
Figure 8: Probabilistic Network change map overlaid on the August 1994 Landsat TM band 4 - green areas were mapped as saline in 1977, cyan in 1988, blue in 1994 and red areas are predicted to be at risk
The following table summarises the regional statistics for the catchment:
Table 8: Regional Results from the Probabilistic Network
6.3 Comparison of Results
The maps produced using the two methods discussed in this report have been visually examined to determine the areas which are mapped consistently by both methods, the areas where they differ, and how they compare to the validation data.
The first validation area is located in a dissected area with higher slope. The second validation area is located over a very flat region, most of which consists of swamps and stagnant flats.
Both methodologies perform reasonably well over the dissected area. The c4.5 maps have slightly under-estimated salinity in 1977 and 1988, and slightly over-estimated salinity in 1994, whilst the probabilistic network has over-estimated salinity in each of the three years.
In the second validation area, the c4.5 maps have severely under-estimated the extent of salinity although accurately locating some saline areas that the probabilistic network has missed. The network has performed differently for each of the three years. In 1977, it has missed most of the saline sites, while incorrectly classifying areas of non-saline land. A similar result occurred in 1988. In 1994, the probabilistic network has over-estimated the extent of salinity, although the general location of the saline areas was mapped correctly.
Rule-based classifiers, such as c4.5, can be used to produce maps of current and future salinity. The decision trees or derived decision rulesets can be examined to determine relationships between the data layers, and in particular, to determine the single data layer with the most use for salinity prediction.
An important issue when using rule-based classifiers is the need for consistently defined training data. The training data for future salinity have been provided by an expert for this study; however, this may not be possible in other areas. In addition, different experts may define salinity risk in different ways, and any prejudices in the expert definitions of salinity will be inherent in the final predictions.
Expert systems may also be used to produce maps of current and future salinity. They also provide a method for assessing the uncertainty associated with these maps. The data layers are integrated in a manner that encompasses current expert knowledge regarding the movement of salinity in the landscape; the relationships estimated from the training data may be interrogated interactively. An expert system also provides a method for interactive assessment of salinity presence or risk in an area, allowing a user to explore what-if scenarios.
Both rule-based classifiers and expert systems offer time and cost advantages over process-type models. Process modelling is data-intensive, and many of the necessary data sets are costly and time-consuming to produce. The methods investigated in this report, however, require only a small number of data sets. Moreover, these data are relatively inexpensive over broad areas, since they are provided by remote sensing and computer generation.
The advantages of rule-based classifiers are:
The advantages of probabilistic expert systems are:
Both methods are dependent on the existence of representative training data and would perform more accurately in regions more typical of the Western Australian wheatbelt, where salinity can be more easily identified using Landsat TM image data.
8 Further Work
The salinity mapping and the prediction of areas at risk of salinity have shown lower accuracies in flatter regions of the landscape. This is due to the relatively sparse contour data in these areas. The results presented in this report could therefore be improved by obtaining and using a more accurate DEM.
It is planned to evaluate the effect of using denser height data over flat areas, generated from several sources. A softcopy system automatically generates height data from digitised stereoscopic aerial photographs. This work is being undertaken by the Department of Agriculture, WA. Concurrently, it is planned to use the WILD BC2 stereographic plotter, located at Curtin University, to manually extract height data from digitised stereoscopic aerial photographs over a regular grid of one metre intervals.
Improved accuracies may also result from examining sampling strategies for training and test data. Such strategies would select an optimal representative sample of landform patterns, landform elements and changing sites, such as those with emerging salinity, and could then be implemented in a systematic manner when extending the mapping methodologies into other regions.
It is also proposed to examine other methodologies for mapping and predicting salinity, in particular neural network classification procedures.
Campbell, NA, Furby, SL and Fergusson, B 1994. Calibrating images from different dates. Report to LWRRDC. Project CMD1. CSIRO Division of Mathematics and Statistics, Perth.
Campbell, NA and Kiiveri, HT 1993. Canonical variate analysis with spatially-correlated data. Austral. J. Statist. 35: 333-344.
Campbell, NA and Wallace, JF 1989. Statistical methods for cover class mapping using remotely sensed data. Proc Int. Geosci. Remote Sensing Symp.: 493-496.
Ferdowsian, R and Greenham, KJ 1992. Integrated catchment management: upper Denmark catchment. Department of Agriculture WA Technical Report 130.
Quinlan, JR 1992. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc. USA.
Wheaton, GA, Wallace, JF, McFarlane, DJ and Campbell, NA 1992. Mapping salt-affected land in Western Australia. Proceedings of the Sixth Australasian Remote Sensing Conference: 369-377.
The authors gratefully acknowledge the assistance and advice we have received at all stages of this study from our collaborators. In particular, we thank Don McFarlane and Arjen Ryder of DAWA for their assistance in providing ground information and interpretation. The project was carried out with funding provided by the Land and Water Resources Research and Development Corporation.
Note: The figures contained in this report of necessity either are at a large scale or show only a small area in detail. Full digital versions of the original files are available. Individual task reports which describe the analyses performed are also available.
last updated 05/06/02