During the period 1973 to 1985, public net investment in the United States and Japan averaged 0.3% and 5.1% of gross domestic product, respectively, while their growth rates of real gross domestic output per employed person were 0.6% and 3.1% per annum (OECD National Accounts and Historical Statistics). Twenty-five years ago, Aschauer and Munnell and Cook found a strong positive relationship between infrastructure and growth. Aschauer estimated the productivity-raising power of infrastructure investment to be huge, as much as four times that of private investment. Barro noted that production is driven by the flow of productive expenditures toward infrastructure.
However, state and local governments currently spend about 2.4% of GNP on infrastructure, compared with 3.1% in 1970. A large portion of that amount is spent on growth as opposed to repair and replacement. It should therefore come as no surprise that public infrastructure has been poorly rated by the American Society of Civil Engineers, and most public officials acknowledge the deterioration and risk of failure of the infrastructure relied on daily for economic and social systems. However, many jurisdictions have limited information about their systems and little data with which to justify spending. As a result, the infrastructure tends to deteriorate further each year as local officials opt to limit budgets in the absence of good needs data; hence the need for better tools for asset management.
Asset management is a process of integrating design, construction, maintenance, rehabilitation, and renovation to maximize benefits and minimize cost. It is a plan for managing an organization’s infrastructure through a decision-making process driven by a defined standard level of service. The term asset management refers to business principles aimed at balancing risk and minimizing life-cycle costs of physical assets such as pipes, roads, structures and equipment. Asset management is also used as a tool to help municipalities gauge the health of infrastructure. A strategy to acquire, use and dispose of assets should be continuously reviewed and revised to optimize service and minimize costs over the life of the assets. An asset management plan (AMP) considers financial, economic, operational, and engineering goals in an effort to balance risk and benefits as they relate to potential improvement to the overall operation of the system.
Asset management plays a vital role in minimizing unnecessary or misplaced spending while meeting the health and environmental needs of a community. Asset management programs prolong asset life by aiding rehabilitation and repair decisions while meeting customer demands, service expectations and regulatory requirements. The general framework of asset management programs involves collecting and organizing data on the physical components of a system and evaluating the condition of these components. This evaluation determines the importance of the individual assets and the potential consequences associated with their failure. Managers and operators can then prioritize which infrastructure is most critical to the operation of the system and which assets to consider for repair, rehabilitation or replacement. This strategy allows funding, in terms of both repair and replacement (R & R) and operation and maintenance (O & M) dollars, to be distributed accordingly among the assets that are the most vulnerable and most likely to fail. The goal is to provide strategic, continuous maintenance to the infrastructure before total failure occurs. It is of utmost importance to define “failure” of the infrastructure. For example, for a stormwater system, “failure” might mean that the community has areas that flood as a result of rainfall, tidal impacts, sea level, or groundwater elevation. Costs should be well distributed over the life of the asset to help avoid emergency repairs, which can cost several times as much as a planned repair.
The reliability of the assets within the area of interest starts with the design process. Decision-making dictates how assets will be maintained and the means to assure the maximum return on investment. An inventory of assets needs to be established. Depending on the accuracy wanted, the data can be gathered in many ways, ranging from time-consuming on-site field investigations to using existing maps, or using maps while verifying the assets with aerial photography, video, or field investigations. Through condition assessment, the probability of failure can be estimated. Assets can also fail when their maximum capacity is exceeded. Prioritizing the assets by a defined system allows the community to see which areas are most susceptible to vulnerability/failure, which assets need the most attention due to their condition, and where the critical assets are located in relation to major, heavily populated public areas (hospitals, schools, etc.).
But what if the infrastructure data is limited, which is often the case with buried pipelines? In such cases it is difficult to analyze the condition of the system and prioritize repair and replacement dollars. The goal of this paper is to outline a means to assess the system’s condition by evaluating what can be inferred from the known data, without the need to dig up the piping. The question is how to collect useful data without destructive testing of buried infrastructure, which is costly and inconvenient. The reality is that there is more data than one thinks.
The problem with condition assessments is that for many infrastructure assets, determining the condition is nigh on impossible. The condition of buried infrastructure is nearly impossible to determine without excavating it. Even infrastructure that is visible may provide a false assessment. A fire hydrant is partially buried. The foundations for bridges and the base of a roadway are not visible. Stormwater pipes may be visible only at outlets. Hence many asset management programs stall when there is a need to assess the condition of the assets. Many assume that since the buried infrastructure is unseen, the condition cannot be determined. But this is rarely true. There is usually some information that is known; the challenge is the certainty of this information. Uncertainty is a concept that many people, including engineers, operations staff and administrators, are uncomfortable with. However, statistical analysis is the mathematical means to address this uncertainty, since even with some data there is still uncertainty.
Several statistical methods have been developed to attempt to exploit limited information: resampling (bootstrap and jack-knife methods), fuzzy set theory, interval analysis, information theory and Bayesian methods. Resampling methods permit a sampling distribution to be created from limited data by using the likelihood as a means to estimate the distribution parameters. These parameters are then used to create probability distributions that are sampled. Sampling repeatedly from each distribution allows the investigator to simulate the expected uncertainty and variability of the original set of data. As much data as the investigator wants can be created. The process is relatively straightforward for two variables but, as reported by Haas, the method is tedious with multiple variables and the entire analysis is limited by the quality of the original data. Perhaps most importantly, the method offers no capability for using subjective information, which may be available.
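As an illustration of resampling, the sketch below applies a percentile bootstrap to a small sample. It is the simpler nonparametric variant, which resamples the observed values directly rather than first fitting distribution parameters as described above. The break counts, the function name `bootstrap_mean_interval` and its parameters are hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_mean_interval(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of a small sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(n_resamples):
        # Draw a resample of the same size as the data, with replacement.
        resample = [rng.choice(data) for _ in data]
        means.append(statistics.mean(resample))
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical break counts for eight pipe segments (illustrative only).
breaks = [0, 1, 0, 2, 1, 0, 3, 1]
low, high = bootstrap_mean_interval(breaks)
print(f"sample mean = {statistics.mean(breaks):.2f}")
print(f"95% bootstrap interval for the mean = ({low:.2f}, {high:.2f})")
```

The interval can be no better than the eight observations it is built from, which is the limitation noted above: the analysis is bounded by the quality of the original data.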
Fuzzy set theory/logic was proposed as a paradigm shift in logic that involves a set of rules that define boundaries, and solves problems within those boundaries. As the name suggests, fuzzy logic is the logic of underlying modes of reasoning that are approximate rather than exact, where everything is a matter of degree. It is used to handle the concept of partial truths, between completely true and completely false. Fuzzy set analysis is usually applied to subjective, verbal information that is divided into two sets of data. Fuzzy set theory measures the intersection of the two sets. The intersection is the fuzzy set pairing value. The most obvious limitation noted in the literature is that the two sets of data cannot be mutually exclusive. In addition, the method cannot make use of real data or of new data.
Interval analysis is an approach to the analysis of systems when the value of the quantity measured is uncertain. Interval analysis defines the value of the quantity by specifying the interval that the value is guaranteed to fall within. The methodology provides a correct formal method for measuring the upper and lower bounds required for the worst possible case. Interval analysis is not as powerful as other statistical methods when empirical information is available. Some prior information or data to create the subjective opinion is required. As with the methods previously discussed, updating with new data is not feasible, yet with limited data the ability to refine the distribution as data are added is desirable.
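A minimal sketch of interval arithmetic shows how guaranteed bounds propagate through a calculation. The `Interval` class, the pipe length range and the unit cost range are all hypothetical and are used only to illustrate the idea of worst-case upper and lower bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi] that is guaranteed to contain the true value."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The extremes of a product lie at one of the four endpoint products.
        products = [
            self.lo * other.lo,
            self.lo * other.hi,
            self.hi * other.lo,
            self.hi * other.hi,
        ]
        return Interval(min(products), max(products))


# Hypothetical inputs: pipe length known to within 95 to 105 (length units),
# replacement cost known to within 200 to 260 (currency per length unit).
length = Interval(95, 105)
unit_cost = Interval(200, 260)
total_cost = length * unit_cost
print(f"replacement cost lies between {total_cost.lo} and {total_cost.hi}")
```

The result is a pair of bounds only. Nothing is said about which values inside the interval are more likely, which is why the approach is weaker than the statistical methods when empirical data exist.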
A solution to address at least some of these issues comes from Shannon’s Information Theory and a theorem that he proved in 1948: “The probability distribution having maximum entropy (uncertainty) over any finite range of real values is the uniform distribution over that range”. Information theory is a branch of mathematical theory within probability and statistics that can be applied to a variety of fields. The roots of information theory are found within the concepts of disorder or entropy in thermodynamics and statistical mechanics. Since the turn of the 20th century, significant literature has been devoted to studying the mathematical form of information entropy. The literature assumes a sample space S, with a series of events i characterized by individual probabilities p_i. Shannon developed the following measure to indicate the information content of a probability distribution, which is in the form of the negative of the information content of the data set:

$$H = -\sum_{i} p_i \ln p_i$$
By maximizing information entropy, the most conservative or broadest distribution consistent with the available information (such as the mean, variance and range) can be derived. In general, if the mean or expected value of any function F(X) of a quantity is known, the entropy can be maximized subject to:

$$\sum_{i} p_i F(x_i) = E[F(X)], \qquad \sum_{i} p_i = 1$$
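Shannon's theorem quoted above can be checked numerically. The sketch below assumes the standard form of Shannon's measure, H = -Σ p ln p, and compares three distributions over five states. The "skewed" and "certain" distributions are hypothetical values chosen for illustration.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability events contribute nothing."""
    return sum(-p * math.log(p) for p in probs if p > 0)


uniform = [0.2] * 5                       # nothing is known about the asset
skewed = [0.05, 0.45, 0.35, 0.10, 0.05]   # hypothetical partial knowledge
certain = [0, 1, 0, 0, 0]                 # a definitive observation

for name, dist in [("uniform", uniform), ("skewed", skewed), ("certain", certain)]:
    print(f"{name:8s} H = {shannon_entropy(dist):.3f}")
```

The uniform case gives the largest entropy (ln 5 for five states), and the entropy falls to zero once the state is known with certainty, which matches the remark that a definitive observation removes the underlying uncertainty.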
The concepts of information entropy are a useful theoretical underpinning for the application of Bayesian methods in many aspects of the analysis. Definitive observations do not play an important part in information entropy theory, since once a definitive observation is made the underlying uncertainty is greatly reduced or eliminated. The uncertainty defines the confidence in the observation. The use of subjective or recorded data in the absence of absolute certainty is akin to the use of Bayesian methods in the field of statistics.
The selection of Bayesian methods assumes that the absolute or unconditional probability density function p(x) on X is the underlying distribution found through curve-fitting. Its form, as defined by Aitchison and Dunsmore, is:

$$p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta$$
where p(x) and p(θ) are completely different and independent functions, and p(θ) is the prior distribution. The Bayesian approach is to assume that while the true value of θ is unknown, there are probabilities that can be assigned for a series of possible values of θ. More precisely, it is assumed that p(θ) is a density function.
For the purposes of infrastructure assessment, the “mean” value of θ, or E(θ), is akin to the expected age, material, soil, depth, traffic, groundwater table, or other factor that the assessor wishes to consider. As with Bayesian statistical methods generally, the more data gathered for a given asset, even when uncertain, the less likely it is that any error in the estimate will obscure the true condition. The more information, the more likely it is that the outcome can be predicted. As a result, Bayesian methods permit the prediction to evolve with added data: the shape of the distribution of results becomes more narrowly focused on the likely solution. Inference, in the absence of fact, is the key. The concept works for sizes of incidents, condition of infrastructure and likelihood of failure.
The one thing still needed when gathering data is an “event” or consequence. To be useful there must be some form of tracking consequences: breaks, flooding, etc. So, the agency must identify whether there is data to indicate the events, such as work orders. If so, the records may contain enough data to piece together missing variables that would be useful to add to the puzzle. Exact accuracy is not needed, but as much information as is available is helpful. An example is helpful.
Assume that there are five arbitrary levels of condition available to analyze the asset: excellent/new, good, fair, poor and failed. If there is no information about an asset, its condition could be any one of these. The probability is 0.2, or 20%, for each. Assume that one data point is known; that would change the analysis considerably. Or what if the data were “sort of” known, say a probability that the asset was good or fair based on some factor? Then the probability would shift toward the good/fair condition and away from the poor, failed or excellent conditions. Still there is uncertainty involved. This is precisely what Bayesian statistical methods are trying to get at. The assessor has a lot more data than one thinks, even though much of it may not be known with complete certainty. The uncertainty is contained in the judgment of the assessor about certain factors.
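The shift described in this example can be sketched as a single Bayesian update over the five condition states. The uniform prior comes from the text; the likelihood values are hypothetical and stand in for an assessor's judgment that one factor points toward good or fair condition.

```python
STATES = ["excellent", "good", "fair", "poor", "failed"]


def bayes_update(prior, likelihood):
    """Posterior is proportional to prior times likelihood, renormalized to 1."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# No information: each state is equally likely (0.2 each).
prior = [0.2] * len(STATES)

# Hypothetical judgment: the evidence is most consistent with good or fair.
likelihood = [0.10, 0.35, 0.35, 0.15, 0.05]

posterior = bayes_update(prior, likelihood)
for state, before, after in zip(STATES, prior, posterior):
    print(f"{state:10s} prior {before:.2f} -> posterior {after:.2f}")
```

Further pieces of evidence can be applied the same way, using each posterior as the prior for the next update.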
Continuing the example, most utilities have a pretty good idea about the pipe materials. Worker memory can be very useful, even if not completely accurate. In most cases the depth of pipe is fairly similar, and the deviations may be known. Soil conditions may be useful: there is an indication that aggressive soil causes more corrosion in ductile iron pipe, and most soil information is readily available even if it is less specific per pipe or valve than desired. That can then be used as a predictive tool to help identify assets that are most likely to become a problem. Bloetscher et al. are working on such an example now, but suspect that it will be slightly different for each utility. Also, in smaller communities, many variables (ductile iron pipe, PVC pipe, soil condition…) may be so similar that differentiating would be unproductive.
Construction may have altered the soils; for example, muck and rock likely were replaced during construction with good fill. Likewise, tree roots will wrap around pipes, so their presence may indicate damage to the pipe. No one can know this with certainty without digging the pipe up, something most communities would prefer to avoid, but the presence of trees is easily noted from aerials. Roads with truck traffic experience more vibration, causing rocks to move toward the pipe and joints to flex. That brings up another possible variable, the field perception: what do the field crews recall about breaks? Are there work orders? If so, do they contain the data needed to piece together missing variables that would be useful to add to the puzzle? With a little research there are at least 5 variables known.
Assume there are 9 variables that are developed. Each one has an assessment of adding to excellent, good, fair or poor condition of the pipe. These probabilities are added each time to build an understanding of overall condition (see Equation (4), which is what this equation is trying to represent). Figure 1 shows how the graph changes as more information is accumulated. There is a big change from 1 to 2 points, but from 4 to 9 points the graph does not really change. This asset has a condition that is most probably good, maybe fair. It is probably not poor or excellent.

Figure 1. Distribution given additional amounts of information. Note that after 4 pieces of data, the distribution changes little.
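The accumulation described above can be sketched as follows. The nine assessment vectors are hypothetical values chosen for illustration, and the pooling rule (adding the assessments and renormalizing) is one reading of the text, not necessarily the authors' Equation (4).

```python
CONDITIONS = ["excellent", "good", "fair", "poor"]

# Hypothetical assessments, one per variable (age, material, soil, ...).
# Each row is that variable's probability for each condition and sums to 1.
assessments = [
    [0.10, 0.30, 0.40, 0.20],
    [0.05, 0.60, 0.30, 0.05],
    [0.10, 0.50, 0.30, 0.10],
    [0.05, 0.55, 0.35, 0.05],
    [0.10, 0.50, 0.30, 0.10],
    [0.05, 0.50, 0.35, 0.10],
    [0.10, 0.55, 0.30, 0.05],
    [0.05, 0.50, 0.35, 0.10],
    [0.10, 0.50, 0.30, 0.10],
]


def pooled_distribution(vectors):
    """Add the assessments element-wise and renormalize so they sum to 1."""
    totals = [sum(column) for column in zip(*vectors)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Pooled distribution after 1, 2, ..., 9 pieces of information.
history = [pooled_distribution(assessments[:k]) for k in range(1, len(assessments) + 1)]

# Total absolute change in the distribution caused by each added variable.
changes = [
    sum(abs(new - old) for new, old in zip(history[k], history[k - 1]))
    for k in range(1, len(history))
]

for k, change in enumerate(changes, start=2):
    print(f"adding variable {k}: distribution shifts by {change:.3f}")

final = history[-1]
most_likely = CONDITIONS[final.index(max(final))]
print("most likely condition:", most_likely)
```

With these illustrative inputs the shift is largest when the second variable is added and becomes small after the first few, which is the behaviour the figure describes.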
Ultimately the interest is in determining whether these factors have an impact on a consequence. Determining what those consequences are is the issue, so one needs to know what that response is:
・ Water main breaks?
・ Sewer breaks?
・ Sanitary sewer blockages or overflows?
・ Stormwater system overflows?
・ Roadway damage?
If the break history or sewer pipe condition is known, the impact of these factors can be developed via a linear regression model. The model would be developed as a weighted sum of the condition factors,

$$CI = \sum_{i} w_i C_i$$

where:
・ CI = condition index
・ w = weighting factor
・ C = condition factor
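A condition index built from these three quantities can be sketched as a weighted sum, assuming the linear form implied by the regression model. The factor values and weights below are hypothetical; in practice the weights would come from the regression.

```python
def condition_index(weights, factors):
    """Weighted sum of condition factors: CI = sum of w_i * C_i."""
    if len(weights) != len(factors):
        raise ValueError("weights and factors must have the same length")
    return sum(w * c for w, c in zip(weights, factors))


# Hypothetical factors coded 0/1: old pipe, trees nearby, heavy traffic.
factors = [1, 0, 1]
# Hypothetical weights (illustrative only).
weights = [0.5, 0.3, 0.2]

ci = condition_index(weights, factors)
print(f"condition index = {ci:.2f}")
```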
If the incidents are known, the weights can be found:
where the values of C are real numbers and are the factors, such as trees, materials and traffic.
It is assumed that the constraints and the linear variables in the matrices are non-negative. If there are negative values, they must be made positive as follows:
Based on the conceptual understanding of the “best guess” of data on the infrastructure, the following are the steps required to obtain a condition assessment with limited data, utilizing a series of assets gleaned from utility records for a water system for example purposes:
・ Step 1: Create a table of assets (see Table 1, column 2; this is a small piece of a much larger table).
・ Step 2: Create columns for the variables for which there is data (Table 1―age, material, soil type, groundwater level, depth, traffic, trees, etc.―columns 3 - 11).
・ Step 3: Note that where there are categorical variables (type of pipe, for example), these need to be converted to separate yes/no questions, because mixing categorical and numerical variables does not provide appropriate comparisons. Descriptive variables like pipe material therefore need to be converted to binary absence/presence form, i.e. create a column for each material and insert a 1 or 0 for “yes” and “no” (see Table 1, columns 12 - 16).
・ Step 4: Summarize the statistics for the variables. Note missing data is not permitted and known conditions should be entered directly (see Table 2).
・ Step 5: Develop a linear regression to determine factors associated with each and the amount of influence that each exerts. The result will yield a series of coefficients (see Table 3).
・ Step 6: Identify the predictive equation. In this case it is:
Table 2. Summary statistics.
Table 3. Model parameters and statistics associated with the model.
Note that the variables with larger exponents generally have more impact on the number of leaks (see Figure 2), although the values (like age) might be a challenge.
・ Step 7: The equation can then be used to predict the number of breaks going forward based on the information about breaks going back in time. Figure 3 outlines how the equation worked for the 93 overall datapoints in this example.
・ Step 8: Finally the data can be used to predict where the breaks might occur in the future based on the past (Figure 4).
Figure 2. Impact of factors on leaks.
Figure 3. Residual breaks over 10 years based on the observation (asset).
Figure 4. Comparison of predictive and actual breaks over 10 years (correlation desirable).
The hope is that these correlate well. The process is not time consuming but provides useful information on the system. It needs to be kept up as things change, but exact data is not really needed and none of this requires destructive testing.
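Steps 1 through 7 can be sketched end to end on a toy table. Everything in the block is an assumption made for illustration: the nine assets, their ages, materials, tree flags and break counts are hypothetical, and ordinary least squares solved through the normal equations stands in for whatever regression routine is actually used.

```python
MATERIALS = ["cast iron", "ductile iron", "PVC"]

# Hypothetical assets: (age in years, material, trees nearby 0/1, breaks in 10 years).
ASSETS = [
    (55, "cast iron", 1, 4), (48, "cast iron", 0, 3), (60, "cast iron", 1, 5),
    (30, "ductile iron", 0, 1), (35, "ductile iron", 1, 2), (25, "ductile iron", 0, 1),
    (15, "PVC", 0, 0), (20, "PVC", 1, 1), (10, "PVC", 0, 0),
]


def one_hot(value, categories):
    """Step 3: convert a categorical value into presence/absence columns."""
    return [1.0 if value == c else 0.0 for c in categories]


def design_row(age, material, trees):
    # No separate intercept column: the material columns already sum to 1 in
    # every row, so adding a constant would make the system singular.
    return [float(age), float(trees)] + one_hot(material, MATERIALS)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: check for redundant columns")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Step 5: ordinary least squares via the normal equations (X'X) w = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


X = [design_row(age, material, trees) for age, material, trees, _ in ASSETS]
y = [float(breaks) for *_, breaks in ASSETS]
weights = fit_least_squares(X, y)

# Step 7: apply the fitted equation back to the assets and compare.
predicted = [sum(w * x for w, x in zip(weights, row)) for row in X]

for name, w in zip(["age", "trees"] + MATERIALS, weights):
    print(f"{name:13s} coefficient {w:8.4f}")
for (_, _, _, actual), p in zip(ASSETS, predicted):
    print(f"actual breaks {actual}   predicted {p:5.2f}")
```

With only nine rows the coefficients carry little meaning. The point of the sketch is the mechanics: categorical columns become 0/1 columns, the regression yields one coefficient per column, and the fitted equation is then compared against the recorded breaks.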
Conducting an exercise to develop the methodology was useful, but the next step was to do something with the results. The Dania Beach, FL sewer system was used as an example, given that actual failure data (pipe breaks from sewer leak data) existed and an understanding of the system was available. The City of Dania Beach covers approximately 7.7 square miles. The analyzed network includes approximately 1500 assets located within the public right-of-way (ROW). Two asset maps were acquired to aid in the data collection. The first map depicted the system as it existed in the early 2010s. The original intention of this map was to illustrate which pipes in the network were suspected of breakage based on a midnight monitoring exercise after sealing the system. The original analysis utilized flow data to track areas within the network most likely to be suffering from infiltration. Once this analysis had been completed, the sewer lines in question were televised and the breaks found were recorded.
The original installation design records were obtained. Dates and materials were assigned for large sections of the City. Most of the pipe was vitrified clay installed in the 1960s and 1970s, so an install date estimated to within 5 years was assigned; since the life of sewer pipe assets is expected to range from 80 to 100 years, ±5 years was not deemed significant for the purposes of this analysis. Many other indicators of failure, for example pipe diameter, groundwater, soils, traffic, trees and pipe depth, were included. A Geographic Information System (GIS) provided spatial analytics and asset data management, while Excel provided independent asset analysis and comparative analytics. Soils maps from the US Department of Agriculture, contractor information, and information developed by the authors were used to help with groundwater and soils data.
From the map created in Figure 5, a table of assets was created (see Table 4, which shows a portion of this table). Columns were added for the variables for which there was data (age, material, soil type, groundwater level, depth, traffic, trees, etc.). For each asset, a value was assigned to the desirable and undesirable attributes for each indicator of failure. For example, pipes more than thirty inches below surface grade were given a value of 1 and pipes with less than thirty inches of cover were assigned a value of 0. These values were assigned independently to each asset for each indicator of failure analyzed. It is important to note that the values assigned do not provide a weight or indicate importance. Categorical variables were converted to numerical ones. Table 5 shows the summary statistics.
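The 0/1 coding of an indicator can be sketched as a simple threshold test. The thirty-inch threshold comes from the text; the asset identifiers, cover depths and tree flags are hypothetical, and the treatment of a pipe with exactly thirty inches of cover (coded 0 here) is an assumption, since the text does not say.

```python
DEPTH_THRESHOLD_IN = 30  # inches of cover, from the example in the text


def code_depth(cover_inches, threshold=DEPTH_THRESHOLD_IN):
    """Return 1 for pipe deeper than the threshold, otherwise 0.

    Exactly-at-threshold is coded 0 here; that choice is an assumption.
    """
    return 1 if cover_inches > threshold else 0


def code_flag(present):
    """Return 1 if the attribute is present, otherwise 0."""
    return 1 if present else 0


# Hypothetical assets (identifiers and values are illustrative only).
assets = [
    {"id": "A-1", "cover_in": 42, "trees": True},
    {"id": "A-2", "cover_in": 24, "trees": False},
    {"id": "A-3", "cover_in": 30, "trees": True},
]

coded = [
    {"id": a["id"], "depth": code_depth(a["cover_in"]), "trees": code_flag(a["trees"])}
    for a in assets
]
for row in coded:
    print(row)
```

The coded values only mark which side of the threshold an asset falls on. As noted above, they carry no weight by themselves; the weights come later from the statistical analysis.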
The statistical analysis tool XLStat® was utilized in the data analysis. XLStat® requires that each variable be represented as a numeral. Table 6 shows the correlation matrix and Table 7 the analysis of variance (ANOVA) results. One issue is that most of the pipe is vitrified clay and the soils are similar in most of the City, so these were not good differentiators. Likewise, little gravity pipe is under heavy traffic areas, so this was also not a good identifier. Figure 6 is a varimax graph developed from principal component analysis of the data. It shows the correlation of any individual factor in relation to all other factors within the graph. Data points with a higher correlation have a smaller degree of separation within the chart; a probable correlation can be predicted if the factors are within 45° of each other. Therefore, the two factors of failure which are most closely correlated with pipe breakage are length and depth. Using this data, the weight of importance can be determined to complete the condition index.

Figure 5. GIS of the sewer system in Dania Beach.
Table 4. Assets for sanitary sewer system (partial).
Table 5. Summary statistics for sewer system (complete table).
Table 6. Correlation matrix for sanitary sewer system (complete table).
Table 7. Analysis of variance for sewer system.
Computed against model Y = Mean (Y).
A linear regression formula was developed from the factors and the amount of influence that each exerts to yield the predictive equation. In this case it is:
This equation can then be used to predict the number of breaks using the consequences going back in time. Table 7 shows the analysis of variance. The fit was good, but note the lack of variability in the variables. Figure 7 shows the variance from actual to estimated breaks. Most pipes in good condition were noted as good in Figure 8. Note one issue here: only 1 break was noted in the pipe. There may actually have been more breaks, but these could not be noted, an issue that requires follow-up video.
Many utilities have not implemented comprehensive asset management plans for their assets. In part, this is due to the belief that they cannot properly assess the assets, or that doing so with traditional methods is too expensive or yields data of limited value. As a result, they have limited data to present to decision-makers about the condition of their assets and the likelihood of failure, creating an atmosphere of hoping to avoid catastrophic failures. However, utilities with limited financial capability, which might be most at risk if failure occurs, can develop an asset management program to help identify critical risks and provide data to the decision-makers who need to provide the fiscal resources to properly manage and maintain a utility system.
In this exercise, an effort was made to develop a methodology to evaluate utility assets, buried and otherwise, to help identify financial resources needed to maintain a utility system. The concept was to create data on the assets for a condition assessment (buried pipe is not visible and in most cases cannot really be assessed). A challenge is posed with buried infrastructure since many utilities lack the resources for examining it, so other methods of data collection are needed. However, much more information is known about buried infrastructure than one anticipates. This permits an assessment of likelihood of condition, using the parameters of Bayesian statistical methods applied in the field.

Figure 6. Factor graph indicating where there may be correlation between variables.
Figure 7. Variance from actual to estimated breaks.
Figure 8. Most pipes in good condition were noted as good. The failure (15% of the system) was less reliable.
For predictive methods to be useful for predicting future maintenance needs, there needs to be a measurable consequence (breaks, flooding, etc.) that would indicate a failure. Unfortunately, many utilities do not track this or do not collect this data from work orders (if they use work orders). The lack of tracking makes it difficult to determine which factors are the critical ones. In this project, the effort was applied to a sewer system since that system tracked pipe damage. The occurrence of a pipe break is a consequence, but the number of breaks was found to be of greater value.