Sunday, October 25, 2020

On Lies, Damned Lies & Statistics

Why statistics are actually quite useful


“Figures often beguile me,” wrote the American author Mark Twain, “particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”

It is somewhat ironic that a sentence about degrees of falsehood appears to contain one, albeit most likely an unintended one. The earliest known appearance of the phrase attributed to British Prime Minister Benjamin Disraeli in fact dates from after his death.

Nevertheless, it is a pithy phrase that has remained in common parlance, most likely due to the habit politicians have of selectively marshalling statistics in support of their policy decisions, the idea being to give them a veneer of mathematical rigour.

This habit led, over time, to an acute public distrust of government statistics; so acute that in 2007 Parliament passed an act making the Office for National Statistics independent of government.

This is why I fear that my attempt here to show why statistics matter, and why they are in fact very reliable, could backfire spectacularly.

In order to make that case we must turn to the question of how we know that a structure designed by an engineer will not fall down. It is of course self-evident that some structures do fail; however, before explaining why that is, we must first explain why most of the time they don't. We are not talking here about situations where a design has been created in error; that cannot fairly be attributed to statistics.

It would of course be possible to build a structure and then test it under the desired load; however, this is not entirely practical. If the structure fails the test then a rather significant amount of time and material will have been wasted in building it. Something more reliable is therefore required.

A second option would be to proof test each of the components that are used to fabricate a given structure. This was in fact the method that was sometimes used, particularly for large or novel civil engineering structures. This has the advantage of not destroying the whole structure if a test specimen fails.

That said, it is still not a terribly efficient method of manufacture, owing to the time and expense of the testing regime, which is bespoke to the structure being constructed and cannot be re-used for different arrangements.

It might be possible to test a model of the whole structure; however, this approach has many difficulties. Firstly, most structural effects do not scale linearly, i.e. behaviour at small scale differs from that at large scale, so in all but the simplest cases the model will not represent real behaviour at full scale[1]. Secondly, even if we solved the scaling problem, we still could not be sure that the model would behave as the real structure does, because we have not tested the actual structure. This brings us back to where we started.

What is needed is a reliable method of predicting the point of failure irrespective of whether a structure is a test model or the real thing.

Suppose that we required iron chain links, to be used in a suspension bridge, to carry 4 tonnes of load. If we tested a link and found that it could resist 1 tonne, then we might conclude that 4 links are required.



The difficulty is that if we did another test we are likely to get a different answer; perhaps less than 1 tonne; perhaps more. We could conduct 10 tests and each time we would very likely get a different answer. The reason for this is the presence of small, perhaps imperceptible, differences in the manufacturing of each one, which are impossible to eliminate completely, even with modern technology. 
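The scatter described above is easy to sketch with a toy simulation. The numbers here are invented purely for illustration: a nominal capacity of 1 tonne and a small standard deviation standing in for the imperceptible manufacturing differences.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical illustration: each link's true capacity varies slightly
# around a nominal 1 tonne because of small manufacturing differences.
NOMINAL_CAPACITY_T = 1.0   # assumed nominal capacity, tonnes
SCATTER_T = 0.05           # assumed manufacturing scatter, tonnes

def test_link():
    """Simulate one destructive test of a chain link (capacity in tonnes)."""
    return random.gauss(NOMINAL_CAPACITY_T, SCATTER_T)

results = [test_link() for _ in range(10)]
print([round(r, 3) for r in results])
# Ten tests, ten different answers: some below 1 tonne, some above.
```

Each run of ten tests gives ten slightly different capacities, just as a real testing programme would.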

The question therefore becomes, which of the test results should be used for the design?

Some would say the average figure should be used; however, it does not take long to work out why this cannot be correct. It is self-evident that half of the links will have a capacity less than that assumed in the design. That cannot possibly be a safe outcome.

Another idea would be to adopt the lowest value. Surely, by definition, this option must produce a safe, if cautious, result. If, however, we think a little further, we might ask how we know that any of the 10 tests we have conducted thus far represents the lowest result we could get. After all, what if we conduct test number 11 and the result turns out to be lower than the lowest we had obtained thus far?

This turns out to be a bigger problem than initially seemed to be the case, for how do we know that test 12 won’t be worse still? The question appears to recur infinitely. We need a better answer and that is where statistics comes in.
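The recurring problem can be made visible with the same toy simulation (again with invented numbers). Because the minimum of a larger sample can only stay the same or get worse, the "lowest value so far" keeps drifting downward as testing continues.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def capacity():
    """One simulated link test: assumed mean 1 t, scatter 0.05 t."""
    return random.gauss(1.0, 0.05)

# Simulate a long testing programme once, then look at the running minimum.
big = [capacity() for _ in range(10000)]

for n in (10, 100, 1000, 10000):
    print(f"lowest result after {n:>5} tests: {min(big[:n]):.3f} t")
# The observed minimum creeps downward as the number of tests grows;
# it is never a stable value to design against.
```

This is why no finite number of tests can, by itself, pin down a "worst case"; a better tool is needed.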

All materials, and components for that matter, are imbued with a curious property that is wholly reliable, completely unchangeable, and, it would seem, an intrinsic property of nature itself.

Despite its universal applicability, it is a deceptively simple concept to demonstrate. Suppose we started by plotting a bar chart [histogram] showing the distribution of our results. Along the horizontal axis we plot the capacity of the links we have tested, and on the vertical axis we plot the number of links achieving a given capacity.

As we plot the test results we will find that they start to cluster around the average with fewer results corresponding to capacities either much greater or much less than the average.

With still more tests the cluster will start to develop into a recognisable pattern resembling the shape of a bell, where the top of the bell is located at the average value. It will soon become clear that no matter how many new results are added the overall pattern of the results does not change; the bell is there to stay.  
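The emerging bell can be sketched as a crude text histogram, again using simulated results with made-up numbers rather than real test data.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is repeatable

# Simulate 500 link tests (assumed mean 1 t, scatter 0.05 t) and bin the
# results into 0.01-tonne-wide bins by rounding to two decimal places.
results = [random.gauss(1.0, 0.05) for _ in range(500)]
bins = Counter(round(r, 2) for r in results)

for cap in sorted(bins):
    print(f"{cap:.2f} t | {'#' * bins[cap]}")
# The bars cluster around the average and tail off on either side:
# the bell shape appears, and adding more results does not change it.
```

Run it with 500 tests or 50,000 and the same shape emerges; only the smoothness improves.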



For obvious reasons a curved line which bounds all of the plotted results is called a ‘bell curve’. Johann Carl Friedrich Gauss was the first person to describe such curves mathematically, which is why they are also known as Gaussian distributions.

Once it is possible to describe the curve mathematically, only a limited number of test results are needed to fit it. Having done so, the mathematics also gives us the ability to predict the probability that a given capacity will not be achieved; in other words, we can predict how likely the material or component is to fail. All we then need to do is decide what probability of failure is acceptable.

This is an interesting question for several reasons. Firstly, it is not entirely an engineering question: it requires a decision about what level of failure is tolerable, and that is unquestionably influenced by public perception of what ought to be. Public opinion can be a fickle thing, and often a contradictory one, especially when emotion is involved.

For example, if something has been done a particular way for many years, the public are loath to have it changed or restricted in any way. You are bound to hear someone say: “It has always been done this way and it has always performed perfectly well. Why do you feel the need to change it?”

In reality this is often not the case. What someone thinks represents the way something has always been done often turns out to be nothing of the sort. It is in fact an imitation of what has been done previously. In reality it has been changed over the years in many small imperceptible ways, but the cumulative effect of those changes has not always been appreciated and may one day be the cause of failure.

Conversely, if there has already been a major failure, particularly one that has caused loss of life, then the public demand will be for immediate change and for someone to be held to account. It will be said: “How did nobody foresee that this could happen? There must be either negligence or incompetence. This cannot be allowed to happen again; something must be done.”

In this way public opinion is largely based on momentum. We do not wish things to change until there has been a rude awakening that forces us to re-think and then we must have change and the status quo will not do.

One of the ways that engineers think about this problem is to consider both the likelihood and the consequences of failure i.e. how probable is it for something to go wrong and what would happen if it did? 

It is obvious that the failure of a nuclear power plant or a hydroelectric dam would have consequences far greater than the failure of a roof purlin in an agricultural barn. Quite reasonably one might wish the former examples to be less likely than the latter.

It is also obvious that it is not economically viable to design barns like power stations, nor is it aesthetically desirable to do so. It may seem curious to discuss aesthetics in this context; however, humans do place a value on art and culture. We do not wish to live in homes that look like nuclear bunkers, and we rarely purchase a car based wholly on the results of safety testing. Aesthetics are important to us.

Perhaps a more subtle example of likelihood and consequence is to re-consider our bridge links. Since we have determined that more than one link is required, it seems improbable that all of the links would be sub-standard, which opens up the intriguing possibility that the systemic probability of failure may be quite different from the probability of an individual component failing. This would of course rely on the remaining links being capable of sustaining the extra load arising from the failure of their neighbour. That introduces the concepts of redundancy and margins of safety, but those are for a different blog post.
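The gap between component and system probabilities can be illustrated with a toy calculation. The 1-in-100 chance of a sub-standard link and the assumption that the links fail independently are both invented for the sketch; a real bridge assessment would be far more careful.

```python
from math import comb

# Toy sketch, not a real bridge calculation: assume each of 4 links
# independently has a 1-in-100 chance of being sub-standard.
p = 0.01
n = 4

# If the system only fails when every link is sub-standard:
p_all = p ** n
print(f"All {n} links sub-standard: {p_all:.2e}")

# If the remaining links can carry a neighbour's share, the system only
# fails when at least two links are sub-standard (binomial probability):
p_two_or_more = sum(
    comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2, n + 1)
)
print(f"At least 2 of {n} links sub-standard: {p_two_or_more:.2e}")
```

Even in this crude sketch the system probabilities are orders of magnitude smaller than the 1-in-100 component figure, which is the point: redundancy changes the arithmetic entirely.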

For now we are simply concerned with the bell curve and its ability to predict the probability of failure in a reliable way, based on a finite set of test results. It gives engineers the ability and the freedom to make reasoned judgements about cause and effect, however it does not provide engineers with an answer to the question of what an acceptable failure is.

This is of course the reason why some failures do happen; it is simply because they have been deemed to be acceptable, perhaps on the basis of a more robust structure being too expensive when compared to the perceived benefits. However, as we have learned, what is acceptable is not a fixed thing and may change with public perception.

A good example might be the failure of a flood prevention barrier shortly after it is completed. Presumably people chose to live close to the water, because they like the view and they do not wish to live behind a large wall that obscures said view. 

That being said, those same residents, when flooded, are not particularly interested in hearing that, given the magnitude of the flood event that occurred, failure of the barrier had been judged tolerable.

Statistics are therefore an eminently useful thing, which can inform our decisions but cannot make tough choices on our behalf. We must decide what our appetite for risk is, based on many subjective factors. Whether they know it or not, this is the dilemma faced by all politicians, and it is probably what tempts some of them to misuse statistics. Like anyone else, they do not wish to be held responsible for a decision that might be criticised in retrospect. They would rather believe that statistics can absolve them of that responsibility.

Unfortunately statistics can only ever tell us the probability of an event and not whether said event is good or bad. Perhaps Mark Twain should have said:

“There are three kinds of lies: lies, damned lies, and self-deception.”




[1] I am aware that there was a period when the testing of models was adopted, but this is a rather complicated subject involving some interesting mathematics, and is perhaps a topic for another time.

 
