Tag Archives: statistics

Near-Poisson statistics: how many police – firemen for a small city?

In a previous post, I dealt with the nearly-normal statistics of common things, like river crests, and explained why 100 year floods come more often than once every hundred years. As is not uncommon, the data was sort-of like a normal distribution, but deviated at the tail (the fantastic tail of the abnormal distribution). But now I’d like to present my take on a sort of statistics that (I think) should be used for the common problem of uncommon events: car crashes, fires, epidemics, wars…

Normally the mathematics used for these processes is Poisson statistics, and occasionally exponential statistics. I think these approaches lead to incorrect conclusions when applied to real-world cases of interest, e.g. choosing the size of a police force or fire department of a small town that rarely sees any crime or fire. This is relevant to Oak Park Michigan (where I live). I’ll show you how it’s treated by Poisson, and will then suggest a simpler way that’s more relevant.

First, consider an idealized version of Oak Park, Michigan (a semi-true version until the 1980s): the town had a small police department and a small fire department that saw only occasional crimes or fires, all of which required only 2 or 4 people respectively. Let’s imagine that the likelihood of having one small fire at a given time is x = 5%, and that of having a violent crime is y = 5% (it was 6% in 2011). A police department will need to have 2 policemen on call at all times, but will want 4 on the 0.25% chance that there are two simultaneous crimes (.05 x .05 = .0025); the fire department will want 8 souls on call at all times for the same reason. Either department will spend the other 95% of its officers’ time on training, paperwork, investigations of less-immediate cases, care of equipment, and visiting schools, but this number on call is needed for immediate response. As there are 8760 hours per year and the police and fire workers only work 2000 hours, you’ll need at least 4.4 times this many officers. We’ll add some more for administration and sick-day relief, and predict a total staff of 20 police and 40 firemen. This is, more or less, what it was in the 1980s.

If each fire or violent crime took 3 hours (1/8 of a day), you’ll find that the entire on-call staff was busy 7.3 times per year (8x365x.0025 = 7.3), or a bit more since there is likely a seasonal effect, and since fires and violent crimes don’t fall into neat time slots. Having 3 fires or violent crimes simultaneously was very rare — and for those rare times, you could call on nearby communities, or do triage.
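The arithmetic of the last two paragraphs can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and it assumes, as the text does, that events are independent and fall into neat 3-hour slots.

```python
# Idealized Oak Park numbers from the text; variable names are illustrative.
p_crime = 0.05             # chance a violent crime is in progress at a given time
hours_per_year = 8760      # 24 * 365
hours_worked = 2000        # hours one officer works per year
slots_per_year = 8 * 365   # number of 3-hour slots in a year

# Chance of two simultaneous crimes, assuming the events are independent.
p_two = p_crime * p_crime

# Officers needed to keep one on-call position filled around the clock.
coverage = hours_per_year / hours_worked

# Expected number of 3-hour slots per year when the whole on-call staff is busy.
busy_slots = slots_per_year * p_two

print(f"two at once: {p_two:.4f}")
print(f"coverage factor: {coverage:.2f}")
print(f"fully busy slots per year: {busy_slots:.1f}")
```

The coverage factor of about 4.4 is before any allowance for administration or sick-day relief.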

In response to austerity (towns always overspend in the good times, and come up short later), Oak Park realized it could use fewer employees if they combined the police and fire departments into an entity renamed “Public safety.” With 45-55 employees assigned to combined police / fire duty they’d still be able to handle the few violent crimes and fires. The sum of these events occurs 10% of the time, and we can apply the sort of statistics above to suggest that about 91% of the time there will be neither a fire nor violent crime; about 9% of the time there will be one or more fires or violent crimes (there is a 5% chance for each, but also a chance that 2 happen simultaneously). At least two events will occur 0.9% of the time (2 fires, 2 crimes or one of each), and they will have 3 or more events .09% of the time, or twice per year. The combined force allowed fewer responders since it was only rarely that 4 events happened simultaneously, and some of those were 4 crimes or 3 crimes and a fire — events that needed fewer responders. Your only real worry was when you have 3 fires, something that should happen every 3 years, or so, an acceptable risk at the time.

Before going to what caused this model of police and fire service to break down as Oak Park got bigger, I should explain Poisson statistics, exponential statistics, and power law / fractal statistics. The only type of statistics taught for dealing with crime like this is Poisson statistics, a type that works well when the events happen so suddenly and pass so briefly that we can claim to be interested in only how often we will see multiples of them in a period of time. The Poisson distribution formula is $P = r^k e^{-r} / k!$, where P is the probability of having some number of events, r is the total number of events divided by the total number of periods, and k is the number of events we are interested in.

Using the data above for a period-time of 3 hours, we can say that r = .1, and the likelihoods of zero, one, or two events beginning in the 3 hour period are 90.4%, 9.04% and 0.45%. These numbers are reasonable in terms of when events happen, but they are irrelevant to the problem anyone is really interested in: what resources are needed to come to the aid of the victims. That’s the problem with Poisson statistics: it treats something that no one cares about (when the things start), and under-predicts the important things, like how often you’ll have multiple events in progress. For 4 events, Poisson statistics predicts it happens only .00037% of the time; true enough, but irrelevant in terms of how often multiple teams are needed out on the job. We need four teams no matter if the 4 events began in a single 3 hour period or in close succession in two adjoining periods. The events take time to deal with, and the time overlaps.
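The Poisson percentages quoted above can be reproduced with a short sketch. It assumes the standard form of the Poisson formula, r^k e^(-r) / k!, and the function name `poisson` is mine.

```python
import math

def poisson(k, r):
    """Probability that exactly k events begin in one period, given a mean of r."""
    return r**k * math.exp(-r) / math.factorial(k)

r = 0.1  # average number of events per 3-hour period
for k in (0, 1, 2, 4):
    print(f"k={k}: {100 * poisson(k, r):.5f}%")
```

Note that this counts only events that start in the same period; it says nothing about events still in progress from an earlier period, which is the objection raised above.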

The way I’d dealt with these events, above, suggests a power law approach. In this case, each likelihood was 1/10 the previous, and the probability is $P = 0.9 \times 10^{-k}$. This is called power law statistics. I’ve never seen it taught, though it appears very briefly in Wikipedia. Those who like math can re-write the above relation as $\log_{10} P = \log_{10} 0.9 - k$.

One can generalize the above so that, for example, the decay rate can be 1/8 and not 1/10 (that is, the chance of having k+1 events is 1/8 that of having k events). In this case, we could say that $P = \frac{7}{8} \times 8^{-k}$, or more generally that $\log_{10} P = \log_{10} A - k\beta$. Here k is the number of teams required at any time, β is a free variable, and $A = 1 - 10^{-\beta}$ because the sum of all probabilities has to equal 100%.
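A quick numerical check of the normalization, as a sketch: it assumes the pre-factor is A = 1 - 10^(-β), which is what makes the probabilities over k = 0, 1, 2, ... sum to 100%. The function name `power_law` is mine.

```python
import math

def power_law(k, beta=1.0):
    """P(k) = A * 10**(-beta * k), with A chosen so all probabilities sum to 1."""
    a = 1 - 10 ** (-beta)
    return a * 10 ** (-beta * k)

# Decay of 1/10 per extra event (beta = 1): P(0) = 0.9, P(1) = 0.09, ...
tenth = [power_law(k) for k in range(4)]

# Decay of 1/8 per extra event corresponds to beta = log10(8), so P(0) = 7/8.
beta_8 = math.log10(8)
eighth = [power_law(k, beta_8) for k in range(4)]

# The probabilities should add up to 1 (the sum is truncated at k = 199).
total = sum(power_law(k, beta_8) for k in range(200))
print(tenth, eighth, total)
```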

In college math, when behaviors like this appear, they are incorrectly translated into differential form to create “exponential statistics.” One begins by saying $\partial P/\partial k = -\beta P$, where β = .9 as before, or remains some free-floating term. Everything looks fine until we integrate and set the total to 100%. We find that $P = \frac{1}{\lambda} e^{-k\lambda}$ for k ≥ 0. This looks the same as before except that the pre-exponential always comes out wrong. In the above, the chance of having 0 events turns out to be 111%. Exponential statistics has the advantage (or disadvantage) that we find a non-zero possibility of having 1/100 of a fire, or 3.14159 crimes at a given time. We assign excessive likelihoods to fractional events and end up predicting artificially low likelihoods for the discrete events we are interested in; there is no remedy except going away from a calculus that assumes continuity in a world where there is none. Discrete math is better than calculus here.

I now wish to generalize the power law statistics, to something similar but more robust. I’ll call my development fractal statistics (there’s already a section called fractal statistics on Wikipedia, but it’s really power-law statistics; mine will be different). Fractals were championed by Benoit B. Mandelbrot (whose middle initial, according to the old joke, stood for Benoit B. Mandelbrot). Many random processes look fractal, e.g. the stock market. Before going here, I’d like to recall that the motivation for all this is figuring out how many people to hire for a police/fire force; we are not interested in any other irrelevant factoid, like how many calls of a certain type come in during a period of time.

To choose the size of the force, let’s estimate how many times per year some number of people are needed simultaneously now that the city has bigger buildings and is seeing a few larger fires and crimes. Let’s assume that the larger fires and crimes occur only .05% of the time but might require 15 officers or more. Being prepared for even one event of this size will require expanding the force to about 80 men, 50% more than we have today, but we find that this expansion isn’t enough to cover the 0.000025% of the time (.0005 x .0005) when we will have two such major events simultaneously. That would require a 160 man fire-squad, and we still could not deal with two major fires and a simultaneous assault, or with a strike, or a lot of people who take sick at the same time.

To treat this situation mathematically, we’ll say that the number of times per year that a certain number of people are needed relates to the number of people based on a simple modification of the power law statistics. Thus: $\log_{10} N = A - \beta\theta$, where A and β are constants, N is the number of times per year that some number of officers are needed, and θ is the number of officers needed. To solve for the constants, plot the experimental values on a semi-log scale, and find the best straight line: -β is the slope and A is the intercept. If the line is really straight, you are now done, and I would say that the fractal order is 1. But from the above discussion, I don’t expect this line to be straight. Rather I expect it to curve upward at high θ: there will be a tail where you require a higher number of officers. One might be tempted to modify the above by adding an extra term, but this will cause problems at very high θ. Thus, I’d suggest a fractal fix.
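The straight-line fit on a semi-log scale can also be done numerically instead of by eye. The sketch below uses ordinary least squares as a stand-in for “the best straight line”; the data values are HYPOTHETICAL, made up only to show the mechanics, and the helper name `fit_line` is mine.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# HYPOTHETICAL data: officers needed at once (theta) and times per year (N).
theta = [2, 4, 6, 8, 10]
times_per_year = [300.0, 30.0, 3.0, 0.3, 0.03]

# Fit log10(N) against theta: the intercept is A and the slope is -beta.
A, slope = fit_line(theta, [math.log10(n) for n in times_per_year])
beta = -slope
print(f"A = {A:.3f}, beta = {beta:.3f}")
```

With real data the points will scatter, and the interesting question is whether they bend away from the fitted line at high θ.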

My fractal modification of the equation above is the following: $\log_{10} N = A - \beta\theta^{w}$, where A and β are similar to the power law coefficients and w is the fractal order of the decay, a coefficient that I expect to be slightly less than 1. To solve for the coefficients, pick a value of w, and find the best fits for A and β as before. The right value of w is the one that results in the straightest line fit. The equation above does not look quite like anything I’ve seen, or anything like the one shown in Wikipedia under the heading of fractal statistics, but I believe it to be correct, or at least useful.
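The search for w can be sketched in code. Two assumptions here are mine and not from the text: the equation is read as log10 N = A - β θ^w (so that w = 1 gives back the straight-line case), and “straightest line” is measured by the squared correlation coefficient between θ^w and log10 N. The data are HYPOTHETICAL, generated from known coefficients so the search can be seen to recover them.

```python
def r_squared(xs, ys):
    """Squared correlation coefficient; 1.0 means a perfectly straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)

def best_order(theta, log_n, candidates):
    """Return the candidate w for which log_n versus theta**w is straightest."""
    return max(candidates,
               key=lambda w: r_squared([t ** w for t in theta], log_n))

# HYPOTHETICAL data generated with A = 3, beta = 0.6, w = 0.8.
theta = [2, 4, 8, 15, 30]
log_n = [3 - 0.6 * t ** 0.8 for t in theta]

candidates = [round(0.5 + 0.05 * i, 2) for i in range(11)]  # 0.50, 0.55, ..., 1.00
w = best_order(theta, log_n, candidates)
print(f"straightest line at w = {w}")
```

Once w is chosen, A and β follow from an ordinary straight-line fit of log10 N against θ^w.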

To treat this politically is more difficult than treating it mathematically. I suspect we will have to combine our police and fire department with those of surrounding towns, and this will likely require our city to revert to a pure police department and a pure fire department. We can’t expect other cities’ specialists to work with our generalists particularly well. It may also mean payments to other cities, plus (perhaps) standardizing salaries and staffing. This should save money for Oak Park and should provide better service as specialists tend to do their jobs better than generalists (they also tend to be safer). But the change goes against the desire (need) of our local politicians to hand out favors of money and jobs to their friends. Keeping a non-specialized force costs lives as well as money, but that doesn’t mean we’re likely to change soon.

Robert E. Buxbaum  December 6, 2013. My two previous posts are on how to climb a ladder safely, and on the relationship between mustaches in WWII: mustache men do things, and those with similar mustache styles get along best.

The 2013 hurricane drought

News about the bad weather that didn’t happen: there were no major hurricanes in 2013. That is, there was not one storm in the Atlantic Ocean, the Caribbean Sea, or the Gulf of Mexico with a maximum wind speed over 110 mph. None. As I write this, we are near the end of the hurricane season (it officially ends Nov. 30), and we have seen nothing like what we saw in 2012; compare the top and bottom charts below. Barring a very late, very major storm, this looks like it will go down as the most uneventful season in at least 2 decades. Our monitoring equipment has improved over the years, but even with improved detection, we’ve seen nothing major. The last time we saw this lack was 1994 — and before that 1986, 1972, and 1968.

Hurricanes 2012 -2013. This year looks like it will be the one with the lowest number and strength of modern times.

Hurricanes 2012-2013. This year there were only two hurricanes, and both were category 1. The last time we had this few was 1994. By comparison, in 2012 we saw 5 category 1 hurricanes, 3 category 2s, and 2 category 3s including Sandy, the most destructive hurricane to hit New York City since 1938.

In the Pacific, major storms are called typhoons, and this year has been fairly typical: 13 typhoons, 5 of them super, the same as in 2012. Weather tends to be chaotic, but it’s nice to have a year without major hurricane damage or death.


In the news, a lack of major storms led to the lack of destruction of the boats, beaches, and stately homes of the North Carolina shore.

The reason you have not heard of this before is that it’s hard to write a story about events that didn’t happen. Good news is as important as bad, and 2013 had been predicted to be one of the worst seasons on record, but then it didn’t happen and there was nothing to write about. Global warming is supposed to increase hurricane activity, but global warming has taken a 16 year rest. You didn’t hear about the lack of global warming for the same reason you didn’t hear about the lack of storms.

Here’s why hurricanes form in fall and spin so fast, plus how they pick up stuff (an explanation from Einstein). In other good weather news, the ozone hole is smaller, and arctic ice is growing (I suggest we build a northwest passage). It’s hard to write about the lack of bad news; still, good science requires an open mind to the data, as it is, or as it isn’t. Here is a simple way to do abnormal statistics, plus why 100 year storms come more often than once every 100 years.

Robert E. Buxbaum. November 23, 2013.

Ab Normal Statistics and joke

The normal distribution of observation data looks sort of like a ghost. A Distribution  that really looks like a ghost is scary.


It’s funny because …. the normal distribution curve looks sort-of like a ghost. It’s also funny because it would be possible to imagine data being distributed like the ghost, and most people would be totally clue-less as to how to deal with data like that — abnormal statistics. They’d find it scary and would likely try to ignore the problem. When faced with a statistics problem, most people just hope that the data is normal; they then use standard mathematical methods with a calculator or simulation package and hope for the best.

Take the following example: you’re interested in buying a house near a river. You’d like to analyze river flood data to know your risks. How high will the river rise in 100 years, or 1000? Or perhaps you would like to analyze wind data to know how strong to make a sculpture so it does not blow down. Your first thought is to use the normal distribution math in your college statistics book. This looks awfully daunting (it doesn’t have to) and may be wrong, but it’s all you’ve got.

The normal distribution graph is considered normal, in part, because it’s fairly common to find that measured data deviates from the average in this way. Also, this distribution can be derived from the mathematics of an idealized view of the world, where any variety derives from multiple small errors around a common norm, and not from some single, giant issue. It’s not clear this is a realistic assumption in most cases, but it is comforting. I’ll show you how to do the common math as it’s normally done, and then how to do it better and quicker with no math at all, and without those assumptions.

Let’s say you want to know the hundred-year maximum flood-height of a river near your house. You don’t want to wait 100 years, so you measure the maximum flood height every year over five years, say, and use statistics. Let’s say you measure 8 feet, 6 feet, 3 feet (a drought year), 5 feet, and 7 feet.

The “normal” approach (pardon the pun), is to take a quick look at the data, and see that it is sort-of normal (many people don’t bother). One now takes the average, calculated here as (8+6+3+5+7)/5 = 5.8 feet. About half the time the flood waters should be higher than this (a good researcher would check this, many do not). You now calculate the standard deviation for your data, a measure of the width of the ghost, generally using a spreadsheet. The formula for standard deviation of a sample is $s = \sqrt{[(8-5.8)^2 + (6-5.8)^2 + (3-5.8)^2 + (5-5.8)^2 + (7-5.8)^2]/4} = 1.92$. The use of 4 here in the denominator instead of 5 is called Bessel’s correction; it reflects the fact that a standard deviation is meaningless if there is only one data point.

For normal data, the one hundred year maximum height of the river (the 1% maximum) is the average height plus 2.2 times the deviation; in this case, 5.8 + 2.2 x 1.92 = 10.0 feet. If your house is any higher than this you should expect few troubles in a century. But is this confidence warranted? You could build on stilts or further from the river, but you don’t want to go too far. How far is too far?
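The same calculation can be done without a spreadsheet, using Python’s standard `statistics` module, whose `stdev` function already divides by n - 1. This is a sketch that keeps the text’s multiplier of 2.2; for comparison it also prints the one-sided 99% multiplier that the standard library computes for a normal distribution, which is a little larger.

```python
from statistics import NormalDist, mean, stdev

heights = [8, 6, 3, 5, 7]   # yearly maximum flood heights, in feet

avg = mean(heights)          # the average height
s = stdev(heights)           # sample standard deviation (divides by n - 1)

# The text's estimate of the 100-year flood: average plus 2.2 deviations.
flood_100 = avg + 2.2 * s

# For comparison: the z-score that leaves 1% in the upper tail of a normal curve.
z_99 = NormalDist().inv_cdf(0.99)

print(f"average = {avg:.1f} ft, s = {s:.2f} ft")
print(f"100-year estimate with 2.2 sigma = {flood_100:.1f} ft")
print(f"one-sided 99% multiplier = {z_99:.2f}")
```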

So let’s do this better. We can, with less math, through the use of probability paper. As with any good science, we begin with data, not with assumptions such as that the data is normal. Arrange the river height data in a list from highest to lowest (or lowest to highest), and plot the values in this order on your probability paper as shown below. That is, on paper where likelihoods from .01% to 99.99% are arranged along the bottom (the x axis), and your other numbers, in this case the river heights, are the y values listed at the left. Graph paper of this sort is sold in university book stores; you can also get jpeg versions on line, but they don’t look as nice.

probability plot of maximum river height over 5 years -- looks reasonably normal, but slightly ghost-like.

Probability plot of the maximum river height over 5 years. If the data suggests a straight line, as it does here, the data is reasonably normal. Extrapolating to 99% suggests the 100 year flood height would be 9.5 to 10.2 feet, and that it is 99.99% unlikely to reach 11 feet. That’s once in 10,000 years, other things being equal.

For the x axis values of the 5 data points above, I’ve taken the likelihood to be the middle of its percentile. Since there are 5 data points, each point is taken to represent its own 20 percentile; the middles appear at 10%, 30%, 50%, etc. I’ve plotted the highest value (8 feet) at the 10% point on the x axis, that being the middle of the upper 20%. I then plotted the second highest (7 feet) at 30%, the middle of the second 20%; the third, 6 feet, at 50%; the fourth, 5 feet, at 70%; and the drought year maximum (3 feet) at 90%. When done, I judge if a reasonably straight line would describe the data. In this case, a line through the data looks reasonably straight, suggesting a fairly normal distribution of river heights. I notice that, if anything, the heights drop off at the left, suggesting that really high river levels are less likely than normal. The points will also have to drop off at the right since a negative river height is impossible. Thus my river heights describe a version of the ghost distribution in the cartoon above. This is a welcome finding since it suggests that really high flood levels are unlikely. If the data were non-normal, curving the other way, we’d want to build our house higher than a normal distribution would suggest.

You can now find the 100 year flood height from the graph above without going through any of the math. Just draw your best line through the data, and look where it crosses the 1% value on your graph (that’s two major lines from the left in the graph above; you may have to expand your view to see the little 1% at top). My extrapolation suggests the hundred-year flood maximum will be somewhere between about 9.5 feet and 10.2 feet, depending on how I choose my line. This prediction is a little lower than we calculated above, and was done graphically, without the need for a spreadsheet or math. What’s more, our prediction is more accurate, since we were in a position to evaluate the normality of the data and thus able to fit the extrapolation line accordingly. There are several ways to handle extreme curvature in the line, but all involve fitting the curve some way. Most weather data is curved, e.g. normal against a fractal, I think, and this affects your predictions. You might expect to have an ice age in 10,000 years.
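For readers without probability paper, the plotting positions and the extrapolation can be imitated in a few lines. This is a rough sketch, not the graphical method itself: it assumes that the x axis of probability paper is spaced by normal quantiles, and it uses a least-squares line as a stand-in for a line drawn by eye, so its answer may differ somewhat from a by-eye reading that gives less weight to the end points.

```python
from statistics import NormalDist

heights = sorted([8, 6, 3, 5, 7], reverse=True)   # highest flood first
n = len(heights)
std_normal = NormalDist()

# Middle of each point's 20-percentile slot: 10%, 30%, ..., 90% exceedance.
exceedance = [(i + 0.5) / n for i in range(n)]

# Convert each exceedance likelihood to its position on the normal-quantile axis.
z = [std_normal.inv_cdf(1 - p) for p in exceedance]

# Least-squares line of height against z (a stand-in for the by-eye line).
mean_z = sum(z) / n
mean_h = sum(heights) / n
slope = (sum((a - mean_z) * (h - mean_h) for a, h in zip(z, heights))
         / sum((a - mean_z) ** 2 for a in z))
intercept = mean_h - slope * mean_z

# Read the line at the 1% exceedance point: the 100-year flood height.
flood_100 = intercept + slope * std_normal.inv_cdf(0.99)
print(f"100-year flood height from the fitted line: {flood_100:.1f} ft")
```

Judging whether the points curve away from the line is still best done by looking at the plot.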

The standard deviation we calculated above is related to a quality standard called six sigma, something you may have heard of. If we had a lot of parts we were making, for example, we might expect to find that the size deviation varies from a target according to a normal distribution. We call this variation σ, the Greek version of s. If your production is such that the upper spec is 2.2 standard deviations from the norm, 99% of your product will be within spec; good, but not great. If you’ve got six sigmas, there is only about a one-in-a-billion chance of missing the spec, other things being equal. Some companies (like Starbucks) aim for this low variation, a six sigma confidence of being within spec. That is, they aim for total product uniformity in the belief that uniformity is the same as quality. There are several problems with this thinking, in my opinion. The average is rarely an optimum, and you want to have a rational theory for acceptable variation boundaries. Still, uniformity is a popular metric in quality management, and companies that use it are better off than those that do nothing. At REB Research, we like to employ the quality methods of W. Edwards Deming; we assume non-normality and aim for an optimum (that’s subject matter for a further essay). If you want help with statistics, or a quality engineering project, contact us.
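The out-of-spec fractions for a given number of sigmas can be computed directly, assuming the data really is normal and that only the upper spec matters. The function name `upper_tail` is mine; it uses the complementary error function from Python’s `math` module.

```python
import math

def upper_tail(z):
    """Fraction of a normal distribution lying more than z sigmas above the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for z in (2.2, 6.0):
    p = upper_tail(z)
    print(f"{z} sigma: {p:.3g} of product above the upper spec")
```

At 2.2 sigma this comes to a bit over 1% out of spec, and at six sigma to roughly one part in a billion, other things being equal.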

I’ve also meant to write about the phrase “other things being equal”, Ceteris paribus in Latin. All this math only makes sense so long as the general parameters don’t change much. Your home won’t flood so long as they don’t build a new mall up river from you with runoff in the river, and so long as the dam doesn’t break. If these are concerns (and they should be) you still need to use statistics and probability paper, but you will now have to use other data, like on the likelihood of malls going up, or of dams breaking. When you input this other data, you will find the probability curve is not normal, but typically has a long tail (when the dam breaks, the water goes up by a lot). That’s outside of standard statistical analysis, but it’s why those hundred year floods come a lot more often than once in 100 years. I’ve noticed that, even at Starbucks, more than 1/1,000,000,000 cups of coffee come out wrong. Even in analyzing a common snafu like this, you still use probability paper, though. It may be “situation normal”, but the distribution curve it describes has an abnormal tail.

by Dr. Robert E. Buxbaum, November 6, 2013. This is my second statistics post/ joke, by the way. The first one dealt with bombs on airplanes — well, take a look.

Why random experimental design is better

In a previous post I claimed that, to do good research, you want to arrange experiments so there is no pre-hypothesis of how the results will turn out. As the post was long, I said nothing direct on how such experiments should be organized, but only alluded to my preference: experiments should be organized at randomly chosen conditions within the area of interest. The alternative, shown below, is that experiments should be done at the cardinal points in the space, or at corner extremes: the Wilson Box and Taguchi design of experiments (DoE), respectively. Doing experiments at these points implies a sort of expectation of the outcome, generally that results will be linearly, orthogonally related to causes; in such cases, the extreme values are the most telling. Sorry to say, this usually isn’t how experimental data will fall out.

First experimental test points according to a Wilson Box, a Taguchi, and a random experimental design. The Wilson box and Taguchi are OK choices if you know or suspect that there are no significant non-linear interactions, and where experiments can be done at these extreme points. Random is the way nature works; and I suspect that’s best — it’s certainly easiest.

The first test-points for experiments according to the Wilson Box method and Taguchi method of experimental designs are shown on the left and center of the figure above, along with a randomly chosen set of experimental conditions on the right. Taguchi experiments are the most popular choice nowadays, especially in Japan, but as Taguchi himself points out, this approach works best if there are “few interactions between variables, and if only a few variables contribute significantly.” Wilson Box experimental choices help if there is a parabolic effect from at least one parameter, but are fairly unsuited to cases with strong cross-interactions.

Perhaps the main problems with doing experiments at extreme or cardinal points are that these experiments are usually harder than at random points, and that the results from these difficult tests generally tell you nothing you didn’t know or suspect from the start. The minimum concentration is usually zero, and the minimum temperature is usually one where reactions are too slow to matter. When you test at the minimum-minimum point, you expect to find nothing, and generally that’s what you find. In the data sets shown above, it will not be uncommon that the two minimum W-B data points, and the 3 minimum Taguchi data points, will show no measurable result at all.

Randomly selected experimental conditions are the experimental equivalent of Monte Carlo simulation, and random selection is the method evolution uses. Set out the space of possible compositions, morphologies and test conditions as with the other methods, and perhaps plot them on graph paper. Now, toss darts at the paper to pick a few compositions and sets of conditions to test, and do a few experiments. Because nature is rarely linear, you are likely to find better results and more interesting phenomena than at any of those at the extremes. After the first few experiments, when you think you understand how things work, you can pick experimental points that target an optimum extreme point, or that visit a more-interesting or representative survey of the possibilities. In any case, you’ll quickly get a sense of how things work, and how successful the experimental program will be. If nothing works at all, you may want to cancel the program early; if things work really well, you’ll want to expand it. With random experimental points you do fewer worthless experiments, and you can easily increase or decrease the number of experiments in the program as funding and time allow.
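Tossing darts at graph paper can be imitated with a random number generator. In the sketch below the variable names and ranges are HYPOTHETICAL placeholders, and uniform sampling within each range is my assumption; any spread of easily do-able conditions would serve.

```python
import random

def random_design(ranges, n_points, seed=None):
    """Pick n_points experimental conditions uniformly at random within ranges."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(n_points)
    ]

# HYPOTHETICAL experimental space: two variables with do-able limits.
ranges = {
    "temperature_C": (20.0, 200.0),
    "concentration_pct": (0.0, 50.0),
}

points = random_design(ranges, n_points=5, seed=1)
for point in points:
    print({name: round(value, 1) for name, value in point.items()})
```

Adding experiments later is just a matter of drawing more points; nothing about the earlier points has to change.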

Consider the simple case of choosing a composition for gunpowder. The composition itself involves only 3 or 4 components, but there is also morphology to consider including the gross structure and fine structure (degree of grinding). Instead of picking experiments at the maximum compositions: 100% salt-peter, 0% salt-peter, grinding to sub-micron size, etc., as with Taguchi, a random methodology is to pick random, easily do-able conditions: 20% S and 40% salt-peter, say. These compositions will be easier to ignite, and the results are likely to be more relevant to the project goals.

The advantages of random testing get bigger the more variables and levels you need to test. Testing 9 variables at 3 levels each takes 27 Taguchi points, but only 16 or so if the experimental points are randomly chosen. To test if the behavior is linear, you can use the results from your first 7 or 8 randomly chosen experiments, derive the vector that gives the steepest improvement in n-dimensional space (a weighted sum of all the improvement vectors), and then do another experimental point that’s as far along in the direction of that vector as you think reasonable. If your result at this point is better than at any point you’ve visited, you’re well on your way to determining the conditions of optimal operation. That’s a lot faster than by starting with 27 hard-to-do experiments. What’s more, if you don’t find an optimum, congratulate yourself: you’ve just discovered a non-linear behavior, something that would be easy to overlook with Taguchi or Wilson Box methodologies.
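One way to read “a weighted sum of all the improvement vectors” is sketched below: for every pair of experiments, take the vector from one set of conditions to the other and weight it by the change in result. That reading is my assumption, as are the function names; the data are HYPOTHETICAL, with variables scaled to the range 0 to 1.

```python
import math
from itertools import combinations

def improvement_direction(points, results):
    """Unit vector formed by summing pairwise difference vectors, each weighted
    by the change in result. Fails if every experiment gave the same result."""
    dim = len(points[0])
    total = [0.0] * dim
    for (xi, yi), (xj, yj) in combinations(zip(points, results), 2):
        gain = yj - yi
        for d in range(dim):
            total[d] += gain * (xj[d] - xi[d])
    norm = math.sqrt(sum(t * t for t in total))
    return [t / norm for t in total]

def next_point(points, results, step):
    """Move a chosen distance from the best point so far, along the direction."""
    best = points[results.index(max(results))]
    direction = improvement_direction(points, results)
    return [b + step * d for b, d in zip(best, direction)]

# HYPOTHETICAL scaled conditions and measured results from four random experiments.
points = [(0.2, 0.7), (0.5, 0.1), (0.9, 0.4), (0.3, 0.3)]
results = [-0.3, 0.9, 1.4, 0.3]

direction = improvement_direction(points, results)
trial = next_point(points, results, step=0.2)
print(direction, trial)
```

If the result at the new trial point is not better than the best so far, that is the hint of non-linear behavior mentioned above.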

The basic idea is one Sherlock Holmes pointed out: “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts” (Scandal in Bohemia), and “Life is infinitely stranger than anything which the mind of man could invent” (Case of Identity).

Robert E. Buxbaum, September 11, 2013. A nice description of the Wilson Box method is presented in Perry’s Handbook (6th ed). Since I had trouble finding a free, on-line description, I linked to a paper by someone using it to test ingredient choices in baked bread. Here’s a link for more info about random experimental choice, from the University of Michigan, Chemical Engineering dept. Here’s a joke on the misuse of statistics, and a link regarding the Taguchi Methodology. Finally, here’s a pointless joke on irrational numbers, that I posted for pi-day.

The Scientific Method isn’t the method of scientists

A linchpin of middle school and high-school education is teaching ‘the scientific method.’ This is the method, students are led to believe, that scientists use to determine Truths, facts, and laws of nature. Scientists, students are told, start with a hypothesis of how things work or should work, they then devise a set of predictions based on deductive reasoning from these hypotheses, and perform some critical experiments to test the hypothesis and determine if it is true (experimentum crucis in Latin). Sorry to say, this is a path to error, and not the method that scientists use. The real method involves a few more steps, and follows a different order and path. It instead follows the path that Sherlock Holmes uses to crack a case.

The actual method of Holmes, and of science, is to avoid beginning with a hypothesis. Isaac Newton claimed: “I never make hypotheses.” Instead, as best we can tell, Newton, like most scientists, first gathered as much experimental evidence on a subject as possible before trying to concoct any explanation. As Holmes says (Study in Scarlet): “It is a capital mistake to theorize before you have all the evidence. It biases the judgment.”


Holmes barely tolerates those who hypothesize before they have all the data: “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” (Scandal in Bohemia).

Then there is the goal of science. It is not the goal of science to confirm some theory, model, or hypothesis; every theory probably has some limited area where it’s true. The goal for any real-life scientific investigation is the desire to explain something specific and out of the ordinary, or do something cool. Similarly, with Sherlock Holmes, the start of the investigation is the arrival of a client with a specific, unusual need, one that seems a bit outside of the normal routine. Similarly, the scientist wants to do something: build a bigger bridge, understand global warming, or how DNA directs genetics; make better gunpowder, cure a disease, or Rule the World (mad scientists favor this). Once there is a fixed goal, it is the goal that should direct the next steps: it directs the collection of data, and focuses the mind on the wide variety of types of solution. As Holmes says, “it’s wise to make one’s self aware of the potential existence of multiple hypotheses, so that one eventually may choose one that fits most or all of the facts as they become known.” It’s only when there is no goal that any path will do.

In gathering experimental data (evidence), most scientists spend months in the less-fashionable sections of the library, looking at the experimental methods and observations of others, generally from many countries, collecting any scrap that seems reasonably related to the goal at hand. I used 3 x 5″ cards to catalog this data and the references. From many books and articles, one extracts enough diversity of data to be able to look for patterns and to begin to apply inductive logic. “The little things are infinitely the most important” (Case of Identity). You have to look for patterns in the data you collect. Holmes does not explain how he looks for patterns, but this skill is innate in most people to a greater or lesser extent. A nice, set approach to inductive logic is called the Baconian Method; it would be nice to see schools teach it. If the author of a book or article is still alive, a scientist will try to contact him or her to clarify things. In every Sherlock Holmes mystery, Holmes does the same and is always rewarded. There is always some key fact or observation that this turns up: key information unknown to the original client.

Based on the facts collected one begins to create the framework for a variety of mathematical models: mathematics is always involved, but these models should be pretty flexible. Often the result is a tree of related, mathematical models, each highlighting some different issue, process, or problem. One then may begin to prune the tree, trying to fit the known data (facts and numbers collected), into a mathematical picture of relevant parts of this tree. There usually won’t be quite enough for a full picture, but a fair amount of progress can usually be had with the application of statistics, calculus, physics, and chemistry. These are the key skills one learns in college, but usually the high-schooler and middle schooler has not learned them very well at all. If they’ve learned math and physics, they’ve not learned it in a way to apply it to something new, quite yet (it helps to read the accounts of real scientists here — e.g. The Double Helix by J. Watson).

Usually one tries to do some experiments at this stage. Holmes might visit a ship or test a poison, and a scientist might go off to his equally smelly laboratory. The experiments done there are rarely experimenta crucis where one can say they’ve determined the truth of a single hypothesis. Rather, one wants to eliminate some hypotheses and collect data to be used to evaluate others. An answer generally requires that you have both a numerical expectation and that you’ve eliminated all reasonable explanations but one. As Holmes says often, e.g. in Sign of the Four, “when you have excluded the impossible, whatever remains, however improbable, must be the truth”. The middle part of a scientific investigation generally involves these practical experiments to prune the tree of possibilities and determine the coefficients of relevant terms in the mathematical model: the weight or capacity of a bridge of a certain design, the likely effect of CO2 on global temperature, the dose response of a drug, or the temperature and burn rate of different gunpowder mixes. Though not mentioned by Holmes, it is critically important in science to aim for observations that have numbers attached.

The destruction of false aspects and models is a very important part of any study. Francis Bacon calls this act destruction of idols of the mind, and it includes many parts: destroying commonly held presuppositions, avoiding personal preferences, avoiding the tendency to see a closer relationship than can be justified, etc.

In science, one eliminates the impossible through the use of numbers and math, generally based on your laboratory observations. When you attempt to fit the numbers associated with your observations to the various possible models, some will take the data well, some poorly, and some will not fit the data at all. Apply the deductive reasoning that is taught in schools: logical, Boolean, step by step; if some aspect of a model does not fit, it is likely the model is wrong. If we have shown that all men are mortal, and we are comfortable that Socrates is a man, then it is far better to conclude that Socrates is mortal than to conclude that all men but Socrates are mortal (Occam’s razor). This is the sort of reasoning that computers are really good at (better than humans, actually). It all rests on the inductive pattern searches (similarities and differences) that we started with, and very often we find we are missing a piece, e.g. we still need to determine that all men are indeed mortal, or that Socrates is a man. It’s back to the lab; this is why PhDs often take 5-6 years, and not the 3-4 that one hopes for at the start.
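As a rough illustration of pruning models by how well they take the data, here is a minimal Python sketch. The measurements, the three candidate models, and the 0.3 tolerance are all made up for the example; they are assumptions, not anything from a real experiment.

```python
from statistics import mean

# Made-up measurements for illustration only (x = input, y = observed response).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

def fit_constant(xs, ys):
    """Model y = c: the least-squares c is the mean of y."""
    c = mean(ys)
    return lambda x: c

def fit_proportional(xs, ys):
    """Model y = a*x: least-squares slope for a line through the origin."""
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: a * x

def fit_linear(xs, ys):
    """Model y = a*x + b: ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def rms_residual(model, xs, ys):
    """Root-mean-square misfit between a fitted model and the data."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

candidates = {
    "constant": fit_constant,
    "proportional": fit_proportional,
    "linear": fit_linear,
}
tolerance = 0.3  # assumed measurement error; a model that misses by more is pruned

results = {name: rms_residual(fit(xs, ys), xs, ys) for name, fit in candidates.items()}
surviving = [name for name, r in results.items() if r <= tolerance]

for name, r in results.items():
    verdict = "keep" if r <= tolerance else "prune"
    print(f"{name:12s} rms residual = {r:.3f}  {verdict}")
```

With these invented numbers only the straight-line model survives; with real data, more than one model often survives, and that is the signal to go back to the lab.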

More often than not we find we have a theory or two (or three), but not quite all the pieces in place to get to our goal (whatever that was), but at least there’s a clearer path, and often more than one. Since science is goal oriented, we’re likely to find a more efficient path than we first thought. E.g. instead of proving that all men are mortal, show it to be true of Greek men, that is, for all two-legged, fairly hairless beings who speak Greek. All we must show is that few Greeks live beyond 130 years, and that Socrates is a Greek.

Putting numerical values on the mathematical relationship is a critical step in all science, as is the use of models, mathematical and otherwise. The path to measure the life expectancy of Greeks will generally involve looking at a sample population. A scientist calls this a model. He will analyze this model using a statistical model of average and standard deviation and will derive his or her conclusions from there. It is only now that you have a hypothesis, but it’s still based on a model. In health experiments the model is typically a sample of animals (experiments on people are often illegal and take too long). For bridge experiments one uses small wood or metal models; and for chemical experiments, one uses small samples. Numbers and ratios are the key to making these models relevant in the real world. A hypothesis of this sort, backed by numbers, is publishable, and is as far as you can go when dealing with the past (e.g. why Germany lost WW2, or why the dinosaurs died off), but the gold standard of science is predictability. Thus, while we are confident that Socrates is definitely mortal, we’re not 100% certain that global warming is real; in fact, it seems to have stopped though CO2 levels are rising. To be 100% sure you’re right about global warming we have to make predictions, e.g. that the temperature will have risen 7 degrees in the last 14 years (it has not), or Al Gore’s prediction that the sea will rise 8 meters by 2106 (this seems unlikely at the current time). This is not to blame the scientists whose predictions don’t pan out: “We balance probabilities and choose the most likely. It is the scientific use of the imagination” (Hound of the Baskervilles). The hope is that everything matches; but sometimes we must look for an alternative; that’s happened rarely in my research, but it’s happened.
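A minimal sketch of the average-and-standard-deviation step, in Python. The lifespans below are invented for illustration (they are not real data on Greeks), and the variable names are my own.

```python
from statistics import mean, stdev

# Made-up lifespans (years) for a hypothetical sample of Greeks; not real data.
lifespans = [62, 71, 55, 80, 68, 74, 59, 77, 66, 70]

avg = mean(lifespans)
sd = stdev(lifespans)  # sample standard deviation (n - 1 in the denominator)

# How many standard deviations above the sample average is a 130-year lifespan?
z_130 = (130 - avg) / sd

print(f"average lifespan   = {avg:.1f} years")
print(f"standard deviation = {sd:.1f} years")
print(f"130 years is {z_130:.1f} standard deviations above the average")
```

A large number of standard deviations suggests that living past 130 is very rare in the sample population, but turning that into a probability assumes the tail behaves like a normal distribution, which real data often does not.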

You are now at the conclusion of the scientific process. In fiction, this is where the criminal is led away in chains (or not, as with “The Woman,” “The Adventure of the Yellow Face,” or “The Blue Carbuncle,” where Holmes lets the criminal go free: “It’s Christmas”). For most research the conclusion includes writing a good research paper: “Nothing clears up a case so much as stating it to another person” (Memoirs). For a PhD, this is followed by the search for a good job. For a commercial researcher, it’s a new product or product improvement. For the mad scientist, that conclusion is the goal: taking over the world and enslaving the population (or not; typically the scientist is thwarted by some detail!). But for the professor or professional research scientist, the goal is never quite reached; it’s a stepping stone to a grant application to do further work, and from there to tenure. In the case of the Socrates mortality work, the scientist might ask for money to go from country to country, measuring life-spans to demonstrate that all philosophers are mortal. This isn’t as pointless and self-serving as it seems. Follow-up work is easier than the first work since you’ve already got half of it done, and you sometimes find something interesting, e.g. about diet and life-span, or diseases, etc. I did some 70 papers when I was a professor, some on diet and lifespan.

One should avoid making some horribly bad logical conclusion at the end, by the way. It always seems to happen that the mad scientist is thwarted at the end; the greatest criminal masterminds are tripped by some last-minute flaw. Similarly the scientist must not make that last misstep. “One should always look for a possible alternative, and provide against it” (Adventure of Black Peter). Just because you’ve demonstrated that iodine kills germs, and you know that germs cause disease, please don’t conclude that drinking iodine will cure your disease. That’s the sort of science mistake that was common in the middle ages, and shows up far too often today. In the last steps, as in the first, follow the inductive and quantitative methods of Paracelsus to the end: look for numbers (not a Holmes quote), and check how quantity and location affect things. In the case of antiseptics, Paracelsus noticed that only external cleaning helped and that the help was dose sensitive.

As an example from more recent times, don’t just conclude that, because bullets kill, removing the bullets is a good idea. It is likely that the trauma and infection of removing the bullet is what killed Lincoln, Garfield, and McKinley. Theodore Roosevelt was shot too, but decided to leave his bullet where it was, noticing that many shot animals and soldiers lived for years with bullets in them; and Roosevelt lived for 6 more years. Don’t make these last-minute missteps: though it’s logical to think that removing guns will reduce crime, the evidence does not support that. Don’t let a leap of bad deduction at the end ruin a line of good science. “A few flies make the ointment rancid,” said Solomon. Here’s how to do statistics on data that’s taken randomly.

Dr. Robert E. Buxbaum, scientist and Holmes fan wrote this, Sept 2, 2013. My thanks to Lou Manzione, a friend from college and grad school, who suggested I reread all of Holmes early in my PhD work, and to Wikiquote, a wonderful site where I found the Holmes quotes; the Solomon quote I knew, and the others I made up.

Hormesis, Sunshine and Radioactivity

It is often the case that something is good for you in small amounts, but bad in large amounts. As expressed by Paracelsus, an early 16th century doctor, “There is no difference between a poison and a cure: everything depends on dose.”

Philippus Aureolus Theophrastus Bombastus von Hohenheim (Dr. Paracelsus).

Some obvious examples involve foods: an apple a day may keep the doctor away. Fifteen will cause deep physical problems. Alcohol, something bad in high doses, and once banned in the US, tends to promote longevity and health when consumed in moderation, 1/2-2 glasses per day. This is called “hormesis”, where the dose vs benefit curve looks like an upside down U. While it may not apply to all foods, poisons, and insults, a view called “mithridatism,” it has been shown to apply to exercise, chocolate, coffee and (most recently) sunlight.

Up until recently, the advice was to avoid direct sun because of the risk of cancer. More recent studies show that the benefits of small amounts of sunlight outweigh the risks. Health is improved by lowering blood pressure and exciting the immune system, perhaps through release of nitric oxide. At low doses, these benefits far outweigh the small chance of skin cancer. Here’s a New York Times article reviewing the health benefits of 2-6 cups of coffee per day.

A hotly debated issue is whether radiation too has a hormetic dose range. In a previous post, I noted that thyroid cancer rates down-wind of the Chernobyl disaster are lower than in the US as a whole. I thought this was a curious statistical fluke, but apparently it is not. According to a review by The Harvard Medical School, apparent health improvements have been seen among the cleanup workers at Chernobyl, and among those exposed to low levels of radiation from the atomic bombs dropped on Hiroshima and Nagasaki. The health improvements relative to the general population could be a fluke, but after a while several flukes become a pattern.

Among the comments on my post came this link to this scholarly summary article of several studies showing that long-term exposure to nuclear radiation below 1 Sv appears to be beneficial. One study involved an incident where a highly radioactive Co-60 source was accidentally melted into a batch of steel that was subsequently used in the construction of apartments in Taiwan. The mistake was not discovered for over a decade, and by then the tenants had received between 0.4 and 6 Sv (far more than US law would allow). On average, they were healthier than the norm and had significantly lower cancer death rates. Supporting this is the finding, in the US, that lung cancer death rates are 35% lower in the states with the highest average radon radiation levels (Colorado, North Dakota, and Iowa) than in those with the lowest levels (Delaware, Louisiana, and California). Note: SHORT-TERM exposure to 1 Sv is NOT good for you; it will give radiation sickness, and short-term exposure to 4.5 Sv is the 50% death level.

Most people in the irradiated Taiwan apartments got .2 Sv/year or less, but the same health benefit has also been shown for people living on radioactive sites in China and India where the levels were as high as .6 Sv/year (normal US background radiation is .0024 Sv/year). Similarly, virtually all animal and plant studies show that radiation appears to improve life expectancy and fecundity (fruit production, number of offspring) at dose rates as high as 1 Sv/month.
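To put these dose rates on one scale, here is a small Python sketch that converts each figure quoted above to Sv/year and divides by the quoted US background of .0024 Sv/year. The labels and variable names are mine; the numbers are the ones in the text.

```python
background = 0.0024  # Sv/year, normal US background as quoted in the text

# Dose rates from the text, all converted to Sv/year.
dose_rates = {
    "Taiwan apartments (most tenants, upper figure)": 0.2,
    "China and India sites (upper figure)": 0.6,
    "animal and plant studies (1 Sv/month)": 1.0 * 12,
}

# Ratio of each dose rate to normal background.
ratios = {label: rate / background for label, rate in dose_rates.items()}

for label, ratio in ratios.items():
    print(f"{label}: {dose_rates[label]:g} Sv/year, about {ratio:.0f} times background")
```

These are simple ratios of dose rates; they say nothing by themselves about health effects.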

I’m not recommending 1 Sv/month for healthy people; it’s a cancer treatment dose, and will make healthy people feel sick. A possible reason it works for plants and some animals is that the radiation may kill proto-cancer, harmful bacteria, and viruses: organisms that lack the repair mechanisms of larger, more sophisticated organisms. Alternately, it could kill non-productive, benign growths, allowing the more-healthy growths to do their thing. This explanation is similar to that for the benefits farmers produce by pinching off unwanted leaves and pruning unwanted branches.

It is not conclusive that radiation improved human health in any of these studies. It is possible that exposed people happened to choose healthier life-styles than non-exposed people, choosing to smoke less, do more exercise, or eat fewer cheeseburgers (that, more-or-less, was my original explanation). Or it may be purely psychological: people who think they have only a few years to live, live healthier. Then again, it’s possible that radiation is healthy in small doses and maybe cheeseburgers and cigarettes are too?! Here’s a scene from “Sleeper,” a 1973 science-fiction comedy movie where Woody Allen, asleep for 200 years, finds that deep fat, chocolate, and cigarettes are the best things for your health. You may not want a cigarette or a radium necklace quite yet, but based on these studies, I’m inclined to reconsider the risk/benefit balance in favor of nuclear power.

Note: my company, REB Research, makes (among other things) hydrogen getters (used to reduce the risks of radioactive waste transportation) and hydrogen separation filters (useful for cleanup of tritium from radioactive water, for fusion reactors, and to reduce the likelihood of explosions in nuclear facilities).

by Dr. Robert E. Buxbaum June 9, 2013

Chernobyl radiation appears to cure cancer

In a recent post about nuclear power, I mentioned that the health risks of nuclear power are low compared to the main alternatives: coal and natural gas. Even with scrubbing, the fumes from coal burning power plants are deadly once the cumulative effect on health over 1000 square miles is considered. And natural gas plants and pipes have fairly common explosions.

With this post I’d like to discuss a statistical fluke (or observation): that even with the worst type of nuclear accident, the broad-area increase in cancer incidence is generally too small to measure. The worst nuclear disaster we are ever likely to encounter was the explosion at Chernobyl. It occurred 27 years ago during a test of the safety shutdown system and sent a massive plume of radioactive core into the atmosphere. If any accident should increase the cancer rate of those around it, this should. Still, by fluke or not, the rate of thyroid cancer is higher in the US than in Belarus, close to the Chernobyl plant in the prime path of the wind. Thyroid cancer is likely the cancer most enhanced by radio-iodine, and Chernobyl had the largest radio-iodine release to date. Thus, it’s easy to wonder why the rates of thyroid cancer seem to suggest that the radiation cures cancer rather than causes it.

Thyroid Cancer Rates for Belarus and US; the effect of Chernobyl is less-than clear.

The chart above raises more questions than it answers. Note that the rate of thyroid cancer has doubled over the past few years, both in the US and in Belarus. Also note that the rate of cancer is 2 1/2 times as high in Pennsylvania as in Arkansas. One thought is test bias: perhaps we are better at spotting cancer in the US than in Belarus, and perhaps better at spotting it in Pennsylvania than elsewhere. Perhaps. Another thought is coal. Areas that use a lot of coal tend to become sicker; Europe keeps getting sicker from its non-nuclear energy sources. Perhaps Pennsylvania (a coal state) uses more coal than Belarus (maybe).

Fukushima was a much less damaging accident, and much more recent. So far there has been no observed difference in cancer rate. As the reference below says: “there is no statistical evidence of a difference in thyroid cancer caused by the disaster.” This is not to say that explosions are OK. My company, REB Research, makes high pressure, low temperature hydrogen-extracting membranes used to reduce the likelihood of hydrogen explosions in nuclear reactors; so far all the explosions have been hydrogen explosions.

Sources: for Belarus: Cancer consequences of the Chernobyl accident: 20 years on. For the US: GEOGRAPHIC VARIATION IN U.S. THYROID CANCER INCIDENCE, AND A CLUSTER NEAR NUCLEAR REACTORS IN NEW JERSEY, NEW YORK, AND PENNSYLVANIA.

R. E. Buxbaum, April 19, 2013; Here are some further, updated thoughts: radiation hormesis (and other hormesis)

Statistics Joke

A classic statistics joke concerns a person who’s afraid to fly; he goes to a statistician who explains that planes are very, very safe, especially if you fly a respectable airline in good weather. In that case, virtually the only problem you’ll have is the possibility of a bomb on board. The fellow thinks it over and decides that flying is still too risky, so the statistician suggests he plant a bomb on the airplane, but rig it to not go off. The statistician explains: while it’s very rare to have a bomb onboard an airplane, it’s really unheard of to have two bombs on the same plane.

It’s funny because …. the statistician left out the fact that an independent variable (number of bombs) has to be truly independent. If it is independent, the likelihood is found using a Poisson distribution, a non-normal distribution where, for rare events like these, the greatest likelihood is zero bombs, and there are no possibilities for a negative bomb. Poisson distributions are rarely taught in schools for some reason.
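For anyone who wants to see the numbers, here is a minimal Python sketch of the Poisson probabilities behind the joke. The rate of one bomb per million flights is a made-up figure chosen only for illustration, and the function name is mine.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected number of events is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical rate, for illustration only: one bomb per million flights.
lam = 1e-6

p0 = poisson_pmf(0, lam)  # no bombs: by far the most likely outcome
p1 = poisson_pmf(1, lam)  # exactly one bomb
p2 = poisson_pmf(2, lam)  # exactly two bombs

print(f"P(0 bombs) = {p0:.8f}")
print(f"P(1 bomb)  = {p1:.3e}")
print(f"P(2 bombs) = {p2:.3e}")

# Your own disarmed bomb is not a random event, so it does not change the
# chance that someone else brings a live one. That chance is still:
p_other_bomb = 1.0 - p0
print(f"P(someone else brings a bomb) = {p_other_bomb:.3e}")
```

Two bombs really is far rarer than one, but only when both arrive independently and at random; the bomb you planted yourself does nothing to the odds of the one you’re worried about.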

By Dr. Robert E. Buxbaum, Mar 25, 2013. If you’ve got a problem like this (particularly involving chemical engineering) you could come to my company, REB Research.