Tag Archives: statistics

Every food causes cancer, and cures it, research shows.

Statistical analysis, misused, allows you to prove many things that are not true. This was long a feature of advertising: with our toothpaste you get 38% fewer cavities, etc. In the past such ‘studies’ were not published in respectable journals, and research based on them was not funded. Now it is published and funded, and no one much cares. For an academic, this is the only game in town. One well-known result is the “crisis of replicability”: very few studies in medicine, psychology, or environmental science are replicable (see here for more).

In this post, I look at food health claims – studies that find foods cause cancer, or cure it. The analysis I present comes from two researchers, Schoenfeld and Ioannidis (read the original article here), who looked at the twenty most common ingredients in “The Boston Cooking-School Cook Book”. For each food, they used PubMed to look up the ten most recent medical articles that included the phrase “risk factors”, the word “cancer”, and the name of the food in the title or abstract. For studies finding effects in the range of 10x increased risk to 1/10 risk, the results are plotted below for each of the 20 foods. Some studies showed factors beyond the ends of the chart, but the chart gives a sense. It seems that almost every food causes or cures cancer, often to a fairly extreme extent.

Effect estimates by ingredient. From Schoenfeld and Ioannidis, “Is everything we eat associated with cancer?” Am. J. Clin. Nutrition 97 (2013) 127-34. (I was alerted to this by Dr. Jeremy Brown, here.)

A risk factor of 2 indicates that you double your chance of getting cancer if you eat this food. By contrast, a risk factor of 0.5 suggests that you halve your cancer risk. Some foods, like onion, seem to reduce your chance of cancer to 1/10, though another study says 1/100. This food is essentially a cancer cure, assuming you believe the study (I do not).

Only 19% of the studies found no statistically significant cancer effect for the particular food. The other 81% found that the food was significantly cancer-causing or cancer-preventing, generally at significance levels of p = 0.05 or better. Across the many studies done, most foods did both. Some of these were meta-studies (studies that combine other studies). These found slightly smaller average risk factors, but claimed more statistical significance in saying that the food caused or cured cancer.

Chart: relative risk for each food, log scale from 0.1 to 10.

The most common type of cancer caused is gastrointestinal. The most common cancer cured is breast. Other cancers feature prominently, though: head, neck, genitourinary, lung. The more cancers a researcher considers, the higher the chance of showing a significant effect from eating the food. If you look at ten cancers, each at the standard of one-tailed significance, you have a high chance of finding that at least one of them is cured or caused to the standard of p = 0.05.
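To see how fast multiple comparisons inflate false positives, here is a quick Python sketch of the arithmetic (my illustration, not the authors' calculation): test one food against ten cancer types, each at p = 0.05, and the chance of at least one spurious "significant" finding is about 40%.

```python
# Chance of at least one false positive when one food is tested
# against several cancer types, each at the p = 0.05 level.
# Assumes independent tests; an illustration, not the
# Schoenfeld & Ioannidis calculation.

def false_positive_chance(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests
    comes up 'significant' by chance alone."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 10, 20):
    print(f"{n:2d} cancers tested: {false_positive_chance(n):.0%} "
          "chance of a spurious finding")
# 10 tests -> ~40%; 20 tests -> ~64%
```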

In each case the comparison was between a high-dose cohort and a low-dose cohort, but there was no consistency in determining the cut-offs for the cohorts. Sometimes it was the top and bottom quartile, in others the quintile, in yet others the top 1/3 vs the bottom 1/3. Dose might be times eaten per week, or total grams of food. Having this flexibility increases a researcher’s chance of finding something. All of this is illegitimate, IMHO. I like to see a complete dose-response curve that shows an R² factor of 90+% or so. To be believable, you need to combine this R² with a low p value, and demonstrate the same behaviors in men and women. I showed this when looking at the curative properties of coffee. None of the food studies above did this.

From Yang, Youyou and Uzzi, 2020. Studies that failed replication are cited as often as those that passed replication. Folks don’t care.

Of course, better statistics will not protect you from outright lying, as with the decades-long, faked work on the cause of Alzheimer’s. But the most remarkable part is how few people seem to care.

People want to see their favorite food or molecule as a poison or a cure, and will cite anything that says so. Irreplicable studies are cited at the same rate as replicated studies, as shown in this 2020 study by Yang Yang, Wu Youyou, and Brian Uzzi. We don’t stop prescribing bad heart medicines, or praising irreplicable studies on foods. Does pomegranate juice really help? Red wine? There was a study, but I doubt it replicated. We’ve repeatedly shown that aspirin helps your heart, but it isn’t prescribed much; generally, we prefer more expensive blood thinners that may not help. Concerning the pandemic: it seems our lockdowns made things worse. We knew this two years ago, but kept doing it.

As Schoenfeld and Ioannidis state: “Thousands of nutritional epidemiology studies are conducted and published annually in the quest to identify dietary factors that affect major health outcomes, including cancer risk. These studies influence dietary guidelines and at times public health policy… [However] Randomized trials have repeatedly failed to find treatment effects for nutrients in which observational studies had previously proposed strong associations.” My translation: take all these food studies with a grain of salt.

Robert Buxbaum, April 4, 2023

Social science is irreproducible, drug tests are nonreplicable, and stove studies ignore confounders.

Efforts to replicate the results of the most prominent studies in health and social science have found them largely irreproducible, with the worst replicability appearing in cancer drug research. The figure below, from “The Reproducibility Project in Cancer Biology,” Errington et al. 2021, compares the reported effects in 50 cancer drug experiments from 23 papers with the results from repeated versions of the same experiments, looking at a total of 158 effects.

Graph comparing the original, published effect of a cancer drug with the replication effect. The units are whatever units were used in the original study: percent, risk ratio, etc. From “Investigating the replicability of preclinical cancer biology,” Timothy M. Errington et al., Center for Open Science / Stanford University, Dec 7, 2021, https://doi.org/10.7554/eLife.71601.

It’s seen that virtually none of the drugs work the same as originally reported. Those below the dotted, horizontal line behaved the opposite way in the replication studies. About half, those shown in pink, showed no significant effect. Of those that behaved positively as originally published, most showed about half the original activity, with two drugs that now appear to be far more active. A favorite website of mine, Retraction Watch, is filled with retractions of articles on these drugs.

The general lack of replicability has been called a crisis. It was first seen in the social sciences, e.g. the figure below from this article in Science, 2015. Psychology research is bad enough that Nobel Laureate Daniel Kahneman came to disown most of the conclusions in his book, “Thinking, Fast and Slow“. The experiments that underlie his major sections don’t replicate. Take, for example, social priming. Classic studies had claimed that, if you take a group of students and have them fill out surveys with words about the aged or the flag, they will then walk slower from the survey room or stand longer near a flag. All efforts to reproduce these studies have failed. We now think they are not true. The problem here is that much of education and social engineering is based on such studies. Public policy too. The lack of replicability throws doubt on much of what modern society thinks and does. We like to have experts we can trust; we now have experts we can’t.

From “Estimating the reproducibility of psychological science,” Science, 2015. Social science replication is better than cancer drug replication: about 35% of the classic social science studies replicate to some reasonable extent.

Are gas stoves dangerous? This 2022 environmental study said they are, claiming with 95% confidence that they are responsible for 12.7% of childhood asthma. I doubt the study will be reproducible for reasons I’ll detail below, but for now it’s science, and it may soon be law.

Part of the replication problem is that researchers have been found to lie. They fudge data or eliminate undesirable results, some more, some less, and a few are honest, but the journals don’t bother checking. Some researchers convince themselves that they are doing the world a favor, but many seem money-motivated. A foundational study on Alzheimer’s was faked outright: the authors doctored photos using Photoshop, and used the fake results to justify approval of non-working, expensive drugs. The researchers got $1B in NIH funding too. I’d want to see the researchers jailed, long term: it’s grand larceny and a serious violation of trust.

Another cause of this replication crisis — one that particularly hurt Daniel Kahneman’s book — is that many social science researchers do statistically illegitimate studies on populations that are vastly too small to give reliable results. Then, they only publish the results they like. The graph of Z-values shown below suggests this is common, at least in some journals, including “Personality and Social Psychology Bulletin”. The vast fraction of results at ≥95% confidence suggests that researchers don’t publish the 90-95% of their work that doesn’t fit the desired hypothesis. While there has been no detailed analysis of all the social science research, it’s clear that this method was used to show that GMO grains caused cancer: the researcher did many small studies, and only published the one study where GMOs appeared to cause cancer. I review the GMO study here.

From Ulrich Schimmack, ReplicationIndex.com, January, 2023, https://replicationindex.com/2023/01/08/which-social-psychologists-can-you-trust/. If you really want to get into this he is a great resource.

The chart at left shows Z-scores, where Z = ∆X·√n/σ. A Z-score above 1.96 generally indicates significance, p < .05. Notice that almost all the studies have Z-scores just over 1.96; that is, almost all the studies proved their hypothesis at 95% confidence. That makes it seem that the researchers were very lucky, near prescient. But it’s clear from the distribution that there were a lot of studies done but never shown to the public. That is a lot of data that was thrown out, either by the researchers or by the publishers. If all the data were published, you’d expect to see a bell curve. Instead the Z-values form a tiny bit of a bell curve, just the tail end. The implication is that studies with Z just over 1.96 carry far less than 95% confidence. This then shows up in the results being only 25% reproducible. It’s been suggested that you should not throw out all the results in the journal, but just look for Z-scores of 3.6 or more. That leaves you with the top 23%, and these should have a good chance of being reproducible. The top graph somewhat supports this, but it’s not that simple.
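Here is a minimal simulation of the file-drawer effect just described, my own sketch rather than Schimmack's method: run thousands of small studies of a weak effect, publish only those with Z above the cutoff, and the published record looks like the clipped tail of a bell curve.

```python
# File-drawer simulation: many small studies of a weak true effect;
# only those with Z > 1.96 get published. My sketch, not Schimmack's
# actual method.
import random

random.seed(1)
TRUE_EFFECT_Z = 0.5      # a weak true effect, in Z units (assumed)
N_STUDIES = 10_000

z_scores = [random.gauss(TRUE_EFFECT_Z, 1.0) for _ in range(N_STUDIES)]
published = [z for z in z_scores if z > 1.96]

print(f"studies run:       {N_STUDIES}")
print(f"studies published: {len(published)} ({len(published)/N_STUDIES:.0%})")
print(f"mean published Z:  {sum(published)/len(published):.2f}")
# Only ~7% 'succeed', yet each published study claims 95% confidence.
# The published Z values form the clipped tail of a bell curve.
```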

Another classic way to cook the books, as it were, and make irreproducible studies provide the results you seek, is to ignore “confounders.” This leads to association-causation errors. As an example, it’s observed that people taking aspirin have more heart attacks than those who do not, but the confounder is that aspirin is prescribed to those with heart problems; the aspirin actually helps, but appears to hurt. In the case of stoves, it seems likely that poorer, sicker people own gas stoves, that they live in older, moldy homes, and that they cook more at home, frying onions, etc. These are confounders that the study, to my reading, ignores. They could easily be the reason that gas stove owners get more asthma than the rich folks who own electric, induction stoves. If you confuse association with causation, you’ll seem to find that owning the wrong stove causes you to be poor and sick with a moldy home. I suspect that the stove study will not replicate once corrected for the confounders.

I’d like to recommend a book, hardly mathematical, “How to Lie with Statistics” by Darrell Huff ($8.99 on Amazon). I read it in high school. It gives you a sense of what to look out for. I should also mention Dr. Anthony Fauci. He has been going around to campuses saying we should have zero tolerance for those who deny science, particularly health science. Given that so much of health science research is nonreplicable, I’d recommend questioning all of it. Here is a classic clip from the 1973 movie, ‘Sleeper’, where a health food expert wakes up in 2173 to discover that health science has changed.

Robert Buxbaum , February 7, 2023.

Coffee decreases your chance of Parkinson’s, a lot.

Some years ago, I thought to help my daughter understand statistics by reanalyzing the data from a 2004 study on coffee and Parkinson’s disease mortality: “Coffee consumption, gender, and Parkinson’s disease mortality in the cancer prevention study II cohort: the modifying effects of estrogen”, Am J Epidemiol. 2004 Nov 15;160(10):977-84, see it here.

For the study, a cohort of over 1 million people was enrolled in 1982 and assessed for diet, smoking, alcohol, etc. Causes of death were ascertained through death certificates from January 1, 1989, through 1998. Death certificate data suggested that coffee decreased Parkinson’s mortality in men but not in women after adjustment for age, smoking, and alcohol intake. They used a technique I didn’t like, though: ANOVA, analysis of variance. That is, they compared the outcomes of those who drank a lot of coffee (4 cups or more) to those who drank none. Though women in the high-coffee cohort had about 49% the death rate, it was not statistically significant by the ANOVA measure (p = 0.6). The authors of the study took estrogen to be the reason for the difference.

Based on R2, coffee appears to significantly decrease the risk of Parkinson’s mortality in both men and women.

I thought we could do better by graphical analysis, see the plot at right, especially using R² to analyze the trend. According to this plot, it appears that coffee significantly reduces the likelihood of death in both men and women, confidence better than 90%. Women don’t tend to drink as much coffee as men, but the relative effect per cup is stronger than in men, it appears, and the trend line is clearer too. In the ANOVA, it appears that the effect in women is small because women are less prone to Parkinson’s.
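For those who want to try the graphical approach themselves, here is a sketch in Python using scipy. The numbers are placeholders for illustration, not the data from the 2004 study.

```python
# Dose-response fit of the kind shown in the plot: relative mortality
# vs. cups of coffee per day, reporting R-squared and the p-value of
# the slope. The numbers below are placeholders, NOT the study's data.
from scipy.stats import linregress

cups_per_day = [0, 1, 2, 3, 4]                        # dose
relative_mortality = [1.00, 0.85, 0.70, 0.62, 0.55]   # hypothetical

fit = linregress(cups_per_day, relative_mortality)
print(f"slope     = {fit.slope:.3f} per cup")
print(f"R-squared = {fit.rvalue ** 2:.2f}")   # want ~0.9 or better
print(f"p-value   = {fit.pvalue:.4f}")        # want this low, too
# A high R-squared plus a low p, seen in both men and women, is more
# convincing than a two-cohort ANOVA comparison.
```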

The benefit of coffee has been seen in this study as well, one looking at extreme drinkers. Benefits appear for other brain problems too, like Alzheimer’s. It seems that 2-4 cups of coffee per day also reduces the tendency for suicide, and decreases the rate of gout. It seems to be a preventative against kidney stones, too.

There is a confounding behavior that I should note: it’s possible that people who begin to feel signs of Parkinson’s, etc. stop drinking coffee. I doubt it, given the study’s design, but it’s worth a mention. The same confounding is also present in a previous analysis I did that suggested that being overweight protects from dementia, and from Alzheimer’s. Maybe pre-dementia people start losing weight long before other symptoms appear.

Dr. Robert E. Buxbaum, and C.M. Buxbaum, December 15, 2022

Curing my heart fibrillation with ablation.

Two years ago, I was diagnosed with atrial fibrillation, A-Fib in common parlance, a condition where my heart would sometimes speed up to double its normal rate. I was prescribed metoprolol and then atenolol, common beta blockers, and a CPAP for sleep apnea. None of this seemed to help, as best I could tell from occasional pulse measurements with a watch and a finger pulse oximeter. Besides, the CPAP was giving me a cough, and the beta blockers made me dizzy. And the literature on CPAP did not impress.

So, some months ago, I bought an iWatch. The current version allows you to take EKGs and provides a continuous record of your heart rate. This was very helpful, as I saw that my heart rate was transitioning to chaos. While it was normally predictable, it would zoom to 130 or so at some point virtually every day. Even more alarming, it would slow down to the mid 30s at some point during the night, bradycardia, and I could see it was getting worse. At that point, I agreed to go on Eliquis, a blood thinner, and agreed to a catheter ablation. The doctor put a catheter into my heart by way of a leg vein, and zapped various nerve centers in the heart. The result is that my heart is back to normal behavior. See the heart-rate readout from my iWatch below; before and after are dramatically different.

My heart rate for the last month: very variable before the ablation treatment, two weeks ago; a far less variable range of heart rates in the two weeks following the treatment. Heart rate data is from my iPhone and iWatch — a good investment, IMHO.

The reason I chose ablation over drugs or no therapy was that I read health studies online. I’ve got a PhD, and that training helps me to understand the papers I read, but you should read them too; they are not that hard to understand. Though ablation didn’t appear to be a panacea, it was clearly better than the alternatives. Particularly relevant was the CABANA study on life expectancy. CABANA stands for “Catheter ABlation vs ANtiarrhythmic Drug Therapy for Atrial Fibrillation”. https://www.acc.org/latest-in-cardiology/clinical-trials/2018/05/10/15/57/cabana.

2,204 individuals with persistent AF were followed for 5 years after treatment: 37% female, 63% male, average age 67.5; 39% had a prior hospitalization for AF. The results were as follows:

  • Death: 5.2% for ablation vs. 6.1% for drug therapy (p = 0.38)
  • Serious stroke: 0.3% for ablation vs. 0.6% for drug therapy (p = 0.19)
  • All-cause mortality: 4.4% for ablation vs. 7.5% for drug therapy (p = 0.005)
  • Death or CV hospitalization: 51.7% for ablation vs. 58.1% for drug therapy (p = 0.002)
  • Pericardial effusion with ablation: 3.0%; ablation-related events: 1.8%
  • First recurrent AF/atrial flutter/atrial tachycardia: 53.8% vs. 71.9% (p < 0.0001)

I found all of this significant, including the fact that 27.5% of those on the drug treatment crossed over to have ablation while only 9.2% on the ablation side crossed to have the drug treatment.

I must give a plug for Dr. Ahmed at Beaumont Hospital, who did the ablation. He does about 200 of these a year, and does them well. Do not go to an amateur. I was less than impressed with him pushing the beta-blocker hard; I’ll write about that. Also, get an iWatch if you think you may have A-Fib or any other heart problem. You see a lot just by watching, so to speak.

Robert Buxbaum, August 3, 2022.

Girls are doing better; boys are doing far worse.

When I began college in 1972, the majority of engineering students and business students were male. They came from the top of their high school classes, and mostly from stable homes; they went on to high-paying jobs. Boys also dominated at the bottom of society: they were the majority of the criminals, drug addicts, and high-school dropouts. Many went off to Vietnam. Some, those who were handy, went to trade schools and a reasonable, productive life. Society did not seem bothered by the destruction of boys in prison, or Vietnam, or by drugs, but there was an outcry that so few women achieved high academic levels. A famous presentation of the problem was called “for every 100 girls.” An updated version appears below, showing the status as of October, 2021. A more detailed version appears further down.

From the table above, you can see that women are now the majority of those in college, the majority of those with a bachelor's degree or higher, and a majority of those with advanced degrees. Colleges added special tutoring, special grants, and special programs. Each college had a Society of Women Engineers office, and similar programs in law and math. All of these explicitly excluded men or highly discouraged their presence. The curriculum was changed too, made more female-friendly: dirty, physical experiments were removed, replaced with group analysis of social interactions — an important aspect of engineering that boys were far less adept at. Perhaps society and engineering are better off now, but boys (men) are far worse off. This is particularly seen in the following chart, looking at the bottom. Boys/men provide the vast majority of the prison population, of those diagnosed as learning disabled, of those expelled, or overdosed, and among the war dead.

I’ve previously noted that a majority of boys in school are considered disruptive, and that these boys are routinely diagnosed as ADHD and drugged. It is not at all clear that this is a good thing, or that the drugs help anyone but the teacher. I’ve also noted that artwork and attitudes that were considered normal for boys are now considered disturbing and criminal, like saying “I wish the school was blown up.” The cure here, perhaps, is worse than the disease. I’m not saying that we should encourage boys to say such things, but that we should acknowledge a difference between an active and a passive wish. And we should find a way to educate boys/men so they don’t end up unemployed, addicted, or dead. Currently boys, particularly those at the bottom, are on the scrap-heap of society.

Here is some source material for the above:

Robert Buxbaum, May 28, 2022

The equation behind Tinder, J-swipe, and good chess matchups.

Near the beginning of the movie “The Social Network”, Zuckerberg asks his Harvard roommate, Saverin, to explain the chess rating system. His friend writes an equation on the window, Zuckerberg looks for a while, nods, and uses it as a basis for Facemash, the predecessor of Facebook. The dating site Tinder said it used this equation to match dates, but claims to have moved on from there, somewhat. The same is likely true at J-swipe, a Jewish dating site, and Christian Mingle.

Scene from The Social Network: Saverin shows Zuckerberg the equation for the expected outcome of a chess match between players of different rankings, Ra and Rb.

I’ll explain how the original chess ranking system worked, and then why it works also for dating. If you’ve used Tinder or J-swipe, you know that they provide fairly decent matches based on a brief questionnaire and your pattern of swiping left or right on pictures of people, but it is not at all clear that your left-right swipes are treated like wins and losses in a chess game: your first pairings are with people of equal rating.

Start with the chess match equations. These were developed by Arpad Elo (pronounced like hello without the h) in the 1950s, a physics professor who was the top chess player in Wisconsin at the time. Based on the fact that chess ability changes relatively slowly (usually), he chose to change a person’s rating based on a logistic, sigmoid model of the chance of winning a given match. He set a limit to the amount your rating could change with a single game, but the equation he chose changes your rating fastest when you beat someone much better than you, or lose to someone much weaker. Based on lots of inaccurate comparisons, the game results, you get a remarkably accurate rating of your chess ability. Also, as it happens, this chess rating works well to match people for chess games.

The logistic equation, an S-curve that can be assumed to relate to the expected outcome of chess matchups or dating opportunities.

For each player in a chess match, we estimate the likelihood that each will win, lose, or tie based on the difference in their ratings, Ra − Rb, and the sigmoid curve at left. We call these expected outcomes Ea for player A and Eb for player B, where Ea = Eb = 50% when Ra = Rb. It’s seen that Ea never exceeds 1; you can never be more than 100% certain about a victory. The S-graph shows several possible estimates of Ea where x = Ra − Rb, and k is a measure of how strongly we imagine this difference predicts the outcome. Elo chose a value of k such that a 400-point difference in rating gave the higher-ranked player a 91% expectation of winning.

To adjust your rating, the outcome of a game is given a number between 0 and 1, where 1 represents a win, 0 a loss, and 0.5 a draw. Your rating changes in proportion to the difference between this outcome and your expected chance of winning. If player A wins, his new rating, Ra’, is determined from the old rating, Ra, as follows:

Ra’ = Ra + 10 (1 – Ea)

It’s seen that one game cannot change your rating by more than 10, no matter how spectacular the win, nor can your rating drop by more than 10 if you lose. If you lose, Ra’ = Ra – 10 Ea. New chess players are given a start ranking, and are matched with other new players at first. For new players, the maximum change is increased to 24, so you can be placed in a proper cohort that much quicker. My guess is that something similar is done with new people on dating sites: a basic rating (or several), and a fast rating change at first that slows down later.
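Here is the rating math above as a short Python sketch, using the K-factors from the text (10 for established players, 24 for new ones):

```python
# Elo's expected score and rating update, as described above.
# K = 10 caps the per-game change for established players; 24 for new.
def expected_score(r_a, r_b):
    """Chance player A beats player B. A 400-point gap gives the
    stronger player about a 91% expectation."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a, r_b, outcome, k=10):
    """New rating for A; outcome is 1 for a win, 0.5 draw, 0 loss."""
    return r_a + k * (outcome - expected_score(r_a, r_b))

# A 1500-rated player upsets a 1700-rated player:
print(expected_score(1500, 1700))   # ~0.24
print(update(1500, 1700, 1))        # ~1507.6, nearly the full K gained
```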

As best I can tell, dating apps use one or more ratings to solve a mathematical economics problem called “the stable marriage problem.” Gale and Shapley’s work on this problem earned a Nobel Prize in economics. The idea of the problem is to pair everyone in such a way that no couple would be happier by a swap of partners. It can be shown that there is always a solution that achieves that. If there is a single, agreed-upon ranking, one way of achieving this stable marriage pairing is by pairing best with best, 2nd with 2nd, and thus all the way down. The folks at the bottom may not be happy with their mates, but neither is there a pair that would like to switch mates with them.
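For the curious, here is a minimal Python sketch of the Gale-Shapley proposal algorithm with hypothetical preference lists; I have no inside knowledge that any dating app runs exactly this.

```python
# Gale-Shapley stable matching: proposers propose down their lists,
# acceptors trade up, and the result leaves no pair who would both
# rather swap. Hypothetical preference lists; actual dating-app
# algorithms are proprietary.
def stable_match(proposer_prefs, acceptor_prefs):
    """Each argument: dict mapping a name to a ranked list of names."""
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    next_pick = {p: 0 for p in proposer_prefs}
    engaged = {}                       # acceptor -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop(0)
        a = proposer_prefs[p][next_pick[p]]     # best not yet tried
        next_pick[p] += 1
        if a not in engaged:
            engaged[a] = p
        elif rank[a][p] < rank[a][engaged[a]]:  # a prefers p: trade up
            free.append(engaged[a])
            engaged[a] = p
        else:
            free.append(p)             # rejected; p tries the next one
    return engaged

men = {"al": ["xe", "ya"], "bo": ["xe", "ya"]}
women = {"xe": ["bo", "al"], "ya": ["al", "bo"]}
print(stable_match(men, women))        # {'xe': 'bo', 'ya': 'al'}
```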

Part of this, for better or worse, is physical attractiveness. Even if the low-ranked (ugly) people are not happy with the people they are matched with, they may be happy to find that these people are reasonably happy with them. Besides a rating based on attractiveness, there are ratings based on age and location, sexual orientation and religiosity. On J-swipe and Tinder, people are shown others who are similar to them in attractiveness, and similar to the target in other regards. The first people you are shown are people who have already swiped right for you. If you swipe right too, you’ve agreed to a date, at least via a text message. Generally, the matches are not bad, and having immediate successes provides a nice jolt of pleasure at the start.

Religious dating sites J-swipe and Christian Mingle work to match men with women, and to match people by claimed orthodoxy to their religion. Tinder is a lot less picky: not only will they match “men looking for men”, but they also find that “men looking for women” will fairly often decide to date other “men looking for women”. The results of actual, chosen pairings then affect future proposed pairings, so that a man who once dates a man will be shown more men as possible dates. In each of the characteristic rankings, when you swipe right it is taken as a win for the person in the picture; if you swipe left, it’s a loss: like a game outcome of 1 or 0. If both of you agree, or both don’t, it’s like a tie. Your rating on the scale of religion or beauty goes up or down in proportion to the difference between the outcome and the prediction. If you date a person of the same sex, it’s likely that your religion rating drops, but what do I know?

One way or another, this system seems to work at least as well as other matchmaking systems that paired people based on age, height, and claims of interest. If anything, I think there is room for far more applications, like matching doctors to patients in a hospital based on needs, skills, and availability, or matching coaches to players.

Robert Buxbaum, December 31, 2020. In February, at the beginning of the COVID outbreak, I claimed that the disease was a lot worse than most thought, but that it would not kill 10% of the population, as the alarmists claimed. The reason: most diseases follow the logistic equation, the same sigmoid.

A mathematical approach to finding Mr (or Ms) Right.

A lot of folks want to marry their special soulmate, and there are many books to help get you there, but I thought I might discuss a mathematical approach that optimizes your chance of marrying the very best under some quite-odd assumptions. The set of assumptions is sometimes called “the fussy suitor problem” or the secretary problem. It’s sometimes presented as a practical dating guide, e.g. in a recent Washington Post article. My take is that it’s not a great strategy for dealing with the real world, but neither is it total nonsense.

The basic problem was presented by Martin Gardner in Scientific American in 1960 or so. Assume you’re certain you can get whoever you like (who’s single); assume further that you have a good idea of the number of potential mates you will meet, and that you can quickly identify who is better than whom; you have a desire to marry none but the very best, but you don’t know who’s out there until you date, and you’re unable to go back to someone you’ve rejected. This might be the case if you are a female engineering student studying in a program with 50 male engineers, all of whom have easily bruised egos. Assuming the above, it is possible to show, using Riemann integrals (see solution here), that you maximize your chance of finding Mr/Ms Right by dating without intent to marry the first 36.8% of the fellows (1/e), and then marrying the first fellow who’s better than any of the previous ones you’ve dated. I have a simpler, more flexible approach to getting the right answer, one that involves infinite series; I hope to show off some version of this at a later date.

Bluto, Popeye, or wait for someone better? In the cartoon as I recall, she rejects the first few people she meets, then meets Bluto and Popeye. What to do?

With this strategy, one can show that there is a 63.2% chance you will marry someone, and a 36.8% chance you’ll wed the best of the bunch. There is a decent chance you’ll end up with number 2. You end up with no one if the best guy appears among the early rejects; that’s a 36.8% chance. If you are fussy enough, this is an OK outcome: it’s either the best or no one. I don’t consider this a totally likely assumption, but it’s not that bad, and I find you can recalculate fairly easily for someone OK with number 2 or 3. The optimal strategy then, I think, is to date without intent at the start, as before, but to take a 2nd or 3rd choice if you find you’re unmarried after some secondary cutoff. It’s solvable by series methods, or by dynamic programming.
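You can check these percentages yourself with a short Monte-Carlo simulation, a sketch under the problem's own idealized assumptions:

```python
# Monte-Carlo check of the 1/e strategy: pass on the first ~36.8% of
# suitors, then take the first one better than all of those.
# A sketch under the problem's own idealized assumptions.
import math
import random

def trial(n):
    """One run; True if the strategy lands the best suitor of n."""
    suitors = list(range(n))        # higher number = better suitor
    random.shuffle(suitors)
    cutoff = round(n / math.e)      # the look-only phase
    best_seen = max(suitors[:cutoff])
    for s in suitors[cutoff:]:
        if s > best_seen:
            return s == n - 1       # married; was it the very best?
    return False                    # best was among the rejects

n, runs = 50, 20_000
wins = sum(trial(n) for _ in range(runs))
print(f"married the best suitor in {wins / runs:.1%} of runs")  # ~37%
```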

It’s unlikely that you have a fixed passel of passive suitors, of course, or that you know nothing of guys at the start. It also seems unlikely that you’re able to get anyone to say yes, or that you are so fast at evaluating fellows that there are no errors involved and no time-cost to the dating process. The Washington Post does not seem bothered by any of this, perhaps because the result is “mathematical” and reasonable-looking. I’m bothered, though, in part because I don’t like the idea of dating under false pretense; it’s cruel. I also think it’s not a winning strategy in the real world, as I’ll explain below.

One true and useful lesson from the mathematical solution is that it’s important to learn from each date. Even a bad date, one with an unsuitable fellow, is not a waste of time so long as you leave with a better sense of what’s out there, and of what you like. A corollary of this, not in the mathematical analysis but from life, is that it’s important to choose your circle of daters. If your circle of friends is all geeky engineers, don’t expect to find Prince Charming among them. If you want Prince Charming, you’ll have to go to balls at the palace, and you’ll have to pass on the departmental wine and cheese.

If you want Prince Charming, you may have to associate with a different crowd from the one you grew up with. Whether that’s a good idea for a happy marriage is another matter.

The assumption here that you know how many fellows there are is not a bad one, to my mind. Thus, if you start dating at 16 and hope to be married by 32, that’s 16 years of dating. You can use this time-frame as a stand-in for total numbers. Thus, if you decide to date-for-real after 37%, that’s about age 22, not an unreasonable age. It’s younger than most people marry, but you’re not likely to marry the first person you meet after age 22. Besides, it’s not great dating into your thirties — trust me, I’ve done it.

The biggest problem with the original version of this model, to my mind, comes from the cost of non-marriage just because the mate isn’t the very best, or might not be. This cost gets worse when you realize that, even if you meet Prince Charming, he might say no; perhaps he’s gay, or would like someone royal, or richer. Then again, perhaps the Kennedy boy is just a cad who will drop you at some time (preferably not while crossing a bridge). I would therefore suggest, though I can’t show it’s optimal, that you start out by collecting information on guys (or girls) by observing the people around you whom you know: watch your parents, your brothers and sisters, your friends, uncles, aunts, and cousins. Listen to their conversation and you can get a pretty good idea of what’s available even before your first date. If you don’t like any of them, and find you’d like a completely different circle, it’s good to know early. Try to get a service job within ‘the better circle’. Working with people you think you might like to be with, long-term, is a good idea even if you don’t decide to marry into the group in the end.

Once you’ve observed and interacted with the folks you think you might like, you can start dating for real from the start. If you’re super-organized, you can create a chart of the characteristics, and the ‘tells’ of characteristics, you really want; also, of what is nice but not a deal-breaker. For these first dates, you can figure out the average and standard deviation, and aim for someone in the top 5%. A 5% target is someone who is two standard deviations above the average. This is simple analysis of variance (ANOVA) math, math that I discussed elsewhere. In general you’ll get to someone in the top 5% by dating ten people chosen with help from friends. Starting this way, you’ll avoid being unreasonably cruel to date #1, nor will you lose out on a great mate early on.

Some effort should be made to look at the fellow’s family and his/her relationship with them. If their relationship is poor, or their behavior is, your kids may turn out similar.

After a while, you can say, I’ll marry the best I see, or the best that seems like he/she will say yes (a smaller subset). You should learn from each date, though, and don’t assume you can instantly size someone up. It’s also a good idea to meet the family, since many things you would not expect seem to be inheritable. Meeting some friends is a good idea too. Even professionals can be fooled by a phony, and a phony will try to hide his/her family and friends. In the real world, dating should take time, and even if you discover that he/she is not for you, you’ll learn something about what is out there: what the true average and standard deviation is. It’s not even clear that people fall on a normal distribution, by the way.

Don’t be too upset if you reject someone, and find you wish you had not. In the real world you can go back to one of the earlier fellows, to one of the rejects, if you don’t wait too long. If you date with honesty from the start, you can call up and say, ‘when I dated you I didn’t realize what a catch you were’, or words to that effect. That’s a lot better than saying ‘I rejected you based on a mathematical strategy that involved lying to all of the first 36.8%.’

Robert Buxbaum, December 9, 2019. This started out as an essay on the mathematics of the fussy suitor problem. I see it morphed into a father’s dating advice to his marriage-age daughters. Here’s the advice I’d given to one of them at 16. I hope to do more with the math in a later post.

Vitamins A and E, killer supplements; B, C, and D are meh.

It’s often assumed that vitamins and minerals are good for you, so good that people buy all sorts of supplements providing more than the normal dose in hopes of curing disease. Extra doses are a mistake unless you really have an unbalanced diet. I know of no material that is good in small doses that is not toxic in large doses. This has been shown to be so for water, exercise, and weight loss, and it’s true for vitamins, too. That’s why there is an RDA (a Recommended Daily Allowance).

Let’s begin with vitamin A. That’s beta-carotene and its relatives, a vitamin found in green and orange fruits and vegetables. In small doses it’s good: it prevents night blindness, and it’s an antioxidant. It was hoped that vitamin A would turn out to cure cancer too. It didn’t. In fact, it seems to make cancer worse. A study was performed with 1029 men and women chosen at random from a pool considered high-risk for cancer: smokers, former smokers, and people exposed to asbestos. They were given either 15 mg of beta-carotene and 25,000 IU of vitamin A (5 times the RDA) or a placebo. Those taking the placebo did better than those taking the vitamin A. The results were presented in the New England Journal of Medicine, read it here, with some key findings summarized in the graph below.

Comparison of cumulative mortality and cardiovascular disease between those receiving vitamin A (5 times the RDA) and those receiving a placebo. From Omenn et al. Clearly, this much vitamin A does more harm than good.

The main causes of death were, as is typical, cardiovascular disease and cancer. As the graph shows, the rates of death were higher among people getting the vitamin A than among those getting nothing, the placebo. Why that is so is not totally clear, but I have a theory that I presented in a paper at Michigan State: that your body uses oxidation to fight cancer. The theory might be right, or wrong, but what is always noticed is that too much of a good thing is never a good thing. The excess deaths from vitamin A were so significant that the study had to be cancelled after 5 1/2 years. There was no responsible way to continue.

Vitamin E is another popular vitamin, an antioxidant, proposed to cure cancer. As with the vitamin A study, a large number of people at high risk were selected and given either a large dose of vitamin or a placebo. In this case, 35,000 men over 50 years old were given vitamin E (400 to 660 IU, about 20 times the RDA) and/or selenium, or a placebo. Selenium was added to the test because, while it isn’t an antioxidant, it is associated with elevated levels of an antioxidant enzyme. The hope was that these supplements would prevent cancer and perhaps ward off Alzheimer’s too. The full results are presented here, and the key data is summarized in the figure below. As with vitamin A, it turns out that high doses of vitamin E did more harm than good: they dramatically increased the rate of cancer and promoted some other problems too, including diabetes. This study had to be cut short, to only 7 years, because of the health damage observed. The long-term effects were tracked for another two years; the negative effects are seen to level out, but there is still significant excess mortality among the vitamin takers.

Cumulative incidence of prostate cancer with supplements of selenium and/or vitamin E compared to placebo.

Selenium did not show any harmful or particularly beneficial effects in these tests, by the way, and it may have reduced the deadliness of the vitamin E.

My theory, that the body fights cancer and other disease by oxidation, by rusting it away, would explain why too much antioxidant will kill you: it leaves you defenseless against disease. As for why selenium didn’t cause excess deaths, perhaps there are other mechanisms in play when the body sees excess selenium while already pumped with other antioxidants. We studied antioxidant health foods (on rats) at Michigan State and found the same negative effects. The above studies are among the few done with humans. Meanwhile, as I’ve noted, small doses of radiation seem to do some good, as do small doses of chocolate, alcohol, and caffeine. The key words here are “small doses.” Alcoholics do die young. Exercise helps too, but only in moderation, and since bicycle helmets discourage bicycling, the net result of bicycle helmet laws may be to decrease life-span.

What about vitamins B, C, and D? In normal doses, they’re OK, but as with vitamins A and E, you start to see medical problems as soon as you start taking more, about 12 times the RDA. Large doses of vitamin B are sometimes recommended by ‘health experts’ for headaches and sleeplessness. Instead, they are known to produce skin problems, headaches and memory problems; fatigue, numbness, bowel problems, sensitivity to light, and, in yet-larger doses, twitching nerves. That’s not as bad as cancer, but it’s enough that you might want to take something else for headaches and sleeplessness. Large doses of vitamins C and D are not known to provide any health benefits, but result in depression, stomach problems, bowel problems, frequent urination, and kidney stones. Vitamin C degrades to uric acid and oxalic acid, key components of kidney stones. Vitamin D produces kidney stones too, in this case by increasing calcium uptake and excretion. A recent report on vitamin D from the Mayo Clinic is titled “Vitamin D, not as toxic as first thought” (see it here). The danger level is 12 times the RDA, but many pills contain that much, or more. And some put the mega-dose in a form, like ‘gummy vitamins’, that is just asking to be abused by a child. The pills positively scream, “Take too many of me and be super healthy.”

It strikes me that the stomach, bowel, and skin problems that result from excess vitamins are just the problems that supplement sellers claim to cure: headaches, tiredness, problems of the nerves, stomach, and skin. I’d suggest not taking vitamins in excess of the RDA — especially if you have skin, stomach, or nerve problems. For stomach problems, try some penicillium cheese. If you have a headache, try an aspirin or an Advil.

In case you should want to know what I do for myself, every other day or so, I take 1/2 of a multivitamin, a “One-A-Day Men’s Health Formula.” This 1/2 pill provides 35% of the RDA of Vitamin A, 37% of the RDA of Vitamin E, and 78% of the RDA of selenium, etc. I figure these are good amounts and that I’ll get the rest of my vitamins and minerals from food. I don’t take any other herbs, oils, or spices, either, but do take a baby aspirin daily for my heart. 

Robert Buxbaum, May 23, 2019. I was responsible for the statistics on several health studies while at Michigan State University (the test subjects were rats), and I did work on nerves, on hydrogen in metals, and nuclear stuff. I’ve written about statistics too, like here, talking about abnormal distributions; they’re common in health studies. If you don’t check for them, they will mess up the validity of your ANOVA tests. That said, here’s how you do an ANOVA test.

Statistics for psychologists, sociologists, and political scientists

In terms of mathematical structure, psychologists, sociologists, and poli-sci folks all do the same experiment, over and over, and all use the same simple statistical calculation, the ANOVA, to determine its significance. I thought I’d explain that experiment and the calculation below, walking you through an actual paper (one I find interesting) in psychology / poli-sci. The results are true at the 95% level (that’s the same as saying p < 0.05) — a significant achievement in poli-sci, but that doesn’t mean the experiment means what the researchers think. I’ll then suggest another statistical measure, r-squared, that deserves to be used along with ANOVA.

The standard psychological or poli-sci research experiment involves taking a group of people (often students) and giving them a questionnaire or test to measure their feelings about something — the war in Iraq, their fear of flying, their degree of racism, etc. This is scored on some scale to get an average. Another, near-identical group of subjects is now brought in and given a prompt: shown a movie, or a picture, or asked to visualize something, and then given the same questionnaire or test as the first group. If the prompt is shown to have changed the average score, up or down, an ANOVA (analysis of variance) is used to show whether this change is one the researcher can have confidence in. If the confidence exceeds 95%, the researcher goes on to discuss the significance, and submits the study for publication. I’ll now walk you through the analysis the old-fashioned way: the way it would have been done in the days of hand calculators and slide rules, so you understand it. Even done this way, it only takes 20 minutes or so: far less time than the experiment.

I’ll call the “off the street” score for the ith subject Xi°. It would be nice if papers would publish these, but usually they do not. Instead, researchers publish the survey and the average score, something I’ll call X°-bar, or X°. They also publish a standard deviation, calculated from the above, something I’ll call SD°. In older papers, it’s called sigma, σ; sigma and SD are the same thing. Now, moving to the group that’s been given the prompt, I’ll call the score for the ith subject Xi*. Similar to the above, the average for this prompted group is X*-bar, or X*, and the standard deviation SD*.

I have assumed that there is only one prompt, identified by an asterisk, *: one particular movie, picture, or challenge. For some studies there will be different concentrations of the prompt (show half the movie, for example), and some researchers throw in completely different prompts. The more prompts, the more likely you are to get false positives with an ANOVA, and the more likely you are to need to go to r-squared. Warning: very few researchers do this, either intentionally (and crookedly) or by complete obliviousness to the math. Either way, if you have a study with ten prompt variations and you are testing to 95% confidence, your result is meaningless: random variation alone will hand you a ‘significant’ result about 40% of the time. A crooked researcher used ANOVA and 20 prompt variations “to prove to 95% confidence” that genetically modified food caused cancer; I’ll assume (trust) you won’t fall into that mistake, and that you won’t use the ANOVA knowledge I provide to get notoriety and easy publication of total, unreproducible nonsense. If you have more than one or two prompts, you’ve got to add r-squared (and it’s probably a good idea with one or two). I’ll discuss r-squared at the end.

I’ll now show how you calculate X°-bar the old-fashioned way, as would be done with a hand calculator. I do this, not because I think social-scientists can’t calculate an average, nor because I don’t trust the ANOVA function on your laptop or calculator, but because this is a good way to familiarize yourself with the notation:

X°-bar = X° = (1/n°) ∑ Xi°.

Here, n° is the total number of subjects who take the test but who have not seen the prompt. Typically, for professional studies, there are 30 to 50 of these. ∑ means sum, and Xi° is the score of the ith subject, as I’d mentioned. Thus, ∑ Xi° indicates the sum of all the scores in this group, and multiplying by 1/n° gives the average, X°-bar. Convince yourself that this is, indeed, the formula. The same formula is used for X*-bar. For a hand calculation, you’d write numbers 1 to n° in the left column of some paper, and each Xi° value next to its number, leaving room for more work to follow. This used to be done in a notebook; nowadays a spreadsheet will make it easier. Write the value of X°-bar on a separate line at the bottom.

T-table

In virtually all cases you’ll find that X°-bar is different from X*-bar, but there will be a lot of variation among the scores in both groups. The ANOVA (analysis of variance) is a simple way to determine whether the difference is significant enough to mean anything. Statistics books make this calculation seem far too complicated — they go into too much math theory, or consider too many types of ANOVA tests, most of which make no sense in psychology or poli-sci but were developed for ball bearings and cement. The only ANOVA approach used here involves the T-table shown and the 95% confidence column (this is the same as the two-tailed p < 0.05 column). Though 99% is nice, it isn’t necessary. Other significances are on the chart, but they’re not really useful for publication. If you do this on a calculator, the table is buried in there someplace. The confidence level is written across the bottom line of the chart; 95% here is seen to be the same as a two-tailed P value of 0.05 = 5%, seen on the third line from the top of the chart. For about 60 subjects (two groups of 30, say) and 95% certainty, T = 2.000. This is a very useful number to carry about in your head. It allows you to eyeball your results.

In order to use this T value, you will have to calculate the standard deviation, SD, for both groups, and the standard variation between them, SV. Typically, the SDs will be similar, but large, and the SV will be much smaller. First let’s calculate SD° by hand. To do this, you first calculate its square, SD°²; once you have that, you’ll take the square root. Take each of the Xi° scores, each of the scores of the first group, and calculate the difference between each score and the average, X°-bar. Square each difference and divide by (n°−1). These numbers go into their own column, each in line with its own Xi°. The sum of this column will be SD°². Put in mathematical terms, for the original group (the ones that didn’t see the movie),

SD°² = 1/(n°−1) ∑ (Xi° − X°)²

SD° = √(SD°²).

Similarly, for the group that saw the movie, SD*² = 1/(n*−1) ∑ (Xi* − X*)²

SD* = √(SD*²).

As before, n° and n* are the numbers of subjects in each of the two groups. Usually you’ll aim for these to be the same, but often they’ll be different. Some students will end up seeing only half the movie, some will see it twice, even if you don’t plan it that way; these students’ scores cannot be used with the above, but be sure to write them down; save them. They might have tremendous value later on.

Write down the standard deviations, SD, for each group calculated above, and check that the SDs are similar, differing by less than a factor of 2. If so, you can take a weighted average and call it SD-bar, and move on with your work. There are formulas for this average, and in some cases you’ll need an F-table to help choose the formula, but for my purposes, I’ll assume that the SDs are similar enough that any weighted average will do. If they are not, it’s likely a sign that something very significant is going on, and you may want to rethink your study.

Once you calculate SD-bar, the weighted average of the SDs above, you can move on to calculate the standard variation between the two groups, SV. This is the average difference that you’d expect to see if there were no real differences: if there were no movie, no prompt, no nothing, just random chance of who showed up for the test. SV is calculated as:

SV = SD-bar √(1/n° + 1/n*).

Now, go to your T-table and look up the T value for two-tailed tests at 95% certainty and N = n° + n*. You probably learned that you should be using degrees of freedom, where, in this case, df = N − 2, but for normal group sizes the T value will be nearly the same. As an example, I’ll assume that N is 80, two groups of 40 subjects; the degrees of freedom is N − 2, or 78. If you look at the T-table for 95% confidence, you’ll notice that the T value is about 1.99. You can use this. The value for 62 subjects would be 2.000, and the true value for 78 df is 1.991; the difference between 1.990 and 1.991 is the least of your problems. It’s unlikely your test is ideal, or your data normally distributed; such things cause far more problems for your results. If you want to see how to deal with these, go here.

Assuming random variation, and 80 subjects tested, we can say that, so long as X°-bar differs from X*-bar by at least 1.99 times the SV calculated above, you’ve demonstrated a difference with enough confidence that you can go for a publication. In math terms, you can publish if and only if: |X° − X*| ≥ 1.99 SV, where the vertical lines represent absolute value. This is all the statistics you need. Do the above, and you’re good to publish. The reviewers will look at your average score values and your value for SV. If the difference between the two averages is more than 2 times the SV, most people will accept that you’ve found something.

If you want any of this to sink in, you should now do a worked problem with actual numbers, in this case two groups of 11 and 10 students. It’s not difficult, but you should at least try with these real numbers. When you are done, go here; I’ll grind through to the answer. I’ll also introduce r-squared.

The worked problem: Assume you have two groups of people tested for racism, or political views, or some allergic reaction. One group was given nothing more than the test, the other group is given some prompt: an advertisement, a drug, a lecture… We want to know if we had a significant effect at 95% confidence. Here are the test scores for both groups assuming a scale from 0 to 3.

Control group: 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3.  These are the Xi° s; there are 11 of them

Prompted group: 0, 1, 1, 1, 2, 2, 2, 2, 3, 3.  These are the Xi* s; there are 10 of them.
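If you would like to check your hand calculation, here is the same arithmetic as a Python sketch. The t value of 2.09 is the standard two-tailed 95% value for df = N − 2 = 19; for the author's full worked answer, follow the link above.

```python
# The hand calculation above, in Python, for the two groups given.
# The t value 2.09 is the two-tailed 95% value for df = N - 2 = 19.
import math

control = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3]   # Xi° scores, n = 11
prompted = [0, 1, 1, 1, 2, 2, 2, 2, 3, 3]     # Xi* scores, n = 10

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

m0, m1 = mean(control), mean(prompted)
sd_bar = ((len(control) * sd(control) + len(prompted) * sd(prompted))
          / (len(control) + len(prompted)))   # one reasonable weighting
sv = sd_bar * math.sqrt(1 / len(control) + 1 / len(prompted))

t_crit = 2.09
print(f"X°-bar = {m0:.3f}, X*-bar = {m1:.3f}")
print(f"SD-bar = {sd_bar:.3f}, SV = {sv:.3f}")
print(f"|difference| = {abs(m1 - m0):.3f} vs threshold = {t_crit * sv:.3f}")
print("significant" if abs(m1 - m0) >= t_crit * sv else "not significant")
```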

On a semi-humorous side: Here is the relationship between marriage and a PhD.

Robert Buxbaum, March 18, 2019. I also have an explanation of loaded dice and flipped coins; statistics for high school students.

Measles, anti-vaxers, and the pious lies of the CDC.

Measles is a horrible disease that had been declared dead in the US, wiped out by immunization, but it has reappeared. A lot of the blame goes to folks who refuse to vaccinate: anti-vaxers, in the popular press. The Center for Disease Control is doing its best to stop the anti-vaxers and promote vaccination for all, but in doing so, I find, it presents the risks of measles as worse than they are. While I’m sympathetic to the goal, I’m not a fan of bending the truth. Lies hurt the people who speak them and the ones who believe them, and they can hurt the health of immune-compromised children who are pushed to vaccinate. You will see my arguments below.

The CDC’s most-used value for the mortality rate for measles is 0.3%. It appears, for example, in line two of the following table from Orenstein et al., 2004. This table also includes measles-caused complications, broken down by type and patient age; read the full article here.

Measles complications and death rates, US, 1987-2000, CDC; Orenstein et al. 2004.

The 0.3% average mortality rate seems more in tune with the 1800s than with today. Similarly, note that the risk of measles-associated encephalitis is given as 10.1%, higher than the risk of measles diarrhea, 8.2%. Do 10.1% of measles cases today really produce encephalitis, a horrible, brain-swelling disease that often causes death? Basically everyone in the 1950s and early 60s got measles (I got it twice), but there were only 1000 cases of encephalitis per year. None of my classmates got encephalitis, and none died. How is this possible? It was the era before antibiotics. Even Orenstein et al. comment that their measles mortality rates appear to be far higher today than in the 1940s and 50s. The article explains that the increase to 3 per thousand “is most likely due to more complete reporting of measles as a cause of death, HIV infections, and a higher proportion of cases among preschool-aged children and adults.”

A far more likely explanation is that the CDC value is wrong: that the measles cases that were reported and certified as such were the most severe ones. There were about 450 measles deaths per year in the 1940s and 1950s, and 408 in 1962, the last year before the measles vaccine, developed by Dr. Hilleman of Merck (a great man of science, forgotten). In the last two decades there were some 2000 reported US measles cases, but only one measles death. A significant decline in cases, but the ratio does not support the CDC’s death rate. For a better estimate, I propose to divide the total number of measles deaths in 1962 by the average birth rate in the late 1950s. That is to say, I propose to divide 408 by the 4.3 million births per year. From this, I calculate a mortality rate just under 0.01% in 1962. That’s 1/30th the CDC number, and medicine has improved since 1962.

I suspect that the CDC inflates the mortality numbers, in part by cherry-picking its years. It inflates them further by treating “reported measles cases” as if they were all measles cases. I suspect that the reported cases in these years were mainly the very severe ones; mild-case measles clears up before being reported or certified as measles. This seems the only normal explanation for why 10.1% of cases include encephalitis, but only 8.2% diarrhea. It’s why the CDC’s mortality numbers suggest that, despite antibiotics, our death rate has gone up by a factor of 30 since 1962.

Consider the experience of people who lived in the early 60s. Most children of my era went to public elementary schools with some 1000 other students, all of whom got measles. By the CDC’s mortality number, we should have seen three measles deaths per school, and 101 cases of encephalitis. In reality, if there had been one death in my school it would have been big news, and it’s impossible that 10% of my classmates got encephalitis. Instead, in those years, only 48,000 people were hospitalized per year for measles, and 1,000 of these suffered encephalitis (CDC numbers, reported here).

To see if vaccination is a good idea, let’s now consider the risk of vaccination. The CDC reports that their vaccine “is virtually risk-free”, but what does risk-free mean? A British study finds vaccination-caused neurological damage in 1 of 365,000 MMR vaccinations, a rate of 0.00027%, with a small fraction leading to death. These problems are mostly found in immunocompromised patients. I will now estimate the neurological risk from actual measles based on the ratio of encephalitis cases to births, as before using the average birth rate as my estimate for measles cases: 1000/4,300,000 = 0.023%. This is far lower than the risk the CDC reports, and more in line with experience.

The risk for neurological damage from measles that I calculate is 86 times higher than the neurological risk from vaccination, suggesting vaccination is a very good thing, on average: the vast majority of people should get vaccinated. But for people with a weakened immune system, my calculations suggest it is worthwhile not to immunize at 12 months, as doctors recommend. The main cause of vaccination death is encephalitis, and this only happens in patients with weakened immune systems. If your child’s immune system is weakened, even by a cold, I’d suggest you wait 1-3 months, and would hope that your doctor would concur. If your child has AIDS, ALS, lupus, or any other long-term immune problem, you should not vaccinate at all. Not vaccinating your immune-weakened child will weaken the herd immunity, but will protect your child.
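Here is the back-of-envelope arithmetic from the last few paragraphs, collected in one short sketch:

```python
# The back-of-envelope arithmetic from the measles discussion above,
# restated as code. Rates are estimated by dividing 1962-era counts
# by the yearly birth rate, as the text proposes.
births_per_year = 4_300_000

measles_deaths_1962 = 408
print(f"measles mortality ~ {measles_deaths_1962 / births_per_year:.4%}")
# ~0.0095%, about 1/30th of the CDC's 0.3% figure

encephalitis_per_year = 1_000
measles_neuro_risk = encephalitis_per_year / births_per_year
vaccine_neuro_risk = 1 / 365_000    # the British study cited above
print(f"neuro risk, measles: {measles_neuro_risk:.4%}")   # ~0.023%
print(f"neuro risk, vaccine: {vaccine_neuro_risk:.5%}")   # ~0.00027%
print(f"ratio: {measles_neuro_risk / vaccine_neuro_risk:.0f}x")
# ~85x, in line with the roughly 86-fold figure quoted above
```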

We live in a country with significant herd immunity: Even if there were a measles outbreak, it is unlikely there would be 500 cases at one time, and your child’s chance of running into one of them in the next month is very small assuming that you don’t take your child to Disneyland, or to visit relatives from abroad. Also, don’t hang out with anti-vaxers if you are not vaccinated. Associating with anti-vaxers will dramatically increase your child’s risk of infection.

As for autism: there appears to be no autism advantage to pushing off vaccination. Signs of autism typically appear around 12 months, the same age that most children receive their first-stage MMR shot, so some people came to associate the two. Parents who push-off vaccination do not push-off the child’s chance of developing autism, they just increase the chance their child will get measles, and that their child will infect others. Schools are right to bar such children, IMHO.

I’ve noticed that, with health care in particular, there is a tendency for researchers to mangle statistics so that good things seem better than they are. Health food is not necessarily as healthy as they say; nor is weight loss. Bicycle helmets: ditto. Sometimes this bleeds over to outright lies: genetically modified grains were branded as cancer-causing based on outright lies and missionary zeal. I feel that I help a bit, in part by countering individual white lies, in part by teaching folks how to better read statistical arguments. If you are a researcher, I strongly suggest you do not set up your research with a hypothesis such that only one outcome will be publishable or acceptable. Here’s how.

Robert E. Buxbaum, December 9, 2018.