So, a few days ago, I listed six elements of good science, and more recently I described why both Large-N and Time are important and how small Ns and short time horizons make it harder to make statements about causality. Today I want to discuss the role of randomness in setting up an experiment.
Large Sample Sizes
- Random Selection
- Random Assignment
Basically, the point of randomness is that there are a lot of things that we won’t be measuring in any given experiment and many times we won’t be wholly aware of what some of those things are. What?!?! Let me explain.
In a typical story on a scientific discovery, it is pretty common to read something like “The researchers controlled for sex, educational level, income, and race.” What this means is that scientists observed and noted these characteristics and then used a statistical procedure to systematically remove the effect of each factor from the analysis. One, not entirely accurate, way to think about it is like this: Let’s assume that what the scientist wants to know is the effect of a medicine on sex drive. The hypothesis is something like, “If we give subjects Libidinex, then they will report more interest in sex.” But let’s assume we know from a previous survey that both men and young people already report more interest in sex than the “average person.” So the scientists would want to make sure there were men and women in both the experimental and the control groups, and that there were young adults, adults, and older adults in both groups as well. Then they could make sure that the effect of the drug was not limited to those groups that were already more prone to be interested in sex. Basically, the scientist can look at all the known groups separately to see if the drug has an effect above and beyond any effect of the known variables.
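To make the "look at each known group separately" idea concrete, here is a minimal sketch. It is not how any real study was analyzed; the function name `effect_by_stratum`, the field names, and the four toy records are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def effect_by_stratum(records):
    """Treatment-minus-control difference in mean outcome, per known group."""
    scores = defaultdict(lambda: {"treatment": [], "control": []})
    for r in records:
        stratum = (r["sex"], r["age_band"])
        scores[stratum][r["group"]].append(r["interest"])
    # Only report strata that contain both treated and control subjects.
    return {
        stratum: mean(g["treatment"]) - mean(g["control"])
        for stratum, g in scores.items()
        if g["treatment"] and g["control"]
    }

# Made-up toy records, purely to exercise the function; not real data.
records = [
    {"sex": "M", "age_band": "young", "group": "treatment", "interest": 8},
    {"sex": "M", "age_band": "young", "group": "control", "interest": 7},
    {"sex": "F", "age_band": "older", "group": "treatment", "interest": 5},
    {"sex": "F", "age_band": "older", "group": "control", "interest": 4},
]
effects = effect_by_stratum(records)
print(effects)
```

If the drug shows a difference inside every group, the effect is not just an artifact of who happened to be in which group. Real analyses usually do this with regression rather than by hand, but the intuition is the same.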
The problem, of course, is that we cannot measure everything. What if the drug only works if you take it precisely two hours after waking up? What if it works best on people who live at higher altitudes? What if it works best on people who have a DNA combination that, to date, is not known to have any effect on this kind of medicine? Scientists pick the usual suspects: the factors we already know tend to influence the type of intervention they want to learn about. But they can’t know or test all the other factors that might also affect the results.
That’s where randomness comes in.
There are two places randomness shows up in an experiment. First there is just picking who is going to be in the experiment altogether. This is random selection. And then there’s randomly assigning those folks into the various test groups: the control group and the experimental group(s).
The second, random assignment, is more common than the first. To do true random selection, you would literally have to have some method of choosing random people from a population and forcing them to participate. This does happen, but it’s pretty rare. Typically, some method of approximating randomness is concocted (e.g., we will ask every 6th person that passes us by to take part in our survey), and a rule is set ahead of time for what to do when a person so chosen refuses to take part. The fact remains, however, that it will always be possible that the people who choose to take part differ systematically (somehow) from those who choose not to. This problem is compounded when recruitment consists of an ad in a paper or posters hung up around campus. In that case you have narrowed your participants to people who both (1) read that paper or live or work on campus and (2) agree to participate. It is confounded even further if your subjects are people with a specific health or behavioral issue, in which case you limit the participants to (3) people who know they have the condition. And (4) where the severity of the condition acts as an inducement to take part, you may end up testing only the worst cases, which can influence your findings by displaying better results than you would have seen on “the average” person with that condition.
Random assignment takes care of some of the effects of a lack of random selection. It can’t entirely erase the selection bias, but if we think of all the factors that might drive someone to participate, the least we can do is make sure those get spread around evenly into all the required test groups. And again, random assignment is the method of trying to equally distribute all those factors we don’t know to measure (as well as the ones we do know to measure).
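Here is a small sketch of that spreading-around effect. Everything in it is hypothetical: the helper `randomly_assign`, the 2,000 simulated volunteers, and the `hidden_trait` value standing in for something the researchers never thought to measure.

```python
import random
from statistics import mean

def randomly_assign(participants, rng):
    """Shuffle a copy of the participants and split it into two equal halves."""
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical volunteers, each carrying a trait nobody records
# (say, hours between waking up and taking the dose).
volunteers = [{"id": i, "hidden_trait": rng.gauss(2.0, 1.0)} for i in range(2000)]

control, treatment = randomly_assign(volunteers, rng)

control_mean = mean(p["hidden_trait"] for p in control)
treatment_mean = mean(p["hidden_trait"] for p in treatment)
print(f"control mean:   {control_mean:.3f}")
print(f"treatment mean: {treatment_mean:.3f}")
```

With groups this size, the two means should land close together even though the assignment never looked at the trait. That is the whole trick: the shuffle balances what you measured and what you didn't.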
Where random assignment is often lacking is in the clinical application of medical interventions. To give you an example from my field:
- 95% of smokers who try to quit will try without any help at all. About 5% of those unaided quitters will be successful.
- 5% of smokers who try to quit will use nicotine replacement therapy (NRT), like the nicotine gum or patch, to help them. Of these, about 25% will successfully quit.
Now the question before us is this: is there some systematic difference between those who choose to use NRT and those who do not? For example, NRT costs money, and we know that there is a correlation between income levels and smoking rates. We also know that lower income folks have a harder time quitting. So perhaps the use of NRT is proxying for income to some degree, and that makes NRT look more effective than it would be if we just gave it (for free) to the 95% of people who would otherwise try to quit unaided. Maybe the use of medicine is selecting out certain at-risk minority groups that distrust Western medicine. Perhaps it’s selecting out older folks who have dental problems (the gum is hard to chew) but who may be more addicted to nicotine (because of how long they’ve been smoking). Maybe NRT is something that more educated people know about, or maybe the instructions are hard to read (they are) and so less literate individuals shy away from it. Any of these factors is possible. And we can answer this question if we simply (1) randomly assign (2) smokers (3) who are seeking help (4) to one of two groups: one (4a) where they get the NRT and another (4b) where they get a placebo. This random assignment can help us determine if NRT “works” on “everyone” or just on those who are inclined to use this kind of help. [It does.]
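The income worry can be sketched as a toy simulation. To be loud about it: every number below is an assumption chosen for illustration (the baseline quit rates, the size of the NRT boost, how strongly income drives the decision to buy NRT). None of it comes from real smoking data, and it does not try to reproduce the percentages above.

```python
import random

TRUE_NRT_EFFECT = 0.10  # assumed boost in quit probability; illustrative only

def quit_probability(high_income, uses_nrt):
    base = 0.15 if high_income else 0.05  # assumed baseline quit rates
    return base + (TRUE_NRT_EFFECT if uses_nrt else 0.0)

def simulate(n, assign_nrt, rng):
    """Quit rate with NRT minus quit rate without, over n simulated smokers."""
    quits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        high_income = rng.random() < 0.5
        uses_nrt = assign_nrt(high_income, rng)
        totals[uses_nrt] += 1
        if rng.random() < quit_probability(high_income, uses_nrt):
            quits[uses_nrt] += 1
    return quits[True] / totals[True] - quits[False] / totals[False]

def self_selected(high_income, rng):
    # Assumption: people with more money are far more likely to buy NRT.
    return rng.random() < (0.20 if high_income else 0.02)

def randomized(high_income, rng):
    # A coin flip that ignores income (and everything else about the person).
    return rng.random() < 0.5

rng = random.Random(7)
naive_effect = simulate(100_000, self_selected, rng)
randomized_effect = simulate(100_000, randomized, rng)
print(f"self-selected comparison: {naive_effect:.3f}")
print(f"randomized comparison:    {randomized_effect:.3f}")
```

Under these made-up assumptions, the self-selected comparison overstates the benefit because NRT users are disproportionately the people who were already more likely to quit. The coin-flip version should come out near the 0.10 that was built in.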
The general idea behind randomness (and you can get your PhD in studying how randomness works, so consider this a really basic intro) is that if you get two large enough groups made up of people selected randomly, they will resemble each other in just about every factor you can think of, and the assumption then is that they also resemble each other in all the ways you can’t think of. So if an intervention has an effect on the experimental group, it can be assumed it would have a measurable effect on any group of random people drawn from the same population. This is what we call generalizability, or external validity. Together, these two common terms do a pretty good job of explaining what you can do if you have random selection and random assignment, and what you cannot do if you don’t: generalize the findings to groups of people outside the test group.
Basically, most experiments are missing one or both of these features. So any conclusions from them should be limited to precisely the groups they studied, e.g., freshmen and sophomores, men and women, at a university in one of the Rust Belt states. If young, moderately educated, mostly white adolescents from middle class Midwestern families lose weight when they eat chocolate, it probably shouldn’t be interpreted as a rule for a 45-year-old black man from northeast Mississippi to follow.
And a key point is that this problem does not go away with large Ns. It’s often the case that really large Ns happen because of very successful efforts to achieve randomness, but this is not logically necessary. If a random survey were conducted among participants in the Million Man March, the N could be very large, and the participants could be “random” in the sense that each individual was chosen by a very specific algorithm designed to achieve random participation. But in that case, the locale itself has already done some selection for you. You would not necessarily want to use that survey to predict anything about the general population. Similarly, a very large survey of the “Residents of Minneapolis/St. Paul” will only reliably be able to tell you what the residents of the Twin Cities feel about the topics in question. Residency in a town has already done some of the selection for you, and N doesn’t matter even if you achieved a 100% response rate, or even if you asked the same questions for 30 years.
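A quick sketch of why a big N can't rescue a pre-selected pool. The population, the 10% "city" share, and the opinion scores are all invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(2024)

# Hypothetical population: about 10% live in "the city", where opinion runs higher.
population = []
for _ in range(200_000):
    in_city = rng.random() < 0.10
    opinion = rng.gauss(7.0 if in_city else 5.0, 1.0)
    population.append((in_city, opinion))

true_mean = mean(op for _, op in population)

# Large N, but drawn at random from a single locale.
city_opinions = [op for in_city, op in population if in_city]
big_local_mean = mean(rng.sample(city_opinions, 10_000))

# Small N, but drawn at random from everyone.
all_opinions = [op for _, op in population]
small_random_mean = mean(rng.sample(all_opinions, 500))

print(f"population mean:          {true_mean:.2f}")
print(f"10,000 city residents:    {big_local_mean:.2f}")
print(f"500 from whole population: {small_random_mean:.2f}")
```

The 10,000-person city sample gives a very precise answer to the wrong question. The 500-person sample is noisier, but it is aimed at the right target.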