The Stanford Prison Experiment is arguably one of the most famous studies in the discipline of social psychology. Mentioning the study by name generally evokes images of the darker side of the human condition, as my previous essay, which detailed the study’s reported qualitative details, illustrates. Dr. Zimbardo’s controversial study drew considerable attention to the ethical considerations of psychological research. Although the study has been widely cited and profoundly influential, in recent years it has come under fire. In 2018, journalist Ben Blum published an exposé scrutinizing the validity of Zimbardo’s work. It exposed major methodological flaws that most likely compromised the results, and even implied that Zimbardo attempted to manipulate variables to influence the results.


Could one of psychology’s best-known and most influential studies be completely invalid, contrived and orchestrated like a school play? Such a determination veers into murky waters. What can be said confidently is that Zimbardo’s methods were flawed from the standpoint of methodology; the ethical considerations are a whole other subject. When applying the scientific method to research, it is imperative to control for any confounding variables. This is the only way to confirm that the results are driven by the variables the experimenter is manipulating. Otherwise, the results fall victim to the third-variable problem, making it impossible to infer causation from the study. At the very least, Zimbardo was derelict in preventing outside factors from contaminating the results.


Methodologically, the Stanford Prison Experiment suffers from poor data collection, faulty participant selection, and the demand characteristics of the study.


Poor Data Collection:  


Anyone who has read Zimbardo’s 1971 paper can tell you that two characteristics are striking: the unorthodox composition of the paper and the paucity of hard data. The details of the paper are almost entirely qualitative, which makes “experiment” an ill-fitting label for the study. French researcher Thibault Le Texier would most likely agree. In his paper, Debunking the Stanford Prison Experiment, he highlights many of the study’s methodological flaws. His research reveals that only about 15% of the “experiment” was recorded: “6 hr of video and 15 hr of audio” out of the total 150 hours devoted to the experiment. No data was collected during day three of the study (p. 12). Such gaps in data collection can only call the results of the study into question. Without sufficient data, the researchers are merely speculating, and presenting speculation as scientific findings is intellectually dishonest and problematic.


A touchstone of scientific inquiry is the ability to control for confounding variables: extraneous variables that influence the attributes being studied and sway the results. How do we know that the results of Zimbardo’s study were truly due to the situational condition of being granted unfettered authority over other people? Unfortunately, we cannot. According to Le Texier’s archival research, Zimbardo “collected very little personal information about the participants” (p. 12). This is profoundly problematic if we are expected to draw causal conclusions from this study. Zimbardo’s failure to collect adequate background information on the subjects generates more questions than answers. The cruel behavior of the guards may have been influenced by factors other than the situation, for example personality traits, political beliefs, or religious convictions. Not collecting such preliminary data not only leaves the results open to confounding but is just plain sloppy. Any experienced researcher should have known better than to be so cavalier.


It has also come to the surface that Zimbardo did not collect any data from actual prisons: another flaw in data collection that prevents these findings from being generalized. Without data from prisons, there is neither an accurate understanding of typical behavior in these environments nor anything to compare the results against. Yes, you could use behavior before the experimental conditions as a baseline. However, this does little if you are seeking to make universal claims about the behavioral dynamics of prisons. In the absence of this information, you really can’t. The results could be atypical of the average prison.


Participant Self-selection:


The experiment suffered from one fatal error from the very beginning that could have impacted the results. Zimbardo placed an advertisement in the local paper requesting volunteers for a prison experiment (p. 2). Disclosing that the study is a “prison” experiment while soliciting participants allows extraneous variables to creep in and contaminate the results. Individuals interested in a prison study may skew toward a specific personality type, ideological convictions, or other proclivities, thereby generating an applicant pool that may be predisposed toward authoritarian tendencies. As unlikely as this sounds, considering we are talking about a group of college kids in 1970s California, it cannot be ruled out, precisely because Zimbardo failed to shield the study from self-selection. This concern would not even be a talking point if Zimbardo had merely requested participants for a study rather than a “prison study”.


Speaking of an experiment taking place at a prestigious university in 1970s California, that is a very specific and unique time and place. This raises another question: whether the results generalize beyond these participants. Generally, when you select subjects for a study, you want the pool of applicants to be as diverse as possible. Why? A pool that reflects the population greatly reduces the likelihood of sampling bias. The general population of the United States is extremely diverse. To reflect this, you need a diverse pool of participants to randomly select from. Otherwise, you run the risk of selecting subjects who all share similar characteristics that do not reflect the overall population. The greater the number and diversity of subjects, the more individual peculiarities tend to wash out, yielding averaged results that can be generalized. Would a group of college students, presumably attending Stanford, be a good representation of the American population? By any metric or measure, the answer is a resounding no!


Demand Characteristics:


Demand characteristics in an experiment are cues that influence the behavior of the subjects, often without their awareness. For example, knowing the experimenter’s expectations or desired results can change how participants behave. Once again, Zimbardo was derelict in his duty as a researcher to avoid such issues. He told the guards during orientation what his expected and desired outcomes for the experiment were (p. 5). The guards also reported feeling as if they were being “watched and filmed” (p. 8). It is quite evident that when we feel we are being observed, we are more apt to behave differently, especially when the lead experimenter has already expressed his opinions about the potential results. This is borne out by the testimony of Guard #1:


He wrote to Zimbardo, 3 months after the experiment, “I was always acting [. .] I was always very conscious of the responsibility involved in the guards’ and the experimenters’ positions; I mentioned this to various people at various times, including to you during the debriefing” (Guard 1, 1971b). He wrote to him again, 3 months later,
I consciously felt that for the experiment to be at all useful ‘guards’ had to act something like guards.

[. . .] I felt that the experiment was important and my being ‘guard-like’ was part of finding out how people react to real oppression. (Guard 1, 1972, p. 5)

(Le Texier, 2019, p. 8)



Unfortunately, it is speculated that, to a certain extent, the study was scripted and fabricated, a concern that extends beyond demand characteristics. Le Texier found that Zimbardo had prewritten conclusions for the study (p. 13). There is ample evidence that the experimenters conditioned the prisoners and guards in how to behave, explaining to the participants how to act in the context of the experiment (p. 10). The claim by Zimbardo and the other researchers that the guards’ cruel behavior occurred organically is beyond spurious, especially when the subjects were being coached. To make matters worse, the experimenters even played an active role in the experiment, abandoning the role of impartial observers: the role of warden was played by one of Zimbardo’s experimenters.

  1. This was a very helpful summary of the main flaws with the prison experiment, to which I will add one more: it is impossible to “replicate” given the ethical problems with the original design and implementation of the experiment!

      1. I have been struggling to answer this one. It is highly situational, and it is hard to determine whether the potential harm is worth the gains.

        The problem to some extent mirrors the Trolley Car problem. To this day I can’t provide a decent solution to that philosophical puzzle.

        Factors such as the magnitude of harm or the perceived cost of having harm inflicted play a role. Also, the subjects may be okay with the adverse consequences for the first couple of years after the experiment, but 10 years down the road regret participating in it.

        It is similar to a young adult passively accepting the risks of smoking cigarettes. As they get older, they may become more concerned with the health risks, but by then the damage has already been done. It’s a little late, post hoc, to attempt to remedy the consequences of risky decisions.

        There is also an asymmetry of information: the researcher most likely has a better grasp of the risks than the potential participant. It mirrors any other decision in life where a large amount of uncertainty is present. I also don’t want to fully blame the participant, but they aren’t 100% off the hook for not thinking through the consequences.

        Again, the researcher also needs to be honest about potential risks.

        1. Finished reading your paper. I like the concept of reframing the conversation in a Coasean/Rawlsian context. The purely moral argument, the “it’s wrong to kill” rhetoric, ends up being a dead end.

          Obviously, there need to be extenuating circumstances for taking the life of another. The trolley problem has left thinkers spinning their wheels for years, making it a fruitless endeavor to discuss it. Nice work!

        2. Thanks for your kind words! I have always been fascinated by the Trolley Problem(s), since I was first exposed to it (them) many years ago. Also, although I like Rawls’ (really Harsanyi’s) idea of a veil of ignorance, the idea of “reflective equilibrium” is total bullshit.

        3. I’d agree the veil of ignorance is a great conceptual device depending on the context.

          Overall, I am not a big fan of Rawlsian thinking; it places too much focus on the group over the individual. While it is moral to secure equality of opportunity, it is a complete fallacy to try to secure equality of outcomes (IMHO).

