Methods
This study describes the landscape of qualified immunity appeals in federal appellate courts. Specifically, we sought to answer the following questions for the study period 2010 through 2020:
- How many federal appeals involve qualified immunity?
- What government officials are sued in qualified immunity appeals?
- What rights violations are alleged in qualified immunity appeals?
- What are the key characteristics of qualified immunity appeals?
a. How long does litigation involving qualified immunity appeals last?
b. How many qualified immunity appeals are interlocutory appeals?
c. At what stage of litigation do qualified immunity appeals occur?
d. How often are plaintiffs in qualified immunity appeals represented by counsel?
e. How often are opinions in qualified immunity appeals published?
- What are the overall outcomes of qualified immunity appeals, and how frequently is qualified immunity granted or denied?
To obtain the potential universe of qualified immunity opinions, we searched Thomson Reuters’ Westlaw service for any federal appellate court opinion issued between January 1, 2010, and December 31, 2020, containing the phrase “qualified immunity.” 1 This yielded 7,173 opinions. A central part of our task was determining whether opinions were relevant—that is, whether qualified immunity was raised in the appeal as opposed to merely being mentioned in the opinion.
Given the time it would take human coders to analyze thousands of opinions, we instead used algorithms—computerized instructions, rules, and models—to identify relevant opinions and label them across 33 additional variables. We collected two more variables through a separate process. 2 To develop the algorithms and test their reliability, we first coded a random sample of opinions by hand. This section describes our variables, how we developed and tested the algorithms, and the final dataset.
Study Variables
For each opinion, the most important information we recorded was relevance: Was qualified immunity raised on appeal, making the opinion relevant to our study?
Then, for all relevant opinions, we recorded 35 additional fields corresponding to our research questions. These fields, with coding options, are summarized in Table 1 and further defined in our main codebook, available in Appendix D. The codebook also covers exceptions and special cases.
We coded all fields at the level of the opinion rather than at the level of the individual claims or alleged violations within an opinion. We used this approach because our initial research questions focused on the landscape of qualified immunity appeals, not individual claims. 3 This approach is consistent with many other studies on qualified immunity. 4 However, it means that if an opinion involved multiple claims, we cannot directly link factors like the defendants, violations, or outcomes to a particular claim.
Moreover, we coded only what was before the court on appeal. For example, if a lawsuit originally involved both law enforcement and prison defendants, but only the law enforcement defendants were involved in the appeal, we coded only the law enforcement defendants.
Table 1: Fields and Variables Included in Study
| Field | Description | Type | Response Options |
|---|---|---|---|
| **Basic Information** | | | |
| Relevance | Was qualified immunity raised on appeal in the opinion? | Binary (Y/N) | — |
| Circuit Court | The circuit court for the appeal | Text | — |
| Circuit Court Case Number | The circuit court case number for the appeal | Text | — |
| Opinion Date | The date the opinion was filed/decided | Text (Date) | — |
| Plaintiffs | The plaintiffs in the opinion | Text | — |
| Defendants | The defendants in the opinion | Text | — |
| Judges | The judges who heard the appeal | Text | — |
| District Court of Origin | The district court where the appeal originated | Text | — |
| District Court Case Number | The case number of the lawsuit in district court | Text | — |
| Case Origination Date | The date the lawsuit was initiated in district court | Text (Date) | — |
| **Procedural Details** | | | |
| Appellants | Which party was appealing the district court’s decision? | Categorical | P – Plaintiffs |
| Published | Was the opinion published? | Binary (Y/N) | — |
| En Banc | Did the opinion involve an en banc hearing? | Binary (Y/N) | — |
| Interlocutory Appeal | Was the appeal an interlocutory appeal? | Binary (Y/N) | — |
| Pro Se Plaintiffs | Did the lawsuit include self-represented plaintiffs? | Categorical | 1 – All plaintiffs were pro se for the appeal |
| Case Stage | What was the procedural stage of the lawsuit at the time of the appeal? | Categorical | D – Dismissal |
| **Government Defendant Type** | | | |
| Government Level of Defendants | Were the government officials being sued federal or state/local officials? | Categorical | Federal – Only federal |
| State Law Enforcement Defendants | Was a state/local law enforcement officer listed as a defendant? | Binary (Y/N) | — |
| Federal Law Enforcement Defendants | Was a federal law enforcement officer listed as a defendant? | Binary (Y/N) | — |
| State Prison Defendants | Was a state/local prison official listed as a defendant? | Binary (Y/N) | — |
| Federal Prison Defendants | Was a federal prison official listed as a defendant? | Binary (Y/N) | — |
| Other Defendants | Was a non-law enforcement, non-prison official listed as a defendant in the appeal? | Binary (Y/N) | — |
| Task Force Defendants | Were the defendants part of a state/federal law enforcement task force? | Binary (Y/N) | — |
| **Constitutional Violation Type** | | | |
| First Amendment | Did the plaintiffs allege violations related to their First Amendment rights? | Binary (Y/N) | — |
| Religious Liberty | Did the plaintiffs allege violations of their right to freely practice their religion? (Note: This field is a sub-field of the “First Amendment” field.) | Binary (Y/N) | — |
| Excessive Force | Did the plaintiffs allege that the defendants committed a violation related to excessive force? | Binary (Y/N) | — |
| False Arrest | Did the plaintiffs allege that the defendants committed violations related to a false arrest, malicious prosecution, or illegal seizure of a person? | Binary (Y/N) | — |
| Illegal Search | Did the plaintiffs allege that the defendants committed violations related to an illegal search? | Binary (Y/N) | — |
| Procedural Due Process | Did the plaintiffs allege they were deprived of fair process under the due process requirements of the Constitution? | Binary (Y/N) | — |
| Care in Custody | Did the alleged violations relate to the (lack of) care provided for the plaintiffs when they were in some form of custody? | Binary (Y/N) | — |
| Parental Rights | Did the plaintiffs allege that the defendants interfered with their rights as parents? | Binary (Y/N) | — |
| Employment | Were at least some of the alleged violations of constitutional rights in this opinion related to an adverse employment action, a hostile work environment, or unsafe workplace conditions? | Binary (Y/N) | — |
| **Outcomes** | | | |
| Overall Prevailing Party | Who was the prevailing party in the opinion? | Categorical | P – Plaintiffs |
| Qualified Immunity Granted | Was qualified immunity granted to one or more defendants in this opinion? | Binary (Y/N) | — |
| Qualified Immunity Denied | Was qualified immunity denied to one or more defendants in this opinion? | Binary (Y/N) | — |
| Lack of Jurisdiction – Factual Dispute | Did the court decline to rule on qualified immunity as it determined it lacked jurisdiction due to a factual dispute? | Binary (Y/N) | — |
Developing the Algorithms
To develop the algorithms, we needed a sizable sample of reliably hand-coded opinions. This sample would allow the algorithms to find patterns in the text of opinions, resulting in reliable prediction models. To test the completed algorithms, we needed a similarly reliable, but smaller, sample of hand-coded opinions.
To create these samples, we randomly selected 791 (roughly 11%) of the 7,173 opinions for hand coding. We randomly assigned 604 opinions to the training sample and 187 to the testing sample. 5
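Purely as an illustrative sketch of this random sampling step (not our actual code), the assignment could be reproduced in Python along the lines below; the opinion identifiers and random seed are hypothetical, while the sample sizes are those used in the study.

```python
import random

# Hypothetical opinion identifiers standing in for the 7,173 Westlaw opinions
opinion_ids = [f"op_{i:04d}" for i in range(7173)]

rng = random.Random(2020)                          # hypothetical seed, for reproducibility
hand_coding_sample = rng.sample(opinion_ids, 791)  # roughly 11% drawn for hand coding

training_sample = hand_coding_sample[:604]         # used to develop the algorithms
testing_sample = hand_coding_sample[604:]          # 187 opinions held out to test them
```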
To ensure accuracy, our human coders were either attorneys or others with substantial knowledge of legal matters generally, if not qualified immunity specifically. 6 We also conducted training sessions on our codebook and tested coders’ accuracy by requiring them to code a sample of practice opinions before starting the project. Finally, we employed a multistep quality-control process involving a panel of attorneys with experience in qualified immunity to resolve the thorniest coding decisions. 7
Once our human coders completed their work, we used the training sample to build our algorithms. (Appendix A details our process for developing and implementing the algorithms.)
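As a hedged, minimal sketch of the general modeling approach, assuming a standard text-classification pipeline (the actual development process is detailed in Appendix A), training a model for a single binary field such as relevance might look like the following; the example texts, labels, and model choices are illustrative assumptions, not our implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical placeholders: in the study, these would be the full text of the
# 604 training opinions and the coders' Y/N labels for one field (e.g., relevance).
train_texts = [
    "Defendants moved for summary judgment on qualified immunity grounds...",
    "The appeal concerns the denial of a motion to suppress evidence...",
]
train_labels = ["Y", "N"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # turn opinion text into word/phrase features
    LogisticRegression(max_iter=1000),    # learn to predict the hand-coded label
)
model.fit(train_texts, train_labels)

# The fitted model can then label opinions the coders never saw.
print(model.predict(["Plaintiff alleges excessive force; the officers assert qualified immunity..."]))
```

The sketch shows only the general pattern of fitting a model on hand-coded training opinions and then predicting labels for unseen opinions.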
Evaluating the Algorithms’ Reliability
After our algorithms were finished, we needed to evaluate their reliability. To do this, we used the testing sample to compare the datapoints generated by the algorithms to those recorded by our human coders. 8
Overall, our algorithms performed very well. Nearly all fields achieved performance statistics above—often well above—those in comparable legal studies. 9 Table 2 shows the most relevant performance statistic for each field, providing a general impression of the algorithms’ performance. Appendix B presents all the statistics necessary to gauge the algorithms’ performance in greater detail. 10
For different types of fields, we used different statistics as our primary performance metric:
- For text fields, “accuracy” was our primary metric. In this context, accuracy simply means how often the algorithm recorded the right text (excluding minor typos and other trivial differences, such as punctuation or articles). As Table 2 shows, these fields performed extremely well, with accuracy rates of 99% to 100%.
- For categorical fields (i.e., fields with multiple response options), accuracy was again our primary metric. For these fields, accuracy represents the percentage of opinions labeled with the correct option. For example, the prevailing party can be the plaintiff, the defendant, or, in the case of a mixed decision, both. The accuracy for the prevailing party field was 96.3%, meaning the algorithm applied the right label to 96.3% of opinions. Comparable legal studies report accuracies between 73% and 93%. 11 However, we generally aimed for accuracies at or above 95%. 12 As detailed in Table 2, four out of five categorical fields exceeded this threshold, with the fifth just missing it at 94.4%.
- For binary fields (i.e., fields with only yes/no response options), a measure called the “F1 score” was our primary performance metric. F1 scores range from 0 to 1, with 1 being perfect; a brief illustration of how these metrics are computed follows this list. While comparable legal studies report F1 scores ranging from 0.57 to 0.91, we aimed for F1 scores above 0.9, although we were generally willing to accept scores above 0.8. 13 As detailed in Table 2, we mostly succeeded, including achieving near-perfect F1 scores for several critical fields.
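For readers unfamiliar with these metrics, the following minimal sketch shows how accuracy and the F1 score compare algorithm labels against hand-coded labels; the labels in the example are invented solely for illustration.

```python
from sklearn.metrics import accuracy_score, f1_score

# Invented example: hand-coded ("true") labels versus algorithm predictions for one binary field
hand_coded = ["Y", "Y", "N", "Y", "N", "N", "Y", "N"]
algorithm  = ["Y", "Y", "N", "N", "N", "N", "Y", "Y"]

# Accuracy: share of opinions the algorithm labeled correctly
print(accuracy_score(hand_coded, algorithm))

# F1 score: harmonic mean of precision and recall, i.e.,
# F1 = 2 * (precision * recall) / (precision + recall)
print(f1_score(hand_coded, algorithm, pos_label="Y"))
```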
We focused most of our analyses on fields with strong performance—i.e., accuracy above 95% or F1 scores above 0.9. We generally avoided making detailed analyses of low-performing fields and fields with minimal data. 14
Table 2: Summary of Algorithm Performance
| Type | Field | Primary Performance Statistic | Performance |
|---|---|---|---|
| Text Fields | Circuit Court | Accuracy | 100% |
| | Circuit Court Case Number | Accuracy | 99.4% |
| | Opinion Date | Accuracy | 100% |
| | Plaintiffs | Accuracy | 99.4% |
| | Defendants | Accuracy | 100% |
| | Judges | Accuracy | 100% |
| | District Court of Origin | Accuracy | 99.4% |
| Categorical Fields | Appellants | Accuracy | 99.4% |
| | Pro Se Plaintiffs (self-represented plaintiffs) | Accuracy | 99.4% |
| | Case Stage (at time of appeal) | Accuracy | 94.4% |
| | Government Level of Defendants | Accuracy | 96.9% |
| | Overall Prevailing Party | Accuracy | 96.3% |
| Binary (Y/N) Fields | Relevance (qualified immunity raised on appeal) | F1 Score | 0.95 |
| | Published | F1 Score | 1.00 |
| | En Banc | F1 Score | 1.00 |
| | Interlocutory Appeal | F1 Score | 0.99 |
| | State Law Enforcement Defendants | F1 Score | 0.96 |
| | Federal Law Enforcement Defendants | F1 Score | –* |
| | State Prison Defendants | F1 Score | 0.93 |
| | Federal Prison Defendants | F1 Score | 0.86 |
| | Other Defendants | F1 Score | 0.84 |
| | Task Force Defendants | F1 Score | –* |
| | First Amendment Violations | F1 Score | 0.98 |
| | Religious Liberty Violations | F1 Score | 1.00 |
| | Excessive Force Violations | F1 Score | 0.97 |
| | False Arrest Violations | F1 Score | 0.85 |
| | Illegal Search Violations | F1 Score | 0.71 |
| | Procedural Due Process Violations | F1 Score | 0.85 |
| | Care in Custody Violations | F1 Score | 0.82 |
| | Parental Rights Violations | F1 Score | 0.89 |
| | Employment Violations | F1 Score | 0.81 |
| | Qualified Immunity Granted | F1 Score | 0.91 |
| | Qualified Immunity Denied | F1 Score | 0.86 |
| | Lack of Jurisdiction – Factual Dispute | F1 Score | 0.80 |
*These fields did not appear in our testing sample, meaning F1 scores could not be calculated.
Note: High-performing fields (95%+ accuracy, 0.9+ F1 score) are shaded dark green. Fields with satisfactory performance (90%+ accuracy, 0.8+ F1 score) are shaded light green. Fields with unsatisfactory performance (<90% accuracy, <0.8 F1 score) are shaded yellow. These ranges are based on the goals of our study and comparable legal studies; nevertheless, they are inherently subjective. For a full range of performance statistics and detailed data distributions for each field, see Appendix B.
Finalizing the Dataset
After evaluating their performance, we ran the algorithms on all 7,173 opinions. In all, the algorithms generated roughly 190,000 datapoints. 15
Our full final dataset can be found here.
This final dataset is both comprehensive and broad: It encompasses 11 years of qualified immunity appeals and covers a range of seldom-studied attributes, including the types of government officials who were sued and the alleged rights violations at issue. Because of the algorithms’ strong performance, the dataset’s reliability is, for numerous critical fields, comparable to what hand coding could achieve. The scope, breadth, and reliability of this dataset allowed us to explore the landscape of qualified immunity appeals in the circuit courts.