Methods
This study describes the landscape of qualified immunity appeals in federal appellate courts. Specifically, we sought to answer the following questions for the study period 2010 through 2020:
- How many federal appeals involve qualified immunity?
- What government officials are sued in qualified immunity appeals?
- What rights violations are alleged in qualified immunity appeals?
- What are the key characteristics of qualified immunity appeals?
a. How long does litigation involving qualified immunity appeals last?
b. How many qualified immunity appeals are interlocutory appeals?
c. At what stage of litigation do qualified immunity appeals occur?
d. How often are plaintiffs in qualified immunity appeals represented by counsel?
e. How often are opinions in qualified immunity appeals published?
- What are the overall outcomes of qualified immunity appeals, and how frequently is qualified immunity granted or denied?
To obtain the potential universe of qualified immunity opinions, we searched Thomson Reuters’ Westlaw service for any federal appellate court opinion issued between January 1, 2010, and December 31, 2020, containing the phrase “qualified immunity.” 1 This yielded 7,173 opinions. A central part of our task was determining whether opinions were relevant—that is, whether qualified immunity was raised in the appeal as opposed to merely being mentioned in the opinion.
Given the time it would take human coders to analyze thousands of opinions, we instead used algorithms—computerized instructions, rules, and models—to identify relevant opinions and label them across 33 additional variables. We collected two more variables through a separate process. 2 To develop the algorithms and test their reliability, we first coded a random sample of opinions by hand. This section describes our variables, how we developed and tested the algorithms, and the final dataset.
Study Variables
For each opinion, the most important information we recorded was relevance: Was qualified immunity raised on appeal, making the opinion relevant to our study?
Then, for all relevant opinions, we recorded 35 additional fields corresponding to our research questions. These fields, with coding options, are summarized in Table 1 and further defined in our main codebook, available in Appendix D. The codebook also covers exceptions and special cases.
We coded all fields at the level of the opinion rather than at the level of the individual claims or alleged violations within an opinion. We used this approach because our initial research questions focused on the landscape of qualified immunity appeals, not individual claims. 3 This approach is consistent with many other studies on qualified immunity. 4 However, it means that if an opinion involved multiple claims, we cannot directly link factors like the defendants, violations, or outcomes to a particular claim.
Moreover, we coded only what was before the court on appeal. For example, if a lawsuit originally involved both law enforcement and prison defendants, but only the law enforcement defendants were involved in the appeal, we coded only the law enforcement defendants.
Table 1: Fields and Variables Included in Study
| Field | Description | Type | Response Options |
|---|---|---|---|
| **Basic Information** | | | |
| Relevance | Was qualified immunity raised on appeal in the opinion? | Binary (Y/N) | — |
| Circuit Court | The circuit court for the appeal | Text | — |
| Circuit Court Case Number | The circuit court case number for the appeal | Text | — |
| Opinion Date | The date the opinion was filed/decided | Text (Date) | — |
| Plaintiffs | The plaintiffs in the opinion | Text | — |
| Defendants | The defendants in the opinion | Text | — |
| Judges | The judges who heard the appeal | Text | — |
| District Court of Origin | The district court where the appeal originated | Text | — |
| District Court Case Number | The case number of the lawsuit in district court | Text | — |
| Case Origination Date | The date the lawsuit was initiated in district court | Text (Date) | — |
| **Procedural Details** | | | |
| Appellants | Which party was appealing the district court’s decision? | Categorical | P – Plaintiffs |
| Published | Was the opinion published? | Binary (Y/N) | — |
| En Banc | Did the opinion involve an en banc hearing? | Binary (Y/N) | — |
| Interlocutory Appeal | Was the appeal an interlocutory appeal? | Binary (Y/N) | — |
| Pro Se Plaintiffs | Did the lawsuit include self-represented plaintiffs? | Categorical | 1 – All plaintiffs were pro se for the appeal |
| Case Stage | What was the procedural stage of the lawsuit at the time of the appeal? | Categorical | D – Dismissal |
| **Government Defendant Type** | | | |
| Government Level of Defendants | Were the government officials being sued federal or state/local officials? | Categorical | Federal – Only federal |
| State Law Enforcement Defendants | Was a state/local law enforcement officer listed as a defendant? | Binary (Y/N) | — |
| Federal Law Enforcement Defendants | Was a federal law enforcement officer listed as a defendant? | Binary (Y/N) | — |
| State Prison Defendants | Was a state/local prison official listed as a defendant? | Binary (Y/N) | — |
| Federal Prison Defendants | Was a federal prison official listed as a defendant? | Binary (Y/N) | — |
| Other Defendants | Was a non-law enforcement, non-prison official listed as a defendant in the appeal? | Binary (Y/N) | — |
| Task Force Defendants | Were the defendants part of a state/federal law enforcement task force? | Binary (Y/N) | — |
| **Constitutional Violation Type** | | | |
| First Amendment | Did the plaintiffs allege violations related to their First Amendment rights? | Binary (Y/N) | — |
| Religious Liberty | Did the plaintiffs allege violations of their right to freely practice their religion? (Note: This field is a sub-field of the “First Amendment” field.) | Binary (Y/N) | — |
| Excessive Force | Did the plaintiffs allege that the defendants committed a violation related to excessive force? | Binary (Y/N) | — |
| False Arrest | Did the plaintiffs allege that the defendants committed violations related to a false arrest, malicious prosecution, or illegal seizure of a person? | Binary (Y/N) | — |
| Illegal Search | Did the plaintiffs allege that the defendants committed violations related to an illegal search? | Binary (Y/N) | — |
| Procedural Due Process | Did the plaintiffs allege they were deprived of fair process under the due process requirements of the Constitution? | Binary (Y/N) | — |
| Care in Custody | Did the alleged violations relate to the (lack of) care provided for the plaintiffs when they were in some form of custody? | Binary (Y/N) | — |
| Parental Rights | Did the plaintiffs allege that the defendants interfered with their rights as parents? | Binary (Y/N) | — |
| Employment | Were at least some of the alleged violations of constitutional rights in this opinion related to an adverse employment action, a hostile work environment, or unsafe workplace conditions? | Binary (Y/N) | — |
| **Outcomes** | | | |
| Overall Prevailing Party | Who was the prevailing party in the opinion? | Categorical | P – Plaintiffs |
| Qualified Immunity Granted | Was qualified immunity granted to one or more defendants in this opinion? | Binary (Y/N) | — |
| Qualified Immunity Denied | Was qualified immunity denied to one or more defendants in this opinion? | Binary (Y/N) | — |
| Lack of Jurisdiction – Factual Dispute | Did the court decline to rule on qualified immunity as it determined it lacked jurisdiction due to a factual dispute? | Binary (Y/N) | — |
Developing the Algorithms
To develop the algorithms, we needed a sizable sample of reliably hand-coded opinions. This sample would allow the algorithms to find patterns in the text of opinions, resulting in reliable prediction models. To test the completed algorithms, we needed a similarly reliable, but smaller, sample of hand-coded opinions.
To create these samples, we randomly selected 791 (roughly 11%) of the 7,173 opinions for hand coding. We randomly assigned 604 opinions to the training sample and 187 to the testing sample. 5
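Purely as an illustrative sketch of this random sampling step (not our actual code), the assignment could be reproduced in Python along the lines below; the opinion identifiers and random seed are hypothetical, while the sample sizes are those used in the study.

```python
import random

# Hypothetical opinion identifiers standing in for the 7,173 Westlaw opinions
opinion_ids = [f"op_{i:04d}" for i in range(7173)]

rng = random.Random(2020)                          # hypothetical seed, for reproducibility
hand_coding_sample = rng.sample(opinion_ids, 791)  # roughly 11% drawn for hand coding

training_sample = hand_coding_sample[:604]         # used to develop the algorithms
testing_sample = hand_coding_sample[604:]          # 187 opinions held out to test them
```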
To ensure accuracy, our human coders were either attorneys or others with substantial knowledge of legal matters generally, if not qualified immunity specifically. 6 We also conducted training sessions on our codebook and tested coders’ accuracy by requiring them to code a sample of practice opinions before starting the project. Finally, we employed a multistep quality-control process involving a panel of attorneys with experience in qualified immunity to resolve the thorniest coding decisions. 7
Once our human coders completed their work, we used the training sample to build our algorithms. (Appendix A details our process for developing and implementing the algorithms.)
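As a hedged, minimal sketch of the general modeling approach, assuming a standard text-classification pipeline (the actual development process is detailed in Appendix A), training a model for a single binary field such as relevance might look like the following; the example texts, labels, and model choices are illustrative assumptions, not our implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical placeholders: in the study, these would be the full text of the
# 604 training opinions and the coders' Y/N labels for one field (e.g., relevance).
train_texts = [
    "Defendants moved for summary judgment on qualified immunity grounds...",
    "The appeal concerns the denial of a motion to suppress evidence...",
]
train_labels = ["Y", "N"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # turn opinion text into word/phrase features
    LogisticRegression(max_iter=1000),    # learn to predict the hand-coded label
)
model.fit(train_texts, train_labels)

# The fitted model can then label opinions the coders never saw.
print(model.predict(["Plaintiff alleges excessive force; the officers assert qualified immunity..."]))
```

The sketch shows only the general pattern of fitting a model on hand-coded training opinions and then predicting labels for unseen opinions.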
Evaluating the Algorithms’ Reliability
After our algorithms were finished, we needed to evaluate their reliability. To do this, we used the testing sample to compare the datapoints generated by the algorithms to those recorded by our human coders. 8
Overall, our algorithms performed very well. Nearly all fields achieved performance statistics above—often well above—those in comparable legal studies. 9 Table 2 shows the most relevant performance statistic for each field, providing a general impression of the algorithms’ performance. Appendix B presents all the statistics necessary to gauge the algorithms’ performance in greater detail. 10
For different types of fields, we used different statistics as our primary performance metric:
- For text fields, “accuracy” was our primary metric. In this context, accuracy simply means how often the algorithm recorded the right text (excluding minor typos and other trivial differences, such as punctuation or articles). As Table 2 shows, these fields performed extremely well, with accuracy rates of 99% to 100%.
- For categorical fields (i.e., fields with multiple response options), accuracy was again our primary metric. For these fields, accuracy represents the percentage of opinions labeled with the correct option. For example, the prevailing party can be the plaintiff, the defendant, or, in the case of a mixed decision, both. The accuracy for the prevailing party field was 96.3%, meaning the algorithm applied the right label to 96.3% of opinions. Comparable legal studies report accuracies between 73% and 93%. 11 However, we generally aimed for accuracies at or above 95%. 12 As detailed in Table 2, four out of five categorical fields exceeded this threshold, with the fifth just missing it at 94.4%.
- For binary fields (i.e., fields with only yes/no response options), a measure called the “F1 score” was our primary performance metric. F1 scores range from 0 to 1, with 1 being perfect; a brief illustration of how these metrics are computed follows this list. While comparable legal studies report F1 scores ranging from 0.57 to 0.91, we aimed for F1 scores above 0.9, although we were generally willing to accept scores above 0.8. 13 As detailed in Table 2, we mostly succeeded, including achieving near-perfect F1 scores for several critical fields.
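For readers unfamiliar with these metrics, the following minimal sketch shows how accuracy and the F1 score compare algorithm labels against hand-coded labels; the labels in the example are invented solely for illustration.

```python
from sklearn.metrics import accuracy_score, f1_score

# Invented example: hand-coded ("true") labels versus algorithm predictions for one binary field
hand_coded = ["Y", "Y", "N", "Y", "N", "N", "Y", "N"]
algorithm  = ["Y", "Y", "N", "N", "N", "N", "Y", "Y"]

# Accuracy: share of opinions the algorithm labeled correctly
print(accuracy_score(hand_coded, algorithm))

# F1 score: harmonic mean of precision and recall, i.e.,
# F1 = 2 * (precision * recall) / (precision + recall)
print(f1_score(hand_coded, algorithm, pos_label="Y"))
```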
We focused most of our analyses on fields with strong performance—i.e., accuracy above 95% or F1 scores above 0.9. We generally avoided making detailed analyses of low-performing fields and fields with minimal data. 14
Table 2: Summary of Algorithm Performance
| Type | Field | Primary Performance Statistic | Performance |
|---|---|---|---|
| Text Fields | Circuit Court | Accuracy | 100% |
| | Circuit Court Case Number | Accuracy | 99.4% |
| | Opinion Date | Accuracy | 100% |
| | Plaintiffs | Accuracy | 99.4% |
| | Defendants | Accuracy | 100% |
| | Judges | Accuracy | 100% |
| | District Court of Origin | Accuracy | 99.4% |
| Categorical Fields | Appellants | Accuracy | 99.4% |
| | Pro Se Plaintiffs (self-represented plaintiffs) | Accuracy | 99.4% |
| | Case Stage (at time of appeal) | Accuracy | 94.4% |
| | Government Level of Defendants | Accuracy | 96.9% |
| | Overall Prevailing Party | Accuracy | 96.3% |
| Binary (Y/N) Fields | Relevance (qualified immunity raised on appeal) | F1 Score | 0.95 |
| | Published | F1 Score | 1.00 |
| | En Banc | F1 Score | 1.00 |
| | Interlocutory Appeal | F1 Score | 0.99 |
| | State Law Enforcement Defendants | F1 Score | 0.96 |
| | Federal Law Enforcement Defendants | F1 Score | –* |
| | State Prison Defendants | F1 Score | 0.93 |
| | Federal Prison Defendants | F1 Score | 0.86 |
| | Other Defendants | F1 Score | 0.84 |
| | Task Force Defendants | F1 Score | –* |
| | First Amendment Violations | F1 Score | 0.98 |
| | Religious Liberty Violations | F1 Score | 1.00 |
| | Excessive Force Violations | F1 Score | 0.97 |
| | False Arrest Violations | F1 Score | 0.85 |
| | Illegal Search Violations | F1 Score | 0.71 |
| | Procedural Due Process Violations | F1 Score | 0.85 |
| | Care in Custody Violations | F1 Score | 0.82 |
| | Parental Rights Violations | F1 Score | 0.89 |
| | Employment Violations | F1 Score | 0.81 |
| | Qualified Immunity Granted | F1 Score | 0.91 |
| | Qualified Immunity Denied | F1 Score | 0.86 |
| | Lack of Jurisdiction – Factual Dispute | F1 Score | 0.80 |
*These fields did not appear in our testing sample, meaning F1 scores could not be calculated.
Note: High-performing fields (95%+ accuracy, 0.9+ F1 score) are shaded dark green. Fields with satisfactory performance (90%+ accuracy, 0.8+ F1 score) are shaded light green. Fields with unsatisfactory performance (<90% accuracy, <0.8 F1 score) are shaded yellow. These ranges are based on the goals of our study and comparable legal studies; nevertheless, they are inherently subjective. For a full range of performance statistics and detailed data distributions for each field, see Appendix B.
Finalizing the Dataset
After evaluating their performance, we ran the algorithms on all 7,173 opinions. In all, the algorithms generated roughly 190,000 datapoints. 15
Our full final dataset can be found here.
This final dataset is both comprehensive and broad: It encompasses 11 years of qualified immunity appeals and covers a range of seldom-studied attributes, including the types of government officials who were sued and the alleged rights violations at issue. Because of the algorithms’ strong performance, the dataset’s reliability is, for numerous critical fields, comparable to what hand coding could achieve. The scope, breadth, and reliability of this dataset allowed us to explore the landscape of qualified immunity appeals in the circuit courts.