Methods

This study aims to describe the landscape of qualified immunity appeals in federal appellate courts. Specifically, we sought to answer these questions for the study period 2010 through 2020:

  1. How many federal appeals involve qualified immunity?
  2. What government officials are sued in qualified immunity appeals?
  3. What rights violations are alleged in qualified immunity appeals?
  4. What are the key characteristics of qualified immunity appeals?
    a. How long does litigation involving qualified immunity appeals last?
    b. How many qualified immunity appeals are interlocutory appeals?
    c. At what stage of litigation do qualified immunity appeals occur?
    d. How often are plaintiffs in qualified immunity appeals represented by counsel?
    e. How often are opinions in qualified immunity appeals published?
  5. What are the overall outcomes of qualified immunity appeals, and how frequently is qualified immunity granted or denied?

To obtain the potential universe of qualified immunity opinions, we searched Thomson Reuters’ Westlaw service for any federal appellate court opinion issued between January 1, 2010, and December 31, 2020, containing the phrase “qualified immunity.” 1 This yielded 7,173 opinions. A central part of our task was determining whether opinions were relevant—that is, whether qualified immunity was raised in the appeal as opposed to merely being mentioned in the opinion.

Given the time it would take human coders to analyze thousands of opinions, we instead used algorithms—computerized instructions, rules, and models—to identify relevant opinions and label them across 33 additional variables. We collected two more variables through a separate process. 2 To develop the algorithms and test their reliability, we first coded a random sample of opinions by hand. This section describes our variables, how we developed and tested the algorithms, and the final dataset.

Study Variables

For each opinion, the most important information we recorded was relevance: Was qualified immunity raised on appeal, making the opinion relevant to our study?

Then, for all relevant opinions, we recorded 35 additional fields corresponding to our research questions. These fields, with coding options, are summarized in Table 1 and further defined in our main codebook, available in Appendix D. The codebook also covers exceptions and special cases.

We coded all fields at the level of the opinion rather than at the level of the individual claims or alleged violations found within an opinion. We used this approach because our initial research questions focused on the landscape of qualified immunity appeals, not individual claims. 3 This approach is consistent with many other studies on qualified immunity. 4 However, it means that if an opinion involved multiple claims, we cannot directly link factors such as the defendants, violations, or outcomes to a particular claim.
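
To illustrate this unit of analysis, the hypothetical record below sketches how a single opinion involving multiple claims is captured as one row of data. The simplified field names and values are ours, adapted loosely from Table 1 for illustration only.

```python
# Hypothetical, simplified opinion-level record (field names adapted from Table 1).
# One record per opinion: the flags aggregate across all claims in the appeal, so a
# "Y" for excessive force and a "Y" for illegal search cannot be tied back to any
# particular claim, defendant, or outcome within the opinion.
opinion_record = {
    "circuit_court": "Fifth Circuit",            # illustrative value
    "state_law_enforcement_defendants": "Y",
    "state_prison_defendants": "N",
    "excessive_force": "Y",
    "illegal_search": "Y",
    "qualified_immunity_granted": "Y",           # granted to at least one defendant
    "qualified_immunity_denied": "Y",            # denied to at least one defendant
}
```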

Moreover, we coded only what was before the court on appeal. For example, if a lawsuit originally involved both law enforcement and prison defendants, but only the law enforcement defendants were involved in the appeal, we coded only the law enforcement defendants.

Table 1: Fields and Variables Included in Study

| Field | Description | Type | Response Options |
| --- | --- | --- | --- |
| **Basic Information** | | | |
| Relevance | Was qualified immunity raised on appeal in the opinion? | Binary (Y/N) | |
| Circuit Court | The circuit court for the appeal | Text | |
| Circuit Court Case Number | The circuit court case number for the appeal | Text | |
| Opinion Date | The date the opinion was filed/decided | Text (Date) | |
| Plaintiffs | The plaintiffs in the opinion | Text | |
| Defendants | The defendants in the opinion | Text | |
| Judges | The judges who heard the appeal | Text | |
| District Court of Origin | The district court where the appeal originated | Text | |
| District Court Case Number | The case number of the lawsuit in district court | Text | |
| Case Origination Date | The date the lawsuit was initiated in district court | Text (Date) | |
| **Procedural Details** | | | |
| Appellants | Which party was appealing the district court’s decision? | Categorical | P – Plaintiffs; D – Defendants; B – Both parties (cross-appellants) |
| Published | Was the opinion published? | Binary (Y/N) | |
| En Banc | Did the opinion involve an en banc hearing? | Binary (Y/N) | |
| Interlocutory Appeal | Was the appeal an interlocutory appeal? | Binary (Y/N) | |
| Pro Se Plaintiffs | Did the lawsuit include self-represented plaintiffs? | Categorical | 1 – All plaintiffs were pro se for the appeal; 0 – No plaintiffs were pro se at any point in the lawsuit; ES – Plaintiffs were pro se at an earlier stage in the lawsuit |
| Case Stage | What was the procedural stage of the lawsuit at the time of the appeal? | Categorical | D – Dismissal; SJ – Summary Judgment; B – Both Dismissal and Summary Judgment; PT – Post-trial; Other – Anything else |
| **Government Defendant Type** | | | |
| Government Level of Defendants | Were the government officials being sued federal or state/local officials? | Categorical | Federal – Only federal; State – Only state/local; Both – Both federal and state/local |
| State Law Enforcement Defendants | Was a state/local law enforcement officer listed as a defendant? | Binary (Y/N) | |
| Federal Law Enforcement Defendants | Was a federal law enforcement officer listed as a defendant? | Binary (Y/N) | |
| State Prison Defendants | Was a state/local prison official listed as a defendant? | Binary (Y/N) | |
| Federal Prison Defendants | Was a federal prison official listed as a defendant? | Binary (Y/N) | |
| Other Defendants | Was a non-law enforcement, non-prison official listed as a defendant in the appeal? | Binary (Y/N) | |
| Task Force Defendants | Were the defendants part of a state/federal law enforcement task force? | Binary (Y/N) | |
| **Constitutional Violation Type** | | | |
| First Amendment | Did the plaintiffs allege violations related to their First Amendment rights? | Binary (Y/N) | |
| Religious Liberty | Did the plaintiffs allege violations of their right to freely practice their religion? (Note: This field is a sub-field of the “First Amendment” field.) | Binary (Y/N) | |
| Excessive Force | Did the plaintiffs allege that the defendants committed a violation related to excessive force? | Binary (Y/N) | |
| False Arrest | Did the plaintiffs allege that the defendants committed violations related to a false arrest, malicious prosecution, or illegal seizure of a person? | Binary (Y/N) | |
| Illegal Search | Did the plaintiffs allege that the defendants committed violations related to an illegal search? | Binary (Y/N) | |
| Procedural Due Process | Did the plaintiffs allege they were deprived of fair process under the due process requirements of the Constitution? | Binary (Y/N) | |
| Care in Custody | Did the alleged violations relate to the (lack of) care provided for the plaintiffs when they were in some form of custody? | Binary (Y/N) | |
| Parental Rights | Did the plaintiffs allege that the defendants interfered with their rights as parents? | Binary (Y/N) | |
| Employment | Were at least some of the alleged violations of constitutional rights in this opinion related to an adverse employment action, a hostile work environment, or unsafe workplace conditions? | Binary (Y/N) | |
| **Outcomes** | | | |
| Overall Prevailing Party | Who was the prevailing party in the opinion? | Categorical | P – Plaintiffs; D – Defendants; M – Both the defendants and plaintiffs prevailed in part (mixed) |
| Qualified Immunity Granted | Was qualified immunity granted to one or more defendants in this opinion? | Binary (Y/N) | |
| Qualified Immunity Denied | Was qualified immunity denied to one or more defendants in this opinion? | Binary (Y/N) | |
| Lack of Jurisdiction – Factual Dispute | Did the court decline to rule on qualified immunity as it determined it lacked jurisdiction due to a factual dispute? | Binary (Y/N) | |

Developing the Algorithms

To develop the algorithms, we needed a sizable sample of reliably hand-coded opinions. This sample would allow the algorithms to find patterns in the text of opinions, resulting in reliable prediction models. To test the completed algorithms, we needed a similarly reliable, but smaller, sample of hand-coded opinions.

To create these samples, we randomly selected 791 (roughly 11%) of the 7,173 opinions for hand coding. We randomly assigned 604 opinions to the training sample and 187 to the testing sample. 5
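
As a purely illustrative sketch of this sampling step (in Python; the seed, identifiers, and tooling are assumptions, not our actual workflow), the split amounts to the following:

```python
import random

# 7,173 candidate opinions, identified here by index for illustration only.
opinion_ids = list(range(7173))

random.seed(42)  # illustrative seed for reproducibility
hand_coding_sample = random.sample(opinion_ids, 791)  # roughly 11% of 7,173

# Assign 604 opinions to the training sample and the remaining 187 to the testing sample.
training_ids = hand_coding_sample[:604]
testing_ids = hand_coding_sample[604:]  # 187 opinions
```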

To ensure accuracy, our human coders were either attorneys or others with substantial knowledge of legal matters generally, if not qualified immunity specifically. 6 We also conducted trainings on our codebook and tested coders’ accuracy by requiring them to complete a sample of practice opinions before starting the project. Finally, we employed a multistep quality-control process involving a panel of attorneys with experience in qualified immunity to resolve the thorniest coding decisions. 7

Once our human coders completed their work, we used the training sample to build our algorithms. (Appendix A details our process for developing and implementing the algorithms.)
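
For readers unfamiliar with this kind of modeling, the sketch below shows one common form such a text-classification algorithm can take, assuming Python with scikit-learn. The specific techniques shown (TF-IDF features and logistic regression) are illustrative stand-ins, not necessarily what we built; Appendix A describes our actual approach.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# train_texts: full text of the 604 training-sample opinions
# train_labels: hand-coded labels for one binary field, e.g., "Excessive Force" (1/0)
def build_field_classifier(train_texts, train_labels):
    """Fit a one-field classifier on the hand-coded training sample (illustrative only)."""
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # turn opinion text into features
        LogisticRegression(max_iter=1000),              # predict the field's label
    )
    model.fit(train_texts, train_labels)
    return model

# A separate model would be fit for each coded field and then applied to unseen opinions:
# predictions = build_field_classifier(train_texts, train_labels).predict(test_texts)
```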

Evaluating the Algorithms’ Reliability

After our algorithms were finished, we needed to evaluate their reliability. To do this, we used the testing sample to compare the datapoints generated by the algorithms to those recorded by our human coders. 8

Overall, our algorithms performed very well. Nearly all fields achieved performance statistics above—often well above—those in comparable legal studies. 9 Table 2 shows the most relevant performance statistic for each field, providing a general impression of the algorithms’ performance. Appendix B presents all the statistics necessary to gauge the algorithms’ performance in greater detail. 10

For different types of fields, we used different statistics as our primary performance metric:

  • For text fields, “accuracy” was our primary metric. In this context, accuracy simply means how often the algorithm recorded the right text (excluding minor typos and other trivial differences, such as punctuation or articles). As Table 2 shows, these fields performed extremely well, with accuracy rates of 99% to 100%.
  • For categorical fields (i.e., fields with multiple response options), accuracy was again our primary metric. For these fields, accuracy represents the percentage of opinions labeled with the correct option. For example, the prevailing party can be the plaintiff, the defendant, or, in the case of a mixed decision, both. The accuracy for the prevailing party field was 96.3%, meaning the algorithm applied the right label to 96.3% of opinions. Comparable legal studies report accuracies between 73% and 93%. 11 However, we generally aimed for accuracies at or above 95%. 12 As detailed in Table 2, four out of five categorical fields exceeded this threshold, with the fifth just missing it at 94.4%.
  • For binary fields (i.e., fields with only yes/no response options), a measure called the “F1 score” was our primary performance metric. The F1 score balances precision (how often a “yes” label was correct) against recall (how often actual “yes” opinions were caught); scores range from 0 to 1, with 1 being perfect, as illustrated in the sketch after this list. While comparable legal studies report F1 scores ranging from 0.57 to 0.91, we aimed for F1 scores above 0.9, although we were generally willing to accept scores above 0.8. 13 As detailed in Table 2, we mostly succeeded, achieving near-perfect F1 scores for several critical fields.
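
To make these metrics concrete, the sketch below computes accuracy, precision, recall, and the F1 score (the harmonic mean of precision and recall, 2PR/(P + R)) for a toy binary field, assuming Python with scikit-learn; the labels are invented for illustration and are not drawn from our testing sample.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical testing-sample labels for one binary field (1 = yes, 0 = no).
human_labels     = [1, 0, 1, 1, 0, 0, 1, 0]  # hand-coded "ground truth"
algorithm_labels = [1, 0, 1, 0, 0, 0, 1, 1]  # algorithm's predictions

accuracy = accuracy_score(human_labels, algorithm_labels)    # share of exact matches
precision = precision_score(human_labels, algorithm_labels)  # of predicted "yes," how many were right
recall = recall_score(human_labels, algorithm_labels)        # of true "yes," how many were found
f1 = f1_score(human_labels, algorithm_labels)                # harmonic mean: 2*P*R / (P + R)

print(accuracy, precision, recall, f1)  # 0.75, 0.75, 0.75, 0.75 for this toy example
```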

We focused most of our analyses on fields with strong performance—i.e., accuracy above 95% or F1 scores above 0.9. We generally avoided making detailed analyses of low-performing fields and fields with minimal data. 14

Table 2: Summary of Algorithm Performance

| Type | Field | Primary Performance Statistic | Performance |
| --- | --- | --- | --- |
| Text Fields | Circuit Court | Accuracy | 100% |
| | Circuit Court Case Number | Accuracy | 99.4% |
| | Opinion Date | Accuracy | 100% |
| | Plaintiffs | Accuracy | 99.4% |
| | Defendants | Accuracy | 100% |
| | Judges | Accuracy | 100% |
| | District Court of Origin | Accuracy | 99.4% |
| Categorical Fields | Appellants | Accuracy | 99.4% |
| | Pro Se Plaintiffs (self-represented plaintiffs) | Accuracy | 99.4% |
| | Case Stage (at time of appeal) | Accuracy | 94.4% |
| | Government Level of Defendants | Accuracy | 96.9% |
| | Overall Prevailing Party | Accuracy | 96.3% |
| Binary (Y/N) Fields | Relevance (qualified immunity raised on appeal) | F1 Score | 0.95 |
| | Published | F1 Score | 1.00 |
| | En Banc | F1 Score | 1.00 |
| | Interlocutory Appeal | F1 Score | 0.99 |
| | State Law Enforcement Defendants | F1 Score | 0.96 |
| | Federal Law Enforcement Defendants | F1 Score | –* |
| | State Prison Defendants | F1 Score | 0.93 |
| | Federal Prison Defendants | F1 Score | 0.86 |
| | Other Defendants | F1 Score | 0.84 |
| | Task Force Defendants | F1 Score | –* |
| | First Amendment Violations | F1 Score | 0.98 |
| | Religious Liberty Violations | F1 Score | 1.00 |
| | Excessive Force Violations | F1 Score | 0.97 |
| | False Arrest Violations | F1 Score | 0.85 |
| | Illegal Search Violations | F1 Score | 0.71 |
| | Procedural Due Process Violations | F1 Score | 0.85 |
| | Care in Custody Violations | F1 Score | 0.82 |
| | Parental Rights Violations | F1 Score | 0.89 |
| | Employment Violations | F1 Score | 0.81 |
| | Qualified Immunity Granted | F1 Score | 0.91 |
| | Qualified Immunity Denied | F1 Score | 0.86 |
| | Lack of Jurisdiction – Factual Dispute | F1 Score | 0.80 |

*This field did not appear in our testing sample, so an F1 score could not be calculated.

Note: High-performing fields (95%+ accuracy, 0.9+ F1 score) are shaded dark green. Fields with satisfactory performance (90%+ accuracy, 0.8+ F1 score) are shaded light green. Fields with unsatisfactory performance (<90% accuracy, <0.8 F1 score) are shaded yellow. These ranges are based on the goals of our study and comparable legal studies; nevertheless, they are inherently subjective. For a full range of performance statistics and detailed data distributions for each field, see Appendix B.

Finalizing the Dataset

After evaluating their performance, we ran the algorithms on all 7,173 opinions. In all, the algorithms generated roughly 190,000 datapoints. 15
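
Conceptually, this final step applies each field’s fitted model to every opinion and assembles the labels into one table. A minimal sketch, assuming the hypothetical per-field models from the earlier sketch and pandas for the output (not our actual code, which Appendix A describes):

```python
import pandas as pd

# all_texts: the text of all 7,173 opinions
# field_models: a dict mapping each coded field's name to a model fitted on the training sample
def label_all_opinions(all_texts, field_models):
    """Apply each field's fitted model to every opinion and collect the resulting labels."""
    labels = {field: model.predict(all_texts) for field, model in field_models.items()}
    return pd.DataFrame(labels)  # one row per opinion, one column per coded field
```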

Our full final dataset can be found here.

This final dataset is both comprehensive and broad: It encompasses 11 years of qualified immunity appeals and covers a range of seldom-studied attributes, including the types of government officials who were sued and the alleged rights violations at issue. Because of the algorithms’ strong performance, the dataset’s reliability is, for numerous critical fields, comparable to what hand coding could achieve. The scope, breadth, and reliability of this dataset allowed us to explore the landscape of qualified immunity appeals in the circuit courts.