Appendix A: Methods

Model Estimation

I conducted the geographic regression discontinuity analyses using the rdrobust package in R. 1 Because inspections were nested within firms, I used Bartalotti and Brummet’s approach, which allows for cluster dependence in the error term and is incorporated into the rdrobust package. 2 I determined the bandwidth using the mean-squared-error-optimal bandwidth selector implemented in the package’s rdbwselect function. 3 I report both conventional and robust statistics.
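To illustrate, the following is a minimal sketch of how such an estimation might be set up with rdrobust and rdbwselect; the data frame and variable names (salons, violation_z, dist_to_border, firm_id) are hypothetical placeholders, not the actual analysis code.

```r
# Minimal sketch of the estimation setup. Hypothetical variables:
# violation_z   - outcome (violation z-score)
# dist_to_border - signed distance to the border in miles (negative on one side)
# firm_id       - firm identifier used for clustering
library(rdrobust)

# MSE-optimal bandwidth selection with firm-level clustering
bw <- rdbwselect(y = salons$violation_z, x = salons$dist_to_border,
                 c = 0, bwselect = "mserd", cluster = salons$firm_id)
summary(bw)

# Local polynomial RD estimate; rdrobust reports both conventional and
# robust (bias-corrected) inference in a single call
fit <- rdrobust(y = salons$violation_z, x = salons$dist_to_border,
                c = 0, bwselect = "mserd", cluster = salons$firm_id)
summary(fit)
```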

Comparing Nail Salon Inspection Outcomes in Connecticut and New York

A regression discontinuity analysis essentially involves two regression models. On each side of the “cutoff,” a polynomial regression model is estimated with the dependent variable regressed onto the “forcing variable.” In my manicurist analysis, the cutoff was the Connecticut/New York border, the dependent variable was either of two measures of nail salon inspection outcomes (violation z-score or violation rate), and the forcing variable was the distance in miles to the border. The “treatment effect” is calculated as the difference between the intercepts for the two regression equations. For example, when the dependent variable was the violation z-score, the intercept on the Connecticut side was the violation z-score for a hypothetical nail salon in Connecticut directly on the border, the intercept on the New York side was the violation z-score for a hypothetical nail salon in New York directly on the border, and the treatment effect was the difference between those two intercepts. Theoretically, the two intercepts represent a counterfactual. For instance, inspection outcomes for nail salons with licensed workers in Connecticut during the study period are unknowable because the state did not require licensure. However, subject to qualifications, inspection outcomes of nail salons in New York should reasonably represent that unobservable counterfactual—the inspection outcomes that would have been observed had Connecticut required licensure.
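The intercept-difference logic can be sketched with two simple linear fits, one on each side of the border, within a bandwidth. The variable names below are hypothetical, and the actual estimates come from rdrobust’s kernel-weighted local polynomial routine rather than the unweighted fits shown here.

```r
# Illustrative sketch of the intercept-difference logic.
# Hypothetical convention: distance is negative on the New York side and
# positive on the Connecticut side; h is a bandwidth in miles.
h <- 10

ny <- subset(salons, dist_to_border < 0 & dist_to_border > -h)
ct <- subset(salons, dist_to_border > 0 & dist_to_border <  h)

fit_ny <- lm(violation_z ~ dist_to_border, data = ny)
fit_ct <- lm(violation_z ~ dist_to_border, data = ct)

# Each intercept is the predicted outcome for a salon right at the border;
# the treatment effect is the difference between the two intercepts
tau <- coef(fit_ct)["(Intercept)"] - coef(fit_ny)["(Intercept)"]
tau
```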

The key assumption of a regression discontinuity design is that, within a specified bandwidth, units on either side of the cutoff are balanced on covariates. 4 This assumption cannot be directly tested, but its plausibility can be evaluated. Ideally, I would compare firms’ characteristics, such as the number of employees and the types of services offered. Because such information was unavailable, I performed checks using three census variables (at the census block group level) that may reflect the firms’ consumer market: median household income, percentage of the population with at least a bachelor’s degree, and total population. Higher household income, for example, is associated with greater spending on personal care products and services. 5 Thus, firms in areas with higher household incomes may be more responsive to consumer demand for safe, clean service than firms in areas with lower household incomes.
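A common way to run such checks, sketched below under the same hypothetical variable names, is to rerun the RD specification with each covariate as the outcome; a significant “effect” indicates a discontinuity in that covariate at the border.

```r
# Sketch of the covariate balance checks: refit the RD specification with
# each block-group covariate as the outcome. (Column names are hypothetical.)
covariates <- c("median_hh_income", "pct_bachelors", "population")

balance <- lapply(covariates, function(v) {
  rdrobust(y = salons[[v]], x = salons$dist_to_border,
           c = 0, bwselect = "mserd", cluster = salons$firm_id)
})
lapply(balance, summary)
```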

There was a significant discontinuity at the border in total population (New York block groups had larger populations), but no discontinuity in income or education. On the one hand, the population discontinuity could be interpreted as evidence against a key assumption of the design, while the continuities in income and education could be interpreted as evidence in the assumption’s favor. On the other hand, these variables are measured at the census block group level and might not capture firm-level differences or similarities (to the extent either exist). I report both the original model and the model with estimates adjusted for income, education, and population.
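The covariate-adjusted estimates can be obtained through rdrobust’s covs argument, as in the following sketch (again with hypothetical column names).

```r
# Sketch of the covariate-adjusted model: the covs argument adds the
# block-group covariates to the local polynomial fit on both sides of the
# border. (Column names are hypothetical.)
fit_adj <- rdrobust(y = salons$violation_z, x = salons$dist_to_border,
                    c = 0, bwselect = "mserd", cluster = salons$firm_id,
                    covs = cbind(salons$median_hh_income,
                                 salons$pct_bachelors,
                                 salons$population))
summary(fit_adj)
```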

As a sensitivity test, I reran all the models using the “donut hole” approach. 6 Generally, regression discontinuity estimates are most influenced by observations closest to the cutoff, which can be problematic if there is non-random “sorting” or “manipulation” around the border. For example, an entrepreneur might have chosen to open a nail salon on the Connecticut side rather than the New York side to avoid the latter state’s license. The donut hole approach involves excluding observations within a given radius of the cutoff and re-estimating the models. My assumption is that if businesses were sorting themselves in a non-random way, it would be most likely to occur closer to the border. 7 By rerunning the analyses with potential manipulators excluded, I can evaluate the sensitivity of the results to them. I reran the analyses with donut hole radii of 1, 2, and 3 miles for the Connecticut and New York sample.
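A sketch of the donut hole re-estimation, under the same hypothetical variable names, simply drops observations within a given radius of the border before refitting:

```r
# Sketch of the donut hole approach: exclude observations within r miles of
# the border, then refit the RD model. (Variable names are hypothetical.)
donut_fit <- function(data, r) {
  keep <- abs(data$dist_to_border) > r
  rdrobust(y = data$violation_z[keep], x = data$dist_to_border[keep],
           c = 0, bwselect = "mserd", cluster = data$firm_id[keep])
}

fits <- lapply(c(1, 2, 3), function(r) donut_fit(salons, r))
lapply(fits, summary)
```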

Some sensitivity is expected because the estimation routine tends to be more influenced by observations very close to the border; but if those observations were highly atypical in some way, the results would be very sensitive to whether they were included or excluded. 8 The idea is similar to how an average can be pulled by extreme values: if there are nine people in a room who are all 5 feet tall and one person walks in who is 10 feet tall, the average height in the room goes up by 6 inches. Analogously, if businesses very close to the border were like that 10-foot-tall person, excluding them would produce substantially different findings.

Comparing Barbershop Inspection Outcomes in Alabama and Mississippi

For my barber analysis, I employed the same analytical strategy with the sample of barbershops in Alabama and Mississippi. In this case, the dependent variable was a binary indicator of whether the shop passed the inspection. I used a linear probability model because it is straightforward to interpret, it can produce unbiased estimates of treatment effects, and the residual heteroskedasticity intrinsic to such models can be addressed by estimating robust (“sandwich”) standard errors. 9
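The following sketch illustrates the linear probability logic with heteroskedasticity-robust standard errors (via the sandwich and lmtest packages), along with the corresponding rdrobust call for the binary outcome; the data frame and variable names (shops, passed, dist_to_border, firm_id) are hypothetical.

```r
# Illustrative linear probability model: the coefficient on the border-side
# indicator is the estimated jump in the pass probability at the border.
# (Fit here to the full sample for simplicity; in practice it would be
# restricted to a bandwidth around the border. Names are hypothetical.)
library(sandwich)
library(lmtest)
library(rdrobust)

lpm <- lm(passed ~ dist_to_border * I(dist_to_border > 0), data = shops)
coeftest(lpm, vcov = vcovHC(lpm, type = "HC1"))  # sandwich standard errors

# The binary outcome can also be passed directly to rdrobust, which then
# fits a local linear probability model with robust inference
fit_pass <- rdrobust(y = shops$passed, x = shops$dist_to_border,
                     c = 0, bwselect = "mserd", cluster = shops$firm_id)
summary(fit_pass)
```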

As with the Connecticut/New York comparison, there were discontinuities in covariates at the border. Specifically, census block groups on the Alabama side had a slightly greater proportion of people with at least a bachelor’s degree and a larger total population; there was no discontinuity in household income. I therefore again reran the analysis with estimates adjusted for income, education, and population.

Also as with the Connecticut/New York comparison, I employed the donut hole approach to evaluate the sensitivity of my results. However, I used larger radii of 5, 6, and 7 miles for the Alabama and Mississippi sample because the number of observations closer to the border was too small; for example, there were only three observations within 3 miles of the Alabama/Mississippi border.