Shift-share IV in trade economics
August 26, 2025
We want a local causal effect of a trade shock:
$$Y_\ell = \alpha + \delta X_\ell + \varepsilon_\ell$$
Problem. $X_\ell$ is endogenous: regions that import more are also regions where productivity, demand, or politics differ in unobserved ways.
→ We need an instrument for $X_\ell$.
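Two ingredients: local shares $s_{\ell k}$ (the weight of industry $k$ in region $\ell$, e.g. its employment share) and national shifts $g_k$ (the common shock to industry $k$).
Combine into a single regional exposure measure:
$$Z_\ell = \sum_k s_{\ell k}\, g_k$$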
The instrument predicts how hard region ℓ should be hit by a common shock, given its industrial structure.
3 regions × 2 sectors. Tariffs fall between 1987 and 1988.
| region | $L_{\text{Agri}}$ | $L_{\text{Manuf}}$ | $s_{\text{Agri}}$ | $s_{\text{Manuf}}$ |
|---|---|---|---|---|
| A | 100 | 9 000 | 0.011 | 0.989 |
| B | 1 000 | 400 | 0.714 | 0.286 |
| C | 700 | 3 000 | 0.189 | 0.811 |
National tariff change 1987→1988: Agri 30 → 10, Manuf 20 → 5.
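Bartik exposure $Z_\ell = s_{\ell,\text{Agri}}\,\Delta\log t_{\text{Agri}} + s_{\ell,\text{Manuf}}\,\Delta\log t_{\text{Manuf}}$, computed once with log changes and once with level changes $\Delta t_k$ in place of $\Delta\log t_k$:
| region | $Z_\ell$ (Δlog) | $Z_\ell$ (Δlevel) |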
|---|---|---|
| A | −1.38 | −15.1 |
| B | −1.18 | −18.6 |
| C | −1.33 | −15.9 |
Note the ranking flips: log change makes A most exposed; level change makes B most exposed.
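To see where the flip comes from, the arithmetic for regions A and B can be written out. This uses the rounded shares from the table and takes $\Delta\log t_k = \ln(t_{k,1988}/t_{k,1987})$; natural logs are an assumption, but they reproduce the table.

$$
\begin{aligned}
\Delta\log t_{\text{Agri}} &= \ln(10/30) \approx -1.099, & \Delta\log t_{\text{Manuf}} &= \ln(5/20) \approx -1.386 \\
Z_A^{\log} &\approx 0.011(-1.099) + 0.989(-1.386) \approx -1.38 \\
Z_B^{\log} &\approx 0.714(-1.099) + 0.286(-1.386) \approx -1.18 \\
Z_A^{\text{lev}} &\approx 0.011(-20) + 0.989(-15) \approx -15.1 \\
Z_B^{\text{lev}} &\approx 0.714(-20) + 0.286(-15) \approx -18.6
\end{aligned}
$$

Manufacturing has the larger proportional cut (75% versus 67%) but the smaller cut in points (15 versus 20), so manufacturing-heavy A leads in logs while agriculture-heavy B leads in levels.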
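The Bartik exposure is used as an instrument:
First stage: $X_\ell = \pi_0 + \pi_1 Z_\ell + u_\ell$
Reduced form: $Y_\ell = \rho_0 + \rho_1 Z_\ell + v_\ell$
2SLS: $\hat\delta_{IV} = \hat\rho_1 / \hat\pi_1$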
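The ratio form of the just-identified IV estimator can be checked on simulated data. The sketch below is illustrative only: the data-generating process, the coefficient values, and all variable names are assumptions, and `Z` is a generic stand-in for a Bartik exposure.

```r
# Simulated data: Z is a stand-in instrument, u an unobserved confounder.
set.seed(42)
n <- 200
Z <- rnorm(n)
u <- rnorm(n)
X <- 0.7 * Z + u + rnorm(n)       # endogenous regressor (depends on u)
Y <- 1.5 * X - 2 * u + rnorm(n)   # outcome; the assumed true delta is 1.5

# First stage and reduced form, each an OLS regression on Z
pi1  <- unname(coef(lm(X ~ Z))["Z"])
rho1 <- unname(coef(lm(Y ~ Z))["Z"])

# 2SLS with a single instrument is the ratio of the two slopes,
# which in turn equals cov(Z, Y) / cov(Z, X)
delta_iv   <- rho1 / pi1
delta_wald <- cov(Z, Y) / cov(Z, X)

# OLS for comparison; it is contaminated by the confounder u
delta_ols <- unname(coef(lm(Y ~ X))["X"])
c(iv = delta_iv, wald = delta_wald, ols = delta_ols)
```

The first two entries coincide by construction; how far the OLS entry sits from them depends on the assumed confounding.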
Goldsmith-Pinkham, Sorkin, and Swift (2020) ask the central question: which assumption makes the Bartik valid?
| Camp | Source of identification | Champion paper |
|---|---|---|
| Share-based | Shares $s_{\ell k}$ are exogenous to unobservables | Goldsmith-Pinkham, Sorkin & Swift (2020) |
| Shock-based | Shifts $g_k$ are quasi-randomly assigned | Borusyak, Hull & Jaravel (2022) |
Both deliver consistent IV under different conditions. The right diagnostic depends on which assumption you lean on.
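Goldsmith-Pinkham, Sorkin, and Swift (2020) prove that the Bartik 2SLS estimator equals a GMM estimator in which each share $s_{\ell k}$ acts as a separate instrument, and the industry-specific estimates $\hat\delta_k$ are combined using Rotemberg weights $\alpha_k$:
$$\hat\delta_{\text{Bartik}} = \sum_k \alpha_k \hat\delta_k$$
For each industry $k$, the weight is
$$\alpha_k = \frac{g_k \sum_\ell s_{\ell k} X_\ell}{\sum_{k'} g_{k'} \sum_\ell s_{\ell k'} X_\ell}, \qquad \sum_k \alpha_k = 1.$$
Two ingredients drive $\alpha_k$: the size of the national shift $g_k$, and the first-stage covariance $\sum_\ell s_{\ell k} X_\ell$ between industry $k$'s shares and the endogenous variable.
Weights can be negative: if $g_k$ and the first-stage covariance have opposite signs, that industry pulls the Bartik in the opposite direction. A handful of industries usually carry most of the weight.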
Named for Julio Rotemberg (1983), who used the same decomposition logic in a different IV setting; Goldsmith-Pinkham, Sorkin, and Swift (2020) formalised it for the Bartik.
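The decomposition $\hat\delta_{\text{Bartik}} = \sum_k \alpha_k \hat\delta_k$ follows from writing out the just-identified IV estimator. This is a sketch that assumes $X_\ell$ and $Y_\ell$ have already been residualized on the controls (at minimum demeaned), with $\hat\delta_k$ denoting the IV estimate that uses only industry $k$'s shares as the instrument:

$$
\begin{aligned}
\hat\delta_{\text{Bartik}}
&= \frac{\sum_\ell Z_\ell Y_\ell}{\sum_\ell Z_\ell X_\ell}
 = \frac{\sum_k g_k \sum_\ell s_{\ell k} Y_\ell}{\sum_{k'} g_{k'} \sum_\ell s_{\ell k'} X_\ell} \\
&= \sum_k
\underbrace{\frac{g_k \sum_\ell s_{\ell k} X_\ell}{\sum_{k'} g_{k'} \sum_\ell s_{\ell k'} X_\ell}}_{\alpha_k}
\cdot
\underbrace{\frac{\sum_\ell s_{\ell k} Y_\ell}{\sum_\ell s_{\ell k} X_\ell}}_{\hat\delta_k}
\end{aligned}
$$

The second equality substitutes $Z_\ell = \sum_k s_{\ell k} g_k$ and swaps the order of summation; the last step multiplies and divides each term by $\sum_\ell s_{\ell k} X_\ell$.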
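Use our toy data. Set $X_\ell = Z^{\text{lev}}_\ell$ so the mechanics are visible: $X_A = -15.1$, $X_B = -18.6$, $X_C = -15.9$, with shifts $g_{\text{Agri}} = -20$, $g_{\text{Manuf}} = -15$.
First-stage covariances (raw cross-products here, because the toy $X_\ell$ is not demeaned):
$$\sum_\ell s_{\ell,\text{Agri}} X_\ell = 0.011(-15.1) + 0.714(-18.6) + 0.189(-15.9) = -16.45$$
$$\sum_\ell s_{\ell,\text{Manuf}} X_\ell = 0.989(-15.1) + 0.286(-18.6) + 0.811(-15.9) = -33.15$$
Numerators: $(-20)(-16.45) = 329.0$ and $(-15)(-33.15) = 497.2$. Denominator: $826.2$.
$$\alpha_{\text{Agri}} = 0.40, \qquad \alpha_{\text{Manuf}} = 0.60.$$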
Manufacturing carries 60% of the Bartik. With 100+ industries in Topalova (2010) or ADH (Autor, Dorn, and Hanson 2013), 5–10 industries typically carry 80% of the weight; those are the shares whose exogeneity you actually need to defend.
# shares: L x K matrix of base-period shares; g: length-K vector of shifts
# X, Y: length-L vectors, already residualized on controls (at minimum demeaned)
fs_cov <- as.numeric(crossprod(shares, X))   # first-stage covariances, one per industry
alpha  <- g * fs_cov
alpha  <- alpha / sum(alpha)                 # Rotemberg weights, sum to 1
# per-industry just-identified IV, using industry k's shares alone as the instrument
beta_k <- as.numeric(crossprod(shares, Y)) / fs_cov
sum(alpha * beta_k)                          # equals the Bartik 2SLS estimate

The bartik.weight package (GitHub: paulgp/bartik-weight) does this and produces the Goldsmith-Pinkham, Sorkin, and Swift (2020) Table 5 diagnostic panel directly.
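Running the same recipe on the toy numbers reproduces the hand calculation. This is a minimal sketch: it builds the share matrix from the employment table, sets $X_\ell = Z^{\text{lev}}_\ell$ as in the worked example, and uses unrounded shares, so intermediate values differ slightly from the rounded ones shown by hand.

```r
# Employment by region (rows) and sector (columns), from the toy table
labor <- matrix(c(100, 9000,
                  1000, 400,
                  700, 3000),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), c("Agri", "Manuf")))
shares <- labor / rowSums(labor)          # each row sums to 1

# Level shifts: national tariff change 1987 -> 1988
g <- c(Agri = 10 - 30, Manuf = 5 - 20)

# As in the worked example, X is set equal to the level-change exposure
X <- as.numeric(shares %*% g)

# Rotemberg weights (raw cross-products, X not demeaned, as in the hand calc)
fs_cov <- as.numeric(crossprod(shares, X))
alpha  <- g * fs_cov / sum(g * fs_cov)
round(alpha, 2)
```

The rounded weights match the 0.40 and 0.60 obtained above.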
Goldsmith-Pinkham, Sorkin, and Swift (2020) recommend reporting:
Borusyak, Hull, and Jaravel (2022) take a different route: treat the shifts $g_k$ as random and the shares as exposure weights.
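One way to see what "shares as exposure weights" means in practice: the region-level shift-share IV estimate can be recomputed from industry-level, exposure-weighted averages of the outcome and the treatment. The sketch below checks that identity on simulated data; the data-generating process and every name in it are assumptions made for illustration, not taken from any of the papers.

```r
# Simulated regions x industries; shares sum to 1 within each region
set.seed(1)
n_regions <- 50
n_industries <- 8
shares <- matrix(rexp(n_regions * n_industries), n_regions, n_industries)
shares <- shares / rowSums(shares)
g <- rnorm(n_industries)                 # industry-level shifts
Z <- as.numeric(shares %*% g)            # shift-share instrument
X <- 0.8 * Z + rnorm(n_regions)
Y <- 2 * X + rnorm(n_regions)

# Demean so that a constant is partialled out
Xd <- X - mean(X)
Yd <- Y - mean(Y)

# Region-level IV estimate
iv_region <- sum(Z * Yd) / sum(Z * Xd)

# Industry-level version: exposure-weighted averages, weighted by total exposure s_k
s_k   <- colSums(shares)
x_bar <- as.numeric(crossprod(shares, Xd)) / s_k
y_bar <- as.numeric(crossprod(shares, Yd)) / s_k
iv_industry <- sum(s_k * g * y_bar) / sum(s_k * g * x_bar)

all.equal(iv_region, iv_industry)
```

The two estimates agree because summing over regions first and over industries second only reorders the same double sum; what changes is the level at which the identifying variation (and hence the inference) is framed.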
Adão, Kolesár, and Morales (2019) show that conventional cluster-robust SEs over-reject with shift-share regressors:
Use the ShiftShareSE package, or replicate by IV with industry-level data.
Identification (Topalova 2010, India's 1991 tariff reform): the pace of tariff cuts was dictated by external (IMF) pressure and was negotiated industry-by-industry without regard to district outcomes → shocks plausibly exogenous to district trends.
$$Z^{\text{Top}}_\ell = \sum_k s^{1987}_{\ell k}\, \Delta\log(\text{Tariff}_k)$$
Outcomes: district-level poverty headcount, poverty gap, consumption growth.
Topalova runs a panel across four NSS rounds (1983, 1987–88, 1993–94, 1999–2000):
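$$Y_{dt} = \alpha_d + \gamma_{st} + \delta\,\text{Tariff}_{dt} + W'_{dt}\lambda + \varepsilon_{dt}$$
with district tariff constructed Bartik-style,
$$\text{Tariff}_{dt} = \sum_k s_{dk,1987} \cdot \text{Tariff}_{kt}$$
When only a pre-reform and a post-reform round are used, the two-way FE estimator is numerically equivalent to a long-difference regression of $\Delta Y_d$ on $\Delta\text{Tariff}_d$, which is why ADH-style papers (and teaching slides) often present Bartik in differences.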
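The link between fixed effects and differences is exact in a balanced panel with two periods, which can be verified on simulated data. The sketch below is illustrative only: district counts, coefficients, and variable names are all assumptions.

```r
# Balanced two-period panel: each district observed pre (post = 0) and post (post = 1)
set.seed(3)
n_districts <- 40
district <- rep(seq_len(n_districts), times = 2)
post     <- rep(c(0, 1), each = n_districts)
tariff   <- rnorm(2 * n_districts) + 0.5 * post
district_effect <- rnorm(n_districts)
y <- district_effect[district] + 0.3 * post - 0.8 * tariff + rnorm(2 * n_districts)

# Two-way fixed effects: district dummies plus a period dummy
fe <- lm(y ~ tariff + factor(district) + factor(post))

# Long difference: post minus pre, district by district, with an intercept
dy      <- y[post == 1] - y[post == 0]
dtariff <- tariff[post == 1] - tariff[post == 0]
fd <- lm(dy ~ dtariff)

c(fe = unname(coef(fe)["tariff"]), fd = unname(coef(fd)["dtariff"]))
```

With more than two periods the two estimators generally differ, so the equivalence should not be assumed for a panel that pools several survey rounds.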
Takeaway: Bartik captures exposure; institutions decide adjustment.
$$\Delta IPW_{c\tau} = \sum_k \frac{L_{ck,t_0}}{L_{k,t_0}} \cdot \frac{\Delta M^{US}_{k\tau}}{L_{c,t_0}}$$
To address shock endogeneity (US demand could pull in imports too), ADH instrument $\Delta M^{US}$ with $\Delta M^{OTH}$, the change in imports from China to eight other rich countries. This isolates the China-side supply shock.
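The exposure formula can be sketched in a few lines. All numbers below are hypothetical (two commuting zones, three industries), chosen only so the arithmetic is easy to follow, and the instrument reuses the same start-of-period employment for simplicity, which is an assumption of this sketch.

```r
# Hypothetical start-of-period employment: commuting zones (rows) x industries (columns)
L_ck <- matrix(c(500, 300, 200,
                 100, 100, 800),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("cz1", "cz2"), c("k1", "k2", "k3")))
L_k <- c(k1 = 10000, k2 = 5000, k3 = 20000)  # hypothetical national industry employment
L_c <- rowSums(L_ck)                         # commuting-zone total employment

# Hypothetical changes in imports from China, by industry
dM_us  <- c(k1 = 40, k2 = 5, k3 = 10)        # into the US
dM_oth <- c(k1 = 30, k2 = 6, k3 = 8)         # into other rich countries

# Exposure per worker: sum_k (L_ck / L_k) * dM_k, divided by L_c
import_exposure <- function(L_ck, L_k, dM, L_c) {
  as.numeric(L_ck %*% (dM / L_k)) / L_c
}

ipw_us  <- import_exposure(L_ck, L_k, dM_us, L_c)   # endogenous regressor
ipw_oth <- import_exposure(L_ck, L_k, dM_oth, L_c)  # instrument
rbind(ipw_us, ipw_oth)
```

The regressor and the instrument share the same share structure and differ only in the shift, which is what lets the instrument strip out US-specific demand movements.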

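$$\mathit{Ot}_{kt} = \sum_{s=1}^{S} \left( \frac{Q_{sk,t=0}}{Q_{k,t=0}} \times t_{st} \right)$$
$$\mathit{It}_{kt} = \sum_{s=1}^{S} \left( \frac{Q_{sk,t=0}}{Q_{k,t=0}} \times \sum_{j=1}^{J} \frac{M_{js,1990}}{M_{s,1990}}\, t_{jt} \right)$$
Notation: sector $s$, district $k$, time $t$, input sector $j$, labor $Q$, tariff $t_{st}$ (the symbol $t$ doubles as the time index), and $M_{js,1990}/M_{s,1990}$ the 1990 input-output weight of input $j$ in sector $s$.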
The first share weights output tariffs by district industry mix; the second additionally weights through the I-O matrix to capture imported intermediate inputs embodied in each sector.
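The two exposure measures differ only in which tariff vector the district shares multiply. A minimal sketch with hypothetical numbers (two districts, three sectors) follows; the labor counts, tariffs, and I-O entries are invented for illustration, and treating $M_{s,1990}$ as the column total of the I-O matrix is an assumption.

```r
# Hypothetical base-period labor: districts (rows) x sectors (columns)
Q <- matrix(c(60, 30, 10,
              20, 20, 60),
            nrow = 2, byrow = TRUE)
q_share <- Q / rowSums(Q)                 # Q_sk / Q_k, rows sum to 1

# Hypothetical sector output tariffs in year t
t_out <- c(0.20, 0.10, 0.30)

# Hypothetical I-O matrix: M[j, s] = inputs from sector j used by sector s
M <- matrix(c(5, 2, 1,
              3, 6, 1,
              2, 2, 8),
            nrow = 3, byrow = TRUE)
io_share <- sweep(M, 2, colSums(M), "/")  # M_js / M_s, columns sum to 1

# Input tariff faced by each sector s: I-O-weighted average of supplier tariffs
t_in <- as.numeric(crossprod(io_share, t_out))

# District exposure to output tariffs and to input tariffs
Ot <- as.numeric(q_share %*% t_out)
It <- as.numeric(q_share %*% t_in)
rbind(Ot, It)
```

Because the input measure passes tariffs through the I-O weights first, two districts with the same output exposure can still differ in input exposure.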
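$$\Delta y_{kt} = \alpha + \beta_1 \mathit{Ot}_{kt} + \beta_2 \mathit{It}_{kt} + \gamma\,\Delta X'_{kt} + I'_k\theta + \lambda_{rt} + \Delta\epsilon_{kt}$$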
The net effect on Indonesian poverty is negative (poverty falls), but only because input tariff liberalization dominates. The distributional story matters.
| | Topalova (2010) | ADH (2013) | Kis-Katos & Sparrow (2015) |
|---|---|---|---|
| Country | India | US | Indonesia |
| Episode | 1991 reform | 1990–2007 | 1993–2002 |
| Unit | districts | commuting zones | districts (259) |
| Shock | tariff cuts | China imports | tariff cuts (2 waves) |
| Share | 1987 employment | start-period emp. | I-O × employment |
| Outcome | poverty, consumption | jobs, wages | poverty |
| Direction | exposure ↑ → poverty ↑ | exposure ↑ → jobs ↓ | output↑ poverty↑, input↓ poverty↓ |
dat.xlsx — 3 regions, 2 sectors, 2 years. Toy data so you can verify by hand.
# A tibble: 12 × 5
year region sector tariff labor
<dbl> <chr> <chr> <dbl> <dbl>
1 1987 A Agri 30 100
2 1987 B Agri 30 1000
3 1987 C Agri 30 700
4 1987 A Manuf 20 9000
5 1987 B Manuf 20 400
6 1987 C Manuf 20 3000
7 1988 A Agri 10 110
8 1988 B Agri 10 1100
...
# A tibble: 3 × 3
region Z_log Z_lev
<chr> <dbl> <dbl>
1 A -1.38 -15.1
2 B -1.18 -18.6
3 C -1.33 -15.9
Region A is most exposed in log, region B most exposed in levels — same point we made earlier, now in code.
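A sketch of how the exposure table could be computed from long-format data follows. Since `dat.xlsx` itself is not reproduced here, the block rebuilds the needed inputs inline from the rows printed above: 1987 labor for the shares and the two tariff levels per sector. Only base-year labor enters, so the elided 1988 labor values are not needed. Base R is used to avoid package dependencies, and all object names are assumptions.

```r
# 1987 labor in long format (base-year shares are all the instrument needs)
labor87 <- data.frame(
  region = rep(c("A", "B", "C"), times = 2),
  sector = rep(c("Agri", "Manuf"), each = 3),
  labor  = c(100, 1000, 700, 9000, 400, 3000)
)

# National tariffs by sector and year
tariff <- data.frame(
  sector = c("Agri", "Manuf"),
  t1987  = c(30, 20),
  t1988  = c(10, 5)
)

# Shares: sector labor over total regional labor
labor87$share <- labor87$labor / ave(labor87$labor, labor87$region, FUN = sum)

# Shifts: log change and level change in the national tariff
tariff$g_log <- log(tariff$t1988) - log(tariff$t1987)
tariff$g_lev <- tariff$t1988 - tariff$t1987

# Bartik exposure: sum of share x shift within each region
m <- merge(labor87, tariff, by = "sector")
Z_log <- tapply(m$share * m$g_log, m$region, sum)
Z_lev <- tapply(m$share * m$g_lev, m$region, sum)

bartik <- data.frame(region = names(Z_log),
                     Z_log  = round(as.numeric(Z_log), 2),
                     Z_lev  = round(as.numeric(Z_lev), 1))
bartik

# Most exposed region = most negative exposure
c(log = names(which.min(Z_log)), level = names(which.min(Z_lev)))
```

The last line returns A for the log measure and B for the level measure, matching the table.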
With only 3 districts there’s nothing to estimate. In a real Topalova-style exercise:
For AKM-corrected SEs, use the ShiftShareSE package and pass the share matrix.
Using only dat.xlsx:
ShiftShareSE implements it.