Six files, 209,321 programs, three join keys: turning PPD:2026 into answers your provost actually asks

The IR analyst's working view of the PPD:2026 dataset, and where the time actually goes

June 2, 2026


One provost question, six files

The question comes in a Monday email. "How many of our programs would fail STATS?" One sentence. Forty hours of analyst time, minimum, if it's the first time you've opened PPD:2026. That sits squarely inside the 3 to 14 days our research on IR data request workflows found a typical data request takes to fulfill, except this one repeats every time the question changes.

PPD:2026 is the Program Performance Data release the U.S. Department of Education published in early 2026 to inform the AHEAD negotiated rulemaking. It covers 209,321 unique programs across 5,096 institutions and roughly 20.6 million Title IV students. It is, on paper, an exceptional dataset, the closest thing IR teams have ever had to a complete program-level outcomes file with federal earnings benchmarks baked in.

On disk, it is six separate Excel files. They share three join keys, but those keys have to be exactly right or you silently lose rows. They use two different earnings horizons, two different cohort definitions, two different dollar-year bases for debt versus earnings, and a 4-digit CIP code while the actual STATS test will eventually operate at 6-digit CIP. Every time the provost rephrases the question (different filter, different scenario, different peer comparison), the wrangling starts over.

This piece walks through where the hours actually go. If you have an analyst running PPD:2026 today, this is what their week looks like. If you're looking for the strategic frame instead (which programs to triage, what the 50% institutional trigger is), start with the exposure-map playbook and the 50% institutional trigger explainer.

The three-key join nobody warns you about

Every file in PPD:2026 keys on the same three columns: opeid6 (six-digit institutional OPEID), credlev (credential level: certificate, associate, bachelor's, master's, doctoral), and cip4 (the four-digit Classification of Instructional Programs code). All six files join one-to-one on the combination.

The "one-to-one" is the part that fools people. If the join is on opeid6 alone, multiple credential-level rows collapse incorrectly and a single program inherits another program's earnings. If it's on opeid6 + cip4 only, a certificate and an associate's degree in the same field merge into one row. If credlev is coded inconsistently between files, and in user-uploaded crosswalks it often is, entire programs drop silently.

opeid6: the six-digit OPEID (the first six digits of the eight-digit OPEID), not the IPEDS UnitID. Crosswalking the two is its own task.
credlev: credential level. It has to match the encoding ED uses, not the local registrar encoding. A "Postbaccalaureate Certificate" in your SIS may be a different credlev than ED expects.
cip4: four-digit CIP. Your local roster is probably at CIP-6; you have to roll it to CIP-4 before joining, then keep the CIP-6 separately for the eventual STATS test.

Getting any one of these wrong silently drops programs from the result set. The query "how many of our programs fail" returns a number, but the number is wrong, and the wrongness is invisible.
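
The join discipline described above can be sketched in a few lines of plain Python. This is a minimal illustration, not ED's or any vendor's method: the roster layout, the `cip6` field, the helper names, and the credlev labels (`UG_CERT`, `ASSOC`, `GRAD_CERT`) are all hypothetical placeholders. Only opeid6, credlev, cip4, and fail_obbb_cip2_wageb come from the dataset description. The point is that unmatched programs are returned explicitly rather than dropped.

```python
from collections import Counter

KEYS = ("opeid6", "credlev", "cip4")


def roll_cip6_to_cip4(cip6):
    """Roll a CIP-6 code to CIP-4: '51.0701' -> '5107', '1.0101' -> '0101'."""
    text = str(cip6).strip()
    if "." in text:
        series, _, detail = text.partition(".")
        # Restore a leading zero lost on the series and trailing zeros lost on the detail.
        return series.zfill(2) + detail.ljust(4, "0")[:2]
    return text.zfill(6)[:4]


def key_of(row):
    """Normalize the three join keys to padded strings so 1234 and '001234' match."""
    return (
        str(row["opeid6"]).strip().zfill(6),
        str(row["credlev"]).strip(),
        str(row["cip4"]).strip().zfill(4),
    )


def join_with_audit(roster, ppd_rows):
    """Join roster programs to PPD rows on all three keys; return (matched, unmatched)."""
    counts = Counter(key_of(r) for r in ppd_rows)
    duplicates = [k for k, n in counts.items() if n > 1]
    if duplicates:
        raise ValueError(f"PPD rows are not unique on {KEYS}: {duplicates[:5]}")
    lookup = {key_of(r): r for r in ppd_rows}
    matched, unmatched = [], []
    for program in roster:
        key = key_of(program)
        if key in lookup:
            matched.append({**program, **lookup[key], **dict(zip(KEYS, key))})
        else:
            unmatched.append(program)  # surfaced, not silently dropped
    return matched, unmatched


# Hypothetical roster; the credlev labels are placeholders, not ED's encoding.
roster = [
    {"opeid6": 1234, "credlev": "UG_CERT", "cip6": "51.0701"},
    {"opeid6": "001234", "credlev": "ASSOC", "cip6": "51.0701"},
    {"opeid6": "001234", "credlev": "Postbacc Cert", "cip6": "52.0201"},  # local SIS label
]
for program in roster:
    program["cip4"] = roll_cip6_to_cip4(program["cip6"])  # cip6 is kept for later

ppd_rows = [
    {"opeid6": "001234", "credlev": "UG_CERT", "cip4": "5107", "fail_obbb_cip2_wageb": 1},
    {"opeid6": "001234", "credlev": "ASSOC", "cip4": "5107", "fail_obbb_cip2_wageb": 0},
    {"opeid6": "001234", "credlev": "GRAD_CERT", "cip4": "5202", "fail_obbb_cip2_wageb": 0},
]

matched, unmatched = join_with_audit(roster, ppd_rows)
print(f"{len(matched)} matched, {len(unmatched)} unmatched")
for program in unmatched:
    print("no PPD match:", program["opeid6"], program["credlev"], program["cip4"])
```

In this toy roster the third program fails to match only because its credlev uses a local label. Reporting the unmatched list next to the headline count is what makes that kind of loss visible.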

The benchmark matrix is six columns, not one

The Earnings Premium test is one formula, graduate earnings minus a benchmark, but the benchmark isn't one number. PPD:2026 carries six different benchmark columns, and the correct one depends on credential level, in-state vs national, field of study, and which version of the test you're running (2023 FVT/GE or OBBBA-aligned). Picking the wrong column quietly inflates or deflates the failure rate. The OBBB-to-Earnings-Premium math walkthrough covers the statutory side of why these benchmark columns exist; this section covers the operational side of picking the right one.

PPD:2026 benchmark columns and what each one is for

Variable	What it is	When you use it
hs_state_mdn_cip2_wageb	State median earnings of working adults 25–34 with only a high-school diploma.	Undergraduate programs under both 2023 FVT/GE and STATS. The default Earnings Premium benchmark.
hs_nat_mdn_cip2_wageb	National median earnings for the same high-school-only population.	Fallback when the state-level value is missing or suppressed.
ba_state_field_mdn_cip2_wageb	State median earnings of working BA-holders 25–34 in the same field of study.	Graduate programs under STATS. The new higher bar that didn't exist under FVT/GE.
ba_state_mdn_cip2_wageb	State median earnings of working BA-holders 25–34, all fields.	Graduate programs when same-field state data is missing or suppressed.
ba_nat_field_mdn_cip2_wageb	National median earnings of working BA-holders in the same field.	Graduate programs when state-level values aren't available.
ba_nat_mdn_cip2_wageb	National median earnings of working BA-holders, all fields.	Final fallback for graduate programs.
earn_bnchmrk_cip2_wageb	The composite benchmark PPD:2026 actually applies, the full matrix above resolved for each program.	The primary OBBB test (fail_obbb_cip2_wageb). This is the one ED will use for the published February 2027 results.
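
One way to sanity-check benchmark selection is to walk the fallback order implied by the table and compare the result with the composite column. The sketch below assumes the fallback order is exactly the order the table lists, and that suppressed cells arrive as blanks, NaN, or non-numeric markers; neither is confirmed by ED documentation here. The `is_graduate` argument and the function names are hypothetical. Treat earn_bnchmrk_cip2_wageb as the authority and use this only to flag disagreements.

```python
import math

# Fallback order as read from the table above (an assumption, not ED's published logic).
UNDERGRAD_CHAIN = ("hs_state_mdn_cip2_wageb", "hs_nat_mdn_cip2_wageb")
GRAD_CHAIN = (
    "ba_state_field_mdn_cip2_wageb",
    "ba_state_mdn_cip2_wageb",
    "ba_nat_field_mdn_cip2_wageb",
    "ba_nat_mdn_cip2_wageb",
)


def to_number(value):
    """Return a float, or None for blanks, NaN, and non-numeric suppression markers."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def resolve_benchmark(row, is_graduate):
    """Return (column_name, value) for the first non-missing benchmark in the chain."""
    chain = GRAD_CHAIN if is_graduate else UNDERGRAD_CHAIN
    for column in chain:
        value = to_number(row.get(column))
        if value is not None:
            return column, value
    return None, None


def agrees_with_composite(row, is_graduate, tolerance=0.5):
    """True/False if both values exist, None if either side is missing."""
    _, value = resolve_benchmark(row, is_graduate)
    composite = to_number(row.get("earn_bnchmrk_cip2_wageb"))
    if value is None or composite is None:
        return None
    return abs(value - composite) <= tolerance


# Illustrative graduate program: same-field state value suppressed.
row = {
    "ba_state_field_mdn_cip2_wageb": None,
    "ba_state_mdn_cip2_wageb": 41000,
    "ba_nat_field_mdn_cip2_wageb": 45500,
    "ba_nat_mdn_cip2_wageb": 43000,
    "earn_bnchmrk_cip2_wageb": 41000,
}
column, value = resolve_benchmark(row, is_graduate=True)
print(column, value, agrees_with_composite(row, is_graduate=True))
```

Rows where the hand-resolved value and the composite disagree are the ones worth reading the data dictionary for before any failure rate is reported.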

The earnings-definition swap that flips pass/fail

PPD:2026 carries two different earnings variables, and they produce materially different pass/fail results. md_earn_ne_p3_1516 is median earnings of all completers, working and non-working, measured three tax years after exit, pooled across the 2014–15 and 2015–16 cohorts. md_earn_wne_p4 is median earnings of working completers only, measured four tax years after exit, pooled across the 2017–18 and 2018–19 cohorts.

The 2023 FVT/GE test uses the first. STATS uses the second. The two are not interchangeable. Excluding non-working completers raises the median (people earning $0 are dropped). Adding a year of career development raises it again. The combined effect, depending on the field, is a 10 to 25 percent lift in measured graduate earnings. Programs that fail under the 2023 definition routinely pass under the STATS definition, and vice versa.
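
A small worked example makes the flip concrete. The dollar figures below are invented for illustration, the same benchmark is applied to both definitions for simplicity, and a negative premium is treated as a failure; the exact boundary treatment in the rule text is not modeled here.

```python
def earnings_premium(median_earnings, benchmark):
    """Earnings Premium: graduate median earnings minus the benchmark."""
    return median_earnings - benchmark


def fails(median_earnings, benchmark):
    """Assumption for this sketch: a negative premium counts as a failure."""
    return earnings_premium(median_earnings, benchmark) < 0


# Illustrative values only, not taken from PPD:2026.
program = {
    "md_earn_ne_p3_1516": 27_500,  # all completers, three years out
    "md_earn_wne_p4": 32_000,      # working completers only, four years out
}
benchmark = 30_000

lift_pct = 100 * (program["md_earn_wne_p4"] - program["md_earn_ne_p3_1516"]) / program["md_earn_ne_p3_1516"]
fails_2023 = fails(program["md_earn_ne_p3_1516"], benchmark)
fails_stats = fails(program["md_earn_wne_p4"], benchmark)

print(f"lift: {lift_pct:.1f}%")
print("fails under 2023 definition:", fails_2023)
print("fails under STATS definition:", fails_stats)
```

Here a lift of about 16 percent, inside the 10 to 25 percent range described above, is enough to move the program from failing to passing against an unchanged benchmark.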

Why the same program can pass one test and fail the other

[Figure: illustrative chart of the typical lift in STATS earnings (four-year, working-only) over FVT/GE earnings, broken into three components: the cohort vintage shift (2014–16 → 2017–19), excluding non-working completers, and the extra year of career development. Value labels shown: 18% and 12%.]

The chart is illustrative (the exact lift varies by field and state), but the point is structural. If your analyst pulls the wrong earnings variable, the answer to "how many of our programs would fail" can be off by a factor of two. PPD:2026 includes both ge_overall_2023_fail (combined 2023 FVT/GE failure) and ge_overall_algn_fail (OBBBA-aligned combined failure) precisely so the dataset can show the policy shift side by side. Comparing the two for your own programs is one of the most useful analyses the dataset enables. It is also one of the easiest to get wrong if the join keys or benchmark columns are off.
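
The side-by-side comparison can be sketched as a simple cross-tabulation of the two flags. The sketch assumes both flags are coded 1 for fail and 0 for pass, with anything else meaning the program could not be evaluated; that coding is an assumption, so check it against the data dictionary first. The function name and the bucket labels are hypothetical.

```python
from collections import Counter

LABELS = {
    (1, 1): "fail_both",
    (1, 0): "fail_2023_only",
    (0, 1): "fail_aligned_only",
    (0, 0): "pass_both",
}


def policy_shift_table(rows):
    """Count programs in each combination of the 2023 and OBBBA-aligned flags."""
    counts = Counter({label: 0 for label in LABELS.values()})
    counts["not_evaluable"] = 0
    for row in rows:
        pair = (row.get("ge_overall_2023_fail"), row.get("ge_overall_algn_fail"))
        counts[LABELS.get(pair, "not_evaluable")] += 1
    return dict(counts)


# Illustrative rows for one institution, already joined on the three keys.
rows = [
    {"cip4": "5107", "ge_overall_2023_fail": 1, "ge_overall_algn_fail": 0},
    {"cip4": "1204", "ge_overall_2023_fail": 1, "ge_overall_algn_fail": 1},
    {"cip4": "5202", "ge_overall_2023_fail": 0, "ge_overall_algn_fail": 0},
    {"cip4": "1312", "ge_overall_2023_fail": 0, "ge_overall_algn_fail": 1},
    {"cip4": "4301", "ge_overall_2023_fail": None, "ge_overall_algn_fail": None},
]
table = policy_shift_table(rows)
for label, n in table.items():
    print(f"{label}: {n}")
```

Keeping a separate not_evaluable bucket matters: a program with a missing flag is not a passing program, and folding the two together understates exposure.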

Privacy suppression silently shrinks your universe

PPD:2026 applies layered privacy protections that disproportionately suppress data for small certificate programs, the same programs most likely to fail the Earnings Premium test. If you don't account for the suppression rules explicitly, your at-risk count is structurally biased downward.

Three suppression rules that change your sample

Counts ≤ 9 students

Variables derived from nine or fewer students are suppressed entirely, no count, no median, no flag. For small certificate programs this is common, and the program effectively vanishes from your dataset even though it exists and may be at risk.

Counts of 10–19 students

Counts in this range are forced to the midpoint value of 15, and derived percentages are recalculated accordingly. The number you see is not the true number. For small institutions, this artificial midpointing distorts every per-program ratio you compute.

IRS earnings noise

IRS-derived earnings are suppressed entirely if 15 or fewer completers reported taxes. Statistical noise is added to the remaining values, and the median is dropped if noise moved it by more than 9 percent. Small-cohort programs lose earnings data even when they have enough completers in principle.
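
One way to keep suppression from quietly lowering the at-risk count is to report a range instead of a single number. The sketch below is an assumption-laden illustration: `n_completers` is a hypothetical name for whatever count column you use, suppressed cells are assumed to arrive as blanks or non-numeric markers, and the fail flag is assumed to be coded 1 for fail. A count of exactly 15 is flagged because it may be a forced midpoint rather than a true value.

```python
import math


def to_number(value):
    """Return a float, or None for blanks, NaN, and non-numeric suppression markers."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def classify(row):
    """Label a program 'fail', 'pass', or 'suppressed' (no usable earnings or flag)."""
    earnings = to_number(row.get("md_earn_wne_p4"))
    flag = to_number(row.get("fail_obbb_cip2_wageb"))
    if earnings is None or flag is None:
        return "suppressed"
    return "fail" if flag == 1 else "pass"


def at_risk_summary(rows, count_field="n_completers"):
    """Tally outcomes and bound the at-risk count from below and above."""
    summary = {"fail": 0, "pass": 0, "suppressed": 0, "possibly_midpointed": 0}
    for row in rows:
        summary[classify(row)] += 1
        if to_number(row.get(count_field)) == 15:
            summary["possibly_midpointed"] += 1  # true count may be anywhere in 10-19
    summary["at_risk_low"] = summary["fail"]
    summary["at_risk_high"] = summary["fail"] + summary["suppressed"]
    return summary


# Illustrative rows only.
rows = [
    {"md_earn_wne_p4": 26000, "fail_obbb_cip2_wageb": 1, "n_completers": 40},
    {"md_earn_wne_p4": 35000, "fail_obbb_cip2_wageb": 0, "n_completers": 15},
    {"md_earn_wne_p4": None, "fail_obbb_cip2_wageb": None, "n_completers": None},
    {"md_earn_wne_p4": "", "fail_obbb_cip2_wageb": "", "n_completers": 15},
]
summary = at_risk_summary(rows)
print(summary)
```

Reporting "between 1 and 3 programs at risk, with 2 unmeasurable" is a less tidy answer than "1", but it is the one the data supports.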

CIP-4 today, CIP-6 tomorrow

PPD:2026 reports at the 4-digit CIP level. ED has signaled that the actual STATS test, when results are published in February 2027, will operate at the more granular 6-digit CIP level. The difference matters more than it sounds.

Two programs that share a 4-digit CIP can land in different 6-digit codes, for example, two health-administration certificates with different specialisations. At CIP-4 they aggregate into a single PPD:2026 row; at CIP-6 they will be measured separately, with separate cohorts and separate Earnings Premium tests. A program that looks safe in PPD:2026 because it's averaged with a stronger sibling may fail on its own at CIP-6. A program that looks at-risk in PPD:2026 may be dragged down by a sibling and pass on its own.
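
A toy calculation shows how pooling at CIP-4 can hide a weak sibling. The individual earnings values below are invented; PPD:2026 does not publish person-level earnings, so this only illustrates the arithmetic of a pooled median versus separate medians, with a negative premium treated as a failure.

```python
from statistics import median

benchmark = 30_000  # illustrative

# Two hypothetical CIP-6 programs that share one CIP-4 code.
sibling_a = [38_000, 41_000, 36_000, 40_000, 39_000, 42_000]  # stronger program
sibling_b = [24_000, 27_000, 26_000, 29_000]                  # weaker program

pooled_median = median(sibling_a + sibling_b)  # what a single CIP-4 row reflects
median_a = median(sibling_a)
median_b = median(sibling_b)

for label, value in [("CIP-4 pooled", pooled_median), ("sibling A", median_a), ("sibling B", median_b)]:
    status = "fail" if value - benchmark < 0 else "pass"
    print(f"{label}: median {value:,.0f} -> {status}")
```

In this example the pooled row passes comfortably while sibling B fails on its own, which is why keeping the CIP-6 code on the local roster is worth the extra column.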

This is not a flaw in PPD:2026. ED was explicit that the dataset is informational and not the final implementation file. But it means any analysis built strictly on PPD:2026 has a known accuracy ceiling, and the CIP-4-to-CIP-6 unrolling is one of the things institutions will have to do themselves as the official February 2027 publication approaches.

The 40-hour week the 72% know about

In AIR's 2023 survey on FVT/GE implementation, 72 percent of responding members reported their IR or IE office held institutional responsibility for the compliance reporting. The widely-cited pain points were tight timelines and a lack of clear communication from ED's FVT/GE guidance. Both of those land directly on PPD:2026 today. (For the policy lineage from FVT/GE into STATS, see the STATS vs FVT/GE vs PPD explainer. For the final FVT/GE submission timeline, see the FVT/GE final reporting year guide. For how the sector is asking the Department to soften this in the final rule, see our analysis of all 8,796 NPRM public comments.)

The wrangling pattern this piece walks through (six-file join, benchmark-column selection, definition-swap awareness, suppression handling, CIP-4-to-CIP-6 unrolling) is not a one-time setup. It repeats every time the provost asks a slightly different question. Filter to programs in our college of allied health. Compare our cosmetology mix to community-college peers in our state. Model what happens if median earnings drop 5 percent. Each rephrasing rebuilds the join, re-selects the benchmark, re-applies the suppression rules. Each rephrasing is most of a week. Stack that on a baseline where ad-hoc requests already consume 40 to 60 percent of IR capacity, between 550 and 5,333 hours per year by our whitepaper estimates, and the math stops working.

That recurring weekly cost is what burns out IR analysts and what makes the question feel impossible at the cabinet level. The work is not hard. It is just slow, repetitive, and unforgiving of small errors in the join.

What changes when you can ask in plain English

Clema's STATS (FVT/GE) AI Agent reads PPD:2026 directly. The join keys, the benchmark matrix, the earnings-definition pair, the suppression rules, the CIP-4 caveat: all of it is built into how the agent answers questions, not into a query the analyst has to write.

You ask: how many of our programs fail the OBBBA-aligned test. The agent joins the right files on opeid6 + credlev + cip4, picks the composite benchmark column, applies fail_obbb_cip2_wageb, returns the count with the underlying program list. You ask the same question against the 2023 FVT/GE methodology to see the policy-shift delta; the agent swaps to ge_overall_2023_fail and re-runs. You ask for the rollup by college, then by department, then by Title IV concentration. Three questions, three answers, fifteen minutes total. The wrangling pattern is constant and the agent has it memorised.

That is the actual difference. Not faster joining. Not better visualisation. The conversational frame (provost asks, analyst asks the agent, analyst answers) instead of the project frame. Once the data is approachable in conversation, the program-level exposure-map playbook and the 50% institutional trigger become questions you can actually answer this quarter, not next year.

Skip the six-file join. Ask PPD:2026 directly.

Ask the STATS (FVT/GE) AI Agent for program-level Earnings Premium results, institutional rollups, and scenario comparisons against the live PPD:2026 dataset, all in plain English.

Try the STATS AI Agent

Sources

AIR, Program Accountability Reporting resource center.
U.S. Department of Education, Federal Student Aid Partners, FVT/GE knowledge center.
FVT/GE Frequently Asked Questions.

Written by

Clema Research Team

The Clema research team publishes original analysis and practical guides for institutional research and institutional effectiveness professionals.

Frequently asked questions

What are the three join keys in PPD:2026, and why do they matter?

Every file keys on opeid6 (six-digit OPEID), credlev (credential level), and cip4 (four-digit CIP), and all six files join one-to-one on the combination. Get them wrong and rows drop silently: joining on opeid6 alone collapses credential levels, joining on opeid6 plus cip4 merges a certificate and an associate degree, and inconsistent credlev coding makes whole programs vanish. The count comes back wrong, and the wrongness is invisible.

Why does PPD:2026 have six benchmark columns instead of one?

The Earnings Premium test is one formula, but the benchmark depends on credential level, in-state versus national, field of study, and which test version you run. PPD:2026 carries six benchmark columns plus a composite (earn_bnchmrk_cip2_wageb) that resolves the matrix per program. The composite feeds the primary OBBB test, fail_obbb_cip2_wageb, which is what ED will use for the published February 2027 results.

Which earnings variable should I use, and does it change the answer?

It changes the answer dramatically. The 2023 FVT/GE test uses md_earn_ne_p3_1516 (all completers, three years out, 2014 to 2016 cohorts). STATS uses md_earn_wne_p4 (working completers only, four years out, 2017 to 2019 cohorts). Excluding non-workers and adding a career year lifts measured earnings by roughly 10 to 25 percent, so programs that fail one definition routinely pass the other.

How does privacy suppression bias my at-risk count?

Suppression disproportionately hits small certificate programs, the ones most likely to fail. Variables from nine or fewer students are suppressed entirely; counts of 10 to 19 are forced to a midpoint of 15; IRS earnings are dropped if 15 or fewer completers filed taxes or if added noise moved the median by more than 9 percent. If you do not account for this, your at-risk count is structurally biased downward.

Why is CIP-4 in PPD:2026 a problem if the real test uses CIP-6?

PPD:2026 reports at the 4-digit CIP level, but the official February 2027 results will operate at 6-digit CIP. Two programs sharing a CIP-4 can split into different CIP-6 codes and be measured separately, with separate cohorts and tests. A program that looks safe averaged with a stronger sibling may fail on its own, so any analysis built strictly on PPD:2026 has a known accuracy ceiling.
