Why does one term end up with multiple definitions?

The same term is defined at several layers at once: federal (IPEDS), state, consortium or system office, accreditation, internal, departmental, dashboard, and ad hoc. Each layer draws the line for its own purpose, and all of them can be correct. The goal is not to eliminate the variants but to document which one applies in which context.

Is this really a data quality problem?

Usually not. In the research, the underlying numbers were typically fine; what was missing was the layer of meaning that says what each field counts and what it excludes. That makes it an interpretation problem rather than a data problem, which is why cleaning the data alone does not fix it.

How much time does definition ambiguity actually cost?

One IR director in the study estimated that 20% to 30% of annual IR time goes to clarification, rework, and reconciliation tied to definitions. Three-quarters of offices reported a trust-erosion event in which conflicting numbers reached leadership, and half said inconsistent definitions had invalidated their peer benchmarking.

What counts as a complete data definition?

It covers two dimensions, functional and technical, across seven parts: term name, plain-language meaning, technical calculation with inclusions and exclusions, source system and field, reporting context, ownership, and review history. Of the 20 offices studied, only one maintained definitions this complete, with active governance.

Can AI just write our definitions for us?

AI can draft them, but in one test only about 30% of AI-generated definitions were usable on a first pass; the rest needed editing by someone with deep institutional knowledge. AI accelerates the work without replacing the judgment about which definition is authoritative in your context.

Why 'enrolled student' has five valid definitions

Introduction

An institutional research office gets a request that sounds simple: how many students are enrolled this fall? The data is sitting right there in the student information system. The query takes a minute. And the honest answer is a question back: enrolled by whose definition?

This is the part of the work that rarely makes it into a job description. The raw data (an IPEDS file, a warehouse extract, a student record) arrives as rows and columns, and loading it is mechanical. What turns those rows into something leadership can decide on is the layer of meaning sitting on top: what each field counts, what it excludes, which rule applies in which context. That interpretation layer is the actual product of an IR office. The data is just the input.

In primary research with 20 IR and IE professionals across 13 states, the Clema research team kept finding the same thing: when this layer is missing or inconsistent, the failure looks like a data problem but is not one. The numbers are usually fine. What is broken is the meaning attached to them. This post is about why one term resists a single definition, the places that ambiguity lives, and what it costs when it goes undocumented.

What a complete definition actually contains

A definition that holds up has two dimensions: a functional one (what the term means to a person who does not write SQL) and a technical one (exactly how it is calculated, down to what gets included and what gets excluded). Most documentation that offices call a definition has only the first, or only the second. A glossary entry that reads "students currently enrolled" tells you nothing about whether study-abroad students count. A SQL snippet tells you the logic but not why anyone chose it.

A complete definition pins down both, plus the context around them. Of the 20 offices in the study, exactly one maintained definitions this complete, with active governance, versioning, and access for people outside the IR team. Everyone else had pieces. In practice the full picture comes down to seven parts.

The seven parts of a complete definition

Term name: the exact label as it appears in reports and systems.
Business definition: a plain-language meaning a non-technical reader can act on.
Technical calculation: the exact logic, including what counts and what is excluded.
Source system and field: the system of record (Banner, PeopleSoft, Workday) and the specific column.
Reporting context: the purpose it serves (IPEDS, state, accreditation, or internal).
Ownership: who maintains it, who approves changes, and who gets notified.
Review history: when it was last reviewed, what changed, and why.
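To make the seven parts concrete, here is a minimal Python sketch of a definition record. The class, field, and method names (`DataDefinition`, `ReviewEntry`, `missing_parts`) are hypothetical and not taken from any particular catalog tool. The example entry mirrors the glossary-only case described earlier: a plain-language line with nothing technical behind it.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ReviewEntry:
    """One row of a definition's review history (part 7)."""
    reviewed_on: date
    change: str
    reason: str


@dataclass
class DataDefinition:
    """Hypothetical record holding the seven parts of a complete definition."""
    term_name: str            # 1. exact label used in reports and systems
    business_definition: str  # 2. plain-language meaning (functional dimension)
    calculation: str          # 3. technical rule, with inclusions and exclusions
    source: str               # 4. source system and the specific column
    reporting_context: str    # 5. IPEDS, state, accreditation, or internal
    owner: str                # 6. who maintains it and approves changes
    notify: list[str] = field(default_factory=list)                  # 6. who gets told
    review_history: list[ReviewEntry] = field(default_factory=list)  # 7.

    def missing_parts(self) -> list[str]:
        """Name each of the seven parts that is still empty."""
        parts = {
            "term name": self.term_name,
            "business definition": self.business_definition,
            "technical calculation": self.calculation,
            "source system and field": self.source,
            "reporting context": self.reporting_context,
            "ownership": self.owner,
            "review history": self.review_history,
        }
        return [name for name, value in parts.items() if not value]


# A glossary entry with only the functional dimension filled in.
glossary_only = DataDefinition(
    term_name="Enrolled student",
    business_definition="Students currently enrolled",
    calculation="",
    source="",
    reporting_context="",
    owner="",
)
print(glossary_only.missing_parts())
```

Running it lists the five parts the glossary entry lacks: technical calculation, source system and field, reporting context, ownership, and review history. A check like this is one cheap way to measure how far an existing glossary is from complete; it says nothing about whether the parts that are filled in are correct.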

When the obvious column is the wrong one

Here is the trap that makes this more than bookkeeping. Imagine a table with a column literally named "master," and next to it an unlabeled computed flag that quietly applies the real eligibility rules. The column called "master" looks authoritative. It is not. The flag is.

A person who knows the data catches that in a second. A system reading the data dictionary at face value does not. It sees a field named "master," assumes that is the source of truth, and produces a classification that looks right and is wrong. The name was never the definition. The logic was. That gap between the label and the logic is exactly what a complete definition is built to close, and it is the gap that quietly defeats any tool you point at the warehouse without it.
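The trap can be sketched in a few lines of Python. Everything here is hypothetical: the rows, the `master` and `elig_flag` column names, and the shape of the definition record. The point is that a classification driven by the column name and one driven by a documented definition can return the same number of students but not the same students.

```python
# Hypothetical extract: "master" looks authoritative, but the unlabeled
# computed flag ("elig_flag") is the field that applies the eligibility rules.
rows = [
    {"id": 1, "master": "Y", "elig_flag": 1},
    {"id": 2, "master": "Y", "elig_flag": 0},
    {"id": 3, "master": "N", "elig_flag": 1},
]

# A documented definition names the authoritative field and its rule,
# instead of leaving a reader (or a tool) to infer them from labels.
ELIGIBLE_DEFINITION = {
    "term": "eligible",
    "source_field": "elig_flag",
    "rule": lambda value: value == 1,
}


def classify_by_name(rows):
    """Face-value reading: trust the column whose name sounds like the truth."""
    return {r["id"] for r in rows if r["master"] == "Y"}


def classify_by_definition(rows, definition):
    """Apply whichever field and rule the definition documents."""
    field_name = definition["source_field"]
    rule = definition["rule"]
    return {r["id"] for r in rows if rule(r[field_name])}


by_name = classify_by_name(rows)
by_definition = classify_by_definition(rows, ELIGIBLE_DEFINITION)
print(len(by_name), len(by_definition))  # the totals match
print(by_name ^ by_definition)           # the students the two disagree on
```

Because the totals match, a spot check on counts alone would not catch the error; comparing membership does.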

One term, eight places it can be defined

Part of why a single term resists a single definition is that it gets defined in more than one place, by more than one authority, for more than one purpose. A term like "full-time" or "completer" can carry a distinct, official meaning at each of eight layers at once. None of these layers is wrong, and none is going away. Federal reporting will always differ from a state mandate, which will differ from what a dean needs on a dashboard.

So the goal is not to collapse eight definitions into one. That is neither possible nor desirable. The goal is to document which definition applies in which context, and make that documentation something other people can actually reach. The eight layers look like this.

Eight layers a single term can be defined at

Definition layer	Where its rule comes from
IPEDS / federal	Federal reporting requirements, the baseline most other layers start from.
State reporting agency	State or system mandates that can set their own thresholds (for example, full-time at 15 credits).
Consortium or system office	Shared rules across a multi-campus system or a benchmarking consortium.
Accreditation body	Standards from a regional or programmatic accreditor such as HLC, SACSCOC, or WASC.
Internal institutional	The institution's own working definition for board reports and planning.
Departmental or unit-level	A college, program, or office that needs the term sliced its own way.
Dashboard-specific	A definition baked into a single dashboard's filters and logic.
Ad hoc or request-specific	A one-off definition created on the spot to answer a single question.
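One way to keep all eight layers without collapsing them is to key every definition by term and layer, and to refuse to answer when no documented definition exists for the requested layer. The Python sketch below assumes that design; the `Layer` names, `REGISTRY`, and `lookup` are illustrative, and only the two full-time thresholds cited in this post are filled in.

```python
from enum import Enum


class Layer(Enum):
    """The eight definition layers from the table above."""
    FEDERAL = "IPEDS / federal"
    STATE = "State reporting agency"
    CONSORTIUM = "Consortium or system office"
    ACCREDITATION = "Accreditation body"
    INTERNAL = "Internal institutional"
    DEPARTMENTAL = "Departmental or unit-level"
    DASHBOARD = "Dashboard-specific"
    AD_HOC = "Ad hoc or request-specific"


# Definitions are keyed by (term, layer); there is deliberately no entry
# keyed by term alone, so there is no single "official" answer to fall back on.
REGISTRY = {
    ("full-time", Layer.FEDERAL): "12 or more credits",
    ("full-time", Layer.STATE): "15 or more credits (one state's threshold)",
}


def lookup(term: str, layer: Layer) -> str:
    """Return the documented definition of term at layer, or fail loudly."""
    try:
        return REGISTRY[(term, layer)]
    except KeyError:
        raise LookupError(
            f"No documented definition of {term!r} at layer {layer.value!r}"
        ) from None


print(lookup("full-time", Layer.FEDERAL))
```

Asking for `("full-time", Layer.DASHBOARD)` raises an error instead of silently returning the federal rule, which is the behavior you want when a dashboard's filters were never written down.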

Five terms, taken apart

'Enrolled student' has at least five

IPEDS excludes non-affiliated study-abroad students. Financial aid includes them. Internal headcount keeps everyone. State reporting and the Common Data Set each draw the line somewhere else again. Every one of those is correct for its purpose. The trouble starts the moment two offices compare headcounts and assume they counted the same people. One IR director traced a string of external and interdepartmental mismatches back to exactly this.
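The mechanics of that mismatch are easy to reproduce. This Python sketch models only the one rule the paragraph spells out, the exclusion of non-affiliated study-abroad students, using hypothetical records and a hypothetical flag name; it leaves out the state and Common Data Set rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    student_id: str
    nonaffiliated_study_abroad: bool  # hypothetical flag name


students = [
    Student("A1", False),
    Student("A2", False),
    Student("A3", True),
]


def ipeds_style_enrolled(rows):
    """Simplified IPEDS-style rule: drop non-affiliated study-abroad students."""
    return {s.student_id for s in rows if not s.nonaffiliated_study_abroad}


def internal_enrolled(rows):
    """Internal headcount keeps everyone."""
    return {s.student_id for s in rows}


ipeds = ipeds_style_enrolled(students)
internal = internal_enrolled(students)
print(len(ipeds), len(internal))  # two headcounts from the same rows
print(sorted(internal - ipeds))   # the students who explain the gap
```

Comparing sets of student IDs rather than totals is what turns "our numbers don't match" into a named list of students and the rule that separates them.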

Full-time: 12 credits or 15

IPEDS sets full-time at 12 credits. One state chancellor's office sets it at 15. Graduation-rate calculations diverge from there, and an analyst described reconciling the two across thousands of dashboards as a nightmare. Neither threshold is wrong. They were written for different systems that now have to be compared.
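A few hypothetical credit loads show how the two thresholds split the same students. The student IDs and credit values below are invented for illustration; only the 12- and 15-credit thresholds come from the text.

```python
# Thresholds cited in the text: IPEDS at 12 credits, one state at 15.
FULL_TIME_THRESHOLDS = {"IPEDS": 12, "state": 15}

# Hypothetical term credit loads for five students.
credit_loads = {"S1": 9, "S2": 12, "S3": 13, "S4": 15, "S5": 18}


def full_time_students(loads, context):
    """IDs of students at or above the full-time threshold for a context."""
    threshold = FULL_TIME_THRESHOLDS[context]
    return {sid for sid, credits in loads.items() if credits >= threshold}


by_context = {
    ctx: full_time_students(credit_loads, ctx) for ctx in FULL_TIME_THRESHOLDS
}
flipped = by_context["IPEDS"] - by_context["state"]
print({ctx: len(ids) for ctx, ids in by_context.items()})
print(sorted(flipped))
```

Students carrying 12 to 14 credits are full-time under one rule and part-time under the other, so any rate computed over a full-time cohort, a graduation rate included, starts from a different denominator depending on which rule built it.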

Walking graduates versus actual graduates

Leadership at one institution quoted commencement-eligible "walking" counts that ran well above actual completions. The walking number was documented nowhere, yet it made its way into public statements and inflated the completion figures the institution reported. A number with no definition behind it is the easiest kind to misuse.

One 'student type' field, six versions

A single student type field existed in five or six versions at once: the IR version, the IPEDS version, the state version, plus a pre-1992 legacy classification still hanging around. On every request the analyst had to supply each version and explain which was which, because the field name gave no hint that several incompatible meanings shared it.

Dual credit: 40% of enrollment, no agreed term

At one community college, roughly 40% of enrollment came from high-school dual-credit students. Departments could not agree on what "dual credit" even meant: a filtered subset of those students, or all high-school enrollment. The disagreement was about the word, not the data, and it shaped every number the term touched.

How widespread the problem is, across 20 offices

Issue	Percentage of offices reporting
Key-person dependency	85%
Same term, multiple definitions	70%
No formal review cadence	55%
Peer benchmarking affected	50%
No definition change history	25%

Challenges

Why it breaks reporting

None of this is hypothetical friction. In the same study, 75% of offices reported a trust-erosion event: conflicting or incorrect numbers reached leadership, and confidence in the data took the hit. Half said inconsistent definitions had invalidated their peer benchmarking, because you cannot compare your number to a peer's when the two were built on different rules. One director estimated that 20% to 30% of annual IR time goes to definition-related clarification, rework, and reconciliation: time spent working out which version of a number someone meant, not producing anything new.

The cost compounds quietly. Institutional memory erodes as the people who held the definitions move on. Shadow interpretations multiply across offices, each internally consistent and incompatible with the others, so two reports can both be right and still disagree. Dashboards lose adoption because nobody trusts which number is the real one. And the chain runs all the way to students: an early-alert program acts on whichever definition decides who is "at risk," so a wrong definition flags the wrong students. The same goes for the story a number tells in a leadership meeting: if the definition underneath it is contested, so is the narrative.

This is a diagnosis worth taking seriously, and it has a name. We call it the institutional intelligence gap, and the next post in this series walks through how to assess your own exposure to it. The post after that lays out a six-step framework for governing definitions so they survive the person who wrote them. If you want the near-term version, our best practices for IR and IE data requests cover the habits that stop new ambiguity from piling up.

Where Clema fits in

Clema is an AI data-intelligence platform built for IR and IE teams. It connects to nine federal data sources (IPEDS, College Scorecard, EADA, Pell Grants, DAPIP, PSEO, and others) and lets you query institutional and federal data in plain language.

Relevant to everything above: it flags where the same term resolves to different numbers across sources, and it maintains a technical dictionary for analysts and a plain-language glossary for requestors, both drawn from the same underlying definitions. The meaning layer lives in the system rather than in one person's memory, so it persists past any single tenure. Around 35 institutions use it today. None of this removes the human judgment a definition needs. It gives that judgment somewhere durable to live.

The research behind this post

The full study: 20 IR and IE professionals across 13 states, the institutional intelligence gap, the cost model, the multi-layer definition problem, and a six-step framework for governing definitions. Read the 45-page whitepaper.


See how Clema keeps definitions in the system, not in someone's head

Clema flags where the same term resolves to different numbers across sources, and serves a technical dictionary and a plain-language glossary from the same definitions.

Book a demo
