Why AI Hallucinates Imports (and How to Stop It)

9 min read

Part 4 of a 6-part series on configuring AI coding assistants for large codebases


Imagine you ask your AI assistant to write an HTTP client for a third-party shipping API. It generates clean code. Good error handling. Proper retry logic. And at the top of the file: import com.company.platform.common.http.RetryableClient.

That class doesn’t exist. It never existed. The AI invented it.

The code looks professional. The class name is perfectly plausible. If you weren’t paying attention, you’d accept the suggestion, run the build, and spend ten minutes figuring out why it can’t resolve the import.

This isn’t a random glitch. It’s a predictable failure mode. And once you understand why it happens, you can prevent most of it.

How hallucination works

Language models predict the next token based on patterns in their training data. When the model has enough context about your project (the right imports, the right class names, the right package structure), it generates accurate code. It’s pattern-matching against real signals.

When context is missing, the model fills the gap. It doesn’t flag uncertainty. It doesn’t say “I’m not sure this class exists.” It generates something that looks right based on statistical likelihood. Something that would exist in a typical Java project. Something that matches the naming conventions it’s seen millions of times.

The result: code that would be valid in some project. Just not yours.

Three types of hallucination

Not all hallucination is the same. In large Dropwizard codebases, three types dominate.

Phantom imports. The AI invents packages, utility classes, or helper methods that don’t exist. com.company.platform.common.http.RetryableClient. com.company.platform.utils.JsonHelper. com.company.platform.core.BaseResource. These names are plausible because they follow standard Java naming patterns. That’s exactly what makes them dangerous.

In practice, hallucinated package names tend to be repeatable. Ask the model to regenerate the same code, and it often hallucinates the same fake package. This isn’t randomness. It’s a systematic bias in the model’s training data. Lots of Java projects have a common.http package, so the model confidently predicts one in yours.

Cross-module confusion. Your project has an OrderStatus enum in the orders package and a Status enum in the billing package. The AI is working in the billing module and imports OrderStatus instead of Status. The types don’t match. Or worse, they’re structurally similar enough that the code compiles but produces wrong behavior at runtime.

This happens because the AI sees your project as a flat buffer of text tokens. It doesn’t have a resolved dependency graph. It doesn’t know that billing should never import from orders. It just sees two enums with similar names and picks the wrong one.
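The collision is easiest to see in miniature. A hedged sketch with invented names (OrderStatus, Status, and BillingRules are illustrative, not from any real codebase): the typed path catches the wrong import at compile time, while the stringly-typed path is exactly the compiles-but-wrong-at-runtime case.

```java
// Sketch of the cross-module enum collision. All names are invented.

// lives in the orders module
enum OrderStatus { PENDING, SHIPPED, DELIVERED }

// lives in the billing module
enum Status { PENDING, INVOICED, PAID }

class BillingRules {
    // Typed signature: passing OrderStatus.PENDING here is a compile error,
    // so a wrong import is caught before the code ever runs.
    static boolean isAwaitingInvoice(Status s) {
        return s == Status.PENDING;
    }

    // Stringly-typed signature: OrderStatus.PENDING.name() slips through and
    // even returns true. Wrong module, no compiler complaint.
    static boolean isAwaitingInvoice(String statusName) {
        return "PENDING".equals(statusName);
    }
}
```

The difference between the two overloads is the whole story: the more of your code that goes through distinct types instead of strings, the more of this class of mistake the compiler absorbs for you.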

Version drift. You’re on Dropwizard 2.1. The AI generates code using io.dropwizard.core.Application, the Dropwizard 4.x package path. Or it uses @Context HttpServletRequest in a resource method, which works in Jersey 2 but behaves differently in Jersey 3. Or it suggests JdbiFactory with constructor arguments that changed between JDBI 3.x minor versions.

The AI has seen code from every version of every framework. Without an explicit version pin, it defaults to whatever version appears most in its training data. That’s often not your version.

Why scale makes it worse

Small projects are partially protected by context. If your entire project fits in the context window, the AI can read every file and import from what actually exists. The hallucination rate is low.

Large projects don’t have that luxury. At 150K lines, the AI can only see a small slice of the codebase at any given time. The rest is invisible. When it needs a utility class and can’t find one in its current context, it invents one.

More packages also mean more name collisions. Every module has a utils/, a model/, a config/. The AI sees these generic names and picks one, sometimes from the wrong module. Your project’s sheer surface area creates ambiguity that small projects don’t have.

Long conversations make it worse. Early context drops out as the conversation grows. The architectural context from your instruction file, which loaded at the start of the session, may get pushed out by code and tool outputs. The model falls back on generic Java patterns.

Version pinning

The cheapest fix. Three to five lines in your instruction file. Massive impact.

## Framework versions (match these exactly)
- Dropwizard 2.1.x (NOT 4.x. Package prefix is io.dropwizard, NOT io.dropwizard.core)
- Jersey 2.x (javax.ws.rs, NOT jakarta.ws.rs)
- JDBI 3.37+ (use @SqlQuery/@SqlUpdate, not older Handle-based API)
- Jackson 2.15 (use @JsonProperty, not @JsonAlias for field mapping)

This prevents the entire category of version drift. When the AI sees “Jersey 2.x, javax.ws.rs”, it stops suggesting jakarta.ws.rs imports. When it sees “Dropwizard 2.1.x, NOT 4.x”, it stops using the Dropwizard 4 package structure.

The NOT clauses are important. They’re not redundant. The AI needs to know what to avoid, not just what to use. Its training data is full of jakarta.ws.rs code. Without the explicit exclusion, it’ll mix and match.

The “DO NOT” list

This is the most effective long-term strategy. When the AI makes a mistake, add the correction to your instruction file.

## Known AI mistakes (learned from experience)
- DO NOT use EntityManager or @Entity. We use JDBI3, not JPA/Hibernate
- DO NOT import from com.company.platform.common.http. That package does not exist
- DO NOT use @Autowired or @Inject from Spring. Use HK2 @Inject (org.glassfish.hk2)
- DO NOT use Lombok @Data on DTOs. We use explicit getters/setters for Jackson compatibility
- DO NOT use java.util.Date. Use java.time.Instant for timestamps
- DO NOT call InventoryDao from OrderService. Go through InventoryService

Every rule traces back to a real failure. Nothing speculative. Nothing aspirational.
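To make one of these rules concrete, here is a minimal sketch of the java.util.Date entry in practice. The Timestamps helper and the millisecond truncation are illustrative assumptions, not part of the list above:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Illustrative helper for the "use java.time.Instant" rule. Instant is
// UTC-anchored and immutable, which is why a single DO NOT line can safely
// steer every generated timestamp toward it.
class Timestamps {
    // Truncate to milliseconds so values round-trip cleanly through JSON
    // and database layers that only carry millisecond precision.
    static Instant nowMillis() {
        return Instant.now().truncatedTo(ChronoUnit.MILLIS);
    }
}
```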

This list grows over time. The first week you might have three entries. After a month, maybe ten. After a quarter, it’s a curated record of every way the AI has tried to go wrong in your specific codebase. Each entry saves every developer on your team from hitting the same mistake.

Some teams automate this. When an engineer rejects an AI suggestion for a structural reason, they tag it. The tag gets reviewed weekly and promoted to the instruction file if it’s a pattern.

Reference files, not descriptions

When you need the AI to follow a pattern, point it at a real file instead of describing the pattern in prose.

## Patterns to follow
- New REST resource: follow the pattern in orders/resource/OrderResource.java
- New DAO: follow the pattern in orders/dao/OrderDao.java
- New integration test: follow the pattern in orders/resource/OrderResourceIT.java

Three lines. Dramatically more reliable than a 30-line prose description of “how to write a resource.” The AI reads the real file and matches its structure: actual imports, actual annotations, actual patterns. No room for hallucination.

This works especially well for complex patterns. Error handling middleware chains. Test setup boilerplate. DAO mapper configurations. These are things that are tedious to describe in prose and easy to get wrong from description alone.

Hooks and CI

Here’s the uncomfortable truth. In practice, instruction files are followed roughly 70% of the time. Even well-written ones. The model is probabilistic. It will occasionally ignore rules.

For style preferences, 70% compliance is fine. For import correctness, 70% is not fine. The 30% that slips through creates broken builds, runtime errors, or subtle bugs.

Hooks close the gap. They run deterministic checks after every AI-generated code change.

A post-edit hook that runs mvn compile -pl <module> catches phantom imports immediately. The build fails. The AI sees the error. It corrects the import. No human intervention needed.
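Compilation is the authoritative check, but it can be slow on a large module. A lighter-weight complement (my addition, not from the workflow above; the package prefixes are invented) is a crude allowlist scan over import lines. It flags wrong-ecosystem imports like jakarta.ws.rs in milliseconds; phantom internal classes still need the compiler.

```java
import java.util.List;
import java.util.Set;

// Sketch of a fast pre-compile import check. The allowed prefixes below are
// illustrative; a real list would come from your pom.xml dependencies.
// Note: treats "import static ..." lines as violations for brevity.
class ImportAllowlistCheck {
    // Prefixes imports may start with: JDK, pinned frameworks, own code.
    static final Set<String> ALLOWED_PREFIXES = Set.of(
        "java.", "javax.ws.rs.", "io.dropwizard.", "org.jdbi.",
        "com.company.platform.");

    /** Returns import statements whose package is not on the allowlist. */
    static List<String> violations(List<String> sourceLines) {
        return sourceLines.stream()
            .map(String::strip)
            .filter(l -> l.startsWith("import "))
            .map(l -> l.substring("import ".length()).replace(";", "").strip())
            .filter(imp -> ALLOWED_PREFIXES.stream().noneMatch(imp::startsWith))
            .toList();
    }
}
```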

A pre-commit hook that runs mvn checkstyle:check catches style violations that instructions missed.

In Claude Code, hooks are configured in .claude/settings.json. A PostToolUse hook can trigger compilation after every file write. In Cursor, hooks provide similar functionality. In Copilot’s agent mode, the agent runs builds and tests automatically as part of its workflow.
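For Claude Code, the configuration looks roughly like this — a sketch assuming the hooks schema as currently documented (check the docs for your version; the mvn command is a placeholder, and per-module selection with -pl is omitted for brevity):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "mvn -q compile" }
        ]
      }
    ]
  }
}
```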

The principle: deterministic tools should enforce deterministic rules. Save your instruction budget for judgment calls that only a human (or an LLM) can make.

Type systems as guardrails

Your type system is your strongest ally against hallucination. It catches wrong imports, wrong method signatures, and wrong return types at compile time.

Lean into it.

The more your type system constrains the code, the narrower the hallucination surface becomes. A method that returns Optional<Order> gives the AI much less room to hallucinate than one that returns Object.
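A sketch of that difference, with invented names (Order and OrderRepository are illustrative): the wide signature invites guessing about what comes back, while the narrow one makes a wrong guess fail to compile.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative domain type. A record's fixed shape is itself a guardrail:
// there is exactly one way to construct and read it.
record Order(String id, long totalCents) {}

class OrderRepository {
    private final Map<String, Order> byId = Map.of(
        "o-1", new Order("o-1", 2500));

    // Wide signature: an Order? null? something else? A code generator has
    // plenty of room to hallucinate the call site.
    Object findRaw(String id) {
        return byId.get(id);
    }

    // Narrow signature: one shape, absence made explicit. A generated call
    // site that ignores the Optional or assumes another type won't compile.
    Optional<Order> find(String id) {
        return Optional.ofNullable(byId.get(id));
    }
}
```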

Layered defense

No single technique eliminates hallucination. You need layers.

Layer 1: Instruction files. Version pins. Module boundaries. DO NOT lists. Reference files. These prevent roughly 70% of mistakes from being generated in the first place.

Layer 2: Hooks. Auto-compilation after file writes. Linter runs after edits. These catch the remaining 30% that instructions missed, immediately and automatically.

Layer 3: CI. Full build, test suite, static analysis. This catches anything that slipped through layers 1 and 2 before it reaches a pull request.

Layer 4: Human review. The AI is a junior developer. It writes code faster than any junior you’ve ever seen. It also makes mistakes that no senior would make. Review accordingly. Don’t rubber-stamp generated code because it looks clean.

Each layer catches things the previous layers missed. Together, they bring the effective hallucination rate close to zero. Not because any single layer is perfect, but because they cover each other’s blind spots.

The investment is front-loaded. Version pins take five minutes. A DO NOT list takes an hour of curation. Hooks take a few hours to configure. CI you probably already have. After that initial setup, the system maintains itself. The instruction file grows from observed failures. The hooks run automatically. The CI catches the rest.

The alternative, manually catching hallucinated imports in code review every day across every developer, is far more expensive.


Next up: Post 5: How to Give AI Assistants a Map of Your Messy Codebase, on configuring AI assistants for the messy reality of large existing codebases, legacy patterns, and migrations in progress.
