Part 5 of a 6-part series on configuring AI coding assistants for large codebases
Imagine you’ve read the first four posts in this series. You’ve set up a clean instruction file hierarchy. Module boundaries are documented. Version pins are in place. DO NOT lists are curated. Everything makes sense.
Then you open your actual codebase.
Half the services are on Dropwizard 2.1. Two are still on 1.3. One module uses JDBI3. Another still uses raw JDBC with hand-written ResultSet mappers. There’s a legacy/ package that nobody wants to touch but everyone depends on. The migration from XML configuration to YAML started eight months ago and stalled at 60%.
None of the best practices from the previous posts assumed this kind of mess. This post does.
Monorepo scoping
If your organization uses a monorepo with multiple Dropwizard services, the instruction file structure maps naturally to the directory structure. One root file for universal rules. One file per service.
```
CLAUDE.md                          <- Universal: Java version, Maven commands, service map
services/order-service/CLAUDE.md   <- Dropwizard 2.1, JDBI3, PostgreSQL
services/user-service/CLAUDE.md    <- Dropwizard 2.1, JDBI3, Redis sessions
services/legacy-gateway/CLAUDE.md  <- Dropwizard 1.3, raw JDBC, XML config
libs/common-models/CLAUDE.md       <- Shared DTOs, strict backward compatibility rules
libs/http-client/CLAUDE.md         <- Internal HTTP client, no breaking changes
```

The root file needs one thing above all else: a service map.
```
## Service map
- order-service: Order lifecycle and fulfillment. DW 2.1, JDBI3, PostgreSQL. Port 8080
- user-service: Authentication, profiles, preferences. DW 2.1, JDBI3, Redis. Port 8081
- legacy-gateway: External partner integrations. DW 1.3, raw JDBC, XML config. Port 8082
- notification-worker: Async email/SMS. DW 2.1, Kafka consumer. No database. Port 8083
```

Four lines. But they prevent the AI from applying Dropwizard 2.1 conventions to the 1.3 service. They prevent it from assuming every service has a database. They tell it which port each service runs on, so integration tests point to the right place.
The service-specific files handle everything else. The order service file talks about JDBI3 DAOs and order state machines. The legacy gateway file talks about raw JDBC and XML configuration. They never see each other’s rules.
Shared library rules
Shared libraries need the most defensive instruction files in your entire repo. A bad change to a shared library breaks every service that depends on it.
```
# common-models

Shared DTOs used by all services. Backward compatibility is mandatory.

## Rules
- NEVER remove a field from an existing DTO. Mark it @Deprecated instead
- NEVER change a field's type. Add a new field with the new type
- All new fields must have default values or be Optional<T>
- Jackson annotations are required on every field. No implicit naming
- Any change to this library requires running tests for ALL downstream services:
  `mvn test -pl services/order-service,services/user-service,services/legacy-gateway`

## Boundaries
- This library has NO dependencies on any service. It is a leaf dependency
- Never add service-specific logic here. If only one service needs it, it doesn't belong here
```

The tone is deliberate. “NEVER remove a field.” “NEVER change a field’s type.” Shared libraries are where strong language in instruction files actually earns its place. A casually generated breaking change here cascades across the entire organization.
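To make the first two rules concrete, here is a minimal stdlib-only sketch of what a compatible DTO evolution can look like. The `InvoiceTotal` type and its fields are hypothetical, and Jackson annotations are omitted so the example stands alone; the point is the shape: the old field stays behind `@Deprecated`, and the replacement arrives as a new `Optional<T>` field with a fallback.

```java
import java.math.BigDecimal;
import java.util.Optional;

public class InvoiceTotal {
    /** Old field with the old type. Kept for wire compatibility; never removed. */
    @Deprecated
    private final double amount;

    /** New field with the corrected type. Optional, so old payloads still work. */
    private final Optional<BigDecimal> amountExact;

    public InvoiceTotal(double amount, BigDecimal amountExact) {
        this.amount = amount;
        this.amountExact = Optional.ofNullable(amountExact);
    }

    @Deprecated
    public double getAmount() { return amount; }

    /** Readers prefer the new field and fall back to the legacy one when absent. */
    public BigDecimal getAmountExact() {
        return amountExact.orElseGet(() -> BigDecimal.valueOf(amount));
    }
}
```

Callers on the old field keep compiling; callers on the new field get correct values for both old and new payloads. That is the property the NEVER rules are protecting.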
Multi-repo contracts
If your services live in separate repositories, scoping is simpler. Each repo has its own instruction file. No cross-contamination by default.
The challenge is different. The AI doesn’t know what lives in other repos. It can’t read them. So when your order service calls the user service over HTTP, the AI might hallucinate the endpoint path, the request schema, or the response format.
The fix: document external contracts explicitly in each repo’s instruction file.
```
## External service contracts

### User Service (https://user-service.internal/api/v1)
- GET /users/{id} -> UserProfile (see docs/user-service-contract.md)
- POST /users/{id}/verify -> VerificationResult
- Auth: internal service token in X-Service-Auth header

### Notification Worker
- Publishes to Kafka topic: order.events
- Schema: see docs/order-event.avsc
- Consumer: notification-worker (separate repo, do not modify)
```

This gives the AI enough to generate correct HTTP client code without inventing endpoints. It’s not a full API spec. It’s just enough that the AI doesn’t have to guess.
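For illustration, here is the kind of client code that contract is meant to produce, sketched with the JDK's built-in `java.net.http` types. The `UserServiceClient` class and token parameter are hypothetical; the base URL and header name are taken directly from the contract above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class UserServiceClient {
    // Base URL and auth header exactly as documented in the contract.
    private static final String BASE_URL = "https://user-service.internal/api/v1";

    /** Builds the documented GET /users/{id} request. Sending it is left to the caller. */
    public static HttpRequest getUserRequest(String userId, String serviceToken) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + "/users/" + userId))
                .header("X-Service-Auth", serviceToken)
                .GET()
                .build();
    }
}
```

With the contract inline, every one of these details (path, verb, header) has a source. Without it, each is a guess.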
For richer contracts, use the pointer pattern from Post 3. Keep the summary inline. Point to the full OpenAPI spec or Avro schema for details.
Configuring AI for legacy codebases
Here’s the counterintuitive thing about legacy codebases. AI often produces better output in brownfield than greenfield. In a new project, the AI makes arbitrary decisions: naming, structure, patterns. In an existing project, it has real code to pattern-match against. That constrains its choices in a good way.
But only if you tell it which patterns are current and which are legacy.
Without that guidance, the AI happily copies whatever pattern it finds first. If the first DAO it reads uses raw JDBC with manual ResultSet mapping (because that’s in the legacy/ package), it’ll generate more raw JDBC code. Even if every other DAO in the project uses JDBI3.
Migration status
When patterns are changing, document the status explicitly.
```
## Migration status

### Database access
- Current: JDBI3 DAOs with @SqlQuery/@SqlUpdate (services/order-service/dao/)
- Legacy: Raw JDBC with manual ResultSet mapping (services/legacy-gateway/db/)
- Rule: All NEW code uses JDBI3. Do not create new raw JDBC classes
- Rule: Do not modify legacy JDBC classes unless specifically asked to migrate them

### Configuration
- Current: YAML config extending io.dropwizard.Configuration
- Legacy: XML config in services/legacy-gateway/config/
- Rule: All NEW services use YAML. Legacy XML stays until full migration

### Dependency injection
- Current: HK2 constructor injection
- In progress: Some classes still use field injection with @Inject
- Rule: New classes use constructor injection. Migrate field injection when touching existing classes
```

The rules are unambiguous. “All NEW code uses JDBI3” tells the AI exactly what to do without pretending the legacy code doesn’t exist.
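The dependency-injection rule is worth a concrete sketch, because the two styles look similar but test very differently. This is a minimal, framework-free illustration of the "current" pattern; `OrderService` and `OrderDao` are hypothetical stand-ins for real classes, and in the actual codebase the constructor would be wired by HK2.

```java
// Hypothetical collaborator; in the real codebase this would be a JDBI3 DAO.
interface OrderDao {
    String findStatus(long orderId);
}

public class OrderService {
    // Current pattern: the dependency is explicit, final, and passed in the
    // constructor. No @Inject on a mutable field, no container needed to test.
    private final OrderDao dao;

    public OrderService(OrderDao dao) {
        this.dao = dao;
    }

    public String status(long orderId) {
        return dao.findStatus(orderId);
    }
}
```

A test can hand the constructor a stub directly, which is exactly what field injection makes awkward. Spelling the pattern out once keeps the AI from copying an `@Inject` field it saw in a not-yet-migrated class.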
Follow-this-file technique
For brownfield codebases, the single most reliable instruction is: build it like this existing file.
```
## Reference implementations
- New REST resource: follow services/order-service/resource/OrderResource.java
- New JDBI DAO: follow services/order-service/dao/OrderDao.java
- New integration test: follow services/order-service/resource/OrderResourceIT.java
- New Kafka consumer: follow services/notification-worker/consumer/OrderEventConsumer.java
```

The AI reads the real file. It matches the imports, the annotations, the structure, the error handling. No room for hallucination. No room for mixing in legacy patterns.
This works because AI assistants are excellent mimics. Give them a good example and they’ll replicate it faithfully. Give them a vague description and they’ll improvise. In a brownfield codebase, you want replication, not improvisation.
Do-not-touch zones
Every brownfield codebase has code that works, that nobody fully understands, and that everyone is afraid to change. Name it. Protect it.
```
## Do not modify without explicit approval
- services/legacy-gateway/: Partner integration layer. Fragile. No test coverage.
  If asked to make changes here, stop and ask for confirmation first.
- libs/crypto-utils/: Encryption utilities. Security-sensitive. Changes require security review.
- services/order-service/service/OrderStateMachine.java: Core state machine. Complex edge cases.
  Do not refactor. Only add new transitions if explicitly requested.
```

This is an underused pattern. Most instruction files tell the AI what to do. Telling it what not to touch is equally valuable. An AI doesn’t have a sense of risk. It’ll cheerfully refactor your state machine into something cleaner and break six edge cases in the process.
Incremental documentation for AI assistants
You can’t spec an entire legacy codebase in one go. Don’t try.
The practical approach: spec only what you’re changing. Every time the AI works on a module, ensure that module has an instruction file. If it doesn’t, create a minimal one before starting the task.
A minimal instruction file for a previously-undocumented module takes five minutes:
```
# Billing Module

Invoicing and revenue reporting. Dropwizard 2.1 / JDBI3 / PostgreSQL.

## Key classes
- InvoiceResource.java: REST endpoints for invoice CRUD
- InvoiceService.java: Business logic and validation
- InvoiceDao.java: JDBI3 database access

## Conventions
- Follow patterns in InvoiceResource.java for new endpoints
- Money amounts use BigDecimal. Never double or float
- Audit logging via AuditService.record() for all mutations
```

Ten lines. Took five minutes. Covers the basics. The next person who works on billing gets better AI output. Over time, module by module, the specification gap closes.
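The BigDecimal convention is the kind of rule an AI will silently violate if it pattern-matches on a stray `double` somewhere, so a one-line example in the instruction file is cheap insurance. A minimal sketch of what compliant money math looks like; `InvoiceMath` and the half-up rounding choice are assumptions, not part of the module above.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class InvoiceMath {
    /**
     * Adds tax to a net amount. Money stays in BigDecimal end to end,
     * per the module convention: never double or float.
     * Rounding to 2 decimal places, HALF_UP, is an assumed policy.
     */
    public static BigDecimal addTax(BigDecimal net, BigDecimal taxRate) {
        return net.add(net.multiply(taxRate))
                  .setScale(2, RoundingMode.HALF_UP);
    }
}
```

Constructing amounts from strings (`new BigDecimal("100.00")`) rather than doubles avoids binary floating-point noise entering the calculation at all.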
This is incremental documentation. Each AI-assisted change leaves the instruction coverage slightly better than before. After six months, most active modules have some level of documentation. The cold, untouched modules don’t, and that’s fine. They’re not getting AI-assisted changes anyway.
Case study: Salesforce
One real-world case illustrates the potential. Salesforce documented a legacy migration of 275 Apex classes and 3,500+ files from an old managed package. The original estimate was two years of manual effort. Using AI-assisted refactoring with iteratively adjusted transformation rules, they completed it in roughly four months.
The key wasn’t one massive spec. It was narrow, file-specific transformation rules that got adjusted as the migration progressed. Each batch taught them something new. They updated the rules. The next batch went smoother. The instruction files evolved alongside the migration.
That’s the model. Not perfection upfront. Iteration.
Summary
Three principles for messy real-world codebases.
Scope by service in monorepos. The service map in the root file is worth more than any other single section. Service-specific files handle everything else.
Document migration status honestly. Current patterns, legacy patterns, in-progress transitions. Don’t pretend the legacy code isn’t there. Tell the AI what’s current and what’s frozen.
Close the specification gap incrementally. Five minutes of instruction file setup before each AI-assisted task. Module by module. The coverage builds itself.
An AI with a clear map of a messy system will outperform an AI with no map of a clean one. Don’t wait for the codebase to be perfect. Configure for the codebase you have.
Next up: Post 6: AI Follows Your Instructions 70% of the Time: How to Enforce the Rest, on managing instruction files across engineering teams, enforcing critical rules deterministically, and measuring impact.