It happens too often: a sprint ends and the data work is “done”, but you know half the pipelines are fragile, the docs are outdated, and you still have a notebook full of “clean this up later” comments.
Most data engineers know what the work should look like: everything is tested and documented, and the code is clean. But reality is a stream of tickets, ad hoc fixes, and half-written migration plans.
Over the last year, generative AI tools have quietly become the extra pair of hands many teams were missing. Not a replacement for a senior engineer, more like a very fast junior who never gets tired of boilerplate and can quickly search and synthesise across your internal wiki, within the context limits of your setup. Used well, they change where time is spent, not just how fast code is typed.
Studies on AI coding assistants show clear productivity gains for routine coding work, especially for less experienced developers who benefit from faster scaffolding and documentation support, and that matches what I have seen on real teams.
The trick is to treat AI as that junior data engineer and manage it. That means deciding what to delegate, what to keep for yourself, and how to review its work.
Where AI helps in data engineering
There are a few categories of work where AI is an obvious force multiplier in a data team: text-heavy tasks, repetitive code patterns, and “explain this” questions. These map surprisingly well to things juniors already do, especially in teams with a long backlog of unglamorous work.
Documentation and metadata nobody has time to write
Most warehouses are full of tables that “everyone” uses, but very few people can describe precisely. Column descriptions are blank, business definitions live in Confluence, and the last data dictionary was updated three quarters ago.
Large language models can sit on top of catalogs, schema dumps, and internal docs to fill in a good first draft of this missing metadata.
Good delegation here looks like:
- Feed the model real context (schema, sample queries, business docs) and ask it to propose column descriptions for a specific table.
- Let it suggest owner, domain and tags for new datasets based on naming and lineage.
- Have it summarize long design docs into short, searchable descriptions that fit into your catalog tool.
This type of work is low risk and high leverage. You still review for correctness, but you are no longer staring at a blank description box for 200 columns.
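A minimal sketch of the first delegation step, assuming you assemble the prompt yourself before handing it to whichever model your team uses. The function name `build_column_prompt`, the sample columns, and the business note are hypothetical, and no model API is called here.

```python
def build_column_prompt(
    table: str,
    columns: dict[str, str],
    sample_queries: list[str],
    business_notes: str,
) -> str:
    """Assemble schema, queries, and business notes into one prompt string."""
    column_lines = [f"- {name} ({dtype})" for name, dtype in columns.items()]
    parts = [
        f"Propose a one-sentence description for every column of `{table}`.",
        "If the context does not support a description, answer UNKNOWN instead of guessing.",
        "",
        "Columns:",
        *column_lines,
        "",
        "Sample queries:",
        *(query.strip() for query in sample_queries),
        "",
        "Business notes:",
        business_notes.strip(),
    ]
    return "\n".join(parts)


# Hypothetical table and context, for illustration only.
prompt = build_column_prompt(
    table="fact_orders",
    columns={"order_id": "bigint", "status": "varchar", "paid_at": "timestamp"},
    sample_queries=["select status, count(*) from fact_orders group by status"],
    business_notes="An order counts as paid once the payment provider confirms it.",
)
print(prompt)
```

Asking for UNKNOWN when the context is thin is one way to make the review easier: blank spots are visible instead of being papered over with plausible text.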
Natural language to SQL and ETL scaffolding
Turning a business question into the first draft of a query or transformation is another sweet spot.
LLMs are quite good at translating a sentence like “daily active users by country over the last 90 days” into plausible SQL if they have enough context about schema and naming conventions. The key is to treat it as scaffolding, not as a final answer.
You might start with a prompt like:
You are a senior analytics engineer.
Given this schema and question, propose a first draft of a dbt model.
Schema:
- users(id, created_at, country_code, is_active)
- events(user_id, event_name, occurred_at)
Question:
Daily active users by country over the last 90 days.
Use our style: explicit joins, snake_case aliases, no select *.
The model can produce a reasonable starting point with the joins, date filters, and groupings in place.
You still check:
- Are we using the right “active” definition for this organisation?
- Does the timezone handling match how the product team thinks about days?
- Are there existing models this should be built on top of instead of raw tables?
Getting to a solid draft faster matters, but the business logic still belongs to a human.
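To make the review concrete, here is a sketch of what such a draft might look like and how you could run it against a tiny fixture before trusting it. It uses SQLite from Python purely for illustration; the fixture rows are invented, and two assumptions are baked in that a human would need to confirm: “active” means at least one event on that day, and days are cut in the timestamps’ stored timezone.

```python
import sqlite3

# Draft in the requested style: explicit joins, snake_case aliases, no select *.
# Assumption: a user is "active" on a day if they have at least one event.
DRAFT_SQL = """
select
    date(e.occurred_at) as activity_date,
    u.country_code as country_code,
    count(distinct e.user_id) as daily_active_users
from events as e
inner join users as u
    on u.id = e.user_id
where date(e.occurred_at) > date(:as_of, '-90 days')
  and date(e.occurred_at) <= date(:as_of)
group by date(e.occurred_at), u.country_code
order by activity_date, country_code
"""


def run_draft(as_of: str) -> list[tuple]:
    """Run the draft query against a small in-memory fixture."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table users (id integer primary key, created_at text,
                            country_code text, is_active integer);
        create table events (user_id integer, event_name text, occurred_at text);
        """
    )
    conn.executemany(
        "insert into users values (?, ?, ?, ?)",
        [(1, "2023-11-01", "DE", 1), (2, "2023-12-15", "DE", 1), (3, "2024-01-20", "US", 1)],
    )
    conn.executemany(
        "insert into events values (?, ?, ?)",
        [
            (1, "login", "2024-03-01 09:00:00"),
            (1, "purchase", "2024-03-01 17:30:00"),  # same user, same day
            (2, "login", "2024-03-01 12:00:00"),
            (3, "login", "2024-03-02 08:15:00"),
            (3, "login", "2023-01-05 10:00:00"),  # outside the 90-day window
        ],
    )
    rows = conn.execute(DRAFT_SQL, {"as_of": as_of}).fetchall()
    conn.close()
    return rows


result = run_draft("2024-03-10")
print(result)
```

The fixture deliberately includes a duplicate same-day event and an event outside the window, since those are the two places a plausible-looking draft most often goes wrong.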
Reverse engineering legacy schemas
Many teams carry legacy systems where the original designers left years ago. Table names are cryptic, relationships implicit, and documentation nonexistent. Here AI acts like a fast reader more than a coder. You can export schema definitions, a handful of representative queries, and maybe some wiki pages that reference those tables.
From there, LLMs are surprisingly capable of:
- Proposing an entity relationship narrative, for example how orders, customers, and products link together.
- Suggesting likely primary keys and foreign keys based on naming and usage.
- Highlighting which fields seem to be identifiers, timestamps, or status codes.
The suggestions will not be perfect. What they do is collapse hours of manual skimming into a structured starting point you can confirm or correct.
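Confirming or correcting a suggested key can itself be mechanical. The sketch below assumes you have pulled a sample of rows into Python dictionaries; the helper names, the `ord_no` and `cust_id` columns, and the sample data are all hypothetical. Passing on a sample does not prove a key, but failing on one is enough to reject the suggestion.

```python
def is_candidate_key(rows: list[dict], column: str) -> bool:
    """True if `column` is never null and never repeated in the sample rows."""
    values = [row.get(column) for row in rows]
    return None not in values and len(set(values)) == len(values)


def orphaned_values(
    child_rows: list[dict],
    child_column: str,
    parent_rows: list[dict],
    parent_column: str,
) -> set:
    """Non-null values in the child column with no match in the parent column."""
    parent_values = {row[parent_column] for row in parent_rows}
    return {
        row[child_column]
        for row in child_rows
        if row[child_column] is not None and row[child_column] not in parent_values
    }


# Hypothetical sample from a legacy system.
customers = [{"cust_id": 1}, {"cust_id": 2}]
orders = [
    {"ord_no": 100, "cust_id": 1},
    {"ord_no": 101, "cust_id": 1},
    {"ord_no": 102, "cust_id": 7},  # no such customer in the sample
]

print(is_candidate_key(orders, "ord_no"))   # suggested primary key
print(is_candidate_key(orders, "cust_id"))  # repeats, so not a key of orders
print(orphaned_values(orders, "cust_id", customers, "cust_id"))
```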
Boilerplate around data quality checks
Another classic junior task is wiring up repetitive data quality checks. Think “this column should never be null”, “values must be within this range”, “this foreign key should link to that table”. Given a set of expectations, AI is quite good at spitting out assertions or tests in the framework you already use.
For example, if the team uses Great Expectations or dbt tests, you can feed it a simple spec and ask for concrete definitions.
# high level spec
checks:
  - table: fact_orders
    rules:
      - row_count: daily_non_zero
      - column: order_id
        not_null: true
        unique: true
      - column: status
        allowed_values: ["pending", "paid", "shipped", "cancelled"]
Prompt:
Convert this spec into dbt tests using our conventions.
Assume the model is called `fact_orders` in the mart.orders package.
Use schema.yml format.
The result will not be magic. It will, however, save time on the repetitive parts so the team can focus on deciding which rules actually matter.
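When reviewing what comes back, it helps to know what shape to expect. The sketch below maps the spec onto dbt's built-in generic tests (`not_null`, `unique`, `accepted_values`) and sets aside anything without a built-in equivalent. The function name `spec_to_dbt_model` is hypothetical, the output is printed as JSON only to keep the example dependency-free, and depending on your dbt version the key may be `data_tests` rather than `tests`.

```python
import json

SPEC = {
    "table": "fact_orders",
    "rules": [
        {"row_count": "daily_non_zero"},
        {"column": "order_id", "not_null": True, "unique": True},
        {"column": "status", "allowed_values": ["pending", "paid", "shipped", "cancelled"]},
    ],
}


def spec_to_dbt_model(spec: dict) -> tuple[dict, list[dict]]:
    """Return (model entry for schema.yml, rules that need a human decision)."""
    columns, unmapped = [], []
    for rule in spec["rules"]:
        if "column" not in rule:
            # Table-level rules such as row counts have no built-in generic test.
            unmapped.append(rule)
            continue
        tests = []
        if rule.get("not_null"):
            tests.append("not_null")
        if rule.get("unique"):
            tests.append("unique")
        if "allowed_values" in rule:
            tests.append({"accepted_values": {"values": rule["allowed_values"]}})
        columns.append({"name": rule["column"], "tests": tests})
    return {"name": spec["table"], "columns": columns}, unmapped


model, needs_decision = spec_to_dbt_model(SPEC)
print(json.dumps({"version": 2, "models": [model]}, indent=2))
print("Needs a human decision:", needs_decision)
```

The row-count rule lands in the second list on purpose: how to implement it (a custom test, a package, a freshness check) is exactly the kind of choice that should not be made silently by a generator.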
Tasks to keep for humans
Treating AI like a junior also means you do not let it run the migration, redesign the warehouse, or negotiate with finance about cost trade-offs. There are entire categories of data engineering work that require context the model simply does not have.
Business rules and contract level guarantees
A lot of data work is hidden business logic. How is revenue recognised, when is an account considered churned, which events count towards active usage? These things live in meeting notes, contracts, and the heads of stakeholders, not just in code.
If you ask an AI to “write a pipeline that calculates monthly recurring revenue”, it will happily write something that looks plausible. But it is reproducing the behaviour implied by existing code and patterns, not the behaviour agreed with finance or product.
This is where the junior analogy is strongest. You might ask a junior to draft the implementation after a design has been agreed. You do not ask them to guess the business rule from existing code and then ship it without review.
The same holds for AI.
State, side effects, and production changes
Most AI coding assistants are at their best on pure, stateless functions. They struggle more with workflows that span multiple systems. Think of writing into Kafka, updating warehouse tables, and invalidating caches in the right order.
Backfills, late arriving data, schema evolution, and dependency ordering define a lot of the complexity. Delegating those changes to an AI without deep review is an invitation to subtle corruption.
Good boundaries here:
- Use AI to scaffold the Dagster or Airflow DAG structure, but decide schedule, retries, backfill strategy yourself.
- Let it suggest idempotency patterns, but you choose how to actually implement deduplication for your business keys.
- Do not let it write migration steps that modify production tables without a human walking through each change.
It is fine for a junior to draft the code for a backfill job. It is not fine for them to run it unsupervised on the main warehouse. It works the same for the AI.
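As an illustration of the second boundary, here is one idempotency pattern an assistant might suggest: keep a single row per business key and only accept strictly newer versions. This is a sketch under stated assumptions, not a recommendation for your warehouse. The function name `upsert_latest`, the `order_id` key, and the `updated_at` version field are hypothetical, and comparing timestamps as strings only works if they share one ISO 8601 format and timezone.

```python
def upsert_latest(
    target: dict,
    batch: list[dict],
    key_fields: tuple[str, ...],
    version_field: str,
) -> dict:
    """Merge `batch` into `target`, keeping one row per business key.

    A row replaces the stored one only if its version is strictly newer,
    so replaying the same batch leaves the result unchanged.
    """
    merged = dict(target)  # copy, so the caller's state is not mutated
    for row in batch:
        key = tuple(row[field] for field in key_fields)
        current = merged.get(key)
        if current is None or row[version_field] > current[version_field]:
            merged[key] = row
    return merged


# Hypothetical batch with an out-of-order (late arriving) record for order 1.
batch = [
    {"order_id": 1, "status": "paid", "updated_at": "2024-03-01T12:00:00"},
    {"order_id": 1, "status": "pending", "updated_at": "2024-03-01T10:00:00"},
    {"order_id": 2, "status": "shipped", "updated_at": "2024-03-02T09:00:00"},
]

once = upsert_latest({}, batch, ("order_id",), "updated_at")
twice = upsert_latest(once, batch, ("order_id",), "updated_at")
print(once == twice)
```

Which fields form the business key, and what counts as “newer”, are the parts you decide; the loop around them is the part that is safe to delegate.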
Data modeling and domain boundaries
Deciding how to model the core entities of a business is a design problem, not a typing problem. Should there be one orders table or separate concepts for quotes, orders, and invoices? Do you surface flattened wide tables for analytics or push everyone through a semantic layer?
These are decisions about:
- How different teams understand the business.
- Which use cases matter over the next year.
- What trade-offs you want to make between flexibility, performance, and governance.
AI can absolutely help by listing common patterns like star schemas, data vault, or domain-oriented data products. It can summarise internal discussions and highlight contradictions. But actually committing to a model that the rest of the company will live with is work that belongs to humans who own the consequences.
A practical delegation pattern
Thinking of AI as a junior data engineer is only useful if it leads to different daily behaviour. In practice, a simple delegation loop helps.
Pick a task and explicitly label it as “AI friendly” or “human only”.
- AI friendly: docs drafts, scaffold queries, unit tests for small functions, boilerplate DAGs.
- Human only: anything that changes contracts, business metrics, or production schemas.
For AI friendly tasks, describe intent and constraints clearly, including examples from your codebase. Let the model produce a draft. Review it the same way you would review a junior’s PR. Integrate, refactor, or discard.
Over time this changes how you spend your attention. The machine handles more of the search, typing, and summarising. You spend more time deciding what the system should do at all.
An example: AI as pair on a small data utility
Consider a simple validation function used in a pipeline that reads CSV files from external partners.
def parse_positive_int(value: str) -> int:
    """Parse a positive integer from a string.
    Returns an int if value is a positive integer string.
    Raises ValueError otherwise.
    """
    value = value.strip()
    if not value:
        raise ValueError("Empty value")
    try:
        number = int(value)
    except ValueError:
        raise ValueError(f"Not an integer: {value!r}")
    if number <= 0:
        raise ValueError(f"Not positive: {number}")
    return number
You can absolutely write tests for this by hand. Under time pressure, most people cover a couple of happy paths and one or two obvious failures. Instead, you can give an AI this function and a prompt like:
You are a senior Python test engineer.
Generate pytest tests for this function.
Cover happy path, boundary values, and error handling.
Group cases with pytest.mark.parametrize where it makes sense.
Do not test implementation details.
The output will likely include cases such as:
- “1”, “42”, “ 7 ” parsing correctly.
- “0”, “-1” raising for non-positive values.
- “”, “abc”, None handled as errors with appropriate exception types.
This is exactly the kind of structural test coverage that tends to be skipped, not because it is hard, but because it is boring. You still read the tests and decide if they reflect the real constraints of the integration.
You might add a case for extremely large integers if that matters, or remove some overly specific assertion about error messages. The point is not that AI discovered a profound new edge case. It simply removed enough friction that a decent test suite exists at all.
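A sketch of what the reviewed result might look like. To stay runnable without extra dependencies it uses plain `test_*` functions looping over case tables; with pytest installed, the same tables could feed `pytest.mark.parametrize` directly. The function is copied in so the example is self-contained, and the case lists are illustrative rather than exhaustive.

```python
def parse_positive_int(value: str) -> int:
    """Copy of the function under test, so the sketch is self-contained."""
    value = value.strip()
    if not value:
        raise ValueError("Empty value")
    try:
        number = int(value)
    except ValueError:
        raise ValueError(f"Not an integer: {value!r}")
    if number <= 0:
        raise ValueError(f"Not positive: {number}")
    return number


# (raw input, expected result); the last case covers very large integers.
VALID_CASES = [("1", 1), ("42", 42), (" 7 ", 7), (str(10**30), 10**30)]
# Inputs that should be rejected with ValueError.
INVALID_CASES = ["", "   ", "0", "-1", "abc", "1.5"]


def test_valid_inputs():
    for raw, expected in VALID_CASES:
        assert parse_positive_int(raw) == expected


def test_invalid_inputs_raise_value_error():
    for raw in INVALID_CASES:
        try:
            parse_positive_int(raw)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {raw!r}")


def test_none_raises_attribute_error():
    # As written, None fails on .strip() with AttributeError, not ValueError.
    # Whether that is acceptable for the integration is a reviewer's call.
    try:
        parse_positive_int(None)
    except AttributeError:
        return
    raise AssertionError("expected AttributeError for None")
```

The `None` case is a small example of why the review step matters: a generated test may assume `ValueError` here, and only someone who knows what the partner files can contain can say which behaviour is right.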
The mental model that keeps you out of trouble
A useful way to keep the boundaries clear is this split:
- Let AI handle structural work: boilerplate code, parameterised tests, draft documentation, scaffolding DAGs, simple transforms.
- Keep humans on semantic work: business rules, contracts, migrations, stateful workflows, data modeling.
Structural work is where junior engineers and AI both shine: they can move quickly once given a clear pattern. Semantic work is where understanding of the company, the domain, and the future roadmap lives.
If you keep that separation in mind, AI stops being a toy demo and starts feeling like a capable junior on the team. It will not design your lakehouse, but it might finally keep the tests, docs, and boilerplate from rotting while you do.


