As Data Engineers, we are very good at moving and refining data. We build solid ingestion pipelines, clean up raw JSON logs, enforce strict schemas in our Silver layer, and prepare aggregated Gold tables.
But then, we face a major roadblock: How do we put this data in front of business users?
Usually, a standard dashboard in Power BI or Tableau is enough. But sometimes a static view falls short. What if the business team needs an interactive data entry form to fix data quality errors manually? What if the data scientists have built a brilliant RAG (Retrieval-Augmented Generation) chatbot, and they need a custom ChatGPT-like interface for the marketing team?
Until recently, doing this meant leaving the lakehouse. You had to provision an Azure App Service or a separate VM, configure networks, set up Docker containers, manage authentication (SSO), and—worst of all—create separate API pipelines to push data out of Databricks into an external database like Postgres.
It added massive infrastructure overhead, security risks, and technical debt.
In 2026, this friction is gone. Databricks has introduced Databricks Apps, a native feature that allows us to build and deploy full-stack web applications directly on the Databricks serverless platform.
What is Databricks Apps?
Let’s skip the marketing hype. From an engineering standpoint, Databricks Apps is a containerized web-hosting service built directly into your workspace.
Instead of setting up external servers, you write standard Python or Node.js code using popular web frameworks like Streamlit, Dash, Gradio, Flask, or React. When you hit deploy, Databricks automatically spins up an isolated, serverless container, provisions a secure URL, and runs your app.
Why this is a game-changer for data teams:
- Zero Infrastructure Management: No Kubernetes, no Azure App Services, no firewall configurations. The serverless compute scales automatically.
- Native Governance (Unity Catalog): Your web app runs inside the workspace. It inherits the same security, network isolation, and data governance policies you already built.
- Built-in Authentication: It automatically leverages OIDC/OAuth 2.0 and Single Sign-On (SSO). Users you have granted access to the app sign in with their standard corporate login, without you writing a single line of authentication code.
Under the Hood: The Architecture & Security
When you create a Databricks App, you aren’t running a notebook cell. You are deploying a real, isolated service.
The Identity (Service Principals)
Every app runs under its own dedicated Service Principal (SP) managed automatically by Databricks. This is a crucial design detail for Data Engineers.
Instead of the app using your personal credentials to read data, you grant specific Unity Catalog permissions to the app's Service Principal. For example, if your app needs to show a summary of sales, you explicitly run:
GRANT SELECT ON TABLE gold.sales_summary TO `app_identity_sales_tracker`;
This enforces the principle of least privilege. The app can only see exactly what you allow it to see.
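One detail worth keeping in mind: in Unity Catalog, SELECT on a table is only usable if the principal also holds USE CATALOG and USE SCHEMA on the parents. Here is a minimal sketch that generates the full set of statements for one table. The helper name and the `main` catalog are my own assumptions for illustration, not something Databricks ships.

```python
def least_privilege_grants(table: str, principal: str) -> list[str]:
    """Return the GRANT statements a principal needs to read one table.

    `table` must be a three-level Unity Catalog name: catalog.schema.table.
    This helper is hypothetical; it only builds SQL text, it does not run it.
    """
    parts = table.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {table!r}")
    catalog, schema, _ = parts
    quoted = f"`{principal}`"  # backtick-quoted, like the GRANT shown above
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {quoted};",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {quoted};",
        f"GRANT SELECT ON TABLE {table} TO {quoted};",
    ]


if __name__ == "__main__":
    # "main" is an assumed catalog name; substitute your own.
    for stmt in least_privilege_grants("main.gold.sales_summary", "app_identity_sales_tracker"):
        print(stmt)
```

Generating the statements from one function keeps the grant list reviewable in a pull request instead of living in someone's SQL editor history.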
Authentication Modes
Databricks Apps supports two main ways of handling user access:
- On-Behalf-Of (OBO) Token: The app acts on behalf of the person logged in. If a manager logs in, they only see the data they have access to in Unity Catalog. If an analyst logs in, they see their respective subset.
- App-Only (Service Principal) Token: The app uses its own fixed permissions, regardless of who is clicking the buttons. This is perfect for background tasks or automated data monitoring.
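To make the difference between the two modes concrete, here is a rough sketch of how an app might choose which token to query with. It assumes the user's token arrives in the `X-Forwarded-Access-Token` request header (verify this against the current Databricks Apps documentation); the mode names and the `app_token_provider` callable are hypothetical stand-ins.

```python
from typing import Callable, Mapping

# Assumed header name for the forwarded user token.
OBO_HEADER = "x-forwarded-access-token"


def resolve_access_token(
    mode: str,
    request_headers: Mapping[str, str],
    app_token_provider: Callable[[], str],
) -> str:
    """Pick the token to query with, depending on the authentication mode."""
    if mode == "obo":
        # HTTP header names are case-insensitive, so normalise before lookup.
        lowered = {key.lower(): value for key, value in request_headers.items()}
        token = lowered.get(OBO_HEADER)
        if not token:
            raise PermissionError("no user token was forwarded with this request")
        return token
    if mode == "app":
        # Hypothetical callable that returns the service principal's token.
        return app_token_provider()
    raise ValueError(f"unknown mode: {mode!r}")
```

Failing loudly when the user token is missing matters here: silently falling back to the service principal would let a user see data through the app's permissions instead of their own.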
A Real-World Blueprint: The "Self-Service" Data Quality App
Let’s look at a practical scenario. The operations team complains that a specific country code in our supplier table is frequently entered incorrectly (e.g., writing "PL" as "Plnd" or "Poland"). They want a simple tool to search suppliers and fix the codes manually.
Instead of building an external application and setting up a database connector, we can build a 50-line Streamlit app directly inside Databricks.
Here is the file structure you would use:
my_data_app/
├── app.yaml # The configuration manifest
├── requirements.txt # Python dependencies
└── app.py # The Streamlit application code
1. The Configuration (app.yaml)
This manifest tells Databricks how to start your containerized service.
command:
  - streamlit
  - run
  - app.py
env:
  - name: DATABRICKS_WAREHOUSE_ID
    value: "1234567890abcdef"
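The env entry above reaches the app as an ordinary environment variable. A small sketch of reading it defensively, so a missing value fails at startup with a clear message rather than as a cryptic connection error later (the helper name is hypothetical):

```python
import os


def warehouse_http_path(env=os.environ) -> str:
    """Build the SQL warehouse HTTP path from the variable set in app.yaml."""
    warehouse_id = env.get("DATABRICKS_WAREHOUSE_ID", "").strip()
    if not warehouse_id:
        raise RuntimeError(
            "DATABRICKS_WAREHOUSE_ID is not set; check the env section of app.yaml"
        )
    return f"/sql/1.0/warehouses/{warehouse_id}"
```

Passing the environment in as a parameter keeps the function easy to test locally with a plain dictionary.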
2. The Application Code (app.py)
Because the app runs natively inside Databricks, it can connect to your Serverless SQL Warehouses instantly using the built-in environment variables.
import os

import streamlit as st
from databricks import sql

st.title("Supplier Country Code Fixer")

# Connect using the native Databricks SQL connector
def get_db_connection():
    return sql.connect(
        server_hostname=os.getenv("DATABRICKS_HOST"),
        http_path=f"/sql/1.0/warehouses/{os.getenv('DATABRICKS_WAREHOUSE_ID')}",
        # OBO authentication: assumes the signed-in user's token is forwarded
        # in this request header (st.context needs a recent Streamlit release)
        access_token=st.context.headers.get("X-Forwarded-Access-Token"),
    )

search_term = st.text_input("Enter Supplier Name:")
if search_term:
    with get_db_connection() as conn:
        with conn.cursor() as cursor:
            # Fetch data directly from the Gold layer.
            # Named parameters (connector 3.x) keep user input out of the SQL text.
            cursor.execute(
                "SELECT id, name, country_code FROM gold.suppliers WHERE name ILIKE :pattern",
                {"pattern": f"%{search_term}%"},
            )
            result = cursor.fetchall()
            if result:
                for row in result:
                    st.write(f"ID: {row.id} | Name: {row.name}")
                    new_code = st.text_input(f"Update Country Code for {row.name}:", value=row.country_code, key=f"code_{row.id}")
                    if st.button(f"Save Changes for {row.name}", key=f"btn_{row.id}"):
                        # Update the data in place, again with bound parameters
                        cursor.execute(
                            "UPDATE gold.suppliers SET country_code = :code WHERE id = :id",
                            {"code": new_code, "id": row.id},
                        )
                        st.success("Updated successfully!")
            else:
                st.warning("No suppliers found.")
Once you run databricks apps deploy, this app is live. It provides a unique URL like https://supplier-fixer-wsk123.region.databricksapps.com. You send that link to the operations team, and they are ready to work. No databases were moved, no ports were opened on the network, and the data stayed securely inside the Lakehouse.
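Since the whole point of this app is to stop values like "Plnd" from landing in the table, it makes sense to validate the new code before the UPDATE runs. A minimal sketch, assuming a two-letter uppercase format and a small, hypothetical alias table (it checks the shape only, not whether the code is actually assigned to a country):

```python
import re

_ALPHA2 = re.compile(r"[A-Z]{2}")

# Hypothetical alias table; extend it with the mistakes your team actually sees.
KNOWN_ALIASES = {"PLND": "PL", "POLAND": "PL"}


def normalize_country_code(raw: str) -> str:
    """Return a two-letter uppercase code, or raise ValueError."""
    candidate = raw.strip().upper()
    candidate = KNOWN_ALIASES.get(candidate, candidate)
    if not _ALPHA2.fullmatch(candidate):
        raise ValueError(f"{raw!r} is not a two-letter country code")
    return candidate
```

In the Streamlit app, you could call this on the text input value and show `st.error(...)` on a ValueError instead of executing the UPDATE.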
Key Use Cases for Data Teams
Now that we can build front-ends easily, our role as Data Engineers expands. We aren't just building backend pipelines; we are building Data Products. Here are the three best ways to utilize this feature:
1. Custom Generative AI and RAG Interfaces
If you are using Databricks Vector Search and Model Serving to build LLM pipelines, Databricks Apps is the natural home for the UI. You can use Gradio or Streamlit to quickly build chat boxes that consume your internal knowledge base without exposing sensitive company data to public app environments.
2. Advanced Operational Dashboards
While standard dashboards are good for charts, they struggle with operational logic. If you need a dashboard that allows users to adjust machine learning feature parameters dynamically, run simulation models, or manually trigger a Databricks Workflow via the REST API, you can code those actions directly into a Python-based web app.
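As an illustration of the "trigger a Workflow" case, here is a sketch that builds a request for the Jobs API `run-now` endpoint using only the standard library. The function name is hypothetical, and where the host and token come from is left to you; the sketch deliberately builds the request without sending it.

```python
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Jobs API run-now endpoint."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Behind a button, you would pass the result to `urllib.request.urlopen(...)` and surface the response to the user. Keeping construction separate from sending makes the logic easy to unit test.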
3. Data Entry and Annotation Tools
Before training custom ML models, data scientists often need labeled data. You can build internal applications that display rows of raw text or images from a Unity Catalog Volume and allow analysts to label them, writing the annotations directly back into a structured Delta Table.
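The write-back step can stay very small. Below is a sketch of the statement and parameters for saving one label. The table `gold.text_annotations`, its columns, and the label set are all hypothetical, and the `:name` placeholders assume a Databricks SQL connector version that supports named parameters.

```python
from datetime import datetime, timezone

# Hypothetical target table and columns.
INSERT_ANNOTATION = (
    "INSERT INTO gold.text_annotations (item_id, label, annotator, labeled_at) "
    "VALUES (:item_id, :label, :annotator, :labeled_at)"
)

ALLOWED_LABELS = {"positive", "negative", "neutral"}  # hypothetical label set


def annotation_params(item_id: str, label: str, annotator: str) -> dict:
    """Validate a label and return the parameters for INSERT_ANNOTATION."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return {
        "item_id": item_id,
        "label": label,
        "annotator": annotator,
        "labeled_at": datetime.now(timezone.utc).isoformat(),
    }
```

With an open cursor, the call would look like `cursor.execute(INSERT_ANNOTATION, annotation_params("img-001", "positive", "analyst@example.com"))`.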
Operational Reality: What to Watch For
As engineers, we need to consider the constraints before jumping into a new tool.
- Cold Starts: Because the platform relies on serverless compute, if an app hasn't been used for hours, the container might scale down to zero. The next user who opens the link might experience a delay of 20-30 seconds while the serverless resources provision.
- Python vs. Node.js Ecosystem: Python frameworks (like Streamlit) are highly optimized for data visualization. However, if your company has dedicated frontend web developers who want to use heavy React or Angular setups, they will need to handle dependencies using standard package.json manifests, which requires slightly more coordination with the backend data layers.
- Cost Structure: Databricks Apps are billed for the compute time the app is running, under the serverless capacity model. For internal tools used occasionally throughout the day, this is very cost-effective. For an application left running 24/7 with many concurrent users, monitor consumption in the system tables (for example, system.billing.usage) to spot apps worth resizing or stopping when idle.
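A quick back-of-the-envelope calculation helps when deciding whether an app should run around the clock. The rates below are placeholders I made up for the arithmetic, not Databricks pricing; plug in the numbers from your own contract.

```python
def estimate_monthly_cost(
    hours_running_per_day: float,
    dbu_per_hour: float,
    usd_per_dbu: float,
    days: int = 30,
) -> float:
    """Cost = running hours x DBUs consumed per hour x price per DBU."""
    return hours_running_per_day * days * dbu_per_hour * usd_per_dbu


if __name__ == "__main__":
    # Placeholder rates, NOT real pricing.
    dbu_per_hour, usd_per_dbu = 0.5, 0.75
    print(estimate_monthly_cost(8, dbu_per_hour, usd_per_dbu))   # business hours only
    print(estimate_monthly_cost(24, dbu_per_hour, usd_per_dbu))  # always on
```

With these placeholder rates, the always-on app costs three times as much as the one that runs only during an eight-hour working day, which is simply the ratio of running hours.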
Conclusion: Data Products, Not Just Data Pipes
For a long time, there was a strict wall between Data Engineering and Software Engineering. We managed the data warehouses, and someone else built the applications that consumed them.
Databricks Apps erases that boundary. It shifts our focus from simply maintaining "pipes" to delivering fully realized Data Products. By removing the infrastructure, networking, and governance bottlenecks, we can go from a clean Delta table to a secure, enterprise-grade web application in an afternoon.
Stop thinking about your data as something that sits quietly in a table. It is time to build interfaces around it.
Are you building your first internal application? Let me know in the comments if you prefer Streamlit for fast prototyping or custom Node.js frameworks for more design control!



