THE DATA CONTRACT PLANE · v0.7 · FOR DESIGN PARTNERS

One contract. Every engine. Every agent.

The Data Contract Plane for Open Lakehouses. A Data Contract describes one dataset along three coupled dimensions — what it means, who can see it, and what state it's in. Neksur compiles the Contract per engine, enforces it at the catalog and the write path, and proves enforcement as audit-grade evidence. Spark, Trino, Snowflake, Dremio, AI agents over MCP: same Contract, same guarantees, same audit trail.

v0.7 · design-partner phase · self-host BSL Core or managed SaaS · SOC 2 Type II in progress
One Contract, enforced across every engine and agent: Spark · Trino · Snowflake · Dremio · AI agents over MCP
The state of multi-engine, multi-consumer lakehouses

The open lakehouse made three promises. It kept one.

Iceberg kept the first promise: one copy of the data. The other two quietly broke. We close the gap with a single object — the Contract.

01 · Promise 01

One copy of the data.

Kept

Iceberg solved it: one open table format with ACID commits, schema evolution, and time travel over immutable snapshots. Every engine reads the same Parquet files from the same object store. The hard physics problem is solved.

02 · Promise 02

Any engine, anywhere.

Broken in practice

Each engine ships its own access-control model — Spark, Trino, Snowflake Horizon, Dremio each honor a different one. Add an AI agent over MCP for a fifth. Same data, four to five ideas of who's allowed to see what. The auditor sees one of them.

03 · Promise 03

Governance you can defend.

Never delivered

The Unity Catalog April 2026 release documents what practitioners have known for two years: row filters and column masks do not flow through Iceberg REST to external engines. Snowflake Horizon has the same gap; Polaris doesn't try. The 30-day evidence pull is a manual scramble across four engines and three tools.

The first promise was a physical-format problem. The second and third are semantic, access, and state problems — three things that travel together and must be enforced together. That is what a Contract is.

The 12-month window

Three events in 2026 made this possible — and unavoidable.

We are not building a category. We are inhabiting a category that the industry just opened. The window is short.

  1. Event 01 · January 27, 2026 → Meaning

    Open Semantic Interchange v1.0 finalized.

    Snowflake, dbt Labs, Cube, AtScale, Databricks, and 40+ partners agreed on a vendor-neutral YAML specification for semantic metadata. The Meaning dimension now has a portable wire format. Neksur is OSI-native in the NEKSUR vendor namespace.

  2. Event 02 · April 2026 → State

    Iceberg v3 shipped deletion vectors and row lineage.

    Both went GA. GDPR cascade erasure became a primitive. The State dimension — pinned snapshots, branch-aware retention, cryptographic time-travel — now has its storage substrate.

  3. Event 03 · April 2026 → Access

    Unity Catalog published the limitation.

    Databricks documented that row filters and column masks on Unity-governed tables do not propagate to external engines via Iceberg REST. A structural property of every platform-native governance solution, not a bug. The Access dimension is the answer.

Three events. Three Contract dimensions. The platform-native vendors cannot close this by design — closing it requires being outside the platform.

The central object

One Contract per dataset. Three coupled dimensions. Every consumer reads the same one.

A Data Contract is a runtime-enforced object, not a YAML file in a repo. It is the authoritative source: every other view of the dataset is reconciled back to it. One Contract, one history, one audit log, one rollback button.

01 · Meaning

What does this data mean?

Owns: Analytics Engineer · Reads: Governance, Platform
  • Metrics, dimensions, hierarchies, time intelligence.
  • Semantic invariants that survive dialect translation.
  • OSI representation in the NEKSUR vendor namespace.
  • Meaning is grounded, not free-floating: each metric is anchored to a glossary term in the ontology and to the tags in the taxonomy.

Guarantee. Same numbers in Spark, Trino, Snowflake, Dremio — and when an agent asks. A metric inherits the sensitivity of the columns it is computed over; declassifying it takes an explicit governance-steward attestation, and the numbers stay exact.
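The inheritance rule above can be read as a maximum over an ordered sensitivity scale, with declassification as an explicit, attested override that may only lower the result. A minimal sketch, assuming a hypothetical four-level scale and hypothetical names (`Sensitivity`, `Metric`, `effective_sensitivity`); it illustrates the rule, not Neksur's implementation.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Sensitivity(IntEnum):
    # Hypothetical ordered scale: a higher value is more sensitive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Metric:
    name: str
    source_columns: tuple[str, ...]
    # Hypothetical field, set only after a governance-steward attestation.
    declassified_to: Optional[Sensitivity] = None


def effective_sensitivity(metric: Metric, column_tags: dict[str, Sensitivity]) -> Sensitivity:
    """Inherit the most sensitive source column; an attestation may only lower it."""
    inherited = max(
        (column_tags[c] for c in metric.source_columns),
        default=Sensitivity.PUBLIC,
    )
    if metric.declassified_to is not None:
        return min(inherited, metric.declassified_to)
    return inherited


tags = {"revenue": Sensitivity.INTERNAL, "customer_email": Sensitivity.RESTRICTED}
arpu = Metric("revenue_per_customer", ("revenue", "customer_email"))
print(effective_sensitivity(arpu, tags).name)  # RESTRICTED
```

Keeping declassification in its own field, rather than editing column tags, changes the metric's classification without touching its values.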

02 · Access

Who can see what, under which conditions?

Owns: Governance Officer · Reads: Analytics, Platform
  • RBAC, ABAC, row filters, column masks, retention, agent scope.
  • Authored in one language, compiled to whatever the engine speaks.
  • Classify once, govern everywhere: a policy scoped to a sensitivity tag automatically governs every column that carries that tag, including new columns as detection tags them.

Guarantee. Enforced at the catalog, before the write commits, during the write, and as signed evidence afterward — the same filter whether a human or an agent is asking.
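"Classify once, govern everywhere" amounts to keying policies by tag instead of by column. A minimal sketch, assuming hypothetical `TagPolicy` records and a catalog represented as a plain column-to-tags dict; the tag name `pii.email` is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPolicy:
    # Hypothetical policy record: `action` applies to every column carrying `tag`.
    tag: str
    action: str  # e.g. "mask" or "deny"


def resolve_column_policies(
    column_tags: dict[str, set[str]], policies: list[TagPolicy]
) -> dict[str, list[str]]:
    """Map each column to the actions of every policy whose tag it carries."""
    actions_by_tag: dict[str, list[str]] = {}
    for policy in policies:
        actions_by_tag.setdefault(policy.tag, []).append(policy.action)

    resolved: dict[str, list[str]] = {}
    for column, tags in column_tags.items():
        actions = [a for tag in sorted(tags) for a in actions_by_tag.get(tag, [])]
        if actions:
            resolved[column] = actions
    return resolved


policies = [TagPolicy("pii.email", "mask")]
catalog = {"email": {"pii.email"}, "amount": set()}
print(resolve_column_policies(catalog, policies))  # {'email': ['mask']}

# Detection later tags a new column; the existing policy now covers it too.
catalog["contact_email"] = {"pii.email"}
print(resolve_column_policies(catalog, policies))  # {'email': ['mask'], 'contact_email': ['mask']}
```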

03 · State

What state is this data in, right now?

Owns: Platform Engineer · Reads: Analytics, Governance
  • A durable as-of pin on the agreed Iceberg snapshot.
  • Schema version, partition spec, branch or tag, compaction, freshness budget.
  • Cross-engine pinning during destructive operations; cross-engine reconciliation when reality and expectation diverge.

Guarantee. The pin is the anchor every attestation hangs from. It answers what was certified, as of which snapshot — not whatever happens to be newest. The Contract enforces which one you meant.
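One way to picture the pin is as a read-resolution rule that every consumer applies before planning a query, plus a retention rule that refuses to expire the pinned snapshot. The sketch assumes a hypothetical `TableState` record; in practice the resolved id would be handed to each engine's own time-travel syntax, such as `VERSION AS OF` in Spark SQL or `FOR VERSION AS OF` in Trino's Iceberg connector.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableState:
    # Hypothetical view of one Iceberg table as the Contract sees it.
    snapshot_ids: list[int]            # committed snapshots, oldest first
    pinned_snapshot_id: Optional[int]  # set by the State dimension, or None


def snapshot_for_read(state: TableState) -> int:
    """Return the snapshot every consumer must read: the pin if set, else the newest."""
    if state.pinned_snapshot_id is None:
        return state.snapshot_ids[-1]
    if state.pinned_snapshot_id not in state.snapshot_ids:
        # A pin to an expired snapshot is a broken guarantee, so fail loudly.
        raise LookupError(f"pinned snapshot {state.pinned_snapshot_id} no longer exists")
    return state.pinned_snapshot_id


def can_expire(state: TableState, snapshot_id: int) -> bool:
    """Retention may expire any snapshot except the pinned one."""
    return snapshot_id != state.pinned_snapshot_id


hold = TableState(snapshot_ids=[101, 102, 103], pinned_snapshot_id=102)
print(snapshot_for_read(hold))  # 102, even though 103 is newer
print(can_expire(hold, 102))    # False
```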

Meaning, Access, State are coupled, not parallel.

A policy is a function of classification.

A snapshot pin is a function of policy.

A metric is a function of physical state.

Every attestation is anchored as of one pinned snapshot.

The Contract is the authoritative root that makes them travel together.

One authoritative root

Every view of the dataset reconciles back to the Contract.

A Data Contract is a runtime-enforced object, not a YAML file in a repo. One Contract, one history, one audit log, one rollback button: Meaning, Access, and State traveling together.

One ritual, three personas

Three jobs. One lifecycle. Every change to a Contract takes the same path.

Whether an Analytics Engineer redefines a metric, a Governance Officer adds a column mask, or a Platform Engineer pins a snapshot — the change takes the same six steps, surfaces in the same review queue, deploys with the same mechanism, and lands in the same audit log. Cross-persona changes happen inside one Contract review with all three signatures.

Define

Author and version the Contract along any dimension.

  • Pull-request workflow, golden tests, downstream impact analysis.
  • The change exists as a draft until reviewed.

Enforce

Going live is gated, not automatic.

  • The deploy-to-active transition runs data-quality checks and cross-engine reconciliation against the pinned snapshot.
  • A non-breaking change that passes advances on its own; a breaking change escalates to human sign-off.
  • Once active, every read and write by every consumer is subject to the Contract. No engine and no agent can bypass it; every attempt is recorded as an audit event.

Prove

Every enforcement event is recorded as signed evidence.

  • The full history of every Contract is replayable.
  • SOC 2, GDPR, HIPAA evidence pulls go from 30-day scrambles to one query.
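Signed, replayable evidence is commonly built as a hash chain: each entry's MAC covers the previous entry's MAC, so history cannot be edited quietly. A minimal sketch under stated assumptions: HMAC-SHA256 with a shared key stands in for whatever signing scheme Neksur actually uses (asymmetric signatures would let third parties verify), and the entry layout and function names are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


def _mac(key: bytes, prev: str, event: dict) -> str:
    # Canonical JSON so the same event always serializes to the same bytes.
    payload = prev + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append_event(chain: list[dict], key: bytes, event: dict) -> None:
    """Record an enforcement event, linking it to the previous entry's MAC."""
    prev = chain[-1]["mac"] if chain else GENESIS
    chain.append({"event": dict(event), "prev": prev, "mac": _mac(key, prev, event)})


def verify_chain(chain: list[dict], key: bytes) -> bool:
    """Replay the chain; an edited, reordered, or removed (non-tail) entry breaks it."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if not hmac.compare_digest(entry["mac"], _mac(key, prev, entry["event"])):
            return False
        prev = entry["mac"]
    return True


chain: list[dict] = []
key = b"demo-key"  # hypothetical; a real deployment would use a managed signing key
append_event(chain, key, {"type": "read", "engine": "trino", "table": "sales.orders"})
append_event(chain, key, {"type": "deny", "engine": "agent", "table": "sales.orders"})
print(verify_chain(chain, key))  # True
chain[0]["event"]["engine"] = "spark"  # tamper with history
print(verify_chain(chain, key))  # False
```

Truncating the tail is not caught by the chain alone; anchoring the latest MAC somewhere external closes that gap.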
draft

Author proposes a change to one or more dimensions of one Contract.

review

Peer / governance / steward review; downstream impact surfaced.

compile

Engine-specific artifacts produced (SQL dialects, OPA bundles, Iceberg ops); validated against golden tests.

deploy

The data gate: DQ checks plus cross-engine reconciliation against the pinned snapshot must pass. Non-breaking changes advance to active automatically; breaking changes escalate to sign-off before atomic cutover.

enforce

Every consumer reads only through the deployed Contract, identically.

audit

Every enforcement event recorded as signed evidence; full history replayable.
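The six stages and the deploy gate can be sketched as a small state machine. The names (`Stage`, `Change`, `advance`) and flags are hypothetical; the audit stage is modeled as a log entry for every attempted transition, and the deploy rule follows the text: the data gate must pass, and a breaking change also needs sign-off.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    DRAFT = 1
    REVIEW = 2
    COMPILE = 3
    DEPLOY = 4
    ENFORCE = 5


@dataclass
class Change:
    # Hypothetical flags a Contract change accumulates on its way to active.
    reviewed: bool = False
    golden_tests_passed: bool = False
    data_gate_passed: bool = False  # DQ checks + cross-engine reconciliation on the pin
    breaking: bool = False
    signed_off: bool = False
    stage: Stage = Stage.DRAFT
    audit: list[tuple[str, str]] = field(default_factory=list)


def advance(change: Change) -> Stage:
    """Move a change one step forward, or hold it if its gate is not satisfied."""
    s = change.stage
    if s is Stage.DRAFT:
        nxt = Stage.REVIEW
    elif s is Stage.REVIEW and change.reviewed:
        nxt = Stage.COMPILE
    elif s is Stage.COMPILE and change.golden_tests_passed:
        nxt = Stage.DEPLOY
    elif s is Stage.DEPLOY and change.data_gate_passed and (not change.breaking or change.signed_off):
        # Non-breaking changes go active on their own; breaking ones wait for sign-off.
        nxt = Stage.ENFORCE
    else:
        nxt = s
    # Every attempted transition is recorded, whether or not it moved the change.
    change.audit.append((s.name, nxt.name))
    change.stage = nxt
    return nxt


change = Change(reviewed=True, golden_tests_passed=True, data_gate_passed=True, breaking=True)
for _ in range(4):
    advance(change)
print(change.stage.name)  # DEPLOY: the breaking change waits for sign-off
change.signed_off = True
print(advance(change).name)  # ENFORCE
```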

Analytics Engineer adds a metric

Governance flags a PII implication at review; per-dialect SQL compiles; Trino, Spark, and the agent all serve the new metric; the change is signed and immutable.

Governance Officer adds a column mask

Platform flags downstream BI impact at review; an OPA bundle compiles per engine; the column is masked on every consumer; the mask event chain is signed.

Platform Engineer pins a snapshot for a regulatory hold

Analytics flags two affected metrics at review; cross-engine pin commands compile; every consumer reads from the pinned snapshot; the hold is signed evidence.
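Compiling one mask per engine means emitting each dialect's own functions while keeping the output identical. A minimal sketch, assuming a SHA-256 hash mask and plain (unquoted) column identifiers; it renders SQL expressions rather than the OPA bundles mentioned above, and the template table is hypothetical. Trino's `to_hex` returns upper-case hex, so the Trino template lower-cases it to match Spark's `sha2` and Snowflake's `SHA2`.

```python
# Hypothetical per-dialect templates for a SHA-256 "hash" column mask.
HASH_MASK = {
    "trino": "lower(to_hex(sha256(to_utf8(CAST({col} AS varchar)))))",
    "spark": "sha2(CAST({col} AS STRING), 256)",
    "snowflake": "SHA2(CAST({col} AS VARCHAR), 256)",
}


def compile_mask(column: str, dialect: str) -> str:
    """Render one masked select-list item for the given engine."""
    try:
        template = HASH_MASK[dialect]
    except KeyError:
        raise ValueError(f"no mask template for dialect {dialect!r}") from None
    # Alias back to the original name so downstream queries are unchanged.
    return f"{template.format(col=column)} AS {column}"


for dialect in ("trino", "spark", "snowflake"):
    print(dialect, compile_mask("email", dialect))
```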

Where the Contract is enforced

Four named guarantees. Opt-in by tier. Each one closes a different audit question.

The Contract is enforced at four independent layers — catalog, write path, post-commit scan, compute boundary. Each is an opt-in guarantee with a specific audit answer. You buy as far up the ladder as your auditors require.

01 ·

Catalog-level enforcement

included in Core

Every commit to the Iceberg REST catalog passes through a policy gateway. Schema, write ACL, and retention are evaluated at commit time; violations are rejected with HTTP 403 and audit-logged. Works with Polaris, Unity, Glue, Snowflake Horizon, and Nessie.

Audit question answered

"Can a non-compliant write enter the catalog?" — No. Caught before commit.

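The commit-time check reduces to evaluating a proposed commit against the Contract and answering 403 on any violation. A minimal sketch with hypothetical `CommitRequest` and `TableContract` types; only the write-ACL and schema checks are shown (retention is omitted), and a real gateway would sit in front of the Iceberg REST commit endpoint rather than receive a Python object.

```python
from dataclasses import dataclass
from http import HTTPStatus


@dataclass(frozen=True)
class CommitRequest:
    # Hypothetical, simplified view of an Iceberg REST table commit.
    principal: str
    table: str
    schema_fields: frozenset[str]


@dataclass(frozen=True)
class TableContract:
    # Hypothetical slice of a Contract's rules for one table.
    allowed_writers: frozenset[str]
    declared_fields: frozenset[str]


def gate_commit(req: CommitRequest, contract: TableContract, audit_log: list[dict]) -> HTTPStatus:
    """Evaluate a commit before it reaches the catalog; reject violations with 403."""
    violations = []
    if req.principal not in contract.allowed_writers:
        violations.append("write ACL")
    undeclared = req.schema_fields - contract.declared_fields
    if undeclared:
        violations.append(f"schema: undeclared fields {sorted(undeclared)}")
    if violations:
        audit_log.append({"table": req.table, "principal": req.principal, "violations": violations})
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK


contract = TableContract(frozenset({"etl-svc"}), frozenset({"id", "amount"}))
log: list[dict] = []
req = CommitRequest("notebook-user", "sales.orders", frozenset({"id", "amount", "ssn"}))
print(int(gate_commit(req, contract, log)))  # 403
```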
02 ·

Write-path enforcement

included in Defense-in-Depth tier

A Spark Catalyst extension and a DataFrameWriter SDK redact, mask, or KMS-encrypt columns before any Parquet file is written. Sensitive data never lands on disk in non-compliant form.

Audit question answered

"Was sensitive data ever physically written before being masked?" — No. Masked in flight, before the file lands.

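In-flight masking means transforming each record before it reaches the file writer. The sketch below uses plain Python rows rather than a Catalyst extension, and the action names (`redact`, `mask`, `hash`) are assumptions; KMS encryption is not modeled.

```python
import hashlib
from typing import Any, Callable, Iterable, Iterator


def redact(value: Any) -> None:
    return None


def mask_keep_last4(value: Any) -> str:
    s = str(value)
    # Short values are masked entirely so nothing is revealed.
    if len(s) <= 4:
        return "*" * len(s)
    return "*" * (len(s) - 4) + s[-4:]


def sha256_hex(value: Any) -> str:
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


# Hypothetical action vocabulary; KMS encryption is deliberately left out.
ACTIONS: dict[str, Callable[[Any], Any]] = {
    "redact": redact,
    "mask": mask_keep_last4,
    "hash": sha256_hex,
}


def enforce_on_write(rows: Iterable[dict], column_actions: dict[str, str]) -> Iterator[dict]:
    """Yield copies of each row with Contract actions applied, before any file write."""
    for row in rows:
        out = dict(row)
        for column, action in column_actions.items():
            if out.get(column) is not None:
                out[column] = ACTIONS[action](out[column])
        yield out


batch = [{"card": "4111111111111111", "email": "a@example.com", "amount": 42}]
print(list(enforce_on_write(batch, {"card": "mask", "email": "redact"})))
# [{'card': '************1111', 'email': None, 'amount': 42}]
```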
03 ·

Continuous compliance scan

Defense-in-Depth (regex); ML detection included in Intelligence tier

Async scanners watch every committed file — pattern detection on Defense-in-Depth, ML anomaly detection on Intelligence. Detections land in the metadata graph and fire Slack and PagerDuty alerts at configurable confidence.

Audit question answered

"How fast do you detect a leak if the upstream control failed?" — Minutes, on every file, continuously.

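Pattern detection with a configurable confidence can be as simple as a per-column match rate. A minimal sketch, assuming confidence means the fraction of non-null sampled values that fully match a detector; the two regexes are illustrative and far cruder than production detectors.

```python
import re
from typing import Iterable

# Illustrative detectors only; production patterns are far more careful.
DETECTORS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def scan_column(values: Iterable[object], threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (detector, confidence) pairs whose match rate meets the threshold."""
    sample = [str(v) for v in values if v is not None]
    if not sample:
        return []
    hits = []
    for name, pattern in DETECTORS.items():
        # Confidence here is simply the fraction of non-null values that fully match.
        confidence = sum(1 for v in sample if pattern.fullmatch(v)) / len(sample)
        if confidence >= threshold:
            hits.append((name, confidence))
    # A caller would route hits to the metadata graph and to alerting; not shown.
    return hits


print(scan_column(["a@x.io", "b@y.com", None, "c@z.org"]))  # [('email', 1.0)]
print(scan_column(["123-45-6789", "n/a"], threshold=0.5))    # [('us_ssn', 0.5)]
```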
04 ·

Compute isolation

included in Defense-in-Depth tier

No long-lived storage credentials. Credentials are vended per-table, per-operation, short-TTL, scoped to exactly the rows and columns the Contract allows the principal to see.

Audit question answered

"What can a compromised engine reach?" — Only what the Contract allows the principal to reach, for the duration of one credential lease.

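A credential lease binds one principal, table, operation, and column set to a short expiry. A minimal sketch with hypothetical `Lease`, `vend`, and `authorize` names and an assumed 300-second default TTL; real vending would hand out scoped cloud-storage credentials rather than an opaque token, and row-level scoping is not modeled.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lease:
    # Hypothetical lease: one principal, one table, one operation, a fixed column set.
    token: str
    principal: str
    table: str
    operation: str
    columns: frozenset[str]
    expires_at: float


def vend(principal: str, table: str, operation: str, columns: frozenset[str],
         ttl_s: float = 300.0, now: Optional[float] = None) -> Lease:
    """Issue a short-lived lease; the 300 s default TTL is an assumption."""
    now = time.time() if now is None else now
    return Lease(secrets.token_urlsafe(32), principal, table, operation, columns, now + ttl_s)


def authorize(lease: Lease, table: str, operation: str, columns: set[str],
              now: Optional[float] = None) -> bool:
    """Allow the request only within the lease's scope and before it expires."""
    now = time.time() if now is None else now
    return (
        lease.table == table
        and lease.operation == operation
        and columns <= lease.columns
        and now < lease.expires_at
    )


lease = vend("bi-svc", "sales.orders", "read", frozenset({"id", "amount"}), ttl_s=60, now=1000.0)
print(authorize(lease, "sales.orders", "read", {"id"}, now=1030.0))     # True
print(authorize(lease, "sales.orders", "read", {"email"}, now=1030.0))  # False: column out of scope
print(authorize(lease, "sales.orders", "read", {"id"}, now=1061.0))     # False: lease expired
```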
Guarantees stack — each is the same Contract enforced at a different point in the data path. You opt in as far as your audit posture requires.

How we read against the field

A new category. Adjacent products.

Polaris solves the catalog API. Cube solves semantics on top of warehouses. Atlan curates active metadata. Unity owns Databricks compute and its catalog. None of them is the Contract. Neksur is.

Capability Neksur Unity Catalog Polaris Atlan Cube dbt SL
Consumption
Cross-engine semantic-layer contract
OSI v1.0 import/export (round-trip stable) [1][3]
MCP server with policy-aware tools [2]
Knowledge graph queryable (openCypher)
Coordination
Snapshot pinning across engines
Schema-cache invalidation (cross-engine p99 < 5s)
Cross-engine semantic consistency
Write-conflict resolution
Policy
Cross-engine row filters / column masks *
Write-path enforcement
Cryptographic audit chain
GDPR cascade via lineage
  • full support
  • partial / limited
  • implementing / pending (Phase 1 roadmap)
  • not in product
  • * Unity Catalog, April 2026: documented limitation on propagating row filters and column masks through Iceberg REST. source
  • 1 Open Semantic Interchange v1.0 specification finalized 2026-01-27. source
  • 2 AWS Industries blog 2026-02-06 — measured 90% LLM token reduction via semantic compression. source
  • 3 dbt Labs is a founding OSI coalition member (2025-09-23) and a v1.0 specification signatory. source
Three roles. One Contract.

You own one dimension. You read the other two. You share the lifecycle.

The Analytics Engineer owns Meaning. The Governance Officer owns Access. The Platform Engineer owns State. All three see the same Contract. All three review each other's changes. Nobody is locked out of a dimension because they don't own it.


The Analytics Engineer.

Pain

You define a metric in dbt. It compiles to one thing in Spark and a different thing in Trino. The numbers in the dashboard don't match the numbers in the notebook. The fix is a spreadsheet of dialect quirks you maintain by hand.

Neksur

Meaning lives in the Contract. Compiled per engine, golden-tested per engine, identical numbers across all of them. When you add a metric, the Governance Officer sees it in the review queue and flags PII implications before deploy. You own Meaning. You read Access and State. You ship faster because nobody discovers a contradiction in production.


The Governance Officer.

Pain

Your auditor wants row-filter proof per engine. Your DPO wants GDPR cascade across snapshots, backups, and ML training sets. You currently maintain four policy files in four engines and pray they agree.

Neksur

Access lives in the Contract. Defined once, compiled per engine, enforced at the catalog and the write path, signed as evidence. When you add a column mask, the Analytics Engineer sees the impact on their metrics before deploy. You own Access. You read Meaning and State. The 30-day evidence pull is one query.


The Platform Engineer.

Pain

You pin a snapshot for a regulatory hold in Spark. Trino reads from the next snapshot. Dremio reads from the one after. The agent reads whatever it decides. The hold is theoretical.

Neksur

State lives in the Contract. Cross-engine pinning, branch-aware retention, freshness budgets, compaction coordination. When you pin a snapshot, the Analytics Engineer sees which metrics it affects before deploy. You own State. You read Meaning and Access. The hold is real on every consumer, including agents.

Three roles. One Contract. One lifecycle. Three signatures on the same review.

AI and agents

One Contract. Spark, Trino, Snowflake — and the agent that reads them.

MCP-speaking agents read through the same compiled Contract as every engine — same Meaning, same Access, same State, same audit log.

Same Contract, agent surface

The Neksur MCP server projects the Contract to the agent in OSI representation. It reads the same three dimensions every engine does:

Meaning: metric and dimension definitions that make a query unambiguous.
Access: its own policy scope, one more entry in the Contract's Access dimension alongside every other consumer.
State: the pinned snapshot and schema version that guarantee the data it reasons about is the data that was governed.

Every agent decision is recorded in the same audit chain as every engine read.

Token cost of ungoverned metadata

90% fewer tokens, same answer (AWS Industries, Feb 2026)

The figure measures what an LLM agent spends navigating ungoverned metadata in one telco root-cause analysis (RCA); it sizes the problem and is not a Neksur deliverable. When Meaning is pre-compiled and Access is pre-scoped, the agent reaches the answer without spending tokens on navigation.

Design-partner phase

Six design partners. One quarter. Direct line to founders.

We're picking the Q3 2026 cohort. Three engines minimum, governance pain real, willing to give us 2-3 hours per month. Self-host BSL Core or managed SaaS — your choice, mixable. Free Defense-in-Depth tier for 12 months from PoC.

What you get

  • free Defense-in-Depth tier through GA: 12 months of full commercial features at every tier you need
  • perpetual commercial license credits: 50% off the first 24 months post-GA
  • direct shaping of the Contract roadmap; PRs welcome on the BSL Core repo
  • white-glove migration from your current governance setup
  • co-marketing and reference architecture (your logo, your call — opt in only)
  • influence on the BSL → Apache 2.0 Change Date (accelerate if your cohort says so)

What we need

  • 2-3 hours per month of your platform engineering time
  • permission to deploy in a non-prod environment first; production after validation
  • honest feedback on what breaks (we ship fixes fast)
  • quarterly reference call with prospective customers (anonymized if you prefer)
  • three-engine minimum (Spark + Trino + one of Snowflake / Dremio / Athena / Flink)
  • mix mode acceptable: self-host (BSL Core) for some tables, managed SaaS for others
Talk to us about design partnership →

6 spots. We've already had conversations with 12 candidates. Apply by July 1, 2026.
