The Data Contract Plane for Open Lakehouses. A Data Contract describes one dataset along three coupled dimensions — what it means, who can see it, and what state it's in. Neksur compiles the Contract per engine, enforces it at the catalog and the write path, and proves enforcement as audit-grade evidence. Spark, Trino, Snowflake, Dremio, AI agents over MCP: same Contract, same guarantees, same audit trail.
Iceberg kept the first promise: one copy of the data. The other two quietly broke. We close the gap with a single object — the Contract.
Iceberg solved it: one immutable table format, ACID, schema evolution, time travel. Every engine reads the same parquet from the same object store. The hard physics problem is done.
Each engine ships its own access-control model — Spark, Trino, Snowflake Horizon, Dremio each honor a different one. Add an AI agent over MCP for a fifth. Same data, four to five ideas of who's allowed to see what. The auditor sees one of them.
The Unity Catalog April 2026 release documents what practitioners knew for two years — row filters and column masks do not flow through Iceberg REST to external engines. Horizon is the same; Polaris doesn't try. The 30-day evidence pull is a manual scramble across four engines and three tools.
The first promise was a physical-format problem. The second and third are semantic, access, and state problems — three things that travel together and must be enforced together. That is what a Contract is.
We are not building a category. We are inhabiting a category that the industry just opened. The window is short.
Snowflake, dbt Labs, Cube, AtScale, Databricks, and 40+ partners agreed on a vendor-neutral YAML for semantic metadata. The Meaning dimension now has a portable wire format. Neksur is OSI-native in the NEKSUR vendor namespace.
Both went GA. GDPR cascade erasure became a primitive. The State dimension — pinned snapshots, branch-aware retention, cryptographic time-travel — now has its storage substrate.
Databricks documented that row filters and column masks on Unity-governed tables do not propagate to external engines via Iceberg REST. A structural property of every platform-native governance solution, not a bug. The Access dimension is the answer.
Three events. Three Contract dimensions. The platform-native vendors cannot close this by design — closing it requires being outside the platform.
A Data Contract is a runtime-enforced object, not a YAML in a repo. It is the authoritative source: every other view of the dataset is reconciled back to it. One Contract, one history, one audit log, one rollback button.
Guarantee. Same numbers in Spark, Trino, Snowflake, Dremio — and when an agent asks. A metric inherits the sensitivity of the columns it is computed over; declassifying it takes an explicit governance-steward attestation, and the numbers stay exact.
Guarantee. Enforced at the catalog, before the write commits, during the write, and as signed evidence afterward — the same filter whether a human or an agent is asking.
Guarantee. The pin is the anchor every attestation hangs from. It answers what was certified, as of which snapshot — not whatever happens to be newest. The Contract enforces which one you meant.
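The Meaning guarantee above says a metric inherits the sensitivity of the columns it is computed over, and that declassifying it takes an explicit steward attestation. A minimal sketch of that rule, assuming a three-level sensitivity scale and names (`Sensitivity`, `Metric`, `effective_sensitivity`) that are illustrative and not Neksur's API:

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical scale; a real Contract may use different labels.
    PUBLIC = 0
    INTERNAL = 1
    PII = 2


@dataclass(frozen=True)
class Metric:
    name: str
    columns: tuple  # names of the columns the metric is computed over


def effective_sensitivity(metric, column_labels, attestations=None):
    """Return the highest sensitivity among the metric's columns,
    unless a steward attestation explicitly overrides it."""
    inherited = max(column_labels[c] for c in metric.columns)
    attested = (attestations or {}).get(metric.name)
    return attested if attested is not None else inherited


labels = {"order_total": Sensitivity.INTERNAL, "customer_email": Sensitivity.PII}
metric = Metric("revenue_per_customer", ("order_total", "customer_email"))

# Without an attestation the metric inherits the strictest column label.
print(effective_sensitivity(metric, labels).name)
# With an attestation the steward's declared level is used instead.
print(effective_sensitivity(metric, labels, {metric.name: Sensitivity.INTERNAL}).name)
```

The override lives in a separate `attestations` map, so a declassification is always a recorded decision and never a silent default.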
Meaning, Access, State are coupled, not parallel.
A policy is a function of classification.
A snapshot pin is a function of policy.
A metric is a function of physical state.
Every attestation is anchored as of one pinned snapshot.
The Contract is the authoritative root that makes them travel together.
A Data Contract is a runtime-enforced object, not a YAML in a repo. One Contract, one history, one audit log, one rollback button — Meaning, Access, and State travelling together.
Whether an Analytics Engineer redefines a metric, a Governance Officer adds a column mask, or a Platform Engineer pins a snapshot — the change takes the same six steps, surfaces in the same review queue, deploys with the same mechanism, and lands in the same audit log. Cross-persona changes happen inside one Contract review with all three signatures.
Author and version the Contract along any dimension.
Going live is gated, not automatic.
Every enforcement event is recorded as signed evidence.
Author proposes a change to one or more dimensions of one Contract.
Peer / governance / steward review; downstream impact surfaced.
Engine-specific artifacts produced (SQL dialects, OPA bundles, Iceberg ops); validated against golden tests.
The data gate: DQ checks plus cross-engine reconciliation against the pinned snapshot must pass. Non-breaking changes advance to active automatically; breaking changes escalate to sign-off before atomic cutover.
Every consumer reads only through the deployed Contract, identically.
Every enforcement event recorded as signed evidence; full history replayable.
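The six steps above can be read as a small state machine whose only conditional transition is the data gate. The sketch below is one way to model it; the step names (`PROPOSE` through `EVIDENCE`) and the `advance` function are assumptions made for illustration, since the page does not name the steps.

```python
from enum import Enum


class Step(Enum):
    # Hypothetical names for the six lifecycle steps.
    PROPOSE = 1
    REVIEW = 2
    COMPILE = 3
    GATE = 4
    SERVE = 5
    EVIDENCE = 6


def advance(step, *, breaking=False, checks_passed=True, signed_off=False):
    """Return the next step, or the same step if the change is held."""
    if step is Step.GATE:
        # DQ checks and cross-engine reconciliation must pass first.
        if not checks_passed:
            return Step.GATE
        # Breaking changes wait for sign-off; non-breaking ones pass through.
        if breaking and not signed_off:
            return Step.GATE
    if step is Step.EVIDENCE:
        return Step.EVIDENCE  # terminal step
    return Step(step.value + 1)


print(advance(Step.GATE))                                  # non-breaking change
print(advance(Step.GATE, breaking=True))                   # held for sign-off
print(advance(Step.GATE, breaking=True, signed_off=True))  # released
```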
Analytics Engineer adds a metric
Governance flags a PII implication at review; per-dialect SQL compiles; Trino, Spark, and the agent all serve the new metric; the change is signed and immutable.
Governance Officer adds a column mask
Platform flags downstream BI impact at review; an OPA bundle compiles per engine; the column is masked on every consumer; the mask event chain is signed.
Platform Engineer pins a snapshot for a regulatory hold
Analytics flags two affected metrics at review; cross-engine pin commands compile; every consumer reads from the pinned snapshot; the hold is signed evidence.
The Contract is enforced at four independent layers — catalog, write path, post-commit scan, compute boundary. Each is an opt-in guarantee with a specific audit answer. You buy as far up the ladder as your auditors require.
Every commit to the Iceberg REST catalog passes a policy gateway. Schema, write ACL, and retention are evaluated at commit time; violations rejected with a 403 and audit-logged. Works with Polaris, Unity, Glue, Snowflake Horizon, Nessie.
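As a rough illustration of the commit-time checks described above, the sketch below evaluates schema, write ACL, and retention for one commit request and returns a 403 with reasons when any check fails. The types and field names (`TablePolicy`, `CommitRequest`, `evaluate_commit`) are hypothetical, and the sketch does not talk to a real Iceberg REST catalog.

```python
from dataclasses import dataclass


@dataclass
class TablePolicy:
    required_columns: set     # columns the schema must contain
    writers: set              # principals allowed to write
    min_retention_days: int   # minimum retention the Contract demands


@dataclass
class CommitRequest:
    principal: str
    columns: set
    retention_days: int


audit_log = []


def evaluate_commit(req, policy):
    """Return (http_status, violations); every decision is audit-logged."""
    violations = []
    missing = policy.required_columns - req.columns
    if missing:
        violations.append(f"schema: missing columns {sorted(missing)}")
    if req.principal not in policy.writers:
        violations.append(f"write ACL: {req.principal} may not write")
    if req.retention_days < policy.min_retention_days:
        violations.append("retention: below contract minimum")
    status = 403 if violations else 200
    audit_log.append({"principal": req.principal, "status": status, "violations": violations})
    return status, violations


policy = TablePolicy({"order_id", "amount"}, {"etl_service"}, 30)
print(evaluate_commit(CommitRequest("etl_service", {"order_id", "amount"}, 90), policy))
print(evaluate_commit(CommitRequest("adhoc_user", {"order_id"}, 7), policy))
```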
Audit question answered
"Can a non-compliant write enter the catalog?" — No. Caught before commit.
A Spark Catalyst extension and DataFrameWriter SDK redact, mask, or KMS-encrypt columns before parquet is written. Sensitive data never lands on disk in violating form.
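To make "masked in flight" concrete, here is a plain-Python sketch of applying per-column rules to a row before it is handed to a writer. It is not the Spark Catalyst extension or the DataFrameWriter SDK; the rule names (`redact`, `hash`), the `mask_row` function, and the salt are assumptions, and KMS encryption is left out.

```python
import hashlib


def mask_row(row, rules, salt=b"demo-salt"):
    """Return a copy of `row` with each column transformed per `rules`.

    rules maps column name -> "redact" | "hash"; unlisted columns pass through.
    """
    masked = {}
    for column, value in row.items():
        action = rules.get(column, "keep")
        if action == "redact":
            masked[column] = None
        elif action == "hash":
            digest = hashlib.sha256(salt + str(value).encode("utf-8")).hexdigest()
            masked[column] = digest[:16]  # truncated, deterministic token
        else:
            masked[column] = value
    return masked


row = {"order_id": 42, "email": "ada@example.com", "ssn": "123-45-6789"}
rules = {"email": "hash", "ssn": "redact"}
safe_row = mask_row(row, rules)
print(safe_row)  # only this masked copy would reach the file writer
```

Because the hash is deterministic for a given salt, masked values can still be joined or counted without exposing the original.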
Audit question answered
"Was sensitive data ever physically written before being masked?" — No. Masked in flight, before the file lands.
Async scanners watch every committed file: pattern detection on the Defense-in-Depth tier, ML anomaly detection on the Intelligence tier. Detections land in the metadata graph and fire Slack and PagerDuty alerts at a configurable confidence threshold.
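A toy version of the pattern-detection half of this scan might look like the following. The two regular expressions, their confidence scores, and the `scan_values` function are illustrative assumptions; alert delivery and ML anomaly detection are omitted.

```python
import re

# Hypothetical detectors: (pattern, confidence that a match is sensitive data).
PATTERNS = {
    "email": (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), 0.9),
    "us_ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.7),
}


def scan_values(values, min_confidence=0.8):
    """Return detections whose detector confidence meets the threshold."""
    detections = []
    for index, value in enumerate(values):
        for label, (pattern, confidence) in PATTERNS.items():
            if confidence >= min_confidence and pattern.search(str(value)):
                detections.append({"row": index, "label": label, "confidence": confidence})
    return detections


sample = ["ok", "contact ada@example.com", "123-45-6789"]
print(scan_values(sample))                      # default threshold
print(scan_values(sample, min_confidence=0.5))  # lower threshold, more detections
```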
Audit question answered
"How fast do you detect a leak if the upstream control failed?" — Minutes, on every file, continuously.
No long-lived storage credentials. Credentials are vended per-table, per-operation, short-TTL, scoped to exactly the rows and columns the Contract allows the principal to see.
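The shape of a vended credential can be sketched as a lease that names one principal, one table, one operation, a column set, and an expiry. Everything here (`Lease`, `vend`, `allows`, the grant table) is a hypothetical model for illustration; row-level scoping and real cloud credentials are not shown.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    principal: str
    table: str
    operation: str
    columns: frozenset
    expires_at: float  # epoch seconds


def vend(grants, principal, table, operation, ttl_seconds=300, now=None):
    """Issue a short-lived lease, or raise if the Contract grants nothing."""
    now = time.time() if now is None else now
    columns = grants.get((principal, table, operation))
    if not columns:
        raise PermissionError(f"{principal} has no {operation} grant on {table}")
    return Lease(principal, table, operation, frozenset(columns), now + ttl_seconds)


def allows(lease, table, operation, column, now=None):
    """Check one access against the lease's scope and expiry."""
    now = time.time() if now is None else now
    return (now < lease.expires_at and lease.table == table
            and lease.operation == operation and column in lease.columns)


grants = {("analyst", "sales.orders", "read"): {"order_id", "amount"}}
lease = vend(grants, "analyst", "sales.orders", "read", ttl_seconds=60, now=1000.0)
print(allows(lease, "sales.orders", "read", "amount", now=1030.0))  # in scope, not expired
print(allows(lease, "sales.orders", "read", "email", now=1030.0))   # column not granted
print(allows(lease, "sales.orders", "read", "amount", now=1061.0))  # lease expired
```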
Audit question answered
"What can a compromised engine reach?" — Only what the Contract allows the principal to reach, for the duration of one credential lease.
Guarantees stack — each is the same Contract enforced at a different point in the data path. You opt in as far as your audit posture requires.
Polaris solves the catalog API. Cube solves semantics on top of warehouses. Atlan curates active metadata. Unity owns Databricks compute and its catalog. None of them is the Contract. Neksur is.
| Capability | Neksur | Unity Catalog | Polaris | Atlan | Cube | dbt SL |
|---|---|---|---|---|---|---|
| Consumption | ||||||
| Cross-engine semantic-layer contract | ● | — | — | — | ○ | ○ |
| OSI v1.0 import/export (roundtrip-stable) | ◐1 | — | — | ○ | ○ | ●3 |
| MCP server with policy-aware tools | ◐2 | — | — | — | ○ | — |
| Knowledge graph queryable (openCypher) | ● | — | — | ○ | — | — |
| Coordination | ||||||
| Snapshot pinning across engines | ● | ○ | — | — | — | — |
| Schema-cache invalidation (cross-engine p99 < 5s) | ● | ○ | — | — | — | — |
| Cross-engine semantic consistency | ● | — | — | — | ○ | — |
| Write-conflict resolution | ● | ○ | — | — | — | — |
| Policy | ||||||
| Cross-engine row filters / column masks | ● | ○* | ○ | — | — | — |
| Write-path enforcement | ● | — | — | — | — | — |
| Cryptographic audit chain | ● | ○ | — | — | — | — |
| GDPR cascade via lineage | ● | ○ | — | ○ | — | — |
The Analytics Engineer owns Meaning. The Governance Officer owns Access. The Platform Engineer owns State. All three see the same Contract. All three review each other's changes. Nobody is locked out of a dimension because they don't own it.
You define a metric in dbt. It compiles to one thing in Spark and a different thing in Trino. The numbers in the dashboard don't match the numbers in the notebook. The fix is a spreadsheet of dialect quirks you maintain by hand.
Meaning lives in the Contract. Compiled per engine, golden-tested per engine, identical numbers across all of them. When you add a metric, the Governance Officer sees it in the review queue and flags PII implications before deploy. You own Meaning. You read Access and State. You ship faster because nobody discovers a contradiction in production.
Your auditor wants row-filter proof per engine. Your DPO wants GDPR cascade across snapshots, backups, and ML training sets. You currently maintain four policy files in four engines and pray they agree.
Access lives in the Contract. Defined once, compiled per engine, enforced at the catalog and the write path, signed as evidence. When you add a column mask, the Analytics Engineer sees the impact on their metrics before deploy. You own Access. You read Meaning and State. The 30-day evidence pull is one query.
You pin a snapshot for a regulatory hold in Spark. Trino reads from the next snapshot. Dremio reads from the one after. The agent reads whatever it decides. The hold is theoretical.
State lives in the Contract. Cross-engine pinning, branch-aware retention, freshness budgets, compaction coordination. When you pin a snapshot, the Analytics Engineer sees which metrics it affects before deploy. You own State. You read Meaning and Access. The hold is real on every consumer, including agents.
Three roles. One Contract. One lifecycle. Three signatures on the same review.
MCP-speaking agents read through the same compiled Contract as every engine — same Meaning, same Access, same State, same audit log.
The Neksur MCP server projects the Contract to the agent in OSI representation. It reads the same three dimensions every engine does: Meaning, Access, and State.
Every agent decision is recorded in the same audit chain as every engine read.
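One common way to make an audit chain tamper-evident is to sign each entry together with the signature of the entry before it. The sketch below uses an HMAC with a shared demo key as a stand-in; it is an assumption for illustration, not a description of how Neksur signs evidence, and a production system would more likely use asymmetric signatures with managed keys.

```python
import hashlib
import hmac
import json


def _sign(prev, event, key):
    payload = json.dumps({"prev": prev, "event": event}, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_event(chain, event, key):
    """Append `event`, binding it to the previous entry's signature."""
    prev = chain[-1]["signature"] if chain else ""
    chain.append({"prev": prev, "event": event, "signature": _sign(prev, event, key)})
    return chain


def verify(chain, key):
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = ""
    for entry in chain:
        expected = _sign(prev, entry["event"], key)
        if entry["prev"] != prev or not hmac.compare_digest(expected, entry["signature"]):
            return False
        prev = entry["signature"]
    return True


key = b"demo-key"  # hypothetical; never hard-code a real key
chain = []
append_event(chain, {"actor": "trino", "action": "read", "table": "sales.orders"}, key)
append_event(chain, {"actor": "mcp-agent", "action": "read", "table": "sales.orders"}, key)
print(verify(chain, key))
```

Engine reads and agent reads go into the same list, so one verification pass covers both.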
fewer tokens, same answer — AWS Industries, Feb 2026
What an LLM agent spends navigating ungoverned metadata in one telco RCA — a measurement of the problem, not a Neksur deliverable. When Meaning is pre-compiled and Access is pre-scoped, the agent reaches the answer without spending tokens on navigation.
We're picking the Q3 2026 cohort. Three engines minimum, governance pain real, willing to give us 2-3 hours per month. Self-host BSL Core or managed SaaS — your choice, mixable. Free Defense-in-Depth tier for 12 months from PoC.
6 spots. We've already had conversations with 12 candidates. Apply by July 1, 2026.