Mprove
.mdx

DWH Schema

There are three types of Connection Schemas in Mprove:

  • Raw schemas are fetched automatically from SQL connections - tables, columns, data types, foreign keys, and indexes.
  • Extra schemas are YAML files with extra metadata - descriptions, examples, and relationship type.
  • Combined schemas merge Raw and Extra schemas. They are useful for LLMs and humans to build Malloy semantic models.

Viewing Combined Schemas

User see combined schemas in UI and explore Relationships Graph.

LLM can use MCP tool.

There is also get-schemas mprove CLI command.

Raw schemas are cached. User must click "Refresh" button in UI to refresh a raw schema cache when db schema changes. Raw schema cache is not refreshed on page load.

Mprove validates (rebuilds) all files on every file save in UI or repo sync through CLI. During validation a raw cache is provided for Malloy compilation in a compatible format. This way Malloy does not need to fetch schema from db. It improves compilation time.

Without a cache, Malloy retrieves the schema for each table separately. Even with a cache, in some cases, Malloy needs to sample data (some SnowFlake data types) to determine the data structure.

File Format

Each schema must be stored as connection_name.schema_name.schema YAML file:

schema: c1_postgres.fleetdescription: 'Fleet management system tracking vehicles, drivers, trips, and maintenance'tables:  - table: trips    description: 'Individual vehicle trips with route, distance, and assigned driver'    columns:      - column: trip_id        description: 'Unique identifier for each trip'        example: '80234'      - column: vehicle_id        description: 'The vehicle used for this trip'        example: '152'        relationships:          - to: vehicles.vehicle_id            type: many_to_one      - column: driver_id        description: 'The driver who operated the vehicle on this trip'        example: '38'        relationships:          - to: drivers.driver_id            type: many_to_one      - column: customer_id        description: 'The customer who requested this trip'        example: '4501'        relationships:          - to: customers.customer_id            to_schema: c1_postgres.billing            type: many_to_one      - column: status        description: 'Trip completion status'        example: 'completed'        cache_unique_values: true  - table: vehicles    description: 'Fleet vehicles with make, model, and current operational status'    columns:      - column: vehicle_id        description: 'Unique identifier for each vehicle'        example: '152'      - column: license_plate        description: 'Vehicle registration plate number'        example: 'AB-1234-CD'  - table: drivers    description: 'Drivers certified to operate fleet vehicles'    columns:      - column: driver_id        description: 'Unique identifier for each driver'        example: '38'      - column: name        description: 'Full name of the driver'        example: 'Carlos Rivera'        cache_unique_values: true  - table: maintenance_records    description: 'Scheduled and unscheduled vehicle maintenance events'    columns:      - column: record_id        description: 'Unique identifier for each maintenance record'        example: '6010'      - column: vehicle_id        description: 'The vehicle that was serviced'        example: '152'        relationships:          - to: vehicles.vehicle_id            type: many_to_one      - column: service_type        description: 'Type of maintenance performed'        example: 'oil change'        cache_unique_values: true

Reference

Schema

NameTypeRequiredDescription
schemastringyesConnection and schema name in connection.schema format
descriptionstringnoHuman-readable description of the database schema
tablesTable []yesList of table definitions

Table

NameTypeRequiredDescription
tablestringyesTable name
descriptionstringnoBusiness-oriented description of the table
columnsColumn []noList of column definitions

Column

NameTypeRequiredDescription
columnstringyesColumn name
descriptionstringnoWhat this column represents
examplestringnoA realistic sample value
cache_unique_valuesbooleannoCache unique values for column or not
relationshipsRelationship []noList of relationships to other columns

Relationship

NameTypeRequiredDescription
tostringyesTarget in table.column format (exactly one dot)
typeenumyesone_to_one, one_to_many, many_to_one, many_to_many
to_schemastringnoFor cross-schema relationships: connection_name.schema_name format

Relationship Rules

Defining one side is sufficient (no need to define both sides).

Preferred convention: Define many_to_one on the "many" side pointing to the "one" side.

Mirror consistency: If you define both sides of a relationship (e.g., A->B and B->A), the types must be consistent mirrors:

Side ASide B
many_to_oneone_to_many
one_to_oneone_to_one
many_to_manymany_to_many

No duplicates: A column cannot have two relationships with the same to and to_schema values.

Cross-schema: Use to_schema to reference a table in a different schema on the same connection. See the customer_id column in the example above. Cross-connection relationships are not supported.

Partial coverage: You do not need to define every table or column. You should include tables and columns that have relationships. Also include those columns where you want to add descriptions or examples. Omitted columns still appear in the combined schema with their raw metadata.

Cache Unique Values

cache_unique_values: true marks the column as recommended for caching unique values. If a column unique values are cached, the Explorer AI session can look up the correct model fields by value before producing the charts.

On this page