DWH Schema
There are three types of Connection Schemas in Mprove:
- Raw schemas are fetched automatically from SQL connections - tables, columns, data types, foreign keys, and indexes.
- Extra schemas are YAML files with extra metadata - descriptions, examples, and relationship type.
- Combined schemas merge Raw and Extra schemas. They are useful for LLMs and humans to build Malloy semantic models.
Viewing Combined Schemas
User see combined schemas in UI and explore Relationships Graph.
LLM can use MCP tool.
There is also get-schemas mprove CLI command.
Raw schemas are cached. User must click "Refresh" button in UI to refresh a raw schema cache when db schema changes. Raw schema cache is not refreshed on page load.
Mprove validates (rebuilds) all files on every file save in UI or repo sync through CLI. During validation a raw cache is provided for Malloy compilation in a compatible format. This way Malloy does not need to fetch schema from db. It improves compilation time.
Without a cache, Malloy retrieves the schema for each table separately. Even with a cache, in some cases, Malloy needs to sample data (some SnowFlake data types) to determine the data structure.
File Format
Each schema must be stored as connection_name.schema_name.schema YAML file:
schema: c1_postgres.fleetdescription: 'Fleet management system tracking vehicles, drivers, trips, and maintenance'tables: - table: trips description: 'Individual vehicle trips with route, distance, and assigned driver' columns: - column: trip_id description: 'Unique identifier for each trip' example: '80234' - column: vehicle_id description: 'The vehicle used for this trip' example: '152' relationships: - to: vehicles.vehicle_id type: many_to_one - column: driver_id description: 'The driver who operated the vehicle on this trip' example: '38' relationships: - to: drivers.driver_id type: many_to_one - column: customer_id description: 'The customer who requested this trip' example: '4501' relationships: - to: customers.customer_id to_schema: c1_postgres.billing type: many_to_one - column: status description: 'Trip completion status' example: 'completed' cache_unique_values: true - table: vehicles description: 'Fleet vehicles with make, model, and current operational status' columns: - column: vehicle_id description: 'Unique identifier for each vehicle' example: '152' - column: license_plate description: 'Vehicle registration plate number' example: 'AB-1234-CD' - table: drivers description: 'Drivers certified to operate fleet vehicles' columns: - column: driver_id description: 'Unique identifier for each driver' example: '38' - column: name description: 'Full name of the driver' example: 'Carlos Rivera' cache_unique_values: true - table: maintenance_records description: 'Scheduled and unscheduled vehicle maintenance events' columns: - column: record_id description: 'Unique identifier for each maintenance record' example: '6010' - column: vehicle_id description: 'The vehicle that was serviced' example: '152' relationships: - to: vehicles.vehicle_id type: many_to_one - column: service_type description: 'Type of maintenance performed' example: 'oil change' cache_unique_values: trueReference
Schema
| Name | Type | Required | Description |
|---|---|---|---|
| schema | string | yes | Connection and schema name in connection.schema format |
| description | string | no | Human-readable description of the database schema |
| tables | Table [] | yes | List of table definitions |
Table
| Name | Type | Required | Description |
|---|---|---|---|
| table | string | yes | Table name |
| description | string | no | Business-oriented description of the table |
| columns | Column [] | no | List of column definitions |
Column
| Name | Type | Required | Description |
|---|---|---|---|
| column | string | yes | Column name |
| description | string | no | What this column represents |
| example | string | no | A realistic sample value |
| cache_unique_values | boolean | no | Cache unique values for column or not |
| relationships | Relationship [] | no | List of relationships to other columns |
Relationship
| Name | Type | Required | Description |
|---|---|---|---|
| to | string | yes | Target in table.column format (exactly one dot) |
| type | enum | yes | one_to_one, one_to_many, many_to_one, many_to_many |
| to_schema | string | no | For cross-schema relationships: connection_name.schema_name format |
Relationship Rules
Defining one side is sufficient (no need to define both sides).
Preferred convention: Define many_to_one on the "many" side pointing to the "one" side.
Mirror consistency: If you define both sides of a relationship (e.g., A->B and B->A), the types must be consistent mirrors:
| Side A | Side B |
|---|---|
many_to_one | one_to_many |
one_to_one | one_to_one |
many_to_many | many_to_many |
No duplicates: A column cannot have two relationships with the same to and to_schema values.
Cross-schema: Use to_schema to reference a table in a different schema on the same connection. See the customer_id column in the example above. Cross-connection relationships are not supported.
Partial coverage: You do not need to define every table or column. You should include tables and columns that have relationships. Also include those columns where you want to add descriptions or examples. Omitted columns still appear in the combined schema with their raw metadata.
Cache Unique Values
cache_unique_values: true marks the column as recommended for caching unique values.
If a column unique values are cached, the Explorer AI session can look up the correct model fields by value before producing the charts.