Query
Query the semantic layer using SQL syntax or the Python API
Query the semantic layer using SQL syntax or the Python API.
Sidemantic SQL has different semantics than generic SQL:
- No
JOINkeyword. Joins are inferred from referenced fields and declared relationships. GROUP BYis ignored. Grouping is derived from selected dimensions.- The SQL interface does not interpolate
{{ ... }}. See Parameters & Templating.
Runnable Examples (Executed in CI)
These examples run as part of the docs test suite. They use a small in-memory DuckDB dataset.
Doctest fixture (models/semantic_layer.yml)
models:
- name: customers
sql: |
select * from (values
(1, 'Alice Johnson', 'North'),
(2, 'Bob Smith', 'South'),
(3, 'Carol Davis', 'North')
) as t(id, name, region)
primary_key: id
dimensions:
- name: region
type: categorical
sql: region
relationships:
- name: orders
type: one_to_many
foreign_key: customer_id
- name: orders
sql: |
select * from (values
(1, 1, 120.00, 'completed'),
(2, 2, 80.00, 'completed'),
(3, 1, 50.00, 'pending'),
(4, 3, 200.00, 'completed')
) as t(id, customer_id, amount, status)
primary_key: id
dimensions:
- name: status
type: categorical
sql: status
metrics:
- name: revenue
agg: sum
sql: amount
relationships:
- name: customers
type: many_to_one
foreign_key: customer_id
from pathlib import Path
from sidemantic import SemanticLayer
from sidemantic.loaders import load_from_directory
layer = SemanticLayer()
load_from_directory(layer, Path("./models"))
GROUP BY Is Ignored
uvx sidemantic query "SELECT orders.status, orders.revenue FROM orders GROUP BY orders.status ORDER BY orders.status" --models ./models
status,revenue
completed,400.00
pending,50.00
Grouping is still derived from selected dimensions even without GROUP BY:
uvx sidemantic query "SELECT orders.status, orders.revenue FROM orders ORDER BY orders.status" --models ./models
status,revenue
completed,400.00
pending,50.00
Cross-Model Queries (No JOIN Keyword)
uvx sidemantic query "SELECT customers.region, orders.revenue FROM orders ORDER BY customers.region" --models ./models
region,revenue
North,370.00
South,80.00
Query Methods
Sidemantic provides two ways to query your semantic layer:
- SQL Interface - Familiar SQL syntax with automatic join handling
- Python API - Programmatic query building with type safety
SQL Interface
Basic SQL Queries
from _setup import layer
# Simple metric query
result = layer.sql("SELECT orders.revenue FROM orders")
# With dimensions and filters
result = layer.sql("""
SELECT orders.revenue, orders.status
FROM orders
WHERE orders.status = 'completed'
""")
# Get results
rows = result.fetchall()
Supported SQL Features
| Feature | Supported | Notes |
|---|---|---|
| SELECT | ✅ Yes | Metrics and dimensions |
| **SELECT *** | ✅ Yes | Expands to all model fields |
| WHERE | ✅ Yes | Standard SQL conditions |
| ORDER BY | ✅ Yes | Sort by any field |
| LIMIT / OFFSET | ✅ Yes | Pagination support |
| Parameters | ❌ No | SQL interface does not interpolate templates |
| Cross-Model Queries | ✅ Yes | Auto-joins via relationships |
| Subqueries | ✅ Yes | Query semantic layer in subqueries |
| CTEs / WITH | ✅ Yes | Use CTEs with semantic queries |
| JOIN | ❌ No | Joins are automatic |
| GROUP BY | ⚠️ Ignored | Grouping is automatic based on selected dimensions |
| Aggregate Functions | ❌ No | Use pre-defined metrics |
| HAVING | ❌ No | Use WHERE on metrics instead |
| Window Functions | ❌ No | Use cumulative metrics |
| DISTINCT | ❌ No | Dimensions are auto-distinct |
SELECT
Select metrics and dimensions:
layer.sql("SELECT orders.revenue, orders.order_count, orders.status FROM orders")
Table prefixes are currently required:
from _setup import layer
layer.sql("SELECT orders.revenue FROM orders")
FROM Clause Uses Model Names
The FROM clause references semantic model names, not underlying table names:
models:
- name: orders
table: raw_orders_staging
from _setup import layer
layer.sql("SELECT orders.revenue FROM orders") # ✅ Use model name
WHERE
Filter with standard SQL conditions:
layer.sql("""
SELECT orders.revenue
FROM orders
WHERE orders.status = 'completed'
AND orders.order_date >= '2024-01-01'
AND orders.amount > 100
""")
ORDER BY and LIMIT
from _setup import layer
layer.sql("""
SELECT orders.revenue, orders.status
FROM orders
ORDER BY orders.revenue DESC
LIMIT 10
""")
Cross-Model Queries
Reference multiple models - joins happen automatically:
layer.sql("""
SELECT
orders.revenue,
customers.region,
products.category
FROM orders
""")
Parameters
The SQL interface does not interpolate {{ ... }}. Parameter interpolation exists only for filter strings in programmatic queries and is intentionally constrained. See Parameters & Templating.
Subqueries
Query semantic layer in subqueries:
from _setup import layer
layer.sql("""
SELECT *
FROM (
SELECT orders.revenue, orders.status FROM orders
) AS orders_agg
WHERE revenue > 1000
""")
Join semantic query results with regular tables:
layer.sql("""
SELECT
orders_agg.revenue,
r.continent
FROM (
SELECT orders.revenue, customers.region
FROM orders
) AS orders_agg
JOIN regions r ON orders_agg.region = r.region
""")
CTEs
Use CTEs with semantic layer queries:
from _setup import layer
layer.sql("""
WITH orders_by_region AS (
SELECT orders.revenue, orders.status, customers.region
FROM orders
)
SELECT * FROM orders_by_region
WHERE revenue > 500
ORDER BY revenue DESC
""")
Mix semantic and regular CTEs:
from _setup import layer
layer.sql("""
WITH
orders_agg AS (
SELECT orders.revenue, orders.status FROM orders
),
status_labels AS (
SELECT 'completed' as code, 'Complete' as label
UNION ALL SELECT 'pending', 'Pending'
)
SELECT o.revenue, s.label
FROM orders_agg o
JOIN status_labels s ON o.status = s.code
""")
Python API
Basic Query
from sidemantic import SemanticLayer
layer = SemanticLayer.from_yaml("models.yml")
# Query metrics and dimensions
result = layer.query(
metrics=["orders.revenue", "orders.order_count"],
dimensions=["orders.status"]
)
# Get results as DataFrame
df = result.fetchdf()
print(df)
Query Parameters
metrics (list[str] | None)
- List of metric references to aggregate
- Format:
"model.metric"or"metric"(for graph-level metrics) - Example:
["orders.revenue", "total_revenue"]
dimensions (list[str] | None)
- List of dimensions to group by
- Format:
"model.dimension" - Example:
["orders.status", "customers.region"]
filters (list[str] | None)
- SQL filter expressions
- Example:
["orders.status = 'completed'", "orders.amount > 100"]
segments (list[str] | None)
- Named segment references
- Example:
["orders.high_value", "customers.active"]
order_by (list[str] | None)
- Fields to order by with optional ASC/DESC
- Example:
["orders.revenue DESC", "orders.status"]
limit (int | None)
- Maximum rows to return
- Example:
100
ungrouped (bool)
- Return raw rows without aggregation (no GROUP BY)
- Default:
False
parameters (dict[str, any] | None)
- Deprecated. Interpolated only into filter strings in
query/compile - Requires registered
Parameterobjects (Python only) - Example:
{"start_date": "2024-01-01"}
use_preaggregations (bool | None)
- Override layer-level pre-aggregation setting
- Default:
None(uses layer setting)
Examples
Simple metric query:
from _setup import layer
result = layer.query(metrics=["orders.revenue"])
print(result.fetchone()) # (12345.67,)
With dimensions:
from _setup import layer
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"]
)
rows = result.fetchall()
# status revenue
# 0 completed 10000.00
# 1 pending 2345.67
Cross-model query:
from _setup import layer
# Automatically joins orders -> customers
result = layer.query(
metrics=["orders.revenue"],
dimensions=["customers.region"]
)
With filters:
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"],
filters=["orders.created_at >= '2024-01-01'"]
)
With segments:
result = layer.query(
metrics=["orders.revenue"],
segments=["orders.completed"]
)
Sorting and pagination:
from _setup import layer
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"],
order_by=["orders.revenue DESC"],
limit=10
)
Ungrouped (raw rows):
# Get individual order details without aggregation
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.order_id", "orders.customer_id"],
ungrouped=True,
limit=100
)
With parameters (filters only):
from sidemantic import Parameter
layer.graph.add_parameter(Parameter(name="start_date", type="date", default_value="2024-01-01"))
result = layer.query(
metrics=["orders.revenue"],
filters=["orders.order_date >= {{ start_date }}"],
parameters={"start_date": "2024-06-01"}
)
Result Formats
fetchone() - Single row as tuple:
from _setup import layer
result = layer.query(metrics=["orders.revenue"])
row = result.fetchone()
print(row) # (12345.67,)
fetchall() - All rows as list of tuples:
from _setup import layer
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"]
)
rows = result.fetchall()
# [('completed', 10000.0), ('pending', 2345.67)]
fetchdf() - Pandas DataFrame:
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.status"]
)
df = result.fetchdf()
description - Column metadata:
from _setup import layer
result = layer.query(metrics=["orders.revenue"])
columns = [desc[0] for desc in result.description]
print(columns) # ['revenue']
Compile Without Execution
Generate SQL without executing:
from _setup import layer
sql = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.status"],
filters=["orders.status = 'completed'"]
)
print(sql)
# WITH orders_cte AS (
# SELECT status, SUM(amount) as revenue
# FROM orders
# WHERE status = 'completed'
# GROUP BY status
# )
# SELECT * FROM orders_cte
Additional compile() parameters:
sql = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.status"],
filters=["orders.status = 'completed'"],
order_by=["orders.revenue DESC"],
limit=10,
offset=5, # Skip first 5 rows
dialect="postgres" # Override SQL dialect
)
CLI Query
Query from command line:
# Query to stdout
sidemantic query "SELECT orders.revenue FROM orders" --models ./models
# Save to file
sidemantic query "SELECT orders.revenue, orders.status FROM orders" \
--models ./models \
--output results.csv
# With connection override
sidemantic query "SELECT orders.revenue FROM orders WHERE orders.status = 'completed'" \
--models ./models \
--connection "postgres://localhost:5432/analytics"
See CLI for complete CLI reference.
Best Practices
Define Metrics in YAML
Define metrics once, query anywhere:
metrics:
- name: revenue
agg: sum
sql: amount
- name: high_value_revenue
agg: sum
sql: "CASE WHEN amount > 1000 THEN amount ELSE 0 END"
from _setup import layer
# Simple queries
layer.query(metrics=["orders.revenue"])
layer.sql("SELECT orders.revenue FROM orders")
Use Python API for Dynamic Queries
Build queries programmatically:
metrics = ["orders.revenue"]
dimensions = []
filters = []
# Add dimensions based on condition
if group_by_status:
dimensions.append("orders.status")
# Add filters based on user input
if start_date:
filters.append(f"orders.created_at >= '{start_date}'")
result = layer.query(
metrics=metrics,
dimensions=dimensions,
filters=filters
)
Use SQL for Static Queries
SQL is clearer for fixed queries:
# Dashboard widget
revenue_by_status = layer.sql("""
SELECT orders.revenue, orders.status
FROM orders
WHERE orders.created_at >= '2024-01-01'
ORDER BY orders.revenue DESC
""")
Compile for Inspection
Inspect generated SQL:
from _setup import layer
sql = layer.compile(
metrics=["orders.revenue"],
dimensions=["customers.region"]
)
print(sql) # See what query will be executed
Process Results Efficiently
Small results (< 1000 rows):
from _setup import layer
result = layer.query(metrics=["orders.revenue"])
rows = result.fetchall()
Large results:
result = layer.query(
metrics=["orders.revenue"],
dimensions=["orders.order_id"],
ungrouped=True
)
# Iterate rows
for row in result.fetchall():
process(row)
Streaming:
# Use Arrow for large datasets
result = layer.adapter.execute(sql)
batches = result.fetch_arrow_reader()
for batch in batches:
process(batch)
Comparison: Python API vs SQL
Python API:
- ✅ Type-safe query building
- ✅ Dynamic query construction
- ✅ Better for programmatic use
- ✅ Programmatic filters and segments
SQL Interface:
- ✅ Familiar syntax
- ✅ Readable for static queries
- ✅ Copy-paste friendly
Use Python API when:
- Building queries programmatically
- Constructing dynamic filters
- Need type safety
- Building applications/APIs
Use SQL Interface when:
- Writing dashboard queries
- Ad-hoc exploration
- Static report queries
- Prefer SQL syntax
Related Documentation
- Configuration - YAML configuration reference
- Models - Model definition
- Metrics - Metric types
- CLI - CLI query command
- Python API - Complete API reference