GROUP BY and Aggregation
The GROUP BY clause is used to group documents based on specified fields and perform aggregation operations. This page explains how to use GROUP BY and aggregation functions in QueryLeaf.
Feature Support
Feature | Support | Notes |
---|---|---|
Basic GROUP BY | ✅ Full | Group by single or multiple fields |
COUNT(*) | ✅ Full | Count documents in each group |
COUNT(field) | ✅ Full | Count non-null values in each group |
SUM | ✅ Full | Sum of values in each group |
AVG | ✅ Full | Average of values in each group |
MIN | ✅ Full | Minimum value in each group |
MAX | ✅ Full | Maximum value in each group |
Nested field grouping | ✅ Full | Group by embedded document fields |
ORDER BY with GROUP BY | ✅ Full | Sort grouped results |
HAVING clause | ⚠️ Limited | Limited support for filtering groups |
Complex expressions | ⚠️ Limited | Limited support in grouping expressions |
Window functions | ❌ None | No support for window functions |
GROUP BY Syntax
SELECT
columns,
AGG_FUNCTION(column) AS alias
FROM collection
[WHERE conditions]
GROUP BY columns
[ORDER BY columns]
Supported Aggregation Functions
QueryLeaf supports the following aggregation functions:
Function | Description |
---|---|
COUNT(*) | Count the number of documents in each group |
COUNT(field) | Count non-null values of a field in each group |
SUM(field) | Sum of field values in each group |
AVG(field) | Average of field values in each group |
MIN(field) | Minimum field value in each group |
MAX(field) | Maximum field value in each group |
Basic GROUP BY Examples
Simple GROUP BY
-- Group by single field
SELECT category, COUNT(*) as count
FROM products
GROUP BY category
-- Group by single field with conditions
SELECT status, COUNT(*) as count
FROM orders
WHERE createdAt > '2023-01-01'
GROUP BY status
Multiple Aggregation Functions
-- Multiple aggregation functions
SELECT
category,
COUNT(*) as count,
AVG(price) as avg_price,
MIN(price) as min_price,
MAX(price) as max_price,
SUM(price) as total_value
FROM products
GROUP BY category
Sorting Grouped Results
-- GROUP BY with ORDER BY
SELECT category, COUNT(*) as count
FROM products
GROUP BY category
ORDER BY count DESC
-- Multiple sort fields
SELECT status, COUNT(*) as count, SUM(total) as revenue
FROM orders
GROUP BY status
ORDER BY revenue DESC, count ASC
GROUP BY with Multiple Fields
-- Group by multiple fields
SELECT
category,
manufacturer,
COUNT(*) as count,
AVG(price) as avg_price
FROM products
GROUP BY category, manufacturer
GROUP BY with Nested Fields
-- Group by nested field
SELECT
address.city,
COUNT(*) as user_count
FROM users
GROUP BY address.city
Translation to MongoDB
QueryLeaf translates GROUP BY operations to MongoDB's aggregation framework:
SQL Feature | MongoDB Equivalent |
---|---|
GROUP BY | $group stage |
COUNT(*) | $sum: 1 |
SUM(field) | $sum: '$field' |
AVG(field) | $avg: '$field' |
MIN(field) | $min: '$field' |
MAX(field) | $max: '$field' |
WHERE with GROUP BY | $match stage before $group |
ORDER BY with GROUP BY | $sort stage after $group |
Example Translation
SQL:
SELECT
category,
COUNT(*) as count,
AVG(price) as avg_price
FROM products
WHERE stock > 0
GROUP BY category
ORDER BY count DESC
MongoDB:
db.collection('products').aggregate([
{ $match: { stock: { $gt: 0 } } },
{
$group: {
_id: '$category',
category: { $first: '$category' },
count: { $sum: 1 },
avg_price: { $avg: '$price' }
}
},
{ $sort: { count: -1 } }
])
Performance Considerations
- GROUP BY operations use MongoDB's aggregation framework, which can be resource-intensive
- Create indexes on fields used in GROUP BY and WHERE clauses
- Be mindful of memory usage when grouping large collections
- Consider using the allowDiskUse option for large datasets (via MongoDB driver)
- Aggregation pipelines with multiple stages can impact performance
Limitations
- Limited support for HAVING clause
- No support for window functions
- Limited support for complex expressions in GROUP BY
- Non-standard handling of NULL values in grouping
Best Practices
- Use WHERE conditions to limit the documents before grouping
- Create indexes on frequently grouped fields
- Keep aggregation pipelines simple when possible
- Consider using MongoDB's native aggregation for very complex operations
- For time-based grouping, consider pre-processing dates