Skip to content

GROUP BY and Aggregation

The GROUP BY clause is used to group documents based on specified fields and perform aggregation operations. This page explains how to use GROUP BY and aggregation functions in QueryLeaf.

Feature Support

Feature Support Notes
Basic GROUP BY ✅ Full Group by single or multiple fields
COUNT(*) ✅ Full Count documents in each group
COUNT(field) ✅ Full Count non-null values in each group
SUM ✅ Full Sum of values in each group
AVG ✅ Full Average of values in each group
MIN ✅ Full Minimum value in each group
MAX ✅ Full Maximum value in each group
Nested field grouping ✅ Full Group by embedded document fields
ORDER BY with GROUP BY ✅ Full Sort grouped results
HAVING clause ⚠️ Limited Limited support for filtering groups
Complex expressions ⚠️ Limited Limited support in grouping expressions
Window functions ❌ None No support for window functions

GROUP BY Syntax

SELECT 
  columns,
  AGG_FUNCTION(column) AS alias
FROM collection
[WHERE conditions]
GROUP BY columns
[ORDER BY columns]

Supported Aggregation Functions

QueryLeaf supports the following aggregation functions:

Function Description
COUNT(*) Count the number of documents in each group
COUNT(field) Count non-null values of a field in each group
SUM(field) Sum of field values in each group
AVG(field) Average of field values in each group
MIN(field) Minimum field value in each group
MAX(field) Maximum field value in each group

Basic GROUP BY Examples

Simple GROUP BY

-- Group by single field
SELECT category, COUNT(*) as count
FROM products
GROUP BY category

-- Group by single field with conditions
SELECT status, COUNT(*) as count
FROM orders
WHERE createdAt > '2023-01-01'
GROUP BY status

Multiple Aggregation Functions

-- Multiple aggregation functions
SELECT 
  category,
  COUNT(*) as count,
  AVG(price) as avg_price,
  MIN(price) as min_price,
  MAX(price) as max_price,
  SUM(price) as total_value
FROM products
GROUP BY category

Sorting Grouped Results

-- GROUP BY with ORDER BY
SELECT category, COUNT(*) as count
FROM products
GROUP BY category
ORDER BY count DESC

-- Multiple sort fields
SELECT status, COUNT(*) as count, SUM(total) as revenue
FROM orders
GROUP BY status
ORDER BY revenue DESC, count ASC

GROUP BY with Multiple Fields

-- Group by multiple fields
SELECT 
  category, 
  manufacturer, 
  COUNT(*) as count,
  AVG(price) as avg_price
FROM products
GROUP BY category, manufacturer

GROUP BY with Nested Fields

-- Group by nested field
SELECT 
  address.city, 
  COUNT(*) as user_count
FROM users
GROUP BY address.city

Translation to MongoDB

QueryLeaf translates GROUP BY operations to MongoDB's aggregation framework:

SQL Feature MongoDB Equivalent
GROUP BY $group stage
COUNT(*) $sum: 1
SUM(field) $sum: '$field'
AVG(field) $avg: '$field'
MIN(field) $min: '$field'
MAX(field) $max: '$field'
WHERE with GROUP BY $match stage before $group
ORDER BY with GROUP BY $sort stage after $group

Example Translation

SQL:

SELECT 
  category, 
  COUNT(*) as count, 
  AVG(price) as avg_price
FROM products
WHERE stock > 0
GROUP BY category
ORDER BY count DESC

MongoDB:

db.collection('products').aggregate([
  { $match: { stock: { $gt: 0 } } },
  { 
    $group: {
      _id: '$category',
      category: { $first: '$category' },
      count: { $sum: 1 },
      avg_price: { $avg: '$price' }
    }
  },
  { $sort: { count: -1 } }
])

Performance Considerations

  • GROUP BY operations use MongoDB's aggregation framework, which can be resource-intensive
  • Create indexes on fields used in GROUP BY and WHERE clauses
  • Be mindful of memory usage when grouping large collections
  • Consider using the allowDiskUse option for large datasets (via MongoDB driver)
  • Aggregation pipelines with multiple stages can impact performance

Limitations

  • Limited support for HAVING clause
  • No support for window functions
  • Limited support for complex expressions in GROUP BY
  • Non-standard handling of NULL values in grouping

Best Practices

  • Use WHERE conditions to limit the documents before grouping
  • Create indexes on frequently grouped fields
  • Keep aggregation pipelines simple when possible
  • Consider using MongoDB's native aggregation for very complex operations
  • For time-based grouping, consider pre-processing dates