I know I need to have (although I don't know why) a GROUP BY clause on the end of a SQL query that uses any aggregate functions like count, sum, avg, etc.:
SELECT count(userID), userName
FROM users
GROUP BY userName
When else would GROUP BY be useful, and what are the performance ramifications?
To retrieve the number of widgets from each widget category that has more than 5 widgets, you could do this:
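The query from the original answer is not preserved here. A minimal sketch of what it might look like, assuming a hypothetical `widgets` table with a `category` column (both names are made up for illustration):

```sql
-- Hypothetical schema and sample data, for illustration only.
CREATE TABLE widgets (
    widget_id INTEGER PRIMARY KEY,
    category  VARCHAR(50) NOT NULL
);

INSERT INTO widgets (widget_id, category) VALUES
    (1, 'gears'), (2, 'gears'), (3, 'gears'),
    (4, 'gears'), (5, 'gears'), (6, 'gears'),
    (7, 'springs'), (8, 'springs');

-- Count widgets per category, then keep only categories with more than 5.
-- WHERE filters rows before grouping; HAVING filters the groups afterwards.
SELECT category, COUNT(*) AS widget_count
FROM widgets
GROUP BY category
HAVING COUNT(*) > 5;
```

With this sample data only the 'gears' category, which has 6 widgets, passes the HAVING filter.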
The HAVING clause is something people often forget about; instead they retrieve all their data to the client and iterate through it there.
GROUP BY is similar to DISTINCT in that it groups multiple records into one.
This example, borrowed from http://www.devguru.com/technologies/t-sql/7080.asp, lists distinct products in the Products table.
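The borrowed query itself is not reproduced here. As a rough sketch of the idea, assuming the Products table has a `ProductName` column (the column names and sample rows are assumptions, not taken from the linked page):

```sql
-- Hypothetical schema and sample data, for illustration only.
CREATE TABLE Products (
    ProductID   INTEGER PRIMARY KEY,
    ProductName VARCHAR(50) NOT NULL
);

INSERT INTO Products (ProductID, ProductName) VALUES
    (1, 'Chai'), (2, 'Chai'), (3, 'Tofu');

-- Each of these returns every product name once ('Chai' and 'Tofu').
SELECT DISTINCT ProductName
FROM Products;

SELECT ProductName
FROM Products
GROUP BY ProductName;
```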
The advantage of GROUP BY over DISTINCT is that it gives you granular control when used with a HAVING clause, which lets you filter on aggregate values such as a per-group count.
GROUP BY generally forces the entire set to be populated before any records are returned, since the engine has to sort (or hash) the rows to form the groups.
For that reason (and others), be wary of using a GROUP BY in a subquery, especially over large tables.
Counting the number of times tags are used might be a good example:
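The original query is missing here; one way it could look, assuming a hypothetical `post_tags` table with one row per tag usage (table and column names are illustrative):

```sql
-- Hypothetical schema and sample data, for illustration only.
CREATE TABLE post_tags (
    post_id INTEGER     NOT NULL,
    tag     VARCHAR(50) NOT NULL
);

INSERT INTO post_tags (post_id, tag) VALUES
    (1, 'sql'), (1, 'performance'),
    (2, 'sql'), (3, 'sql'), (3, 'group-by');

-- One output row per tag, with the number of times it was used,
-- most frequently used tags first.
SELECT tag, COUNT(*) AS times_used
FROM post_tags
GROUP BY tag
ORDER BY times_used DESC;
```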
If you simply want the distinct tag values, I would prefer to use the DISTINCT keyword.
GROUP BY also helps when you want to generate a report that averages or sums a bunch of data. You can GROUP BY the department ID and then SUM all the sales revenue, or AVG the count of sales for each month.
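As a sketch of such a report, assuming a hypothetical `sales` table with `department_id` and `revenue` columns (names and figures invented for illustration):

```sql
-- Hypothetical schema and sample data, for illustration only.
CREATE TABLE sales (
    sale_id       INTEGER PRIMARY KEY,
    department_id INTEGER        NOT NULL,
    revenue       DECIMAL(10, 2) NOT NULL
);

INSERT INTO sales (sale_id, department_id, revenue) VALUES
    (1, 10, 100.00), (2, 10, 200.00), (3, 20, 50.00);

-- One row per department: total revenue, average sale, and number of sales.
SELECT department_id,
       SUM(revenue) AS total_revenue,
       AVG(revenue) AS average_sale,
       COUNT(*)     AS sale_count
FROM sales
GROUP BY department_id;
```

Every column in the SELECT list is either named in GROUP BY or wrapped in an aggregate, which is the rule the question's first paragraph is running into.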