10.5. Group Clauses

Grouping is also supported, like in SQL.

For each incoming tuple, the expression in the group clause is evaluated to an atomic. The value of this atomic is called a grouping key. The incoming tuples are then grouped according to the grouping key -- one group for each value of the grouping key.

For each group, a tuple is output, in which:

Each grouping variable (appearing in the group clause) is bound to the group's key corresponding to this variable.
Each other (non-grouping) variable is bound to the sequence obtained by concatenating all original values of the variable within the group. Aggregations can then be done on these variables in further clauses.

Here is an example:

for $i in (1, 2),
    $j in (3, 4)
group by $j
...

The first for clause produces four tuples (this is again an ad-hoc syntax for illustrative purposes):

"$i" : 1, "$j" : 3
"$i" : 1, "$j" : 4
"$i" : 2, "$j" : 3
"$i" : 2, "$j" : 4

Then the group clause groups according the value of $j. There are two distinct values (3 and 4), so that this results in two groups.

Group 1 (key $j : 3)
$i : 1, $j : 3
$i : 2, $j : 3

Group 2 (key $j : 4)
$i : 1, $j : 4
$i : 2, $j : 4

In each output tuple, $j is the grouping variable and is bound to the key of the group. $i is non-grouping and is bound to the sequence of all values in the group.

$i : (1, 2), $j : 3
$i : (1, 2), $j : 4

The following query is equivalent to "SELECT question_id FROM answers GROUP BY question_id".

Example 10.13. A group by clause.

for $answer in collection("answers")
group by $question := $answer.question_id
return { "question" : $question }

Results:

{
  "question" : 5453872
}
{
  "question" : 6183352
}
{
  "question" : 4720508
}
{
  "question" : 4419499
}
{
  "question" : 37823
}

The following query is equivalent to "SELECT question_id, COUNT(*) FROM answers GROUP BY question_id".

Example 10.14. A group by clause using count aggregation.

for $answer in collection("answers")
group by $question := $answer.question_id
return {
  "question" : $question,
  "count" : count($answer)
}

Results:

{
  "question" : 5453872,
  "count" : 1
}
{
  "question" : 6183352,
  "count" : 2
}
{
  "question" : 4720508,
  "count" : 1
}
{
  "question" : 4419499,
  "count" : 2
}
{
  "question" : 37823,
  "count" : 2
}

The following query is equivalent to "SELECT question_id, AVG(score) FROM answers GROUP BY question_id".

Example 10.15. A group by clause using average aggregation.

for $answer in collection("answers")
group by $question := $answer.question_id
return {
  "question" : $question,
  "average score" : avg($answer.score)
}

Results:

{
  "question" : 5453872,
  "average score" : 6
}
{
  "question" : 6183352,
  "average score" : 0.5
}
{
  "question" : 4720508,
  "average score" : 34
}
{
  "question" : 4419499,
  "average score" : 9
}
{
  "question" : 37823,
  "average score" : 5.5
}

JSONiq's group by is more flexible than SQL and is fully composable.

Example 10.16. A group by clause with a nested expression.

for $answer in collection("answers")
group by $question := $answer.question_id
return {
  "question" : $question,
  "scores" : [ $answer.score ]
}

Results:

{
  "question" : 5453872,
  "scores" : [ 6 ]
}
{
  "question" : 6183352,
  "scores" : [ 0, 1 ]
}
{
  "question" : 4720508,
  "scores" : [ 34 ]
}
{
  "question" : 4419499,
  "scores" : [ 17, 1 ]
}
{
  "question" : 37823,
  "scores" : [ 7, 4 ]
}

Unlike SQL, JSONiq does not need a having clause, because a where clause works perfectly after grouping as well.

The following query is the counterpart of "SELECT question_id, COUNT(*) FROM answers GROUP BY question_id HAVING COUNT(*) > 1"

Example 10.17. A group by clause with a post-grouping condition.

for $answer in collection("answers")
group by $question := $answer.question_id
where count($answer) gt 1
return {
  "question" : $question,
  "count" : count($answer)
}

Results:

{
  "question" : 6183352,
  "count" : 2
}
{
  "question" : 4419499,
  "count" : 2
}
{
  "question" : 37823,
  "count" : 2
}