I haven't been able to find an understandable explanation of how to actually use Python's itertools.groupby()
function. What I'm trying to do is this:
- Take a list, in this case the children of an objectified `lxml` element
- Divide it into groups based on some criteria
- Then later iterate over each of these groups separately.
I've reviewed the documentation, and the examples, but I've had trouble trying to apply them beyond a simple list of numbers.
So, how do I use `itertools.groupby()`? Is there another technique I should be using? Pointers to good "prerequisite" reading would also be appreciated.
IMPORTANT NOTE: You have to sort your data first.
The part I didn't get is that in the example construction, `k` is the current grouping key, and `g` is an iterator that you can use to iterate over the group defined by that grouping key. In other words, the `groupby` iterator itself returns iterators.
Here's an example of that, using clearer variable names:
This will give you the output:
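A minimal sketch of such an example, with its output shown in the trailing comments. The contents of `things` are assumed for illustration (any list of `(group, item)` tuples that is already ordered by group would do), and `lines` exists only so the result can be inspected.

```python
from itertools import groupby

# Assumed sample data: (group, item) tuples, already ordered by group.
things = [
    ("animal", "bear"),
    ("animal", "duck"),
    ("plant", "cactus"),
    ("vehicle", "speed boat"),
    ("vehicle", "school bus"),
]

lines = []
for key, group in groupby(things, lambda x: x[0]):
    # `group` is an iterator over the tuples that share this key.
    for thing in group:
        lines.append(f"A {thing[1]} is a {key}.")
    lines.append("")  # blank line between groups

print("\n".join(lines))
# A bear is a animal.
# A duck is a animal.
#
# A cactus is a plant.
#
# A speed boat is a vehicle.
# A school bus is a vehicle.
```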
In this example, `things` is a list of tuples where the first item in each tuple is the group the second item belongs to.
The `groupby()` function takes two arguments: (1) the data to group and (2) the function to group it with.
Here, `lambda x: x[0]` tells `groupby()` to use the first item in each tuple as the grouping key.
In the above `for` statement, `groupby` returns three (key, group iterator) pairs, once for each unique key. You can use the returned iterator to iterate over each individual item in that group.
Here's a slightly different example with the same data, using a list comprehension:
This will give you the output:
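A sketch of how the list-comprehension variant could look, with its output in the trailing comments. The `things` data and the name `summaries` are assumptions made for illustration.

```python
from itertools import groupby

# Assumed sample data: (group, item) tuples, already ordered by group.
things = [
    ("animal", "bear"),
    ("animal", "duck"),
    ("plant", "cactus"),
    ("vehicle", "speed boat"),
    ("vehicle", "school bus"),
]

summaries = []
for key, group in groupby(things, lambda x: x[0]):
    # The list comprehension consumes the group iterator in one pass.
    list_of_things = " and ".join([thing[1] for thing in group])
    summaries.append(f"{key}s: {list_of_things}.")

print("\n".join(summaries))
# animals: bear and duck.
# plants: cactus.
# vehicles: speed boat and school bus.
```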
Can you show us your code?
The example on the Python docs is quite straightforward:
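For reference, the pattern shown in the `itertools.groupby` documentation looks roughly like the loop below. The `data` and `keyfunc` values are placeholders chosen for illustration (words grouped by length) and are not part of the docs.

```python
from itertools import groupby

# Placeholder inputs for illustration.
data = ["kiwi", "fig", "plum", "pear", "apple", "date"]
keyfunc = len

# Sort by the same key first so that equal keys end up adjacent.
data = sorted(data, key=keyfunc)

groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
    groups.append(list(g))  # store the group iterator as a list
    uniquekeys.append(k)

print(uniquekeys)  # [3, 4, 5]
print(groups)      # [['fig'], ['kiwi', 'plum', 'pear', 'date'], ['apple']]
```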
So in your case, `data` is a list of nodes, `keyfunc` is where the logic of your criteria function goes, and then `groupby()` groups the data.
You must be careful to sort the data by the criteria before you call `groupby`, or it won't work. The `groupby` method actually just iterates through a list, and whenever the key changes it creates a new group.
`itertools.groupby` is a tool for grouping items.
From the docs, we glean further what it might do:
`groupby` objects yield key-group pairs where the group is a generator.
Features
Comparisons
Uses
Note: Several of the latter examples derive from Víctor Terrón's PyCon talk (in Spanish), "Kung Fu at Dawn with Itertools". See also the `groupby` source code written in C.
Response
A neat trick with groupby is to do run-length encoding in one line:
will give you a list of 2-tuples where the first element is the char and the 2nd is the number of repetitions.
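A sketch of that one-liner, assuming a sample input string named `some_string` (the string itself is made up for illustration). With no key function, `groupby` groups identical adjacent items.

```python
from itertools import groupby

# Assumed input string for illustration.
some_string = "aaabccdddd"

# Each run of identical adjacent characters becomes (char, run_length).
encoded = [(char, len(list(run))) for char, run in groupby(some_string)]

print(encoded)  # [('a', 3), ('b', 1), ('c', 2), ('d', 4)]
```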
Edit: Note that this is what separates `itertools.groupby` from the SQL `GROUP BY` semantics: itertools doesn't (and in general can't) sort the iterator in advance, so groups with the same "key" aren't merged.
Another example:
results in
Note that igroup is an iterator (a sub-iterator as the documentation calls it).
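A small sketch of the point about `igroup`, assuming the numbers 0 to 11 grouped by integer division by 5 (both choices are illustrative). Each sub-iterator is turned into a list while it is still the current group.

```python
import itertools

# Assumed example: group 0..11 by integer division by 5.
result = []
for key, igroup in itertools.groupby(range(12), lambda x: x // 5):
    # igroup is consumed here, before the loop advances to the next key.
    result.append((key, list(igroup)))

print(result)  # [(0, [0, 1, 2, 3, 4]), (1, [5, 6, 7, 8, 9]), (2, [10, 11])]
```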
This is useful for chunking a generator:
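One way the chunking idea could be sketched. The helper name `chunker` and its parameters are hypothetical. Each chunk is materialized as a list here so that it stays valid after the `groupby` object advances.

```python
import itertools

def chunker(items, chunk_size):
    """Yield successive lists of at most chunk_size items (hypothetical helper)."""
    # Pair each item with its index; index // chunk_size is the chunk number.
    numbered = enumerate(items)
    for _key, group in itertools.groupby(numbered, lambda pair: pair[0] // chunk_size):
        yield [item for _index, item in group]

# Works on a generator, which has no len() and cannot be sliced.
chunks = list(chunker((n * n for n in range(7)), 3))
print(chunks)  # [[0, 1, 4], [9, 16, 25], [36]]
```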
Another example of groupby - when the keys are not sorted. In the following example, items in xx are grouped by values in yy. In this case, one set of zeros is output first, followed by a set of ones, followed again by a set of zeros.
Produces:
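A sketch consistent with that description, assuming `xx` holds the numbers 0 to 9 and `yy` holds one key per item (the exact values are illustrative). The key 0 appears twice in the output because its two runs are not adjacent.

```python
import itertools

xx = range(10)
yy = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]  # assumed key for each item in xx

grouped = [(key, list(group)) for key, group in itertools.groupby(xx, lambda x: yy[x])]
for key, members in grouped:
    print(key, members)
# 0 [0, 1, 2]
# 1 [3, 4, 5]
# 0 [6, 7, 8, 9]
```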
WARNING:
The syntax `list(groupby(...))` won't work the way that you intend. Each group iterator shares its underlying source with the `groupby` object, so advancing to the next group invalidates the previous one. As a result, using
will produce:
Instead of `list(groupby(...))`, try `[(k, list(g)) for k, g in groupby(...)]`, or, if you use that syntax often, wrap it in a small helper function and get access to the groupby functionality while avoiding those pesky (for small data) iterators altogether.
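A sketch of both the pitfall and a possible helper. The name `groupbylist` is hypothetical, and the input `range(10)` is only an illustration.

```python
from itertools import groupby

def groupbylist(*args, **kwargs):
    """Hypothetical helper: like groupby, but every group is materialized as a list."""
    return [(k, list(g)) for k, g in groupby(*args, **kwargs)]

# Pitfall: list() advances groupby to the end before any group is read.
broken = [list(g) for _k, g in list(groupby(range(10)))]

# Here each group is read while it is still the current one.
working = groupbylist(range(10))

print(broken)
print(working)
```

In `broken` the groups come back empty (the contents of the very last group can differ between Python versions), while `working` holds one `(key, [items])` pair per group.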
I would like to give another example where `groupby` without sorting does not work as expected. It is adapted from the example by James Sulak:
The output is:
There are two groups with the key vehicle, whereas one might expect only one.
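A sketch of that situation with assumed data. The `things` list below is made up and deliberately not ordered by group. Sorting by the same key function first merges the two vehicle runs.

```python
from itertools import groupby

# Assumed sample data, deliberately NOT ordered by group.
things = [
    ("vehicle", "speed boat"),
    ("animal", "bear"),
    ("animal", "duck"),
    ("vehicle", "school bus"),
]

def first(pair):
    """Key function: the group name is the first item of each tuple."""
    return pair[0]

unsorted_keys = [key for key, _group in groupby(things, key=first)]
sorted_keys = [key for key, _group in groupby(sorted(things, key=first), key=first)]

print(unsorted_keys)  # ['vehicle', 'animal', 'vehicle']
print(sorted_keys)    # ['animal', 'vehicle']
```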
@CaptSolo, I tried your example, but it didn't work.
Output:
As you can see, there are two o's and two e's, but they got into separate groups. That's when I realized you need to sort the list passed to the groupby function. So, the correct usage would be:
Output:
Just remember: if the list is not sorted, `groupby` will not put equal items that are not adjacent into the same group!
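A sketch of the difference, assuming an input string with two o's and two e's that are not adjacent (the string `"one more"` is made up for illustration).

```python
from itertools import groupby

text = "one more"  # assumed input: two o's and two e's, none adjacent

# Without sorting, every character forms its own run.
unsorted_counts = [(c, len(list(run))) for c, run in groupby(text)]

# After sorting, equal characters are adjacent and get merged.
sorted_counts = [(c, len(list(run))) for c, run in groupby(sorted(text))]

print(unsorted_counts)
# [('o', 1), ('n', 1), ('e', 1), (' ', 1), ('m', 1), ('o', 1), ('r', 1), ('e', 1)]
print(sorted_counts)
# [(' ', 1), ('e', 2), ('m', 1), ('n', 1), ('o', 2), ('r', 1)]
```

Note that sorting turns the result into a character count rather than a run-length encoding, since the original order of the runs is lost.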
You can use groupby to group things to iterate over. You give groupby an iterable and an optional key function/callable by which to check the items as they come out of the iterable, and it returns an iterator that yields two-tuples: the result of the key callable, and an iterable over the actual items that share that key. From the help:
Here's an example of groupby using a coroutine to group by a count. It uses a key callable (in this case, `coroutine.send`) to just spit out the count for however many iterations and a grouped sub-iterator of elements:
prints
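A sketch of how such a coroutine-based grouper might be written. The names `grouper`, `counter`, and `groups` are assumptions. The bound `send` method of the primed generator serves as the key callable: it ignores the item it receives and returns the current count.

```python
import itertools

def grouper(iterable, n):
    """Yield (count, items) pairs, with n items per group (hypothetical helper)."""
    def counter(n):
        yield  # priming point: the first next() stops here
        for i in itertools.count():
            for _ in range(n):
                yield i  # the same count is returned n times in a row

    groups = counter(n)
    next(groups)  # prime the generator so that send() can be called

    # groups.send is called once per item and returns that item's group number.
    for count, objs in itertools.groupby(iterable, groups.send):
        yield count, list(objs)

result = list(grouper(range(10), 3))
print(result)  # [(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8]), (3, [9])]
```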
One useful example that I came across may be helpful:
Sample input: 14445221
Sample output: (1,1) (3,4) (1,5) (2,2) (1,1)
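A sketch that reproduces the sample above. The function name `run_lengths` is hypothetical, and the input is passed as a string rather than read from standard input.

```python
from itertools import groupby

def run_lengths(digits):
    """Return (count, digit) pairs for each run of identical adjacent digits."""
    return [(len(list(group)), int(key)) for key, group in groupby(digits)]

pairs = run_lengths("14445221")
print(" ".join(f"({count},{digit})" for count, digit in pairs))
# (1,1) (3,4) (1,5) (2,2) (1,1)
```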
You can write your own groupby function:
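One possible sketch, assuming the input is an iterable of `(key, value)` pairs. The name `group_pairs` is hypothetical. Unlike `itertools.groupby`, this dictionary-based version merges equal keys even when they are not adjacent, at the cost of holding everything in memory.

```python
def group_pairs(data):
    """Group (key, value) pairs into a dict mapping key -> list of values."""
    grouped = {}
    for key, value in data:
        # setdefault creates the list the first time a key is seen.
        grouped.setdefault(key, []).append(value)
    return grouped

pairs = [("a", 1), ("b", 2), ("a", 3)]  # assumed sample data, not sorted
print(group_pairs(pairs))  # {'a': [1, 3], 'b': [2]}
```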