Itertools pt.3
islice
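A function that lazily returns a slice of an iterator, much like slicing a list. It takes the iterable plus up to three positional arguments:
- The iterable (e.g. a range)
- Start index (default 0)
- Stop index, exclusive (None means "until the end")
- Step (default 1)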
```python
import itertools

# Signature: islice(iterable, start, stop, step); keyword arguments are NOT accepted
result = itertools.islice(range(1000), 1, 8, 2)
for item in result:
    print(item)
# 1
# 3
# 5
# 7
```
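The sketch below shows the other ways to call islice. The letters string is made-up sample data. It covers four cases: the one-number form, where that number is the stop index; None as a stop, meaning "to the end"; islice putting a limit on the endless itertools.count(); and the TypeError you get from keyword arguments.

```python
import itertools

letters = 'ABCDEFG'  # made-up sample data

# One number after the iterable is treated as the stop index
first_three = list(itertools.islice(letters, 3))
print(first_three)  # ['A', 'B', 'C']

# None as the stop index means "run to the end"
from_index_2 = list(itertools.islice(letters, 2, None))
print(from_index_2)  # ['C', 'D', 'E', 'F', 'G']

# count(10) yields 10, 11, 12, ... forever; islice keeps positions 0, 3, 6, 9
every_third = list(itertools.islice(itertools.count(10), 0, 10, 3))
print(every_third)  # [10, 13, 16, 19]

# Keyword arguments are rejected with a TypeError
try:
    itertools.islice(letters, start=1, stop=5)
    keywords_rejected = False
except TypeError:
    keywords_rejected = True
print(keywords_rejected)  # True
```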
Useful when:
- We have an iterator that is too large to fit in memory (or never ends), so we only want a slice of it.
- Example: a log file that is thousands of lines long, but you only want a few of those lines. Because islice reads the file lazily, the script doesn't have to load the entire file first.
Let's use `islice` on this sample log file (`log.txt`):
```text
Date: 2077-05-34
Author: Evil AI
Description: This is a sample log file
Some very long log file that machines like to spit out a lot of the time.
...
We might be in the future: our computers are more powerful but not faster, as software has become proportionally inefficient.
...
Pretend it goes on forever
```
With `islice`, we can grab only the first three lines (the header):
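```python
import itertools

with open('log.txt', 'r') as f:
    # islice stops after three lines, so the rest of the file is never iterated
    header = itertools.islice(f, 3)
    for line in header:
        print(line, end='')  # each line already ends with '\n'
# Date: 2077-05-34
# Author: Evil AI
# Description: This is a sample log file
```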
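The same approach works for a slice from the middle of a file. This sketch stands in for a real log: it writes a hypothetical 1000-line file to a temporary location, then reads only lines 10 to 12 (0-based). islice still has to read past the first ten lines to skip them, but it stops as soon as it has line 12.

```python
import itertools
import os
import tempfile

# Build a throwaway file with 1000 numbered lines (hypothetical stand-in for a log)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    for i in range(1000):
        tmp.write(f'line {i}\n')
    path = tmp.name

# Skip lines 0-9, keep lines 10, 11 and 12, then stop reading
with open(path) as f:
    middle = [line.rstrip('\n') for line in itertools.islice(f, 10, 13)]

os.remove(path)
print(middle)  # ['line 10', 'line 11', 'line 12']
```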
compress
Potentially useful for machine learning, e.g. for applying a boolean mask to pick out samples or features.
```python
letters = ['a', 'b', 'c', 'd']
selectors = [True, False, True, True]
# Let's pretend this is very long
result = itertools.compress(letters, selectors)
for item in result:
    print(item)
# a
# c
# d
```
This yields only the items of `letters` whose corresponding value in `selectors` is true.
It differs from the built-in `filter` function. `filter` calls a function on each item to decide whether to keep it. With `compress`, the true/false decisions are supplied ready-made as a second iterable.
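This sketch makes the compress/filter difference concrete by getting the same result three ways. The lambda passed to filter is only an illustrative condition. The last call shows two further details: selectors just need to be truthy or falsy, and compress stops when the shorter input runs out.

```python
import itertools

letters = ['a', 'b', 'c', 'd']
selectors = [True, False, True, True]

# compress: the keep/drop decisions are supplied ready-made
by_compress = list(itertools.compress(letters, selectors))

# filter: a function (here an illustrative lambda) decides for each item
by_filter = list(filter(lambda ch: ch != 'b', letters))

# compress behaves like this comprehension over zip()
by_zip = [item for item, keep in zip(letters, selectors) if keep]

print(by_compress, by_filter, by_zip)  # ['a', 'c', 'd'] three times

# Selectors only need to be truthy/falsy; the shorter input ends the output
short = list(itertools.compress('abcdef', [1, 0, 1]))
print(short)  # ['a', 'c']
```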
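filterfalse
The opposite of the built-in `filter`: it keeps only the items for which the function returns a false value.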
```python
import itertools

def lt_2(n):
    # True for numbers less than 2
    return n < 2

numbers = [-1, -5, -8, 7, 0, 1, 2, 3]
result = itertools.filterfalse(lt_2, numbers)
for item in result:
    print(item)
# 7
# 2
# 3
```
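This sketch runs filter and filterfalse side by side on the same list, so you can see that they split it into two complementary parts. It also shows that passing None instead of a function makes filterfalse keep the falsy items themselves.

```python
import itertools

def lt_2(n):
    return n < 2

numbers = [-1, -5, -8, 7, 0, 1, 2, 3]

kept_by_filter = list(filter(lt_2, numbers))                      # lt_2 is true
kept_by_filterfalse = list(itertools.filterfalse(lt_2, numbers))  # lt_2 is false
print(kept_by_filter)       # [-1, -5, -8, 0, 1]
print(kept_by_filterfalse)  # [7, 2, 3]

# With None as the predicate, each item's own truth value is used
falsy = list(itertools.filterfalse(None, [0, 1, '', 'a', None]))
print(falsy)  # [0, '', None]
```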
dropwhile
```python
import itertools

def lt_2(n):
    # True for numbers less than 2
    return n < 2

numbers = [-5, -10, -8, -1, 1, 1, 1, 0, 1, 2, 3, 2, 1, 0, -8, -10]
result = itertools.dropwhile(lt_2, numbers)
for item in result:
    print(item)
# 2
# 3
# 2
# 1
# 0
# -8
# -10
```
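As you can see, dropwhile drops items only while the condition is met (the number is less than 2). As soon as one item fails the condition (here, the first 2), it stops checking and returns that item and everything after it. That is why the 1, 0, -8 and -10 near the end are kept.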
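dropwhile and filterfalse both take a predicate, so running them on the same list shows the difference. This sketch redefines lt_2 to stay self-contained. filterfalse tests every item, while dropwhile stops testing after the first failure.

```python
import itertools

def lt_2(n):
    return n < 2

numbers = [-5, -10, -8, -1, 1, 1, 1, 0, 1, 2, 3, 2, 1, 0, -8, -10]

# dropwhile: drop the leading run where lt_2 is true, then keep everything
after_drop = list(itertools.dropwhile(lt_2, numbers))
# filterfalse: check every single item
never_lt_2 = list(itertools.filterfalse(lt_2, numbers))

print(after_drop)  # [2, 3, 2, 1, 0, -8, -10]
print(never_lt_2)  # [2, 3, 2]
```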
accumulate
This takes an iterable and returns running totals: each output is the sum of all the items seen so far. It uses addition by default, but you can pass a different two-argument function (for example `operator.mul`) as the second argument.
```python
numbers = [-5, -10, -8, -1, 1, 1, 9, 0, 11, 2, 13, 2, 9, -8, -10]
result = itertools.accumulate(numbers)
for item in result:
    print(item)
# -5
# -15
# -23
# -24
# -23
# -22
# -13
# -13
# -2
# 0
# 13
# 15
# 24
# 16
# 6
```
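This sketch illustrates the "other operators" part by passing a few different functions as accumulate's second argument. Any function that takes two arguments works; operator.mul and the built-in max are just examples. The initial keyword requires Python 3.8 or later.

```python
import itertools
import operator

numbers = [1, 2, 3, 4, 5]

running_sum = list(itertools.accumulate(numbers))
running_product = list(itertools.accumulate(numbers, operator.mul))
running_max = list(itertools.accumulate([3, 1, 4, 1, 5, 9, 2], max))

# initial= (Python 3.8+) seeds the running value and is emitted first
with_initial = list(itertools.accumulate(numbers, initial=100))

print(running_sum)      # [1, 3, 6, 10, 15]
print(running_product)  # [1, 2, 6, 24, 120]
print(running_max)      # [3, 3, 4, 4, 5, 9, 9]
print(with_initial)     # [100, 101, 103, 106, 110, 115]
```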
groupby
```python
people = [
    {
        'name': 'John Doe',
        'city': 'Gotham',
        'state': 'NY'
    },
    {
        'name': 'Jane Doe',
        'city': 'Kings Landing',
        'state': 'NY'
    },
    {
        'name': 'Corey Schafer',
        'city': 'Boulder',
        'state': 'CO'
    },
    {
        'name': 'Al Einstein',
        'city': 'Denver',
        'state': 'CO'
    },
    {
        'name': 'John Henry',
        'city': 'Hinton',
        'state': 'WV'
    },
    {
        'name': 'Randy Moss',
        'city': 'Rand',
        'state': 'WV'
    },
    {
        'name': 'Nicole K',
        'city': 'Asheville',
        'state': 'NC'
    },
    {
        'name': 'Jim Doe',
        'city': 'Charlotte',
        'state': 'NC'
    },
    {
        'name': 'Jane Taylor',
        'city': 'Faketown',
        'state': 'NC'
    }
]
# A list of dictionaries, each holding information about one person
# Let's group the people by their 'state' value
def get_state(person):
    return person['state']

person_group = itertools.groupby(people, get_state)
for key, group in person_group:
    print(key)
    for person in group:
        print(person)
    print()
# NY
# {'name': 'John Doe', 'city': 'Gotham', 'state': 'NY'}
# {'name': 'Jane Doe', 'city': 'Kings Landing', 'state': 'NY'}
#
# CO
# {'name': 'Corey Schafer', 'city': 'Boulder', 'state': 'CO'}
# {'name': 'Al Einstein', 'city': 'Denver', 'state': 'CO'}
#
# WV
# {'name': 'John Henry', 'city': 'Hinton', 'state': 'WV'}
# {'name': 'Randy Moss', 'city': 'Rand', 'state': 'WV'}
#
# NC
# {'name': 'Nicole K', 'city': 'Asheville', 'state': 'NC'}
# {'name': 'Jim Doe', 'city': 'Charlotte', 'state': 'NC'}
# {'name': 'Jane Taylor', 'city': 'Faketown', 'state': 'NC'}
```
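groupby walks through the data once. It collects consecutive items that share the same key (here, the value returned by `get_state`) and yields each key together with an iterator over that group.
- One thing to note: groupby does not sort anything. The data must already be sorted (or at least grouped) by the key; otherwise the same key shows up as several separate groups.
- In that sense, it's a bit different from SQL's `GROUP BY`, which doesn't care about the order of the rows.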
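This sketch shows what happens when the data isn't sorted. It uses three made-up records (not the people list above) and operator.itemgetter('state') in place of get_state. The unsorted version reports NY twice. Sorting with the same key function first fixes that, and because sorted() is stable, people within a state keep their original order.

```python
import itertools
from operator import itemgetter

people = [  # made-up records for illustration
    {'name': 'Ann', 'state': 'NY'},
    {'name': 'Bob', 'state': 'CO'},
    {'name': 'Cid', 'state': 'NY'},
]
get_state = itemgetter('state')

# Unsorted: the two NY records are not adjacent, so NY appears twice
unsorted_groups = [(key, len(list(group)))
                   for key, group in itertools.groupby(people, get_state)]
print(unsorted_groups)  # [('NY', 1), ('CO', 1), ('NY', 1)]

# Sort by the same key first, then group
sorted_groups = [(key, [p['name'] for p in group])
                 for key, group in itertools.groupby(sorted(people, key=get_state), get_state)]
print(sorted_groups)  # [('CO', ['Bob']), ('NY', ['Ann', 'Cid'])]
```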
tee
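- Makes it easy to replicate an iterator: `itertools.tee(iterable, n=2)` returns n independent iterators.
- Once you've made the copies, stop using the original iterator and use only the copies. If the original gets advanced behind tee's back, the copies silently miss those items.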
```python
person_group = itertools.groupby(people, get_state)
copy1, copy2 = itertools.tee(person_group)

# Iterate over a copy, not the original person_group
for key, group in copy1:
    print(key, len(list(group)))
print()
# NY 2
# CO 2
# WV 2
# NC 3
```
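This sketch uses tee on a plain list iterator. Both copies see every item, but advancing the original after calling tee makes the copies miss whatever was consumed. The pairs helper at the end is a hypothetical name for a common tee pattern; Python 3.10+ ships the same idea as itertools.pairwise.

```python
import itertools

# Two independent copies of one iterator
numbers = iter([1, 2, 3, 4])
first, second = itertools.tee(numbers)
from_first = list(first)
from_second = list(second)
print(from_first, from_second)  # [1, 2, 3, 4] [1, 2, 3, 4]

# Pitfall: advancing the original behind tee's back
source = iter([1, 2, 3, 4])
copy_a, copy_b = itertools.tee(source)
next(source)                 # consumes 1; the copies never see it
missing_first = list(copy_a)
print(missing_first)         # [2, 3, 4]

# Hypothetical helper: pair each item with the one after it
def pairs(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)            # shift the second copy forward by one
    return list(zip(a, b))

print(pairs('abcd'))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```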
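This isn't quite the same as the Unix `tee` command, which copies its standard input to standard output and to one or more files at once:
```shell
# Copy the input to the terminal and to two files
echo "copy me everywhere" | tee file1.txt file2.txt

# Interactive: each line you type (e.g. "some text") is echoed back
# and written to file.txt; press Ctrl-D to finish
tee file.txt
```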