
Using Generators

A generator is a lazy iterator. That is, it loads the requested content into memory on demand, e.g., when it is accessed via iteration. This allows very large files to be processed piece by piece, enabling computations that would otherwise be prohibited by available RAM.
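To make the memory claim concrete, the sketch below compares a fully materialized list with an equivalent generator expression. The size `n = 100_000` and the variable names are illustrative assumptions, and `sys.getsizeof` reports only the size of the container object itself, not of the integers it references.

```python
import sys

n = 100_000  # illustrative size, not from the original text

# A list comprehension builds every element up front.
squares_list = [i * i for i in range(n)]

# A generator expression stores only its current state.
squares_gen = (i * i for i in range(n))

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")

# Both produce the same values once consumed.
total = sum(squares_gen)
```

The generator's footprint stays small regardless of `n`, because values are computed only as `sum` requests them.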

Creating a Generator

In Python, there are two ways to create generators:

1. Generator function via `yield`. The `yield` keyword works much like `return`, except that the function pauses at each `yield` and resumes from that point on the next request; calling the function returns a lazy iterator (a generator). `yield` is usually placed inside a loop so that each call to the generator's `__next__` method produces the next value.
    ```python
    def generate_row(filename):
        # The file is read one line at a time, only when a row is requested
        with open(filename, 'r') as f:
            for row in f:
                yield row
    row_generator = generate_row(filename)
    ```
2. Generator expression. With syntax similar to a list comprehension, a generator expression wraps the logic in `(...)`, yielding a lazy iterator.
    ```python
    row_generator = (row for row in open(filename, 'r'))
    ```
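As a sketch of how the two forms relate, the example below builds the same generator both ways over an in-memory file. `io.StringIO` stands in for `open(filename, 'r')` so the snippet runs without a file on disk; the sample text and the `events` list are assumptions added for illustration. It also shows that the body of a generator function does not run until the first value is requested.

```python
import io

SAMPLE = "a,1\nb,2\nc,3\n"  # hypothetical file contents
events = []  # records when the function body actually runs

def generate_row(handle):
    events.append("started")
    for row in handle:
        yield row

# Form 1: generator function. Calling it runs none of the body yet.
gen_from_function = generate_row(io.StringIO(SAMPLE))
started_before_next = list(events)  # still empty at this point

first = next(gen_from_function)  # body runs up to the first yield
started_after_next = list(events)

# Form 2: generator expression over the same data.
gen_from_expression = (row for row in io.StringIO(SAMPLE))

rest_of_function = list(gen_from_function)
all_of_expression = list(gen_from_expression)
```

Both objects are ordinary generators; the function form is the better fit when the logic needs several statements, while the expression form suits one-line transformations.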

Using a Generator

Recall that a generator is a lazy iterator, so its entries must be accessed via iteration. Fundamentally, this access goes through the object's `__next__` method, which is called implicitly in a `for` loop but may also be called explicitly, for example in a `while` loop with `next(generator)`.
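The implicit `__next__` calls can be made visible by writing out roughly what a `for` loop does. This is a simplified sketch rather than the interpreter's actual implementation; `count_up_to` and `manual_for` are hypothetical helper names.

```python
def count_up_to(limit):
    """Yield the integers 1..limit, one per request."""
    n = 1
    while n <= limit:
        yield n
        n += 1

def manual_for(iterable):
    """Collect items the way a for loop would, using explicit next() calls."""
    collected = []
    iterator = iter(iterable)       # a generator is its own iterator
    while True:
        try:
            item = next(iterator)   # calls iterator.__next__()
        except StopIteration:       # raised once the generator is exhausted
            break
        collected.append(item)
    return collected

via_manual = manual_for(count_up_to(3))
via_for = [item for item in count_up_to(3)]
```

The two results match, which is why a plain `for` loop is usually all that is needed.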

1. One-off access via `next`.
    ```python
    row0 = next(row_generator)
    ```
2. Access via `while` loop. Here we use the default parameter of `next(x, default)` to return `None` once the generator is exhausted. This assumes the generator never yields `None` itself.
    ```python
    while True:
        row = next(row_generator, None)
        if row is None:
            break
        # process row
    ```
3. Access via `for` loop. This is the most common and intuitive approach.
    ```python
    for row in row_generator:
        pass  # process row
    ```
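Whichever access pattern is used, a generator can be consumed only once. The sketch below, using an illustrative three-item generator that is not part of the original example, shows that a second pass yields nothing and that `next` with a default keeps returning that default after exhaustion.

```python
letters = (ch for ch in "abc")  # illustrative generator

first_pass = [ch for ch in letters]   # consumes every item
second_pass = [ch for ch in letters]  # already exhausted, so empty

# next() with a default avoids a StopIteration error here.
after_exhaustion = next(letters, None)

# To iterate again, create a fresh generator.
letters = (ch for ch in "abc")
fresh_first = next(letters)
```

For the file examples above, this means calling `generate_row(filename)` again whenever a second pass over the rows is needed.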