
Python Generator Functions

by Gary Worthington, More Than Monkeys


If you’re processing large datasets or streaming data, Python generator functions offer a lightweight, memory-efficient way to iterate over items without loading everything at once.

In this post, I’ll take you through what generator functions are, when to use them, and how they compare to alternatives like list comprehensions, including real timings from AWS Lambda.

What is a Generator Function?

A generator function looks like a regular function but uses the yield keyword instead of return. Calling it doesn’t run the body; it returns a generator object. Each time the generator is asked for a value (by a for loop or next()), the body runs until it reaches yield, hands back a value, and pauses, resuming from that point the next time it’s asked for another item.

Here’s a simple example that prints what’s happening under the hood:

def count_up_to(limit):
    count = 1
    while count <= limit:
        print(f"Yielding {count}")
        yield count
        count += 1
        print(f"Incremented count to {count}")

counter = count_up_to(3)
print("Starting loop")
for number in counter:
    print(f"Received: {number}")

Expected Output:

Starting loop
Yielding 1
Received: 1
Incremented count to 2
Yielding 2
Received: 2
Incremented count to 3
Yielding 3
Received: 3
Incremented count to 4

Understanding Generator Execution Flow

One of the more subtle but critical parts of working with generators is understanding how Python handles the code after a yield statement. When execution reaches yield, Python pauses the function and hands control back to the caller. The next time the generator is resumed (usually by next() or a for loop), it continues from the statement right after the last yield.

Here’s a focused example to make this crystal clear:

def simple_generator():
    print("Start of generator")
    yield 1
    print("After yielding 1")
    yield 2
    print("After yielding 2")
    yield 3
    print("End of generator")

gen = simple_generator()
print("Calling next() the first time")
print(next(gen))
print("Calling next() the second time")
print(next(gen))
print("Calling next() the third time")
print(next(gen))

Expected Output:

Calling next() the first time
Start of generator
1
Calling next() the second time
After yielding 1
2
Calling next() the third time
After yielding 2
3

To show what happens when the generator is exhausted:

gen = simple_generator()
try:
    print(next(gen))  # Start -> yield 1
    print(next(gen))  # After yield 1 -> yield 2
    print(next(gen))  # After yield 2 -> yield 3
    print(next(gen))  # No more yields
except StopIteration:
    print("Generator is finished")

Expected Output:

Start of generator
1
After yielding 1
2
After yielding 2
3
End of generator
Generator is finished

When to Use a Generator Function

1. Handling Large Files or Data Streams

Say you need to process a large file but don’t want to load it all into memory. The following generator yields one line of the file each time it’s asked for the next item.

def read_large_file(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()

# Usage
for line in read_large_file("bigfile.txt"):
    print(f"Processing: {line}")

Expected Output:

Processing: First line of file
Processing: Second line of file
Processing: Third line of file
...

(Assumes the file contains lines of text.)

2. Infinite Sequences Without Exhausting Memory

Generators are ideal for infinite or very long sequences. Here’s a Fibonacci generator with debug output:

def fibonacci():
    a, b = 0, 1
    while True:
        print(f"Yielding: {a}")
        yield a
        a, b = b, a + b
        print(f"Next values will be: {a}, {b}")

fib = fibonacci()
for _ in range(5):
    print(f"Got: {next(fib)}")

Expected Output:

Yielding: 0
Got: 0
Next values will be: 1, 1
Yielding: 1
Got: 1
Next values will be: 1, 2
Yielding: 1
Got: 1
Next values will be: 2, 3
Yielding: 2
Got: 2
Next values will be: 3, 5
Yielding: 3
Got: 3

Notice that there is no line ‘Next values will be: 5, 8’: the generator is never resumed after the loop, so the code following the final yield is never reached. This is exactly the execution flow described earlier.
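
To confirm this, resume the same fib generator one more time after the loop; the pending code after the last yield runs first, and only then is the next value produced:

print(next(fib))

Expected Output:

Next values will be: 5, 8
Yielding: 5
5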

3. Building Data Pipelines

Generators can be chained to form pipelines where each step lazily processes its input.

def generate_numbers(n):
    for i in range(n):
        print(f"Generating: {i}")
        yield i

def square(numbers):
    for number in numbers:
        result = number * number
        print(f"Squaring: {number} -> {result}")
        yield result

def filter_even(numbers):
    for number in numbers:
        if number % 2 == 0:
            print(f"Filtering even: {number}")
            yield number

pipeline = filter_even(square(generate_numbers(5)))
print("Running pipeline:")
for value in pipeline:
    print(f"Final output: {value}")

Expected Output:

Running pipeline:
Generating: 0
Squaring: 0 -> 0
Filtering even: 0
Final output: 0
Generating: 1
Squaring: 1 -> 1
Generating: 2
Squaring: 2 -> 4
Filtering even: 4
Final output: 4
Generating: 3
Squaring: 3 -> 9
Generating: 4
Squaring: 4 -> 16
Filtering even: 16
Final output: 16

Generator Performance

Let’s compare a list comprehension and a generator expression when we only want the first 5 values out of a large range.

import time

def list_comprehension_demo():
    start = time.time()
    squares = [x * x for x in range(10**6)]
    result = squares[:5]
    end = time.time()
    print("List Comprehension Result:", result)
    print(f"List comprehension time: {end - start:.6f} seconds")

def generator_expression_demo():
    start = time.time()
    gen = (x * x for x in range(10**6))
    result = []
    for i, val in enumerate(gen):
        result.append(val)
        if i == 4:
            break
    end = time.time()
    print("Generator Result:", result)
    print(f"Generator time: {end - start:.6f} seconds")

list_comprehension_demo()
generator_expression_demo()

Expected Output (timings will vary):

List Comprehension Result: [0, 1, 4, 9, 16]
List comprehension time: 0.104100 seconds
Generator Result: [0, 1, 4, 9, 16]
Generator time: 0.001900 seconds

Why the difference?

  • The list comprehension computes all one million squares up front, even though we only use five of them.
  • The generator stops computing after five iterations.

This is the core benefit of generators: they don’t do unnecessary work.
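
If you only need the first few items, the standard library’s itertools.islice expresses that early stop more directly than a manual enumerate loop. Here’s a minimal sketch of the same idea:

from itertools import islice

# islice pulls just the first five values from the generator expression,
# so only five squares are ever computed.
squares = (x * x for x in range(10**6))
print(list(islice(squares, 5)))  # [0, 1, 4, 9, 16]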

When Not to Use Generators

Generators aren’t a universal solution. Avoid them when:

  • You need to access items multiple times or use indexing (generators are one-time-use; see the short example after this list).
  • You require random access.
  • You want to serialise and send data somewhere. Lists are easier to dump or pickle.
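
A quick illustration of the first two points: once a generator has been consumed it yields nothing more, and it doesn’t support indexing.

squares = (x * x for x in range(5))
print(list(squares))  # [0, 1, 4, 9, 16]
print(list(squares))  # [] - already exhausted, nothing left to yield
# squares[2] would raise TypeError: 'generator' object is not subscriptable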

Summary

Use a generator function when:

  • You’re working with massive or infinite datasets.
  • You want to keep memory usage low.
  • You only need part of the result.
  • You want to build lazy pipelines that process on demand.

They’re ideal in Lambda functions, data streams, or any code where performance and efficiency matter.

If you’re already iterating over file objects, streaming APIs, or lazy built-ins like range, you’re benefitting from the same lazy-iteration idea under the hood, but writing generators directly can give you more control and clearer code.

You can read more of my writing at More Than Monkeys or follow me on LinkedIn where I regularly share practical takes on product, people, and engineering leadership.