Kurt McKee

lessons learned in production

Monkeypatching: Retrieving data from the parent scope

Posted 22 October 2020 in monkeypatching, pelican, programming, and python

I've previously covered the basics of monkeypatching and now it's time to dig deeper.

Let's begin with a quick review of monkeypatching.

A simple example

Let's say that you're interacting with an app that adds the first two numbers in a list and returns the result. Our goal is to modify the app so that it adds all of the numbers in the list. The code could look like this:

# app.py
# ------

def add_numbers(numbers):
    """Add the first two numbers in *numbers*."""

    return sum(numbers[:2])

def main():
    """The app runs from here."""

    numbers = [1, 2, 3]
    print(add_numbers(numbers))

A simple solution

This is the ideal case. Remember that our goal is to make app.add_numbers() add all of the numbers, not just the first two. Because all of the numbers are available in the numbers parameter, it's easy to monkeypatch the function and change its behavior.

# my_code.py
# ----------

import app

def add_all_numbers(numbers):
    """Add all of the numbers."""

    return sum(numbers)

# Monkeypatch app.add_numbers()
app.add_numbers = add_all_numbers


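Now when app.add_numbers() is called, the monkeypatched function will add all of the numbers together. This works because main() looks up the name add_numbers in the app module each time it runs, so it finds the replacement.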
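
To see the patch take effect without creating two files, here is a minimal sketch that builds a stand-in for app.py in memory. The APP_SOURCE string and the use of types.ModuleType are my own scaffolding for illustration, and main() returns the sum instead of printing it so the result can be compared before and after.

```python
import types

# A stand-in for app.py, kept in a string so this sketch is self-contained.
# (Assumption: main() returns the result rather than printing it.)
APP_SOURCE = '''
def add_numbers(numbers):
    """Add the first two numbers in *numbers*."""
    return sum(numbers[:2])

def main():
    numbers = [1, 2, 3]
    return add_numbers(numbers)
'''

# Create an in-memory module and run the source inside its namespace.
app = types.ModuleType("app")
exec(APP_SOURCE, app.__dict__)

before = app.main()  # 1 + 2 == 3


def add_all_numbers(numbers):
    """Add all of the numbers."""
    return sum(numbers)


# Monkeypatch app.add_numbers(); main() itself is left untouched.
app.add_numbers = add_all_numbers

after = app.main()  # 1 + 2 + 3 == 6
```

The same main() produces a different result after the patch because the replacement is assigned to the module attribute that main() reads at call time.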

A complex example

What happens if add_numbers() is written so that it doesn't accept a list of numbers? What if it only accepts two numbers as individual parameters?

# app.py
# ------

def add_numbers(x, y):
    """Add two numbers."""

    return x + y

def main():
    """The app runs from here."""

    numbers = [1, 2, 3]
    print(add_numbers(numbers[0], numbers[1]))

In this example, add_numbers() doesn't have access to the full list of numbers in its scope. The full list is only available in the local scope of main(). At first glance it looks like I would have to replace main() and reimplement all of its functionality just to change one small bit of its behavior.

Is it possible for add_numbers() to reach into the local scope of main()?

Hecks yeah it is!

A complex solution

To reach out of the called function's scope and into the caller's scope, let's use the inspect module.

The inspect module allows a function to interact with the entire call stack. This includes accessing variables in the other functions' local scopes. In our case, we just want to access the variables one level up in the call stack. Here's how to do it.

# my_code.py
# ----------

import inspect

import app

def add_all_numbers(x, y):
    """Add all numbers. *x* and *y* will be ignored."""

    # *add_all_numbers()* is the first frame in the stack (index 0).
    # Its caller's frame is at index 1 in the stack.
    parent_frame = inspect.stack()[1].frame

    # Retrieve *numbers* from the caller's local scope.
    all_numbers = parent_frame.f_locals['numbers']

    return sum(all_numbers)

app.add_numbers = add_all_numbers
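
The version above raises a KeyError if the caller has no local variable named numbers. As a hedged variation, the sketch below falls back to the original function in that case. It also uses inspect.currentframe().f_back instead of inspect.stack(), which avoids building a record for every frame in the stack. The in-memory app module, original_add_numbers, and call_without_numbers() are names I've made up for the demonstration, and main() returns its result instead of printing it.

```python
import inspect
import types

# A stand-in for app.py so this sketch runs on its own.
APP_SOURCE = '''
def add_numbers(x, y):
    """Add two numbers."""
    return x + y

def main():
    numbers = [1, 2, 3]
    return add_numbers(numbers[0], numbers[1])
'''

app = types.ModuleType("app")
exec(APP_SOURCE, app.__dict__)

# Keep a reference to the original so the patch can fall back to it.
original_add_numbers = app.add_numbers


def add_all_numbers(x, y):
    """Add all numbers found in the caller's scope, if there are any."""

    # currentframe() may return None on implementations without frame support.
    frame = inspect.currentframe()
    try:
        caller = frame.f_back if frame is not None else None
        numbers = caller.f_locals.get("numbers") if caller is not None else None
    finally:
        # Drop the frame reference promptly to avoid a reference cycle.
        del frame

    if numbers is None:
        # The caller has no *numbers* variable; behave like the original.
        return original_add_numbers(x, y)

    return sum(numbers)


app.add_numbers = add_all_numbers

patched_result = app.main()  # main() has a local *numbers*, so this is 6


def call_without_numbers():
    """A caller with no local *numbers*; the patch falls back to x + y."""
    return app.add_numbers(10, 20)
```

The fallback matters because a monkeypatched function may be called from places other than main(), and those callers won't necessarily have the variable the patch is looking for.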


When is this useful?

I've been reading content through feed aggregators for over 15 years. During that time I've observed several classes of problems with feed generators and feed aggregators. One of the most noticeable problems is the dreaded Post Flood (TM). This occurs when somebody changes their domain, their site's URL structure, or their blog software. These changes cause their tag URIs to change, and as a result developer planets and individual aggregators get flooded with duplicate content.

An image of Batman and Robin. Robin announces that he has switched to a new static site generator. Batman slaps Robin and reprimands him for causing duplicate entries to show up in Batman's feed aggregator.

When I began importing my old content into Pelican I tried to avoid this problem. Unfortunately, Pelican calls a function named get_tag_uri() whose sole parameters are the post's link and the post's date. As in the complex example above, I needed to reach up into the calling function's local scope to access all of the post metadata. In my case, I had stored all of the tag URIs in a metadata field named uri.

Using the technique shown above, I monkeypatched get_tag_uri() so that it would reach into the calling function's scope, retrieve the value of the uri metadata field, and return that value.

Here's the code that I added to my Pelican configuration file:

# pelicanconf.py
# --------------

import inspect

import feedgenerator
import pelican.writers

def new_get_tag_uri(*args, **kwargs):
    """Use an existing tag URI, if any exists in the item metadata."""

    parent_frame = inspect.stack()[1].frame
    uri = getattr(parent_frame.f_locals.get('item'), 'uri', '')
    if uri:
        return uri

    return feedgenerator.get_tag_uri(*args, **kwargs)

pelican.writers.get_tag_uri = new_get_tag_uri
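
The fallback path in that function is easy to miss: if the caller has no item variable, f_locals.get('item') returns None, getattr(None, 'uri', '') returns the empty string, and the default tag URI is generated. Here is a minimal sketch of both paths that runs without Pelican installed. Everything other than the body of new_get_tag_uri() is a hypothetical stand-in: default_get_tag_uri() takes the place of feedgenerator.get_tag_uri(), add_item_to_feed() takes the place of the Pelican code that calls it, and the tag URI format and example values are invented.

```python
import inspect
from types import SimpleNamespace


def default_get_tag_uri(link, date):
    """Hypothetical stand-in for feedgenerator.get_tag_uri()."""
    return f"tag:example.com,{date}:{link}"


def new_get_tag_uri(*args, **kwargs):
    """Use an existing tag URI, if any exists in the item metadata."""

    parent_frame = inspect.stack()[1].frame
    # A missing *item* or a missing *uri* attribute both yield ''.
    uri = getattr(parent_frame.f_locals.get("item"), "uri", "")
    if uri:
        return uri

    return default_get_tag_uri(*args, **kwargs)


def add_item_to_feed(item):
    """Hypothetical stand-in for the caller, which has a local *item*."""
    return new_get_tag_uri(item.link, item.date)


# An imported post that carries its old tag URI in a *uri* field.
imported = SimpleNamespace(
    link="/old-post/",
    date="2010-01-01",
    uri="tag:old.example,2010:/old-post/",
)

# A new post with no *uri* field; it should get a generated tag URI.
fresh = SimpleNamespace(link="/new-post/", date="2020-10-22")

imported_uri = add_item_to_feed(imported)
fresh_uri = add_item_to_feed(fresh)
```

With these stand-ins, the imported post keeps its stored tag URI and the new post gets a generated one, which is the behavior that prevents duplicates in aggregators.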

Note that Pelican has a pretty good plugin architecture, so monkeypatching is only one way to solve this particular problem.

Further reading: Python's inspect module, Tag URI