
Building Python Lambda Functions in CDK with uv

(Jump directly to the code on GitHub: https://github.com/maxfriedrich/uv-lambda-cdk-example)

This post shows how to deploy AWS Lambda Python functions from a uv workspace with CDK. We build a custom construct based on uv sync that can be used as a replacement for the aws-lambda-python-alpha PythonFunction construct.

Setup

In our example, we use a workspace with a layout like this:

.
├── cdk
│   ├── app.py
│   ├── cdk.json  # all default options
│   ├── pyproject.toml
│   └── python_lambda_function.py
├── packages
│   ├── demo-common  # common code used in the Lambda packages
│   │   ├── pyproject.toml
│   │   └── src
│   │       └── demo_common
│   │           └── __init__.py
│   ├── demo-lambda1  # package with a Lambda handler in lambda_function.py
│   │   ├── pyproject.toml
│   │   └── src
│   │       └── demo_lambda1
│   │           ├── __init__.py
│   │           └── lambda_function.py
│   └── demo-lambda2 
│       └── ...
├── pyproject.toml
└── uv.lock

The main pyproject.toml configures the workspace:

[project]
name = "uv-cdk-demo"
version = "0.1.0"

[tool.uv.workspace]
members = ["packages/*", "cdk"]

Each Lambda package’s pyproject.toml configures its dependencies, e.g.:

[project]
name = "demo-lambda1"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "demo-common",
    "orjson>=3.10.12",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.uv.sources]
demo-common = { workspace = true }

Note that the Lambda packages are Python libraries created with uv init --lib demo-lambda1, not apps.

We can use a single lockfile and virtual environment for local development because there are (fortunately) no conflicts between the packages’ dependencies. The Lambda assets, however, will only include the dependencies each package actually needs.

The CDK code is written in Python as well. It can be executed from the same virtual environment, which is very convenient. We use a custom PythonLambdaFunction construct to build Lambda .zips.

Building a Lambda .zip

With this setup, building a Lambda function .zip means syncing the dependencies for one package into a temporary location and using lib/python3.11/site-packages as the Lambda code asset:

UV_PROJECT_ENVIRONMENT=/tmp/... \
  uv sync --package {package_name} \
  --frozen \
  --no-dev \
  --no-editable \
  --python {python_version}

This command needs to be run on the same platform and architecture as the Lambda function, e.g. for an x86_64 Lambda function, we can use an x86_64 GitHub Actions runner on Ubuntu but not an x86_64 MacBook running macOS. As far as I can tell, there is no way to use uv sync to create an environment for a different platform or architecture. See #8935 and #9350 for additional discussion on building Lambda assets.

In CDK, we can provide a local bundling class that attempts bundling and fails fast if it’s not possible, e.g. when the architecture doesn’t match. If local bundling is not possible, CDK falls back to Docker bundling. On recent Docker Desktop versions, running images for a different architecture is not a problem; on GitHub Actions, you would need to set up buildx or use a runner with the same architecture as the Lambda functions.

As of December 2024, there is a bug in CDK that makes the platform=... parameter in BundlingOptions useless: the system platform is always picked. We can only work around this by using a SHA Docker image tag for the platform we want, so the component is less flexible than I’d like: for example, we can’t interpolate the Python version into an image string like f"ghcr.io/astral-sh/uv:0.5-python{python_version}-bookworm-slim".
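
For reference, a pinned bundling image could look like the sketch below; the digest is a placeholder that you would look up yourself (e.g. with docker manifest inspect) for the linux/amd64 or linux/arm64 variant of the uv image you want:

# Sketch only: pin the bundling image by digest so the (ignored) platform
# parameter doesn't matter. The digest below is a placeholder, not a real value.
from aws_cdk import DockerImage

BUNDLING_DOCKER_IMAGE = "ghcr.io/astral-sh/uv@sha256:<digest-of-the-linux/amd64-image>"
bundling_image = DockerImage.from_registry(BUNDLING_DOCKER_IMAGE)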

Making the Lambda .zip reproducible for faster deployments

If you have many small Lambda functions in your CDK app, it is desirable not to re-deploy all of them when there is no code change, mainly because of deployment speed. For example, a Lambda function that is hooked up to an API Gateway and has some provisioned capacity takes around 2 minutes to deploy, and especially when there are dependencies between stacks, time adds up quickly.

Reproducibility here means: building the exact same asset (byte-for-byte) and letting CDK know that it is the same (via the asset hash) so the Lambda function has an empty diff and can be left as-is.

First, we need to hash the CDK asset by the output, not input. In our uv workspace layout with a cdk directory, we are using .. as the input path, so any change in the project directory changes the input hash. Since building the asset is very fast with uv, it is fine for us to always build and then compare the output hash.

When using the uv sync command above with an aws_cdk.FileSystem.mkdtemp() or tempfile.mkdtemp() directory, assets are not reproducible:

  • Scripts like bin/fastapi contain the temporary path used for building in the shebang line. These scripts are not part of the .zip but their checksums are mentioned in dist-info RECORD files.
  • uv puts the build timestamp into uv_cache.json.

We can avoid these problems by using a stable temporary directory for each package like /tmp/uv-demo-{package_name}-build and by setting UV_NO_INSTALLER_METADATA=1 (available since uv 0.5.7).
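
To check that this actually yields byte-for-byte identical output, you can build the same package twice and compare a content hash of the two site-packages directories. Here is a small illustrative helper for that check (it is not part of the construct below):

# Illustrative helper: a stable content hash over a directory tree, for comparing
# two builds of the same Lambda package byte for byte.
import hashlib
import os

def directory_digest(root: str) -> str:
    digest = hashlib.sha256()
    for dirpath, _dirnames, filenames in sorted(os.walk(root), key=lambda entry: entry[0]):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode())  # include the relative path
            with open(path, "rb") as f:
                digest.update(f.read())  # include the file contents
    return digest.hexdigest()

# Two reproducible builds should print the same digest:
# print(directory_digest("/tmp/build-a"), directory_digest("/tmp/build-b"))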

Making the Lambda function cold start faster with compiled bytecode

While AWS officially recommends not compiling bytecode or including __pycache__ directories in Lambda .zips, it is often done in practice to reduce cold start time (although I haven’t measured the effect myself yet). When we compile bytecode with the correct Python version and architecture, the AWS Lambda runtime uses it, which can be verified by setting PYTHONVERBOSE=1 and inspecting the logs.

To compile bytecode eagerly with uv, we can pass uv sync ... --compile-bytecode. This creates __pycache__ directories with .pyc files for all packages.

All .pyc files contain the temporary build directory, which is only there for debugging (figuring out which source file the bytecode corresponds to). Since we made this path stable above and it is not used when executing the code, we can ignore it.

The bytecode also contains the timestamp when files were last modified, which we can bypass by setting SOURCE_DATE_EPOCH. We can’t set it to 0 (= January 1, 1970) because hatchling (uv’s default build backend as of December 2024) complains that .zip only supports timestamps after 1980, so we picked an arbitrary value 444444444 (= February 1, 1984) in this example. The value you choose does not matter because if SOURCE_DATE_EPOCH is set, the bytecode is created with hash-based instead of timestamp-based invalidation, so the timestamp is not written to the bytecode header.
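
To see which invalidation mode a given .pyc file ended up with, you can read its header directly; here is a small sketch based on the PEP 552 header layout, using one of the example paths from this workspace:

# Sketch: read a .pyc header (PEP 552) and report whether it uses hash-based or
# timestamp-based invalidation. The flags field follows the 4-byte magic number.
import struct

def pyc_invalidation_mode(path: str) -> str:
    with open(path, "rb") as f:
        header = f.read(8)
    flags = struct.unpack("<I", header[4:8])[0]
    return "hash-based" if flags & 0x01 else "timestamp-based"

print(pyc_invalidation_mode(
    "packages/demo-lambda1/src/demo_lambda1/__pycache__/__init__.cpython-311.pyc"
))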

If you’re curious about bytecode, I recommend playing around with the show_pyc script from coverage.py – e.g. using

uv run \
  https://github.com/nedbat/coveragepy/raw/refs/heads/master/lab/show_pyc.py \
  packages/demo-lambda1/src/demo_lambda1/__pycache__/__init__.cpython-311.pyc

Putting it all together

In a CDK stack, we configure a Python Lambda function like this:

PythonLambdaFunction(
    self,
    "Lambda1",
    package_name="demo-lambda1",
    handler="demo_lambda1.lambda_function.lambda_handler",
)

The construct is (slightly simplified):

def build_asset_command_and_env(
    package_name: str,
    output_path: str,
    architecture: _Architecture,  # named tuple that maps between Lambda, Docker, and Python "spellings" of architectures
    python_version: str,  # e.g. "3.11"
) -> tuple[list[str], dict[str, str]]:
    # Always use the same path per package to ensure that paths in the output are stable
    tmp_path = os.path.join("/tmp", f"uv-demo-{package_name}-build")

    commands = [
        # Ensure we are on the correct architecture
        '[ "$(uname -m)" = {architecture} ]'.format(architecture=architecture.platform_machine),
        # Create a virtual environment with the package's dependencies
        "uv sync --package {package_name} --frozen --no-dev --no-editable --compile-bytecode --python {python_version}".format(
            package_name=shlex.quote(package_name), python_version=python_version
        ),
        # Copy the virtual environment's site packages to the output path
        "cp -r {src} {dest}".format(
            src=os.path.join(tmp_path, "lib", f"python{python_version}", "site-packages", "."),
            dest=os.path.join(output_path.rstrip("/"), ""),
        ),
    ]
    command = ["bash", "-c", " && ".join(commands)]

    env = {
        "UV_PROJECT_ENVIRONMENT": tmp_path,  # stable temporary path for reproducible dist-info and bytecode
        "UV_NO_INSTALLER_METADATA": "1",  # don't write uv data that includes timestamps to dist-info directories
        "UV_LINK_MODE": "copy",  # files should be copied, not linked
        "SOURCE_DATE_EPOCH": "444444444",  # for hash-based bytecode, 0 is not allowed because zip requires timestamps after 1980
    }
    return command, env


class PythonLambdaFunction(aws_lambda.Function):
    def __init__(
        self,
        scope: Construct,
        construct_id: str,
        path: str,
        package_name: str,
        handler: str,
        **kwargs,
    ):
        command, env = build_asset_command_and_env(
            package_name,
            output_path="/asset-output",
            architecture=ARCHITECTURE,
            python_version=PYTHON_VERSION,
        )

        super().__init__(
            scope,
            construct_id,
            code=aws_lambda.Code.from_asset(
                path,
                asset_hash_type=AssetHashType.OUTPUT,  # decide hash based on output (default: based on input)
                bundling=BundlingOptions(
                    image=DockerImage.from_registry(BUNDLING_DOCKER_IMAGE),
                    environment=env,
                    command=command,
                    # The platform we provide here is ignored by CDK https://github.com/aws/aws-cdk/issues/30239
                    platform=ARCHITECTURE.docker_architecture,
                    local=UvLocalBundling(package_name),
                ),
            ),
            handler=handler,
            runtime=LAMBDA_RUNTIME,
            architecture=ARCHITECTURE.lambda_architecture,
            **kwargs,
        )

The local bundling checks the system platform and architecture before attempting to run uv sync:

@jsii.implements(ILocalBundling)
class UvLocalBundling:
    def __init__(self, package_name: str) -> None:
        self.package_name = package_name
        super().__init__()

    def try_bundle(self, output_dir: str, *args, **kwargs) -> bool:
        if sys.platform != "linux" or platform.machine() != ARCHITECTURE.platform_machine:
            return False

        try:
            command, env = build_asset_command_and_env(
                self.package_name,
                output_path=output_dir,
                architecture=ARCHITECTURE,
                python_version=PYTHON_VERSION,
            )
            run_command(command, env=env)
        except RuntimeError:
            return False

        return True

See https://github.com/maxfriedrich/uv-lambda-cdk-example for the full code! The repository also includes GitHub Actions integration and a script to call Lambda functions for testing.
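
If you just want a quick smoke test without the repository’s script, a minimal boto3 invocation (with a placeholder function name, not the repository’s tooling) could look like this:

# Minimal smoke-test sketch: invoke a deployed function with boto3.
# The function name is a placeholder; use the deployed name or ARN.
import json

import boto3

client = boto3.client("lambda")
response = client.invoke(
    FunctionName="<deployed-function-name-or-arn>",
    Payload=json.dumps({"hello": "world"}).encode(),
)
print(json.loads(response["Payload"].read()))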

What is missing

The PythonLambdaFunction construct only supports uv workspaces whose packages are Python libraries (uv init --lib), not apps, because the .zip is built from the site-packages directory only. To support Python apps (uv init --app), we would need to copy the package contents into the asset output directory afterwards.
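
As a rough sketch of what such an extra step could look like, one could append another copy command to the commands list in build_asset_command_and_env; the source path below assumes the src layout from the example workspace, while an app created with uv init --app may keep its module at the package root instead:

import os

def app_package_copy_command(package_name: str, output_path: str) -> str:
    # Hypothetical extra bundling step (not part of the construct above): copy an
    # app package's own sources into the asset output after uv sync. The source
    # path is an assumption; adjust it to the actual layout of the app package.
    return "cp -r {src} {dest}".format(
        src=os.path.join("packages", package_name, "src", "."),
        dest=os.path.join(output_path.rstrip("/"), ""),
    )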

The construct is written in Python, so it can’t be used from TypeScript, for example, but my hunch is that it should not be too hard to translate (maybe with some LLM help).

Let me know if it works for you and if you use a component like this in your CDK setup!

Podcast Profile

On the second-to-last weekend of January, the Casualcoding team (group photos!) met in Flo’s kitchen and held a hackathon. The result was Podcast Profile, a website where you can show which podcasts you listen to.

I have always found blog posts in which people explain which podcasts they listen to and why very interesting, but I thought a unified solution based on the OPML export that every podcast client offers could be nicer, and that in a next step the data might eventually enable proper browsing. The idea had been floating around in my head for a few years, but I never got around to implementing it.

Of course we didn’t get as far as browsing over the weekend, but the basic functionality is in place (only a bit of maintenance has happened over the last few weeks).

Personal podcast recommendations aren’t dead, but there’s nothing wrong with tweeting the link to your own Podcast Profile!

Maki (Prototype)

A few days ago, I built a prototype of an iOS keyboard that, for me personally, fixes WhatsApp and other messaging apps without a Mac counterpart to a certain degree.

You switch to the keyboard, connect to a Mac on the same Wi-Fi network, and can then type from there. But:

  • “Send to iOS Device” in the Mac window really just means “insert into the iOS text field”, because the “Send” button in WhatsApp can’t be triggered from the keyboard.
  • Should I add encryption now, and if so, which kind? Currently I’m sending plain-text messages back and forth over a WebSocket.
  • Does anyone besides me (and Flo, who originally had the idea) even want this? I myself only want it very rarely, because I use iMessage most of the time, and maybe Pushbullet plus copy & paste is enough then.
  • Of course it can’t go into the App Store like this, for example because it has no functionality without a network connection (25.5), so I would have to build a “normal” keyboard around it, which naturally can’t be as good as the standard keyboard, and I really don’t feel like doing that right now.

(Now also open source on GitHub!)

Facebook without the News Feed

Facebook is an important communication tool to me. Most of my fellow students don’t use iMessage or Twitter DMs, so we talk on Facebook. However, I seem not to be able to just check my messages on Facebook. Instead, I wind up scrolling through the News Feed for a couple of minutes each time I visit Facebook in the browser. I recently looked at about 30 News Feed posts in detail and discovered these three things:

  • I didn’t care about any of the posts that described activities on Facebook. It doesn’t matter to me if $friend changed their profile picture or liked someone else’s status update.
  • I was interested in two link posts but I had already seen the exact same links on Twitter hours earlier.
  • People tend to create albums of mediocre photos on Facebook while they only post their best on Instagram. Many of my Facebook friends have Instagram accounts and I already follow the ones I’m interested in.

I concluded that I don’t need to read the News Feed, so I wrote a custom stylesheet that hides it. This is the result:

I’ve been using “Quiet Facebook” for a week now and I’m happy with it, so I put the CSS on GitHub. I just embed the stylesheet via Safari’s settings but someone could certainly build browser extensions that load the stylesheet from raw.github.com and toggle it when the user clicks a button. Feel free to submit pull requests (or suggest a better name for the whole thing)!

Comments on Hacker News

Update May 24: Chrome and Firefox users can install the stylesheet via Userstyles.org, using the Stylish extension. Safari users: Download the CSS from GitHub, open Safari Preferences, Advanced, select downloaded file in Style sheet dropdown.

Project Euler: Problem 18

I originally wrote this post for Daniel because I wanted to explain my solution to Problem 18 of Project Euler to him. Now, about ten minutes later, I’m publishing it on my blog.

We are looking for the maximum sum of a path through the graph. The path should start at the root node and lead from there all the way down. For this small graph, it looks like this:

The path marked in red is the path with the maximum sum: 3 + 7 + 4 + 9 = 23

Now we extend our graph by another level of nodes.

To find the path with the maximum sum now, we don’t have to recompute the sums of all possible paths to the nodes in the bottom level. We already know the sum of the path marked in red, and we can reuse it, because subpaths of a maximum-sum path between two nodes are themselves maximum-sum paths (this does not mean that the red path is a subpath of every maximum path from the root to the bottom level!). We can illustrate this relationship as follows:

Let the red path again be the maximum path from the root node to the third node from the left in the bottom level (node [5, 3]). Then there can be no path (marked in blue here) that produces a larger sum than the red subpath next to it. The path that produces the largest sum from [2, 1] to [5, 3] must be the red path, because otherwise the red path from the root node to [5, 3] would no longer be the maximum path (the path containing the blue subpath would be instead; proof by contradiction, as you do).

Because we are only interested in the maximum sum, we no longer keep track of paths but simply store, for each node, the maximum sum that can be reached up to it. In our graph, extended by one level, these are the following sums (feel free to check for yourself!):

To compute the maximum reachable sums for the next row, we only have to perform at most two additions per node (only one for the leftmost and rightmost nodes) and keep the larger result.

20 + 9 = 29, max(20 + 4, 19 + 4) = 24, …

So the maximum path sum in the extended graph is 29.
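
Here is a small Python sketch of this row-by-row approach, using the four-level example graph from the beginning of the post (the one whose best path is 3 + 7 + 4 + 9 = 23); extending it by further rows works exactly the same way:

# Row-by-row maximum path sums for the small example graph above (answer: 23).
triangle = [
    [3],
    [7, 4],
    [2, 4, 6],
    [8, 5, 9, 3],
]

best = triangle[0]
for row in triangle[1:]:
    # Each node adds its value to the best sum of its one or two parent nodes.
    best = [max(best[max(i - 1, 0):i + 1]) + value for i, value in enumerate(row)]

print(max(best))  # 23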