Open
Commits
23 commits
422e323
saturn-python-llm: declare trl, peft, datasets explicitly
hhuuggoo May 26, 2026
8d11560
Merge pull request #467 from saturncloud/feature/saturn-python-llm-tf…
hhuuggoo May 26, 2026
e368c79
Fix recipe-template field names to match ImageSpecSchema
hhuuggoo May 27, 2026
2421c36
saturn-python-llm: add axolotl 0.16.1 for Token Factory
hhuuggoo May 27, 2026
e1e7efe
Merge pull request #470 from saturncloud/feature/saturn-python-llm-ax…
hhuuggoo May 27, 2026
36a5b6f
Merge pull request #469 from saturncloud/fix/recipe-template-schema-d…
hhuuggoo May 27, 2026
1e970bc
Pin python=3.13 in pytorch and tensorflow envs
hhuuggoo May 28, 2026
7d8622c
Merge pull request #471 from saturncloud/hugo/pin-python-3.13-ds-2026…
hhuuggoo May 28, 2026
551d339
saturn-python-pytorch: move torch to PyPI cu129, drop pytorch conda c…
hhuuggoo May 28, 2026
bf56f58
Merge pull request #472 from saturncloud/hugo/pytorch-pypi-cu129-py313
hhuuggoo May 28, 2026
85c94ac
Pin python=3.13 across all py images, fix pytorch index-url syntax
hhuuggoo May 28, 2026
e325830
Merge remote-tracking branch 'origin/release-2026.05.01' into hugo/py…
hhuuggoo May 28, 2026
1fca62f
Merge pull request #473 from saturncloud/hugo/python-3.13-sweep
hhuuggoo May 28, 2026
a739062
saturn-python-llm: pin python=3.12, drop auto-gptq/autoawq, bump flas…
hhuuggoo May 28, 2026
b22fa1e
Merge pull request #474 from saturncloud/hugo/llm-python-3.12
hhuuggoo May 28, 2026
faf7dab
saturn-python-rapids: bump cuda to 12.9, drop dask-sql, pin rapids>=2…
hhuuggoo May 28, 2026
823c550
Merge pull request #475 from saturncloud/hugo/rapids-cuda-12.9-py313
hhuuggoo May 28, 2026
621e308
saturn-python-llm: pin vllm==0.11.0
hhuuggoo May 28, 2026
5060187
Merge pull request #476 from saturncloud/hugo/llm-pin-vllm-0.11.0
hhuuggoo May 28, 2026
f8031f8
Pin transformers <5 in saturn-python-llm so vLLM 0.11 boots
hhuuggoo May 30, 2026
0451ead
Merge pull request #477 from saturncloud/hugo/saturn-python-llm-trans…
hhuuggoo May 30, 2026
b892889
Add saturn-python-vllm + saturn-python-axolotl (split from saturn-pyt…
hhuuggoo Jun 4, 2026
e7d44d0
Merge pull request #478 from saturncloud/hugo/tf-split-vllm-axolotl-i…
hhuuggoo Jun 4, 2026
11 changes: 5 additions & 6 deletions saturn-python-312-slim-gpu-12.9/recipe-template.json
@@ -1,7 +1,6 @@
 {
-  "recipeName": "saturn-python-312-slim-gpu-12.9",
-  "description": "Python 3.12 GPU slim image with CUDA 12.9 and minimal packages",
-  "image": "saturncloud/saturn-python-slim-gpu:2025.05.01-cuda129-python312",
-  "gpu": true,
-  "saturnVersion": "2025.05.01"
-}
+  "name": "saturn-python-312-slim-gpu-12.9",
+  "description": "Python 3.12 GPU slim image with CUDA 12.9 and minimal packages.",
+  "hardware_type": "gpu",
+  "supports": ["jupyterlab", "dask"]
+}
11 changes: 5 additions & 6 deletions saturn-python-312-slim/recipe-template.json
@@ -1,7 +1,6 @@
 {
-  "recipeName": "saturn-python-312-slim",
-  "description": "Python 3.12 slim image with minimal packages",
-  "image": "saturncloud/saturn-python-slim:2025.05.01-python312",
-  "gpu": false,
-  "saturnVersion": "2025.05.01"
-}
+  "name": "saturn-python-312-slim",
+  "description": "Python 3.12 slim image with minimal packages.",
+  "hardware_type": "cpu",
+  "supports": ["jupyterlab", "dask"]
+}
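The field renames in the two recipe-template.json hunks above can be sketched as a small mapping. This is an illustrative helper, not code from the repository: `migrate_recipe` is a hypothetical name, the mapping is read off the diff rather than from ImageSpecSchema itself, and `supports` must be supplied by the caller because the old files carry no equivalent field.

```python
from typing import Any, Dict, List


def migrate_recipe(old: Dict[str, Any], supports: List[str]) -> Dict[str, Any]:
    """Map the pre-change recipe fields onto the post-change field names.

    Assumption: the mapping mirrors the diff above; it was not checked
    against the real ImageSpecSchema definition.
    """
    return {
        "name": old["recipeName"],
        "description": old["description"],
        # The boolean `gpu` flag becomes a string-valued `hardware_type`.
        "hardware_type": "gpu" if old["gpu"] else "cpu",
        # No counterpart exists in the old file, so the caller provides it.
        "supports": list(supports),
        # `image` and `saturnVersion` are dropped, as in the diff.
    }


old_recipe = {
    "recipeName": "saturn-python-312-slim",
    "description": "Python 3.12 slim image with minimal packages",
    "image": "saturncloud/saturn-python-slim:2025.05.01-python312",
    "gpu": False,
    "saturnVersion": "2025.05.01",
}
new_recipe = migrate_recipe(old_recipe, supports=["jupyterlab", "dask"])
```

The helper copies the description unchanged; the trailing period added in the diff is an editorial change and is left to the author.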
25 changes: 25 additions & 0 deletions saturn-python-axolotl/Dockerfile
@@ -0,0 +1,25 @@
+ARG SATURNBASE_GPU_IMAGE
+FROM ${SATURNBASE_GPU_IMAGE}
+
+RUN sudo apt-get -qq --allow-releaseinfo-change update && \
+    sudo apt-get -qq install --yes --no-install-recommends \
+        libgl1
+
+COPY environment.yml /tmp/environment.yml
+# Unlike the serving image (saturn-python-llm), axolotl is installed WITH its
+# deps here: this image has no vLLM, so there is no transformers<5 constraint to
+# protect, and axolotl 0.16.1 needs the transformers 5.x API at runtime. Letting
+# it resolve its own tree (transformers 5.5, datasets 4.5, trl 0.29, hf-hub>=1,
+# accelerate 1.13, ...) is what makes training actually work. axolotl is declared
+# in environment.yml's pip: block with its extras, so a normal env update pulls
+# everything; no separate --no-deps step.
+RUN mamba env update -n saturn --file /tmp/environment.yml && \
+    ${CONDA_DIR}/envs/saturn/bin/python -m ipykernel install \
+        --name python3 \
+        --display-name 'saturn (Python 3)' \
+        --prefix=${CONDA_DIR} && \
+    ${CONDA_DIR}/bin/conda clean -afy && \
+    find ${CONDA_DIR} -type f,l -name '*.pyc' -delete && \
+    find ${CONDA_DIR} -type f,l -name '*.a' -delete && \
+    find ${CONDA_DIR} -type f,l -name '*.js.map' -delete
+RUN echo '' > ${CONDA_DIR}/envs/saturn/conda-meta/history
9 changes: 9 additions & 0 deletions saturn-python-axolotl/Makefile
@@ -0,0 +1,9 @@
+include .env_deps
+export
+
+build_image:
+	docker build \
+		--no-cache \
+		--build-arg SATURNBASE_GPU_IMAGE=${SATURNBASE_GPU_IMAGE} \
+		-t ${IMAGE} \
+		.
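For readers who want to reproduce the `build_image` target outside of make, the following is a rough sketch of what the Makefile does: read variables from `.env_deps`, then assemble the `docker build` arguments. The `KEY=VALUE` layout of `.env_deps` is an assumption (the file is not part of this diff), the function names are hypothetical, and the image names in the example are placeholders.

```python
from typing import Dict, List


def parse_env_deps(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Assumption: .env_deps only contains plain assignments.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def docker_build_argv(env: Dict[str, str]) -> List[str]:
    """Build the same argument list the build_image target passes to docker."""
    return [
        "docker", "build",
        "--no-cache",
        "--build-arg", f"SATURNBASE_GPU_IMAGE={env['SATURNBASE_GPU_IMAGE']}",
        "-t", env["IMAGE"],
        ".",
    ]


# Placeholder values, not the real image names.
example_env = parse_env_deps(
    "# deps\nSATURNBASE_GPU_IMAGE=example/base:tag\nIMAGE = example/axolotl:dev\n"
)
argv = docker_build_argv(example_env)
```

The resulting list could be handed to `subprocess.run(argv, check=True)` from the image directory.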
53 changes: 53 additions & 0 deletions saturn-python-axolotl/environment.yml
@@ -0,0 +1,53 @@
+name: saturn
+channels:
+  - conda-forge
+  - nodefaults
+dependencies:
+  - python=3.12
+  # TRAINING-ONLY image (axolotl). Split out from saturn-python-llm because that
+  # image must pin transformers<5 for vLLM 0.11 serving, but axolotl 0.16.1 is
+  # hard-coupled to the transformers 5.x API (e.g. Trainer.create_optimizer(model=),
+  # a 5.x-only signature; on 4.57 training dies inside the loop). The two dep
+  # stacks are incompatible in one env, so serving lives in saturn-python-llm and
+  # training lives here. Because there is NO vLLM here, axolotl is installed WITH
+  # its deps (see Dockerfile), with no transformers pin and no --no-deps hack, and
+  # it pulls the correct transformers 5.5 / datasets 4.5 / trl 0.29 / hf-hub>=1 set.
+  - numpy
+  - psutil
+  - pandas
+  - tqdm
+  - click
+  - rich
+  - tensorboard
+  - wandb
+  - ipykernel
+  - pip
+  - pip:
+    - --extra-index-url https://download.pytorch.org/whl/cu129
+    # axolotl 0.16.1 pins torch==2.8.0; install it from the cu129 index so the
+    # GPU build is used (and so the flash-attn wheel below matches torch 2.8).
+    - torch==2.8.0
+    - torchvision
+    - torchaudio
+    # The whole fine-tuning stack. Unlike the serving image, we let axolotl
+    # resolve its own dependency tree (transformers 5.5.0, datasets 4.5.0,
+    # trl 0.29.0, accelerate 1.13.0, peft, hf-hub>=1, etc.), installed WITH
+    # deps in the Dockerfile. [flash-attn] + [deepspeed] extras for real
+    # multi-GPU LoRA/full fine-tunes; [mlflow] for experiment tracking.
+    - axolotl[flash-attn,deepspeed,mlflow]==0.16.1
+    # flash-attn's build wants torch present at install time; ship the prebuilt
+    # cu12/torch2.8 wheel so it doesn't compile from source (slow, needs nvcc).
+    - https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
+    # Saturn workspace + ops basics (mirror the serving image's tail).
+    - gpustat
+    - black
+    - isort
+    - mypy
+    - pytest
+    - saturn-client
+    # The Token Factory inline training script (pdc/scripts/tf/finetune.py) only
+    # needs requests + PyYAML at runtime; both come in transitively (requests via
+    # axolotl/saturn-client, PyYAML via axolotl). Listed here for clarity / in
+    # case axolotl ever drops them.
+    - requests
+    - pyyaml
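The environment above relies on the prebuilt flash-attn wheel matching both `python=3.12` and `torch==2.8.0`. A minimal sketch of checking that from the wheel URL alone follows. The regular expressions assume the filename layout seen in the URLs in this diff (`<name>-<version>+cu<N>torch<X.Y>cxx11abi<TRUE|FALSE>-<python>-<abi>-<platform>.whl`), and both function names are hypothetical.

```python
import re
from typing import Dict
from urllib.parse import unquote, urlsplit

# Wheel filename: name-version[+local]-python-abi-platform.whl
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-+]+)(?:\+(?P<local>[^-]+))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)
# Local version layout assumed from the flash-attn URLs in this diff.
LOCAL_RE = re.compile(
    r"^cu(?P<cuda>\d+)torch(?P<torch>\d+\.\d+)cxx11abi(?P<cxx11abi>TRUE|FALSE)$"
)


def parse_flash_attn_wheel(url: str) -> Dict[str, str]:
    """Split a flash-attn wheel URL into its version and build tags."""
    filename = unquote(urlsplit(url).path.rsplit("/", 1)[-1])  # %2B -> +
    wheel = WHEEL_RE.match(filename)
    if wheel is None:
        raise ValueError(f"not a wheel filename: {filename}")
    info = wheel.groupdict()
    local = LOCAL_RE.match(info["local"] or "")
    if local is None:
        raise ValueError(f"unexpected local version tag: {info['local']}")
    info.update(local.groupdict())
    return info


def wheel_matches_env(info: Dict[str, str], python_version: str, torch_version: str) -> bool:
    """Compare the wheel's tags with the env's python and torch pins."""
    py_tag = "cp" + "".join(python_version.split(".")[:2])   # "3.12" -> "cp312"
    torch_minor = ".".join(torch_version.split(".")[:2])      # "2.8.0" -> "2.8"
    return info["python"] == py_tag and info["torch"] == torch_minor


NEW_WHEEL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/"
    "flash_attn-2.8.3%2Bcu12torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl"
)
new_info = parse_flash_attn_wheel(NEW_WHEEL)
```

A check like this would flag a wheel built for a different interpreter or torch minor version before the image build reaches pip.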
6 changes: 6 additions & 0 deletions saturn-python-axolotl/recipe-template.json
@@ -0,0 +1,6 @@
+{
+  "name": "saturn-python-axolotl",
+  "description": "Fine-tuning LLMs with axolotl (transformers 5 training stack)",
+  "hardware_type": "gpu",
+  "supports": ["jupyterlab", "dask"]
+}
6 changes: 6 additions & 0 deletions saturn-python-llm/Dockerfile
@@ -6,7 +6,13 @@ RUN sudo apt-get -qq --allow-releaseinfo-change update && \
         libgl1
 
 COPY environment.yml /tmp/environment.yml
+# axolotl 0.16.1 is installed --no-deps so its over-strict transformers==5.5.0
+# metadata pin cannot drag transformers 5.x in and break vLLM 0.11 at boot.
+# Its transitive deps are declared explicitly in environment.yml. This is a
+# separate step because a --no-deps line inside the env.yml pip: block would
+# apply to the whole block, suppressing deps for every pip entry.
 RUN mamba env update -n saturn --file /tmp/environment.yml && \
+    ${CONDA_DIR}/envs/saturn/bin/python -m pip install --no-deps axolotl==0.16.1 && \
     ${CONDA_DIR}/envs/saturn/bin/python -m ipykernel install \
         --name python3 \
         --display-name 'saturn (Python 3)' \
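Because `--no-deps` skips dependency resolution, the pin that axolotl declares in its metadata is still recorded even though pip never enforced it. A small sketch of surfacing that declared pin inside the built image follows. `importlib.metadata.requires` is the standard-library call used; the helper names are hypothetical, the matching is deliberately simple (no name normalisation, no marker evaluation), and the example list is illustrative rather than axolotl's real metadata.

```python
import re
from importlib import metadata
from typing import Iterable, Optional


def requirement_for(requires: Optional[Iterable[str]], package: str) -> Optional[str]:
    """Return the first requirement string that names `package`, if any."""
    # The lookahead stops "transformers" from matching "transformers-foo".
    pattern = re.compile(
        rf"^{re.escape(package)}(?![A-Za-z0-9._-])", re.IGNORECASE
    )
    for requirement in requires or []:
        if pattern.match(requirement.strip()):
            return requirement.strip()
    return None


def declared_requirement(dist: str, package: str) -> Optional[str]:
    """Look up what an installed distribution declares for `package`."""
    try:
        return requirement_for(metadata.requires(dist), package)
    except metadata.PackageNotFoundError:
        return None


# Illustrative requirement list, not copied from axolotl's metadata.
example_requires = [
    "torch==2.8.0",
    "transformers-stream-generator",
    "transformers==5.5.0",
]
pin = requirement_for(example_requires, "transformers")
```

Inside the image, `declared_requirement("axolotl", "transformers")` would report the declared pin, which can then be compared with the version actually installed.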
61 changes: 49 additions & 12 deletions saturn-python-llm/environment.yml
@@ -1,15 +1,15 @@
 name: saturn
 channels:
-  - pytorch
-  - nvidia
   - conda-forge
-  - defaults
+  - nodefaults
 dependencies:
-  - python=3.11
-  - cuda-toolkit
-  - pytorch
-  - pytorch-cuda
-  - transformers
+  - python=3.12
+  # vLLM 0.11 boots only on transformers 4.x: transformers 5.x removed
+  # tokenizer.all_special_tokens_extended, which vLLM 0.11 reads at startup,
+  # so 5.x triggers AttributeError -> CrashLoopBackOff. axolotl 0.16.1 runs
+  # fine on transformers 4.57.x at runtime (its metadata pins 5.5.0, but that
+  # is over-strict; see the note in the pip: block. axolotl itself is installed
+  # --no-deps in the Dockerfile).
+  - transformers>=4.55,<5
   - tokenizers
   - numpy
   - psutil
@@ -32,15 +32,52 @@ dependencies:
   - ipykernel
   - pip
   - pip:
+    - --extra-index-url https://download.pytorch.org/whl/cu129
+    - torch
+    - torchvision
+    - torchaudio
+    # Fine-tuning stack. unsloth pulls trl/peft/datasets transitively, but we
+    # declare them explicitly so the image's training API surface is stable
+    # against unsloth version bumps. The Token Factory fine-tune training
+    # script (separate repo) imports trl.SFTTrainer + peft directly.
     - unsloth
-    - vllm
+    - trl
+    - peft
+    - datasets
+    # Held below transformers 5 to match the conda transformers pin above (the
+    # pip: block is run through pip by `mamba env update`, so repeat the bound
+    # here to stop pip backtracking to 5.x).
+    - transformers>=4.55,<5
+    # axolotl 0.16.1 transitive deps. axolotl itself is installed separately in
+    # the Dockerfile with --no-deps, because its metadata carries an over-strict
+    # `transformers==5.5.0` pin that would otherwise drag transformers 5.x back
+    # in and break vLLM 0.11 at boot. axolotl runs fine on transformers 4.57.x;
+    # its real transitive deps are declared here (and via unsloth/vllm/trl/peft).
+    # NOTE: --no-deps cannot be scoped to a single entry inside this block (pip
+    # applies it to the whole `pip install` invocation), so axolotl is pulled
+    # out into its own `pip install --no-deps axolotl==0.16.1` Dockerfile step.
+    - liger-kernel==0.7.0
+    - lm_eval==0.4.11
+    - fla-core==0.4.1
+    - flash-linear-attention==0.4.1
+    - torchao==0.17.0
+    - optimum==1.16.2
+    - trackio>=0.16.1
+    - schedulefree==1.4.1
+    - axolotl-contribs-lgpl==0.0.7
+    - axolotl-contribs-mit==0.0.6
+    - openenv-core==0.1.0
+    - mistral-common==1.11.0
+    - modal==1.3.0.post1
+    # Pinned: vllm walks back through versions otherwise, because axolotl 0.16.1
+    # forces torch==2.8.0 and only 0.10–0.11 satisfy that. Pin to keep CI's pip from
+    # backtracking past wheel-only releases into 0.5.x sdists (which need nvcc).
+    - vllm==0.11.0
     - ray
     - sentence-transformers
     - accelerate
     - bitsandbytes
-    - auto-gptq
-    - autoawq
-    - https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2%2Bcu12torch2.7cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
+    - https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
     - xformers
     - gpustat
     - nvidia-ml-py
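The `transformers>=4.55,<5` bound above is what keeps the serving stack bootable, so it can be worth asserting after the build. The following is a simplified sketch of that check using only the standard library. It compares the leading numeric release segment and is not a full PEP 440 implementation; the function names are hypothetical, and in the image the version string would come from `importlib.metadata.version("transformers")`.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release segment, e.g. '4.57.1' -> (4, 57, 1).

    Simplification: pre/post/dev suffixes and local tags are ignored,
    so '5.0.0rc1' is treated as (5, 0, 0).
    """
    parts = []
    for piece in version.split("+", 1)[0].split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break
        parts.append(int(digits.group()))
        if digits.end() != len(piece):
            break  # stop at a suffix such as 'rc1'
    return tuple(parts)


def satisfies_serving_pin(
    version: str,
    lower: Tuple[int, ...] = (4, 55),
    upper: Tuple[int, ...] = (5,),
) -> bool:
    """True when lower <= version < upper, mirroring '>=4.55,<5'."""
    return lower <= release_tuple(version) < upper


ok_for_serving = satisfies_serving_pin("4.57.1")
too_new = satisfies_serving_pin("5.5.0")
```

Tuple comparison does the work here: `(5, 5, 0)` sorts after `(5,)`, so any 5.x release falls outside the bound.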
16 changes: 6 additions & 10 deletions saturn-python-pytorch/environment.yml
@@ -1,15 +1,11 @@
 name: saturn
 channels:
-  - pytorch
-  - rapidsai
-  - nvidia
-  - nodefaults
   - conda-forge
+  - nodefaults
 dependencies:
+  - python=3.13
   - blas=*=mkl
   - bokeh
-  - pytorch::pytorch-cuda=12.1
-  - dask-cuda
   - dask
   - fastai
   - fsspec
@@ -23,15 +19,15 @@ dependencies:
   - py-opencv
   - pyarrow
   - python-graphviz
-  - python
-  - pytorch::pytorch
   - s3fs
   - setuptools
   - tensorboard
-  - pytorch::torchaudio
-  - pytorch::torchvision
   - pynvml
   - pip:
+    - --extra-index-url https://download.pytorch.org/whl/cu129
+    - torch
+    - torchvision
+    - torchaudio
     - dask-saturn
     - saturn-client
     - saturnfs
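With torch now coming from the PyPI-style cu129 index instead of the pytorch conda channel, one quick sanity check is that the installed build tag agrees with the index in environment.yml. The sketch below assumes the installed version string carries a local tag such as `+cu129` (as `torch.__version__` typically does for wheels from that index); both function names are hypothetical, and the version strings are examples, not output from this image.

```python
import re
from typing import Optional


def cuda_build_tag(torch_version: str) -> Optional[str]:
    """Extract a 'cuNNN' local tag from a version like '2.8.0+cu129'."""
    _, plus, local = torch_version.partition("+")
    if not plus:
        return None  # no local tag at all, e.g. a default PyPI build
    match = re.fullmatch(r"cu\d+", local)
    return match.group(0) if match else None


def index_url_tag(index_url: str) -> str:
    """Last path segment of the index URL, e.g. '.../whl/cu129' -> 'cu129'."""
    return index_url.rstrip("/").rsplit("/", 1)[-1]


INDEX_URL = "https://download.pytorch.org/whl/cu129"
tags_agree = cuda_build_tag("2.8.0+cu129") == index_url_tag(INDEX_URL)
```

A `None` result or a mismatch suggests pip resolved torch from a different index than intended.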
7 changes: 3 additions & 4 deletions saturn-python-rapids/environment.yml
@@ -6,10 +6,9 @@ channels:
   - conda-forge
 dependencies:
   - bokeh
-  - cuda-version=12.0
+  - cuda-version=12.9
   - cvxpy
   - dask-ml
-  - dask-sql
   - dask
   - ipykernel
   - ipywidgets
@@ -21,8 +20,8 @@ dependencies:
   - prefect
   - pyarrow
   - python-graphviz
-  - python
-  - rapids
+  - python=3.13
+  - rapids>=26.02
   - s3fs
   - scikit-learn
   - scipy
2 changes: 1 addition & 1 deletion saturn-python-tensorflow/environment.yml
@@ -17,7 +17,7 @@ dependencies:
   - pip
   - prefect
   - pyarrow
-  - python
+  - python=3.13
   - python-graphviz
   - s3fs
   - setuptools
20 changes: 20 additions & 0 deletions saturn-python-vllm/Dockerfile
@@ -0,0 +1,20 @@
+ARG SATURNBASE_GPU_IMAGE
+FROM ${SATURNBASE_GPU_IMAGE}
+
+RUN sudo apt-get -qq --allow-releaseinfo-change update && \
+    sudo apt-get -qq install --yes --no-install-recommends \
+        libgl1
+
+COPY environment.yml /tmp/environment.yml
+# vLLM serving image. No axolotl here (it lives in saturn-python-axolotl), so the
+# former --no-deps axolotl install step is gone and a plain env update suffices.
+RUN mamba env update -n saturn --file /tmp/environment.yml && \
+    ${CONDA_DIR}/envs/saturn/bin/python -m ipykernel install \
+        --name python3 \
+        --display-name 'saturn (Python 3)' \
+        --prefix=${CONDA_DIR} && \
+    ${CONDA_DIR}/bin/conda clean -afy && \
+    find ${CONDA_DIR} -type f,l -name '*.pyc' -delete && \
+    find ${CONDA_DIR} -type f,l -name '*.a' -delete && \
+    find ${CONDA_DIR} -type f,l -name '*.js.map' -delete
+RUN echo '' > ${CONDA_DIR}/envs/saturn/conda-meta/history
9 changes: 9 additions & 0 deletions saturn-python-vllm/Makefile
@@ -0,0 +1,9 @@
+include .env_deps
+export
+
+build_image:
+	docker build \
+		--no-cache \
+		--build-arg SATURNBASE_GPU_IMAGE=${SATURNBASE_GPU_IMAGE} \
+		-t ${IMAGE} \
+		.
62 changes: 62 additions & 0 deletions saturn-python-vllm/environment.yml
@@ -0,0 +1,62 @@
+name: saturn
+channels:
+  - conda-forge
+  - nodefaults
+dependencies:
+  - python=3.12
+  # INFERENCE/SERVING image (vLLM). Split out from the former saturn-python-llm
+  # (which tried to be one image for both training and serving). vLLM 0.11 boots
+  # only on transformers 4.x: transformers 5.x removed
+  # tokenizer.all_special_tokens_extended, which vLLM 0.11 reads at startup, so
+  # 5.x -> AttributeError -> CrashLoopBackOff. Fine-tuning (axolotl, which needs
+  # the transformers 5.x API) now lives in saturn-python-axolotl, so this image
+  # is free to pin transformers<5 without breaking training.
+  - transformers>=4.55,<5
+  - tokenizers
+  - numpy
+  - psutil
+  - pydantic
+  - fastapi
+  - uvicorn
+  - aiohttp
+  - requests
+  - typing-extensions
+  - packaging
+  - filelock
+  - matplotlib
+  - pandas
+  - seaborn
+  - tqdm
+  - click
+  - rich
+  - tensorboard
+  - wandb
+  - ipykernel
+  - pip
+  - pip:
+    - --extra-index-url https://download.pytorch.org/whl/cu129
+    - torch
+    - torchvision
+    - torchaudio
+    # Held below transformers 5 to match the conda pin above (the pip: block is
+    # run through pip by `mamba env update`, so repeat the bound to stop pip
+    # backtracking to 5.x).
+    - transformers>=4.55,<5
+    # vLLM serving stack.
+    - vllm==0.11.0
+    - ray
+    - sentence-transformers
+    # peft so vLLM can load the LoRA adapters Token Factory fine-tunes produce.
+    - peft
+    - accelerate
+    - bitsandbytes
+    - https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3%2Bcu12torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
+    - xformers
+    - gpustat
+    - nvidia-ml-py
+    - huggingface-hub
+    - black
+    - isort
+    - mypy
+    - pytest
+    - saturn-client
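The failure mode described in the comments above (a tokenizer attribute that vLLM 0.11 reads at startup being absent on transformers 5.x) can be caught with a preflight check before the server is started. The sketch below works on any object and uses stand-in classes so it runs without transformers installed. The attribute name comes from the comment in this file; the helper and the two classes are hypothetical.

```python
from typing import Iterable, List

# Attribute named in the environment.yml comment above.
REQUIRED_TOKENIZER_ATTRS = ("all_special_tokens_extended",)


def missing_tokenizer_attrs(
    tokenizer: object,
    required: Iterable[str] = REQUIRED_TOKENIZER_ATTRS,
) -> List[str]:
    """Return the required attribute names the tokenizer does not expose."""
    return [name for name in required if not hasattr(tokenizer, name)]


class LegacyStyleTokenizer:
    """Stand-in for a tokenizer that still exposes the attribute."""
    all_special_tokens = ["<s>", "</s>"]
    all_special_tokens_extended = ["<s>", "</s>"]


class NewStyleTokenizer:
    """Stand-in for a tokenizer where the attribute has been removed."""
    all_special_tokens = ["<s>", "</s>"]


legacy_missing = missing_tokenizer_attrs(LegacyStyleTokenizer())
new_missing = missing_tokenizer_attrs(NewStyleTokenizer())
```

In a container entrypoint, a non-empty result could be turned into a clear error message instead of letting the pod crash-loop on an AttributeError.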
6 changes: 6 additions & 0 deletions saturn-python-vllm/recipe-template.json
@@ -0,0 +1,6 @@
+{
+  "name": "saturn-python-vllm",
+  "description": "Serving LLMs with vLLM (inference; transformers 4 stack)",
+  "hardware_type": "gpu",
+  "supports": ["jupyterlab", "dask"]
+}
2 changes: 1 addition & 1 deletion saturn-python/environment.yml
@@ -14,7 +14,7 @@ dependencies:
   - pandas
   - pip
   - pyarrow
-  - python=3.11
+  - python=3.13
   - python-graphviz
   - s3fs
   - scikit-learn