Yhl/llm example#12
0cc65a7 to d05fb97
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
[🟠 High] [🔵 Bug]
The worker is explicitly schedulable on any GPU, but the pipeline forces torch.bfloat16. On GPU types that do not support BF16, requests can fail at model load time, so the endpoint fails intermittently depending on where the job lands. Use an adaptive dtype so the worker stays portable across heterogeneous GPU assignments.
# 02_ml_inference/01_text_generation/gpu_worker.py
gpu_config = LiveServerless(
    gpus=[GpuGroup.ANY],  # Run on any GPU
)
...
    torch_dtype=torch.bfloat16,
Suggested change:
-    torch_dtype=torch.bfloat16,
+    torch_dtype="auto",
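A minimal sketch of what an adaptive dtype could look like, in case "auto" is not enough: as far as I know, transformers resolves torch_dtype="auto" from the checkpoint itself, so a model saved in BF16 may still load as BF16. The helper names pick_dtype_name and detect_dtype_name are hypothetical and not part of this PR; torch.cuda.is_available() and torch.cuda.is_bf16_supported() are existing PyTorch calls. The FP16 fallback is an assumption about what the model tolerates.

```python
def pick_dtype_name(cuda_available: bool, bf16_supported: bool) -> str:
    """Return the name of a torch dtype suited to the detected hardware.

    Hypothetical helper: returns a name so the decision logic can be
    exercised without a GPU or even without torch installed.
    """
    if not cuda_available:
        # CPU-only fallback; full precision is the safest default.
        return "float32"
    # Prefer BF16 where the GPU supports it, otherwise fall back to FP16.
    return "bfloat16" if bf16_supported else "float16"


def detect_dtype_name() -> str:
    """Probe the current machine and return a dtype name."""
    try:
        import torch
    except ImportError:
        # torch is missing; treat the machine as CPU-only.
        return pick_dtype_name(False, False)
    cuda_available = torch.cuda.is_available()
    # Only query BF16 support when a CUDA device is actually present.
    bf16_supported = cuda_available and torch.cuda.is_bf16_supported()
    return pick_dtype_name(cuda_available, bf16_supported)


if __name__ == "__main__":
    print(detect_dtype_name())
```

Inside the worker this could be passed as torch_dtype=getattr(torch, detect_dtype_name()), evaluated inside the remote function so the probe runs on the GPU the job actually landed on.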
from fastapi import APIRouter
from pydantic import BaseModel

from tetra_rp import (
[🟠 High] [🟡 Investigate]
This new example imports tetra_rp directly, but the repository's documented base install is centered on runpod-flash: uv sync from the repo root installs that dependency set. A clean environment may therefore hit ModuleNotFoundError when flash run imports this module. Verify by running the repo quick-start flow in a fresh venv. If it fails, either migrate this example to the supported runpod_flash API or add and document project-level dependency installation for tetra_rp.
# 02_ml_inference/01_text_generation/gpu_worker.py
from tetra_rp import (
    GpuGroup,
    LiveServerless,
    remote,
)
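A quick way to run the check this comment asks for, sketched with the standard library only: after uv sync in a fresh venv, probe whether each module name resolves without importing it. The function name check_modules is hypothetical; the module names tetra_rp and runpod_flash are taken from the comment above.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if it can be found.

    find_spec locates the module without executing it, so this is safe
    to run even if the package has heavy import-time side effects.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in check_modules(["tetra_rp", "runpod_flash"]).items():
        print(f"{name}: {'importable' if present else 'MISSING'}")
```

If tetra_rp is reported as missing in the clean environment, flash run would be expected to fail on this example's import line.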
Description
Brief description of what this PR adds or fixes.
Type of Change
Example Category
If adding a new example, which category does it belong to?
Checklist
Functionality
flash run
Code Quality
Documentation
.env.example file provided
Dependencies
requirements.txt pyproject.toml included with project metadata
Testing
python -m py_compile)
Security
What This Example Demonstrates
List the key concepts or patterns this example demonstrates:
Testing Instructions
How should reviewers test this example?
Screenshots/Output (if applicable)
Add screenshots or example output if relevant.
Additional Context
Any additional information reviewers should know about this PR.
Related Issues
Closes #(issue number)