Open
Labels
sig/k8s-infra: Categorizes an issue or PR as relevant to SIG K8s Infra.
Description
AI Conformance is designing an automated testing framework for Kubernetes AI Conformance so that we can move away from our current process of manual self-attestation (see our design doc).
As part of this effort, we plan to set up CI jobs (presubmits and periodics) that require access to specialized hardware. Specifically, we need small quotas for accelerators (e.g., nvidia-tesla-t4 or L4) to verify basic functionality (e.g., secure accelerator access) without incurring high costs.
I found existing GPU jobs in GCP and AWS. Should our new AI conformance jobs share the approach and resources those jobs use, or should we provision new ones?
Cost estimate for accelerators:
| Variable | Example Value (T4 GPU) |
|---|---|
| Cost per Hour | ~$0.35 (on-demand) or ~$0.11 (preemptible/spot) / instance |
| Job Frequency | 1 periodic + ~0.1 presubmit / day |
| Duration per Job | ~30 min |
| Monthly Estimate | ($0.11/hr * 2 instances * 1.1 jobs/day * 0.5 hours/job) * 30 days = ~$3.63 / month (note: this assumes 2 spot instances per job; the duration per job might increase as we add more AI conformance tests) |
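
For anyone who wants to rerun the estimate with different numbers, here is a minimal Python sketch of the same arithmetic. The function and constant names are made up for illustration; the input values come from the table, and the on-demand figure is simply the same formula applied to the ~$0.35/hr price, not a separately sourced number.

```python
def monthly_cost(price_per_hour, instances, jobs_per_day, hours_per_job, days=30):
    """Estimated monthly accelerator cost in USD (hypothetical helper)."""
    return price_per_hour * instances * jobs_per_day * hours_per_job * days


# Inputs taken from the table above (T4 GPU example values).
SPOT_PRICE = 0.11        # USD per instance-hour, preemptible/spot
ON_DEMAND_PRICE = 0.35   # USD per instance-hour, on-demand
INSTANCES = 2            # instances per job, as used in the monthly estimate
JOBS_PER_DAY = 1.1       # 1 periodic + ~0.1 presubmit
HOURS_PER_JOB = 0.5      # ~30 min

spot = monthly_cost(SPOT_PRICE, INSTANCES, JOBS_PER_DAY, HOURS_PER_JOB)
on_demand = monthly_cost(ON_DEMAND_PRICE, INSTANCES, JOBS_PER_DAY, HOURS_PER_JOB)

print(f"spot:      ${spot:.2f} / month")       # spot:      $3.63 / month
print(f"on-demand: ${on_demand:.2f} / month")  # on-demand: $11.55 / month
```

Doubling the job duration or the presubmit rate scales the result linearly, so the estimate should stay small even as more AI conformance tests are added.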
@ritazh @mfahlandt @terrytangyuan @dims @BenTheElder @ameukam