Resource Request: Accelerator quotas for AI Conformance CI jobs #9145

AI Conformance is designing an automated testing framework for Kubernetes AI Conformance to replace our current manual self-attestation process (see our [design doc](https://docs.google.com/document/d/1nBhbDygCsefKhJciM5QmW3ZsGpec8E0p8JiwYkKvM0A)).

As part of this effort, we plan to set up CI jobs (presubmits and periodics) that require access to specialized hardware. Specifically, we need small accelerator quotas (e.g., nvidia-tesla-t4 or L4) to verify basic functionality, such as secure accelerator access, without incurring high costs.

I found existing GPU jobs for [GCP](https://github.com/kubernetes/test-infra/tree/master/config/jobs/kubernetes/sig-cloud-provider/gcp/gpu) and [AWS](https://github.com/kubernetes/test-infra/blob/0ad2b8be3f39209fda61f8dcf4b529658f609635/config/jobs/kubernetes/sig-cloud-provider/aws/ec2-e2e.yaml#L244-L379). Should the new AI conformance jobs reuse the approach and resources of those jobs, or should we provision new ones?

Cost estimate for accelerators:

Variable | Example Value (T4 GPU)
-- | --
Cost per Instance-Hour | ~$0.35 (on-demand) or ~$0.11 (preemptible/spot)
Instances per Job | 2
Job Frequency | 1 periodic + ~0.1 presubmit per day (~1.1 jobs/day)
Duration per Job | ~30 min (0.5 hours)
Monthly Estimate | ($0.11/hr × 2 instances × 1.1 jobs/day × 0.5 hours) × 30 days = **~$3.63** / month _(note: this assumes spot instances; the duration per job may grow as we add more AI conformance tests)_
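
To make the estimate easy to re-run as assumptions change (longer jobs, more presubmits, on-demand instead of spot), here is a minimal sketch of the arithmetic in the table. The function name `monthly_cost` is hypothetical, the prices are the approximate example values above rather than quoted rates, and the on-demand figure is simply the same formula applied to the ~$0.35/hr rate.

```python
# Minimal sketch of the monthly accelerator cost estimate from the table above.
# Prices are the table's approximate example values, not quoted cloud rates.
# `monthly_cost` is a hypothetical helper, not part of any existing tooling.


def monthly_cost(
    price_per_instance_hour: float,
    instances_per_job: int,
    jobs_per_day: float,
    hours_per_job: float,
    days_per_month: int = 30,
) -> float:
    """Return the estimated accelerator spend per month in USD."""
    # Total accelerator instance-hours consumed per day across all jobs.
    instance_hours_per_day = instances_per_job * jobs_per_day * hours_per_job
    return price_per_instance_hour * instance_hours_per_day * days_per_month


if __name__ == "__main__":
    jobs_per_day = 1 + 0.1  # 1 periodic + ~0.1 presubmit per day
    spot = monthly_cost(0.11, instances_per_job=2, jobs_per_day=jobs_per_day, hours_per_job=0.5)
    on_demand = monthly_cost(0.35, instances_per_job=2, jobs_per_day=jobs_per_day, hours_per_job=0.5)
    print(f"spot: ${spot:.2f}/month")
    print(f"on-demand: ${on_demand:.2f}/month")
```

With the table's inputs this reproduces the ~$3.63/month spot figure; the same inputs at the on-demand rate come to roughly $11.55/month.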

@ritazh @mfahlandt @terrytangyuan @dims @BenTheElder @ameukam 