crunr documentation
Run any compute job on a GPU with one command — on crunr cloud (no AWS account needed) or your own AWS EC2. Zero idle cost. Zero ops.
01 Requirements
# crunr cloud
| Requirement | Notes |
|---|---|
| Python | 3.10+ |
| crunr cloud account | Sign up and create an API key at cloud.crunr.com |
| Internet connection | For upload, log streaming, and output download |
# AWS EC2
| Requirement | Min version | Notes |
|---|---|---|
| Python | 3.10 | Required for installation via pip |
| ssh | any | Must be in your PATH |
| rsync | any | Must be in your PATH |
| AWS account | — | Active account with an IAM access key |
02 Installation
Install crunr from PyPI:
pip install crunr
Verify the installation:
crunr --version
To upgrade to the latest version:
pip install --upgrade crunr
03 Quick Start
$ Option A — crunr cloud (no AWS needed)
# 1. Log in with your API key (from cloud.crunr.com)
crunr cloud login
# 2. Make cloud the default backend (optional but convenient)
crunr use cloud
# 3. Run a script on a GPU
crunr run train.py --instance a100 --max-hours 4
# 4. Inspect and retrieve
crunr jobs # list your cloud jobs
crunr pull <JOB_ID> # download outputs
crunr balance # check credits
$ Option B — AWS EC2 (your own account)
# 1. Connect your AWS account
crunr auth
# 2. Run a script on a GPU instance
crunr run train.py --gpu
# 3. Check run history
crunr jobs
crunr auth prompts for your AWS Access Key ID, Secret Access Key, and region, writes them to ~/.aws/credentials, and verifies them with a live STS call.
04 Contexts — AWS vs crunr cloud
crunr has two backends. The active context decides which one context-sensitive commands target.
| | AWS context (default) | crunr cloud context |
|---|---|---|
| Set with | crunr use aws | crunr use cloud |
| Compute | EC2 in your AWS account | Managed GPUs on crunr cloud |
| Credentials | crunr auth (AWS keys) | crunr cloud login (API key) |
| Billing | You pay AWS directly | crunr credits (pay-as-you-go) |
Context-sensitive commands (route to the active backend):
run · jobs · logs · pull · share · cancel · balance
Always-explicit commands:
| Scope | Commands |
|---|---|
| AWS-only | crunr auth, crunr ps, crunr ssh, crunr clean, crunr s3 … |
| Cloud-only | crunr cloud … (e.g. crunr cloud jobs, crunr cloud balance) |
The active context is stored in ~/.crunr/config.json. The default is aws.
crunr use cloud # run/jobs/logs/pull/share/cancel/balance → crunr cloud
crunr use aws # back to AWS EC2 / S3
05 How It Works
# crunr cloud
Step 1 — create job → POST /jobs returns a job ID + one-time upload URL
Step 2 — upload code → directory bundled (.tar.gz, minus .crunrignore) and uploaded
Step 3 — confirm → SHA-256 verified; job is queued
Step 4 — run → GPU provisioned, script runs, logs stream live (SSE)
Step 5 — collect → outputs stored in crunr cloud; pull any time
# AWS EC2
Phase 1 — local setup → select instance, look up AMI, create security group + SSH key
Phase 2 — provision → launch EC2 (on-demand by default, or --spot), wait for running
Phase 3 — sync → wait for SSH, rsync directory up, install requirements.txt
Phase 4 — execute → run your script autonomously; stream stdout with live cost meter
Phase 5 — collect → download outputs/ from S3 (if enabled) or rsync from instance
06 Command Reference
Context-sensitive commands route to whichever backend is active (crunr use aws or crunr use cloud). AWS-only and cloud-only commands are labeled below.
$ crunr use
Switch the active context. Persists to ~/.crunr/config.json.
crunr use <aws|cloud>
crunr use cloud # target crunr cloud GPUs
crunr use aws # target AWS EC2 (default)
$ crunr run
Provision, execute, and clean up — the core command. Routes to crunr cloud or AWS based on the active context (or --cloud).
crunr run [SCRIPT] [OPTIONS]
Flags available in both contexts
| Flag / Argument | Default | Description |
|---|---|---|
| SCRIPT | — | Path to a .py or .sh script, or a shell command in quotes. On AWS, omit for an interactive bash session; on crunr cloud a script is required. |
| --gpu | off | Request a GPU. On crunr cloud this is always implied. |
| --memory GB | — | Minimum GPU VRAM in GB. On cloud, selects a GPU type that meets the requirement. |
| --env KEY=VALUE | — | Set an environment variable in the job. Repeat for multiple. |
| --dir PATH | . | Local directory to sync / upload (default: current directory). |
crunr cloud context flags
| Flag | Default | Description |
|---|---|---|
| --instance GPU | rtx4090 | GPU type: a100, h100, rtx4090, rtx3090, l40s, and more — see GPU Types. Case-insensitive, separators ignored. |
| --max-hours N | 2 | Hard time limit. Job auto-stops at this limit. Up to your account cap. |
| --cloud | off | Force this single run onto crunr cloud, ignoring AWS context. |
AWS context flags
| Flag | Default | Description |
|---|---|---|
| --instance TYPE | — | Exact EC2 instance type (e.g. g5.xlarge). Overrides --gpu/--memory. |
| --spot | off | Use spot pricing (60–90% cheaper, may be interrupted). Default is on-demand. |
| --disk GB | auto | Root EBS volume size. Defaults: 8 GB CPU, 150 GB GPU. |
| --profile NAME | default | AWS credential profile. |
| --region REGION | from profile | Override the AWS region for this run. |
| --s3 | off | Back up outputs to S3 using saved config. Run crunr s3 setup first. |
| --s3-bucket NAME | — | S3 bucket for outputs. Created if missing. Implies --s3. |
| --s3-prefix PREFIX | crunr-jobs | Key prefix inside the bucket. |
| --s3-no-local | off | Skip local download when S3 backup succeeds. |
| --s3-ttl DAYS | — | Auto-delete this job's S3 data after N days. |
Examples — crunr cloud
crunr run train.py --cloud # cheapest default GPU (RTX 4090)
crunr run train.py --cloud --instance a100 # specific GPU
crunr run train.py --instance h100 --max-hours 8 # after: crunr use cloud
crunr run train.py --memory 40 # pick a GPU with ≥40 GB VRAM
crunr run train.py --env EPOCHS=50 --env LR=0.001
crunr run train.py --dir ~/projects/my-model
Examples — AWS
crunr run train.py --gpu # cheapest GPU, on-demand
crunr run train.py --gpu --spot # spot pricing
crunr run train.py --gpu --memory 24 # ≥24 GB VRAM
crunr run train.py --instance p3.2xlarge
crunr run preprocess.py # CPU-only
crunr run "python -c 'import torch; print(torch.__version__)'" --gpu
crunr run --gpu # interactive GPU shell
$ crunr cloud login
Save your crunr cloud API key. The key is verified with a live API call before being saved to ~/.crunr/cloud.json (owner-only permissions).
crunr cloud login [--key KEY] [--api-base URL]
| Flag | Description |
|---|---|
| --key KEY | API key (prompted securely if omitted). Must start with crunr_sk_. |
| --api-base URL | Override the API endpoint (default: https://api.crunr.com). |
crunr cloud login # prompts for the key
crunr cloud login --key crunr_sk_xxxxx
$ crunr cloud jobs / crunr jobs
List your crunr cloud jobs, newest first. In cloud context, crunr jobs is the same command; in AWS context it shows your local AWS run history instead.
crunr cloud jobs [--limit N]
crunr jobs [--limit N] # cloud context
| Flag | Default | Description |
|---|---|---|
| --limit N | 20 | Maximum number of jobs to show. |
The table shows job ID, status, GPU, command, relative creation time, and GPU time used.
$ crunr cloud logs / crunr logs
Stream live logs for a cloud job over Server-Sent Events. Reconnects automatically on timeout. Use after detaching from a long job.
crunr cloud logs <JOB_ID>
crunr logs <JOB_ID> # cloud context
$ crunr cloud pull / crunr pull
Download a cloud job's output files to your machine.
crunr cloud pull <JOB_ID> [--dest DIR]
crunr pull <JOB_ID> [--dest DIR] # cloud context
| Flag | Default | Description |
|---|---|---|
| --dest DIR | ./crunr-outputs/<JOB_ID>/ | Local destination directory. |
Files land under <dest>/outputs/. In AWS context, crunr pull downloads from S3 (see S3 Persistence for extra flags).
$ crunr cloud cancel / crunr cancel
Cancel a job that is QUEUED, PROVISIONING, SETUP, or RUNNING.
crunr cloud cancel <JOB_ID>
crunr cancel <JOB_ID> # cloud context
$ crunr cloud balance / crunr balance
Show your crunr cloud credit balance and storage usage.
crunr cloud balance
crunr balance # cloud context
Output includes:
| Field | Meaning |
|---|---|
| Available | Credits you can spend right now. |
| Reserved | Credits locked by currently running jobs (released at settlement). |
| Balance | Available + Reserved. |
| Spent | Lifetime credits consumed. |
| Storage | GB used, free allowance, and any overage with projected monthly cost. |
Top up at cloud.crunr.com/dashboard/billing.
$ crunr auth (AWS only)
Configure AWS credentials. Interactive wizard on first use; updates existing profiles on subsequent calls.
crunr auth [PROFILE] [--list] [--verify [PROFILE]] [--default PROFILE]
| Flag / Argument | Description |
|---|---|
| PROFILE | Profile name to create or update. Defaults to default. |
| --list, -l | List all configured profiles with regions and masked key IDs. |
| --verify [PROFILE] | Test that credentials are valid with a live STS call. |
| --default PROFILE | Promote an existing profile to be the default. |
crunr auth # first-time setup
crunr auth work # add a named profile
crunr auth --list
crunr auth --verify work
crunr auth --default work
$ crunr ssh (AWS only)
Open a shell on a crunr instance that is already running a job. Disconnecting does not terminate the instance; the job keeps running.
crunr ssh [INSTANCE_ID] [--profile NAME] [--region REGION]
| Flag / Argument | Description |
|---|---|
| INSTANCE_ID | EC2 instance ID (e.g. i-0abc1234def). Optional — auto-selected when exactly one crunr instance is running. |
| --profile NAME | AWS credential profile. |
| --region REGION | AWS region. |
crunr ssh # auto-connect if one instance is running
crunr ssh i-0abc1234def
# inside: nvidia-smi · tail -f /tmp/crunr-*.log · htop · df -h
$ crunr ps (AWS only)
List crunr instances currently running in AWS. Useful after a crash or a run interrupted with Ctrl+C.
crunr ps [--profile NAME] [--region REGION]
$ crunr clean (AWS only)
Terminate every EC2 instance tagged ManagedBy=crunr. Use to clean up orphaned instances after a crash.
crunr clean [--profile NAME] [--region REGION] [--yes]
| Flag | Description |
|---|---|
| --profile NAME | AWS credential profile. |
| --region REGION | AWS region to sweep. |
| --yes, -y | Skip the confirmation prompt. |
crunr clean # interactive — asks before terminating
crunr clean --yes # non-interactive — use in scripts
$ crunr s3 (AWS only)
Manage S3 output persistence. All subcommands accept --profile and --region.
crunr s3 <subcommand> [OPTIONS]
$ crunr s3 setup
Create an S3 bucket, apply security hardening, set up the IAM instance profile, and save the config to ~/.crunr/config.json.
crunr s3 setup --bucket crunr-yourname-outputs
crunr s3 setup --bucket crunr-myproject --prefix crunr-jobs --ttl 90
| Flag | Description |
|---|---|
| --bucket NAME | Bucket name (prompted if omitted). Must be globally unique. |
| --prefix PREFIX | Key prefix (default: crunr-jobs). |
| --ttl DAYS | Lifecycle rule to expire job data after N days. |
$ crunr s3 list
List jobs stored in S3, newest first.
crunr s3 list
crunr s3 list --bucket other-bucket
$ crunr s3 pull
Download a job's outputs, log, and metadata from S3.
crunr s3 pull <JOB_ID>
crunr s3 pull <JOB_ID> --dest ~/recovered/
crunr s3 pull <JOB_ID> --outputs-only
crunr s3 pull <JOB_ID> --no-log
| Flag | Description |
|---|---|
| --dest DIR | Local destination. Default: ./crunr-<JOB_ID>/ |
| --outputs-only | Download only outputs/; skip stdout.log and metadata.json. |
| --no-log | Skip downloading stdout.log. |
$ crunr s3 status
Show the saved S3 config and total storage usage.
crunr s3 status
$ crunr s3 rm
Permanently delete all S3 objects for a specific job.
crunr s3 rm <JOB_ID>
crunr s3 rm <JOB_ID> --yes
07 crunr cloud GPU Types
Pass any of these to --instance (case-insensitive; separators ignored). Live availability and pricing are shown in the picker when you submit. Run crunr cloud for the live catalog.
| Family | --instance values |
|---|---|
| Consumer GeForce | rtx3070, rtx3080, rtx3080ti, rtx3090, rtx3090ti, rtx4070ti, rtx4080, rtx4080s, rtx4090, rtx5080, rtx5090 |
| Ada / Blackwell workstation | rtx2000ada, rtx4000ada, rtx5000ada, rtx6000ada, rtxpro4000, rtxpro4500, rtxpro5000, rtxpro6000 |
| Ampere workstation | rtxa2000, rtxa4000, rtxa4500, rtxa5000, rtxa6000 |
| Data center | a40, a100 (40 GB), a10080gb, a100sxm80, l4, l40, l40s, h100, h100pcie, h100nvl, h200, h200nvl, b200, b300, mi300x, v100, v100sxm |
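The matching rule stated above (case-insensitive, separators ignored) can be pictured as a small normalization step before the catalog lookup. The sketch below is only an illustration of that idea, not crunr's actual implementation: `normalize_gpu_name`, `resolve_gpu`, and the `CATALOG` subset are hypothetical names, and the exact set of characters treated as separators is an assumption.

```python
import re

# Hypothetical subset of the catalog above, for illustration only.
CATALOG = {"rtx4090", "rtx3090", "a100", "a10080gb", "l40s", "h100", "h100pcie"}


def normalize_gpu_name(name: str) -> str:
    """Lowercase the name and drop spaces, hyphens, underscores, and dots."""
    return re.sub(r"[\s\-_.]+", "", name.lower())


def resolve_gpu(name: str) -> str:
    """Return the catalog key for a user-supplied name, or raise ValueError."""
    key = normalize_gpu_name(name)
    if key not in CATALOG:
        raise ValueError(f"unknown GPU type: {name!r}")
    return key
```

Under these assumptions, `RTX-4090`, `rtx 4090`, and `rtx4090` would all resolve to the same catalog entry.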
# Choosing the right GPU for your workload
Two things decide whether a job runs and how fast: VRAM (does the model + batch fit?) and throughput (how fast the GPU computes). VRAM is the hard constraint — too little and you get CUDA out of memory; everything else is a speed/price trade-off.
Rule of thumb: inference needs roughly param count × 2 bytes (fp16); training needs 3–4× that (gradients + optimizer state). A 7B model is ~14 GB just to load — plan for ~40 GB+ to fine-tune it.
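As a rough aid, the rule of thumb above can be written as a few lines of arithmetic. This is a back-of-the-envelope sketch, not a sizing tool: `estimate_vram_gb` is a hypothetical helper, it treats 1 GB as 10^9 bytes, and the default multiplier of 3 is simply the lower end of the 3–4× range quoted above. Activations, batch size, and framework overhead are not modeled.

```python
def estimate_vram_gb(params_billion: float,
                     training: bool = False,
                     bytes_per_param: int = 2,
                     training_multiplier: float = 3.0) -> float:
    """Rough VRAM estimate in GB from the rule of thumb above.

    Inference: parameter count x bytes per parameter (2 for fp16).
    Training:  the inference figure x training_multiplier (3-4 per the text).
    """
    # 1e9 parameters x N bytes each is N GB when 1 GB = 1e9 bytes.
    weights_gb = float(params_billion * bytes_per_param)
    if training:
        return weights_gb * training_multiplier
    return weights_gb
```

For a 7B model this gives about 14 GB to load and about 42 GB to fine-tune with the default multiplier, which lines up with the "~40 GB+" guidance above.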
| Your workload | Good fit | Why |
|---|---|---|
| Learning, small CNNs, CIFAR/MNIST, quick experiments | rtx4090, rtx3090, l4 | Cheapest GPUs; 16–24 GB is plenty; fast to provision |
| Fine-tuning small models (≤3B), Stable Diffusion, most CV | rtx4090, l40s, rtxa6000 | 24–48 GB covers typical batch sizes at a moderate price |
| Fine-tuning / inference of 7B–13B LLMs | a100 (40 GB), a10080gb, l40s | 40–80 GB fits 7–13B in fp16 with headroom for batches |
| Large-model training, 30B+ LLMs, big batches | h100, h200, a100sxm80 | 80 GB+ and the highest memory bandwidth; fastest per-step |
| Newest Blackwell builds (cu130), max single-GPU speed | rtxpro6000, b200 | Newest architecture/driver; needed only if you pin a cu130 build |
| Cheap preprocessing or light GPU steps | l4, rtx3070 | Low $/hr for jobs that barely touch the GPU |
- Start smaller and cheaper. If it fits and finishes in time, you're done — a faster GPU only saves wall-clock, and you pay per second either way.
- Hitting CUDA out of memory? Step up VRAM (--memory 40 → A100, --memory 80 → H100) or reduce batch size.
- H100/H200 have far higher memory bandwidth than an A100 of the same VRAM — worth it for throughput-bound training.
- Consumer cards (RTX 4090) and newest data-center cards can be in short supply; if the picker shows none free, a same-tier alternative (e.g. l40s for rtxa6000) usually is.
- cu128 wheels run on the whole fleet; only pin cu130 if you specifically target the newest cards — see Best Practices §2.
08 crunr cloud Billing & Credits
crunr cloud is pay-as-you-go on prepaid credits.
| Fact | Details |
|---|---|
| Billed only while GPU is live | No idle billing — the meter starts when your job reaches RUNNING and stops when it ends. |
| Transparent pricing | The rate shown in the GPU picker and live cost ticker is exactly what you're charged — no hidden fees. |
| Up-front hold | On submit, crunr reserves credits covering the worst case (roughly max-hours × rate). Unused credits are released at job settlement. You must have at least the hold amount available. |
| --max-hours caps cost | A job is force-stopped at its max-hours limit, so a hung job can't drain your balance. |
crunr balance # Available / Reserved / Balance / Spent
Pre-submit rejections (no GPU is spun up):
| Rejection | Fix |
|---|---|
| Insufficient credits | Hold exceeds your available balance. Top up at cloud.crunr.com/dashboard/billing, lower --max-hours, or pick a cheaper GPU. |
| Max-hours exceeded | --max-hours is above your account limit. Request a higher cap at cloud.crunr.com/dashboard/limits. |
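To estimate before submitting whether a job would clear these two checks, the hold arithmetic can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, the hold is taken as exactly max-hours × rate (the text says "roughly"), the order of the two checks is a guess, and the rates in any call are whatever the GPU picker shows you.

```python
from decimal import Decimal
from typing import Optional


def required_hold(max_hours: int, rate_per_hour: str) -> Decimal:
    """Worst-case credit hold: max-hours x hourly rate (approximation)."""
    return Decimal(max_hours) * Decimal(rate_per_hour)


def presubmit_rejection(available: str, max_hours: int,
                        rate_per_hour: str, account_max_hours: int) -> Optional[str]:
    """Return the name of the rejection that would apply, or None if the job passes.

    Amounts are passed as strings so Decimal keeps them exact.
    """
    if max_hours > account_max_hours:
        return "Max-hours exceeded"
    if required_hold(max_hours, rate_per_hour) > Decimal(available):
        return "Insufficient credits"
    return None
```

With a hypothetical rate of 1.50 credits per hour, a 4-hour job needs 6.00 credits available; lowering `max_hours` or choosing a cheaper GPU shrinks the hold proportionally.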
09 crunr cloud Storage
Job outputs are stored in crunr cloud after each run and can be pulled or shared any time.
| Detail | |
|---|---|
| Free allowance | Each account gets a free storage allowance; usage beyond it is billed per GB-month (shown in crunr balance). |
| Auto-expiry | Outputs auto-expire 30 days after a run. Pull anything you want to keep before then. |
| Downloads are free | crunr pull and crunr share don't cost credits. |
crunr balance # shows GB used / free / overage
crunr cloud pull <JOB_ID> # download before expiry
10 The Code Bundle & .crunrignore
For crunr cloud, your --dir (default: current directory) is bundled into a .tar.gz and uploaded. Large artifacts are excluded by default so uploads stay fast.
# Excluded by default
*.pt *.bin *.h5 *.pkl *.safetensors # model weights
data/ datasets/ *.parquet *.arrow # datasets
.git/ .github/ __pycache__/ *.pyc *.pyo
.venv/ venv/ env/ node_modules/
.DS_Store Thumbs.db outputs/ *.log
# .crunrignore
Add your own patterns in a .crunrignore file (one glob per line; # for comments):
# .crunrignore
checkpoints/
*.npz
scratch/
11 Writing crunr cloud Jobs — Best Practices
A few habits make the difference between a job that “just works” and one that wastes GPU time or silently loses your results. Practice #1 is the most important.
# Recommended project structure
Before running anything, set up your project like this. crunr uploads your --dir (default: current directory) and only collects what lands in outputs/.
my_project/
├── train.py ← entry script (passed to crunr run)
├── requirements.txt ← deps; pin cu128 torch (see §2 below)
├── .crunrignore ← exclude weights, datasets, cache dirs
│
├── src/ ← your modules (uploaded)
│ ├── model.py
│ ├── dataset.py
│ └── utils.py
│
├── configs/ ← hyperparameter YAML / JSON (uploaded)
│ └── config.yaml
│
├── data/ ← ignored by default; download at runtime
│
└── outputs/ ← ⚠️ everything you want to keep goes here
    ├── best_model.pt
    ├── metrics.json
    └── plots/
1 Save everything you want to keep to ./outputs/
crunr only collects the outputs/ directory. Files written anywhere else are discarded when the pod terminates.
This is the single most common way to lose results. Your job can finish perfectly, print “✓ done”, and still leave you with nothing to pull — because the files went to the working directory instead of outputs/.
# ❌ WRONG — these vanish when the pod shuts down
torch.save(model.state_dict(), "best_model.pt")
open("evaluation_results.txt", "w").write(report)
# ✅ RIGHT — these are uploaded to crunr cloud and survive
from pathlib import Path
Path("outputs").mkdir(exist_ok=True)
torch.save(model.state_dict(), "outputs/best_model.pt")
open("outputs/evaluation_results.txt", "w").write(report)
| Backend | Where outputs land |
|---|---|
| crunr cloud | Stored in crunr cloud; retrieve with crunr pull <JOB_ID> → lands in ./crunr-outputs/<JOB_ID>/outputs/ |
| AWS | Downloaded to <your --dir>/outputs/ (and to S3 if --s3 is enabled) |
Pre-submit checklist: every artifact is under outputs/; directory created with Path("outputs").mkdir(exist_ok=True); paths are relative (outputs/model.pt), not absolute (/root/model.pt).
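One way to make the checklist hard to get wrong is to route every write through a single helper. The sketch below is a suggestion, not part of crunr: `output_path` is a hypothetical function that builds a path under outputs/, creates any missing directories, and rejects absolute paths or paths that climb out with `..`.

```python
from pathlib import Path

OUTPUT_DIR = Path("outputs")  # the only directory crunr collects, per the text above


def output_path(relative_name: str, base: Path = OUTPUT_DIR) -> Path:
    """Return base/relative_name, creating parent directories as needed.

    Raises ValueError for absolute paths or paths containing "..",
    since those would land outside outputs/ and be discarded.
    """
    candidate = Path(relative_name)
    if candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError(f"artifact path must stay inside {base}/: {relative_name!r}")
    target = base / candidate
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

In a training script this would be used as `torch.save(model.state_dict(), output_path("best_model.pt"))` or `output_path("plots/loss.png")`, so a stray write to the working directory becomes an error instead of a silently lost file.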
2 Pin a CUDA build in requirements.txt
A plain pip install torch now resolves to a cu130 build that needs a very new driver; most GPUs in the fleet run an older driver, so torch would silently fall back to CPU — paying GPU rates for CPU speed. Pin cu128 (CUDA 12.8), which runs on every GPU in the catalog (V100 through Blackwell):
# requirements.txt
--index-url https://download.pytorch.org/whl/cu128
--extra-index-url https://pypi.org/simple
torch==2.9.1+cu128
torchvision==0.24.1+cu128
3 Print progress — logs stream live
stdout/stderr stream to crunr logs in near real time with line-buffering forced on. Plain print(...) and tqdm bars show up as they happen — no flush=True needed. Print enough to tell a stuck job from a slow one.
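A periodic one-line heartbeat is usually enough to tell a stuck job from a slow one. The sketch below shows one possible shape for it; `format_progress` is a hypothetical helper, and the loop with `time.sleep` only stands in for real training steps.

```python
import time


def format_progress(step: int, total: int, started_at: float, now: float) -> str:
    """Build a one-line progress message with elapsed time and a naive ETA."""
    elapsed = now - started_at
    rate = step / elapsed if elapsed > 0 else 0.0          # steps per second so far
    remaining = (total - step) / rate if rate > 0 else float("inf")
    return f"step {step}/{total} | {elapsed:.0f}s elapsed | ~{remaining:.0f}s left"


if __name__ == "__main__":
    start = time.monotonic()
    for step in range(1, 31):
        time.sleep(0.01)  # stand-in for one unit of real work
        if step % 10 == 0:
            # Plain print is enough here; the text above says output is line-buffered.
            print(format_progress(step, 30, start, time.monotonic()))
```

Printing every N steps rather than every step keeps the log readable while still showing that the job is moving.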
4 Keep the code bundle small
Only your code needs to upload. Weights, datasets, and caches are excluded by default (see §10 Code Bundle). Add anything else large to .crunrignore. Download big datasets at runtime into /tmp — uploads over 2 GB are rejected.
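Before submitting, it can help to estimate what would actually be uploaded. The sketch below walks a project directory, skips paths matching patterns read from .crunrignore, and sums the rest. It is an approximation under stated assumptions: the helper names are hypothetical, the glob matching is a simplified stand-in for whatever crunr really does (directory patterns end in `/`, other patterns match the file name), the built-in default exclusions are not modeled, and treating the 2 GB limit as 2 × 1024³ bytes is a guess at the exact unit.

```python
import fnmatch
import os
from pathlib import Path

MAX_BUNDLE_BYTES = 2 * 1024**3  # assumed interpretation of the 2 GB limit


def load_ignore_patterns(project_dir: Path) -> list[str]:
    """Read .crunrignore: one glob per line, '#' starts a comment line."""
    ignore_file = project_dir / ".crunrignore"
    if not ignore_file.exists():
        return []
    lines = (line.strip() for line in ignore_file.read_text().splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Simplified matching: 'dir/' matches any parent directory, else match the file name."""
    parts = rel_path.split("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch.fnmatch(parts[-1], pattern):
            return True
    return False


def bundle_size(project_dir: Path, patterns: list[str]) -> int:
    """Total size in bytes of files that would not be ignored."""
    total = 0
    for root, _dirs, files in os.walk(project_dir):
        for name in files:
            full = Path(root) / name
            rel = full.relative_to(project_dir).as_posix()
            if not is_ignored(rel, patterns):
                total += full.stat().st_size
    return total
```

Comparing `bundle_size(...)` against `MAX_BUNDLE_BYTES` gives an early warning that a dataset or checkpoint directory still needs a .crunrignore entry.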
5 Set --max-hours to a real ceiling
--max-hours both caps your cost and sizes the up-front credit hold (max-hours × rate). Set it a bit above expected runtime — high enough a legitimately slow job isn't killed, low enough a hung job can't drain credits.
6 Pass secrets via --env, never hardcode them
Use --env WANDB_API_KEY=... and read os.environ[...] in your script. crunr redacts obvious secrets from streamed logs, but don't rely on that — keep secrets out of source files that get bundled.
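On the script side, reading a secret can fail fast with a clear message and avoid echoing the value into the logs. The sketch below is one way to do that; `require_env` and `mask` are hypothetical helpers, and the masking is a local precaution that does not depend on crunr's own log redaction.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value, or raise with a hint about --env if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"missing required environment variable {name}; "
            f"pass it with --env {name}=..."
        )
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value is safe to print."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

A script would then call `key = require_env("WANDB_API_KEY")` once at startup and, if it logs anything about the key at all, print `mask(key)` rather than the key itself.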
7 Checkpoint long runs into outputs/
For multi-hour training, write checkpoints to outputs/ periodically (e.g. outputs/ckpt_latest.pt). If the run ends early for any reason, whatever you last wrote is already uploaded and pullable.
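A checkpoint is only useful if it is never half-written. The sketch below writes to a temporary file and then renames it over the previous checkpoint, so a reader always sees either the old or the new file. It is a pattern illustration, not crunr code: the function names are hypothetical, and JSON stands in for `torch.save` so the example runs without PyTorch.

```python
import json
import os
from pathlib import Path

DEFAULT_CKPT = "outputs/ckpt_latest.json"


def save_checkpoint(state: dict, path: str = DEFAULT_CKPT) -> Path:
    """Write the checkpoint atomically: temp file first, then rename into place."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, target)  # atomic when both paths are on the same filesystem
    return target


def load_checkpoint(path: str = DEFAULT_CKPT):
    """Return the saved state, or None if no checkpoint exists yet."""
    target = Path(path)
    if not target.exists():
        return None
    return json.loads(target.read_text())


def train(total_epochs: int, path: str = DEFAULT_CKPT) -> int:
    """Resume from the last checkpoint (if any) and save after every epoch."""
    state = load_checkpoint(path) or {"epoch": 0}
    for epoch in range(state["epoch"], total_epochs):
        # ... one epoch of real training would go here ...
        state = {"epoch": epoch + 1}
        save_checkpoint(state, path)
    return state["epoch"]
```

With a real model, the same temp-then-rename step applies: save to `outputs/ckpt_latest.pt.tmp`, then `os.replace` it onto `outputs/ckpt_latest.pt`.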
12 Environment Variables
Pass variables to your job with --env (both backends):
crunr run train.py --env EPOCHS=50 --env LR=0.0003 --env WANDB_API_KEY=xxxx
import os
epochs = int(os.environ["EPOCHS"])
13 AWS — S3 Output Persistence
S3 persistence protects against losing results if your laptop disconnects mid-job: the EC2 instance pushes outputs to S3 using its own IAM role before local download begins.
# How it works
Phase 5 — collecting outputs
5a: EC2 → S3 (instance uploads outputs/ using its IAM role)
5b: EC2 → local (downloaded from S3; rsync fallback)
If S3 upload fails: crunr falls back to rsync-only (no data lost)
# Setup (one-time)
crunr s3 setup --bucket crunr-yourname-outputs
# Running with S3
crunr run train.py --gpu --s3 # saved config
crunr run train.py --gpu --s3-bucket crunr-x-out # explicit bucket
crunr run train.py --gpu --s3 --s3-no-local # S3 only
crunr run train.py --gpu --s3 --s3-ttl 30 # expire after 30 days
# S3 key structure
s3://your-bucket/crunr-jobs/<job-id>/
outputs/ ← everything written to outputs/
stdout.log ← full job stdout/stderr
metadata.json ← job id, instance type, duration, cost, exit code
14 AWS — Instance Types
crunr picks the cheapest available on-demand instance by default. Use --instance to pin a type or --memory GB for a minimum.
# CPU instances
| Instance | vCPUs | RAM | Spot ~$/hr | On-demand/hr | Best for |
|---|---|---|---|---|---|
| t3.micro | 2 | 1 GB | ~$0.004 | $0.0104 | Tiny scripts, quick tests |
| t3.medium | 2 | 4 GB | ~$0.014 | $0.0416 | Light data processing |
| t3.large | 2 | 8 GB | ~$0.028 | $0.0832 | General single-threaded jobs |
| t3.xlarge | 4 | 16 GB | ~$0.056 | $0.1664 | Multi-threaded processing |
| t3.2xlarge | 8 | 32 GB | ~$0.112 | $0.3328 | Medium batch jobs |
| c5.4xlarge | 16 | 32 GB | ~$0.27 | $0.68 | CPU-intensive compute |
| c5.9xlarge | 36 | 72 GB | ~$0.61 | $1.53 | Heavy parallel workloads |
| m5.4xlarge | 16 | 64 GB | ~$0.31 | $0.768 | Memory + compute balance |
| r5.4xlarge | 16 | 128 GB | ~$0.40 | $1.008 | In-memory data processing |
# GPU instances
crunr uses AWS Deep Learning AMIs (CUDA, cuDNN, and NVIDIA drivers pre-installed).
| Instance | GPU | VRAM | vCPUs | RAM | Spot ~$/hr | On-demand/hr |
|---|---|---|---|---|---|---|
| g4dn.xlarge | 1× T4 | 16 GB | 4 | 16 GB | ~$0.16 | $0.526 |
| g4dn.12xlarge | 4× T4 | 64 GB | 48 | 192 GB | ~$1.17 | $3.912 |
| g5.xlarge | 1× A10G | 24 GB | 4 | 16 GB | ~$0.34 | $1.006 |
| g5.2xlarge | 1× A10G | 24 GB | 8 | 32 GB | ~$0.49 | $1.212 |
| g5.12xlarge | 4× A10G | 96 GB | 48 | 192 GB | ~$1.41 | $5.672 |
| p3.2xlarge | 1× V100 | 16 GB | 8 | 61 GB | ~$0.92 | $3.06 |
| p3.8xlarge | 4× V100 | 64 GB | 32 | 244 GB | ~$3.50 | $12.24 |
| p4d.24xlarge | 8× A100 40GB | 320 GB | 96 | 1152 GB | ~$10.50 | $32.77 |
| p4de.24xlarge | 8× A100 80GB | 640 GB | 96 | 1152 GB | ~$16.00 | $40.97 |
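The selection rule described at the top of this section (cheapest on-demand instance that meets --memory) can be sketched against the table above. This is an illustration, not crunr's selection logic: the names are hypothetical, prices are copied from the table, and the sketch assumes --memory is compared against VRAM per GPU rather than the per-instance total, which is what the A10G and A100 hints in the examples suggest.

```python
from typing import NamedTuple


class GpuInstance(NamedTuple):
    name: str
    vram_per_gpu_gb: int       # assumption: --memory compares against per-GPU VRAM
    on_demand_per_hour: float  # USD, from the table above


GPU_INSTANCES = [
    GpuInstance("g4dn.xlarge", 16, 0.526),
    GpuInstance("g4dn.12xlarge", 16, 3.912),
    GpuInstance("g5.xlarge", 24, 1.006),
    GpuInstance("g5.2xlarge", 24, 1.212),
    GpuInstance("g5.12xlarge", 24, 5.672),
    GpuInstance("p3.2xlarge", 16, 3.06),
    GpuInstance("p3.8xlarge", 16, 12.24),
    GpuInstance("p4d.24xlarge", 40, 32.77),
    GpuInstance("p4de.24xlarge", 80, 40.97),
]


def cheapest_gpu_instance(min_vram_gb: int = 0) -> GpuInstance:
    """Pick the lowest on-demand price among instances meeting the VRAM floor."""
    candidates = [i for i in GPU_INSTANCES if i.vram_per_gpu_gb >= min_vram_gb]
    if not candidates:
        raise ValueError(f"no instance offers {min_vram_gb} GB of VRAM per GPU")
    return min(candidates, key=lambda i: i.on_demand_per_hour)
```

Under these assumptions, a 24 GB floor selects g5.xlarge, 40 GB selects p4d.24xlarge, and 80 GB selects p4de.24xlarge. Live availability and regional pricing are not modeled.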
crunr run train.py --gpu --memory 24 # ≥24 GB VRAM (A10G or better)
crunr run train.py --gpu --memory 40 # ≥40 GB VRAM (A100)
crunr run train.py --gpu --memory 80 # ≥80 GB VRAM (A100-80GB)
15 AWS — GPU Quota
# How to request (5–30 min for approval)
1. Open Service Quotas: AWS Console → Service Quotas → AWS Services → Amazon EC2.
2. Request G and VT (on-demand): search "Running On-Demand G and VT instances" → Request quota increase → enter 32. The value is a vCPU count, not an instance count.
3. Request G and VT (spot): search "All G and VT Spot Instance Requests" → Request quota increase → enter 32.
4. P-series (if needed): for p3 and p4d, search "Running On-Demand P instances" and the matching P-series Spot Instance Requests quota.
5. Wait and re-run: usually approved within 5–30 min for G-series in us-east-1. No command changes needed.
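EC2 instance quotas are counted in vCPUs, so a quota of 32 does not mean 32 instances. The sketch below checks whether a given instance type fits; `fits_quota` is a hypothetical helper and the vCPU counts are copied from the GPU instance table in the Instance Types section.

```python
# vCPU counts for G-family types, copied from the GPU instance table.
GPU_VCPUS = {
    "g4dn.xlarge": 4,
    "g4dn.12xlarge": 48,
    "g5.xlarge": 4,
    "g5.2xlarge": 8,
    "g5.12xlarge": 48,
}


def fits_quota(instance_type: str, quota_vcpus: int, vcpus_in_use: int = 0) -> bool:
    """True if launching one more instance of this type stays within the vCPU quota."""
    return vcpus_in_use + GPU_VCPUS[instance_type] <= quota_vcpus
```

With a quota of 32 vCPUs, a g5.xlarge or g5.2xlarge fits, while a 48-vCPU g5.12xlarge needs a larger request.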
# Check your current quota
aws service-quotas get-service-quota \
--service-code ec2 \
--quota-code L-DB2E81BA # On-Demand G and VT
16 AWS — Spot vs On-Demand
crunr defaults to on-demand (never preempted).
| Mode | Cost | Reliability | Use when |
|---|---|---|---|
| On-demand (default) | Full price | Never preempted | Any job — the safe default |
| --spot | 60–90% cheaper | May be reclaimed with 2-min notice | Batch jobs, experiments, S3-backed jobs |
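To see what the discount means for a concrete job, the approximate spot and on-demand prices from the instance tables can be plugged into a short calculation. The helpers below are hypothetical, the spot prices are the approximate figures listed in this document, and real spot prices vary by region and over time.

```python
def spot_savings_percent(spot_per_hour: float, on_demand_per_hour: float) -> int:
    """Percentage saved by spot relative to on-demand, rounded to a whole number."""
    return round(100 * (1 - spot_per_hour / on_demand_per_hour))


def job_cost(hours: float, rate_per_hour: float) -> float:
    """Cost of an uninterrupted job at a fixed hourly rate, rounded to cents."""
    return round(hours * rate_per_hour, 2)
```

For g5.xlarge at roughly $0.34 spot versus $1.006 on-demand, the saving works out to about 66%. A spot interruption can force a rerun, so checkpointing or S3-backed outputs matter more on spot.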
crunr run train.py --gpu --spot
17 AWS — IAM Permissions
Attach this policy to your crunr IAM user or role. Covers base EC2 plus S3 persistence. Replace crunr-* with a tighter bucket name if you prefer.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "RunrVerify",
"Effect": "Allow",
"Action": ["sts:GetCallerIdentity"],
"Resource": "*"
},
{
"Sid": "RunrDescribe",
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
"ec2:DescribeImages",
"ec2:DescribeKeyPairs",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSpotPriceHistory",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeVpcs",
"ec2:DescribeSubnets",
"ec2:DescribeInstanceTypes",
"ec2:DescribeInstanceStatus"
],
"Resource": "*"
},
{
"Sid": "RunrKeyPair",
"Effect": "Allow",
"Action": ["ec2:CreateKeyPair", "ec2:DeleteKeyPair"],
"Resource": "*"
},
{
"Sid": "RunrSecurityGroup",
"Effect": "Allow",
"Action": [
"ec2:CreateSecurityGroup",
"ec2:AuthorizeSecurityGroupIngress"
],
"Resource": "*"
},
{
"Sid": "RunrInstances",
"Effect": "Allow",
"Action": [
"ec2:RunInstances",
"ec2:TerminateInstances",
"ec2:CreateTags",
"ec2:RequestSpotInstances",
"ec2:DescribeSpotInstanceRequests",
"ec2:CancelSpotInstanceRequests"
],
"Resource": "*"
},
{
"Sid": "CrunrS3Bucket",
"Effect": "Allow",
"Action": [
"s3:CreateBucket",
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:PutBucketPublicAccessBlock",
"s3:PutBucketPolicy",
"s3:PutEncryptionConfiguration",
"s3:PutBucketOwnershipControls",
"s3:PutLifecycleConfiguration",
"s3:GetLifecycleConfiguration"
],
"Resource": "arn:aws:s3:::crunr-*"
},
{
"Sid": "CrunrS3Objects",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::crunr-*/*"
},
{
"Sid": "CrunrIAMRole",
"Effect": "Allow",
"Action": [
"iam:CreateRole",
"iam:GetRole",
"iam:PutRolePolicy",
"iam:GetRolePolicy",
"iam:DeleteRolePolicy",
"iam:DeleteRole",
"iam:TagRole"
],
"Resource": "arn:aws:iam::*:role/crunr-s3-writer"
},
{
"Sid": "CrunrIAMProfile",
"Effect": "Allow",
"Action": [
"iam:CreateInstanceProfile",
"iam:GetInstanceProfile",
"iam:AddRoleToInstanceProfile",
"iam:RemoveRoleFromInstanceProfile",
"iam:DeleteInstanceProfile",
"iam:TagInstanceProfile"
],
"Resource": "arn:aws:iam::*:instance-profile/crunr-instance-profile"
},
{
"Sid": "CrunrPassRole",
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::*:role/crunr-s3-writer",
"Condition": {
"StringEquals": {
"iam:PassedToService": "ec2.amazonaws.com"
}
}
}
]
}
| Block | Why |
|---|---|
| RunrDescribe / RunrInstances | Core EC2 lifecycle — instance ARNs unknown at policy time, so * is required |
| CrunrS3Bucket | Create and harden the bucket; list jobs; manage lifecycle rules |
| CrunrS3Objects | Upload metadata from your laptop; download outputs via s3 pull |
| CrunrIAMRole | Create the crunr-s3-writer role that EC2 instances use to write outputs |
| CrunrIAMProfile | Attach that role to instances at launch time |
| CrunrPassRole | Critical for S3 — without this, RunInstances with IamInstanceProfile is denied |
18 Supported AWS Regions
crunr supports the 18 AWS regions listed below. Select your region during crunr auth, or override it for a single job with --region.
| Region | Location |
|---|---|
| us-east-1 | US East (N. Virginia) — cheapest spot prices |
| us-east-2 | US East (Ohio) |
| us-west-1 | US West (N. California) |
| us-west-2 | US West (Oregon) |
| eu-west-1 | EU (Ireland) |
| eu-west-2 | EU (London) |
| eu-west-3 | EU (Paris) |
| eu-central-1 | EU (Frankfurt) |
| eu-north-1 | EU (Stockholm) |
| ap-southeast-1 | Asia Pacific (Singapore) |
| ap-southeast-2 | Asia Pacific (Sydney) |
| ap-northeast-1 | Asia Pacific (Tokyo) |
| ap-northeast-2 | Asia Pacific (Seoul) |
| ap-south-1 | Asia Pacific (Mumbai) |
| ca-central-1 | Canada (Central) |
| sa-east-1 | South America (São Paulo) |
| me-south-1 | Middle East (Bahrain) |
| af-south-1 | Africa (Cape Town) |
crunr run train.py --gpu --region eu-west-1
19 Troubleshooting
Run crunr cloud login, or set CRUNR_API_KEY=crunr_sk_.... Keys are created at cloud.crunr.com.
The up-front hold (max-hours × rate) exceeds your available balance. Lower --max-hours, pick a cheaper GPU, or top up at cloud.crunr.com/dashboard/billing.
--max-hours is above your account limit. Request a higher cap at cloud.crunr.com/dashboard/limits.
The chosen GPU has no free capacity. crunr offers alternatives after ~30s. Cancel and retry:
crunr cloud cancel <JOB_ID>
crunr run train.py --cloud --instance a100
Exclude large files with .crunrignore (see Code Bundle) and download / generate them inside the job.
Almost always means your script saved results outside the outputs/ directory. crunr only collects outputs/ — files written to the working directory (e.g. ./model.pt, ./results.txt) are discarded when the pod terminates.
Confirm with crunr balance — Storage will read 0.00 GB if nothing was saved. Fix: write everything to outputs/ — see Best Practices §1.
If the job is still running, outputs aren't collected until it finishes. Check status with crunr cloud jobs.
- macOS/Linux: brew install rsync or apt install rsync
- Windows: Install Git for Windows or use WSL 2.
crunr auth
crunr run train.py --gpu --memory 24 # A10G
crunr run train.py --gpu --memory 40 # A100
crunr ps # list survivors
crunr clean # terminate them
From v1.3.0 crunr self-heals this on the next run. To force a refresh manually:
# macOS / Linux
rm ~/.crunr/crunr-key.pem
# Windows (PowerShell)
Remove-Item "$env:USERPROFILE\.crunr\crunr-key.pem" -Force
See AWS — GPU Quota for the full request walkthrough.
20 File Locations
| File | Purpose |
|---|---|
| ~/.crunr/cloud.json | crunr cloud API key + endpoint (owner-only) |
| ~/.crunr/config.json | Active context (aws/cloud) and saved S3 config |
| ~/.aws/credentials | AWS access keys (shared with AWS CLI) |
| ~/.aws/config | AWS regions and output format |
| ~/.crunr/crunr-key.pem | SSH private key for AWS instances (auto-created, reused) |
| ~/.crunr/jobs.json | Local AWS run history (viewed with crunr jobs in AWS context) |
# Environment variables
| Variable | Purpose |
|---|---|
| CRUNR_API_KEY | crunr cloud API key (overrides saved key) |
| CRUNR_API_BASE | Cloud endpoint (default: https://api.crunr.com) |
| CRUNR_CLOUD | Force cloud backend for crunr run |
21 Version History
| Version | Notes |
|---|---|
| 2.4.x | crunr cloud backend — managed GPUs with no AWS account: crunr cloud login/jobs/logs/pull/share/cancel/balance, crunr use aws\|cloud contexts, crunr run --cloud, --instance <gpu>, --max-hours N; credit-based billing (crunr balance), live GPU picker with crunr pricing, .crunrignore code bundling, crunr cloud output storage with 30-day expiry |
| 1.3.0 | crunr ssh — connect to a running job's instance without stopping the job; key pair fingerprint self-healing; Windows SSH key uses full-control ACL |
| 1.2.0 | crunr share — presigned S3 download links; on-demand default (--spot to opt in); real-time cost meter; spot→on-demand fallback price fix |
| 1.1.0 | Real-time cost meter during job streaming; CLI logo branding |
| 1.0.0 | First stable production release |
| 0.2.x | S3 output persistence, autonomous job wrapper, crunr logs, clean CLI error handling, GPU reliability fixes |
| 0.1.0 | Initial public release — run, auth, jobs, ps, clean |