Full-corpus analysis

Examples for scanning large public datasets without sampling away the hard parts.

571M Amazon reviews
NYC taxi history
9.49M Flickr photos
NOAA rain extremes
One million GitHub READMEs