Examples that show what Burla can do.
A curated gallery of demos, data stories, and reusable patterns for running ordinary Python across hundreds of CPUs and GPUs without rebuilding your workflow around Spark, Airflow, or queues.
Live data stories
These examples already had their own GitHub Pages sites. Each card opens the migrated live site hosted in this repo.
Airbnb at continental scale
Scan 1.7M photos and 50.7M reviews across 119 cities to find the listings, amenities, and image patterns hiding in plain sight.
571M Amazon reviews distilled
Rank real reviews by profanity, screaming, mismatch, and pure customer chaos. No LLM, just massive parallel text processing.
The Met's hidden twins
Embed 192K public-domain artworks and surface visual near-duplicates separated by centuries, cultures, and mediums.
NYC ghost neighborhoods
Process 2.76B taxi trips to find zones that died, rebounded, or became something new after years of city-scale change.
Fossils of the arXiv
Cluster 2.71M abstracts to find extinct topics, emerging fields, and the loneliest paper in the scientific corpus.
World Photo Index
Analyze 9.49M geotagged Flickr photos to discover what each country photographs more than anywhere else.
One million GitHub READMEs
Classify, summarize, and search 1.2M READMEs with deterministic heuristics running across a Burla cluster.
Hospital Price Reality Check
Parse 5,162 US hospital machine-readable files (MRFs) across 1,040 parallel CPUs, then explore the chargemaster spread for 361 standard codes across 3,400 hospitals.
Heavy workloads
These examples focus on compute patterns that get painful on one machine: GPU batches, large media jobs, geospatial data, and scientific pipelines.
GPU embeddings
Embed 50K Wikipedia articles on A100s and query the vectors.
Image resize
Resize millions of images in parallel without a dedicated image platform.
Genome alignment
Run BWA-MEM over many FASTQ files with one Python fan-out.
GDAL raster jobs
Process geospatial rasters across a large Burla cluster.
Batch inference
Run CPU and GPU inference jobs without standing up a serving layer.
NOAA rain extremes
Scan 3.18B weather rows to find the rainiest day on record.
Everyday production patterns
Reusable shapes for common work: API jobs, ETL, Pandas, Parquet, scraping, and simulations.
Parallel scraping
Scrape thousands of pages concurrently from plain Python.
ETL without Airflow
Split Python ETL into simple tasks instead of a scheduler DAG.
Rate-limited APIs
Run huge API jobs while keeping concurrency under control.
Pandas apply
Scale slow row-wise functions without rewriting your DataFrame logic.
Parquet fan-out
Process one file per worker across thousands of Parquet shards.
Monte Carlo
Run independent simulations across 1,000+ cores from one script.
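Most of the patterns above share one shape: a plain Python function mapped over a list of independent inputs. The sketch below illustrates that shape with a small Monte Carlo estimate of pi. It is not Burla's API: `fan_out` is a hypothetical local stand-in built on the standard library's thread pool, and `simulate` and `run_simulations` are illustrative names. On a cluster, the parallel map call would presumably take the place of `fan_out` while `simulate` stays unchanged.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(seed: int, samples: int = 10_000) -> float:
    """Estimate pi from one independent batch of random points."""
    rng = random.Random(seed)  # per-task seed keeps runs independent and reproducible
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def fan_out(func, inputs, max_workers: int = 8) -> list:
    """Hypothetical local stand-in for a cluster-wide parallel map (assumption)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(func, inputs))


def run_simulations(n_tasks: int = 32) -> float:
    """Run n_tasks independent simulations and average their estimates."""
    estimates = fan_out(simulate, range(n_tasks))
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    print(run_simulations())
```

Because each task carries its own seed and shares no state, the same function can be spread over as many workers as are available without changing the result for any given seed.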