scanned

A Burla demo · 1,000 CPUs · reviews parsed

Every Amazon review is a cry for help.
We distilled the loudest ones.

We streamed 275 GB of raw Amazon reviews across 34 categories through a Burla cluster with 1,000 workers in parallel, and ranked them by seven flavors of unhinged: profanity, screaming, punctuation bombs, short-brutal one-liners, full-blown rants, five-star obscene, and five-star silent.

Flip Unhinged Mode to swap the wall for the rawest content we found. Every f-bomb, censored slur, and full-caps meltdown three map-reduce passes could surface. Nothing sanitized. Reader discretion strongly advised.

reviews parsed
reviews w/ profanity
categories
global profanity rate
1,000Burla workers
275 GBdata streamed
⚠️ Unhinged Mode active. Content below contains profanity, slurs, and severe language. All slurs are auto-redacted with category badges. Toggle off up top to return to the normal Wall.

The Wall of Rants

Loading…

Browse by category

34 categories, ranked by how many reviews contain profanity. Click one to see the top 30 most fucked-up reviews in that category.

What we found

Nine ways Amazon reviews go off the rails.

Methodology

How the distillation works. Reproducible on any Burla cluster.

Data

McAuley-Lab / Amazon-Reviews-2023, the raw JSONL review files (275 GB across 34 categories, 2023 snapshot). Streamed directly from Hugging Face via HTTP Range requests, no local download.

Pipeline

  1. Plan. Split each review file into ~500 MB byte-range chunks (~550 chunks total).
  2. Scatter. Dispatch chunks to Burla cluster (up to 1,000 parallel workers).
  3. Map. Each worker streams its byte range, line-aligns, parses JSONL, scores each review on seven signals, keeps top-K per signal in heaps. Writes one JSON per chunk.
  4. Reduce. One Burla worker per category merges all shard outputs into one per-category JSON (heaps merged, rating histograms summed).
  5. Analyze. Client builds findings + Wall + per-category pages.

Scoring

Rule-based, no LLM: word lists for strong/medium/mild profanity, caps-ratio for screaming, consecutive-exclamation-run for punctuation bombs, length × rage for rants, short × profanity for one-liner brutality, rating × profanity for five-star obscene, rating × length-zero for five-star silent.

Unhinged Mode (three-pass pipeline)

The first pass surfaced too many soft "crap" reviews, so we re-ran the full 275 GB / 571 M review corpus on 1,000 CPUs in parallel, twice more. Pass two targeted a hard-profanity lexicon (~14 strong roots) and produced 496,668 hits. Pass three added a censor-aware lexicon (asterisk, symbol, and zero-width variants) plus a categorized slur taxonomy (racial, homophobic, ableist, xenophobic, gendered), and classified each hit as deploy / quote_crit / reclaim / ambiguous. Results were rescored locally to drop proper-noun and idiom false positives (TV shows, pet breeds, brand names) and boost angry physical-product rants. The two hard passes were merged and deduped into the single Unhinged corpus you see when the toggle is on.

Source

Open source on GitHub. Reproduce this on your own cluster in ~15 minutes with one burla run scale.py.

Warning

These are real reviews written by real people on Amazon. Unhinged Mode unlocks content that includes profanity, racial slurs, homophobic slurs, ableist slurs, and descriptions of regret. Slur tokens are auto-redacted with category badges. The raw strings are never displayed. If that's not your thing, leave the toggle off, or close this tab.

Credits

Dataset: McAuley Lab, UCSD. Pipeline: Burla. UI: plain HTML + CSS + JS, zero frameworks.