Process data in your database quickly.
A way to process database rows in parallel by splitting work into ID ranges.
If your table has millions of rows, processing them in one long sequential query loop is usually slow.
A simple, faster pattern is:
- split rows into many ID ranges
- process each range in parallel
- combine range results
This pattern works best with a numeric column you can split into ranges, such as an indexed id column.
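The three steps above can be sketched locally before any database or cluster is involved. This is only an illustration under stated assumptions: `fake_table`, `split_into_ranges`, and `sum_range` are hypothetical names, the dictionary stands in for a real table, and a standard-library thread pool stands in for the cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a table: id -> amount.
fake_table = {row_id: row_id * 2 for row_id in range(1, 101)}


def split_into_ranges(start_id, end_id, rows_per_range):
    """Return inclusive (first_id, last_id) pairs that cover start_id..end_id."""
    return [
        (first_id, min(first_id + rows_per_range - 1, end_id))
        for first_id in range(start_id, end_id + 1, rows_per_range)
    ]


def sum_range(id_range):
    """Process one range: sum the amounts whose IDs fall inside it."""
    first_id, last_id = id_range
    return sum(fake_table[row_id] for row_id in range(first_id, last_id + 1))


# 1. Split rows into ID ranges.
ranges = split_into_ranges(1, 100, 25)

# 2. Process each range in parallel (threads here, not a cluster).
with ThreadPoolExecutor() as pool:
    partial_sums = list(pool.map(sum_range, ranges))

# 3. Combine the range results.
grand_total = sum(partial_sums)
print(ranges)
print(grand_total)
```

Because each range is independent, the combined total should equal what a single pass over the whole table would give.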
Before you start
Make sure you have already:
- installed Burla:
  pip install burla
- connected your machine:
  burla login
- started your cluster in the Burla dashboard
For this example, also install a PostgreSQL driver:
pip install psycopg2-binary
Step 1: Decide your row ranges
Start with inclusive (start_id, end_id) ranges that do not overlap, so that every row is processed exactly once.
def build_id_ranges(start_id, end_id, rows_per_range):
    return [
        (range_start_id, min(range_start_id + rows_per_range - 1, end_id))
        for range_start_id in range(start_id, end_id + 1, rows_per_range)
    ]

id_ranges = build_id_ranges(start_id=1, end_id=100_000, rows_per_range=10_000)
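Since the whole approach depends on ranges that neither overlap nor leave gaps, a quick check can be worth running before any work is submitted. The sketch below repeats `build_id_ranges` so it runs on its own; `ranges_are_contiguous` is a hypothetical helper, not part of Burla.

```python
def build_id_ranges(start_id, end_id, rows_per_range):
    """Return inclusive (start, end) ID pairs covering start_id..end_id."""
    return [
        (range_start_id, min(range_start_id + rows_per_range - 1, end_id))
        for range_start_id in range(start_id, end_id + 1, rows_per_range)
    ]


def ranges_are_contiguous(id_ranges, start_id, end_id):
    """Return True if the ranges cover start_id..end_id with no gaps or overlaps."""
    if not id_ranges:
        return False
    expected_start = start_id
    for range_start, range_end in id_ranges:
        # Each range must begin exactly where the previous one stopped.
        if range_start != expected_start or range_end < range_start:
            return False
        expected_start = range_end + 1
    return expected_start == end_id + 1


id_ranges = build_id_ranges(start_id=1, end_id=25, rows_per_range=10)
print(id_ranges)                                # [(1, 10), (11, 20), (21, 25)]
print(ranges_are_contiguous(id_ranges, 1, 25))  # True
```

The last range is shorter than the others because `min(...)` clamps it to `end_id`.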
Step 2: Write one function that processes one range
Each function call opens its own database connection and handles one ID range.
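import psycopg2

def process_id_range(id_range):
    start_id, end_id = id_range
    # Placeholder credentials; the host must be reachable from wherever this function runs.
    connection = psycopg2.connect(
        host="localhost",
        dbname="app",
        user="app",
        password="app",
    )
    try:
        # In psycopg2, "with connection" ends the transaction but does not
        # close the connection, so it is closed explicitly in the finally block.
        with connection:
            with connection.cursor() as cursor:
                cursor.execute(
                    "SELECT amount FROM orders WHERE id BETWEEN %s AND %s",
                    (start_id, end_id),
                )
                amounts = [row[0] for row in cursor.fetchall()]
    finally:
        connection.close()
    return {"row_count": len(amounts), "total_amount": float(sum(amounts))}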
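When a range only needs a count and a sum, one possible variation is to let the database aggregate and return a single row instead of fetching every amount. The sketch below is an assumption-laden stand-in: it uses Python's built-in `sqlite3` with an in-memory `orders` table so it runs without a server, and `aggregate_id_range` is a hypothetical name. With psycopg2 the placeholders would be `%s` rather than `?`.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL orders table (illustration only).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
connection.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(order_id, 1.5) for order_id in range(1, 11)],
)


def aggregate_id_range(connection, id_range):
    """Let the database count and sum one ID range; only one row comes back."""
    start_id, end_id = id_range
    row_count, total_amount = connection.execute(
        # COALESCE turns the NULL that SUM returns for an empty range into 0.
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) "
        "FROM orders WHERE id BETWEEN ? AND ?",
        (start_id, end_id),
    ).fetchone()
    return {"row_count": row_count, "total_amount": float(total_amount)}


result = aggregate_id_range(connection, (1, 4))
empty_result = aggregate_id_range(connection, (100, 200))
print(result)
print(empty_result)
```

The return shape matches `process_id_range`, so the combining step would not need to change.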
Step 3: Run all ranges in parallel
Pass the function and the list of ranges to remote_parallel_map, which calls process_id_range once per range and returns the results.
from burla import remote_parallel_map
range_results = remote_parallel_map(process_id_range, id_ranges)
Step 4: Combine the range results
Now compute the final totals from all range outputs.
total_rows = sum(range_result["row_count"] for range_result in range_results)
total_amount = sum(range_result["total_amount"] for range_result in range_results)
print(f"Total rows processed: {total_rows}")
print(f"Total amount: {total_amount}")
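The combining step can also be wrapped in a small function so the same code serves both the small test and the full run. This is a sketch, not part of Burla: `combine_range_results` is a hypothetical helper, and it swaps in `math.fsum` for the plain `sum` to limit floating-point rounding error when many partial totals are added.

```python
import math


def combine_range_results(range_results):
    """Merge per-range dicts into one dict with overall totals."""
    results = list(range_results)  # allow any iterable, including generators
    return {
        "row_count": sum(result["row_count"] for result in results),
        # fsum tracks intermediate rounding error, unlike a plain running sum.
        "total_amount": math.fsum(result["total_amount"] for result in results),
    }


# Made-up sample outputs in the shape returned by process_id_range.
sample_results = [
    {"row_count": 2, "total_amount": 10.5},
    {"row_count": 3, "total_amount": 20.25},
]
combined = combine_range_results(sample_results)
print(f"Total rows processed: {combined['row_count']}")
print(f"Total amount: {combined['total_amount']}")
```

If exact currency arithmetic matters, summing `Decimal` values instead of floats would be the more careful choice.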
Step 5: Run a small test before the full job
Always test first with a small ID window.
from burla import remote_parallel_map
small_test_ranges = build_id_ranges(start_id=1, end_id=5_000, rows_per_range=1_000)
remote_parallel_map(process_id_range, small_test_ranges)
After the small test succeeds, run your full range list.
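One way to decide whether a small test "succeeded" is to re-run the same ranges one at a time and compare the totals with the parallel output. The sketch below assumes nothing about result ordering, since it compares only totals. `summarize`, `matches_serial_run`, and `fake_process_id_range` are hypothetical names, and the fake function replaces the database so the example runs anywhere.

```python
def summarize(range_results):
    """Collapse per-range dicts into (total_rows, total_amount)."""
    results = list(range_results)
    total_rows = sum(result["row_count"] for result in results)
    total_amount = sum(result["total_amount"] for result in results)
    return total_rows, total_amount


def matches_serial_run(process_one_range, id_ranges, parallel_results, tolerance=1e-6):
    """Re-run the ranges one by one and compare totals with the parallel output."""
    serial_rows, serial_amount = summarize(
        process_one_range(id_range) for id_range in id_ranges
    )
    parallel_rows, parallel_amount = summarize(parallel_results)
    return (
        serial_rows == parallel_rows
        and abs(serial_amount - parallel_amount) <= tolerance
    )


def fake_process_id_range(id_range):
    """Stand-in for process_id_range: pretend every row has amount 2.0."""
    start_id, end_id = id_range
    row_count = end_id - start_id + 1
    return {"row_count": row_count, "total_amount": 2.0 * row_count}


small_test_ranges = [(1, 1000), (1001, 2000), (2001, 3000)]
# Pretend these came back from the parallel run, in a different order.
parallel_results = [fake_process_id_range(r) for r in reversed(small_test_ranges)]
check_passed = matches_serial_run(
    fake_process_id_range, small_test_ranges, parallel_results
)
print(check_passed)
```

In a real test, `parallel_results` would be the output of `remote_parallel_map(process_id_range, small_test_ranges)` and the first argument would be `process_id_range` itself.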