Have you ever wondered how the performance of Polars with Delta Lake on Azure compares to that on a consumer-grade laptop?
No? Well, I have. If I have sparked your curiosity, read on.
Here are the contenders
See Pricing for a full list of available app service plans.
The test measures three scenarios
The code is executed via REST API endpoints:
On the HP EliteBook I used func start to launch https://localhost:7071.
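Because both contenders expose the same REST endpoints, the same client-side timing script can be pointed at either of them. The following is only a minimal sketch of such a client, not the measurement method used in this article (the timings reported here are taken inside the function itself). The helper names `fetch` and `time_calls` are hypothetical, and the URL in the usage comment is the Azure endpoint quoted later in this article.

```python
import time
import urllib.request
from typing import Callable, List


def fetch(url: str, timeout: float = 60.0) -> str:
    """Issue a GET request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def time_calls(call: Callable[[], object], repeats: int = 3) -> List[float]:
    """Run `call` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        tic = time.perf_counter()
        call()
        toc = time.perf_counter()
        durations.append(toc - tic)
    return durations


# Usage (performs real HTTP requests, so it is left commented out):
# url = "https://function-hekori-learning-002.azurewebsites.net/api/polars/azure/read"
# print(time_calls(lambda: fetch(url), repeats=3))
```

Note that a client-side measurement like this includes network latency and, on Azure, any cold-start delay, so it will generally be larger than the elapsed time reported by the function.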
To publish to Azure, I followed the instructions from https://learn.microsoft.com/en-us/azure/azure-functions/create-first-function-cli-python
to set up the necessary development environment. This allowed me to publish the function via
func azure functionapp publish function-hekori-learning-002.
I used Terraform to set up the Azure resources in the North Europe region.
Here is the code that is executed when visiting https://function-hekori-learning-002.azurewebsites.net/api/polars/azure/read:
```python
@app.route(route="polars/azure/read", auth_level=func.AuthLevel.ANONYMOUS)
def polars_azure_read(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Reading from delta table')
    tic = time.time()
    df = pl.read_delta(AZURE_STORAGE_PATH, storage_options=storage_options)
    df = df.sql(
        "select sum(value) as sum, avg(value) as mean, count() as count, name from self group by name order by sum asc"
    )
    toc = time.time()
    logging.info(f"Elapsed time {toc - tic:.2f} seconds")
    return func.HttpResponse(
        "Success from polars." + str(df) + '\n' + "Elapsed time " + str(toc - tic) + " seconds",
        status_code=200
    )
```
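To make clear what the SQL query in this endpoint computes, here is a plain-Python sketch of the same aggregation: group the rows by `name`, compute the sum, mean, and count of `value` per group, and sort the groups by sum in ascending order. The column names come from the query; the function name `aggregate_by_name` and the sample rows are made up for illustration, and null handling is ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_by_name(rows: Iterable[Tuple[str, float]]) -> List[Dict[str, object]]:
    """Group (name, value) rows by name and return sum, mean and count per group."""
    groups = defaultdict(list)
    for name, value in rows:
        groups[name].append(value)
    result = [
        {"sum": sum(values), "mean": sum(values) / len(values), "count": len(values), "name": name}
        for name, values in groups.items()
    ]
    # Equivalent of "order by sum asc"
    result.sort(key=lambda row: row["sum"])
    return result


# Hypothetical sample data, not taken from the benchmark table
sample = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("b", 20.0), ("c", 0.5)]
for row in aggregate_by_name(sample):
    print(row)
```

In the endpoint itself, Polars performs this work on the DataFrame loaded from the Delta table, so the pure-Python version is only meant as a reading aid.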
As one can see, the HP EliteBook is roughly one order of magnitude faster in all scenarios.
This is my personal interpretation
Please note that the Delta table is small: it consists of 3 commits and 2 Parquet files. That is, the runtime effectively measures the overhead of file access from the compute unit.
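For readers who want to check the commit and file counts of their own table, the sketch below replays the JSON commit files in a table's `_delta_log` directory. It assumes a local copy of the table (the table in this article lives in Azure storage) and deliberately ignores checkpoint files, so it is a simplification rather than a full Delta log reader. The function name `summarize_delta_log` is hypothetical.

```python
import json
from pathlib import Path
from typing import Set, Tuple


def summarize_delta_log(table_path: str) -> Tuple[int, int]:
    """Return (number of commits, number of active data files) for a local Delta table.

    Simplification: only JSON commit files are replayed; checkpoints are ignored.
    """
    log_dir = Path(table_path) / "_delta_log"
    # Commit files are named by zero-padded version, so sorting by name gives commit order.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    active: Set[str] = set()
    for commit in commits:
        # Each line of a commit file is one JSON-encoded action.
        for line in commit.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return len(commits), len(active)


# Usage with a hypothetical local path:
# commits, files = summarize_delta_log("/path/to/local/delta_table")
```

With so few commits and files, reading the log and the data involves only a handful of storage requests, which is why the measured runtime is dominated by access overhead rather than by computation.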
If you ❤️ this article and want to see more benchmark results with larger datasets for out-of-core processing, give this article a like
and subscribe to my channel.