Bazel Spark
Developing an automated setup tool for local pyspark projects
DATA SCIENCE
7/28/2024
PySpark and Delta Lake are very powerful tools, but the requirement of working with them via Databricks can be restrictive for some projects. Although the two building blocks (PySpark & Delta Lake) are open source, their setup & implementation can be tricky to get right. This project aims to fix that using Bazel, an automated application setup & testing framework.
Goals & Requirements
Streamlined, One-Click Setup
Environment setup using Anaconda
Multiple distinct environments can be supported on a single machine
Support for S3 data storage
Platform-Agnostic installation (Mac/Windows/Linux) -- [stretch goal]
UI for better UX -- [stretch goal]
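As a sketch of the kind of setup the goals above describe, the configuration a local PySpark + Delta Lake session needs can be collected in one place. The function name and structure here are hypothetical, not the project's actual code; the `spark.sql.*` keys are the standard ones Delta Lake requires on a plain Spark install, and the S3A key is the usual Hadoop connector setting for S3 storage:

```python
# Hypothetical sketch -- function name and structure are illustrative only,
# not part of the Bazel Spark project itself.

def delta_spark_conf(app_name, s3_bucket=None):
    """Collect the Spark config entries for a local Delta Lake session."""
    conf = {
        "spark.app.name": app_name,
        # Run Spark locally, using all available cores.
        "spark.master": "local[*]",
        # Standard settings that enable Delta Lake on a plain Spark install.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog":
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }
    if s3_bucket is not None:
        # Route s3a:// paths through the Hadoop S3A connector.
        conf["spark.hadoop.fs.s3a.impl"] = (
            "org.apache.hadoop.fs.s3a.S3AFileSystem"
        )
    return conf


if __name__ == "__main__":
    for key, value in sorted(delta_spark_conf("demo", "my-bucket").items()):
        print(f"{key}={value}")
```

In a real session, each of these pairs would be passed to `SparkSession.builder.config(...)` before calling `getOrCreate()`; automating exactly this kind of boilerplate is what the one-click setup is for.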
As of Summer 2024, the workflow is still in development, but initial proofs of concept have been successful. Eventually I'd like to migrate this to a more user-friendly setup with a YAML file interface, or perhaps a graphical UI generated with Panel.
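A YAML interface of that kind might look something like this (purely illustrative; none of these field names exist in the project yet):

```yaml
# Hypothetical project config -- field names are illustrative only.
project: my-pyspark-app
conda_env: my-pyspark-env   # one distinct Anaconda env per project
spark:
  master: local[*]
  delta: true
storage:
  s3_bucket: my-data-bucket
```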
There's not all that much to "show" as a result here, but give it a try!

