Data Tools
A set of Cloud Run tools for Airflow-driven pipelines that download, unzip, and clean data and load it into Google Cloud Storage (GCS).
Cleans and normalizes CSV files read from a mounted path. Optionally renames headers, makes column names SQL-friendly, removes specified values, and validates or infers column types. Writes the cleaned CSV to GCS and emits a matching BigQuery schema JSON (with optional head export and "nil transform" JSON).
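The cleaning steps described above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function names (`sql_friendly`, `infer_type`, `clean_csv`), the default list of removed values, and the three-type inference order are assumptions made for illustration, and the sketch works on in-memory text rather than a mounted path or GCS.

```python
import csv
import io
import re

# Illustrative patterns for type inference (an assumption, not the tool's rules).
INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def sql_friendly(name: str) -> str:
    """Lowercase a header and replace runs of non-alphanumerics with '_'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    if not cleaned:
        cleaned = "column"
    if cleaned[0].isdigit():
        cleaned = "_" + cleaned  # avoid a leading digit in the column name
    return cleaned


def infer_type(values) -> str:
    """Pick the narrowest of INTEGER, FLOAT, STRING that fits every non-empty value."""
    present = [v for v in values if v != ""]
    if not present:
        return "STRING"
    if all(INT_RE.fullmatch(v) for v in present):
        return "INTEGER"
    if all(FLOAT_RE.fullmatch(v) for v in present):
        return "FLOAT"
    return "STRING"


def clean_csv(text: str, remove_values=("NA", "N/A")):
    """Return (cleaned CSV text, BigQuery-style schema as a list of field dicts)."""
    reader = csv.reader(io.StringIO(text))
    header = [sql_friendly(h) for h in next(reader)]
    # Blank out the values marked for removal and trim whitespace.
    rows = [
        ["" if cell.strip() in remove_values else cell.strip() for cell in row]
        for row in reader
    ]
    columns = list(zip(*rows)) if rows else [() for _ in header]
    schema = [
        {"name": name, "type": infer_type(col), "mode": "NULLABLE"}
        for name, col in zip(header, columns)
    ]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue(), schema
```

Serializing the returned schema with `json.dumps(schema)` gives a list of `name`/`type`/`mode` fields, which is the shape BigQuery accepts for a schema file. A real run would also need to handle ragged rows and additional types such as dates, which this sketch ignores.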
Downloads a file from a source URL to a path on a mounted filesystem. Designed to be used within GCP Cloud Run Jobs, and triggered by Airflow.
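As a rough sketch of the download step, the function below streams a URL to a destination path using only the standard library. The name `download_to_path`, the `.part` temporary suffix, and the timeout default are assumptions for illustration; the actual tool may use a different HTTP client, retries, or authentication.

```python
import os
import shutil
import urllib.request


def download_to_path(source_url: str, dest_path: str, timeout: float = 60.0) -> int:
    """Stream source_url to dest_path and return the number of bytes written."""
    # Create the target directory on the mounted filesystem if it is missing.
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)

    # Write to a temporary name first so a failed download never leaves a
    # partial file under the final name.
    tmp_path = dest_path + ".part"
    with urllib.request.urlopen(source_url, timeout=timeout) as response, \
            open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once

    os.replace(tmp_path, dest_path)
    return os.path.getsize(dest_path)
```

Streaming in chunks keeps memory use flat for large files, which matters in a Cloud Run Job with a fixed memory limit.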
Unzips a file from a mounted filesystem and uploads its contents to Google Cloud Storage under a specified object prefix. Designed to be used within GCP Cloud Run Jobs, and triggered by Airflow.
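The unzip-and-upload step might look like the sketch below. To keep it runnable without cloud credentials, the upload is delegated to a hypothetical `upload(object_name, fileobj)` callable; in a real job that callable would wrap a GCS client. The function name `unzip_and_upload` and the way the prefix is joined to member names are also assumptions.

```python
import posixpath
import zipfile


def unzip_and_upload(zip_path: str, prefix: str, upload) -> list:
    """Upload every file in the archive under prefix; return the object names.

    upload is a hypothetical callable taking (object_name, binary file object).
    """
    uploaded = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # object stores have no directories to create
            # Object names always use forward slashes, regardless of the OS.
            member_name = info.filename.lstrip("/")
            object_name = posixpath.join(prefix.strip("/"), member_name)
            # Stream each member straight from the archive to the uploader
            # instead of extracting it to disk first.
            with archive.open(info) as member:
                upload(object_name, member)
            uploaded.append(object_name)
    return uploaded
```

For a local dry run, the uploader can simply collect bytes in a dictionary, for example `unzip_and_upload("data.zip", "raw/2024", lambda name, f: store.update({name: f.read()}))` with `store = {}` defined beforehand.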