With CueLake, you can use SQL to build ELT (Extract, Load, Transform) pipelines on a data lakehouse. You write Spark SQL statements in Zeppelin notebooks and schedule those notebooks using workflows (DAGs). To extract and load incremental data, you write simple SELECT statements; CueLake executes them against your databases and merges the incremental data into your data lakehouse (powered by Apache Iceberg). To transform data, you write SQL statements that create views and tables in the lakehouse.

CueLake uses Celery as the executor and celery-beat as the scheduler. Celery jobs trigger Zeppelin notebooks, and Zeppelin automatically starts and stops the Spark cluster for every scheduled run of notebooks.
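To illustrate what scheduling notebooks as a workflow (DAG) implies, here is a minimal Python sketch that derives a valid run order from notebook dependencies. The notebook names and the `run_order` helper are hypothetical, and this is not CueLake's own scheduling code; it only shows the ordering constraint a DAG expresses, using the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each notebook maps to the set of notebooks
# that must finish before it can run.
workflow = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_sales_view": {"extract_orders", "extract_customers"},
    "maintain_tables": {"transform_sales_view"},
}


def run_order(dag):
    """Return notebook names in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(workflow)
print(order)
```

In this sketch both extract notebooks come before the transform notebook, and the maintenance notebook runs last. The relative order of the two extract notebooks is not fixed, since neither depends on the other.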

Features

  • Upsert incremental data
  • Create views in the data lakehouse
  • Elastically scale cloud infrastructure
  • Automated maintenance of Iceberg tables
  • Versioning in GitHub
  • Your data always stays within your cloud account
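
As a rough illustration of the upsert feature, the following Python sketch composes a Spark SQL `MERGE INTO` statement of the kind Apache Iceberg tables accept. The table name, key column, incremental SELECT, and the `build_merge_sql` helper are all assumptions made for this example; the statement CueLake actually generates may differ.

```python
def build_merge_sql(target, source_query, key):
    """Compose a Spark SQL MERGE statement that upserts source rows into target.

    Rows whose key already exists in the target are updated;
    rows with a new key are inserted.
    """
    return (
        f"MERGE INTO {target} t\n"
        f"USING ({source_query}) s\n"
        f"ON t.{key} = s.{key}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


sql = build_merge_sql(
    target="lake.sales.orders",  # hypothetical Iceberg table
    # Hypothetical incremental SELECT: only rows changed since the last run.
    source_query="SELECT * FROM staged_orders WHERE updated_at > '2023-01-01'",
    key="order_id",  # hypothetical primary key column
)
print(sql)
```

The point of the sketch is the division of labor described above: the user supplies only the SELECT for the incremental rows, and the merge into the lakehouse table is handled separately.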

Categories

Data Pipeline

License

Apache License V2.0

Additional Project Details

Programming Language

JavaScript

Related Categories

JavaScript Data Pipeline Tool

Registered

2023-06-12