Remix
Remix is a two-way sync tool that makes it easy to replicate canonical data models between various data systems.
Remix is a free, open-source data streaming solution, written in Golang. It has three main objectives:
Remix uses a "fan-in, fan-out" architecture when replicating data. It pulls in data, transforms it to fit the models you define, and puts the resulting objects in a queue. Then, for each system that you want to push to, it transforms the objects into a format that the target system will accept and pushes them to it.
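The flow described above can be sketched in Go with channels. This is an illustration only, not Remix's actual code: the `Object`, `Source`, and `Target` types and the `fanIn` and `fanOut` functions are hypothetical names, and the transformation steps are left out to keep the shape of the pipeline visible.

```go
package main

import (
	"fmt"
	"sync"
)

// Object stands in for a record that already fits a canonical model.
type Object map[string]any

// Source is a hypothetical puller: it returns a batch of objects.
type Source func() []Object

// Target is a hypothetical destination system.
type Target struct {
	Name string
	Push func(Object)
}

// fanIn reads every source concurrently and merges the results into one queue.
func fanIn(sources []Source) <-chan Object {
	queue := make(chan Object)
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(s Source) {
			defer wg.Done()
			for _, obj := range s() {
				queue <- obj
			}
		}(src)
	}
	// Close the queue once every source has been drained.
	go func() {
		wg.Wait()
		close(queue)
	}()
	return queue
}

// fanOut delivers each queued object to every target.
func fanOut(queue <-chan Object, targets []Target) {
	for obj := range queue {
		for _, t := range targets {
			t.Push(obj)
		}
	}
}

func main() {
	crm := Source(func() []Object { return []Object{{"id": "c-1"}, {"id": "c-2"}} })
	billing := Source(func() []Object { return []Object{{"id": "c-3"}} })
	targets := []Target{
		{Name: "warehouse", Push: func(o Object) { fmt.Println("warehouse <-", o["id"]) }},
		{Name: "vector-db", Push: func(o Object) { fmt.Println("vector-db <-", o["id"]) }},
	}
	fanOut(fanIn([]Source{crm, billing}), targets)
}
```

Because the sources run concurrently, the order in which objects reach the queue is not fixed, but every target receives every object.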
We call the process of transforming the data "remixing". Right now, remixing simply renames fields, changes data types, and builds idempotent upsert / delete commands.
Just remember, data is remixed on the way in to fit your canonical data models, then remixed on the way out to be accepted by target systems.
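As a rough sketch of what a remixing step could look like, the Go example below renames fields, converts data types, and wraps the result in an upsert or delete command keyed on the object's ID. The `FieldRule` and `Command` types and the `remix` and `buildCommand` functions are assumptions made for illustration; they are not Remix's real configuration format or API.

```go
package main

import (
	"fmt"
	"strconv"
)

// FieldRule is a hypothetical rule: rename one field and coerce its type.
type FieldRule struct {
	From string // field name in the source system
	To   string // field name in the canonical model
	Type string // "string" or "int"
}

// Command is a hypothetical write operation keyed on the object's ID.
type Command struct {
	Op     string // "upsert" or "delete"
	Key    string
	Fields map[string]any
}

// remix applies the rules to a raw record whose values arrive as strings.
func remix(in map[string]string, rules []FieldRule) (map[string]any, error) {
	out := make(map[string]any, len(rules))
	for _, r := range rules {
		raw, ok := in[r.From]
		if !ok {
			continue // field absent in this record
		}
		switch r.Type {
		case "int":
			n, err := strconv.Atoi(raw)
			if err != nil {
				return nil, fmt.Errorf("field %q: %w", r.From, err)
			}
			out[r.To] = n
		case "string":
			out[r.To] = raw
		default:
			return nil, fmt.Errorf("field %q: unsupported type %q", r.From, r.Type)
		}
	}
	return out, nil
}

// buildCommand turns a remixed object into an upsert or delete command.
func buildCommand(obj map[string]any, keyField string, deleted bool) (Command, error) {
	key, ok := obj[keyField].(string)
	if !ok {
		return Command{}, fmt.Errorf("missing string key %q", keyField)
	}
	if deleted {
		return Command{Op: "delete", Key: key}, nil
	}
	return Command{Op: "upsert", Key: key, Fields: obj}, nil
}

func main() {
	rules := []FieldRule{
		{From: "cust_id", To: "id", Type: "string"},
		{From: "cust_age", To: "age", Type: "int"},
	}
	obj, err := remix(map[string]string{"cust_id": "c-42", "cust_age": "37"}, rules)
	if err != nil {
		panic(err)
	}
	cmd, err := buildCommand(obj, "id", false)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s %v\n", cmd.Op, cmd.Key, cmd.Fields)
}
```

Keying each command on the object's ID is what makes it idempotent: applying the same upsert or delete twice leaves the target in the same state as applying it once.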
AI thrives on clean, standardized datasets. Remix facilitates the creation of those datasets by forcing you to define data models in a standardized way.
Once you've defined those models, Remix replicates data that conforms to them to any number of storage systems. Depending on your needs, you might replicate data to traditional databases, a data warehouse, vector databases, or even systems used to serve large-scale AI training runs, like VAST, Databricks, or S3.
Models and remixing logic are defined with JSON Schema and YAML, respectively. This declarative approach makes it easy to keep your models and transformation logic in source control, for example on GitHub, and works well with automated deployment systems like Terraform or Ansible.
JSON Schema is the most widely accepted format for defining shared, canonical data models. Because the hardest part about data modeling is getting everyone to agree and conform, it's important to use a popular, interchangeable format with a good tool ecosystem. JSON Schema's tool ecosystem is unmatched, and popular use cases include:
We use JSON Schema to validate models in Remix for two reasons: (1) high-quality validation tools already exist, and (2) the format's portability lets you reuse those models in other systems.
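To make the idea concrete, here is a hypothetical `Customer` model written in JSON Schema, together with a deliberately tiny Go check that understands only the `required` keyword and the `string` and `integer` types. The model, the `schema` struct, and the `check` function are invented for this example. The hand-rolled check is a stand-in for a real JSON Schema validator and is not what Remix itself uses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// customerSchema is a hypothetical canonical model, not one shipped with Remix.
const customerSchema = `{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Customer",
  "type": "object",
  "properties": {
    "id":  {"type": "string"},
    "age": {"type": "integer"}
  },
  "required": ["id"]
}`

// schema holds only the keywords this sketch understands.
type schema struct {
	Properties map[string]struct {
		Type string `json:"type"`
	} `json:"properties"`
	Required []string `json:"required"`
}

// check applies a small subset of JSON Schema to one JSON object.
func check(schemaJSON, objectJSON string) error {
	var s schema
	if err := json.Unmarshal([]byte(schemaJSON), &s); err != nil {
		return fmt.Errorf("bad schema: %w", err)
	}
	var obj map[string]any
	if err := json.Unmarshal([]byte(objectJSON), &obj); err != nil {
		return fmt.Errorf("bad object: %w", err)
	}
	for _, name := range s.Required {
		if _, ok := obj[name]; !ok {
			return fmt.Errorf("missing required field %q", name)
		}
	}
	for name, prop := range s.Properties {
		v, present := obj[name]
		if !present {
			continue
		}
		switch prop.Type {
		case "string":
			if _, ok := v.(string); !ok {
				return fmt.Errorf("field %q: want string", name)
			}
		case "integer":
			// encoding/json decodes every JSON number as float64.
			f, ok := v.(float64)
			if !ok || f != float64(int64(f)) {
				return fmt.Errorf("field %q: want integer", name)
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(check(customerSchema, `{"id": "c-42", "age": 37}`))
	fmt.Println(check(customerSchema, `{"age": "thirty-seven"}`))
}
```

Because the model is plain JSON Schema, the same document could be handed unchanged to any standards-compliant validator or code generator in another system, which is the portability benefit described above.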
Remix is a new tool and is being actively developed.
Would you like a certain integration or feature built? SQLpipe, the company behind Remix, offers service packages that allow you to influence the roadmap. Turnaround time can be as fast as a few weeks.
Currently, Remix is single-node software with no external data storage dependencies. However, it has been designed so that it can later be deployed in a distributed fashion, e.g. with Kubernetes.
As of right now, it keeps active, validated objects in a queue in RAM. There are two additional storage / cooperation features that will be added:
Our team of professional data engineers is here to help you get the most out of your data pipeline. Reach out today.