The last time I posted here, people had to book a call with us in order to access Artie. Today, that’s no longer the case. You can now connect your source and destination and start streaming immediately.
I spent years of my career building large-scale data pipelines and experienced firsthand how difficult it was to get real-time data. I believed there had to be a better way to stream data into our warehouse, and that is how Artie was born. Now, with AI agents, reducing data latency has become even more important, since agents need to make decisions based on fresh data.
When I first started building Artie, I quickly learned that the components meant to keep CDC running smoothly are very much bolted on, with tons of edge cases. In practice, unfortunately, they were not built to work together. We ended up dealing with schema drift, backfill race conditions, Kafka offset commits, and TOAST columns. I’d love to know if others have hit these same issues while building in-house.
artie.com, would love feedback!
TOAST columns: Artie has automatic detection built in. If a TOAST column hasn't changed, its value won't appear in the WAL. Artie detects this and skips the update for that column in the destination. This works without needing to set REPLICA IDENTITY FULL on your tables.
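To make the TOAST handling concrete, here is a rough Python sketch of the idea, not Artie's actual code. It assumes the CDC reader marks an unchanged TOAST column with a sentinel value; the `UNCHANGED_TOAST` marker, the `build_update` helper, and the `orders` table are all hypothetical names for illustration.

```python
# Hypothetical marker for "this TOAST column was not in the WAL record".
UNCHANGED_TOAST = "__unchanged_toast_value__"


def build_update(table, after, primary_keys):
    """Return (sql, params) for an UPDATE that skips unchanged TOAST columns.

    Identifiers are interpolated unquoted to keep the sketch short;
    real code would quote them for the destination's dialect.
    """
    set_cols = [
        col for col, val in after.items()
        if col not in primary_keys and val != UNCHANGED_TOAST
    ]
    if not set_cols:
        return None  # every non-key column was an unchanged TOAST value

    assignments = ", ".join(f"{col} = %s" for col in set_cols)
    predicate = " AND ".join(f"{pk} = %s" for pk in primary_keys)
    sql = f"UPDATE {table} SET {assignments} WHERE {predicate}"
    params = [after[c] for c in set_cols] + [after[pk] for pk in primary_keys]
    return sql, params


# The large "payload" column did not change, so it is left out of the SET clause.
event_after = {"id": 7, "status": "shipped", "payload": UNCHANGED_TOAST}
sql, params = build_update("orders", event_after, ["id"])
print(sql, params)
```

Because the unchanged column never appears in the SET clause, the destination keeps whatever value it already has for it.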
Schema drift: Artie never requires a schema registry. For relational sources like Postgres, Artie reads the source schema directly and syncs new columns immediately. For DDL changes, Artie uses lazy schema evaluation. On the next DML event for the table, it compares source vs. destination schema and applies any outstanding changes before writing the row.
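Here is a minimal Python sketch of what lazy schema evaluation could look like, again as an illustration rather than Artie's implementation. It only covers added columns (drops and type changes are left out), and `pending_schema_changes`, `apply_event`, and the `execute` callback are hypothetical names.

```python
def pending_schema_changes(source_cols, dest_cols):
    """Return (name, type) pairs present in the source but missing downstream."""
    return [(n, t) for n, t in source_cols.items() if n not in dest_cols]


def apply_event(table, row, source_cols, dest_cols, execute):
    """Reconcile the schema only when a DML event arrives, then write the row."""
    for name, col_type in pending_schema_changes(source_cols, dest_cols):
        execute(f"ALTER TABLE {table} ADD COLUMN {name} {col_type}")
        dest_cols[name] = col_type  # remember what the destination has now

    cols = list(row)
    placeholders = ", ".join(["%s"] * len(cols))
    execute(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
        list(row.values()),
    )


# Record statements instead of running them against a real warehouse.
statements = []
def record(sql, params=None):
    statements.append(sql)

source = {"id": "BIGINT", "status": "TEXT", "discount": "NUMERIC"}
dest = {"id": "BIGINT", "status": "TEXT"}
apply_event("orders", {"id": 7, "status": "shipped", "discount": 5}, source, dest, record)
print(statements)
```

The point of the ordering is that the schema change always lands before the first row that depends on it, and a table with no new rows costs nothing.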
Let me know if you have any other questions!