Announcing SDF General Availability
A platform for code-backed, scalable, and correct Data Warehouses.
SDF extracts SQL compilers from their clouds and puts them on your laptop with Executable Semantics to power the next generation of data developer tooling.
Introduction
We are thrilled to announce SDF’s public beta and share our vision for a transformational shift in data development towards Semantic Data Warehouses which are code-backed, scalable and correct.
🌱 You can find installation instructions here, and join our new Slack community here.
The Missing Link: Executable Semantics in SQL
In software engineering, formal semantics governs most tooling. Compilers implement precise language definitions to enable expressive error reporting, contextual IntelliSense, dependency analysis, linting and type checking, privacy and security analysis, optimized code generation and more. It doesn’t matter if the code is Rust, C, Java, or TypeScript; all modern software development tooling and environments use type-aware compilation as the foundation for great developer experiences and robust software. That software tooling, in turn, is designed to optimize the inner loop of software engineering, from ideation to execution.
But in data, almost every vendor has created a unique dialect of SQL with opaque compilers that are only accessible remotely, through web GUIs or APIs. That’s why most SQL dialects are fundamentally incompatible with each other, and why any SQL editor needs a database connection to function. There is no accessible language definition and no local compilation. This makes for a frustrating development experience for engineers, and leads to vendor lock-in for companies.
SDF extracts SQL compilers from their clouds and puts them on your laptop to power the next generation of data developer tooling.
We’ve initially specialized in four SQL dialects: Snowflake, Trino, Redshift, and BigQuery, with other dialects and connectors in progress. We’ve analyzed millions of unique SQL queries, ranging from one-liners to generated queries with tens of thousands of columns.
Our goal is to understand these proprietary SQL dialects so deeply that SDF can analyze, verify and ultimately execute them!
We call this depth of understanding Executable Semantics, and it forms the basis of SDF’s transformation layer and SDF DB, our built-in analytical query engine.
What is SDF?
SDF is a multi-dialect SQL compiler, transformation framework, and analytical database engine packaged into one CLI.
Since SDF is vertically integrated, it can provide stronger guarantees, faster feedback, and a more seamless developer experience. When working locally, SDF is both the database and transformation layer, ensuring validity. It natively compiles SQL dialects, like Snowflake, and connects to their corresponding data warehouses to materialize models.
Regardless of which database you work with, at any point the entirety of the Data Warehouse is fully defined and statically analyzed as code. This makes SDF more like the build systems for TypeScript or Rust, rather than a traditional data development tool.
(1) Today’s Standard for Transformation: dbt
Today’s state of the art for data transformation is dbt. With dbt, models are authored locally. There is configuration (YAML) and code generation (Jinja), which dbt ingests and processes but cannot validate. dbt then issues a request to your cloud database, which compiles the SQL and ultimately executes the query. Errors in your query can only be found by the cloud database, a time-consuming and often expensive process.
(2) SDF: The Power of Local Compilation
With SDF’s transformation layer driven by Executable Semantics, queries are fully compiled locally. SDF first reads your cloud database’s state and then statically analyzes all configuration (macros, metadata), queries (SQL), and dataflows (classifiers) described in code, providing powerful compile-time guarantees. During development, these guarantees prevent breaking changes through real-time impact analysis and precise error reporting. SDF also integrates a native Type System for SQL (driven by classifiers) which, like TypeScript, enables developers to express business logic as statically verifiable code. As a result, only queries that have been validated both syntactically and semantically are run against the database. Developers move faster, and compute costs go down as a byproduct.
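To make the idea of a compile-time check concrete, here is a minimal sketch in Python. It is a toy illustration only, not SDF's compiler, configuration format, or API: the names `CATALOG`, `MODELS`, `check_model`, and `check_all` are hypothetical, and a model is reduced to a source table plus a list of selected columns. The point is simply that a misspelled column can be reported from code and schema alone, before any query reaches the warehouse.

```python
# Toy illustration only: this is not SDF's compiler or API.
# CATALOG, MODELS, check_model and check_all are hypothetical names.

# Schema as it might be read from the warehouse: table -> {column: type}
CATALOG = {
    "raw.orders": {"order_id": "int", "customer_id": "int", "amount": "decimal"},
}

# Each model names its source table and the columns it selects.
MODELS = {
    "stg.orders": {"source": "raw.orders", "columns": ["order_id", "amount"]},
    "stg.broken": {"source": "raw.orders", "columns": ["order_id", "amout"]},
}


def check_model(name, catalog=CATALOG, models=MODELS):
    """Return a list of error strings for one model, without running any SQL."""
    model = models[name]
    source = model["source"]
    if source not in catalog:
        return [f"{name}: unknown table {source!r}"]
    known = catalog[source]
    return [
        f"{name}: unknown column {col!r} in {source!r}"
        for col in model["columns"]
        if col not in known
    ]


def check_all(catalog=CATALOG, models=MODELS):
    """Check every model in name order and collect all errors."""
    errors = []
    for name in sorted(models):
        errors.extend(check_model(name, catalog, models))
    return errors


if __name__ == "__main__":
    for error in check_all():
        print(error)
```

In this sketch the typo `amout` is caught locally; a real compiler would of course parse the SQL itself and resolve types, functions, and dependencies rather than rely on a hand-written column list.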
(3) SDF DB: The Power of Composable Execution
SDF DB is a high-performance vectorized query engine built into SDF. SDF DB’s unique capability is to emulate other SQL dialects, both their semantics and their functions (starting with Trino).
Coupled with open data formats like Iceberg, SDF DB aims to free enterprises from vendor lock-in by providing intelligent federated execution between SDF’s transformation layer and your data cloud provider. We call this ability Composable Execution. Ultimately, SDF’s vision is to enable execution of the same query against proprietary compute providers (like Snowflake, BigQuery, and Redshift) and SDF DB, with both query engines yielding the same result.
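One way to picture the "same query, same result" goal is as a differential test between two runtimes. The sketch below is an assumption-laden illustration, not how SDF DB works: Python's built-in `sqlite3` stands in for one engine, a few lines of plain Python stand in for the other, and the helper names (`run_on_sqlite`, `run_on_reference`, `same_result`) are hypothetical. Results are compared as multisets of rows, because SQL does not guarantee row order without an ORDER BY.

```python
import sqlite3
from collections import Counter

# Shared input data: (key, value) pairs.
ROWS = [("a", 10), ("b", 5), ("a", 7)]


def run_on_sqlite(rows):
    """Runtime 1 (stand-in): evaluate the aggregation with SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (k TEXT, v INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn.execute("SELECT k, SUM(v) FROM t GROUP BY k").fetchall()


def run_on_reference(rows):
    """Runtime 2 (stand-in): evaluate the same aggregation in plain Python."""
    totals = {}
    for k, v in rows:
        totals[k] = totals.get(k, 0) + v
    return list(totals.items())


def same_result(a, b):
    """Compare two result sets as multisets of rows, ignoring row order."""
    return Counter(map(tuple, a)) == Counter(map(tuple, b))


if __name__ == "__main__":
    print(same_result(run_on_sqlite(ROWS), run_on_reference(ROWS)))
```

Comparing unordered multisets rather than lists is a deliberate choice here: two correct engines may legitimately return the same rows in different orders.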
Getting Started
SDF is a next generation transformation layer and execution engine designed to improve the data engineering experience with:
Compile Time Guarantees powered by SDF’s static analysis and
Composable Execution on SDF DB or an accompanying cloud provider
SDF is now in public Beta, with SDF DB in public Alpha.
See SDF In Action
SDF is free to use and has an open source core: DataFusion and sql-functions.
All feedback is welcome: bugs, integration requests, macro updates, and new libraries.
We look forward to hearing from you!
Thank You
A lot has been wrapped into SDF, and we’ve learned so much from our customers along the way. We’d like to thank the incredible engineers at Patreon, Deel, Classdojo, Cybersyn, Obie Insurance, Linqto and others at our partner F500 companies. SDF would not be here without you. Thank you. ♥
- SDF Labs Team
Terms:
Executable Semantics: a precise and unambiguous specification of how a SQL query should behave, detailed enough that the described behavior can be executed automatically. This concept bridges the gap between abstract mathematical descriptions of semantics and practical implementations. SDF implements executable semantics for multiple distinct SQL dialects to formalize semantic behavior, function behavior, and metadata behavior. This enables SDF to both verify SQL statements and execute them.
Composable Execution: the concept that a given SQL statement may be executable by more than one runtime (database). Coupled with open data formats like Iceberg, this allows the transformation layer (or users) to optimally allocate compute for a given query.
Static Analysis: the examination of code without executing it. It involves analyzing the source code or the compiled code to identify potential errors, security vulnerabilities, and other quality issues.
Formal Semantics: in software engineering, this refers to the rigorous mathematical description of the meaning (semantics) of programs and programming languages. This approach provides a framework for understanding and reasoning about the behavior of software systems in a precise and unambiguous way. Formal semantics are essential for proving properties about programs, such as correctness, safety, and security.
Compile Time Guarantees: assertions and validations that can be made purely based on code and configuration. In SDF, that code is SQL or Jinja, and configuration is described in YAML or provided by a remote database.
Runtime Guarantees: assertions that can be validated only during or after a query is executed. These are typically formulated as expectations on resulting columns.
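As a contrast with compile-time checks, the following Python sketch shows what a runtime expectation on a resulting column might look like. It is an illustration under stated assumptions rather than SDF functionality: Python's built-in `sqlite3` module stands in for the database, and `check_expectations` is a hypothetical helper. The check can only be answered after the query has run, because it depends on the data itself.

```python
import sqlite3


def check_expectations(conn, query, column):
    """Run `query`, then check that `column` is never NULL and never repeated."""
    cursor = conn.execute(query)
    names = [d[0] for d in cursor.description]  # result column names
    idx = names.index(column)
    values = [row[idx] for row in cursor.fetchall()]
    return {
        "not_null": all(v is not None for v in values),
        "unique": len(values) == len(set(values)),
    }


# Hypothetical sample data: order_id 2 appears twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 9.5), (2, 20.0), (2, 5.0)],
)

result = check_expectations(conn, "SELECT order_id, amount FROM orders", "order_id")
print(result)
```

Here the query is perfectly valid SQL, so no compile-time check would reject it; the duplicate `order_id` is a property of the data and surfaces only as a failed uniqueness expectation after execution.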