150 Western Avenue

#CS Special Talk

Attend In Person or Virtually

Data systems are everywhere. A data system is a collection of data structures and algorithms working together to achieve complex data processing tasks. With data systems that use the right data structure design for the problem at hand, we can reduce the monthly bill of large-scale data applications on the cloud by hundreds of thousands of dollars. We can accelerate data science tasks by dramatically speeding up the computation of statistics over large amounts of data. We can train drastically more neural networks within a given time budget, improving accuracy. However, knowing the right data system design for any given scenario is a notoriously hard problem: the space of possible designs is massive, and no single design is perfect across all data, queries, and hardware contexts. In addition, building a new system may take several years, even for a single fixed design.

We will discuss our quest for the first principles of data system design. We will show that it is possible to reason about this massive design space. Using a grammar for data systems, this allows us to create a self-designing data system that can take drastically different shapes to optimize for the workload, hardware, and available cloud budget. These shapes include data structures, algorithms, and overall system designs that are discovered automatically and do not (always) exist in the literature or industry, yet they can be more than 10x faster.