
Keynotes


June 26th

Volker Markl

Technical University of Berlin, Germany

Keynote title: NebulaStream – Data Stream Processing in Massively Distributed Heterogeneous Environments

Abstract

Modern data-driven applications arising in such domains as smart manufacturing, healthcare, and the Internet of Things pose new challenges to data processing systems. Traditional stream processing systems, such as Flink, Spark, or Kafka Streams, are ill-suited to cope with the massive scale of distribution, the heterogeneous computing landscape, and the requirement for timely processing and actuation. Classical approaches like managed runtimes, interpretation-based query processing, and the optimization of single queries that neglect interactions greatly limit throughput, latency, energy-efficiency, and the general usability of these systems for emerging applications involving distributed data processing at scale in a sensor-edge-cloud environment. At BIFOLD / TU Berlin, we are researching and building NebulaStream, a novel data-stream processing system for massively distributed, heterogeneous environments. NebulaStream supports (potentially resource-constrained) heterogeneous devices, a hierarchical topology (with the distribution of computation and data flow in a cloud-edge continuum), and the sharing of computations and data across multiple concurrent queries. The key distinguishing features of NebulaStream from a technological perspective include the following: (1) An incremental and continuous query optimizer that considers the sharing of computation and intermediate results in conjunction with the placement of operations in a massively distributed, heterogeneous cloud-edge continuum. (2) A compilation-based approach for streaming queries, which avoids the need for managed runtimes and ensures excellent throughput, latency, and energy-efficiency across the board, from small embedded devices to powerful processors. (3) A distributed runtime that supports on-demand in-network processing on a hierarchical topology of heterogeneous devices in an efficient and fault-tolerant way.
In this talk, we will describe several challenges arising due to novel applications and architectures for distributed data stream processing. We will present NebulaStream, an innovative open-source system currently being built to address these challenges. In addition, we will describe NebulaStream’s design principles, architecture, performance, and application scenarios, as well as the current status of the open-source development.

Biography

Volker Markl is a Full Professor and Chair of the Database Systems and Information Management (DIMA) Group at the Technische Universität Berlin (TU Berlin). At the German Research Center for Artificial Intelligence (DFKI), he is Chief Scientist and Head of the Intelligent Analytics for Massive Data Research Group. In addition, he is Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD), a merger of the Berlin Big Data Center (BBDC) and the Berlin Center for Machine Learning (BZML). BIFOLD is one of Germany’s national Competence Centers for Artificial Intelligence and will further bolster ongoing collaborative research in scalable data management and Machine Learning. Dr. Markl is a database systems researcher conducting research at the intersection of distributed systems, scalable data processing, text mining, computer networks, machine learning, and applications in healthcare, logistics, Industry 4.0, and information marketplaces. Earlier in his career, he was a Research Staff Member and Project Leader at the IBM Almaden Research Center in San Jose, California, USA and a Research Group Leader at FORWISS, the Bavarian Research Center for Knowledge-based Systems located in Munich, Germany. Volker Markl is a computer science graduate from Technische Universität München, where he earned his Diploma in 1995 with a thesis on exception handling in programming languages. He earned his PhD in 1999 in the area of multidimensional indexing under the supervision of Rudolf Bayer. Volker Markl has published numerous scholarly papers on indexing, query optimization, lightweight information integration, and scalable data processing at prestigious venues. He holds 18 patents, has transferred technology into several commercial products, and has been involved in two successful startup exits.
He has been both the Speaker and Principal Investigator for the Stratosphere Project, which resulted in a Humboldt Innovation Award as well as Apache Flink, the open-source big data analytics system. He currently serves as the President of the VLDB Endowment and was elected as one of Germany’s leading Digital Minds (Digitale Köpfe) by the German Informatics Society (GI). Volker is also a member of the Scientific Advisory Board of Software AG. Most recently, Volker and his team earned the ACM SIGMOD 2020 Best Paper Award for their work on “Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects”.


June 27th

Tyler Akidau

Snowflake Inc., USA

Keynote title: Simplicity and Elegance in Stream Processing: A Five Year Odyssey

Abstract

At DEBS 2019, I had the opportunity to speak about my take on the open problems in stream processing at the time. Now, five years later, I’m happy to have the opportunity to return and talk about the investments we’ve made on those problems at Snowflake in the years since. Using those seven open problems as a framework (Graceful Evolution, Operational Ease of Use, SQL, Formal Semantics, Latency ↔ Cost ↔ Correctness, Batch + Streaming Interoperability, Database-style optimizations), I’ll discuss the areas where we’ve made good progress, both at Snowflake and across the industry as a whole, as well as the areas where a substantial amount of work remains. Much of the talk will center around Dynamic Tables, Snowflake’s declarative batch+streaming transformation primitive that is the centerpiece of our streaming offerings. Designed to hide the complexity of stream processing under the simple but powerful interface of a SQL query and a target lag, Dynamic Tables deliver the promise of truly unified batch and stream processing in an easy-to-use, accessible, and operationally hands-off package. I’ll show how Dynamic Tables, a truly remarkable feat of engineering, have helped move the needle for each of the seven open problems from my 2019 talk. In addition, I will touch upon other pieces of the Snowflake streaming portfolio, such as our streaming ingestion service, Snowpipe Streaming; talk briefly about our time spent collaborating on the noble experiment that was the ill-fated SQL Standards Expert Group on Streaming; and give a glimpse of some of the more forward-looking efforts we’re actively working on now. By the end, I hope to convey the optimism we at Snowflake all feel regarding the progress made, and the opportunities remaining, in this fascinating field of streaming data.

Biography

Tyler Akidau has spent the better part of the last two decades working on and opining about large-scale distributed stream processing. Best known as the author of the seminal Streaming 101 and Streaming 102 blog posts, as well as the O’Reilly Streaming Systems book, his true passion lies in helping build and lead talented teams of exceptional engineers to pragmatically push forward the state of the art. He is currently a Distinguished Software Engineer at Snowflake, helping drive the streaming agenda there, amongst other efforts. He’s also proud to be a co-author of a number of industrial track conference publications, the most recent of which is the 2023 SIGMOD paper “What’s the Difference? Incremental Processing with Change Queries in Snowflake”.


June 28th

Evangelia Kalyvianaki

University of Cambridge, UK

Keynote title: Distributed scheduling in modern data centers: to optimize or not?

Biography

Dr Evangelia (Eva) Kalyvianaki is an Associate Professor in the Department of Computer Science and Technology (CST) at the University of Cambridge, where she co-leads the Systems Research Group (SRG). She is also the vice-chair of the European Chapter of ACM SIGOPS (EuroSys). She was an Associate Editor for the IEEE/ACM Transactions on Networking (ToN) journal and a Fellow at the Alan Turing Institute (2018-2021). Dr Kalyvianaki’s research interests span the areas of cloud computing, resource management, big data processing, distributed systems and systems in general. She has publications in top-tier venues in systems (USENIX ATC), in data management and database systems (SIGMOD, ICDE), autonomic computing (ICAC, TAAS) and control theory (CDC, ToSC, ECC). She and her co-authors have received the 2023 ACM SIGMOD Test-of-Time Award for their paper entitled “Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management”. Her past work on novel cloud pricing models was featured in “The Register”. Over the years, Dr Kalyvianaki and her collaborators have received significant funding from the UKRI and the industry; e.g., she was co-awarded the 2014 VMware Systems Research Award. She has an extensive track record of editorial work and of reviewing for top systems conferences and journals. At CST, she teaches Cloud Computing and Operating Systems, she was the Deputy Director of Postgraduate Education on Researcher Development, and she supervises several PhD students as well as BSc and award-winning MSc projects. She is currently on sabbatical leave from CST and is working at Meta at Magnit.


Important Dates

Event: Date (AoE)

Research Papers
Abstract Submission: February 16th, 2024 / February 23rd, 2024
Paper Submission: February 22nd, 2024 / March 4th, 2024
Notification: April 12th, 2024
Final Decision: May 13th, 2024
Camera Ready: May 24th, 2024

Submission Dates
Industry and Application Paper Submission: March 25th, 2024 / April 5th, 2024
Doctoral Symposium Submission: May 27th, 2024
Poster and Demo Paper Submission: May 8th, 2024 / May 21st, 2024

Notification Dates
Author Notification Industry and Application Track: April 27th, 2024 / May 5th, 2024
Author Notification Doctoral Symposium: June 3rd, 2024
Author Notification Poster & Demo: May 22nd, 2024 / June 3rd, 2024

Camera Ready
Camera Ready for Industry and Application Track: May 24th, 2024
Camera Ready for Doctoral Symposium: June 10th, 2024
Camera Ready for Poster & Demo: May 29th, 2024 / June 10th, 2024

Conference
Conference: June 25th–28th, 2024