Years of planning went into preparing for ICPE 2024 in London, UK. Hosted in the UK for the first time, the conference has generated a great deal of excitement: an expectation of productive interactions between the usual ICPE participants and the members of the various SPEC working groups, and a desire to increase the involvement of the local scientific community with ICPE.
It is our pleasure to welcome you to the 15th ACM/SPEC International Conference on Performance Engineering (ICPE), hosted at South Kensington, London, UK, from May 7-11, 2024. ICPE is the leading international forum for presenting and discussing novel ideas, innovations, trends and experiences in the field of performance engineering.
ICPE was formed by merging the ACM Workshop on Software Performance (WOSP, since 1998) and the SPEC International Performance Engineering Workshop (SIPEW, since 2008). Despite the peculiar times the world is living through, we are pleased to present an exciting program, the result of hard work by the authors, the program committee, and the conference organizers.
Proceeding Downloads
How the Cloud made Performance Appear on the Board Agenda
Cloud spending is growing! Gartner predicts a 20% surge to $678.8 billion in 2024, making it a top expense after personnel for many organisations. In fact, 78% of US businesses and 54% in EMEA already leverage the cloud for diverse needs, from ...
ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks
Distributed stream processing frameworks help build scalable and reliable applications that perform transformations and aggregations on continuous data streams. This paper introduces ShuffleBench, a novel benchmark to evaluate the performance of ...
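As background for readers unfamiliar with shuffling: the core operation such benchmarks exercise is routing records to workers by key, so that all records sharing a key land on the same worker. A minimal sketch of hash-based key partitioning (the function names here are illustrative, not ShuffleBench's API):

```python
import hashlib

def partition(key, num_workers):
    # Stable hash-based routing: records with the same key
    # always map to the same worker index.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_workers

# Group a small stream of (key, value) records into per-worker buckets.
records = [("user-a", 1), ("user-b", 2), ("user-a", 3)]
buckets = {}
for key, value in records:
    buckets.setdefault(partition(key, 4), []).append((key, value))
```

In a real distributed stream processor, the partition function decides which network channel each record is sent over; the benchmark's interest lies in the throughput and latency cost of that data movement at scale.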
Vectorized Intrinsics Can Be Replaced with Pure Java Code without Impairing Steady-State Performance
Several methods of the Java Class Library (JCL) rely on vectorized intrinsics. While these intrinsics undoubtedly lead to better performance, implementing them is extremely challenging, tedious, error-prone, and significantly increases the effort in ...
Rethinking 'Complement' Recommendations at Scale with SIMD
Maximizing cart value by increasing the number of items in electronic carts is one of the key strategies adopted by e-commerce platforms for optimal conversion of positive user intent during an online shopping session. Recommender systems play a key role ...
An Adaptive Logging System (ALS): Enhancing Software Logging with Reinforcement Learning Techniques
The efficient management of software logs is crucial in software performance evaluation, enabling detailed examination of runtime information for postmortem analysis. Recognizing the importance of logs and the challenges developers face in making ...
Time Series Forecasting of Runtime Software Metrics: An Empirical Study
Software applications can produce a wide range of runtime software metrics (e.g., number of crashes, response times), which can be closely monitored to ensure operational efficiency and prevent significant software failures. These metrics are typically ...
An Empirical Analysis of Common OCI Runtimes' Performance Isolation Capabilities
Industry and academia have strong incentives to adopt virtualization technologies. Such technologies can reduce the total cost of ownership or facilitate business models like cloud computing. These options have recently grown significantly with the rise ...
An Experimental Setup to Evaluate RAPL Energy Counters for Heterogeneous Memory
Power consumption of the main memory in modern heterogeneous high-performance computing (HPC) constitutes a significant part of the total power consumption of a node. This motivates energy-efficient solutions targeting the memory domain as well. ...
Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for Arrays
- Stephen Nicholas Swatman,
- Ana-Lucia Varbanescu,
- Andy D. Pimentel,
- Andreas Salzburger,
- Attila Krasznahorkay
The layout of multi-dimensional data can have a significant impact on the efficacy of hardware caches and, by extension, the performance of applications. Common multi-dimensional layouts include the canonical row-major and column-major layouts as well as ...
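For context on the entry above: a Morton (Z-order) layout maps multi-dimensional indices to memory addresses by interleaving the bits of each coordinate. A minimal sketch of the classic 2-D Morton index — background only, not the paper's generalized variant:

```python
def morton_index(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into a Z-order index."""
    idx = 0
    for i in range(bits):
        idx |= ((x >> i) & 1) << (2 * i)      # x occupies even bit positions
        idx |= ((y >> i) & 1) << (2 * i + 1)  # y occupies odd bit positions
    return idx

# Neighbouring (x, y) coordinates cluster into nearby indices,
# which is why Z-order layouts can be cache-friendly.
```

The "generalized" layouts the paper searches over relax the strict alternation of bits, which is where an evolutionary search for cache-friendly orderings comes in.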
Energy Efficiency Features of the Intel Alder Lake Architecture
The continuous evolution of processors requires vendors to translate ever-growing transistor budgets into performance improvements, e.g., by including more functional units, memory controllers, input/output (I/O) interfaces, graphics processing units (...
Developing Index Structures in Persistent Memory Using Spot-on Optimizations with DRAM
The emergence of persistent memory (PMem) is greatly impacting the design of commonly used data structures to obtain the full benefit from the new technology. Compared to DRAM, PMem's larger capacity and lower cost make it an attractive alternative ...
What does Performance Mean for Large Language Models?
In the last decade there has been a significant leap in the capability of foundation AI models, largely driven by the introduction and refinement of transformer-based machine learning architectures. The most visible consequence of this has been the ...
InstantOps: A Joint Approach to System Failure Prediction and Root Cause Identification in Microservices Cloud-Native Applications
- Raphael Rouf,
- Mohammadreza Rasolroveicy,
- Marin Litoiu,
- Seema Nagar,
- Prateeti Mohapatra,
- Pranjal Gupta,
- Ian Watts
As microservice and cloud computing operations increasingly adopt automation, the importance of models for fostering resilient and efficient adaptive architectures becomes paramount. This paper presents InstantOps, a novel approach to system failure ...
Daedalus: Self-Adaptive Horizontal Autoscaling for Resource Efficiency of Distributed Stream Processing Systems
To maintain a stable Quality of Service (QoS), distributed stream processing systems require a sufficient allocation of resources. At the same time, over-provisioning can result in wasted energy and high operating costs. Therefore, to maximize resource utilization, autoscaling ...
Demeter: Resource-Efficient Distributed Stream Processing under Dynamic Loads with Multi-Configuration Optimization
Distributed Stream Processing (DSP) focuses on the near real-time processing of large streams of unbounded data. To increase processing capacities, DSP systems are able to dynamically scale across a cluster of commodity nodes, ensuring a good Quality of ...
BFQ, Multiqueue-Deadline, or Kyber? Performance Characterization of Linux Storage Schedulers in the NVMe Era
Flash SSDs have become the de facto choice to deliver high I/O performance to modern data-intensive workloads. These workloads are often deployed in the cloud, where multiple tenants share access to flash-based SSDs. Cloud providers use various ...
The Cost of Simplicity: Understanding Datacenter Scheduler Programming Abstractions
Schedulers are a crucial component in datacenter resource management. Each scheduler offers different capabilities, and users use them through their APIs. However, there is no clear understanding of what programming abstractions they offer, nor why they ...
Accelerating ML Workloads using GPU Tensor Cores: The Good, the Bad, and the Ugly
Machine Learning (ML) workloads generally contain a significant amount of matrix computations; hence, hardware accelerators for ML have been incorporating support for matrix accelerators. With the popularity of GPUs as hardware accelerators for ML, ...
MalleTrain: Deep Neural Networks Training on Unfillable Supercomputer Nodes
First-come first-serve scheduling can leave a substantial fraction (up to 10%) of supercomputer nodes transiently idle. Recognizing that such unfilled nodes are well-suited for deep neural network (DNN) training, due to the flexible nature of DNN training ...
Leftovers for LLaMA
In recent years, large language models (LLMs) have become pervasive in our day-to-day lives, with enterprises utilizing their services for a wide range of NLP-based applications. The exponential growth in the size of LLMs poses a significant challenge for ...
Processing Natural Language on Embedded Devices: How Well Do Modern Models Perform?
- Souvika Sarkar,
- Mohammad Fakhruddin Babar,
- Md Mahadi Hassan,
- Monowar Hasan,
- Shubhra Kanti Karmaker Santu
Voice-controlled systems are becoming ubiquitous in many IoT-specific applications such as home/industrial automation, automotive infotainment, and healthcare. While cloud-based voice services (e.g., Alexa, Siri) can leverage high-performance computing ...
Optimizing Edge AI: Performance Engineering in Resource-Constrained Environments
Recent years have witnessed the growth of Edge AI, a transformative paradigm that integrates neural networks with edge computing, bringing computational intelligence closer to end users. However, this innovation is not without its challenges, especially ...
TBASCEM - Tight Bounds with Arrival and Service Curve Estimation by Measurements
This paper aims to solve the challenge of quantifying the performance of Hardware-in-the-Loop (HIL) computer systems used for data re-injection. The system can be represented as a multiple queue and server system that operates on a First-In, First-Out (...
A Learning-Based Caching Mechanism for Edge Content Delivery
With the advent of 5G networks and the rise of the Internet of Things (IoT), Content Delivery Networks (CDNs) are increasingly extending into the network edge. This shift introduces unique challenges, particularly due to the limited cache storage and the ...
Function Offloading and Data Migration for Stateful Serverless Edge Computing
Serverless computing and, in particular, Function-as-a-Service (FaaS) have emerged as valuable paradigms to deploy applications without the burden of managing the computing infrastructure. While initially limited to the execution of stateless functions ...
No Clash on Cache: Observations from a Multi-tenant Ecommerce Platform
Caching is a classic technique for improving system performance by reducing client-perceived latency and server load. However, cache management still needs to be improved and is even more difficult in multi-tenant systems. To shed light on these problems ...
MemSaver: Enabling an All-in-memory Switch Experience for Many Apps in a Smartphone
The availability of diverse applications (apps) and the need to use many apps simultaneously have propelled users to constantly switch between apps in smartphones. For an instantaneous switch, these apps are often expected to stay in the memory. However, ...
Systemizing and Mitigating Topological Inconsistencies in Alibaba's Microservice Call-graph Datasets
Alibaba's 2021 and 2022 microservice datasets are the only publicly available sources of request-workflow traces from a large-scale microservice deployment. They have the potential to strongly influence future research as they provide much-needed ...
Disambiguating Performance Anomalies from Workload Changes in Cloud-Native Applications
Modern cloud-native applications are adopting the microservice architecture in which applications are deployed in lightweight containers that run inside a virtual machine (VM). Containers running different services are often co-located inside the same ...