Research

Ongoing projects

  • Learning LLVM IR with GNNs/embeddings/ML for full-stack optimization. In 2021, I launched a code characterization and optimization effort based on the LLVM Intermediate Representation (IR). The key idea is to embed the compiler’s intermediate representation of the code into a vector and use that vector to guide the choice of optimization parameters. The method achieves the same performance gains as state-of-the-art offline tuning strategies while using 3x fewer resources. Through Lana Scravaglieri’s Ph.D. thesis, we are further increasing performance and energy savings by expanding the optimization space to include vectorization and floating-point accuracy tradeoffs.
  • MPI code verification and correction with ML/LLMs. In 2023, we started an MPI program verification initiative. MPI programming is necessary to exploit large-scale clusters, but it is difficult and error-prone. The key idea of our research is therefore to use vector embeddings and deep-learning graph neural networks (GNNs) to identify bugs in MPI programs, which eases the development of large-scale applications. Specifically, we designed and developed two models that determine, from a code’s compiler representation, whether the code is correct or contains a known MPI error. We evaluated our models on two dedicated MPI verification benchmark suites, MBI and MPI-CorrBench, and achieved results on par with costly state-of-the-art verification tools. This work pioneered the use of ML-based strategies for HPC verification. We are currently evaluating the potential of AI/LLM-based program repair approaches in Asia Auville’s Ph.D. thesis.
  • I/O classification. In 2024, I started contributing to the characterization of the I/O behavior of HPC applications. Informing the system about an application’s I/O behavior is valuable for I/O scheduling and burst buffer optimization. We analyze I/O traces from several clusters and extract features to group jobs with similar I/O activity. Our goal is to define a taxonomy of I/O patterns on HPC systems (à la Berkeley’s Dwarfs) to help practitioners and users.

Previous projects

  • Member of the Maelstrom project, a joint team between Inria and Simula (Norway).
  • Participant in the 3BEARS project (Broad Bundle of BEnchmARks for Scheduling in HPC, Big Data, and ML), in collaboration with the HPC team of the University of Basel (Switzerland).
  • Contributor to the garbage collector energy optimization effort led by the Programming Languages team at Uppsala University (Sweden).
  • Member of the Atos/Eviden Plan de relance project on system failure detection.
  • Dual-PI in a collaboration with the SwAPP Lab at Iowa State University (USA) on code representation and optimization.

Publications

My publication record can be accessed at DBLP and Google Scholar.

Grants

  • 2024, Inria exploratory action, Dual-PI, LLM4DICE, Large Language Models for Detection and Correction of Errors.
  • 2022, Atos/Eviden Plan de relance, Dual-PI, Statistical methods for system error detection.

Software tools

My Ph.D. focused on the decomposition of parallel programs. Most of its ideas are implemented in the open-source framework CERE. A short video demonstrating CERE for NUMA and prefetcher optimizations is available here :)

In 2024, as part of Lana Scravaglieri's Ph.D. thesis, I started developing a framework to configure compiler, runtime, and hardware parameters efficiently and flexibly. The tool will be available soon.

Recent volunteering and invited talks

  • Web chair, communication chair: ICPP 2022
  • Co-organizer: HPC Bugs Fest at SC 2023 and SC 2024
  • PC: PDP 2021/2022/2023, COMPAS 2022, ICCS 2022, Correctness 2023, IPDPSW 2024, ICPP 2024
  • Reviewer: IPDPS 2022, PeerJ Computer Science 2023, SC 2023, Cluster 2023, SC 2024
  • Invited talks: Uppsala University 2021, University of Versailles 2022, Intel's VSSAD seminar 2023