EMOSS22 – Emerging Open Storage Systems and Solutions for Data Intensive Computing (Half Day Workshop)
IO-SEA organises the EMOSS22 half-day workshop as part of HPDC'22. Information regarding registration will follow soon.
The world of storage is changing, with continuous innovation in the methods of storing, managing, and accessing data. We are in an era where innovation and scientific progress will depend heavily on the exploitation of massive data assets. Critical to this will be the architectures of data storage and I/O infrastructures.
Convergence is coming to storage infrastructures. HPC and Cloud storage have historically been different, each charting its own path appropriate to its community. This has been changing with the evolution of use cases and applications: HPC and Cloud are coming together. A "cloudification" of HPC is under way, in which cloud technologies such as containerization are entering HPC. Monolithic scientific applications generating simulation I/O are giving way to large workflows that have massive data-ingest components and also generate vast volumes of data that need to be post-processed. AI use cases within HPC are feeding this data frenzy. Never before has data been so critical in HPC.
We have seen the emergence of cloud-based storage technologies, such as object storage, in the realm of HPC. We are also seeing that data storage infrastructures at Exascale will not be monolithic but a mix of various data "pools", including highly scalable parallel file systems that are being revised and re-architected for Exascale. Object storage, which had its inception in the cloud, has been repurposed and will expose various "gateways" on top of it suitable for scientific HPC use cases. As for base storage technologies, NVRAM, SSD, disk, and tape will all have a role to play. Mixing and matching them in various ways will provide various points on the performance/cost curve.
Storage software infrastructure innovation will be at the center of all this. This workshop focuses on the next generation of emerging storage software infrastructure innovations happening in scientific computing.
This half-day workshop will look critically at some of these technologies, inviting pioneers in the field to present to the community.
The Program (Virtual) – Links will be provided post registration
9:00am – 9:40am (EDT) – Ephemeral Data Access Environments in the IO-SEA project
The IO-SEA project’s objective is to build a novel approach for the I/O stack of exascale-class supercomputers. Most supercomputers today come with static file systems, continuously connected to the compute nodes. These file systems receive all of the I/O requests of every application running on the cluster, with the hope that they will be able to process these requests smoothly. However, this model is currently reaching its limits, as the effect of highly I/O-demanding applications severely impacts the performance of production systems. IO-SEA is building a new I/O stack based upon ephemeral I/O services, tailored to and launched on demand for specific workflows, running on dedicated “data nodes” and providing access to the users’ datasets stored in an object store. In this talk, we will present the main concepts and discuss the challenges of this architecture.
About the Speaker
Philippe Couvée has been working in Atos HPC R&D (Grenoble, France) for more than 20 years. He leads a team of 17 researchers and engineers developing products to facilitate data access from large supercomputers. His recent focus is on data-centric solutions that combine cache and acceleration techniques with advanced instrumentation and data analytics. He also teaches computer architecture and system programming at the CNAM school.
9:40am – 10:20am (EDT) – Hierarchical storage from NVMe to tapes
Hierarchical storage makes it possible to build storage systems that are both high-performance and high-capacity by combining technologies such as NVMe, SSDs, HDDs, and tapes.
While most existing systems manage only a limited number of levels and technologies, the IO-SEA project aims to implement a deep storage hierarchy in the context of Exascale workloads. This system will be capable of managing most existing storage devices, from NVMe to tapes.
To achieve this, it leverages two open-source object stores: CORTX, an innovative object store developed by Seagate and designed for Exascale, and Phobos, a tape-capable object store developed by CEA.
The resulting system will implement features that allow users and applications to optimize their data placement by specifying “hints”, for example via extended attributes. It will also provide features to migrate data in bulk according to arbitrary criteria.
The system will be accessible through a native API as well as an S3 gateway.
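As an illustration of how such placement hints might look from an application's point of view, here is a minimal sketch of a hint-driven tier-selection policy. The tier names and hint keys are invented for this example and do not reflect the actual IO-SEA interface.

```python
# Toy sketch of hint-driven data placement in a deep storage hierarchy.
# Hint keys ("access", "archive") and tier names are hypothetical examples,
# not the IO-SEA API; real hints could be carried as extended attributes.

TIERS = ["nvme", "ssd", "hdd", "tape"]  # fastest/most expensive to cheapest

def place(hints: dict) -> str:
    """Pick a storage tier from optional user hints; default to capacity tier."""
    if hints.get("access") == "hot":
        return "nvme"          # latency-sensitive, frequently accessed data
    if hints.get("access") == "warm":
        return "ssd"
    if hints.get("archive") == "yes":
        return "tape"          # cold data, migrated to the archival tier
    return "hdd"               # default capacity tier

print(place({"access": "hot"}))   # nvme
print(place({"archive": "yes"}))  # tape
print(place({}))                  # hdd
```

A real system would of course combine such hints with system-wide policies and bulk-migration criteria rather than a fixed lookup.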
About the Speaker
Thomas Leibovici leads the HPC storage group at CEA. He has created several popular open-source projects in the domain of HPC storage, such as the Robinhood policy engine and the Phobos object store. He has also contributed to several implementations of hierarchical storage systems, such as Lustre-HSM and M0tr HSM.
In the IO-SEA project, he leads WP4 on “Hierarchical Storage Management”.
Virtual Coffee Break, 10:20am – 10:30am
10:30am – 11:10am (EDT) – A Story of MarFS at LANL
Data storage isn’t easy. HPC data storage at massive scale, while balancing resiliency, performance, and usability demands, isn’t easy either, with a healthy portion of extremely challenging thrown in. This talk will provide a crash course in storage philosophy at LANL: how and why we make things difficult for ourselves, and how we ensure the best experience for our users through novel storage technologies and commodity hardware.
About the Speaker
David Bonnie is a storage architect with a background in scalable storage systems and software development. He contributed to the development of OrangeFS/PVFS2 and holds a deep interest in understanding current and future storage hardware and software. David is currently the campaign storage, future archive, and enterprise backup technical lead for the High Performance Computing Division at Los Alamos National Laboratory, positions that include leading development, integration, and production support efforts for LANL’s pre-archive, archive, and backup tiers. He is integral in the development and deployment of these solutions which are designed to serve the needs of ever-growing datasets paired with spiraling bandwidth and reliability challenges. David holds a B.S. and M.S. in Computer Engineering, both from Clemson University.
11:10am – 11:50am (EDT) – One big happy family: sharing the S3 layer between Ceph, CORTX, and DAOS
Object storage has transformed the storage industry. Freed from the complex hierarchical organization of file systems, object storage systems have achieved tremendous growth and scalability in the past two decades. However, object storage systems for Enterprise/Cloud computing and those for High Performance Computing (HPC) have differed in some respects, chief among them the client interface. Enterprise computing has preferred a GET/PUT interface which, much like Map-Reduce, has enabled tremendous human productivity by simplifying the interface. Typical HPC frameworks, in contrast, tend to optimize computational productivity over human productivity, preferring more complex, low-level interfaces with more flexibility and more room for optimization.
Given this, it is no surprise that three object storage systems (Ceph, CORTX, and DAOS), all originally motivated by HPC, have similar low-level interfaces (librados, libmotr, and libdaos respectively). However, given the increased convergence of Cloud and HPC, these object storage systems also need to support the industry-standard interface, which has become Amazon’s S3 protocol. In this talk, we will discuss how the Ceph project was the first to add an S3 layer, how it was later made modular so that multiple object backends could share it, and how two small groups of engineers have added modular backends for both CORTX and DAOS.
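The layering the abstract describes, one simple GET/PUT gateway shared across different low-level backends, can be illustrated with a toy sketch. All class and method names below are hypothetical; this is not the Ceph, CORTX, or DAOS code, just the shape of the idea.

```python
# Toy sketch: a GET/PUT "gateway" layered over a lower-level object
# interface, loosely in the spirit of an S3 layer shared across backends
# (as Ceph's RGW does over librados, and now libmotr and libdaos).
# All names here are invented for illustration.

class LowLevelStore:
    """Low-level interface: byte-range reads/writes into named objects."""

    def __init__(self):
        self._objs = {}  # object id -> bytes

    def write(self, oid, offset, data):
        buf = bytearray(self._objs.get(oid, b""))
        if offset > len(buf):
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill holes
        buf[offset:offset + len(data)] = data
        self._objs[oid] = bytes(buf)

    def size(self, oid):
        return len(self._objs[oid])

    def read(self, oid, offset, length):
        return self._objs[oid][offset:offset + length]


class S3Gateway:
    """Simple GET/PUT frontend that can sit on top of any such backend."""

    def __init__(self, backend):
        self.backend = backend

    def put(self, key, body):
        # Whole-object semantics: write everything from offset 0.
        self.backend.write(key, 0, body)

    def get(self, key):
        return self.backend.read(key, 0, self.backend.size(key))


gw = S3Gateway(LowLevelStore())
gw.put("bucket/object", b"hello")
print(gw.get("bucket/object"))  # b'hello'
```

An HPC client could bypass the gateway and call `write`/`read` with offsets directly, which is the flexibility versus simplicity trade-off the abstract contrasts.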
About the Speakers
Zuhair AlSader is a distributed systems software engineer at Seagate. He is currently working on integrating Intel’s DAOS storage system with Ceph and Seagate’s CORTX. He holds a master’s degree in computer science from the University of Waterloo. His research interests include storage systems, message queues, and serverless architectures.
Andriy Tkachuk is a CORTX software engineer with more than ten years of experience in distributed storage systems development.
11:50am – 12:30pm (EDT) – Why a Data Plane Architecture is Critical for Optimizing Next-Generation Workloads
For next generation workloads like AI and ML to power the next evolutionary leap in business transformation or research innovation, they need to be able to do more. AI can support enterprise scale requirements – if the data is there. But without a modern data infrastructure that can rapidly process massive volumes of data, it can’t deliver on its full promise and potential.
Recent research has found that AI GPU accelerators can spend up to 70% of their time idle, waiting for data. They require a new kind of data infrastructure that is purpose-built to feed massive quantities of data at low latencies. Speed is critically important to the AI-fueled enterprise: efficiently feeding the right data to the system is the difference between a project requiring months versus mere days.
In this workshop discussion, WEKA CTO Shimon Ben-David will explain why WEKA opted to architect a modern data plane to support next-generation workloads and discuss how the WEKA Data Platform is helping organizations achieve first-to-market results with their AI and ML deployments.
About the Speaker
Shimon Ben-David is the Chief Technology Officer at WEKA, where he actively engages with customers and partners to track emerging trends and bring actionable feedback to WEKA’s Engineering and Product Management teams.
Prior to joining WEKA, Shimon ran Support Services for Primary Data, XtremIO, and IBM. Shimon met the leadership team of WEKA when he managed IT at XIV, acquired by IBM in 2007. He studied Computer Science and Philosophy at Ramat Gan University.
Panel Discussion: 12:30pm – 01:00pm
About the Chairs
Dr. Sai Narasimhamurthy is Engineering Director at Seagate Systems, working on research and development for next-generation storage systems and responsible for EU R&D for the Seagate Systems business. Sai also holds the position of vice-chair of industry for the ETP4HPC organization and co-leads the storage and I/O working group developing ETP4HPC’s Strategic Research Agenda (SRA). He has actively led and contributed to many European R&D consortia (SAGE, Sage2, Maestro, IO-SEA, etc.) in the area of HPC storage and I/O. Previously (2005–2009), Sai was CTO and co-founder of 4Blox, Inc., in the area of Storage Area Networks. During his doctoral work at Arizona State University (2001–2005), Sai worked on Storage Area Networking protocols, focusing on solutions for bulk data transfer over IP networks.
Sai is also the Dissemination & Exploitation lead for the IO-SEA project (https://iosea-project.eu/) and the workshop is a part of IO-SEA dissemination activity.
Glenn K. Lockwood is the principal storage architect at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory where he leads future storage systems design, I/O performance engineering, and many storage R&D activities across the center. He was a lead designer of the 35 PB all-NVMe Perlmutter file system, and he also played a key role in defining NERSC’s Storage 2020 vision which culminated in the deployment of its 128 PB Community File System. In addition to storage systems design, Glenn is also actively engaged in the parallel I/O community; he represents NERSC on the HPSS Executive Committee, is a maintainer of the IOR and mdtest community benchmarks, and is a contributor to the Darshan I/O profiling library. Glenn holds a Ph.D. in materials science and a B.S. in ceramic engineering from Rutgers University.
This workshop is conducted as a dissemination activity of the IO-SEA EU project.
This project has received funding from the European High-Performance Computing Joint Undertaking (JU) and from BMBF/DLR under grant agreement No 955811. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and France, the Czech Republic, Germany, Ireland, Sweden and the United Kingdom.