In the era of pervasive data storage, replication can be the key to large-scale systems. Here's how INESC TEC research explores these challenges.

In a study published in ACM Computing Surveys, Paulo Sérgio Almeida, INESC TEC researcher, synthesises the existing knowledge on approaches to Conflict-free Replicated Data Types, a topic he has been exploring over the past decade. These data types enable replication in distributed systems with automatic conflict resolution, ensuring high availability – even in the face of communication failures.

The need to store large amounts of data in distributed systems, i.e., storage structures spread across multiple computers on the same network, raises issues of data availability and fault tolerance. These issues grow with two current trends: on one hand, cloud computing, with data centres spread across the globe (and subject to network partitions), and on the other hand, local-first approaches, with local data and direct communication between participants, independent of the cloud.

Replicating data is an increasingly explored technique for improving availability and fault tolerance, although it is not a problem-free method. For example, simultaneous updates to multiple replicas of the same data – without timely coordination between the computers hosting them – could lead to inconsistencies that, as a rule, cannot be resolved. Hence, much of the work done on distributed systems has focused on avoiding simultaneous data updates. But such approaches have negative consequences in terms of availability.

In this context, Conflict-free Replicated Data Types (CRDTs) are a type of replicated data used in distributed systems where the various replicas evolve independently, but automatically converge to the same state. They are mostly used in systems that prioritise availability and low-latency access, as they tolerate network partitions.
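
To make the idea of independent evolution and automatic convergence concrete, here is a minimal Python sketch of a grow-only counter, one of the simplest state-based CRDTs. It is an illustration only and is not taken from the paper: the class name `GCounter`, its methods and the replica names are assumptions chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter (hypothetical sketch): one entry per replica id."""
    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only advances its own entry, so updates need no coordination.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of all per-replica entries.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Entry-wise maximum: commutative, associative and idempotent,
        # so replicas converge regardless of message order or duplication.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas are updated independently, e.g. during a network partition.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)

# Once communication resumes, they exchange state and merge.
a.merge(b)
b.merge(a)
b.merge(a)  # a repeated merge changes nothing

assert a.value() == b.value() == 5
```

Because merging takes the maximum of each entry, neither replica's updates are lost and no conflict needs to be resolved by hand; richer data types need more elaborate designs built on the same principle.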

In “Approaches to Conflict-free Replicated Data Types”, Paulo Sérgio Almeida draws on the research that served as the basis for his Habilitation lecture, synthesising key knowledge about approaches to CRDTs. The author aimed to present a reference work on replicated data types, allowing researchers and developers to grasp the fundamental concepts and access relevant literature.

As Paulo Sérgio Almeida explains in the paper – published in the Q1-rated journal ACM Computing Surveys – “classic distributed systems aim to achieve strong consistency through replicated state machines. However, while high performance is possible with this process, strong consistency in systems with large spatial scale comes at a cost, particularly in terms of response time and loss of availability when network partitions occur”.

If we consider online businesses on a global scale – such as Amazon – any disruption, no matter how brief, can have significant financial consequences, particularly concerning consumer trust. CRDTs have already been successfully applied in the industry, enabling the design of systems that maintain a short response time – even at large spatial scales.

The researcher mentioned in this news piece is associated with INESC TEC and U.Minho.
