From stone to the cloud: we travel at the speed of data

From writings in stone to typing on keyboards – on a smartphone or a computer – millennia of evolution have given human beings not only new forms of expression, but also new ways of archiving information. History tells us that the need to preserve information, e.g., keeping official documents, goes back to ancient times. The Royal Archives of Ebla, a Bronze Age city discovered in the 20th century, are a good example of this practice. The Roman civilisation also built a dedicated space to file official documents, the Tabularium.

However, unlike records in stone, documentation on papyrus, and later on paper, proved to be more vulnerable – not only due to the characteristics of those materials, but also because of external factors, e.g., conflicts or natural disasters, leading to the loss of data.

Over the years – and with the advent of the digital age – filing techniques have improved, and a lot of documentation has been digitised to facilitate preservation and consultation. The fact that we now have machines capable of storing large amounts of information has also contributed to this process; nowadays, folders full of files accumulate on our computers, smartphones, and tablets. But like papyrus and paper, this method of data storage also entails challenges. In this edition of Spotlight, we will address some of those challenges, on the way to the cloud.

Towards the cloud: the challenge of data storage

Every day, vast amounts of digital information are generated worldwide, and all of it is stored, temporarily or permanently, on the devices of individuals and organisations. Because creating a file and storing it on a computer – and, possibly, on the cloud – is so quick and easy, it is estimated that by 2025, data creation could reach 180 zettabytes.[1]

As a way of addressing this vast amount of data, we have witnessed the development of solutions to store all this information in a secure and easily accessible way. In 1963, the first proposal emerged for a service that allowed at least two people to access the same computer simultaneously. The foundations were laid for the system we now know as the cloud.

Almost 50 years later, in 2009, the cloud concept had evolved considerably, motivating research on the subject, and companies like Amazon, Google and Microsoft had launched the first cloud computing services, as explained by João Tiago Paulo, a researcher at INESC TEC's High-Assurance Software Laboratory (HASLab), who also noted that remote email storage services already existed before this date.

Despite all the benefits associated with cloud computing – mainly allowing companies to save time and resources by using remote services to store and manage their data – the use of these services has also brought a challenge: developing tools that create a digital space capable of storing large amounts of data, which can be consulted at any time, from anywhere in the world, without jeopardising its security and privacy.

[1] A zettabyte is equal to approximately 1 billion terabytes.

“As soon as companies start using cloud computing, they face many challenges from the point of view of distributed systems, because it becomes a highly complex system”, explained the researcher, mentioning the use of the cloud by e-commerce businesses.

“In the case of an e-commerce sales application running on its own infrastructure, a professional or a team is required to ensure there is enough storage to support a large number of customers shopping at the same time, and that the service is always available. Considering that many other companies started to use the cloud, the services must be able to support not only the thousands of customers of said service, but also all the other services running there. In terms of distributed systems, it is very interesting, because we are talking about a highly complex system, which grants computing and storage resources and relieves people from managing these servers and from worrying about and solving issues like malfunctions; however, this raises some concerns, namely the provision of a centralised service to millions of people, which must always be available and operate correctly”, explained João Paulo.

In this sense, the way providers store data and keep servers running for this purpose, the constant effort to keep systems secure and functioning, and the time and resources saved by companies all make the cloud a place where more and more people and organisations choose to upload personal and confidential information.

After clicking on the “save” option, what happens to the data? 

We know that when we choose the "save" option we are placing a certain file, for example, on the cloud; but do we really know where we are storing our information, who has access to it, and how the services that remotely manage our data work? Are we in any danger of losing our data? Is it safer to store our folders on a hard drive or flash drive?

Saving files locally presents certain risks, since a hard drive – or even a computer – may fail, and the likelihood of recovering the saved data is often low. For this reason, people and companies choose to store data on the cloud, relying on its effectiveness and responsiveness, on quick access to content – at the click of a button, from any location with an internet connection – and on the security of the process. There is clearly a “before and after” the cloud when it comes to the workflow of certain entities and the organisation of personal files.

But how does this whole process work?

“After a person creates a folder, the files are saved with the help of an interface – on a hard drive, locally, on OneDrive, or on the cloud. What happens is that these bytes, instead of being stored locally on my hard drive, are sent, via the network, to a remote site, e.g., an Amazon data centre”, explained João Paulo.

“The data centre receives said bytes and stores them on disks. The process is the same; the big difference between this model and the previous one is that when I wanted to save my data, I did it on my hard drive, and if my hard drive failed, I had to worry about making backups. Once this data reaches the cloud, it is stored on disks too, but the whole data management and security process is carried out by the companies that manage the services. They do it for me. In the end, I know that when I access my data, it will be there”, added the researcher, also mentioning that the main difference is that instead of accessing their files through their own computer, users will do so through the Internet.
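
To make the contrast concrete, here is a minimal sketch of the two paths the researcher describes: writing bytes to a local disk versus sending the same bytes over the network to a remote storage endpoint. The URL and bucket name are hypothetical, purely for illustration.

```python
from pathlib import Path
import urllib.request

data = b"contents of my report"

# Local storage: the bytes go straight to my own disk;
# backups and disk failures are my problem.
Path("report.txt").write_bytes(data)

# Cloud storage: the same bytes travel over the network to a remote
# endpoint (hypothetical URL); replication, backups and hardware
# failures are handled by the provider's data centre.
request = urllib.request.Request(
    "https://storage.example.com/my-bucket/report.txt",
    data=data,
    method="PUT",
)
with urllib.request.urlopen(request) as response:
    print(response.status)  # e.g., 200 or 201 once the object is stored
```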

But if we are no longer storing our files on our computers and if companies are sharing their data with other companies that store and manage them… what concerns should we have?

According to Bernardo Portela, a researcher at INESC TEC's HASLab, if we think as individual users, our main concern regarding the cloud will depend a lot on our fears in terms of privacy. As for businesses, these concerns are “absolutely central”. “If we talk about companies, who are really the ones who truly benefit from the cloud's greater computing power, then there's no doubt about it. Until viable alternatives emerge, in many cases, the cloud is not even a reasonable option, precisely because of privacy issues”, pointed out the researcher.

In addition, Bernardo Portela stated that there are specific laws in force – e.g., the General Data Protection Regulation (GDPR) – that prevent companies from storing information characterised as sensitive anywhere other than their own infrastructure – meaning that, in such cases, it is not possible to resort to remote services.

However, if we consider the possibility of doing so – at a time marked by a series of computer attacks on several organisations, including companies, with data stolen under threats of ransom and pages hacked or deleted – the issue of data vulnerability becomes particularly relevant, especially in critical sectors, given the sensitivity of the data they store.

Once stored on the cloud, is our data secure?

“Attacks on data centres come in many different forms, ranging from relatively unsophisticated threats, like phishing attacks, to weak or exposed passwords of the people in charge of managing the servers. By gaining access to the databases, attackers can gain access to the data”, explained Bernardo Portela.

According to the researcher, from the moment we store a file on the cloud, we may not know, as users, where the server used for storage is, but the truth is that it is a server like any other: “if it has access to the network, then it has vulnerabilities”. “Therefore, network administrators and system administrators can access the server and inspect the content of the stored files. If people are not careful and do not prevent such access, attackers can use an administrator's password and access the data”, he pointed out, highlighting that most attacks result from human failures.

Faced with the constant challenge of keeping information protected on the cloud, Bernardo Portela described the research work that has been done in this area to allow data to be stored and shared securely. According to the researcher, the best way to store and archive data securely has been known since the 1990s.

“I resort to a cipher system to build cryptograms, which represent my data. This allows me to upload these cryptograms to the cloud, rather than my original data. To the cloud, these cryptograms are opaque, random values that it can store, but whose meaning it does not know. When I wish to recover my data, I ask for the cryptogram and use a key to decipher it and recover the original data”, said Bernardo Portela.
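
A minimal sketch of this idea in Python, assuming the third-party cryptography package is available; it is a generic illustration of client-side encryption, not the exact scheme used by the researchers.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # stays with the data owner, never sent to the cloud
cipher = Fernet(key)

original = b"confidential report"
cryptogram = cipher.encrypt(original)   # this is what actually gets uploaded

# To the provider, `cryptogram` is an opaque, random-looking value.
# Only whoever holds the key can turn it back into the original data.
assert cipher.decrypt(cryptogram) == original
```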

Building on this, the researcher explained that the great challenge is, in fact, to enable the cloud not only to store data safely, but also to process it. “For example, I want to save a list of files and I want the cloud to order them, or to perform a calculation on the data stored there,” he explained, emphasising that when data is stored on the cloud as standard cryptograms, the cloud cannot process it, because all it sees is random information.
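
Continuing the previous sketch, the snippet below illustrates the point: the cloud can sort the cryptograms as byte strings, but that ordering bears no relation to the order of the original values, because standard encryption makes the data look random.

```python
from cryptography.fernet import Fernet

cipher = Fernet(Fernet.generate_key())
values = [b"apple", b"banana", b"cherry"]

# Encrypt each value before uploading, as in the previous example.
cryptograms = [cipher.encrypt(v) for v in values]

# The cloud can sort the opaque cryptograms...
sorted_by_cloud = sorted(cryptograms)

# ...but decrypting the result shows the order is meaningless:
print([cipher.decrypt(c) for c in sorted_by_cloud])  # almost certainly not alphabetical
```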

In this sense, there is a need to develop new technologies, or build on existing ones, in order to find a solution that balances the secure storage of information with the ability to process it on the cloud. Bernardo Portela mentioned that, over the last few years, there have been many research endeavours on secure data processing, but there is still a long way to go before we can use all the cloud's capabilities – or at least the ones we want to use.

SafeCloud: the path to privacy and security on the cloud

Although cloud infrastructures have proven advantages, contributing to the competitiveness of modern economies, certain factors should be considered when developing these technological solutions – especially the need to find a balance between security and processing.

In this context, INESC TEC has been working on projects like SafeCloud or PRACTICE, which aim, as Bernardo Portela explained, “to channel the greatest computing capacity and all the advantages of the cloud, without the disadvantages of giving up data control”.

In the case of the SafeCloud project, João Paulo said that the starting point for its development was the notion that the cloud is a reality, and that it is changing the way companies store and process data. “We know that there are cloud computing services and the ability to store files with remote storage providers and manage them on the cloud. This means that our files are no longer on our computers, which can be critical for companies that deal with sensitive data, such as hospitals”, explained the researcher. In fact, entities working in the healthcare sector gather sensitive data about their patients and professionals, or even patents developed by researchers.

As a way to find a feasible solution that makes the cloud capable of processing sensitive data, and considering the privacy and security concerns about stored data, the SafeCloud project (which took place between 2015 and 2018) emerged with the goal of re-architecting cloud infrastructures, dividing the data across several administrative domains with little risk of collusion.

The project focused on ensuring that the tasks of data transmission, storage and processing are divided across different domains, making the data more secure. Part of this knowledge led to the creation of a start-up – Safecloud Technologies – that provides highly specialised consulting services in data privacy, distributed systems, and large-scale infrastructures, and allowed other people to join other projects around blockchain and data density.
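
As a rough illustration of the idea of splitting data across administrative domains that are unlikely to collude, here is a minimal XOR-based secret-sharing sketch; it is a generic technique shown for intuition, not SafeCloud's actual protocol.

```python
import secrets

def split(data: bytes) -> tuple[bytes, bytes]:
    """Split data into two shares; each share on its own looks like random noise."""
    share_a = secrets.token_bytes(len(data))
    share_b = bytes(x ^ y for x, y in zip(share_a, data))
    return share_a, share_b

def combine(share_a: bytes, share_b: bytes) -> bytes:
    """Recover the original data; this requires both domains to cooperate."""
    return bytes(x ^ y for x, y in zip(share_a, share_b))

record = b"patient record #123"
share_for_domain_1, share_for_domain_2 = split(record)

# Neither domain alone learns anything about the record; only by combining
# the shares (i.e., colluding) could the data be reconstructed.
assert combine(share_for_domain_1, share_for_domain_2) == record
```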


What is the best architecture for cloud software?

Alongside the research work on privacy and data security in the cloud, INESC TEC has also been working on the design of good practices for the development of cloud software. “When we develop cloud software, it is important to be able to make the most of its potential”, said Filipe Correia, a researcher at the Centre for Human-Centered Computing and Information Science (HumanISE). According to the researcher, adopting good practices for the development of cloud software is vital.

Although there are many papers and books on good practices for the development of cloud software, the researcher acknowledged that sometimes these good practices are not well applied, or are even applied outside the appropriate context, which turns them into “bad practices”.

As a way of establishing and documenting these good practices, Filipe Correia pointed out the relevance of the concept of software patterns, and the focus on three important aspects: the problem, which must be analysed and taken into account; the solution applied to address it; and, finally, the context in which it can be applied, so that a good practice does not become a bad practice.

The paper A Survey on the Adoption of Patterns for Engineering Software for the Cloud – based precisely on good practices in cloud system development – presents the work developed by Tiago Boldt Sousa, Hugo Sereno Ferreira and Filipe Correia; it was published in IEEE Transactions on Software Engineering, one of the most prestigious international journals in the area of Software Engineering.

“In this work, we described a set of development patterns for the cloud, based on our experience, which we later validated with the contribution of a broad set of software professionals”. The researcher added that, before advancing to empirical studies, the authors dedicated their efforts to documenting these solutions based on personal experience and the literature.

“Then, we asked ourselves whether these problems were the ones teams actually experience; so, we asked a group of developers to what extent they use the practices presented in their daily work”, he said. The researchers outlined the best practices and at which stage of a project they should be applied. “Finally, we tried to identify the dependencies between these practices”, so that, depending on the project that software development professionals had at hand, “they knew which practices would make more sense”, concluded the researcher.

“Obviously, there are still several challenges in developing and optimising systems for cloud operations. They require a significant effort from professionals, and the research carried out thus far has made interesting contributions”, he assured.

As for the future, researchers believe there is still a long way to go until the use of cloud technology is fully mature. The truth is that the cloud seems to be, as Filipe Correia reminded us, “everywhere”. And, with constant technological evolution, can the path that led us to the cloud take us even further?

The researchers mentioned are associated with UP-FCUP, UP-FEUP, U.Minho and INESC TEC.
