Disinformation, hate speech and science communication: access to information

On February 15, 2020, in Munich (Germany), the Director-General of the World Health Organization (WHO), Tedros Adhanom Ghebreyesus, addressed the audience at the Munich Security Conference. A few weeks had passed since the first cases of COVID-19 infection had been detected in the city of Wuhan, China – in Portugal, the first cases would only be confirmed two weeks later. Near the end of his speech, the Director-General of the WHO said that the world was not just fighting an epidemic[1], but also an infodemic. He added that “fake news spreads faster and more easily than this virus, and is just as dangerous”, explaining that the WHO was working with social media companies to prevent the spread of disinformation[2].

Moving from that conference in Munich to the United States of America in 2016, we find an electoral period marked by the dissemination of false information and conspiracy theories – Pizzagate is one example. It was around this time that the term “fake news” – widely used by candidate Donald Trump – entered our vocabulary; in fact, the rapid dissemination and sharing of unsubstantiated news may have influenced the presidential election results.

Returning to the present, in 2022, we have witnessed the spread of false information about the war in Ukraine. Some even speak of an infowar – an information war – and many believe that disinformation has been playing a major role, not only in the conflict itself, but also in citizens’ right to access information.

At a time when more than half the world’s population uses social media – yes, you can easily count yourself among them – and when fake news is shared more widely than real news[3], can machines help us understand why certain information, news or sources are unreliable? If you are a social media user and enjoy reading news online – with a special interest in science and technology – join us for this edition of Spotlight; if this is not your case… you can still join us, it will be worth your while!

The trail of (dis)information

Disinformation entails a set of challenges and consequences, posing a threat to democratic states. Whether in WhatsApp groups, scrolling through Twitter or watching videos on YouTube, you have probably come across false information; and even though it is sometimes easy to realise that what you are reading does not correspond to reality, you may find it hard to verify whether certain information is true. With this in mind, INESC TEC researchers have been working on the development of solutions that aim to support citizens in identifying disinformation.

This line of research, now spanning more than a decade, started with a project called Breadcrumbs, which aimed to create a prototype of an online social news network with automatic identification of shared interests. It was followed by another project – REMINDS (Relevancy Mining Detection System), which aimed to use technology to detect the relevance of publications on social networks.

“At a certain point in the project, the team started to realise that many posts included false information. We then began to question whether the system we were developing could address this issue. We faced the need to identify fake news, because it should never be classified as relevant”, said Álvaro Figueira, a professor at the Faculty of Sciences of the University of Porto (FCUP) and a researcher at INESC TEC. The curiosity to pursue research in this area eventually led to the project “Detecting Fake News Automatically” and to the PhD thesis by Nuno Guimarães (researcher at INESC TEC), supervised by Álvaro Figueira.

When asked about this, Álvaro Figueira and Nuno Guimarães recalled, step by step, their work and the clues they followed until they reached a solution (under development) that they hope can be used by everyone to detect unreliable information. It’s not a work of fiction, but you may feel like Sherlock Holmes while reading the next few lines, following the trail of disinformation.

Reliable or unreliable? That is the question

According to Nuno Guimarães, the first step was the definition of concepts[4]. Although the terms fake news and misinformation are commonly used, the literature shows that there are still differences in how people use them. For that reason, the team chose the term “unreliable information”. “If our goal is to develop a solution that is actually deployed, telling the user that something is false has a completely different impact from saying that something is unreliable; so we decided to follow that approach”, explained the researcher, adding that it is also a broader term. “Clickbait, for example, is not necessarily disinformation, but it can be unreliable, since people expect to find a certain type of information and end up finding another – it is not disinformation per se”, he said.

INESC TEC researchers have been working on the development of solutions that aim to support citizens in identifying disinformation.

Once the concept was defined, the research work moved on to the detection of Twitter accounts that disseminate unreliable information. “We followed two approaches to determine whether certain accounts were reliable or unreliable. On the one hand, we created metrics to classify accounts based on behaviour, analysing how many of their posts were reliable and how often they published. On the other hand, we defined metrics related to impact, to check whether those accounts had many followers, whether they followed other accounts, whether they were verified, and so on.”

According to the researcher, the metrics also considered whether “the published content came from doubtful news sources already flagged, or whether it propagated false or extremely biased information, omitting very important elements of a story or presenting a partial outlook”. Through this analysis, the team realised that not all accounts with a high rating on these metrics were bot accounts – created and operating automatically. “We realised that there are humans propagating unreliable information, either intentionally or out of sheer ignorance”.

Based on this finding, and in order to automatically detect and classify accounts using artificial intelligence and machine learning, the team also considered the volume of published information, i.e., its variability. “Our machine learning models not only analysed the information about the accounts, but also considered differences in terms of volume of information – some accounts publish 100 or 200 posts per month, while others publish 100 or 200 posts per day,” said Nuno Guimarães.
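
The article does not detail the exact metrics used, but a minimal sketch – in Python, with hypothetical field and feature names – of how behaviour, impact and volume signals could be turned into account-level features might look like this:

```python
from dataclasses import dataclass

@dataclass
class AccountSnapshot:
    """Hypothetical summary of a Twitter account's activity in an observation window."""
    posts_total: int        # posts observed in the window
    posts_unreliable: int   # posts traced to already-flagged, doubtful sources
    days_observed: int      # length of the observation window, in days
    followers: int
    following: int
    verified: bool

def account_features(a: AccountSnapshot) -> dict:
    """Combine behaviour, volume and impact signals into a feature dictionary."""
    return {
        # behaviour: how much of the account's output is unreliable
        "unreliable_ratio": a.posts_unreliable / max(a.posts_total, 1),
        # volume/variability: some accounts post 100-200 times a month, others per day
        "posts_per_day": a.posts_total / max(a.days_observed, 1),
        # impact: reach and credibility signals
        "followers": a.followers,
        "follower_following_ratio": a.followers / max(a.following, 1),
        "verified": int(a.verified),
    }

# Example: a prolific, unverified account that mostly shares flagged content
print(account_features(AccountSnapshot(3000, 2400, 30, 150, 4000, False)))
```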

To the amount of information, the team added the need to examine the published content itself; at this point the task became truly challenging, because the machine learning models tested until then, worldwide, were very effective at detecting false news on a specific topic or during a specific period. However, as soon as the subject changed, the results were far less satisfactory. “For the U.S.A. elections, models were trained to detect fake election news with 90% accuracy, but what we understood was that we could not analyse things this way if the goal was to have models working in the long run”.

Hence, the team trained its models on 30 to 60 days of data and then left them to evaluate information over 18 months, a period marked by a sudden change of topic – the COVID-19 pandemic. “When we started to evaluate the different models – both state-of-the-art ones and new proposals of our own – we realised that the models really did not adapt when there was a sudden change in topic, since their performance decreased. Still, some of the solutions we evaluated achieved good results and proved robust to these topic changes”, explained Nuno Guimarães, adding that these results are now being improved with the objective of developing a prototype that people can use.
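
As an illustration only – with toy data and scikit-learn, not the team’s actual models or corpus – this kind of temporal evaluation can be sketched as follows: train a text classifier on a short window and check how its accuracy holds up on later months, when the topic drifts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Training window (e.g. 30-60 days of labelled posts; 1 = unreliable, 0 = reliable)
train_texts = ["election rigged, share before they delete it",
               "official results certified by the electoral commission",
               "secret memo proves the vote was stolen",
               "turnout figures released by the national statistics office"]
train_labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Evaluation months: the topic drifts from elections to the pandemic
months = {
    "month 01 (same topic)": (["ballots destroyed in secret, media stays silent",
                               "recount confirms the original result"], [1, 0]),
    "month 12 (topic drift)": (["miracle cure hidden by pharma companies",
                                "vaccine trial results published in a medical journal"], [1, 0]),
}
for name, (texts, labels) in months.items():
    acc = accuracy_score(labels, model.predict(texts))
    print(f"{name}: accuracy = {acc:.2f}")
```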

Emojis, emotions and sentences: how machines detect false information

So far, we have described how our researchers have followed a set of clues until they came up with a solution that can detect untrustworthy information; however, how do you train machines to help you understand what you should or should not trust?

“Since the early development of systems to detect false news, the main idea has always been to detect text patterns”, said Álvaro Figueira, emphasising that the “techniques used to detect language patterns have evolved as science has evolved in its ability to process said patterns”. According to the researcher, when the first works in this area began, certain aspects, e.g., the number of capital letters, punctuation, the use of adjectives, or the number of emojis or symbols, could indicate false news.
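
A minimal Python sketch of such first-generation surface cues could look like the snippet below (counting adjectives would additionally require a part-of-speech tagger, so it is left out here):

```python
import re

def surface_features(text: str) -> dict:
    """Count simple style cues: capitalisation, punctuation and emoji-like symbols."""
    words = text.split()
    return {
        "uppercase_ratio": sum(c.isupper() for c in text) / max(len(text), 1),
        "all_caps_words": sum(w.strip(".,!?").isupper() and len(w.strip(".,!?")) > 1
                              for w in words),
        "exclamation_marks": text.count("!"),
        "question_marks": text.count("?"),
        # crude emoji count: code points in the main emoji blocks
        "emoji_count": len(re.findall(r"[\U0001F300-\U0001FAFF]", text)),
    }

print(surface_features("SHOCKING!!! They don't want you to see this 😱😱"))
```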

Later, the systems started to consider emotions, through libraries capable of extracting sentiment from text. “We noticed that certain sequences of emotions were more typical of fake news than of other news, and soon a second generation of systems capable of detecting fake news emerged. Later still, the main focus became sentence construction. There is also a recurring pattern in this case – one we cannot describe ourselves, but that the machines are able to discover”, said Álvaro Figueira.
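
The text does not name the sentiment libraries used, so here is a self-contained sketch of the idea with a toy emotion lexicon (real systems typically rely on resources such as the NRC lexicon or VADER): the dominant emotion is computed per sentence, producing the kind of emotion sequence those second-generation detectors looked for.

```python
EMOTION_WORDS = {
    "fear":  {"terrifying", "dangerous", "threat", "panic"},
    "anger": {"outrage", "scandal", "corrupt", "betrayed"},
    "trust": {"confirmed", "official", "verified", "study"},
}

def emotion_sequence(text: str) -> list[str]:
    """Return the dominant emotion of each sentence, in order of appearance."""
    sequence = []
    for sentence in text.lower().split("."):
        if not sentence.strip():
            continue
        counts = {emotion: sum(word in lexicon for word in sentence.split())
                  for emotion, lexicon in EMOTION_WORDS.items()}
        best = max(counts, key=counts.get)
        sequence.append(best if counts[best] > 0 else "neutral")
    return sequence

print(emotion_sequence("Outrage as corrupt officials hide the truth. "
                       "This terrifying threat will cause panic."))
# ['anger', 'fear']
```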

Nevertheless, the researcher pointed out a set of challenges associated with the automatic detection of disinformation. In addition to the stability of the models, especially under changes of topic (concept drift), there is also the limited availability of data. “We require a certain amount of fake and true news to help the machine understand what is what, and there is far more true news available than fake news. In addition, for the machine to be trained, it is necessary to label which items are false, since supervised learning generally works better”, he explained.
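
As a small illustration of the imbalance point (with toy numbers, not the project’s data): a common way to compensate is to weight the scarce “fake” examples more heavily during training, for instance with the standard “balanced” heuristic.

```python
from collections import Counter

labels = ["reliable"] * 900 + ["unreliable"] * 100   # toy corpus with a 9:1 imbalance
counts = Counter(labels)

# "balanced" heuristic: weight = n_samples / (n_classes * n_samples_in_class)
n_samples, n_classes = len(labels), len(counts)
weights = {cls: n_samples / (n_classes * n) for cls, n in counts.items()}
print(weights)   # each unreliable example weighs ~9x more than a reliable one
```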

Nuno Guimarães added another obstacle: “very complex models can perceive patterns of language that we humans cannot, and this is a problem – especially in this area, because for people who are sceptical about fake news, if we cannot explain why and simply say that something is false or true, it will not be enough”. For this reason, the prototype under development will provide users with additional information – beyond the classification as reliable or unreliable – so they can understand how the classification was reached. “It’s like having a black box, and we want to know what’s inside that black box. The users have access to certain clues to understand the decision”, highlighted Álvaro Figueira.
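
The prototype’s actual explanation mechanism is not described in the article, but one common way to surface such “clues” is to show which terms contributed most to a decision; below is a minimal sketch with a linear model over TF-IDF features and invented toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["shocking secret cure they hide from you",
         "miracle remedy banned by doctors",
         "ministry of health publishes vaccination schedule",
         "university releases peer reviewed study"]
labels = [1, 1, 0, 0]    # 1 = unreliable, 0 = reliable

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
classifier = LogisticRegression().fit(X, labels)

document = "they hide the miracle cure"
x = vectorizer.transform([document])
label = "unreliable" if classifier.predict(x)[0] == 1 else "reliable"

# Clue extraction: per-term contribution = tf-idf value * model coefficient
contributions = x.toarray()[0] * classifier.coef_[0]
terms = vectorizer.get_feature_names_out()
clues = sorted(zip(terms, contributions), key=lambda t: abs(t[1]), reverse=True)[:3]
print(label, [f"{term}: {weight:+.3f}" for term, weight in clues])
```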

Hate speech: how technology can help to detect it

Over the past few years, there has been an increase in hate speech on online platforms. According to the United Nations, unlike in traditional media, online hate speech is published and shared more easily, at a lower cost and anonymously, posing several challenges to democratic states. “In a simplified way, hate speech can be described as offensive communication directed at groups or individuals based on characteristics such as religion, gender and nationality, among many others. It is a major issue, with an impact on today’s societies, acknowledged by national governments, international organisations and companies, with emphasis on those working in the digital sector – namely platforms, social networks and the media”, explained Sérgio Nunes.

“The motivation for our research in this area stems from the individual and social impact of hate speech, and our belief that information technology plays a central role in this context, contributing to mitigate this problem.”

According to the INESC TEC researcher and professor at the Faculty of Engineering of the University of Porto (FEUP), this is a problem that has proliferated in the context of online communication, mainly due to the disintermediation of communication. “With digital media, personal communication is often mediated by tools and devices, contributing to a reduction of empathy towards others and to a sense of anonymity and impunity,” he stated, mentioning online journalism as an example, in which “the participation of readers, in comment sections or forums, is increasingly frequent and valued”. In these cases, hate speech seems to proliferate, and this was the starting point for the development of the StopPropagHate project.[5]

The project, which aimed to develop a solution for the automatic detection of hate speech on online communication outlets, sought to study how news on certain topics may or may not influence comments and fuel hate speech, while contributing tools that could help media publishers manage the problem – for example, by flagging potentially problematic discussions or individual comments for further human review.

“The motivation for our research in this area stems from the individual and social impact of hate speech, and our belief that information technology plays a central role in this context, contributing to mitigate this problem – not only through the original design of the systems, but also in the implementation of solutions and tools”, highlighted Sérgio Nunes. In practice, the team of researchers developed a prototype that includes two tools: one for readers, which allows estimating whether a given comment is at risk of being flagged as containing hate speech; another for newsrooms, which allows estimating whether a particular news piece is more likely to generate comments with hate speech.
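
The article does not describe the internals of either tool, so the following is only a sketch of the kind of interface one might imagine, with a stand-in scorer in place of a trained hate speech classifier; note that the prototype’s newsroom tool estimates risk from the news piece itself, whereas the second function below uses observed comments as a simpler proxy.

```python
from typing import Callable, List

def comment_hate_risk(comment: str, score: Callable[[str], float]) -> float:
    """Risk (0-1) that a reader's comment would be flagged as hate speech."""
    return score(comment)

def article_hate_risk(comments: List[str], score: Callable[[str], float],
                      threshold: float = 0.5) -> float:
    """Share of an article's comments whose estimated risk exceeds the threshold."""
    risky = [c for c in comments if score(c) >= threshold]
    return len(risky) / max(len(comments), 1)

# Stand-in scorer: a real system would use a trained text classification model
toy_score = lambda text: 0.9 if "hate" in text.lower() else 0.1

print(comment_hate_risk("I hate people like you", toy_score))            # 0.9
print(article_hate_risk(["interesting piece",
                         "I hate them all",
                         "well written"], toy_score))                    # ~0.33
```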

While acknowledging the importance of artificial intelligence techniques in detecting hate speech online, Sérgio Nunes pointed out that technology alone may not be enough to stop the problem. “The same statement can have very different interpretations depending on the context in which it occurs; for example, a phrase that is neutral in one context can clearly constitute hate speech in another. This becomes evident when human evaluators themselves disagree in many cases. The issue depends strongly on cultural, social and communicational factors, among others. Current text classification models cannot capture this complexity or all the relevant factors, because they simply do not have this information”, he concluded.

From technology to words: the role of science communication against disinformation

Access to information is seen as crucial in the fight against disinformation – so much so that the United Nations designated September 28 as the International Day for Universal Access to Information, reinforcing the relevance of the theme[6]. Hence the importance of communicating in a simple way, especially when we talk about topics like those listed above – a pandemic, an election, a war; the message must be easy for anyone to understand.

In science, namely concerning Research and Development (R&D) results, INESC TEC – through its Communication Service (SCOM) – has carried out a series of science communication initiatives that aim to disseminate and inform about the work carried out by the Institute. This can be a particularly challenging job, as one needs to dive into the researchers’ work – in some cases, literally – to gather the most relevant material, before emerging, taking a breath, and typing words that everyone can understand.

“What does it mean to communicate properly? Is it sending a message to a lot of people? In my opinion, when we are talking about science communication, this cannot be the sole or the most important requirement. Communicating adequately means that our recipients clearly understand what we are saying. Obviously, this is all much easier when we are communicating subjects that are within the reach of most everyday citizens. When we are communicating more specific and technical matters, like science, the process is, of course, more complex”, explained Joana Coelho.

The manager of INESC TEC’s Communication Service (SCOM) mentioned that science has been contributing to the world’s progress for centuries, and yet there are still those who deny it. “Where are we failing? In my opinion, in the way we convey the message. Our role as science communicators must be to ‘translate’ the scientific language used by researchers into the language used by the general audience, making sure there is no doubt about what has been done, how it has been done, why it has been done, and the impact on people’s lives. These are the aspects we have tried to advance at INESC TEC”, she said, emphasising the importance of this effort in preventing disinformation and in promoting citizens’ access to information on R&D results.

Joana Coelho then described the strategy implemented by the Institute over the past few years, which involves not only the regular sharing of information with journalists, but also the production of content for dissemination through its own channels – website, newsletter and social media. She also mentioned the recent investment in international press relations, via the Alpha Galileo platform, through which press releases are sent to international journalists specialised in science and technology. “The difference between the press releases we send nationwide and those we send via Alpha Galileo is how we dissect the information. In other words, we write longer texts, in which we try to explain, in a simple way, everything that is inherent to the science and innovation activities we are developing”, she stated.

“What does it mean to communicate properly? Is it sending a message to a lot of people? In my opinion, when we are talking about science communication, this cannot be the sole or the most important requirement.”

Along with the information provided to journalists, there is a clear commitment to specific formats (like the text you are reading) – Spotlight, the Science Bits podcast and the Science & Society magazine. “When we discussed this section – Spotlight – we devised something written and edited by ourselves, with valuable contributions from our researchers, where we could easily establish a relationship between the science and innovation we develop and various social challenges. This relationship between everyday life, society, economics and science is decisive in making people understand what we do, in a simpler way. In addition to text, every edition of Spotlight features illustrations created specifically for the subject explored, to support the reader throughout the text – in a simple, intuitive way that is aesthetically different from the usual photography that accompanies most published news”.

In order to explore different formats and platforms, INESC TEC also launched a podcast on science and technology. “We realised that stress and lack of time in everyday life do not always allow us to dedicate the desired time to reading longer texts. We also realised that there are different audiences, and that people enjoy different formats. So we decided to explore the podcast format, which we can listen to while working, driving, cooking or tidying up the house, simply by putting on some headphones and listening to science-related topics”, explained Joana Coelho, also recalling that the fifth edition of the Science & Society magazine, dedicated to Energy Transition, was recently published.

“The magazine features an online edition and a printed edition, and it was designed to disseminate science to society. The articles target not only ordinary citizens, but also managers, politicians and technical staff who are involved in the activities of the sectors explored in each issue. So far, we have five editions, each one dedicated to a theme explored by INESC TEC”, said the SCOM manager.

Every day, before going to sleep – while sitting on the couch or lounging in bed, scrolling through social media – we are bombarded with false information; as this text shows, it can be difficult to know whether what we read is true or false. The last few years have shown us the dangers that disinformation poses to democratic states, and the pandemic was particularly relevant in this respect. We know that there are ongoing efforts, both from a technological and a communication point of view, but what is the best way forward? Whatever it is, there is something we know for sure: technology, literacy and communication will be part of the equation.

The researchers mentioned are associated with UP-FCUP, UP-FEUP and INESC TEC.

 

[1] COVID-19 was declared a pandemic in March 2020.

[2] Speech by the Director-General of WHO at the Munich Security Conference: https://www.who.int/director-general/speeches/detail/munich-security-conference; or https://securityconference.org/en/medialibrary/asset/an-update-on-the-coronavirus-20200215-1600/.

[3] In a study published in 2018, an MIT team found that, on Twitter, fake news is 70% more likely to be shared than real news. Information available at: https://news.mit.edu/2018/study-twitter-false-news-travels-faster-true-stories-0308.

[4] The European Commission, for example, uses the term “disinformation” to define false or misleading content that is disclosed with the intention of deceiving or obtaining economic or political gains, and which may cause harm; and the term “misinformation” to classify false or misleading content that is shared without harmful intent, although the consequences of sharing may be harmful. More information on these concepts can be found at: https://digital-strategy.ec.europa.eu/en/policies/online-disinformation.

[5] The project followed a line of research initiated by Paula Fortuna, a former INESC TEC researcher, as part of her master’s thesis, under the supervision of Sérgio Nunes.

[6] The United Nations Sustainable Development Goals (SDGs) highlight the importance of access to information, namely in targets 12.8 and 16.10. More information on the SDGs at: https://sdgs.un.org/goals.
