Work carried out within the International Workshop on Semantic Evaluation 2025 included a comprehensive annotation process that led to the creation of a multilingual dataset, providing the basis for training models capable of identifying narratives in European Portuguese.
The vast number of communication channels introduced by the Internet led to new dimensions of misinformation and discourse manipulation. Uncovering messages and explaining narratives, especially in contexts of high polarisation, is an essential step in combating phenomena that negatively impact media literacy and, consequently, the formation of public opinion. With this in mind, researchers from INESC TEC created a dataset that will enable the identification, understanding, and explanation of narratives, supporting journalists, fact-checkers, and citizens.
The work, corresponding to Task 10 of the International Workshop on Semantic Evaluation (SemEval) 2025, focused on news about the war in Ukraine and climate change, “as they represent two separate international domains, but equally central to public debate,” explained Nuno Guimarães and Purificação Silvano, researchers at INESC TEC and members of the team that developed the task. The initiative was led by Ricardo Campos and Alípio Jorge, within an international consortium that included researchers from several universities around the world, as well as from the Joint Research Centre (JRC) of the European Commission. The two topics allowed the analysis of narratives in both immediate and long-term contexts, offering a comprehensive scenario for studying strategies of discursive manipulation: the war in Eastern Europe is a highly current geopolitical event, while climate change is an “evergreen topic characterised by persistent controversies about science, policies, and economic interests”.
In the case of Portuguese-language news, a “careful” selection was made to ensure “political diversity and relevance”: the list of sources included, for example, national newspapers and opinion pieces from partisan websites. Because the Portuguese media ecosystem was found to be still relatively unpolarised, especially when compared with contexts like that of the United States of America, the corpus was complemented with news from Brazilian sources, which were subsequently translated manually into European Portuguese. This way, as researcher Nuno Guimarães mentioned, “the team was able to ensure balanced and representative coverage,” reflecting “different ideological lines present in the Portuguese-speaking world”.
The next stage of the process was annotation, through which three key types of information were identified. The first concerns entities (e.g., people, organisations, and countries) that play a relevant role in the narratives conveyed by the news. Once identified, these entities were classified according to a hierarchical taxonomy with the main classes “protagonist”, “antagonist”, and “victim”, each subdivided into subclasses. The second is a classification of the narratives and sub-narratives relevant to each paragraph and to the news piece as a whole. For the war in Ukraine (“Ukraine-Russia War”), the taxonomy includes narratives such as “Discrediting Ukraine”, with sub-narratives like “Ukraine is a puppet of the West” and “Ukraine is a hub for criminal activities”, among others. For climate change (“Climate Change”), one of the narrative classes was “Climate change is beneficial”, with sub-narratives such as “Temperature increase is beneficial”. The third and final type is a short textual explanation, based on evidence found in the text, that justifies the main narrative selected for the news item in the previous step.
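The three annotation layers described above could be represented roughly as follows. This is only an illustrative sketch in Python, not the format of the actual dataset: the class names, field names, and helper function are all hypothetical, the entity subclasses are left open because they are not listed here, and the example values are placeholders rather than real annotations.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Main entity classes named in the article; their subclasses are not listed there.
ENTITY_ROLES = {"protagonist", "antagonist", "victim"}


@dataclass
class EntityAnnotation:
    """Layer 1: an entity and its role in the narrative (hypothetical schema)."""
    mention: str
    main_role: str
    sub_role: Optional[str] = None  # subclass label, unspecified in the article

    def __post_init__(self) -> None:
        if self.main_role not in ENTITY_ROLES:
            raise ValueError(f"unknown main role: {self.main_role!r}")


@dataclass
class NarrativeLabel:
    """Layer 2: a narrative, optionally refined by a sub-narrative."""
    domain: str
    narrative: str
    sub_narrative: Optional[str] = None


@dataclass
class ArticleAnnotation:
    """All three layers for one news item (hypothetical schema)."""
    entities: List[EntityAnnotation] = field(default_factory=list)
    narratives: List[NarrativeLabel] = field(default_factory=list)
    dominant_narrative: Optional[str] = None
    explanation: str = ""  # Layer 3: short justification of the dominant narrative


def sub_narratives_for(annotation: ArticleAnnotation, narrative: str) -> List[str]:
    """Return the sub-narratives recorded under a given narrative, in order."""
    return [
        label.sub_narrative
        for label in annotation.narratives
        if label.narrative == narrative and label.sub_narrative is not None
    ]


# Placeholder example using taxonomy labels quoted in the article.
example = ArticleAnnotation(
    entities=[EntityAnnotation("Example Organisation", "antagonist")],
    narratives=[
        NarrativeLabel("Ukraine-Russia War", "Discrediting Ukraine",
                       "Ukraine is a puppet of the West"),
        NarrativeLabel("Ukraine-Russia War", "Discrediting Ukraine",
                       "Ukraine is a hub for criminal activities"),
    ],
    dominant_narrative="Discrediting Ukraine",
    explanation="Placeholder explanation citing evidence from the text.",
)
```

Keeping the explanation next to the dominant narrative reflects the order of the annotation steps, in which the justification is written after the main narrative has been chosen.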
Described by Purificação Silvano as “demanding and painstaking,” the annotation task lasted “around seven months” and involved – in the case of the European Portuguese data – four annotators and one curator, all with relevant linguistics training “to ensure an understanding of the argumentative strategies present in the texts and the identification of communicative goals, both at the level of each paragraph and in the overall organisation of the text”. The annotation teams, despite being experienced, went through a “preliminary phase of familiarisation with the annotation manual, training, and clarification of doubts, which enabled the standardisation and consolidation of annotation criteria,” the researcher explained.
As for the taxonomy, the analysis of the Portuguese data made it possible to identify attributes that had not initially been included, not only in the entity category but also in the narrative and sub-narrative categories.
The thorough process of structuring and analysing narratives culminated in the creation of a dataset, a crucial tool for understanding how dominant discourses shape public perceptions and reinforce biases. Thus, the objective was not only to map narratives but also to support the development of methods capable of promoting a more critical and informed interpretation of the news ecosystem. According to Nuno Guimarães, this dataset provides a “basis for training models that not only identify narratives in European Portuguese but also explain why a text was classified in a given way”.
By having access to a tool with this explanatory dimension, communication professionals benefit from an instrument that does not merely indicate, for example, that a piece of news amplifies climate fears or discredits institutions. It also makes it possible to show “which arguments, entities, and discursive frameworks support that classification”. This opens the way for “automated monitoring tools that allow newsrooms and fact-checking platforms to detect, in real time, not only dominant narratives but also to understand how these narratives are constructed and reinforced,” explains Nuno Guimarães.
The researchers mentioned in this news piece are associated with INESC TEC, the Faculty of Sciences of the University of Porto, the University of Beira Interior (UBI), and the Faculty of Arts and Humanities of the University of Porto.