INESC TEC developed natural language processing resources for the Portuguese language

The main goal of the PTicola project was to expand and build new Natural Language Processing (NLP) capabilities for the Portuguese language. The results of this project – which include, for example, an English/European Portuguese translator and a PT-BR/PT-PT language variety identifier – address the gap in NLP resources available for PT-PT compared to PT-BR.

With access to Google Cloud Platform products, the PTicola project (Increasing Computationally Language Resources for Portuguese) created two open-source tools considered crucial for the European Portuguese community: a language variety identifier that distinguishes PT-PT from PT-BR, and a translation model from English to European Portuguese.

“Both outcomes led to two publications accepted at the Annual AAAI Conference on Artificial Intelligence (AAAI), a top-tier Artificial Intelligence event (CORE A*), which will take place in Philadelphia, in late February,” said Alípio Jorge. In addition, according to the INESC TEC researcher, “the tools we developed address a significant gap in NLP resources for European Portuguese, which lags behind Brazilian Portuguese in terms of available language technologies.”

PTicola also contributed new datasets for Portuguese NLP tasks – such as temporal information extraction, semantic role labelling and relation extraction – and developed domain-specific tools, including a clinical case retrieval and classification system and an English-Portuguese biomedical translator. The retrieval and classification system was also accepted, as a demonstration, at the European Conference on Information Retrieval (ECIR), which will take place in April in Lucca, Italy.

By developing new resources, the project advanced the state of the art for Portuguese in several NLP tasks, where effectiveness remains significantly lower than for the same tasks in English. “The work we developed in this project not only expands the resource ecosystem for the Portuguese language, but also provides a basis for future research in specialised domains,” added Alípio Jorge.

The results of PTicola, funded by the Foundation for Science and Technology (FCT), were presented at a workshop held at INESC TEC on 13 February, which brought together almost 40 participants.

The researcher mentioned in this news piece is associated with INESC TEC.
