


Document type: Revista Ladinia

Title:

Traduzione automatica “neurale” per il ladino della Val Badia

Frontull, Samuel/Moser, Georg



Description: Ladinia XLVIII, 119–144

https://doi.org/10.54218/ladinia.48.119-144


Ressumé
Te chësta relaziun presentunse na modalité de traduziun automatica neuronala por le ladin dla Val Badia. La modalité neuronala se ghira n gran numer de traduziuns d’ejëmpl por che le sistem funzionëies indortöra. Dal momënt che la desponibilité de chisc dac paralei por le ladin é dër limitada vál debojëgn da abiné adöm chisc dac tolon sö tesć ma te un n lingaz, tuc fora en gran pert dal foliet edemal ladin “La Usc di Ladins”. Nos adorun na implementaziun dl dizionar talian “Italiano–Ladin Val Badia” te Apertium por filtré y comedé i tesć y ince por cherié traduziuns temporanes. Tolon le API de DeepL laurunse spo fora chëstes traduziuns, ciaran dantadöt da les mioré dal punt d’odüda gramatical. Le corpus che vëgn insciö a s’al dé, vëgn adoré coche basa por deplü esperimënc. I insignun jö i modei Transformer bele dal mëteman inant y i adatun modei de traduziun che é bele, arjunjon resultac ezelënc cun trames les modalités. Nüsc esperimënc desmostra che chisc sistems neuronai funzionëia damí co i modei de traduziun automatica statistics y chi che se basëia sön regoles studiades cina dan da püch. I un metü a desposiziun i modei svilupá tres n’aplicaziun web. Implü unse cherié na plataforma por la revijiun costanta dl corpus por podëi mioré le model tres l’intervënt dla porsona tl post-editing.

Abstract
In this report we present a neural approach to machine translation for the Val Badia variant of Ladin. To achieve good results, neural models require a large number of example translations on which they can be trained. The limited availability of such parallel data for Ladin makes it necessary to synthesise it from monolingual texts. As a basis for this so-called back-translation, we mainly use texts from the Ladin newspaper “La Usc di Ladins”. We translate these texts into Italian using a rule-based system implemented in Apertium. Using the DeepL API, we then post-process and improve these translations, mainly at the grammatical level. The resulting corpus serves as the basis for the various experiments we perform in training models for this language pair. We train Transformer models from scratch and fine-tune pretrained models; both methods achieve results that outperform the statistical and rule-based approaches to machine translation investigated so far. The models have been made available through a web application. Furthermore, we have launched a platform for the ongoing revision of the corpus, allowing the model to be continuously improved through human post-editing.
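The data-synthesis pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two translation functions are hypothetical stand-ins for the actual components (an Apertium rule-based Ladin–Italian translator and grammatical post-editing via the DeepL API), so that the overall flow of back-translation is runnable in isolation.

```python
# Sketch of back-translation: pairing monolingual Ladin sentences with
# synthetic Italian translations to obtain parallel training data.

def rule_based_translate(ladin_sentence: str) -> str:
    """Hypothetical stand-in for the Apertium Ladin->Italian translator."""
    return f"<draft-it: {ladin_sentence}>"

def postedit_with_deepl(italian_draft: str) -> str:
    """Hypothetical stand-in for grammatical post-editing via the DeepL API."""
    return italian_draft.replace("<draft-it: ", "<it: ")

def build_synthetic_corpus(monolingual_ladin: list) -> list:
    """Pair each monolingual Ladin sentence with a synthetic Italian target."""
    corpus = []
    for sentence in monolingual_ladin:
        draft = rule_based_translate(sentence)      # rule-based draft
        italian = postedit_with_deepl(draft)        # grammatical post-editing
        corpus.append((sentence, italian))          # (source, synthetic target)
    return corpus

pairs = build_synthetic_corpus(["Bun dé!", "Co vara pa?"])
print(len(pairs))  # 2 synthetic sentence pairs
```

In the real setup, the resulting pairs would then be used to train Transformer models from scratch or to fine-tune pretrained translation models, as described above.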


Language: IT

Date: 2024

Author: Frontull, Samuel/Moser, Georg

Copyright: Istitut Ladin Micurá de Rü - ISSN 1124-1004
