The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages
View GiellaLT on GitHub divvungiellatekno/giellalt.uit.no
Meeting on smenob transition
April 8th, 2014
Fran, Lene, Sjur, Trond
ny fil:
apertium-sme-nob.sme-nob.lex
gammel fil:
dev/archive/apertium-sme-nob.sme-nob.lex
Now we use SELECT/REMOVE rules instead of SUBSTITUTE rules.
The input looks like:
$ echo "biebmu" | apertium -d . sme-nob-biltrans
^biebmu<n><sg><nom><@HNOUN>/føde<n><m><unc><sg><nom><@HNOUN>/mat<n><m><unc><sg><nom><@HNOUN>$^.<clb>/.<sent><clb>$
And the output:
$ echo "biebmu" | apertium -d . sme-nob-lex
^biebmu<n><sg><nom><@HNOUN>/mat<n><m><unc><sg><nom><@HNOUN>$^.<clb>/.<sent><clb>$
It is much easier to debug this way. Although it means rewriting the old rules, and specifically writing default rules.
SELECT ("mat"i) IF (0 ("<biebmu>"i)) ;
the source language lemma ( “biebmu”) is the WORDFORM of the biltrans output.
word form = biebmu
readings:
$ echo "biebmu" | apertium -d . sme-nob-biltrans
^biebmu<n><sg><nom><@HNOUN>/føde<n><m><unc><sg><nom><@HNOUN>/mat<n><m><unc><sg><nom><@HNOUN>$^.<clb>/.<sent><clb>$
Output of biltrans in CG-style output:
"<biebmu>" n sg nom @HNOUN
"føde" n m unc sg nom @HNOUN
"mat" n m unc sg nom @HNOUN
"<.>" clb
"." sent clb
Lene vil oppdatere lex-fila frå dev/archive til gjeldande fil.
sme-nob boradišgohten.
- jeg begynte å spise.
+ #spise.
$ echo "boradišgohten" | apertium -d . sme-nob-biltrans
^boradit<v><tv><der_goahti><ind><prt><sg1><@+FMAINV>/spise<vblex><pers><der_goahti><ind><prt><sg1><@+FMAINV>$^.<clb>/.<sent><clb>$
Original:
^boradit<v><tv>+goahti<v><ind><prt><sg1><@+FMAINV>$
So, how do we change der_goahti to +goahti ?
boradišgohten
boradišgohten boradit+V+TV+Der/goahti+V+Ind+Prt+Sg1
borakeahttá
borakeahttá borrat+V+TV+VAbess
$ echo "boradišgohten" | apertium -d . sme-nob-morph
^boradišgohten/boradit<v><tv><der_goahti><v><ind><prt><sg1>/borrat<v><tv><der_d><v><der_goahti><v><ind><prt><sg1>$^./.<clb>$
$ echo "^boradit<v><tv>+goahti<v><ind><prt><sg1><@+FMAINV>$" | apertium-pretransfer | lt-proc -b sme-nob.autobil.bin
^boradit<v><tv>/spise<vblex><pers>$ ^goahti<v><ind><prt><sg1><@+FMAINV>/begynne<vblex><pret><sg><p1><@+FMAINV>$
$ echo "^boradit<v><tv>+goahti<v><ind><prt><sg1><@+FMAINV>$" | apertium-pretransfer | lt-proc -b sme-nob.autobil.bin | apertium-transfer -b apertium-sme-nob.sme-nob.t1x sme-nob.t1x.bin
^verb<SV><@+FMAINV><ind><pret><p1><sg><pers>{^begynne<vblex><pret>$}$ ^part<part>{^å<part>$}$ ^verb<SV><inf><pers><NC>{^spise<vblex><inf>$}$
-> begynte å spise
+Foc/ge +ge<pcle>
+Foc/gen +gen<pcle>
+Foc/ges +ges<pcle>
+Foc/gis +gis<pcle>
+Foc/naj +naj<pcle>
+Foc/ba +ba<pcle>
+Foc/be +be<pcle>
+Foc/hal +hal<pcle>
+Foc/han +han<pcle>
+Foc/bat +bat<pcle>
+Foc/son +son<pcle>
+Foc/naj+Qst <qst>+naj<pcle>
+Qst+Foc/son <qst>+son<pcle>
The + makes them into separate words (for the lexical transfer?)
$ echo "^boađátge/boahtit<v><iv><ind><prs><sg2>+ge<pcle>$" | cg-conv -a -l
"<boađátge>"
"boahtit" v iv ind prs sg2
"ge" pcle
<spectre> Unhammer, did you make any changes to the CG in sme-nob in order to deal with subreadings ?
<Unhammer> can't remember …
From SMA:
LEXICON FINAL1
ENDLEX ;
+Foc/ge+Use/Circ:#ge ENDLEX ; !
+Foc/gan+Use/Circ:#gan ENDLEX ; !
+Foc/gih+Use/Circ:#gih ENDLEX ; !
+Foc/gænnah+Use/Circ:#gænnah ENDLEX ; !
Sjur@Apertium:
Legge til ord frå nobsme/inc/False* til smenob
Propernouns