Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

View GiellaLT on GitHub divvungiellatekno/giellalt.uit.no

sme-sma-mt meeting 15.8.2013

Francis, Lene, Trond.

Agenda

Documentation

Now also on [http://wiki.apertium.org]

Linguistics

ieš:

[http://pastebin.com/raw.php?i=kZDbYK1z]

Numerals

1    ^vïjhte<num><sg><ill><attr>$   #vïjhte\<num\>\<sg\>\<ill\>\<attr\> ^vïjhte/vïjhte<num><sg><acc>/vïjhte<num><sg><nom>$^./.<clb>$
1    ^göökte<num><sg><ill><attr>$   #göökte\<num\>\<sg\>\<ill\>\<attr\> ^göökte/göökte<num><sg><acc>/göökte<num><sg><nom>$^./.<clb>$
1    ^gellie<num><sg><ill><attr>$   #gellie\<num\>\<sg\>\<ill\>\<attr\> ^gellie/gellie<num><sg><nom>/gellie<pron><indef><sg><nom>$^./.<clb>$
1    ^akte<num><sg><ill><attr>$ #akte\<num\>\<sg\>\<ill\>\<attr\>   ^akte/akte<num><sg><nom>$^./.<clb>$

1    ^akte<num><sg><gen><der_lágan><a><attr>$   #akte\<num\>\<sg\>\<gen\>\<der_lágan\>\<a\>\<attr\> ^akte/akte<num><sg><nom>$^./.<clb>$

1    ^golme<num><sg><gen><cmp>$ #golme\<num\>\<sg\>\<gen\>\<cmp\>   ^golme/golme<num><sg><acc>/golme<num><sg><nom>$^./.<clb>$

passl

<e><p><l>viiddidit<s n="v"/><s n="tv"/><s n="der_passl"/><s n="v"/><s n="iv"/></l><r>væjranidh<s n="v"/><s n="iv"/></r></p></e>

Sámegiela dilli olggobealde hálddašanguovllu gal ii leat buorre, ja evalueren konkludere dainna ahte hálddašanguovlu ferte viiddiduvvot.

Lemma

Variants getting the same lemma. - looking for the lemma in the bidix.

Fran: make analysis use desc fst.

Technicalities

Declaring symbols

Apertium svn messages as mail

We get source forge mail, but not svn mail. Fran to have a look.

https://sourceforge.net/auth/subscriptions/

“Apertium: machine translation toolbox svn direct day All artifacts “

Syntax

echo "Mun oasttán láibbi buvddas." | preprocess | usme | lookup2cg | vislcg3 -g src/sme-dis.rle | vislcg3 -g src/smi-syn.rle

"<Mun>"
        "mun" Pron Pers Sg1 Nom @SUBJ>
"<oasttán>"
        "oastit" V TV Ind Prs Sg1 @+FMAINV
"<láibbi>"
        "láibi" Food N Sg Acc @<OBJ
"<buvddas>"
        "buvda" Org Build N Sg Loc @<ADVL
"<.>"
        "." CLB
"<Mus>"
        "mun" Pron Pers Sg1 Loc @HAB> <====== Gen
"<lea>"
        "leat" V IV Ind Prs Sg3 @+FMAINV
"<láibi>"
        "láibi" Food N Sg Nom @<SPRED
"<.>"
        "." CLB

Results:

echo "Mus lea láibi." | apertium -d . sme-sma
Mannesne lea laejpie.

$ echo "Mus lea láibi" | apertium -d . sme-sma
Mov (lea) laejpie.

echo "Mus leat guokte láibbi." | apertium -d . sme-sma
Mov (leah) göökte laejpien. <=== laejpieh

[http://wiki.apertium.org/wiki/North_S%C3%A1mi_and_South_S%C3%A1mi/Pending_tests]

fran@eki:~/source/apertium/nursery/apertium-sme-sma$ bash pending-tests.sh Running Pending-tests with mode “sme-sma” with updated tests……

Syntactic functions

We copy old gt/sme/src/smi-syn.rle to functions.cg3

it is here:

langs/sme/src/syntax/functions.cg3

genration report

cat generation-report.sme-sma.txt | grep "#"

hash gives

freq     postchunk                    form generated                        lemma analysed
-------------------------------------------------------------------------------------------
9    ^jienebe<a><comp><sg><nom>$    #jienebe\<a\>\<comp\>\<sg\>\<nom\>  ^jienebe/jienebe<pron><indef><sg><nom>$^./.<clb>$
7    ^rolle<n><sg><acc>$            #rolle\<n\>\<sg\>\<acc\>            ^rolle/*rolle$^./.<clb>$
6    ^njoetseldh<a><attr><cmp>+almetje<n><pl><ill>$ #njoetseldh\<a\>\<attr\>\<cmp\>almetjidie   ^njoetseldh/njoetsedh<v><iv><der_l
dh><n><sg><nom>/njoetseldh<a><sg><nom>$^./.<clb>$

    <e><p><l>eambbo<s n="a"/></l><r>jienebe<s n="a"/></r></p></e>
    <e><p><l>eanet<s n="a"/></l><r>jienebe<s n="a"/></r></p></e>
find line with output:
cat texts/samediggidiedadus_samegiela_birra_2012.sme.txt | apertium -d . sme-sma-postchunk | cat -n | grep "jïjtje<pron><refl><sg><gen><pxpl1>"

find input:
head -n 17 texts/samediggidiedadus_samegiela_birra_2012.sme.txt | tail -1 | apertium -d . sme-sma-tagger