Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

View GiellaLT on GitHub divvungiellatekno/giellalt.uit.no

Page Content

Grammatikkontrollmøte 23.11.2016

Til stades: Kevin, Linda, Sjur

Tema:

Variantar

COPY:10258:wrong-valency-ahte-inf
beroštit+V+IV+Ind+Prs+Pl1       beroštat,beroštit

beroštit
beroštit	beroštit+V+IV+Inf
beroštit	beroštit+V+IV+Ind+Prs+Pl1
beroštit	beroštit+V+IV+Ind+Prs+Pl3
beroštit	beroštit+V+IV+Ind+Prt+Sg2

beroštat
beroštat	beroštit+V+IV+Ind+Prs+Pl1
beroštat	beroštit+V+IV+Ind+Prs+Sg2

Andre døme:

oarpmembealle+v1+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:oarpmem9#beall BEALLE ;
oarpmembealle+v2+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:oarpmem9#bealli GOAHTI-IU ;
oarpmembealle+v3+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:oarpmen#beall BEALLE ;
oarpmembealle+v4+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:oarpmen#bealli GOAHTI-IU ;

$ echo oarpmembealle+N+Sg+Loc | hfst-lookup -q src/generator-gt-norm.hfstol
oarpmembealle+N+Sg+Loc        oarpmembeales        0,000000
oarpmembealle+N+Sg+Loc        oarpmembealis        0,000000
oarpmembealle+N+Sg+Loc        oarpmenbeales        0,000000
oarpmembealle+N+Sg+Loc        oarpmenbealis        0,000000

oambealle+v1+CmpN/SgN+CmpN/SgG+CmpN/PlG+Sem/Hum:oam9#beall BEALLE ;
oambealle+v2+CmpN/SgN+CmpN/SgG+CmpN/PlG+Use/NG+Sem/Hum:oam9#bealli GOAHTI-IU ;

$ echo oambealle+N+Sg+Loc | hfst-lookup -q src/generator-gt-norm.hfstol
oambealle+N+Sg+Loc        oambeales        0,000000
oambealle+N+Sg+Loc        oambealis        0,000000

Og fleire døme:

animašunkursa+v1+OLang/UND+Sem/Edu:animašun#kur'sa GOAHTI-A ;
animašunkursa+v2+OLang/UND+Sem/Edu:animašun#gur'sa GOAHTI-A ;
animašunkursa+v3+OLang/UND+Sem/Edu:animašun#gur'se GOAHTI ;
animašunkursa+v4+OLang/UND+Sem/Edu:animašuvdna#kur'sa GOAHTI-A ;
animašunkursa+v5+OLang/UND+Sem/Edu:animašuvdna#gur'sa GOAHTI-A ;
animašunkursa+v6+OLang/UND+Sem/Edu:animašuvdna#gur'se GOAHTI ;

$ echo animašunkursa+N+Sg+Loc | hfst-lookup -q src/generator-gt-norm.hfstol
animašunkursa+N+Sg+Loc        animašungurssas        0,000000
animašunkursa+N+Sg+Loc        animašungursses        0,000000
animašunkursa+N+Sg+Loc        animašunkurssas        0,000000
animašunkursa+N+Sg+Loc        animašuvdnagurssas        0,000000
animašunkursa+N+Sg+Loc        animašuvdnagursses        0,000000
animašunkursa+N+Sg+Loc        animašuvdnakurssas        0,000000

bábir+v1+Dial/-GG+Sem/Mat_Txt:báh'pir MALIS ;
bábir+v2+Dial/-KJ+Sem/Mat_Txt:báh'pi7r MALIS ;
bábir+v3+Sem/Mat_Txt:báh'pur MALIS ;

Mun beroštan animašungursii.

"<Mun>"
        "mun" §AG Pron Sem/Hum Pers Sg1 Nom <W:0> @SUBJ> SUBSTITUTE:3252 MAP:17282 #1->1 SUBSTITUTE:8476
:
"<beroštan>"
        "beroštit" <mv> V <AG-Nom-Any> <TH-AktioLoc> <TH-Inf> <TH-Loc-Any> <TH-ahte> IV Ind Prs Sg1 <W:0> @+FMAINV SELECT:5086:r949 MAP:10165:r406 #2->2 SUBSTITUTE:7357:SubV=mv
* **"beroštit" V <AG-Nom-Any> <TH-Inf> <TH-Loc-Any> <TH-ahte> IV Actio Gen <W:0> SELECT:5086**: r949
* **"beroštit" V IV Actio Nom <W:0> SELECT:5086**: r949
* **"beroštit" V <AG-Nom-Any> <TH-AktioLoc> <TH-Inf> <TH-Loc-Any> <TH-ahte> IV Ind Prt ConNeg <W:0> SELECT:5086**: r949
* **"beroštit" V <AG-Nom-Any> <TH-AktioLoc> <TH-Inf> <TH-Loc-Any> <TH-ahte> IV PrfPrc <W:0> SELECT:5086**: r949
:
"<animašungursii>"
        "animašunkursa" N Sem/Edu Sg Ill <W:0> @<ADVL MAP:16745 &msyn-valency-loc-ill #3->3 ADD:10323:msyn-valency-loc-ill
msyn-valency-loc-ill
        "animašunkursa" N Sem/Edu Sg <W:0> @<ADVL MAP:16745 Loc &SUGGEST #3->3 COPY:10329:wrong-valency-loc-ill
animašunkursa+N+Sg+Loc  animašungurssas,animašungursses,animašunkurssas,animašuvdnagurssas,animašuvdnagursses,animašuvdnakurssas
* **"kursa" N Sem/Edu Sg Ill <W**: 10>
* **"animašuvdna" N Sem/Dummytag Cmp/SgNom Cmp <W:10> REMOVE:2053**: longest-match
"<.>"
        "." CLB <W:0> #4->4

+vN-taggar manglar i gramcheck-analysatoren:

"<1999:s>"
        "1999" Num Arab Sg Loc <W:0> #1->1
        "1999" Num Sem/Year Sg Loc <W:0> #1->1
:
"<bidjui>"
        "bidju" N Sem/Dummytag Sg Ill <W:0> &real-biddjui #2->2 ADD:4060:biddjui ADD:4070:biddjui ADD:4060:biddjui ADD:4070:biddjui
real-biddjui
        "biedju" N Sem/Plc Sg Ill <W:0> &real-biddjui #2->2 ADD:4060:biddjui ADD:4070:biddjui ADD:4060:biddjui ADD:4070:biddjui
real-biddjui
        "bidjat" Sem/Dummytag <W:0> V TV Der/PassS V IV Ind Prt Sg3 &SUGGEST #2->2 COPY:4078:real-biddjui
bidjat+V+TV+Der/PassS+V+IV+Ind+Prt+Sg3  biddjui
        "bidjat" Sem/Plc <W:0> V TV Der/PassS V IV Ind Prt Sg3 &SUGGEST #2->2 COPY:4078:real-biddjui
bidjat+V+TV+Der/PassS+V+IV+Ind+Prt+Sg3  biddjui
:
"<nu gohčoduvvon>"
        "nu gohčoduvvon" A Sem/Dummytag Attr <W:0> @>N SELECT:8955:r1736 MAP:15815:r86 #3->3
        "nu gohčoduvvon" Err/UnspaceCmp A Sem/Dummytag Attr Err/SpaceCmp <W:0> @>N SELECT:8955:r1736 MAP:15815:r86 #3->3
        "nu gohčoduvvon" Err/UnspaceCmp A Sem/Dummytag Attr <W:0> @>N SELECT:8955:r1736 MAP:15815:r86 &SUGGEST #3->3 COPY:8944:compound
nu gohčoduvvon+Err/UnspaceCmp+A+Attr    ?
* **"nu gohčoduvvon" A Sem/Dummytag Pl Nom <W:0> SELECT:8955**: r1736
* **"nu gohčoduvvon" A Sem/Dummytag Sg Nom <W:0> SELECT:8955**: r1736
* **"nu gohčoduvvon" Err/UnspaceCmp A Sem/Dummytag Pl Nom Err/SpaceCmp <W:0> SELECT:8955**: r1736
* **"nu gohčoduvvon" Err/UnspaceCmp A Sem/Dummytag Sg Nom Err/SpaceCmp <W:0> SELECT:8955**: r1736
* **"gohččut" VV TVV Der/d VV Der/PassL V IV PrfPrc <W**: 0> "< gohčoduvvon>"
* **"nu" Pcle <W:0> "<nu>" REMOVE:2053**: longest-match
* **"gohččut" VV TVV Der/d VV Der/PassL V IV PrfPrc <W**: 0> "< gohčoduvvon>"
* **"nu" Adv <W:0> "<nu>" REMOVE:2053**: longest-match
* **"gohččut" VV TVV Der/d VV Der/PassL V IV Ind Prt ConNeg <W**: 0> "< gohčoduvvon>"
* **"nu" Pcle <W:0> "<nu>" REMOVE:2053**: longest-match
* **"gohččut" VV TVV Der/d VV Der/PassL V IV Ind Prt ConNeg <W**: 0> "< gohčoduvvon>"
* **"nu" Adv <W:0> "<nu>" REMOVE:2053**: longest-match
* **"gohččut" VV TVV Der/d VV Der/PassL V IV Ind Prs Sg1 <W**: 0> "< gohčoduvvon>"
* **"nu" Pcle <W:0> "<nu>" REMOVE:2053**: longest-match
* **"gohččut" VV TVV Der/d VV Der/PassL V IV Ind Prs Sg1 <W**: 0> "< gohčoduvvon>"
* **"nu" Adv <W:0> "<nu>" REMOVE:2053**: longest-match
* **"gohčodit" VV TVV Der/PassL V IV PrfPrc <W**: 0> "< gohčoduvvon>"
* **"nu" Pcle <W:0> "<nu>" REMOVE:2053**: longest-match

$ echo "nugohčoduvvon" | hfst-tokenise --giella-cg tools/preprocess/tokeniser-gramcheck-gt-desc.pmhfst
"<nugohčoduvvon>"
        "nu gohčoduvvon" Err/UnspaceCmp A Sem/Dummytag Attr <W:0>
        "nu gohčoduvvon" Err/UnspaceCmp A Sem/Dummytag Pl Nom <W:0>
        "nu gohčoduvvon" Err/UnspaceCmp A Sem/Dummytag Sg Nom <W:0>
:\n

$ echo "nu gohčoduvvon" | hfst-tokenise --giella-cg tools/preprocess/tokeniser-gramcheck-gt-desc.pmhfst
"<nu gohčoduvvon>"
        "nu gohčoduvvon" A Sem/Dummytag Attr <W:0>
        "nu gohčoduvvon" A Sem/Dummytag Pl Nom <W:0>
        "nu gohčoduvvon" A Sem/Dummytag Sg Nom <W:0>
        "nu gohčoduvvon" Err/UnspaceCmp A Sem/Dummytag Attr Err/SpaceCmp <W:0>
        "nu gohčoduvvon" Err/UnspaceCmp A Sem/Dummytag Pl Nom Err/SpaceCmp <W:0>
        "nu gohčoduvvon" Err/UnspaceCmp A Sem/Dummytag Sg Nom Err/SpaceCmp <W:0>
        "gohččut" VV TVV Der/d VV Der/PassL V IV PrfPrc <W:0> "< gohčoduvvon>"
                "nu" Pcle <W:0> "<nu>"
        "gohččut" VV TVV Der/d VV Der/PassL V IV PrfPrc <W:0> "< gohčoduvvon>"
                "nu" Adv <W:0> "<nu>"
        "gohččut" VV TVV Der/d VV Der/PassL V IV Ind Prt ConNeg <W:0> "< gohčoduvvon>"
                "nu" Pcle <W:0> "<nu>"
        "gohččut" VV TVV Der/d VV Der/PassL V IV Ind Prt ConNeg <W:0> "< gohčoduvvon>"
                "nu" Adv <W:0> "<nu>"
        "gohččut" VV TVV Der/d VV Der/PassL V IV Ind Prs Sg1 <W:0> "< gohčoduvvon>"
                "nu" Pcle <W:0> "<nu>"
        "gohččut" VV TVV Der/d VV Der/PassL V IV Ind Prs Sg1 <W:0> "< gohčoduvvon>"
                "nu" Adv <W:0> "<nu>"
        "gohčodit" VV TVV Der/PassL V IV PrfPrc <W:0> "< gohčoduvvon>"
                "nu" Pcle <W:0> "<nu>"
        "gohčodit" VV TVV Der/PassL V IV PrfPrc <W:0> "< gohčoduvvon>"
                "nu" Adv <W:0> "<nu>"
        "gohčodit" VV TVV Der/PassL V IV Ind Prt ConNeg <W:0> "< gohčoduvvon>"
                "nu" Pcle <W:0> "<nu>"
        "gohčodit" VV TVV Der/PassL V IV Ind Prt ConNeg <W:0> "< gohčoduvvon>"
                "nu" Adv <W:0> "<nu>"
        "gohčodit" VV TVV Der/PassL V IV Ind Prs Sg1 <W:0> "< gohčoduvvon>"
                "nu" Pcle <W:0> "<nu>"
        "gohčodit" VV TVV Der/PassL V IV Ind Prs Sg1 <W:0> "< gohčoduvvon>"
                "nu" Adv <W:0> "<nu>"
:\n

UnspaceCmp vs SpaceCmp

Løysing: legg til skrik på alle ordgrenser som er UnspaceCmp

Men hugs at skriket berre skal på #-en som svarer til feilen (viss det er fleire #, skal ikkje dei andre ha skrik).

Sjekk etter at UnspaceCmp har fått skrik på rette plassen: Taklar me faktiske kombinasjonar av spacecmp og unspacecmp? FEILEN: vivva sásaguovttos → RETTINGA: vivvasása guovttos

Grammatikkontrollforslag med kommentarar

Kommentar frå Børre om vuosttaš vs vuosttáš: Det er ikkje sikkert folk som ser forslaget kjenner seg trygge på om grammatikkontrollen tar rett eller ikkje. Men viss dei ser eit kort døme på bruken av begge to («okta dain vuosttaš» vs «…»)

Kontekstsensitiv stavekontroll med CG

Me byggjer ein alternativ pipeline:

"hfst-ospell-cg" -l analyser.hfst -m errmodel.hfst \
|  vislcg3 -g disambiguator.cg3 | divvun-suggest-liknande

hfst-ospell-cg ventar naivt tokenisert input («a b. C» blir til ‘a’, ‘b’, ‘.’, ‘C’ sjølv om kanskje «a b» er eit fleirordsuttrykk osb.).

Vi kan òg vurdera ein naiv MWE-sjekk for stavekontrollen, med opp til tre ord, slik at ord som i seg sjølv er ukjende for stavekontrollen likevel ikkje blir flagga dersom dei er del av ein MWE.

Døme på korleis stavekontrollen fungerer no dersom akseptor er ein full analysator, og ikkje berre ein automat:

$ echo mohtorgielkámádija | hfst-ospell -a -S -l analyser-desktopspeller-gt-norm.hfstol -m hfst/errmodel.default.hfst
"mohtorgielkámádija" is NOT in the lexicon:
Corrections for "mohtorgielkámádija":
muhtorgielkámáđija    -1.698242    muhtorgielkámáđidja+N+Sg+Acc
muhtorgielkámáđija    -1.698242    muhtorgielkámáđidja+N+Sg+Gen
muhtorgielkámáđija    -1.698242    muhtorgielkámáđidja+N+Sg+Gen+Allegro
muhtorgielkámáđija    -1.698242    muhtorgielká+N+Cmp#máđidja+N+Sg+Gen
muhtorgielkámáđija    -1.698242    muhtorgielká+N+Cmp#máđidja+N+Sg+Gen+Allegro
muhtorgielkámáđija    -1.698242    muhtorgielká+N+Cmp#máđidja+N+Sg+Acc
muhtorgielkámáđija    -1.698242    muhtor+N+Cmp#gielká+N+Cmp#máđidja+N+Sg+Gen
muhtorgielkámáđija    -1.698242    muhtor+N+Cmp#gielká+N+Cmp#máđidja+N+Sg+Gen+Allegro
muhtorgielkámáđija    -1.698242    muhtor+N+Cmp#gielká+N+Cmp#máđidja+N+Sg+Acc
mohtorgielkámáđija    7.301758    mohtorgielkámáđidja+N+Sg+Acc
mohtorgielkámáđija    7.301758    mohtorgielkámáđidja+N+Sg+Gen
mohtorgielkámáđija    7.301758    mohtorgielkámáđidja+N+Sg+Gen+Allegro
mohtorgielkámáđija    7.301758    mohtorgielká+N+Cmp#máđidja+N+Sg+Gen
mohtorgielkámáđija    7.301758    mohtorgielká+N+Cmp#máđidja+N+Sg+Gen+Allegro
mohtorgielkámáđija    7.301758    mohtorgielká+N+Cmp#máđidja+N+Sg+Acc
mohtorgielkámáđija    7.301758    mohtor+N+Cmp#gielká+N+Cmp#máđidja+N+Sg+Gen
mohtorgielkámáđija    7.301758    mohtor+N+Cmp#gielká+N+Cmp#máđidja+N+Sg+Gen+Allegro
mohtorgielkámáđija    7.301758    mohtor+N+Cmp#gielká+N+Cmp#máđidja+N+Sg+Acc
mohtorgielkámáđijat    17.301758    mohtorgielkámáđidja+N+Pl+Nom
mohtorgielkámáđijat    17.301758    mohtorgielká+N+Cmp#máđidja+N+Pl+Nom