Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

View GiellaLT on GitHub divvungiellatekno/giellalt.uit.no

Page Content

Giellatekno-møte 24.6.

Saksliste

fst:

Bakgrunn

Vi hadde hatt diskusjon i Canada, m.a. med Måns Huldén, om dette.

I staden for referat legg vi her ut skisse frå diskusjonen:

--------------
samis	sápmi+Hum+N+Sg+Loc+AErr

L   sápmi+N+Sg+Loc +AErr !!
    sápmi^WGs


T   sápmi^WGs
    sámis


S   sámis    á (->) a ;
    samis
--------------
define Lex {ääliö} ("+MIS":"^") ;  <--- lexc +AErr:+AErr

def Rule1 ä (->) a X , ö (->) o X | _ ?* "^" ;
def Rule2 "^" =>  X ?* _ ;
def Rule3 X -> 0 .o. "^" -> 0;

regex Lex .o. Rule1 .o. Rule2 .o. Rule3;
------------------------
heile sme-lexc + kopi.sme-lex med ("+MIS":"^");
define Lex lexicon.xfst ("+MIS":"^")
define
business as usual [sme.lexc  kopi&+MIS] .o. sme.bin --> sme.fst
Rule 123
------------------------
heile sme-lexc med nytt ENDLEX:
LEXICON ENDLEX +MIS:%^ # ; # ;
def Rule1 ä (->) a X , ö (->) o X | _ ?* "^" ;
def Rule2 "^" =>  X ?* _ ;
def Rule3 X -> 0 .o. "^" -> 0;
regex lexicon.xfst .o. Rule1 .o. Rule2 .o. Rule3;

------------------------
foma
source aalio.foma
up aalio
    ääliö+MIS
up ääliö
    ääliö

re operator: => context restriction, e.g.
A => B _ C, D _ E

"LangA"
á:a =>   _ :*  ( %>: ) ( »: ) ( »: »: »: ) %+AErr:0 ;
 Cns:+ _ ( %>: ) ( »: ) ( »: »: »: ) (¤:) (StemCns:*) ( %>: ) ( »: ) ( »: »: »: ) X7: ;

"OnlyA"
%+AErr:0 => á:a :* _ ;

--------------

guesser

foma
source guesstest.foma

---------------------------------
guesstest.foma
------------
def C [b|c|d|f|g|h|k|l|m|n|p|q|r|s|t|v|w|x|z](%@);
def V [a|e|i|o](u);
def PhonWord [C* V (V) C*]+;

read lexc guesstest.lexc
substitute defined PhonWord for "^GUESSNOUNSTEM"

def Grammar;
def Phonology [..] -> e | [s h | c h | s | x ]( z) _ s .#. ;
def Grammar Grammar .o. Phonology ;

def GuessGrammar  $[ "+GUESS" ] .o. Grammar ;
def CleanGrammar ~$[ "+GUESS" ] .o. Grammar ;

regex CleanGrammar .p. GuessGrammar ;  # Lower-side priority union
----------------------

guesstest.lexc:
----
Multichar_Symbols +N +Sg +Pl ^GUESSNOUNSTEM +GUESS

LEXICON Root
Nouns ;

LEXICON Nouns

cat                                 NInfl ;
dog                                 NInfl ;
mouse                               NInfl ;
horse                               NInfl ;
^GUESSNOUNSTEM+GUESS:^GUESSNOUNSTEM NInfl ;

LEXICON NInfl
+N:0 Infl;

LEXICON Infl
+Pl:s  #;
+Sg:   #;
---------------------

apply up> horse
horse+N+Sg
apply up> blabla
blabla+GUESS+N+Sg
apply up> blablas
blablas+GUESS+N+Sg
blabla+GUESS+N+Pl
apply up> blablass
???
apply up> blablases
blablases+GUESS+N+Sg
blablase+GUESS+N+Pl
blablas+GUESS+N+Pl
blablass+GUESS+N+Sg
apply up>
---------------------

giellabeassi
giellabeassi	girji+N+SgNomCmp+Cmp#busse+N+G3+Sg+Nom  = 1
giellabeassi	girjebusse+N+G3+Sg+Nom  = 0
==> 0 er mindre enn 1

a. giellabeassi
b. giella#beassi

a - b = the winner: giellabeassi

a.
giellabeassi   <========
g#i#ellabeassi
giella#beassi
...

b.
giella#beassi
g#iella#beassi
...