Language Technology at UiT The Arctic University of Norway

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages

Meeting on compounds, part 2

Normativity for compounds:

Checked in from the corpus analysis of 11.10.2016; found in sma/src/morphology/incoming/

+Cmp/Sh: tags the short form when there are two possibilities, with and without the short form

Types (only distinct words are counted):

| Count | Tag | Note |
|---|---|---|
| 34438 | Cmp/SgNom | also includes Der/NomAct, i.e. the first part is a verb |
| 3978 | Cmp/Attr | |
| 1401 | Cmp/SgGen | |
| 886 | Cmp/PlGen | i.e. little variation |

Tokens (all words/occurrences are counted):

| Count | Tag | Note |
|---|---|---|
| 134140 | Cmp/SgNom | also includes Der/NomAct, i.e. the first part is a verb |
| 13217 | Cmp/Attr | |
| 10640 | Cmp/Hyph | |
| 8402 | Cmp/SgGen | |
| 6028 | Cmp/PlGen | i.e. little variation |
| 3426 | Cmp/SplitR | |

Hapax frequency (the list is made from words that occur only once in the corpus):

| Count | Tag |
|---|---|
| 37314 | Cmp/SgNom |
| 4788 | Cmp/Hyph |
| 3177 | Cmp/Attr |
| 1415 | Cmp/PlGen |
| 1414 | Cmp/SgGen |
| 592 | Cmp/SplitR |
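The three counts above measure different things. The following sketch shows how types, tokens and hapaxes could be derived from one list of analysed tokens. It assumes a hypothetical input format, one `(wordform, tag)` pair per corpus occurrence with a single Cmp tag per token. This is not the format of the actual corpus analysis, and `compound_counts` is an illustrative name.

```python
from collections import Counter

def compound_counts(tokens):
    """Count compound tags three ways (hypothetical helper).

    tokens: one (wordform, tag) pair per corpus occurrence, e.g.
            ("skuvlaahki", "Cmp/SgNom"). Assumes one Cmp tag per token.
    Returns three Counters keyed by tag:
      types  - distinct wordforms per tag
      tokens - every occurrence
      hapax  - distinct wordforms that occur exactly once in the corpus
    """
    token_counts = Counter(tag for _, tag in tokens)
    form_freq = Counter(form for form, _ in tokens)
    distinct = set(tokens)  # unique (wordform, tag) pairs
    type_counts = Counter(tag for _, tag in distinct)
    hapax_counts = Counter(tag for form, tag in distinct if form_freq[form] == 1)
    return type_counts, token_counts, hapax_counts
```

Under this definition, a hapax is always also counted as a type, so each tag's hapax count cannot exceed its type count.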

Form frequency:

| Tag | Count |
|---|---|
| +N+Cmp/SgNom | 110827 |
| +N+Cmp/SgGen | 7028 |
| +N+Cmp/PlGen | 5585 |

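I.e. about 10.2% of these compounds (12613 of 123440) have a genitive first part; counted against the nominative first parts alone, the ratio is about 11.4% (12613/110827).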
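The genitive percentage depends on which denominator is used. The short check below recomputes it from the three form frequencies listed above. It gives the genitive share of all three forms and, separately, the ratio relative to nominative first parts.

```python
# First-part form frequencies listed above (+N+Cmp/...).
form_freq = {"SgNom": 110827, "SgGen": 7028, "PlGen": 5585}

genitive = form_freq["SgGen"] + form_freq["PlGen"]  # 12613 genitive first parts
total = sum(form_freq.values())                      # 123440 first parts in all

share_of_all = genitive / total               # genitive share of all three forms
ratio_to_nom = genitive / form_freq["SgNom"]  # genitive relative to nominative

print(f"{share_of_all:.1%} of all first parts")  # 10.2% of all first parts
print(f"{ratio_to_nom:.1%} relative to SgNom")   # 11.4% relative to SgNom
```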

CmpNP tags

From sma/*/root.exc:

This entry / word should be in the following position(s):

| Tag | Meaning | Example | Explanation |
|---|---|---|---|
| +CmpNP/All | … in all positions (default; this tag does not have to be written) | - | |
| +CmpNP/First | … only be first part in a compound or alone | huvva | to prevent dáhpáhuvva |
| +CmpNP/Pref | … only first part in a compound, NEVER alone | eahpe- | eahpeipmil = a non-god |
| +CmpNP/Last | … only be last part in a compound or alone | ráigi+N+CmpNP/Last+Sem/Route+Sg+Gen+Allegro:rái K ; | |
| +CmpNP/Suff | … only last part in a compound, NEVER alone | lohka+CmpNP/Suff+Sem/Dummytag:loh'ka GOAHTI-A ; | |
| +CmpNP/None | … does not take part in compounds | váhku+CmpNP/None+Sem/Dummytag:váh'ku GOAHTI-U ; | because it resembles vahkku |
| +CmpNP/Only | … only be part of a compound, i.e. can never be used alone, but can appear in any position | | |
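The position rules in the CmpNP table above can be read as a lookup from tag to allowed positions. The sketch below is a hypothetical encoding of that lookup. Reading "first part" and "last part" literally is an assumption: under that reading, a /First or /Last word is rejected in the middle of a three-part compound. The sketch does not reproduce how the transducer itself enforces these tags.

```python
# Hypothetical encoding of the CmpNP table. Positions: "alone" (not in a
# compound), "first", "middle", "last". An entry with no CmpNP tag is
# treated as CmpNP/All, the default.
ALLOWED = {
    "CmpNP/All":   {"alone", "first", "middle", "last"},
    "CmpNP/First": {"alone", "first"},
    "CmpNP/Pref":  {"first"},
    "CmpNP/Last":  {"alone", "last"},
    "CmpNP/Suff":  {"last"},
    "CmpNP/None":  {"alone"},
    "CmpNP/Only":  {"first", "middle", "last"},
}

def compound_ok(parts):
    """parts: the CmpNP tag (or None) of each part, left to right.

    A single part is a standalone word; otherwise the first part is in
    "first" position, the last in "last", and any others in "middle".
    """
    if not parts:
        raise ValueError("empty compound")
    tags = [t or "CmpNP/All" for t in parts]
    if len(tags) == 1:
        positions = ["alone"]
    else:
        positions = ["first"] + ["middle"] * (len(tags) - 2) + ["last"]
    return all(pos in ALLOWED[tag] for tag, pos in zip(tags, positions))
```

For example, `compound_ok(["CmpNP/Pref", None])` accepts an eahpe- compound, while `compound_ok(["CmpNP/Pref"])` rejects eahpe- standing alone.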

Pref:

adverbs.lexc:vulos+Adv+CmpNP/Pref+CmpN/SgN+Sem/Dummytag+Err/Orth:vulos Rreal ;
sme-propernouns.lexc:Ohcejohka+N+Prop+CmpNP/Pref+Sem/Plc+Err/Orth+Sem/Dummytag:ohce#joh Rreal ;
sme-propernouns.lexc:Guovdageaidnu+N+Prop+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Guovdageain Rnoun ;
sme-propernouns.lexc:Kárášjohka+N+Prop+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Kárášjoh9 Rnoun ;
sme-propernouns.lexc:Kárášjohka+N+Prop+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Kárášjot Rreal ;
sme-propernouns.lexc:Ohcejohka+N+Prop+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Ohcejoh9 Rreal ; ! Cannot stand alone
sme-propernouns.lexc:Ohcejohka+N+Prop+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Ohcejot Rnoun ; ! Can stand alone
sme-propernouns.lexc:Skáidejávri+Prop+N+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Skáidejár Rreal ;
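The grep hits in these lists share one entry shape. Each has an optional file prefix added by grep, then the lemma with its tags, a colon, the stem, and a continuation lexicon closed by `;`. Below is a minimal parser for that shape, which is handy for grouping the hits by their CmpNP tag. `parse_entry` and the dict it returns are hypothetical. The regex only handles the `%` escapes and `!` comments seen in these listings, not the full lexc syntax.

```python
import re

# One grep hit, e.g. "nouns.lexc:lohka+CmpNP/Suff+Sem/Dummytag:loh'ka GOAHTI-A ;"
ENTRY = re.compile(r"""
    ^(?:(?P<file>[^:\s]+\.lexc):)?   # optional "file.lexc:" prefix from grep
    (?P<upper>(?:%.|[^:%])+)         # lemma+tags; % escapes the next char
    :(?P<lower>(?:%.|[^\s%;])*)      # stem, may be empty
    \s*(?P<contlex>[^\s;]*)\s*;      # continuation lexicon, then ';'
""", re.VERBOSE)

def parse_entry(line):
    """Split one lexc entry line; returns None if the line does not match."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    # Split on '+' that is not escaped as '%+'.
    lemma, *tags = re.split(r"(?<!%)\+", m["upper"])
    return {
        "file": m["file"],                        # None if no grep prefix
        "lemma": re.sub(r"%(.)", r"\1", lemma),   # drop % escapes
        "tags": tags,
        "stem": m["lower"],
        "contlex": m["contlex"],
    }
```

With this parser, `[t for t in parse_entry(line)["tags"] if t.startswith("CmpNP/")]` picks out the position tag of a hit.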

First :sma

SpellNoSugg in abbreviations: 152 sma, 152 sme, 156 smj, 152 smn, 172 sms
abbreviations.lexc:ö+Use/SpellNoSugg+CmpNP/First:ö ab-nodot-noun; !
abbreviations.lexc:č+Use/SpellNoSugg+CmpNP/First:č ab-nodot-noun; !
abbreviations.lexc:š+Use/SpellNoSugg+CmpNP/First:š ab-nodot-noun; !
adjectives.lexc:eahpedásset+CmpNP/First+Sem/Hum:eahpe#dásse7d JOHTIL ;
adjectives.lexc:eahpedoaibmi+CmpNP/First+Sem/Hum:eahpe#doaibmi LEXATTR ;
adjectives.lexc:eahpesávahahtti+CmpNP/First+Sem/Hum:eahpe#sávaha HAHTTI ;
nouns.lexc:A-vitamiidna+CmpNP/First+Sem/Substnc:A-vitam ADRENALIN ;
nouns.lexc:mánáid-TV-kanála+CmpNP/First+Sem/Plc-abstr:mánáid9-TV-kanál SOSIAL ;
nouns.lexc:nuorta-finnmárkulaš+CmpN/SgN+CmpN/SgG+CmpN/PlG+CmpNP/First+Sem/Hum:nuorta-finnmárkulažž JOHTOLAT ;

Last:

adjectives.lexc:IT-guoskevaš+CmpNP/Last+Sem/Dummytag:IT-guoskev AHKASAS ; gööktesh
nouns.lexc:advokáhta% guovttis+CmpNP/Last+OLang/UND+Sem/Hum:advokáhta% guo GUOVTTIS ;
nouns.lexc:advokáhta% guovttos+CmpNP/Last+OLang/UND+Sem/Hum:advokáhta% gu GUOVTTU ;
nouns.lexc:feastit+V+TV+Der2+Der/NomAg+CmpN/SgN+CmpN/SgG+CmpN/PlG+CmpNP/Last+Sem/Hum:feasti¤ ACTORLEX ;
nouns.lexc:jahkelohki+CmpNP/Last+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:jahke#loh'ki GOAHTI-I ;

Suff::sma

adjectives.lexc:lágan+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+CmpNP/Suff+Sem/Hum: LAGAN;  ! XXX Look at syntactic analysis...
adjectives.lexc:lágaš+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+CmpNP/Suff+Sem/Hum: LAGAS;  ! XXX Look at syntactic analysis...
nouns.lexc:lohka+CmpNP/Suff+Sem/Dummytag:loh'ka GOAHTI-A ;

None:

abbreviations.lexc:r+CmpNP/None:r ab-dot-IVprfprc; !Lene: None is unknown
acronyms.lexc:as+CmpNP/None:as ACRO ; !A/S
acronyms.lexc:oy+CmpNP/None:oy ACRO ; !A/S
adjectives.lexc:ollu+CmpNP/None+Sem/Dummytag:ol'lu BOREALA ;
adjectives.lexc:olu+CmpNP/None+Sem/Dummytag:olu BOREALA ;
adjectives.lexc:veara+CmpNP/None+A+Ess+Sem/Dummytag:veara%>n K ;
adjectives.lexc:veara+CmpNP/None+A+Sg+Gen+Sem/Dummytag:veara K ;
adjectives.lexc:veara+CmpNP/None+A+Sg+Nom+Sem/Dummytag:veara K ;
nouns.lexc:Abimelek% guovttis+CmpNP/None+Sem/Hum:Abimelek% guo GUOVTTIS ;
nouns.lexc:Abimelek% guovttos+CmpNP/None+Sem/Hum:Abimelek% gu GUOVTTU ;
nouns.lexc:gaigŋirbealis+CmpNP/None+Err/Orth+Sem/Dummytag:gaigŋer#bealis LASIS ;
nouns.lexc:gaigŋirbealis+CmpNP/None+Sem/Dummytag:gaigŋir#bealis LASIS ;

Only:

nouns.lexc:bealle+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+CmpNP/Only+Sem/Part:beall BEALLE ;

CmpN/Left tags

grep 'CmpN/.*Left' ../sme/src/morphology/stems/nouns.lexc | grep 'CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft' | wc -l
    1291

These restrict the first part to fewer than three forms.

Not SgGen:

gaska+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Feat-measr_Plc:gas'ka GOAHTI-A ;
earru+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Feat:earru ALBMI ;
earuhus+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Dummytag:earuhuss JOHTOLAT ;
erohus+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Feat:erohuss JOHTOLAT ;
guorra+CmpNP/Last+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Dummytag:guorra GOAHTI-A ;
divuhus+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Dummytag:divuhuss JOHTOLAT ;
goazahus+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Dummytag:goazahuss JOHTOLAT ;

Not SgNom:

kulturduohki+v1+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+OLang/NOB+Sem/Feat:kultur#duoh'ki DUOHKI ;
kulturduohki+v2+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Feat:kulttor#duoh'ki DUOHKI ;
oahpahusduohki+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Feat:oahpahus#duoh'ki DUOHKI ;
bálda+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Plc:bál'da GOAHTI-A ;
geavat+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:geavad GAHPIR ;
hálddahangeavat+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:hálddahan#geavad GAHPIR ;
hálddašangeavat+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:hálddašan#geavad GAHPIR ;
sisheagga+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:sis#heagga GOAHTI-A ;
váiveheagga+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:váive#heagga GOAHTI-A ;

Not PlGen:

oahpponeavvoovddideaddji+CmpN/SgN+CmpN/SgG+CmpN/PlG+CmpN/SgNomLeft+CmpN/SgGenLeft+Sem/Dummytag:oahppo#neavvo#ovddideaddji¤ ACTORLONGSHORT ;
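The three groups above could be found mechanically by checking which Left tags each entry lacks. The sketch below assumes entry lines without the grep file prefix, as in these lists. `group_by_missing_left` is a hypothetical name. Unlike the grep pipeline earlier, it does not depend on the order in which the tags are written.

```python
import re
from collections import defaultdict

LEFT_TAGS = ("CmpN/SgNomLeft", "CmpN/SgGenLeft", "CmpN/PlGenLeft")
# A Left tag followed by another tag or by the ':' before the stem.
LEFT_RE = re.compile(r"\+(CmpN/(?:SgNom|SgGen|PlGen)Left)(?=[+:])")

def group_by_missing_left(lines):
    """Group lexc entries that use Left tags by which of the three they lack.

    Lines without any Left tag are skipped. Text after '!' is treated as a
    comment (an escaped '%!' would be misread; acceptable for a rough survey).
    The key () collects entries that carry all three tags.
    """
    groups = defaultdict(list)
    for line in lines:
        code = line.split("!", 1)[0]
        found = set(LEFT_RE.findall(code))
        if not found:
            continue
        lemma = code.split("+", 1)[0].strip()
        missing = tuple(t for t in LEFT_TAGS if t not in found)
        groups[missing].append(lemma)
    return dict(groups)
```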

Different form = different meaning:

geahči --> uksageahči vs. uvssageahči
vuolli --> seaŋgavuolli vs. seaŋggavuolli
ahki --> skuvlaahki vs. skuvllaahki

*uksa, skuvla: usually only SgNom

[/lang/common/CompoundTags.html]

What kind of words get Gen Cmp?

Adjectives denoting:

Nouns denoting:

Left compound-tagging

The case of the first part depends on the second part:

What kind of words get +Left compound-tags?

A very few specific words where the meaning of the compound changes with the case of the first part (for example Ahki, Dilli, Heahti, Duohki, Vuolli, Geahči).

In North Sami: deverbal nouns, such as action nouns and actor nouns, derived from transitive verbs (for example Sálbmalávlun vs. Sálmmalávlun, Sarvabivdi vs. Sarvvabivdi).

[/lang/sme/root-morphology.html]

+CmpN/SgNomLeft Singular Nominative
+CmpN/SgGenLeft Singular Genitive
+CmpN/PlGenLeft Plural Genitive
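Since the Left tags sit on the second part and list the forms its first part may take, a two-part compound can be checked with a simple membership test. The sketch below rests on two assumptions. First, `left_ok` and the form names are hypothetical. Second, it treats a second part that has no Left tags at all as accepting any first-part form.

```python
# Hypothetical mapping from the form of the first part to the Left tag
# the second part must carry to accept it.
LEFT_TAG_FOR = {
    "SgNom": "CmpN/SgNomLeft",
    "SgGen": "CmpN/SgGenLeft",
    "PlGen": "CmpN/PlGenLeft",
}

def left_ok(first_form, second_tags):
    """Check a two-part compound against the second part's Left tags.

    first_form: "SgNom", "SgGen" or "PlGen" (form of the first part).
    second_tags: tags on the second part's lexicon entry.
    Assumption: no Left tags at all means no restriction.
    """
    left = {t for t in second_tags if t.endswith("Left")}
    if not left:
        return True
    return LEFT_TAG_FOR[first_form] in left
```

With the tags of geavat from the "Not SgNom" list, `left_ok("SgNom", ["CmpN/SgN", "CmpN/SgGenLeft", "CmpN/PlGenLeft", "Sem/Dummytag"])` is `False`, while a SgGen or PlGen first part is accepted.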

SMJ example:

åvddånbuktem+CmpN/SgN+CmpN/DefSgGen+CmpN/DefPlGen+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:åvddån#buktem VSBST-ODD ;

How many first parts are there in the corpus?

2463 SgNom
 386 SgGen
 844 PlGen