The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages
Meeting on compounds - part 2
Normativity for compounds:
Checked in from the corpus analysed on 11.10.2016: found in sma/src/morphology/incoming/
+Cmp/Sh - tags the short form when there are two possibilities, a short form and a non-short form
Types (only distinct words are counted):
| 34438 | Cmp/SgNom <= this also includes Der/NomAct, i.e. the first part is a verb |
| 3978 | Cmp/Attr |
| 1401 | Cmp/SgGen |
| 886 | Cmp/PlGen, i.e. little variation |

Tokens (all words/occurrences are counted):
| 134140 | Cmp/SgNom <= this also includes Der/NomAct, i.e. the first part is a verb |
| 13217 | Cmp/Attr |
| 10640 | Cmp/Hyph |
| 8402 | Cmp/SgGen |
| 6028 | Cmp/PlGen, i.e. little variation |
| 3426 | Cmp/SplitR |

Hapax frequency (the list is made from words that occur only once in the corpus):
| 37314 | Cmp/SgNom |
| 4788 | Cmp/Hyph |
| 3177 | Cmp/Attr |
| 1415 | Cmp/PlGen |
| 1414 | Cmp/SgGen |
| 592 | Cmp/SplitR |
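
The three counting methods above (types, tokens, hapax legomena) can be illustrated with a minimal Python sketch. The word list is a toy example made up for illustration; it is not corpus data and does not reproduce the figures above.

```python
from collections import Counter

# Hypothetical toy list of compound occurrences (not real corpus data).
words = ["skuvlaahki", "skuvlaahki", "uksageahči", "seaŋgavuolli", "skuvlaahki"]

freq = Counter(words)

token_count = len(words)   # tokens: every occurrence counts
type_count = len(freq)     # types: each distinct word counts once
# hapaxes: words that occur exactly once
hapaxes = sorted(w for w, n in freq.items() if n == 1)

print(token_count, type_count, hapaxes)
```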
Form frequency:
| +N+Cmp/SgNom | 110827 |
| +N+Cmp/SgGen | 7028 |
| +N+Cmp/PlGen | 5585 |
I.e. 11.3% of the compounds have a genitive first part
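
The genitive share can be recomputed from the form-frequency counts listed above. This is only a sketch: it assumes the three listed counts are the whole basis for the percentage, which the notes do not state, and it shows that the result depends on the denominator chosen.

```python
# Form-frequency counts copied from the list above.
form_freq = {
    "+N+Cmp/SgNom": 110827,
    "+N+Cmp/SgGen": 7028,
    "+N+Cmp/PlGen": 5585,
}

genitive = form_freq["+N+Cmp/SgGen"] + form_freq["+N+Cmp/PlGen"]
total = sum(form_freq.values())

# Genitive first parts as a share of all three forms together.
share_of_total = 100 * genitive / total
# Genitive first parts relative to the SgNom count only.
ratio_to_sgnom = 100 * genitive / form_freq["+N+Cmp/SgNom"]

print(f"{share_of_total:.1f}% of all forms, {ratio_to_sgnom:.1f}% relative to SgNom")
```

With these counts the first reading gives about 10.2% and the second about 11.4%, so the 11.3% quoted in the notes is closest to the second reading.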
From sma/*/root.exc:
This entry / word should be in the following position(s):

| Tag | Meaning | Example | Explanation |
| --- | --- | --- | --- |
| +CmpNP/All | … in all positions, default, this tag does not have to be written | - | |
| +CmpNP/First | … only be first part in a compound or alone | huvva | to prevent dáhpáhuvva |
| +CmpNP/Pref | … only first part in a compound, NEVER alone | eahpe- | eahpeipmil = a no god |
| +CmpNP/Last | … only be last part in a compound or alone | ráigi+N+CmpNP/Last+Sem/Route+Sg+Gen+Allegro:rái K ; | |
| +CmpNP/Suff | … only last part in a compound, NEVER alone | lohka+CmpNP/Suff+Sem/Dummytag:loh'ka GOAHTI-A ; | |
| +CmpNP/None | … does not take part in compounds | váhku+CmpNP/None+Sem/Dummytag:váh'ku GOAHTI-U ; | because it resembles vahkku |
| +CmpNP/Only | … only be part of a compound, i.e. can never be used alone, but can appear in any position | | |
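
The position restrictions in the table above can be sketched as a lookup in Python. The names `ALLOWED` and `allowed_positions` are hypothetical helpers, not part of the GiellaLT infrastructure, and the sketch only models three positions (first part, last part, standing alone); compound-internal positions are left out.

```python
# Allowed positions per tag, transcribed from the table above.
# "alone" means the word may occur outside a compound.
ALLOWED = {
    "+CmpNP/All":   {"first", "last", "alone"},
    "+CmpNP/First": {"first", "alone"},
    "+CmpNP/Pref":  {"first"},
    "+CmpNP/Last":  {"last", "alone"},
    "+CmpNP/Suff":  {"last"},
    "+CmpNP/None":  {"alone"},
    "+CmpNP/Only":  {"first", "last"},
}


def allowed_positions(upper_side):
    """Return the positions permitted by the +CmpNP/ tag in a lexc upper side."""
    for tag, positions in ALLOWED.items():
        if tag in upper_side:
            return positions
    # No +CmpNP/ tag written: +CmpNP/All is the default.
    return ALLOWED["+CmpNP/All"]


print(allowed_positions("lohka+CmpNP/Suff+Sem/Dummytag"))
```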
Pref:
adverbs.lexc:vulos+Adv+CmpNP/Pref+CmpN/SgN+Sem/Dummytag+Err/Orth:vulos Rreal ;
sme-propernouns.lexc:Ohcejohka+N+Prop+CmpNP/Pref+Sem/Plc+Err/Orth+Sem/Dummytag:ohce#joh Rreal ;
sme-propernouns.lexc:Guovdageaidnu+N+Prop+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Guovdageain Rnoun ;
sme-propernouns.lexc:Kárášjohka+N+Prop+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Kárášjoh9 Rnoun ;
sme-propernouns.lexc:Kárášjohka+N+Prop+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Kárášjot Rreal ;
sme-propernouns.lexc:Ohcejohka+N+Prop+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Ohcejoh9 Rreal ; ! Cannot stand alone
sme-propernouns.lexc:Ohcejohka+N+Prop+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Ohcejot Rnoun ; ! Can stand alone
sme-propernouns.lexc:Skáidejávri+Prop+N+CmpNP/Pref+CmpN/SgN+Err/HyphSub+Sem/Dummytag:Skáidejár Rreal ;
First :sma
SpellNoSugg in abbreviations: 152 sma, 152 sme, 156 smj, 152 smn, 172 sms
abbreviations.lexc:ö+Use/SpellNoSugg+CmpNP/First:ö ab-nodot-noun; !
abbreviations.lexc:č+Use/SpellNoSugg+CmpNP/First:č ab-nodot-noun; !
abbreviations.lexc:š+Use/SpellNoSugg+CmpNP/First:š ab-nodot-noun; !
adjectives.lexc:eahpedásset+CmpNP/First+Sem/Hum:eahpe#dásse7d JOHTIL ;
adjectives.lexc:eahpedoaibmi+CmpNP/First+Sem/Hum:eahpe#doaibmi LEXATTR ;
adjectives.lexc:eahpesávahahtti+CmpNP/First+Sem/Hum:eahpe#sávaha HAHTTI ;
nouns.lexc:A-vitamiidna+CmpNP/First+Sem/Substnc:A-vitam ADRENALIN ;
nouns.lexc:mánáid-TV-kanála+CmpNP/First+Sem/Plc-abstr:mánáid9-TV-kanál SOSIAL ;
nouns.lexc:nuorta-finnmárkulaš+CmpN/SgN+CmpN/SgG+CmpN/PlG+CmpNP/First+Sem/Hum:nuorta-finnmárkulažž JOHTOLAT ;
Last:
adjectives.lexc:IT-guoskevaš+CmpNP/Last+Sem/Dummytag:IT-guoskev AHKASAS ; gööktesh
nouns.lexc:advokáhta% guovttis+CmpNP/Last+OLang/UND+Sem/Hum:advokáhta% guo GUOVTTIS ;
nouns.lexc:advokáhta% guovttos+CmpNP/Last+OLang/UND+Sem/Hum:advokáhta% gu GUOVTTU ;
nouns.lexc:feastit+V+TV+Der2+Der/NomAg+CmpN/SgN+CmpN/SgG+CmpN/PlG+CmpNP/Last+Sem/Hum:feasti¤ ACTORLEX ;
nouns.lexc:jahkelohki+CmpNP/Last+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:jahke#loh'ki GOAHTI-I ;
Suff::sma
adjectives.lexc:lágan+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+CmpNP/Suff+Sem/Hum: LAGAN; ! XXX Look at syntactic analysis...
adjectives.lexc:lágaš+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+CmpNP/Suff+Sem/Hum: LAGAS; ! XXX Look at syntactic analysis...
nouns.lexc:lohka+CmpNP/Suff+Sem/Dummytag:loh'ka GOAHTI-A ;
None:
abbreviations.lexc:r+CmpNP/None:r ab-dot-IVprfprc; !Lene: None is unknown
acronyms.lexc:as+CmpNP/None:as ACRO ; !A/S
acronyms.lexc:oy+CmpNP/None:oy ACRO ; !A/S
adjectives.lexc:ollu+CmpNP/None+Sem/Dummytag:ol'lu BOREALA ;
adjectives.lexc:olu+CmpNP/None+Sem/Dummytag:olu BOREALA ;
adjectives.lexc:veara+CmpNP/None+A+Ess+Sem/Dummytag:veara%>n K ;
adjectives.lexc:veara+CmpNP/None+A+Sg+Gen+Sem/Dummytag:veara K ;
adjectives.lexc:veara+CmpNP/None+A+Sg+Nom+Sem/Dummytag:veara K ;
nouns.lexc:Abimelek% guovttis+CmpNP/None+Sem/Hum:Abimelek% guo GUOVTTIS ;
nouns.lexc:Abimelek% guovttos+CmpNP/None+Sem/Hum:Abimelek% gu GUOVTTU ;
nouns.lexc:gaigŋirbealis+CmpNP/None+Err/Orth+Sem/Dummytag:gaigŋer#bealis LASIS ;
nouns.lexc:gaigŋirbealis+CmpNP/None+Sem/Dummytag:gaigŋir#bealis LASIS ;
Only:
nouns.lexc:bealle+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+CmpNP/Only+Sem/Part:beall BEALLE ;
grep CmpN/.*Left ../sme/src/morphology/stems/nouns.lexc |grep 'CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft'|wc -l
1291
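
The count above comes from a grep pipeline; a minimal Python sketch of the same check follows. Unlike the grep pattern, which requires the three tags to be adjacent and in that order, this version accepts them in any order, which is an assumption about what the count is meant to capture. The sample lines are copied from these notes, and `count_all_three` is a hypothetical helper name.

```python
LEFT_TAGS = ("+CmpN/SgNomLeft", "+CmpN/SgGenLeft", "+CmpN/PlGenLeft")


def count_all_three(lines):
    """Count lexc lines that carry all three ...Left tags."""
    return sum(all(tag in line for tag in LEFT_TAGS) for line in lines)


# In practice the lines would be read from nouns.lexc; a small sample is used here.
sample = [
    "bealle+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+CmpNP/Only+Sem/Part:beall BEALLE ;",
    "gaska+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Feat-measr_Plc:gas'ka GOAHTI-A ;",
    "lohka+CmpNP/Suff+Sem/Dummytag:loh'ka GOAHTI-A ;",
]

print(count_all_three(sample))
```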
These require fewer than three forms of the first part:
Not SgGen:
gaska+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Feat-measr_Plc:gas'ka GOAHTI-A ;
earru+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Feat:earru ALBMI ;
earuhus+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Dummytag:earuhuss JOHTOLAT ;
erohus+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Feat:erohuss JOHTOLAT ;
guorra+CmpNP/Last+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Dummytag:guorra GOAHTI-A ;
divuhus+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Dummytag:divuhuss JOHTOLAT ;
goazahus+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Dummytag:goazahuss JOHTOLAT ;
Not SgNom:
kulturduohki+v1+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+OLang/NOB+Sem/Feat:kultur#duoh'ki DUOHKI ;
kulturduohki+v2+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Feat:kulttor#duoh'ki DUOHKI ;
oahpahusduohki+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Feat:oahpahus#duoh'ki DUOHKI ;
bálda+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Plc:bál'da GOAHTI-A ;
geavat+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:geavad GAHPIR ;
hálddahangeavat+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:hálddahan#geavad GAHPIR ;
hálddašangeavat+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:hálddašan#geavad GAHPIR ;
sisheagga+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:sis#heagga GOAHTI-A ;
váiveheagga+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:váive#heagga GOAHTI-A ;
Not PlGen:
oahpponeavvoovddideaddji+CmpN/SgN+CmpN/SgG+CmpN/PlG+CmpN/SgNomLeft+CmpN/SgGenLeft+Sem/Dummytag:oahppo#neavvo#ovddideaddji¤ ACTORLONGSHORT ;
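
The three groups above (not SgGen, not SgNom, not PlGen) can be found mechanically by checking which of the ...Left tags an entry lacks. The following is a sketch with a hypothetical helper `missing_left_forms`; the entries are copied from the lists above.

```python
LEFT_FORMS = ("SgNom", "SgGen", "PlGen")


def missing_left_forms(entry):
    """Return the first-part forms for which the entry has no +CmpN/...Left tag."""
    return [form for form in LEFT_FORMS if f"+CmpN/{form}Left" not in entry]


entries = [
    "gaska+CmpN/SgN+CmpN/SgNomLeft+CmpN/PlGenLeft+Sem/Feat-measr_Plc:gas'ka GOAHTI-A ;",
    "bálda+CmpN/SgN+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Plc:bál'da GOAHTI-A ;",
    "oahpponeavvoovddideaddji+CmpN/SgN+CmpN/SgG+CmpN/PlG+CmpN/SgNomLeft+CmpN/SgGenLeft"
    "+Sem/Dummytag:oahppo#neavvo#ovddideaddji¤ ACTORLONGSHORT ;",
]

for entry in entries:
    lemma = entry.split("+", 1)[0]
    print(lemma, missing_left_forms(entry))
```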
Different form = different meaning:
geahči --> uksageahči vs uvssageahči
vuolli ---> seaŋgavuolli vs seaŋggavuolli
ahki ----> skuvlaahki vs skuvllaahki
*uksa, skuvla: usually only SgNom
[/lang/common/CompoundTags.html]
Adjectives denoting:
Nouns denoting:
Living creatures, people, animals etc., e.g. lottejuolgi (first part GenSg)
Growths
Organisations (like Gielda, Guovlu, Riika, Goahti, Dállu etc)
Topography (like Johka, Mearra, but not Várri, Jávri)
People-groups (like Sápmi, Duiska etc)
Weather and state of the ground etc (like Dálki, Siivu, Čáhci, Dulvi, Muohta etc)
Time (Áigi, Idja, Beaivi etc)
and nouns on -vuohta and -upmi (like ráhkisvuohta and buktojupmi)
plural nouns
Case of the first part depends on the second part:
A very few specific words where the meaning of the compound changes with the case of the first part (for example Ahki, Dilli, Heahti, Duohki, Vuolli, Geahči).
In North Sami: Deverbal nouns like actios and actors stemming from transitive verbs (for example Sálbmalávlun vs. Sálmmalávlun, Sarvabivdi vs. Sarvvabivdi)
[/lang/sme/root-morphology.html]
+CmpN/SgNomLeft Singular Nominative
+CmpN/SgGenLeft Singular Genitive
+CmpN/PlGenLeft Plural Genitive
SMJ example:
åvddånbuktem+CmpN/SgN+CmpN/DefSgGen+CmpN/DefPlGen+CmpN/SgNomLeft+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:åvddån#buktem VSBST-ODD
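
To inspect the tags on an entry like the one above, the line can be split into its parts. This is a minimal sketch, not a full lexc parser: `parse_lexc_entry` is a hypothetical helper, and it assumes a simple `upper:lower CONTLEX ;` entry without escaped `%:` or `%!` characters.

```python
def parse_lexc_entry(line):
    """Split a simple lexc entry into lemma, tags, stem and continuation lexicon."""
    # Drop a trailing "! comment" and the terminating ";" if present.
    body = line.split("!", 1)[0].strip().rstrip(";").strip()
    upper, lower = body.split(":", 1)
    lemma, *tags = upper.split("+")
    # The continuation lexicon is the last whitespace-separated item.
    stem, _, contlex = lower.rpartition(" ")
    return {"lemma": lemma, "tags": tags, "stem": stem.strip(), "contlex": contlex}


entry = parse_lexc_entry(
    "åvddånbuktem+CmpN/SgN+CmpN/DefSgGen+CmpN/DefPlGen+CmpN/SgNomLeft"
    "+CmpN/SgGenLeft+CmpN/PlGenLeft+Sem/Dummytag:åvddån#buktem VSBST-ODD"
)
left_tags = [t for t in entry["tags"] if t.endswith("Left")]
print(entry["lemma"], entry["contlex"], left_tags)
```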
How many first parts are there in the corpus?
2463 SgNom
386 SgGen
844 PlGen