Korp is a Corpus tool and Karp a Lexicon tool from the Swedish Språkbanken. We want to install them locally.
Links
The Korp code:
Links to the Karp code are forthcoming.
Work plan
- Download Korp code
- Install at gtweb
- Install corpora
- Make interface
Corpora available
- Free
    - skuvlahistorja1-6
- fad
 
- Bound
    - news
- ficti
- NT
 
Corpus mixes
- smesme: news + ficti
- nob2sme: fad + skuvlahistorja1-6
- smedep: news + ficti + facta/skuvlahistorja1-6 + bibel/newtestament
Interface
Menu:
- search for sme wordforms (kwic-snt in corpus ccat) – corpus: smesme
- search for sme lemmas (kwic-snt? in analysed corpus syn) – corpus choices: smesme, nob2sme
- search for sme and nob in translations (lemma search in sentence aligned sentences) – corpus: nob2sme
- deepdict sme (lemma search -> dependency daughters in corpus dep) – corpus: smedep
Lemgram
Definitions
- lexeme = member of an open lexical category, having meaning and form but being neither
- lemma = wordform used as representative for lexeme
- grammatical word pair of lemma+grammatical properties and wordform
- paradigm = set of grammatical words realising a lemma
- lemgram = set of wordforms in paradigm
Generation
Generation of lemgrams from lexc:
Use dict-isme-norm.fst or generator-dict-gt-norm.xfst or generator-dict-gt-norm.hfst. We remove the tags v1, v2.. from the fst. It is better for the user that all variants of the same paradigm are in the same lemgram. Many fst-lemmas have more than one entry in lexc, so the list should be uniqed before generating forms. I suggest that we start with these files:
For nouns, we pick different 3 lists: The ordinary nouns, the actors (NomAg), and the G3-marked nouns. For the other parts of speech, one command is enough. Commands to filter (ir)relevant forms:
noun-sme-lex.txt:
*Ordinary words:
egrep -v "(G3|ACTOR|CmpN/Only|ShCmp|RCmpnd|\+V\+|^\!)"
- ACTOR:
    grep N+NomAg
- G3:
    grep N+G3verb-sme-lex.txt:egrep -v "(ENDLEX|\+V|^\!)"adj-sme-lex.txt:egrep -v "(LEXICON|Der| Rreal | R |^\!)"adv-sme-lex.txt:egrep -v "(LEXICON| K |^\!)"