The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages
Here are some tips for the linguist.
Add this to your .bashrc or .profile:
alias crk="pushd ~/main/langs/crk"
Open a new terminal window: In any catalogue, write ‘crk’ ENTER.
Examples for parameters: ```./configure --with-hfst --enable-dicts```
How to get a list of parameters: ```./configure -h```
How to see the parameters which are set: ```head config.log```
# TESTSSCRIPTS
## Test that all tags are declared and written correctly
```sh devtools/tag_test.sh```
## Test that lemmas can be generated (also done with make check)
```sh test/lemma-check.sh```
You can add or remove tests in test/src/morphology/Makefile.am:
GENERATION_TESTS_IN=generate-adjective-lemmas.sh.in
generate-noun-lemmas.sh.in
generate-propernoun-lemmas.sh.in
generate-verb-lemmas.sh.in
## Run yaml tests (also done with make check)
```sh test/yaml-check.sh```
## Look at paradigm generation
sh devtools/verb_minip.sh ‘^lemma[:+]’ sh devtools/noun_minip.sh ‘^lemma[:+]’ sh devtools/prop_minip.sh ‘^lemma[:+]’ sh devtools/adj_minip.sh ‘^lemma[:+]’
You can look at the generation of all members of one cont.lex:
```sh devtools/verb_minip.sh LAAVU ```
You can edit the list of forms in the paradigm files which are mentioned in the scripts, e.g.
```P_FILE="test/data/testnounparadigm.txt"```
## Make tables with paradigms
sh devtools/generate-adj-wordforms.sh sh devtools/generate-noun-wordforms.sh sh devtools/generate-prop-wordforms.sh sh devtools/generate-verb-wordforms.sh
You can edit the list of forms:
morf_codes=”+N+Sg+Nom
+N+Sg+Gen”
You can edit which cont.lexes to test:
```exception_lexicons="(3JESANEH|3PAPAREH|3VANHIMEH)"```
You can edit how many lemmas of each cont.lex to test:
```lemmacount=2```
## Run only one yaml-test
Remove all yamltests (check in your local modifications first!):
rm test/src/gt-norm-yamls/*
Get the yaml-file you want to test, e.g.:
svn up test/src/gt-norm-yamls/V-mato_gt-norm.yaml sh test/yaml-check.sh
## Make/update all yaml-tests in one for a certain PoS (and a certain pattern?)
This example is adding all verbs into one file:
head -11 test/src/gt-norm-yamls/V-AI-matow_gt-norm.yaml > test/src/gt-norm-yamls/U-all_gt-norm.yaml tail +11 test/src/gt-norm-yamls/V* | grep -v “==” » test/src/gt-norm-yamls/U-all_gt-norm.yaml```
This example is adding all nouns with final -y into one file:
head -11 test/src/gt-norm-yamls/N-AN-amisk_gt-norm.yaml > test/src/gt-norm-yamls/A-Ny-all_gt-norm.yaml
tail +11 test/src/gt-norm-yamls/N*y_gt-norm.yaml | grep -v "==" >> test/src/gt-norm-yamls/A-Ny-all_gt-norm.yaml
The example is for the inanimate noun ôtênaw. Use an already functioning yaml-file as a starting point (here N-AN-amiskw_gt-norm.yaml). You still have to do a little editing afterwords, like correcting the docu about the lemma, and making it more readable by adding empty lines. And you must of course correct the output.
head -12 test/src/gt-norm-yamls/N-AN-amisk_gt-norm.yaml\
> test/src/gt-norm-yamls/N-IN-otenaw_gt-norm.yaml
cat test/data/NI-par.txt | sed 's/^/ôtênaw/' | dcrk |\
tr '\t' ':' | sed 's/:/: /' | grep -v '^$' |\
sed 's/^/ /' >> test/src/gt-norm-yamls/N-IN-otenaw_gt-norm.yaml
Comment: The last sed-command should give 5 whitespaces