The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages
Test ideas
We want to test our transducers better.
Now and then Multichar Symbols slip through twolc and give “words” like Suome^Vn instead of the correct Suomeen.
How to test for this:
multichar.fst
of them
LANG.fst .o. multichar.fst
in xfst
It should be possible to set up this test language-independently. In case we get
Example, from fkv (this must be adjusted to a script):
regex [ ?* [a|e|i|o|u|ä|ö] [a|e|i|o|u|ä|ö] e ] ;
xfst -e "regex @\"src/analyser-gt-desc.xfst\" .o. @\"VVe.fst\" ; "
xfst[1]: print lower-words > lower-words.txt
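A possible starting point for the script mentioned above is to scan the printed word list for leftover symbols. This is a minimal Python sketch, not part of the existing setup: the function name find_leaked_symbols and the symbol list are hypothetical, and in practice the symbols would presumably be taken from the Multichar_Symbols declaration of the language in question. Only lower-words.txt comes from the example above.

```python
import sys
from typing import Iterable, List, Tuple


def find_leaked_symbols(
    words: Iterable[str], symbols: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (word, symbol) pairs for words that still contain a symbol.

    Each word is reported once, with the first matching symbol.
    """
    symbol_list = list(symbols)
    hits = []
    for raw in words:
        word = raw.strip()
        if not word:
            continue  # skip empty lines in the word list
        for symbol in symbol_list:
            if symbol in word:
                hits.append((word, symbol))
                break  # one report per word is enough
    return hits


if __name__ == "__main__":
    # Assumption: a hand-written symbol list; "^V" is the only symbol
    # attested in the notes above.
    multichar_symbols = ["^V"]
    path = sys.argv[1] if len(sys.argv) > 1 else "lower-words.txt"
    with open(path, encoding="utf-8") as handle:
        leaked = find_leaked_symbols(handle, multichar_symbols)
    for word, symbol in leaked:
        print(f"{word}\t{symbol}")
    # A non-zero exit status lets a make target or test runner fail.
    sys.exit(1 if leaked else 0)
```

Because the check takes only a word list and a symbol list, it does not depend on any particular language, which fits the aim of a language-independent test.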