The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages
What this is NOT: this is NOT an overview of the YAML testing framework. You can find a description of YAML testing on a separate page.
Check that all prerequisites are met, and bail out if not (exit 77/SKIP)
Portability means it should:
… awk and gawk)
Exit values must be 0 - 255, where some have a special meaning:
$GTHOME and similar variables
$srcdir - the directory in which the original test script is located
variables defined in configure.ac - but ONLY if you process the testing
script with configure.ac (details about this later)
Most of the tools we need (and in principle all of them) are, or should be,
declared in configure.ac or related files (found in $GTLANG/m4/).
Autoconf (the tool that parses configure.ac) has its own machinery to find
variants of different tools, to define variables for the identified tools, etc.
By using the variables defined in configure.ac you can be sure that the
tools actually are available, and you can easily add tests for those variables
in case the tool is not found.
By following this practice, the system becomes more robust and portable.
Details of how to actually do this are given further down.
… depending on what has been configured.
The new infrastructure treats the Xerox and the Hfst tools on an equal footing, meaning that some users have the Xerox tools installed, some have Hfst, and some have both.
The test scripts should check what has actually been built and what is available on the system, and use whichever of the tools (one or both) is available and configured.
We’ll return to the details further down.
Example:
How do we do this?
TESTS=
if WANT_GENERATION
# Add your shell scripts for running tests requiring only a generator:
TESTS+=test-noun-generation.sh \
       test-verb-generation.sh \
       test-adj-generation.sh \
       test-propernoun-generation.sh
endif # WANT_GENERATION
There is a list of all presently defined conditionals here.
Makefile.am
We will use the test script
$GTLANGS/sma/test/src/morphology/test-noun-generation.sh.in
as an example throughout this section.
You can test anything that is scriptable or programmable. The only requirement is that the answer can be captured as a YES or NO, i.e. PASS or FAIL.
Here are some ideas:
Anything that can return an exit value. Common choices are:
… but C/C++, among others, are also used.
Typically you start a shell script by defining variables:
###### Variables: #######
sourcefile=${srcdir}/../../../src/morphology/stems/nouns.lexc
generatorfile=./../../../src/generator-gt-norm
resultfile=missingNounLemmas
The variable ${srcdir} refers to the source dir of the test script, that is,
the directory in which the test script is located.
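
When the script is started through make check, Automake's test harness provides srcdir in the environment. When the script is run by hand, the variable may be unset. The following is a minimal sketch of a fallback, assuming the current directory is a sensible default; the fallback line is an addition for illustration and is not taken from the original test script.

```shell
# Assumption: fall back to the current directory when srcdir is not set,
# e.g. when the script is run by hand rather than through `make check`.
: "${srcdir:=.}"

# Same relative path as in the variable section above.
sourcefile="${srcdir}/../../../src/morphology/stems/nouns.lexc"

echo "Looking for source file at: $sourcefile"
```

With this in place, the paths built from ${srcdir} resolve relative to the directory the script is started from when no value has been provided.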
Here is another variable assignment:
# Get external Mac editor for viewing failed results from configure:
EXTEDITOR=@SEE@
If your testing script relies on a lot of external tools, it is a good idea to
make sure that the tools are actually installed on the system. This is the job
of the configure.ac file. To make use of this feature, there are a couple of
things to remember:
The test script must have the suffix .sh.in
The test script must be processed by configure.ac; this is done by
adding two lines as follows to that file:
AC_CONFIG_FILES([test/src/morphology/test-noun-generation.sh], \
                [chmod a+x test/src/morphology/test-noun-generation.sh])
The first line tells autoconf to process the *.sh.in file, and produce the
actual *.sh file; the second line ensures that the final shell file is
executable.
In this processing all configure.ac variables will be replaced with their actual
value as identified during the configuration phase. Such variables look like
@VARIABLE@ in the test script.
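Suppose we want to use the lookup tool as part of the test:
configure.ac is set up to check for the availability of lookup
The result is stored in the variable LOOKUP in configure
The test script is written as a *.sh.in file, and when configured,
the variable is replaced with the actual value
The variable is referenced as follows in the *.sh.in file: @LOOKUP@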
That is, in a hypothetical test file test-lemmas.sh.in we could write
something like:
LOOKUP=@LOOKUP@
After configuration, the corresponding test file test-lemmas.sh will look
something like:
LOOKUP=/usr/local/bin/lookup
Then we can add tests in the testing script to check whether $LOOKUP is
empty, and if it is, the test script can bail out with a SKIP return value
(77).
NB! Sometimes the variable is not empty when the tool is not found, but could
contain strings like false or no instead. Check the actual value if the
test for the tool does not behave as expected.
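
The check described above can be sketched as follows. This is an illustration under stated assumptions, not code from the actual test scripts: the helper name tool_configured is hypothetical, and a literal path is assigned to LOOKUP so that the sketch runs on its own (in a *.sh.in file the line would read LOOKUP=@LOOKUP@).

```shell
# Hypothetical helper: returns 0 if the configured value names a tool,
# 1 if it is empty or one of the placeholder strings "false" / "no".
tool_configured() {
    case "$1" in
        ""|false|no) return 1 ;;
        *)           return 0 ;;
    esac
}

# In a *.sh.in file this would be: LOOKUP=@LOOKUP@
# A literal value is used here so the sketch is self-contained.
LOOKUP=/usr/local/bin/lookup

if ! tool_configured "$LOOKUP"; then
    echo "lookup was not found at configure time. Skipping."
    exit 77   # SKIP
fi

echo "Using lookup tool: $LOOKUP"
```

Note that the helper only inspects the configured string. It does not verify that the file exists on disk; that is left to configure.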
We need to test that the data sources used in the test are actually found:
# Check that the source file exists:
if [ ! -f "$sourcefile" ]; then
    echo "Source file not found: $sourcefile"
    exit 1
fi
Here we use the sourcefile variable we defined earlier; if the file it points to does not exist, we exit with an error.
When doing morphological tests, we want to test both xfst and hfst. First we define a variable fsttype:
# Use autotools mechanisms to only run the configured fst types in the tests:
fsttype=
@CAN_HFST_TRUE@fsttype="$fsttype hfst"
@CAN_XFST_TRUE@fsttype="$fsttype xfst"
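The strings @CAN_HFST_TRUE@ and @CAN_XFST_TRUE@ come from autoconf, where they
are generated by Automake conditionals. When the corresponding conditional is
true, the string is replaced with nothing and the assignment is executed; when
it is false, the string is replaced with #, which comments the line out.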
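
To see what such a placeholder does, the substitution can be imitated by hand. In the sketch below sed is only a stand-in for the real substitution, which is performed by configure; an Automake conditional placeholder becomes the empty string when the conditional is true and # when it is false.

```shell
# One line as it might appear in a *.sh.in file
# (single quotes keep it literal).
template='@CAN_HFST_TRUE@fsttype="$fsttype hfst"'

# Conditional true: the placeholder becomes the empty string.
enabled=$(printf '%s\n' "$template" | sed 's/@CAN_HFST_TRUE@//')

# Conditional false: the placeholder becomes '#', so the shell
# treats the whole line as a comment.
disabled=$(printf '%s\n' "$template" | sed 's/@CAN_HFST_TRUE@/#/')

echo "enabled:  $enabled"
echo "disabled: $disabled"
```

The result is that fsttype only collects the transducer types that were actually enabled at configure time.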
Then we check that the variable is not empty:
# Exit if both hfst and xerox have been shut off:
if test -z "$fsttype"; then
echo "All transducer types have been shut off at configure time."
echo "Nothing to test. Skipping."
exit 77
fi
Finally, the actual loop looks like:
for f in $fsttype; do
...
done
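
The body of the loop typically has to pick a lookup command that matches the transducer type. The following is a sketch only: the function select_lookup and the command strings are assumptions made for illustration, and a real script would take the tool paths from configure variables rather than hard-coding them.

```shell
# Assumption: both types were enabled at configure time.
fsttype="hfst xfst"

# Hypothetical helper mapping a transducer type to a lookup command.
# The command strings are placeholders, not values from configure.ac.
select_lookup() {
    case "$1" in
        hfst) echo "hfst-optimized-lookup" ;;
        xfst) echo "lookup -flags mbTT" ;;
        *)    return 1 ;;
    esac
}

for f in $fsttype; do
    lookuptool=$(select_lookup "$f") || {
        echo "Unknown fst type: $f"
        exit 1
    }
    echo "$f: would run '$lookuptool' on the generator file with suffix .$f"
done
```

Keeping the mapping in one place means the rest of the loop body can use $lookuptool and the .$f suffix without caring which transducer type is being tested.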
###### Extraction: #######
# extract non-compounding lemmas:
grep ";" "$sourcefile" | grep -v "^\!" \
    | grep -E -v '(CmpN/Only|\+Gen\+|\+Der\+| R )' | sed 's/% /€/g' \
    | sed 's/%:/¢/g' | tr ":+" " " \
    | cut -d " " -f1 | tr -d "#" | tr "€" " " | tr "¢" ":" \
    | sort -u | grep -v '^$' > nouns.txt
# extract compounding lemmas:
grep ";" "$sourcefile" | grep -v "^\!" \
    | grep ' R ' | tr ":+" " " | cut -d " " -f1 | tr -d "#" \
    | sort -u > Rnouns.txt
This is an excerpt from the sma test file mentioned earlier, and should only
serve as an example:
###### Test non-compounds: #######
# generate nouns in Singular, extract the resulting generated lemma,
# store it:
sed 's/$/+N+Sg+Nom/' nouns.txt | $lookuptool $generatorfile.$f \
| cut -f2 | fgrep -v "+N+Sg" | grep -v "^$" | sort -u \
> analnouns.$f.txt
# Generate nouns, extract those that do not generate in singular,
# generate the rest in plural:
sed 's/$/+N+Sg+Nom/' nouns.txt | $lookuptool $generatorfile.$f \
| cut -f2 | grep "N+" | cut -d "+" -f1 | sed 's/$/+N+Pl+Nom/' \
| $lookuptool $generatorfile.$f | cut -f2 \
| grep -v "^$" >> analnouns.$f.txt
The full test script file can be found here.
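
The excerpt stops before the generated lemmas are turned into a PASS or FAIL. One way to do that is sketched below, assuming the result is simply the set of source lemmas missing from the generated list. The sample lemmas, the temporary directory and the use of comm are illustrative assumptions and are not taken from the sma script; only the base name missingNounLemmas comes from the variable section earlier.

```shell
workdir=$(mktemp -d)
f=hfst   # hypothetical transducer type for this run

# Placeholder data standing in for nouns.txt and the generated lemma list.
printf 'alpha\nbeta\ngamma\n' | sort -u > "$workdir/nouns.txt"
printf 'alpha\ngamma\n'       | sort -u > "$workdir/analnouns.$f.txt"

# Lemmas present in the source list but absent from the generated list.
# comm requires both inputs to be sorted, which sort -u ensures.
comm -23 "$workdir/nouns.txt" "$workdir/analnouns.$f.txt" \
    > "$workdir/missingNounLemmas.$f.txt"

if [ -s "$workdir/missingNounLemmas.$f.txt" ]; then
    echo "FAIL: lemmas not generated ($f):"
    cat "$workdir/missingNounLemmas.$f.txt"
    status=1
else
    echo "PASS: all lemmas generated ($f)"
    status=0
fi
# A real test script would end with: exit $status
```

Writing the missing lemmas to a file rather than only printing them makes it possible to inspect the failures after make check has finished.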
# List here (space separated) all test scripts that should be run
# unconditionally:
TESTS=
if WANT_GENERATION
# Add your shell scripts for running tests requiring only a generator:
TESTS+=test-noun-generation.sh \
       test-verb-generation.sh \
       test-adj-generation.sh \
       test-propernoun-generation.sh
endif # WANT_GENERATION
# List tests that are presently (expected) failures here, i.e. things that should
# be fixed *later*, but are not critical at the moment:
XFAIL_TESTS=generate-noun-lemmas.sh \
            test-propernoun-generation.sh
If we have written an *.in file - as in this example - we need to process it
with configure to replace @VARIABLE@ style variables with their configure
values. To do that, you need to add two lines like the following to
configure.ac:
AC_CONFIG_FILES([test/src/morphology/test-noun-generation.sh], \
[chmod a+x test/src/morphology/test-noun-generation.sh])
With these two lines, configure will be able to produce the shell script
that we added to Makefile.am above.
make check - runs all defined tests
make check TESTS=a-test-script.sh - will run only the test script
a-test-script.sh
To run a subset of tests, cd into the subdir containing the subset of tests
you want to run, and do make check there. Only the tests in that directory
and its subdirectories will be run.
(NB! Advanced topic - skip if not relevant)
When using out-of-source builds (aka VPATH builds), running single tests like
above will not work, due to the way Automake treats the TESTS variable when
there are subdirs with their own tests. To make it work, you need to restrict
make to only run in the local directory where you have the test script you
want to run:
cd to/dir/with/test/script/in/build/tree/
make check TESTS=a-test-script.sh SUBDIRS=.
Setting the SUBDIRS variable to just a period (meaning “this directory”)
forces make to ignore the subdirs, and the single test works as intended.
NOTE: this is only relevant if you have out-of-source builds, and want
to run a single test script. If you want to run all test scripts in your
working directory and below (i.e. make check), there is no need to do
anything extra - everything works as expected.
The tests are run on a per directory basis, which means that all tests in a
directory will be run, and then make will give a report.
If some of the tests FAILed, then that is an error in the view of make, and
make stops. This is a property of make and the Automake system. You
can override this behavior with the option -i, --ignore-errors. The problem
with using -i is of course that you risk ignoring errors, since the error
message can easily scroll out of view before make is done.
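
One way to reduce the risk of missing errors when using -i is to save the output to a file and filter it afterwards. The sketch below assumes the usual Automake result lines of the form FAIL: name. The function run_checks is a hypothetical stand-in that prints sample lines so that the example runs without a build tree; in real use it would be replaced by make -i check 2>&1.

```shell
# Hypothetical stand-in for the real command: make -i check 2>&1
# The lines printed here are sample data, not real test results.
run_checks() {
    printf '%s\n' 'PASS: test-a.sh' 'FAIL: test-b.sh' 'SKIP: test-c.sh'
}

logfile=$(mktemp)
run_checks > "$logfile"

# List only the results that need attention.
problems=$(grep -E '^(FAIL|XPASS|ERROR):' "$logfile")
echo "$problems"
```

The full log stays available in $logfile for closer inspection after make has finished.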
Testing within the Automake framework can have five outcomes: