The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages.
Below are a couple of example tasks, with the steps needed to carry them out.
If you want to change the build procedure (e.g. to add a feature to, or remove one from, a specific fst for all languages), work through this task.
Here is the procedure, with dictionary-include.am in am-shared as an example.
The local directory am-shared is an exact copy of:
    $GTCORE/langs-templates/und/am-shared/
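1. Edit dictionary-include.am locally (for your test language) and make sure everything works.
2. Copy the modified dictionary-include.am to the und/am-shared/ directory.
3. Add a note about the change to und/und.timestamp.
4. Commit dictionary-include.am and und.timestamp.
5. Propagate the change from the und/ template to the language directories:
    cd $GTHOME/langs
    ./update-all-from-core.sh -t und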
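Because the local am-shared is meant to be an exact copy of the template directory, a recursive diff is a quick way to see what your local test changed before you copy anything back. The following is a minimal sketch only: the function name am_shared_diff is made up for illustration and is not part of the infrastructure.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Giella infrastructure):
# list the files that differ between a local am-shared directory
# and the template copy it is supposed to mirror.
am_shared_diff() {
    local_dir=$1
    template_dir=$2
    # -r compares the directories recursively,
    # -q reports only which files differ, not the differences themselves.
    # Exit status: 0 = identical, 1 = differences found, >1 = error.
    diff -rq "$local_dir" "$template_dir"
}
```

Typical use from a language directory would be something like am_shared_diff am-shared "$GTCORE/langs-templates/und/am-shared". Before your edit it should print nothing, and after your edit it should list only the file you changed.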
When a new fst type is called for, the procedure is roughly the same as above, with a few additions. As an example, we will create a new fst for dictionary analysers (i.e. to be used for analysing input to dictionary lookup).
But first we need to answer the following question: where do we add the code for
building the new fst? The idea so far has been to add default targets to the
file am-shared/topdir-include.am, and optional targets (turned on via configure
options) to separate include files such as dictionary-include.am. Generally this
practice will continue, but if the list of default targets grows too long, even
those might be split out into separate include files.
For our example, we will edit dictionary-include.am. As always, edit in a
local language dir first, to test that the new target works. When all is done
and works fine, copy the modifications to the und template. Here are the
steps to go through:
1. Edit am-shared/dictionary-include.am - the following steps will tell the
   system how to build the fst:
    - Add a target for analyser-dict-gt-desc.tmp.hfst - it is important that
      the target is named *.tmp.* to allow local overrides.
    - Language-specific changes go in a *.tmp.hfst -> *.hfst target in the local
      Makefile.am (if no such changes are needed, *.tmp.hfst will just be
      copied to *.hfst).
    - Put the filters the fst needs in the filters/ dir, and add dependencies
      to them all (such that the build will break properly if a filter is not
      available, and all required filters are rebuilt if needed).
2. Edit src/Makefile.am - the following steps will tell the system when
   to build the fst target:
    - The fst must be added to the variable GT_ANALYSERS_HFST (for hfst
      transducers).
    - Use the conditional if WANT_DICTIONARIES, and within that if block write
      the following:
          GT_ANALYSERS_HFST+=analyser-dict-gt-desc.hfst
      (the += part will add the new fst to the list of fsts already assigned
      to the variable)
3. Adding a new configure option requires M4 work and will be covered in a
   separate tutorial.
4. Run ./configure with the proper option.
5. Copy the changes to the und template, add a note in und.timestamp, and commit.
6. Run $GTHOME/langs/update-all-from-core.sh -t und (in the langs/ dir).

Task: add plx and Hunspell conversion to the new infra, but only for a limited set of languages (sma, smj, later sme).
Steps:
We want a new template named plxtools.
Then, we need to fill that template with the following content:
    plxtools.timestamp
    am-shared/plx-include.am     # this is the real build file
    tools/spellcheckers/plx/
        Makefile.am              # includes plx-include.am
        src/                     # shared src files - rsrc, rev, version
        tmp/                     # large plx files, make clean safe
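As a starting point, the empty skeleton of such a template could be created with a few shell commands. This is only a sketch: the function name make_plxtools_skeleton is hypothetical, the template root is passed in as an argument rather than assumed, and the nesting of Makefile.am, src/ and tmp/ below tools/spellcheckers/plx/ is an assumption about the listing above.

```shell
#!/bin/sh
# Hypothetical helper: create an empty skeleton for the plxtools template
# below the directory given as the first argument.
# Assumption: Makefile.am, src/ and tmp/ belong under tools/spellcheckers/plx/.
make_plxtools_skeleton() {
    root=$1
    # Create the directory structure first.
    mkdir -p "$root/am-shared" \
             "$root/tools/spellcheckers/plx/src" \
             "$root/tools/spellcheckers/plx/tmp"
    # Then create empty placeholder files to be filled in later.
    touch "$root/plxtools.timestamp" \
          "$root/am-shared/plx-include.am" \
          "$root/tools/spellcheckers/plx/Makefile.am"
}
```

The files are created empty. The real build rules still have to be written into plx-include.am, and Makefile.am has to include it.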
The Hunspell conversion is common to all languages, and is thus part of the und/ template. These parts need to be added to that template:
    am-shared/regex-include.am       # this is the real build file
    tools/spellcheckers/filters/     # move common filters in here
        Makefile.am                  # includes the usual regex-include.am
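Adding the template to a language is pretty simple: create the timestamp file in the language directory and add it to version control: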
cd $GTLANG
touch plxtools.timestamp
svn add plxtools.timestamp
As soon as the file is created, the merge script will take notice and start merging files from that template into the language that has the timestamp file for that template.
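The timestamp convention also makes it easy to see which templates a language directory has opted into. The sketch below only illustrates the idea; it is not the actual merge script, and the function name list_templates is made up for this example.

```shell
#!/bin/sh
# Hypothetical illustration of the timestamp convention: print the name of
# every template for which <name>.timestamp exists in the given language dir.
# This is NOT the actual merge script, only a sketch of the idea.
list_templates() {
    lang_dir=$1
    for stamp in "$lang_dir"/*.timestamp; do
        # If no file matches, the pattern is left unexpanded; skip it.
        [ -e "$stamp" ] || continue
        # Strip the directory and the .timestamp suffix.
        basename "$stamp" .timestamp
    done
}
```

Running list_templates "$GTLANG" after the touch command above should include plxtools in its output.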
Neither the Hunspell nor the PLX spellers should be built by default. To turn them on, one should use something like --enable-plx and --enable-hunspell. See how the equivalent option for Oahpa is set up for a way of doing this.
Merge the template for a specific language as follows:
cd $GTLANG
../../giella-core/scripts/merge-templates.sh -t plxtools
That is, specify the template you want to merge using the -t option, both to avoid time-consuming operations and to avoid merging several unrelated things at the same time.
Testing essentially means running:
make
make check
and then looking at the output.
After the known bugs have been fixed, re-merge, test and evaluate. Iterate till everything works as planned.
There are definitely a number of other issues. The goal is to have a portable build system with as few dependencies as possible, and with all dependencies checked for and reported properly to the user if missing.
These goals require that we follow the Autotools conventions, and use supported variables and macros where we earlier often used more homegrown solutions.
See the following sites for useful documentation and help:
Install make2graph, then run:
make -Bnd sma-mobile.zhfst | make2graph | dot -Tpng -o sma-mobile-dag.png
Result: a visual representation of the dependency graph, making it easy to spot wrong dependency chains, and where the problem most likely is.
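In keeping with the goal of reporting missing dependencies properly, a small pre-flight check can confirm that every tool in the pipeline is available before it is run. This is a minimal sketch; the function name missing_tools is hypothetical and not part of the infrastructure.

```shell
#!/bin/sh
# Hypothetical pre-flight check: print the name of each given tool that is
# not found on PATH, and return non-zero if at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}
```

For the pipeline above, this could be called as missing_tools make make2graph dot, installing whatever it prints before generating the graph.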