The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages
Opened at 10.12. Additions to the agenda:
We should aim at generating static HTML. Forrest has two modes: “forrest run” serves the pages dynamically, and “forrest site” generates static HTML pages.
X [0] doc/lang/sms/src/verb-sms-lex.txt
BROKEN: No pipeline matched request: doc/lang/sms/src/verb-sms-lex.txt
X [0] doc/ling/vislcg.html
BROKEN: /Users/trondtro/xtdoc/sd/src/documentation/content/xdocs/doc/ling/vislcg.xml
(No such file or directory)
X [0] index_smi.html
BROKEN: /Users/trondtro/xtdoc/sd/src/documentation/content/xdocs/index_smi.xml
(No such file or directory)
X [0] doc/lang/smj/docu-sme-flowchart.html
BROKEN: /Users/trondtro/xtdoc/sd/src/documentation/content/xdocs/doc/lang/smj/docu-sme-flowchart.xml
(No such file or directory)
Link problems:
All internal links should be corrected, and links to original source files should be replaced with links to excerpts.
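To get an overview before correcting the links, the broken-link report could be grouped by failure type. The following is a minimal Python sketch, assuming the report layout shown above (an `X [0] <target>` line followed by one or more message lines). The function name `parse_broken` and the group labels are hypothetical and not part of Forrest.

```python
import re
from collections import defaultdict

# Excerpt of the report, copied from the memo above.
REPORT = """\
X [0] doc/lang/sms/src/verb-sms-lex.txt
BROKEN: No pipeline matched request: doc/lang/sms/src/verb-sms-lex.txt
X [0] doc/ling/vislcg.html
BROKEN: /Users/trondtro/xtdoc/sd/src/documentation/content/xdocs/doc/ling/vislcg.xml
(No such file or directory)
X [0] index_smi.html
BROKEN: /Users/trondtro/xtdoc/sd/src/documentation/content/xdocs/index_smi.xml
(No such file or directory)
"""


def parse_broken(report):
    """Group broken targets by failure type (labels are our own, hypothetical)."""
    entries = []  # each entry is [target, accumulated message text]
    for line in report.splitlines():
        match = re.match(r"X \[\d+\]\s+(\S+)", line)
        if match:
            entries.append([match.group(1), ""])
        elif entries:
            # Message lines may wrap, so collect everything up to the next X line.
            entries[-1][1] += " " + line.strip()

    groups = defaultdict(list)
    for target, message in entries:
        if "No pipeline matched" in message:
            groups["no pipeline"].append(target)
        elif "No such file or directory" in message:
            groups["missing source"].append(target)
        else:
            groups["other"].append(target)
    return dict(groups)


if __name__ == "__main__":
    for reason, targets in parse_broken(REPORT).items():
        print(reason, targets)
```

Grouping this way separates targets that lack a pipeline (such as the raw lexicon source file) from pages whose .xml source is simply missing.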
Forrest on cochise is still problematic: no such command
Thomas did a lot of work last week and received many positive answers (see last week’s meeting memo).
Also Info nuorra. The Ministry of Agriculture in Sweden has sent some texts in Northern, Lule, Southern and Anár Sámi. Some private texts too.
We need a contract to use when making agreements with people and organisations. Børre will look at the one used in Oslo.
The antiword issue: Antiword assumes that everything in Word is Unicode, and converts it to whatever 8-bit character set we want (the -m option). Our problem is that we want a many-to-one machine, not a one-to-many machine.
Use antiword to convert from .doc to DocBook format. Perl then post-processes the errors in the .xml files that have Levi and the other private hacks.
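The many-to-one requirement can be illustrated independently of antiword. The following is a minimal Python sketch, assuming a hypothetical table in which several legacy stand-in characters all map to the same target letter. The character pairs are placeholders for illustration only, not the actual Levi encoding or any other real font hack, and `normalise` is a hypothetical name.

```python
# Hypothetical many-to-one table: several source characters map to ONE target.
# The pairs below are placeholders, not a real font encoding.
MANY_TO_ONE = {
    "\u00b9": "\u0161",  # stand-in from hypothetical hack A -> s with caron
    "\u00ba": "\u0161",  # stand-in from hypothetical hack B -> s with caron
    "\u00e8": "\u010d",  # stand-in from hypothetical hack A -> c with caron
}


def normalise(text, table=MANY_TO_ONE):
    """Replace every known stand-in character with its single target letter."""
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    # Two different stand-ins end up as the same letter.
    print(normalise("a\u00b9b a\u00bab"))
```

Because the table is keyed on the source character, any number of private encodings can be merged into it, which is the opposite direction of a one-to-many output mapping.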
Thomas: I have now gone through all 54 Northern Sámi adjectival sublexica in sme-lex.txt, and, with the exception of two bugs (reported to Bugzilla), all the paradigms now seem to be generated correctly. During testing I noticed, though, that some of the adjectives themselves are directed to the wrong sublexica, and I have started to go through these adjectives. Other issues regarding the comparison of certain adjectives have to be treated and will be posted in the discussion group.
Maaren: No time last week. Maybe sitting at home would work?
Trond, Lena: Worked on transitivity in verbs. Checked so far: -ot verbs and -eret ^LOAN words; work has started on DIEHTI verbs. The standard versions of n, a and v are now reverse-sorted in cvs, in order to facilitate systematic lexicon work.
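Reverse sorting orders entries by their endings, so that words sharing a suffix (for example the -ot verbs) end up next to each other. The following is a minimal Python sketch of the idea. The word list is only illustrative, and `reverse_sorted` is a hypothetical name, not the script actually used on the files in cvs.

```python
def reverse_sorted(words):
    """Sort words by their reversed spelling, so shared endings group together.

    Uses plain code-point order, not Sámi collation (a simplification).
    """
    return sorted(words, key=lambda word: word[::-1])


if __name__ == "__main__":
    # Illustrative list only, not taken from the lexicon files.
    verbs = ["boahtit", "lohkat", "borrat", "orrot", "oaidnit"]
    for verb in reverse_sorted(verbs):
        print(verb)
```

With this ordering the -at, -it and -ot verbs form contiguous blocks, which makes it easier to check one class at a time.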
Sjur and Børre will continue working on that until the official opening on Wednesday the 16th. After that there will be no one to maintain it, and it needs documentation, but the workload will (hopefully) be much smaller.
FIN1) Asiakas syö lounasta. (The client is eating lunch.) (FIN1a)
A1
STA:cl(fcl)
S:n(sg,nom) Asiakas
P:v(fin,pres,ind,3sg) syö
Od:n(sg,par) lounasta
FIN82) Soittaessasi viulua unohdin ajan kuluvan.
(While you were playing the violin, I forgot how the time passes.) (FIN9b)
A1
STA:cl(fcl)
A:cl(icl)
=P:v(inf2,act,ine+poss2sg) Soittaessasi
=Od:n(sg,par) viulua
P:v(fin,imperf,ind,1sg) unohdin
Od:cl(icl)
=S:n(sg,gen) ajan
=P:v(pcp1,act,sg,gen) kuluvan
[http://visl.sdu.dk/], cf. e.g. Finnish, under languages:
“The overall objective of the project is to develop and extend a unified language description apparatus for the Nordic languages, based on the grammatical conventions and language technology tools used in the VISL system. The focus will be on teaching tools, parallel texts and the spillover effect between the framework languages Danish and Norwegian on the one hand and the smaller Nordic languages on the other.
Sámi: e.g. resource inventory, current CG activities, specific grammatical problems in a contrastive perspective”
What we will need from the project is a view on syntactic analysis (S, Od, P, A, …)
Wednesday 1/2 day, Thursday and Friday off, as well as Monday.
Closed at 12.10