Language Technology at UiT The Arctic University of Norway

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages



Minutes, Giellatekno meeting, 25 March.
Present: Trond, Ciprian, Lene


Project on weighted automata

Trond briefed us on the project application Weight-training to the ERC, followed by discussion.

References in Korp

Name, reference

Jussi's and Lene's comments on Korp:

Brainstorming on names:


Conclusion: Samte (short for Sámi teaksta / samisk tekst / Saami text)


Cip: Short description of the author problem in Korp.

  1. presentation issue in Korp: first name and surname are not displayed next to each other
    => something to debug in Korp
  2. many documents in our corpus lack author information, which results in a large number of “UNKNOWN” labels
  3. as noted by Jussi, the SDA and Diedut documents get wrong author labels because they are not split into individual articles
    => split by article
  4. what about documents with several authors? Is only the first author shown by Korp and/or registered in CQP?
    => use the first author plus “et al.”
  5. Korp implementation issue: in the current version, it is not possible to display the author without also making the author searchable (contrary to what we agreed upon). I proposed a change to this implementation at the meeting in Gothenburg last year.
    => I need to look into possible changes.
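The display rules agreed above (first name and surname kept together, “UNKNOWN” when there is no author, first author plus “et al.” for several authors) could be sketched as follows. This is not Korp code: the `Author` record and the `author_label` function are hypothetical, and the sketch assumes author metadata is available as an ordered list of first name/surname pairs.

```python
from dataclasses import dataclass


@dataclass
class Author:
    # Hypothetical author record; the real corpus metadata format may differ.
    first_name: str
    surname: str


def author_label(authors: list[Author]) -> str:
    """Build a display label for a document's authors (hypothetical helper)."""
    if not authors:
        # No author information registered for the document.
        return "UNKNOWN"
    first = authors[0]
    # Keep first name and surname adjacent, in that order, skipping empty parts.
    name = " ".join(part for part in (first.first_name, first.surname) if part)
    if not name:
        return "UNKNOWN"
    # Several authors: show only the first one, followed by "et al.".
    if len(authors) > 1:
        return f"{name} et al."
    return name
```

For example, `author_label([Author("Ole", "Hansen"), Author("Kari", "Nilsen")])` gives `"Ole Hansen et al."` (the names are made up for illustration).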

Fad words into smenob

Lene has gone through the CSV lists, and Ciprian has generated new XML files:

100 % overlap between lemma and tg in fad and gt: fad2merge, a check of fad vs. the source XML; fad contains more information than gt
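The overlap check can be illustrated with a small sketch. It assumes, hypothetically, that both sources have already been reduced to a mapping from lemma to its set of translations (the `t` values inside `tg`). The `classify` function is illustrative only and is not the script that produced fad2merge. For each lemma present in both sources, it reports whether every gt translation also occurs in fad (a full match) and which translations only fad has.

```python
def classify(
    gt: dict[str, set[str]], fad: dict[str, set[str]]
) -> dict[str, tuple[bool, set[str]]]:
    """Compare lemma -> translations mappings from gt and fad (hypothetical helper).

    Returns, per shared lemma, (full_match, fad_only_translations).
    """
    result: dict[str, tuple[bool, set[str]]] = {}
    for lemma, fad_translations in fad.items():
        gt_translations = gt.get(lemma)
        if gt_translations is None:
            # Lemma missing from gt altogether: outside this overlap check.
            continue
        # Full match: every gt translation is also found in fad.
        full_match = gt_translations <= fad_translations
        result[lemma] = (full_match, fad_translations - gt_translations)
    return result
```

With data taken from the máksinnákca entry in these minutes, `classify({"máksinnákca": {"likviditet"}}, {"máksinnákca": {"likviditet", "betalingsevne", "likviditetsforvaltning"}})` reports a full match plus the two fad-only translations.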

Contents of fad2merge:

check_fad-vs-src/xml_data/auto/total/   total match, but more/different info in fad; x = extra
check_fad-vs-src/xml_data/manual/   the entries marked 'm', to be edited manually

Full match, but with more information: placed in fad2merge/check_fad-vs-src/xmldata and marked with x="fad"
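Marking the extra fad material could look roughly like the sketch below, which uses Python's standard `xml.etree.ElementTree`. The element layout (`mg x="fad"` wrapping `tg xml:lang="nob"` wrapping `t`) is inferred from the máksinnákca example in these minutes. The function name `add_fad_groups` is hypothetical, and copying `pos` from the entry's `<l>` element is an assumption. Which fad translations to pass in (all of them, or only the fad-only ones) is left to the caller.

```python
import xml.etree.ElementTree as ET

# ElementTree's key for the xml:lang attribute (serialized back as xml:lang).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def add_fad_groups(entry: ET.Element, translations: list[str], lang: str = "nob") -> None:
    """Append one <mg x="fad"> group per translation to an <e> entry (hypothetical helper)."""
    lemma = entry.find("l")
    # Assumption: the translation gets the same part of speech as the lemma.
    pos = lemma.get("pos") if lemma is not None else None
    for translation in translations:
        mg = ET.SubElement(entry, "mg", {"x": "fad"})
        tg = ET.SubElement(mg, "tg", {XML_LANG: lang})
        t = ET.SubElement(tg, "t", {"pos": pos} if pos else {})
        t.text = translation


entry = ET.fromstring(
    '<e usage="vd"><l pos="N">máksinnákca</l>'
    '<tg xml:lang="nob"><t decl="yyy" gen="m" pos="N">likviditet</t></tg></e>'
)
add_fad_groups(entry, ["betalingsevne", "likviditetsforvaltning", "likviditet"])
```

Calling `ET.tostring(entry, encoding="unicode")` afterwards serializes the three new groups with `x="fad"` and `xml:lang="nob"`.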



   <e usage="vd">
         <l pos="N">máksinnákca</l>
         <tg xml:lang="nob">
            <t decl="yyy" gen="m" pos="N">likviditet</t>
      <mg x="fad">
         <tg xml:lang="nob">
            <t pos="N">betalingsevne</t>
      <mg x="fad">
         <tg xml:lang="nob">
            <t pos="N">likviditetsforvaltning</t>
      <mg x="fad">
         <tg xml:lang="nob">
            <t pos="N">likviditet</t>