Language Technology at UiT

The Divvun and Giellatekno teams build language technology aimed at minority and indigenous languages


Minutes, Giellatekno meeting, 25 March.
Trond, Ciprian, Lene

Agenda

Project on weighted automata

Trond gave an update on the Weight-training project application to the ERC, followed by discussion.

References in Korp

Name, reference

Jussi's and Lene's comments on Korp:

Brainstorming on names:

SAMTE
Samte
SamT
samde

Conclusion: Samte (short for Sámi teaksta / samisk tekst / Saami text)

Author

Cip: Short description of the author problem in Korp.

  1. presentation issue in Korp: first name and family name are not shown one after the other
    => something to debug in Korp
  2. many documents registered in our corpus lack author information, which gives a large number of "UNKNOWN" labels
  3. as noted by Jussi, SDA and Diedut get wrong author labels because they are not split by article
    => split by article
  4. what about documents with several authors? Is only the first one shown by Korp and/or registered in CQP?
    => use first author plus "et al."
  5. Korp implementation issue: in the current version it is not possible to display the author without also making it searchable (contrary to what we agreed upon). I proposed changing this implementation at the meeting in Gothenburg last year.
    => I have to check what changes are possible.
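
The convention proposed under points 2 and 4 (first author plus "et al.", and "UNKNOWN" when no author is registered) could be sketched as follows. This is only an illustration in Python: the function name `format_authors` and the input format (a plain list of name strings) are assumptions, not part of Korp or of the corpus tools.

```python
def format_authors(authors, unknown_label="UNKNOWN"):
    """Return a display label for a document's authors.

    Hypothetical helper: `authors` is assumed to be a list of name strings.
    """
    # Drop empty or whitespace-only entries before deciding what to show.
    names = [a.strip() for a in authors if a and a.strip()]
    if not names:
        return unknown_label
    if len(names) == 1:
        return names[0]
    # Several authors: show only the first one, followed by "et al."
    return f"{names[0]} et al."


print(format_authors([]))
print(format_authors(["Author One"]))
print(format_authors(["Author One", "Author Two", "Author Three"]))
```

The author names here are placeholders; how the names are actually stored in the corpus metadata would decide what the real input looks like.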

Fad words into smenob

Lene has gone through the CSV lists and Ciprian has generated new XML files:

100% overlap between lemma and tg in fad and gt (fad2merge, check fad vs. source XML): more information in fad than in gt

Contents of fad2merge:

00_readme.txt
check_fad-vs-sna/
check_fad-vs-src/xml_data/auto/total/ total match, but more/different info in fad; x = extra
check_fad-vs-src/xml_data/manual/   those marked 'm'; to be edited manually
fad-sna2merge/
inc/

Full match, but more information: placed in fad2merge/check_fad-vs-src/xmldata and marked with x="fad"

Decision:

Example:

   <e usage="vd">
      <lg>
         <l pos="N">máksinnákca</l>
      </lg>
      <mg>
         <tg xml:lang="nob">
            <t decl="yyy" gen="m" pos="N">likviditet</t>
         </tg>
      </mg>
      <mg x="fad">
         <tg xml:lang="nob">
            <t pos="N">betalingsevne</t>
         </tg>
      </mg>
      <mg x="fad">
         <tg xml:lang="nob">
            <t pos="N">likviditetsforvaltning</t>
         </tg>
      </mg>
      <mg x="fad">
         <tg xml:lang="nob">
            <t pos="N">likviditet</t>
         </tg>
      </mg>
   </e>
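
An entry like the one above can be inspected with a short script, for instance to list which translations come from fad and which of them repeat a translation that was already in the entry (here the last fad group repeats "likviditet"). The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function names `split_translations` and `redundant_fad` are hypothetical and not part of the existing merge scripts.

```python
import xml.etree.ElementTree as ET

# The example entry from the minutes, copied verbatim.
ENTRY = """<e usage="vd">
   <lg>
      <l pos="N">máksinnákca</l>
   </lg>
   <mg>
      <tg xml:lang="nob">
         <t decl="yyy" gen="m" pos="N">likviditet</t>
      </tg>
   </mg>
   <mg x="fad">
      <tg xml:lang="nob">
         <t pos="N">betalingsevne</t>
      </tg>
   </mg>
   <mg x="fad">
      <tg xml:lang="nob">
         <t pos="N">likviditetsforvaltning</t>
      </tg>
   </mg>
   <mg x="fad">
      <tg xml:lang="nob">
         <t pos="N">likviditet</t>
      </tg>
   </mg>
</e>"""


def split_translations(entry):
    """Return (existing, from_fad): translations grouped by the x="fad" mark."""
    existing, from_fad = [], []
    for mg in entry.findall("mg"):
        target = from_fad if mg.get("x") == "fad" else existing
        target.extend(t.text for t in mg.iter("t"))
    return existing, from_fad


def redundant_fad(entry):
    """Return fad translations that duplicate an existing translation."""
    existing, from_fad = split_translations(entry)
    return [t for t in from_fad if t in existing]


entry = ET.fromstring(ENTRY)
existing, from_fad = split_translations(entry)
print(existing)              # ['likviditet']
print(from_fad)              # ['betalingsevne', 'likviditetsforvaltning', 'likviditet']
print(redundant_fad(entry))  # ['likviditet']
```

The comparison is on the translation text only; whether attributes such as `decl` and `gen` should also be compared is left open here.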