Corpus work

Planning meeting 3.3.17: Børre, Lene, Trond.

About the new employee's work

Parallelisation

Poor parallelisation. Possible causes:

Parallel texts can be used for:

  1. dictionaries - dictionary pairs = korp
  2. base material for MT work
  3. TM (OmegaT)

Parallel files today (number of files in prestable/tmx):

tmx $ for i in *
> do
> echo $i
> find $i -name '*.html'|wc -l
> done

855  fin2sme  dict, mt  <===== 1
488  fin2smn  dict      <===== 2
484  sme2smn  mt
460  sme2sma  mt
372  nob2sme  dict, mt
326  sme2smj
310  smj2sma
290  fin2sms  dict      <===== 3
283  sms2fin  dict
280  smn2sms
279  sme2sms
177  sma2nob  dict      <===== 4
150  sme2nob  dict
135  sme2nno
87   smj2nob
55   sme2eng  dict
20   fkv2nob  dict
10   smj2sme
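
The counts above come from an interactive loop, and the listing appears to have been sorted and annotated by hand. As a rough sketch only, the same count could be produced already sorted with a small helper function. The name count_per_pair and its two arguments are hypothetical and not part of any GiellaLT tool; the '*.html' pattern is simply the one used in the session above.

```shell
#!/bin/sh
# Hypothetical helper (not an existing GiellaLT script): for each
# subdirectory of ROOT, count the regular files whose names match PATTERN
# and print "count name" lines, largest count first.
count_per_pair() {
    root=$1
    pattern=$2
    for dir in "$root"/*/; do
        # Skip the unexpanded glob when ROOT has no subdirectories.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # tr strips the padding some wc implementations put before the number.
        count=$(find "$dir" -type f -name "$pattern" | wc -l | tr -d ' ')
        printf '%s %s\n' "$count" "$name"
    done | sort -rn
}
```

Run from inside the tmx directory, this would be called as count_per_pair . '*.html'. The dict, mt and arrow annotations in the listing are not derived from the files and would still have to be added by hand.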

Other matters