Illustrated guide in Polish (warning: very large file!)
PLUczeK is a user-friendly alignment editor and converter with a GUI, used for building parallel corpora. It works with an independent program, Hunalign, to pre-align an original text and its translation, and allows the user to modify the alignment and save it in XML format. PLUczeK runs under Windows XP, Vista and Windows 7. It was created for work on the Polish-Ukrainian Parallel Corpus (PolUKR), but can be used for any pair of languages.
See the usage guidelines in Polish for a more detailed description of how to install and use PLUczeK, and the SourceForge site to download the program.
PLUczeK was created for the Windows environment and was tested on Windows XP Professional, Vista and Windows 7.
For the editor to work correctly, Hunalign should be located in the editor's folder, and Hunalign's dictionary, a file with the .dic extension, should be in a subfolder called "data". The dictionary format is described on the Hunalign website, but a dummy dictionary (an empty file) can also be used.
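The folder layout described above can be prepared with a short script. The following is only a sketch: the function name `ensure_dummy_dictionary` and the file name `null.dic` are hypothetical choices made for illustration, and the path of the PLUczeK folder has to be supplied by the user. It creates the "data" subfolder if needed and puts an empty .dic file there unless a dictionary is already present.

```python
from pathlib import Path


def ensure_dummy_dictionary(pluczek_dir, dic_name="null.dic"):
    """Make sure <pluczek_dir>/data contains a .dic file.

    `dic_name` is a hypothetical file name; only the .dic extension and
    the "data" subfolder come from the description above.
    Returns the path of the dictionary that was found or created.
    """
    data_dir = Path(pluczek_dir) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Keep an existing dictionary instead of overwriting it.
    existing = sorted(data_dir.glob("*.dic"))
    if existing:
        return existing[0]

    # An empty file serves as a dummy dictionary.
    dummy = data_dir / dic_name
    dummy.touch()
    return dummy
```

Calling `ensure_dummy_dictionary(r"C:\path\to\PLUczeK")` (the path is a placeholder) returns the location of the dictionary file. Hunalign itself still has to be copied into the editor's folder by hand.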
The .NET platform that is necessary for running PLUczeK can be downloaded here.
Input and output data format
PLUczeK requires a specific format for its input data. The texts should be correctly divided into paragraphs and sentences (or just sentences). A sentence boundary is marked by a single empty line (\n\n), while a paragraph boundary is marked by two empty lines (\n\n\n). The texts should be .txt files in UTF-8 encoding and contain no more than two consecutive empty lines.
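To make the input format concrete, here is a minimal Python sketch that builds and reads text in the layout described above. The function names are hypothetical and not part of PLUczeK. The sketch assumes the text is already split into sentences, since sentence splitting itself is outside its scope.

```python
def format_pluczek_input(paragraphs):
    """Join a list of paragraphs (each a list of sentences) into one string.

    Sentences are separated by one empty line ("\n\n"),
    paragraphs by two empty lines ("\n\n\n").
    """
    return "\n\n\n".join("\n\n".join(sents) for sents in paragraphs) + "\n"


def parse_pluczek_input(text):
    """Split text in the above layout back into paragraphs and sentences."""
    text = text.replace("\r\n", "\n").strip("\n")
    paragraphs = []
    for para in text.split("\n\n\n"):
        sentences = [s.strip() for s in para.split("\n\n") if s.strip()]
        if sentences:
            paragraphs.append(sentences)
    return paragraphs


def has_too_many_empty_lines(text):
    """True if the text contains more than two consecutive empty lines."""
    return "\n\n\n\n" in text.replace("\r\n", "\n")


sample = "First sentence.\n\nSecond sentence.\n\n\nNew paragraph."
print(parse_pluczek_input(sample))
```

When saving the result to a .txt file, the file should be written with UTF-8 encoding, for example with `open(path, "w", encoding="utf-8")`.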
PLUczeK generates three output files: the two texts converted into XML and a separate (stand-off) XML alignment file. All output formats are XML/XCES compatible.
Working with PLUczeK, step by step
- Step 1: Automatic pre-alignment
- Step 2: Choosing text files
- Step 3: Reading the data into the table
- Step 4: Division into pages
- Step 5: Modification of rows
- Step 6: Recording the table into output files
- Step 7: Reloading previously created files for further editing