1) Factors that determine correct spelling
You will be able to describe what you need to know about the language to start checking spelling in ParaTExt.
The goal of your translation work is to publish Scripture. It is very important that your translation have good spelling when it is published. This makes the translation easier to read, makes a concordance possible, and adds credibility to your publications. But what is “good” spelling? The first part of good spelling is using the orthography correctly and not using any characters from the orthography of another language. The second part of good spelling is consistency.
The key part of any orthography is the alphabet. (See the following link for more information on orthographies: http://www.sil.org/literacy-and-education/orthography-development.) If you tell ParaTExt what the alphabet of the target language is, it can find any characters that are not in the approved alphabet. But an orthography is more than just individual letters. Many languages have accents or other diacritics that need to be specified, for example ã, ó, ò, ô, ë, and č, and many more are used in the world's languages. In some languages, combinations of letters are used to represent a single sound, such as "ch", "th", "gb", "nkw", or even "ng'". These are called digraphs when two characters represent one sound and multi-graphs when more than two characters do. ParaTExt will work reasonably well even if you don't specify the multi-graphs, as long as they are made up only of alphabetic characters.
However, when characters that are normally punctuation, such as the apostrophe, are used as word-forming characters, ParaTExt can become confused and may not work as you expect. Two of the most common punctuation characters used as word-forming characters are the apostrophe and the hyphen. Because the apostrophe is used this way so often in some parts of the world, later in this course we will show you the extra configuration ParaTExt needs to treat the apostrophe as if it were a "letter". The procedure is the same for any other punctuation character used as a word-forming character.
|NOTE: Orthography also includes rules for punctuation, numbers, the marking of direct and indirect quotes, and some other things, but those topics will be covered elsewhere in this course.|
Configuring an orthography in ParaTExt is a three-step process. The first step is to list the characters, in lowercase and uppercase, in alphabetical order. This includes any characters with accents or diacritics and any multi-graphs that are important in the target language. The second step is to specify any word-medial punctuation. The last step is to generate a list of all characters in the text and approve or disapprove each one as valid in your orthography. We will now go through these steps in more detail.
NOTE: The goal of this course is to teach the basic operation of ParaTExt. We cannot cover all possible orthographic issues, but we aim to show in principle how ParaTExt handles the typical issues related to orthography and spelling. If you need additional help, contact a language technology or linguistics consultant. (See the following articles for additional help: http://www.sil.org/literacy-and-education/resources-developing-orthographies.)
The alphabet is specified in two places in ParaTExt to support all of the checking and sorting that ParaTExt can do: the project's Language Settings and the Character Inventory. We will cover the Character Inventory later. Language Settings is where the uppercase and lowercase forms of each letter are defined and where the alphabetical sort order for the language is set. Whenever ParaTExt sorts words, it uses the order set in Language Settings.
It is important to list all of the characters with diacritics and all important multi-graphs, in order, in Language Settings so that ParaTExt will sort words as you expect. Otherwise, you may not be able to find words when doing searches or when working with any of the tools that make word lists. To open Language Settings, click your project to make sure it has focus, click the Project menu, and then click Language Settings...
Specifying the alphabetic characters
Below is a screen capture of the Language Settings for Spanish. All the characters needed for Spanish are listed in alphabetical order. For each letter, the lowercase character is listed first, followed by "/" and then the uppercase character. This tells ParaTExt which lowercase and uppercase letters belong together and which alphabetical order you want to use.
Sample Language Settings
Accents and other diacritics
Characters with diacritics are typically placed on the same line as the base character. In the screenshot above, the line a/A á/Á follows this rule.
Digraphs and multi-graphs
Spanish has two multi-graphs: "ch" and "ll". Notice that each is placed on its own line. Words beginning with "ch" will now be sorted after words beginning with "c", and words beginning with "ll" will be sorted after words beginning with "l".
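To make the effect of multi-graphs on sorting concrete, here is a minimal Python sketch, assuming a hypothetical Spanish alphabet written in the same "lowercase/uppercase" format as the Language Settings screen. It splits each word into alphabet units, trying the longest unit first so that "ch" and "ll" count as single letters, and then sorts by each unit's position in the list. It illustrates the idea only and is not ParaTExt's sorting code; for simplicity it gives every entry, including á, its own position and does not model any extra meaning ParaTExt gives to entries that share a line.

```python
# Hypothetical sketch: sorting words by a custom alphabet with multi-graphs.
# The alphabet is an illustrative Spanish list, not ParaTExt's own data,
# and this is not ParaTExt's sorting code.

ALPHABET_LINES = [
    "a/A á/Á", "b/B", "c/C", "ch/Ch", "d/D", "e/E é/É", "f/F", "g/G", "h/H",
    "i/I í/Í", "j/J", "k/K", "l/L", "ll/Ll", "m/M", "n/N", "ñ/Ñ", "o/O ó/Ó",
    "p/P", "q/Q", "r/R", "s/S", "t/T", "u/U ú/Ú ü/Ü", "v/V", "w/W", "x/X",
    "y/Y", "z/Z",
]

def build_order(lines):
    """Give each lowercase letter or multi-graph a rank, in the order listed."""
    order = {}
    for line in lines:
        for entry in line.split():
            lower = entry.split("/")[0]
            order[lower] = len(order)
    return order

ORDER = build_order(ALPHABET_LINES)
# Try longer units first so "ch" wins over "c" and "ll" wins over "l".
UNITS = sorted(ORDER, key=len, reverse=True)

def segment(word):
    """Split a word into alphabet units using longest match."""
    word = word.lower()
    units, i = [], 0
    while i < len(word):
        for unit in UNITS:
            if word.startswith(unit, i):
                units.append(unit)
                i += len(unit)
                break
        else:
            units.append(word[i])  # character not in the alphabet
            i += 1
    return units

def sort_key(word):
    # Characters missing from the alphabet sort after every listed unit.
    return [ORDER.get(unit, len(ORDER)) for unit in segment(word)]

words = ["llama", "luna", "chico", "cosa", "dama", "año"]
print(sorted(words))                # plain code-point order
print(sorted(words, key=sort_key))  # custom alphabet order
```

With Python's default code-point order, the first print gives ['año', 'chico', 'cosa', 'dama', 'llama', 'luna']; with the custom alphabet, the second gives ['año', 'cosa', 'chico', 'dama', 'luna', 'llama'], so "ch" words follow "c" words and "ll" words follow "l" words.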
Specifying multi-graphs is more important for some languages than for others. In languages with frequent pre-nasalized multi-graphs (like nd, mp, mb, and nt) or labialized multi-graphs (like bw, tw, and dw), ParaTExt will work better if it knows to treat them as single units. For instance, when you hyphenate long words, ParaTExt will never place a hyphen in the middle of a multi-graph that has been defined in Language Settings. Even if you do not define the multi-graphs, however, ParaTExt's hyphenation tool will quickly learn what they are as you correct incorrectly hyphenated words.
If you have trouble specifying your characters, you can find more information by searching for "language settings" in Help. If you need more help, ask your project administrator, a consultant, or a language technology specialist.
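The hyphenation rule can be pictured in a similar way. The sketch below, assuming a hypothetical list of pre-nasalized and labialized multi-graphs and a made-up word, lists the positions where a hyphen could go without splitting a defined multi-graph. ParaTExt's actual hyphenation tool is more involved; this shows only the constraint described above.

```python
# Hypothetical sketch: where a hyphen may go without splitting a multi-graph.
# The multi-graph list and the sample word are made up for illustration;
# this is not ParaTExt's hyphenation algorithm.

MULTIGRAPHS = ["nd", "mp", "mb", "nt", "bw", "tw", "dw"]

def units(word, multigraphs=MULTIGRAPHS):
    """Split a word into single letters and defined multi-graphs (longest first)."""
    ordered = sorted(multigraphs, key=len, reverse=True)
    result, i = [], 0
    while i < len(word):
        for mg in ordered:
            if word.startswith(mg, i):
                result.append(mg)
                i += len(mg)
                break
        else:
            result.append(word[i])
            i += 1
    return result

def safe_break_offsets(word, multigraphs=MULTIGRAPHS):
    """Offsets between units; a hyphen placed at one never splits a multi-graph."""
    offsets, pos = [], 0
    for unit in units(word, multigraphs)[:-1]:
        pos += len(unit)
        offsets.append(pos)
    return offsets

word = "kutwanda"
print(units(word))               # ['k', 'u', 'tw', 'a', 'nd', 'a']
print(safe_break_offsets(word))  # [1, 2, 4, 5, 7]
```

For kutwanda, offsets 3 (between t and w) and 6 (between n and d) are excluded because they would split tw and nd. A real hyphenator would still have to choose among the remaining positions using the language's syllable rules.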
Punctuation used as a word-forming character
In some orthographies, characters that are typically thought of as punctuation are used as word-forming characters. One example is the apostrophe, which is often used to represent the glottal stop. When the apostrophe is used in this way, ParaTExt cannot tell whether to treat it as punctuation or as a letter. For this reason, we strongly recommend that you not use the apostrophe (Unicode U+0027) for the glottal stop or as any other word-forming character. Instead, use either the modifier letter apostrophe (U+02BC) or the Latin small letter saltillo (U+A78C). You can read about these recommendations, as well as tips on how to type these characters, under "What is recommended character for glottal stop?" in the ParaTExt Help.
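One way to see why the two recommended characters behave better is to look at how Unicode itself classifies them. This short Python check uses the standard unicodedata module to print each code point's official name and general category. It is only an illustration: we are assuming that ParaTExt's behavior follows from similar character properties, and the Help topic mentioned above remains the authoritative guidance.

```python
import unicodedata

# The three glottal-stop candidates discussed above.
candidates = ["\u0027", "\u02bc", "\ua78c"]

for ch in candidates:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),  # "Po" = punctuation; "Lm"/"Ll" = letters
        ch.isalpha(),              # does Python consider it a letter?
    )
```

Unicode classifies U+0027 as punctuation (Po), while U+02BC is a modifier letter (Lm) and U+A78C is a lowercase letter (Ll), so software that asks "is this a letter?" gets the answer you want for the two recommended characters.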
Below is an example of a language that uses the saltillo to represent the glottal stop in its orthography. Notice that the saltillo is placed on the line after z/Z and that no uppercase form is given.
If you do use the normal apostrophe as a word-forming character, but you neither add it to the Alphabetic Characters list in Language Settings nor list it as Word-medial punctuation on the Other Characters tab (see the Takwane example below), ParaTExt will break words into parts every time it finds an apostrophe. For example, a word like vong'onong'ono would be interpreted as three words: vong, onong, and ono. If this happens, do not approve the spelling of each part; instead, correct the Language Settings so that they accurately reflect the orthography being used.
The Non-standard diacritics box is for older non-Unicode fonts, and most projects will not need to put anything in it. However, if you type specific characters to represent diacritics, enter them in the Non-standard diacritics box. For example, if you type e# to represent ê or n^ to represent ñ, enter # ^ as in the example below.
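The word-breaking problem, and the effect of declaring the apostrophe as word-medial punctuation, can be sketched with a regular expression. The helper split_words and its word_medial parameter are hypothetical names for illustration, not part of ParaTExt; the sketch treats any Unicode letter as word-forming and lets the listed punctuation count only when it sits between letters.

```python
import re

LETTERS = r"[^\W\d_]+"  # one or more Unicode letters (no digits, no underscore)

def split_words(text, word_medial=""):
    """Return the words in text (hypothetical helper).

    Characters listed in word_medial count as part of a word only when
    they sit between letters, so quotation marks at the edges of a word
    are not swallowed into it.
    """
    if word_medial:
        medial = "[" + re.escape(word_medial) + "]"
        pattern = LETTERS + "(?:" + medial + LETTERS + ")*"
    else:
        pattern = LETTERS
    return re.findall(pattern, text)

print(split_words("vong'onong'ono"))
print(split_words("'vong'onong'ono'", word_medial="'"))
print(split_words("vong\ua78congo"))  # the saltillo is already a letter
```

Without configuration, the first call returns the three fragments vong, onong, and ono. With the apostrophe declared word-medial, the word stays whole, and apostrophes used as quotation marks at the edges are still left out. The last call shows that a (made-up) word spelled with the saltillo (U+A78C) stays whole with no extra configuration, because the saltillo is already a letter.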
|If you already have a translation project in ParaTExt, you may want to open it and look at some of these menus and features yourself. If you don't already have an editable ParaTExt project, you can download the practice project PEh by clicking on the link (http://lingtran.net/tiki-download_file.php?fileId=985). It will download as a backed-up project, which you will need to "restore" into ParaTExt. This is done from within ParaTExt. For information on how to restore a project, search Help for "how do I restore a ParaTExt backup".|
3) Where is the menu for specifying alphabetic characters, their correct order, and word-medial punctuation?
4) Where is the tab for specifying the alphabetic characters (also called word-forming characters)?
5) Can you specify the next 13 alphabetic characters for this project?
6) Where is the tab for specifying word-medial punctuation?
7) Character Inventory and checking for invalid characters in a text
Now we are ready for the last step in configuring the orthography: generating a list of all characters in the text and marking each one as valid or invalid in the Character Inventory. Once this is done, ParaTExt can identify characters in the text that don't belong, and you can run this check routinely to find future errors. (This check is called the Character Check and is part of the Basic Checks. A separate unit in this course teaches all of the Basic Checks.) The Character Inventory tool is found on the Checking menu.
When you click Character Inventory, the Character Inventory tool looks like the following:
A new feature of ParaTExt 8 is that all the alphabetic characters and word-medial punctuation entered in Language Settings are automatically marked as valid in the Character Inventory. Unless Language Settings was filled in incorrectly, this should mark all of the valid alphabetic characters for the target language. Using the Character Inventory, you can quickly see any alphabetic characters that you forgot to add to Language Settings, as well as letters that should not be in the text and need to be changed.
The purpose of this inventory is to allow you to:
- check the characters in a text and the context in which they occur,
- confirm which characters are valid, and
- flag or correct any invalid or unknown characters.
Each character found in the text must be marked as valid, invalid, or unknown. If you have ParaTExt installed, stop now and open your project. Check that all the valid characters are marked as valid.
Once you have checked that you are only using the characters and diacritics approved for the target language, you are ready to work on the consistency of spelling in the translation. This is the bigger job, since ParaTExt has to be told whether each word is spelled correctly. ParaTExt has tools that organize the words and show you similar words to help you mark words as correct or incorrect, but someone still has to mark each unique word. Fortunately, once a word is marked as correctly spelled, ParaTExt automatically marks it as correct everywhere else it occurs, and ParaTExt remembers incorrect spellings and marks them as mistakes if they are repeated in the future.
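The inventory described above amounts to counting every character in the text and attaching a status to it. Here is a minimal Python sketch, assuming hypothetical sets of valid and invalid characters; in ParaTExt the valid set would be seeded from Language Settings. Any character you have not yet decided on is reported as unknown, and the hypothetical contexts helper shows where a character occurs.

```python
from collections import Counter

def character_inventory(text, valid, invalid=frozenset()):
    """Count every non-space character and give it a status (hypothetical sketch).

    Characters in neither set are reported as "unknown".
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    report = {}
    for ch, count in sorted(counts.items()):
        if ch in valid:
            status = "valid"
        elif ch in invalid:
            status = "invalid"
        else:
            status = "unknown"
        report[ch] = (count, status)
    return report

def contexts(text, ch, width=4):
    """Show each occurrence of ch with a few characters on either side."""
    return [text[max(0, i - width): i + width + 1]
            for i, c in enumerate(text) if c == ch]

text = "Casa grande, caça ¿sí?"
valid = set("Cacdegnrsí,?")
invalid = {"ç"}  # e.g. a letter from another language's orthography

for ch, (count, status) in character_inventory(text, valid, invalid).items():
    print(repr(ch), count, status)
print(contexts(text, "ç"))
```

This sketch counts individual code points, so a letter typed as a base letter plus a combining accent appears as two entries; running unicodedata.normalize("NFC", text) first would combine many such pairs into single characters.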
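The bookkeeping described in this paragraph can be sketched as one status per unique word form, so that marking a word once covers every occurrence and a remembered misspelling is flagged again later. The class SpellingStatus and its methods are hypothetical, and the look-alike suggestions use Python's difflib.get_close_matches as a stand-in for ParaTExt's own similar-word tools, whose method is not described here.

```python
import difflib
import re
from collections import Counter

class SpellingStatus:
    """Hypothetical sketch: one spelling decision per unique word form."""

    def __init__(self):
        self.status = {}  # word form -> "correct" or "incorrect"

    def mark(self, word, status):
        self.status[word] = status

    def review(self, text):
        """Count each unique word and report its remembered status."""
        counts = Counter(re.findall(r"[^\W\d_]+", text))
        return {w: (n, self.status.get(w, "unmarked")) for w, n in counts.items()}

    def similar(self, word, candidates):
        """Suggest look-alike spellings that may be inconsistent with word."""
        others = [c for c in candidates if c != word]
        return difflib.get_close_matches(word, others, n=3, cutoff=0.8)

text = "Jerusalén ... Jerusalen ... Jerusalén ... casa"
speller = SpellingStatus()
speller.mark("Jerusalén", "correct")    # marked once, applies everywhere
speller.mark("Jerusalen", "incorrect")  # remembered for future texts
report = speller.review(text)
print(report)
print(speller.similar("Jerusalen", report))
```

Here Jerusalén is reported as correct for both occurrences after being marked once, the remembered misspelling Jerusalen is reported as incorrect, and casa stays unmarked until someone decides on it.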
Watch the video below demonstrating the Character Inventory tool, and checking for invalid characters in a text.