Aspell – Using more than one language
Problem: I need to spell check a word file and get back the misspelt words. The issue is that there are “false positives”: words flagged as misspelt only because they belong to a different language, e.g. Swahili words in an English text.
Solution: Aspell
Make sure you have aspell installed. The default dictionary mine came with was English. Then install the dictionary for the other language you want to check. The dictionaries are at ftp://ftp.gnu.org/gnu/aspell/dict/0index.html. Download the one you need and follow its install directions.
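Before going further it can help to confirm that both dictionaries are actually installed. The following is a minimal sketch, assuming an aspell version that supports `aspell dump dicts` and that the dictionaries show up under the plain codes `en` and `sw`. The function name `missing_dicts` is made up for this example.

```shell
# Print each requested language code that has no installed dictionary.
# (missing_dicts is a hypothetical helper name, not part of aspell.)
missing_dicts() {
    # One installed dictionary name per line, e.g. en, en_GB, sw
    installed=$(aspell dump dicts)
    for lang in "$@"; do
        # -x matches the whole line, so "en" does not match "en_GB";
        # -F treats the code as a fixed string, not a pattern.
        printf '%s\n' "$installed" | grep -qxF "$lang" || echo "$lang"
    done
}
```

Running `missing_dicts en sw` should print nothing when both dictionaries are present, and the missing code(s) otherwise.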
I got some sample text on Zanzibar from http://en.wikipedia.org/wiki/Zanzibar and saved it in a file called Zanzibar.txt.
shell> less Zanzibar.txt
Wildlife
The main island of Zanzibar, Unguja, has a fauna which reflects its connection to the African mainland during the last Ice Age. Endemic mammals with continental relatives include the Zanzibar red colobus, one of Africa's rarest primates, the Zanzibar red colobus may number only about 1500. Isolated on this island for at least 1,000 years, the Zanzibar red colobus (Procolobus kirkii) is recognized as a distinct species, with different coat patterns, calls and food habits than related colobus species on the mainland
First, a spell check with aspell and the English dictionary only:
shell> cat Zanzibar.txt | aspell --lang=en list | sort -u
birdlife
colobus
genet
Jozani
kirkii
Pemba
Procolobus
servaline
Unguja
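Now do a spell check with both the English and Swahili dictionaries. I just pipe the results to aspell again and check the remaining words against the Swahili dictionary, so only words unknown to both dictionaries are left. Pemba and Unguja drop out of the list: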
shell> cat Zanzibar.txt | aspell --lang=en list | aspell --lang=sw list | sort -u
birdlife
colobus
genet
Jozani
kirkii
Procolobus
servaline
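The same piping trick extends to any number of languages. Here is a rough sketch of a helper that chains one `aspell ... list` per language code. The name `multi_spell` is invented for this example, and it assumes a dictionary is installed for every code you pass.

```shell
# Read text on stdin and print the unique words that every listed
# dictionary rejects. (multi_spell is a hypothetical helper name.)
multi_spell() {
    if [ "$#" -eq 0 ]; then
        # No languages left to try: de-duplicate what remains.
        sort -u
        return
    fi
    lang=$1
    shift
    # Words unknown to this language are handed to the next one.
    aspell --lang="$lang" list | multi_spell "$@"
}
```

With that defined, `multi_spell en sw < Zanzibar.txt` should be equivalent to the two-language pipeline above, and a third language is just one more argument.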
If there are words that you would always like to ignore, you can put them in a file and have aspell ignore them.*
shell> less .aspell.sw.pws
personal_ws-1.1 sw
jozani
Note that the case of the word does not matter here: the lowercase entry jozani also covers Jozani in the text.
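If you would rather generate such a file than type it by hand, something like the following could work. It is only a sketch: `make_pws` is a made-up name, and the header it writes includes a word count after the language code, which is the form given in the aspell manual (`personal_ws-1.1 lang num`). The hand-written file above leaves the count out.

```shell
# Write a personal word list.
# Usage: make_pws LANG FILE WORD...
# (make_pws is a hypothetical helper name; it overwrites FILE.)
make_pws() {
    lang=$1
    file=$2
    shift 2
    {
        # Header: format version, language code, number of words
        printf 'personal_ws-1.1 %s %d\n' "$lang" "$#"
        # One word per line
        for word in "$@"; do
            printf '%s\n' "$word"
        done
    } > "$file"
}
```

For example, `make_pws sw "$HOME/.aspell.sw.pws" jozani` would write a two-line file: the header, then jozani.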
shell> cat Zanzibar.txt | aspell --lang=en list | aspell --lang=sw list --personal=/home/rodnee/.aspell.sw.pws --dont-suggest | sort -u
birdlife
colobus
genet
kirkii
Procolobus
servaline
To ignore English words you can do the same: just save the file as .aspell.en.pws and set the language in its first line to en.*
shell> less .aspell.en.pws
personal_ws-1.1 en
birdlife
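Putting both ignore files to work in one pass might look like this. It is a sketch that assumes the two files live in your home directory under the names used above. `check_en_sw` is just an illustrative name.

```shell
# Print the unique words rejected by both dictionaries and absent
# from both personal word lists. Reads the text on stdin.
# (check_en_sw is a hypothetical helper name.)
check_en_sw() {
    aspell --lang=en --personal="$HOME/.aspell.en.pws" list |
        aspell --lang=sw --personal="$HOME/.aspell.sw.pws" list |
        sort -u
}
```

Call it as `check_en_sw < Zanzibar.txt`. With the two files shown above, birdlife and Jozani should no longer be reported.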
*You can add as many words as you like, one per line.
Thanks ! 😉
very good explanation, especially for multi-language check
thanks dude! you just gave me some fresh ideas
Excellent option! so good and clear explanation!