Aspell – Using more than one language

Problem: I need to spell check a file and list the misspelt words. The catch is that there are “false positives” when the file mixes words from two languages, e.g. Swahili and English.

Solution: Aspell

Make sure you have aspell installed. The default dictionary on my system was English. Then install the dictionary for the other language you want to check. The dictionaries are listed at ftp://ftp.gnu.org/gnu/aspell/dict/0index.html. Download the one you need and follow its install directions.
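Before running the checks it can help to confirm that both dictionaries are actually installed. A minimal sketch, assuming `aspell dump dicts` prints one installed dictionary name per line; `has_dict` is a hypothetical helper name, not part of aspell:

```shell
# has_dict LANG
# Succeeds if LANG appears as an exact line in the output of
# "aspell dump dicts". (has_dict is an illustrative name.)
has_dict() {
    aspell dump dicts | grep -qxF -- "$1"
}

# Example use:
#   has_dict sw || echo "Swahili dictionary not installed"
```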

I got some sample text on Zanzibar from http://en.wikipedia.org/wiki/Zanzibar and saved it in a file called Zanzibar.txt.

shell> less Zanzibar.txt
Wildlife
The main island of Zanzibar, Unguja, has a fauna which reflects its connection to the African mainland during the last Ice Age. Endemic mammals with continental relatives include the Zanzibar red colobus, one of Africa's rarest primates, the Zanzibar red colobus may number only about 1500. Isolated on this island for at least 1,000 years, the Zanzibar red colobus (Procolobus kirkii) is recognized as a distinct species, with different coat patterns, calls and food habits than related colobus species on the mainland

Doing a spell check with aspell and the English dictionary:

shell> cat Zanzibar.txt | aspell --lang=en list | sort -u
birdlife
colobus
genet
Jozani
kirkii
Pemba
Procolobus
servaline
Unguja

Now do a spell check with both the English and Swahili dictionaries. (I just pipe the results to aspell again, so the words the English dictionary rejected are checked against the Swahili dictionary. Only words that fail both checks remain.)

shell> cat Zanzibar.txt | aspell --lang=en list | aspell --lang=sw list | sort -u
birdlife
colobus
genet
Jozani
kirkii
Procolobus
servaline
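The same chaining extends to any number of languages. A minimal sketch, assuming every language named has its dictionary installed; `spell_multi` is a hypothetical helper name, not something aspell provides:

```shell
# spell_multi FILE LANG [LANG...]
# Feeds FILE through "aspell list" once per language. Each pass only
# sees the words the previous pass rejected, so the final output is
# the set of words unknown to every dictionary listed.
spell_multi() {
    file=$1
    shift
    words=$(cat "$file")
    for lang in "$@"; do
        words=$(printf '%s\n' "$words" | aspell --lang="$lang" list)
    done
    # Print nothing at all when every word was accepted.
    if [ -n "$words" ]; then
        printf '%s\n' "$words" | sort -u
    fi
}

# Example use:
#   spell_multi Zanzibar.txt en sw
```

The order of the languages should not change the result, since a word only survives if every dictionary rejects it.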

If there are words that you would always like to ignore, you can put them in a file and have aspell ignore them.*

shell> less .aspell.sw.pws
personal_ws-1.1 sw
jozani

Note that the entry is written in lowercase (jozani) but still matches "Jozani" in the text.
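A word list like this can also be generated from a script. A minimal sketch that writes the same header form as the listing above, followed by one word per line; `make_pws` is a hypothetical helper name, and the output path is whatever you choose to pass in:

```shell
# make_pws LANG OUTFILE [WORD...]
# Writes a personal word list: the "personal_ws-1.1 LANG" header line
# as shown in the listing above, then each WORD on its own line.
# Overwrites OUTFILE if it already exists.
make_pws() {
    pws_lang=$1
    pws_out=$2
    shift 2
    {
        printf 'personal_ws-1.1 %s\n' "$pws_lang"
        if [ "$#" -gt 0 ]; then
            printf '%s\n' "$@"
        fi
    } > "$pws_out"
}

# Example use:
#   make_pws sw "$HOME/.aspell.sw.pws" jozani
```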

shell> cat Zanzibar.txt | aspell --lang=en list | aspell --lang=sw list --personal=/home/rodnee/.aspell.sw.pws --dont-suggest | sort -u
birdlife
colobus
genet
kirkii
Procolobus
servaline

To ignore English words you can do the same: name the file .aspell.en.pws and set the language in its first line to en.*

shell> less .aspell.en.pws
personal_ws-1.1 en
birdlife

*You can put as many words as you like in the file, one per line.
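Putting the pieces together, the chained check can pick up a personal word list for each language when one exists. A minimal sketch, assuming the lists are named `.aspell.LANG.pws` as above and live in one directory; `spell_multi_pws` and the `PWS_DIR` variable are hypothetical names introduced here for illustration:

```shell
# Directory holding the .aspell.LANG.pws files (assumed: the home directory).
PWS_DIR=${PWS_DIR:-$HOME}

# spell_multi_pws FILE LANG [LANG...]
# Runs "aspell list" once per language on the words still unrecognised,
# adding --personal only when a word list for that language is present.
spell_multi_pws() {
    file=$1
    shift
    words=$(cat "$file")
    for lang in "$@"; do
        pws="$PWS_DIR/.aspell.$lang.pws"
        if [ -f "$pws" ]; then
            words=$(printf '%s\n' "$words" |
                aspell --lang="$lang" --personal="$pws" list)
        else
            words=$(printf '%s\n' "$words" | aspell --lang="$lang" list)
        fi
    done
    # Print nothing at all when every word was accepted.
    if [ -n "$words" ]; then
        printf '%s\n' "$words" | sort -u
    fi
}

# Example use:
#   spell_multi_pws Zanzibar.txt en sw
```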
