Random thoughts shooting out of volatile mind
SILPA - An Indian Language Processing Application
SILPA stands for Swathantra Indian Language Processing Application, where "Swathantra" means free as in freedom. It is a web framework that provides a set of language processing applications for Indian languages, and it has also ported many existing standalone applications to the web. SILPA is written entirely in Python, and the framework was built from scratch rather than on top of any existing Python web framework. Credit for writing the framework goes to Santhosh Thottingal.
Coming to the implementation details: SILPA is a modular framework, in the sense that all the language processing services are organized as separate modules, and a user can activate or deactivate a module by un-commenting or commenting the respective line in a configuration file (silpa.conf). SILPA uses the WSGI interface. It also provides JSON-RPC APIs to expose these services to a wide range of platforms.
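To make the idea concrete, here is a minimal sketch of how a WSGI application could dispatch JSON-RPC style requests only to the modules enabled in a config file. This is not SILPA's actual code: the module names (`reverse`, `wordcount`), the one-name-per-line config format, and the helpers `parse_enabled`, `make_app` and `call` are all assumptions made up for illustration.

```python
import io
import json


# Hypothetical services standing in for real language processing modules.
def reverse_text(text):
    return text[::-1]


def count_words(text):
    return len(text.split())


AVAILABLE_MODULES = {"reverse": reverse_text, "wordcount": count_words}


def parse_enabled(conf_text):
    """Return names from lines that are neither blank nor commented out with '#'.

    Assumed format: one module name per line (the real silpa.conf may differ).
    """
    enabled = []
    for line in conf_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            enabled.append(line)
    return enabled


def make_app(conf_text):
    """Build a WSGI app that only exposes the modules enabled in conf_text."""
    enabled = {
        name: AVAILABLE_MODULES[name]
        for name in parse_enabled(conf_text)
        if name in AVAILABLE_MODULES
    }

    def app(environ, start_response):
        length = int(environ.get("CONTENT_LENGTH") or 0)
        request = json.loads(environ["wsgi.input"].read(length))
        method = request.get("method")
        if method in enabled:
            result, error = enabled[method](*request.get("params", [])), None
        else:
            result, error = None, "unknown or disabled method"
        payload = json.dumps(
            {"result": result, "error": error, "id": request.get("id")}
        ).encode("utf-8")
        start_response(
            "200 OK",
            [("Content-Type", "application/json"),
             ("Content-Length", str(len(payload)))],
        )
        return [payload]

    return app


def call(app, method, params):
    """Invoke the WSGI app in-process, without starting a server."""
    raw = json.dumps({"method": method, "params": params, "id": 1}).encode("utf-8")
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    response = app(environ, lambda status, headers: None)
    return json.loads(b"".join(response))
```

With a config text of `"reverse\n# wordcount\n"`, a call to `reverse` returns a result while a call to `wordcount` returns an error, because its line is commented out.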
Some of the services provided by SILPA are a spell checker and transliteration, including transliteration to ISO 15919 and IPA; according to Santhosh, this is the first application providing ISO 15919 and IPA transliteration for Indian languages. SILPA also has an Indic Soundex module, which is a modification of the Soundex algorithm to suit Indic languages. Of course, all these services are still under development.
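For readers who have not met Soundex before, the sketch below shows the classic English (American) Soundex algorithm that the Indic variant starts from. It is not SILPA's Indic Soundex, which has to map Indic script characters instead of Latin letters; the function name `soundex` and the table `CODES` are just names chosen for this example.

```python
# Classic Soundex digit groups for English consonants.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character classic Soundex code of an English word."""
    # Keep only the letters A-Z; anything else is ignored.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped and do not break a run of equal codes.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels have no code, so they reset prev and let a code repeat.
        prev = code
    # Pad with zeros or truncate to exactly four characters.
    return (result + "000")[:4]
```

Names that sound alike get the same code, for example `soundex("Robert")` and `soundex("Rupert")` both give `R163`, which is what makes the idea useful for fuzzy matching of words.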
SILPA provided a transliteration service for Indic languages before Google's scriptconv.googlelabs.com, but the service was not widely known, which may lead people to think that Google was the first to bring transliteration to Indian languages. The website transliteration service powered by SILPA can be accessed at http://thottingal.in/go
Similar to the transliteration service, Silpa also provides a spell checking service for all Indian languages and English. A PHP helper library has also been written for this module, so that developers can consume the service from PHP. An online spell checking service for Indian languages is built using this PHP library and is available at http://thottingal.in/projects/spellchecker/. The source code for the PHP library is available on GitHub.
You can find more about Silpa and the services it provides at this link. You can also try out these services here. SILPA is currently short of developers: we have only 2 active developers, Santhosh and myself. Of course, I'm a beginner and am helping with minor fixes, performance improvements and Kannada-related bug fixes. We will be happy to have more developers, since there are many pending tasks for this application. I'll list a few of them below.
  • Improving the spell checker algorithm and making it more accurate.
  • Currently, transliteration from English to an Indic language is chained through Malayalam. We need developers from the respective languages to provide more efficient and independent transliteration.
  • We plan to pull words from as many sources as possible, including Wiktionaries, to create an Indic language dictionary.
  • Writing documentation for the API interfaces (both Python and JSON), etc.

If anyone is interested in helping us improve the service for their language, you can join us by registering on Savannah and requesting membership at the same link. You can also subscribe to the Silpa mailing list. The basic skills we are looking for are:
  • Proficiency in natural language
  • Basic knowledge in Python
  • Basic knowledge of working with Git.

Posted by: copyninja on Monday, 24 May 2010
