Random thoughts shooting out of volatile mind
Tutorial: Python and C Coupling Continued (Dhvani with Python)
In my previous post I gave a brief tutorial on using a C shared library from Python with the ctypes module. That example was very simple and didn't involve any complex data structures. In this post I'll give a brief look at how to use ctypes with C libraries that involve complex data structures and function pointers. The example uses Dhvani, an Indian language TTS system. Here is the Dhvani API reference.

    Problem: Use the dhvani APIs to generate speech or an output file from Python

Let's have a look at the Dhvani API reference (linked above). The structure dhvani_options is a very important part of the dhvani API. Here it is:

    typedef struct {
        struct dhvani_VOICE *voice; /* not used now.. for future use.*/
        float pitch;
        float tempo;
        int rate;
        dhvani_Languages language;
        int output_file_format;
        int isPhonetic;
        int speech_to_file;
        char* output_file_name;
        t_dhvani_synth_callback* synth_callback_fn;
        t_dhvani_audio_callback* audio_callback_fn;
    } dhvani_options;

And an enum, dhvani_ERROR:

    typedef enum {
        DHVANI_OK = 0,
    } dhvani_ERROR;

If we look carefully at the dhvani_options structure, it involves a few complex types: the dhvani_VOICE structure, the callback types t_dhvani_synth_callback and t_dhvani_audio_callback, and the enum dhvani_Languages. All the other fields are basic types. I had to dig through the dhvani source code to see what these complex types are, and below are the snippets I found there.

    typedef enum {
        HINDI = 1,
        MALAYALAM = 2,
        TAMIL = 3,
        KANNADA = 4,
        ORIYA = 5,
        PANJABI = 6,
        GUJARATI = 7,
        TELUGU = 8,
        BENGALI = 9,
        MARATHI = 10,
        PASHTO = 11
    } dhvani_Languages;

    typedef int (t_dhvani_synth_callback) (int);
    typedef int (t_dhvani_audio_callback) (short *, int);

The dhvani_VOICE structure was never defined; it is meant only for future use, which spared me some time :). I also found one more enum in the source that is used by the API. It is listed below.

    typedef enum {
        DHVANI_OGG_FORMAT = 0,
    } dhvani_output_file_format;

So we need to define a few enums, a dummy structure and 2 function pointers in Python before we can define the dhvani_options structure in Python. Look at the snippet below.
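
A minimal sketch of these definitions might look like the following. It is written for Python 3 and only mirrors the C declarations quoted above; on_synth is a hypothetical callback added just to show how a Python function gets wrapped, and it is not part of dhvani.

```python
from ctypes import CFUNCTYPE, POINTER, Structure, c_int, c_short

# dhvani_Languages: plain integers mirroring the C enum values.
HINDI, MALAYALAM, TAMIL, KANNADA, ORIYA, PANJABI = 1, 2, 3, 4, 5, 6
GUJARATI, TELUGU, BENGALI, MARATHI, PASHTO = 7, 8, 9, 10, 11

# Only the enum members quoted in this post are reproduced here.
DHVANI_OGG_FORMAT = 0   # dhvani_output_file_format
DHVANI_OK = 0           # dhvani_ERROR

# typedef int (t_dhvani_synth_callback) (int);
# CFUNCTYPE takes the return type first, then the argument types.
t_dhvani_synth_callback = CFUNCTYPE(c_int, c_int)

# typedef int (t_dhvani_audio_callback) (short *, int);
t_dhvani_audio_callback = CFUNCTYPE(c_int, POINTER(c_short), c_int)


class dhvani_VOICE(Structure):
    """Placeholder: the C structure is declared but never defined."""
    pass


# Hypothetical callback (not part of dhvani), wrapped so C code could call it.
def on_synth(event):
    return event + 1


synth_cb = t_dhvani_synth_callback(on_synth)
```

The wrapped callback can also be called from Python, which is a quick way to check that the signature is declared the way you expect.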

Since ctypes has no dedicated enum type, we just define constants with int values that match the enumeration in C. We use CFUNCTYPE from the ctypes library to define the callback types, which are nothing but function pointers in C. CFUNCTYPE lets us specify the return type of the function followed by the types of its arguments. Next is the dhvani_VOICE structure; as you can see, it's just a placeholder.
Now that all the required data structures are defined, it's time to define the main structure of dhvani, i.e. dhvani_options. Below is the snippet of code for defining dhvani_options in Python.
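
Here is a sketch of the structure that follows the C declaration field by field, written for Python 3. The callback types and the placeholder structure are repeated so that the snippet runs on its own. Two assumptions are baked in: the enum field is mapped to c_int (C compilers normally store such an enum as an int), and the two callback fields use the CFUNCTYPE types directly, because a CFUNCTYPE type already represents a pointer to a function.

```python
from ctypes import (CFUNCTYPE, POINTER, Structure, c_char_p, c_float,
                    c_int, c_short)

t_dhvani_synth_callback = CFUNCTYPE(c_int, c_int)
t_dhvani_audio_callback = CFUNCTYPE(c_int, POINTER(c_short), c_int)


class dhvani_VOICE(Structure):
    """Placeholder: the C structure is declared but never defined."""
    pass


class dhvani_options(Structure):
    # Order and types must match the C structure exactly.
    _fields_ = [
        ("voice", POINTER(dhvani_VOICE)),      # not used now
        ("pitch", c_float),
        ("tempo", c_float),
        ("rate", c_int),
        ("language", c_int),                   # dhvani_Languages (assumed int-sized)
        ("output_file_format", c_int),
        ("isPhonetic", c_int),
        ("speech_to_file", c_int),
        ("output_file_name", c_char_p),
        ("synth_callback_fn", t_dhvani_synth_callback),
        ("audio_callback_fn", t_dhvani_audio_callback),
    ]


# A fresh instance is zero-initialised; fields are set like attributes.
options = dhvani_options()
options.language = 2                     # MALAYALAM
options.output_file_name = b"out.ogg"    # char* fields take bytes in Python 3
```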

To define a structure we create a class that inherits from ctypes.Structure. All the fields of the structure are listed in the class attribute _fields_, which is a list of tuples. Each tuple contains a string with the name of the field in the C structure, followed by the field's type. The type worth noting here is POINTER, the ctypes function that builds a C pointer type. The rest of the code is self-explanatory.

With this we have all the required data structures defined in Python. The next step is to find the required functions and map them to equivalent Python function calls. Inspecting the dhvani API reference, we see that the functions dhvani_say and dhvani_speak_file are needed to generate audio from text. Below are the prototypes of these functions.

    dhvani_ERROR dhvani_say(char *, dhvani_options *);
    dhvani_ERROR dhvani_speak_file(FILE *, dhvani_options *);

dhvani_say is a simple function, but dhvani_speak_file expects a FILE* as its first argument! That is a problem, because Python has nothing equivalent to C's FILE structure. After searching for a while I finally found this answer. The trick is to open the file with Python's file object, use its fileno() method to get the integer file descriptor, and then call fdopen from the standard C library to get a FILE* for that descriptor. Since dhvani uses the standard C library, the dhvani shared library will contain a reference to fdopen, so the next step is to map fdopen to a Python function call. Below is the prototype of fdopen that I'll be using.

    FILE *fdopen(int fd, const char *mode);

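The fileno() and fdopen trick can be tried against the C library alone, without dhvani. This is a sketch for Python 3 that assumes a POSIX system, where CDLL(None) exposes the C library already loaded into the Python process; open_c_stream is a hypothetical helper name. It duplicates the descriptor with os.dup first, an extra step not mentioned above, so that closing the FILE* on the C side does not also close the Python file.

```python
import ctypes
import os
import tempfile

libc = ctypes.CDLL(None)  # symbols of the running process, incl. libc (POSIX)

# FILE *fdopen(int fd, const char *mode);
libc.fdopen.argtypes = [ctypes.c_int, ctypes.c_char_p]
libc.fdopen.restype = ctypes.c_void_p     # FILE* handled as an opaque pointer
libc.fgetc.argtypes = [ctypes.c_void_p]
libc.fgetc.restype = ctypes.c_int
libc.fclose.argtypes = [ctypes.c_void_p]
libc.fclose.restype = ctypes.c_int


def open_c_stream(py_file, mode=b"r"):
    """Return a FILE* (as an integer address) for an open Python file."""
    fd = os.dup(py_file.fileno())         # keep Python's own descriptor safe
    stream = libc.fdopen(fd, mode)
    if not stream:                        # NULL comes back as None
        os.close(fd)
        raise OSError("fdopen failed")
    return stream


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "wb") as out:
        out.write(b"dhvani")
    with open(path, "rb") as py_file:
        stream = open_c_stream(py_file)
        first_byte = libc.fgetc(stream)   # read through the C stream
        libc.fclose(stream)               # closes only the duplicated fd
```

Reading one byte back with fgetc shows that the C library accepts the pointer as a real FILE*.
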
And below is the snippet for mapping all 3 of these functions to Python.
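
A sketch of such a mapping, assuming Python 3. declare_prototypes is a hypothetical helper that takes an already loaded library handle, so the snippet itself does not need dhvani installed. The library file name in the closing comment is an assumption, and the empty dhvani_options class only stands in for the full definition.

```python
from ctypes import POINTER, Structure, c_char_p, c_int, c_void_p


class dhvani_options(Structure):
    """Stand-in; the real class needs the full _fields_ list."""
    pass


def declare_prototypes(lib):
    """Attach argument and return types to the three C functions on lib."""
    # dhvani_ERROR dhvani_say(char *, dhvani_options *);
    lib.dhvani_say.argtypes = [c_char_p, POINTER(dhvani_options)]
    lib.dhvani_say.restype = c_int        # dhvani_ERROR enum, treated as int

    # dhvani_ERROR dhvani_speak_file(FILE *, dhvani_options *);
    lib.dhvani_speak_file.argtypes = [c_void_p, POINTER(dhvani_options)]
    lib.dhvani_speak_file.restype = c_int

    # FILE *fdopen(int fd, const char *mode);
    lib.fdopen.argtypes = [c_int, c_char_p]
    lib.fdopen.restype = c_void_p         # without this ctypes assumes int
    return lib


# Typical use, left commented out because it needs dhvani installed
# (the library file name below is an assumption):
# from ctypes import CDLL
# dhvani = declare_prototypes(CDLL("libdhvani.so.0"))
```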

As you can see, wherever FILE* was used in the C functions I used c_void_p, i.e. a void pointer. This works because ctypes only needs to know that the value is a pointer: it passes the address through unchanged, and the pointer's type merely tells C what kind of data lives at that address. So here I'm telling the Python run-time that it is a void pointer, and the C function that expects a FILE* simply treats the address as one; no conversion happens at run time. Declaring the return type of fdopen as c_void_p also matters, because ctypes otherwise assumes a function returns a C int, which can truncate the pointer on 64-bit systems. (If you have a better explanation, please comment.)

So we have everything ready now, and we can use the dhvani API directly from our Python code to generate audio for any supported Indian language. Below is the code of the tts module of SILPA, which now uses the dhvani shared library and is out of its experimental status.

I won't explain the rest of the code here, as it's self-explanatory and my job was to explain the important part, i.e. coupling the dhvani library with Python. And that is done :). Please leave your suggestions in the comments :)

P.S.: The entire code may not be directly usable, as it includes code required to integrate it with SILPA.

-- Vasudev
Posted by: copyninja on Saturday, 26 March 2011
