Tue 31 May 2011

Using the myGengo Translation API with Python

For those who haven't heard the news, Google has deprecated a slew of their APIs, leaving many developers and services in a bit of a pinch. While there's admittedly still time for developers to transition, now is a good time to start considering alternatives. In my opinion, it's best to choose an alternative that has the technology in question as a core competency; otherwise, you're much more liable to have your provider pull the rug out from underneath you.

With that said, many engineers are hit particularly hard by the deprecation of the Translation API that Google has so generously offered up to this point, and are looking for a solid alternative. While there are other machine translation APIs out there, I wanted to take a moment to show more developers how integrating with the myGengo API can get them the best of both worlds: free machine translation and paid human translation through a single API.

A Polite Heads Up

As of May 31, 2011, I am working with myGengo to further develop their translation services. However, this post represents my own thoughts and opinions, and in no way represents myGengo as a company. myGengo offers both free machine translation and paid human translation under one API. I simply want to show other developers that it is very easy to use.

Getting Started with the myGengo API

Before you can start getting things translated, you'll need a myGengo account, a pair of API keys (the public and private keys used in the code below), and the mygengo-python library installed. This takes all of 5 minutes to do. A full rundown is available, which includes details on using the API Sandbox for extensive testing. For the code below, we're going to work on the normal account.

A Basic Example

The myGengo API is pretty simple to use, but the authentication and signing can be annoying to do at first (as with many other APIs). To ease this, there are a few client libraries you can use - the one I advocate using (and to be fair, I also wrote it) is the mygengo-python library, which you just installed. With this it becomes incredibly easy to start making calls and submitting things for translation:

from mygengo import MyGengo

gengo = MyGengo(
    public_key = 'your_public_key',
    private_key = 'your_private_key',
    sandbox = False, # possibly True, depending on your dev needs
)

print(gengo.getAccountBalance()['response']['credits'])

The above script should print out your current account credits.
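
If you'd rather not index straight into the response, a small helper can check for a failed call first. This is only a sketch and not part of mygengo-python: `extract_credits` is a hypothetical name, and it assumes the response carries an `opstat` field set to 'ok' on success, next to the `response` dictionary used above. That field isn't shown in this post, so adjust the check to match what your responses actually contain.

```python
def extract_credits(balance_response):
    """Return the account credits from a getAccountBalance()-style response.

    Assumption: a successful response looks like
    {'opstat': 'ok', 'response': {'credits': ...}}. If 'opstat' is absent,
    the response is treated as successful.
    """
    if balance_response.get('opstat', 'ok') != 'ok':
        raise ValueError('myGengo call failed: %r' % (balance_response,))
    return balance_response['response']['credits']


# Usage with the client created above:
# credits = extract_credits(gengo.getAccountBalance())
```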

Actually Translating Text

Extending the above bit of code to actually translate some text is very simple - the thing to realize up front is that myGengo works on a system of tiers, with said tiers being machine, standard, pro, and ultra. These dictate the type of translation you'll get back. Machine translations are the fastest and free, but least accurate; the latter three are all tiers of human translation, and their rates vary accordingly (see the website for current rates).
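
When the tier comes from configuration or user input, it can help to validate it before submitting anything. The sketch below only encodes what's described above (four tier names, of which the last three are human translation); `validate_tier` and `is_human_tier` are hypothetical helper names, not part of the myGengo library.

```python
# The four tiers named above, in the order they are listed.
TIERS = ('machine', 'standard', 'pro', 'ultra')
HUMAN_TIERS = ('standard', 'pro', 'ultra')


def validate_tier(tier):
    """Return the tier unchanged, or raise ValueError if it isn't a known one."""
    if tier not in TIERS:
        raise ValueError('Unknown tier %r; expected one of: %s'
                         % (tier, ', '.join(TIERS)))
    return tier


def is_human_tier(tier):
    """True for the paid human tiers, False for free machine translation."""
    return validate_tier(tier) in HUMAN_TIERS
```

A check like `is_human_tier(tier)` is a handy place to hang a "this will cost credits" confirmation before a job goes out.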

For the example below, we're going to just use machine translation, since it's an effective 1:1 replacement for Google's APIs. A great feature of the myGengo API is that you can upgrade to a human translation whenever you want; while you're waiting for a human to translate your job, myGengo still returns the machine translation for any possible intermediary needs.

Note: It's your responsibility to determine what level you need - if you're translating something to be published in another country, for instance, human translation will inevitably work better since a native translator understands the cultural aspects that a machine won't.

# -*- coding: utf-8 -*-
from mygengo import MyGengo

gengo = MyGengo(
    public_key = 'your_mygengo_api_key',
    private_key = 'your_mygengo_private_key',
    sandbox = False, # possibly True, depending on your dev needs
)

translation = gengo.postTranslationJob(job = {
    'type': 'text', # REQUIRED. Type to translate, you'll probably always put 'text' here (for now ;)
    'slug': 'Translating English to Japanese with the myGengo API', # REQUIRED. For storing on the myGengo side
    'body_src': 'I love this music!', # REQUIRED. The text you're translating. ;P
    'lc_src': 'en', # REQUIRED. source_language_code (see getServiceLanguages() for a list of codes)  
    'lc_tgt': 'ja', # REQUIRED. target_language_code (see getServiceLanguages() for a list of codes)
    'tier': 'machine', # REQUIRED. tier type ("machine", "standard", "pro", or "ultra")
})

# This will print out 私はこの音楽が大好き!
print(translation['response']['job']['body_tgt'])

This really couldn't be more straightforward. We've just requested that our text be translated from English to Japanese by a machine, and gotten our results instantly. This is only the tip of the iceberg, too - if you have multiple things you need translated, you can actually bundle them all up and post them all at once (see this example in the mygengo-python repository).
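
If you only have a handful of strings, a plain loop over the single-job call shown above works too. To be clear, this is a sketch and not the bundled call from the repository example: `build_job` and `translate_all` are hypothetical helper names, the slug is an arbitrary prefix of the text, and each string costs one request.

```python
def build_job(text, lc_src, lc_tgt, tier='machine'):
    """Assemble a job dictionary using the fields from the example above."""
    return {
        'type': 'text',
        'slug': text[:50],  # arbitrary choice: reuse the start of the text
        'body_src': text,
        'lc_src': lc_src,
        'lc_tgt': lc_tgt,
        'tier': tier,
    }


def translate_all(client, texts, lc_src, lc_tgt, tier='machine'):
    """Submit each text as its own job; return the translations in order."""
    results = []
    for text in texts:
        reply = client.postTranslationJob(job=build_job(text, lc_src, lc_tgt, tier))
        results.append(reply['response']['job']['body_tgt'])
    return results


# Usage with the client created above:
# translations = translate_all(gengo, ['Hello!', 'Goodbye!'], 'en', 'ja')
```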

Taking it One Step Further!

Remember the "human translation is more accurate" point I noted above? Well, it hasn't changed in the last paragraph or two, so let's see how we could integrate this into a web application. The problem with human translation has historically been the human factor itself; it's slower because it has to pass through a person or two. myGengo has gone a long way in alleviating this pain point, and their API is no exception: you can register a callback URL, and the job will be POSTed to it once a human translator has completed it.

This adds another field or two to the translation API call above, but it's overall nothing too new:

# -*- coding: utf-8 -*-
from mygengo import MyGengo

gengo = MyGengo(
    public_key = '',
    private_key = '',
    sandbox = False, # possibly True, depending on your dev needs
)

translation = gengo.postTranslationJob(job = {
    'type': 'text', # REQUIRED. Type to translate, you'll probably always put 'text' here (for now ;)
    'slug': 'Translating English to Japanese with Python and myGengo API', # REQUIRED. Slug for internally storing, can be generic.
    'body_src': 'I love this music!', # REQUIRED. The text you're translating. ;P
    'lc_src': 'en', # REQUIRED. source_language_code (see getServiceLanguages() for a list of codes)  
    'lc_tgt': 'ja', # REQUIRED. target_language_code (see getServiceLanguages() for a list of codes)
    'tier': 'standard', # REQUIRED. tier type ("machine", "standard", "pro", or "ultra")
    # New pieces...
    'auto_approve': 0,
    'comment': 'This is an optional comment for a translator to see!',
    'callback_url': 'http://yoursite.com/your/callback/view'
})

# This will print out a machine translation (私はこの音楽が大好き!), and you can 
# set up a callback URL (see below) to get the translated text back when it's been
# completed by a human. You can alternatively poll in intervals to check.
print(translation['response']['job']['body_tgt'])

# Credit for the note about machine translation goes to https://github.com/aehlke, who
# pointed out where I forgot to note. ;)

All we've done here is change the tier to a human one (standard) and supply a callback URL to post the job to once it's completed. As you can see, the response from our submission includes a free machine translation to use in the interim, so you're not left completely high and dry. You can also specify a comment for the translator (e.g., if there's some context that should be taken into account).
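
If a callback URL isn't an option, polling at intervals can be sketched generically. This post doesn't show which client call retrieves a job or how a finished job is flagged, so both are left as callables you supply: `fetch_job` and `is_done` are hypothetical parameters, and `poll_until_done` is not part of mygengo-python.

```python
import time


def poll_until_done(fetch_job, is_done, interval=60, max_attempts=30,
                    sleep=time.sleep):
    """Call fetch_job() until is_done(job) is true or attempts run out.

    Returns the finished job, or None if it never completed in time.
    """
    for attempt in range(max_attempts):
        job = fetch_job()
        if is_done(job):
            return job
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval)
    return None
```

Human translations take a while, so a generous interval is kinder to the API than a tight loop. The `sleep` parameter is there so the waiting can be swapped out in tests.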

Now we need a view to handle the job being sent back to us when it's completed. Since this is a Python-focused article, we'll use Django as our framework of choice below, but this should be fairly portable to any framework in general. I leave the routing up to the reader, as it's largely basic Django knowledge anyway:

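import json

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt


@csrf_exempt  # myGengo can't send a CSRF token, so skip that check here
def update_job(request):
    """Handles parsing and storing a POSTed completed job from myGengo.
    """
    if request.method == "POST":
        # Load the POSTed object, it's JSON data. This assumes the JSON is the
        # raw request body; if it arrives as a form field, use request.POST.
        resp = json.loads(request.body)

        # Your translated text is now available in resp['body_tgt']
        # Save it, process it, whatever! ;D

        return HttpResponse(status=200)
    else:
        # Pass the code as status=...; HttpResponse(400) would send a 200.
        return HttpResponse(status=400)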

Now, wasn't that easy? Human translations with myGengo are pretty fast, and you get the machine translation for free - it makes for a very bulletproof approach if you decide to use it.
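
The parsing step inside the view can also be pulled out into a plain function, which makes it easy to test without spinning up Django. A minimal sketch, assuming the payload is a JSON object with a `body_tgt` key as described in the view's comments; `parse_completed_job` is a hypothetical name.

```python
import json


def parse_completed_job(raw_payload):
    """Decode a callback payload and return (job, translated_text).

    Assumption: raw_payload is a JSON object (text or UTF-8 bytes) that
    contains a 'body_tgt' key holding the finished translation.
    """
    if isinstance(raw_payload, bytes):
        raw_payload = raw_payload.decode('utf-8')
    job = json.loads(raw_payload)
    if 'body_tgt' not in job:
        raise ValueError("payload has no 'body_tgt' field")
    return job, job['body_tgt']


# Inside the view, the call would look something like:
# job, text = parse_completed_job(request.body)
```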

Room for Improvement?

mygengo-python is open source and fork-able over on GitHub. I'm the chief maintainer, and love seeing pull requests and ideas for new features. If you think something could be made better (or is lacking completely), don't hesitate to get in touch!
