Travel portal Expedia recently released a tool to help its clients make sense of User-Generated Content (UGC). After all, what’s a hotel manager to do when 80 hotel guests say the free breakfast is “fine,” nine say it’s “awful,” one says it’s “awesome,” and 14 say it’s “something else”?
Expedia’s tool is sensible for sentiment analysis, but here’s the catch: It can only process English-language reviews, at least for now. We don’t know for sure if Expedia will introduce a multilingual version of this tool further down the road, but if it doesn’t—or until it does—here’s what its users will miss.
棒极了, Horrível, and ಪರವಾಗಿಲ್ಲ
(The above phrases mean “it’s awesome,” “it’s awful,” and “middling” in Chinese, Brazilian Portuguese, and Kannada, respectively.)
We frequently talk about the growth of local-language content as more non-English-speaking populations come online around the world. What goes unmentioned is that these users are not only consuming online content in their languages, they’re also creating it. Though there’s no exact statistic on the volume of user-generated content in languages other than English, ask anyone in the tourism or ecommerce sectors about it, and they’ll tell you it’s growing fast.
By tapping into English-language reviews only, you are listening to just half the story. The other half is very much worth your while.
Why is multilingual UGC important?
To understand this, we must first realize that the dollars, renminbi, rupees, and riyals spent today on global travel and ecommerce (the two sectors most sensitive to UGC) are not originating only in Anglophone countries. The speakers of non-English languages are not only asserting their presence online, but also traveling more and buying more online. Without a doubt, these are important constituencies.
Next, if you don’t understand these individuals’ feedback, there’s no way you can respond to their complaints. You may be completely missing out on what your Chinese guests didn’t like about your rooms or breakfast. Sometimes the problems may be specific to an individual, but what if they’re specific to a culture? Then you risk losing an entire community of users.
What can you do about it?
Sentiment analysis of multilingual reviews can be done through two methods: crowdsourcing and Machine Translation (MT). While crowdsourcing may be best when you need to gauge feedback on a newly launched product or service, MT is a more sustainable approach because of the volumes of content involved and the speed required for analysis. It is also the more sensible tactic when you want not only to analyze the sentiment of UGC but also to make that content available to your customers in real time.
Ideally, this shouldn’t be done with free MT, even if you’re just trying to get a rough sense of the content. Use MT engines trained for your domain to get the best results. Raw output from trained engines can be published directly, though you can always add post-editing if you want better output. Just make it clear to your customers how you obtained the translations. That way, you set the right expectations, and customers will know that the translation is meant to convey only the rough meaning.
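As a rough illustration of the translate-then-analyze workflow described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_translate` and its three-entry glossary stand in for a domain-trained MT engine, the `LEXICON` is a hypothetical word list rather than a real sentiment model, and the `machine_translated` flag is one possible way to record how a translation was obtained so it can be labeled for customers.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Toy stand-in for a domain-trained MT engine (assumption: a real system
# would call an actual engine here; this glossary is purely illustrative).
TOY_GLOSSARY = {
    "棒极了": "awesome",
    "Horrível": "awful",
    "ಪರವಾಗಿಲ್ಲ": "middling",
}

# Hypothetical sentiment lexicon applied to the English text.
LEXICON = {"awesome": "positive", "middling": "neutral", "awful": "negative"}


@dataclass
class AnalyzedReview:
    original: str
    english: str
    machine_translated: bool  # lets the site label MT output for customers
    sentiment: str


def toy_translate(text: str) -> Tuple[str, bool]:
    """Return (english_text, was_machine_translated)."""
    if text in TOY_GLOSSARY:
        return TOY_GLOSSARY[text], True
    return text, False  # assume anything else is already English


def analyze(
    reviews: Iterable[str],
    translate: Callable[[str], Tuple[str, bool]] = toy_translate,
) -> List[AnalyzedReview]:
    """Translate each review, then look up its sentiment in the lexicon."""
    results = []
    for text in reviews:
        english, was_translated = translate(text)
        sentiment = LEXICON.get(english.lower(), "unknown")
        results.append(AnalyzedReview(text, english, was_translated, sentiment))
    return results


def tally(results: Iterable[AnalyzedReview]) -> Counter:
    """Count reviews per sentiment label."""
    return Counter(r.sentiment for r in results)


if __name__ == "__main__":
    sample = ["棒极了", "Horrível", "ಪರವಾಗಿಲ್ಲ", "awesome", "something else"]
    print(tally(analyze(sample)))
```

Because the translator is passed in as a function, the toy glossary could be swapped for a real engine without touching the tallying logic. Reviews that the lexicon cannot classify are counted as "unknown" rather than guessed at.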
Travel and ecommerce companies would do well to invest early in translation because their businesses tend to go global quickly. User feedback is a gold mine that these companies may be sitting on. When understood and acted on, it helps improve the product and make it more viral. It is also a content asset that is created almost for free. Solution providers for travel and ecommerce must liaise with the language services industry to help their clients take full advantage of this asset.