Every day, we create 2.5 quintillion bytes of data. This massive generation of info is a new thing: most of the data in the world today has been generated in only the last two years.
Experts predict 40 zettabytes of data by 2020. This data shows up in a bunch of different formats: text, sensor data, audio, video, click streams, log files and more. Further, it comes from anywhere and everywhere: tweets, barcode systems, credit card transactions, online purchases, social interaction data (e.g., Facebook), and browsing statistics. Lastly, big data shows up in all the world’s languages. It’s enough to make your head spin.
IBM has four key concepts that alliteratively demonstrate the features and realities of big data:
- Volume: lots of it
- Velocity: data comes and goes, and changes constantly
- Variety: different forms
- Veracity: is it reliable? Can you make better decisions when using it?
Use Technology or Die
The volume of data would overwhelm anyone. After taking a deep breath, most executives call in their more analytical people to figure out how to use this data to advance their business. These businesses recognize the need to crawl and scrape all potential sources to aggregate the data, clean it, convert it (e.g., from audio to text), analyze it, rank it (what’s useful, what’s not), consolidate it, put it into a pivot table, and then share it. But how?
Obviously, you can’t do this manually, so companies turn to big data analytics platforms to help. These platforms offer state-of-the-art technologies that use advanced analytics to control the flood of information by sorting and analyzing online content. However, no tool can replace human analysis and judgment. A recent article in Wired magazine states that more than 60 percent of companies need employees to develop new skills to gain real insights from big data.
Google “big data analysis” to see a long list of companies helping enterprises wrestle down this content and extract value from it.
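To make the clean, rank and consolidate steps described above a little more concrete, here is a minimal sketch in Python. It is only an illustration: the `Record` type, the function names and the sample data are all made up for this example, and a real analytics platform would do far more (crawling, audio conversion, proper analysis).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    source: str  # hypothetical source label, e.g. "tweet" or "review"
    text: str


def clean(records):
    """Normalize whitespace and drop empty or duplicate records."""
    seen = set()
    cleaned = []
    for r in records:
        text = " ".join(r.text.split())
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(Record(r.source, text))
    return cleaned


def rank(records, keywords):
    """Order records by how often they mention the keywords, highest first."""
    def score(r):
        words = r.text.lower().split()
        return sum(words.count(k) for k in keywords)
    return sorted(records, key=score, reverse=True)


def consolidate(records):
    """Count records per source, a simple stand-in for a pivot table."""
    return Counter(r.source for r in records)


# Made-up sample data for illustration only.
raw = [
    Record("tweet", "Love the new  phone"),
    Record("tweet", "love the new phone"),
    Record("log", ""),
    Record("review", "The phone battery is poor, phone returned"),
]
cleaned = clean(raw)
ranked = rank(cleaned, keywords=["phone"])
summary = consolidate(cleaned)
```

Even in this toy version, the human judgment shows up in the choice of keywords: someone still has to decide what counts as useful.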
Innovate and Dominate
Why would companies bother to do all this work? It certainly takes a significant investment of both cash and resources.
Many of our clients need to use big data to better understand their market. From there, they can figure out how to provide customers worthwhile, relevant and consistent experiences. The result? Brand loyalty increases and so do sales.
With big data, businesses could do things such as:
- predict customer behavior
- analyze sentiment
- improve products
- create personalized shopping experiences
- advance decision making
- improve productivity and efficiency
- detect new market trends and changes in demand
- increase margin
- reduce expenditure
- catch errors
A McKinsey article from earlier this year elaborates on the things that can be achieved.
Did you assume all this data was only in English? More likely, you didn’t think about the language of the data at all.
According to TAUS, emerging markets (BRIC) produce 36% of the world’s data, and that share is set to increase to 62% by 2020. Big data comes from these geographies: US (32%), Western Europe (19%), China (13%), India (4%), and the rest of the world (32%). This is a universe of data, and a Tower of Babel at the same time. Each enterprise needs this content in one language, whatever that language may be. This “Big Language” question is critical in figuring out how the power of Big Data can be harnessed globally.
It can’t possibly all be translated. Human translators could provide the translated equivalent of one drop in the ocean. Besides that, the volume and the number of language pairs would break any translation budget. Machine Translation (MT) can translate a much higher volume of this deluge of content. But you can’t copy/paste bits of content into MT; there must be a connector to the sources, with an automatic push/pull. Only when companies combine cloud-based big data and translation applications with intelligent human users can they begin to harness this data to their advantage.
This question, translating big data so it is useful to enterprises everywhere, is so important that TAUS is tackling it at its Dublin Industry Leaders Forum 2013. Some of the questions they hope to explore include:
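As a rough picture of what such a connector does, the sketch below pulls items from several sources and pushes anything not already in the target language through a translation function. Everything here is an assumption made for illustration: `fake_mt` is a hypothetical stand-in for a real MT engine, and the feeds are invented sample data.

```python
def fake_mt(text, source_lang, target_lang):
    """Hypothetical stand-in for a call to a real MT engine."""
    return f"[{source_lang}->{target_lang}] {text}"


def run_connector(sources, translate, target_lang):
    """Pull (language, text) items from each source and push
    anything not already in the target language through MT."""
    results = []
    for source in sources:
        for lang, text in source:
            if lang == target_lang:
                results.append(text)
            else:
                results.append(translate(text, lang, target_lang))
    return results


# Invented sample feeds; a real connector would read from live sources.
feeds = [
    [("en", "Great service"), ("pt", "Ótimo produto")],
    [("zh", "很好")],
]
translated = run_connector(feeds, fake_mt, target_lang="en")
```

Passing the translation function in as a parameter means the same connector could sit in front of different MT engines, or route selected content to human translators instead.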
- How will this data help advance the translation industry?
- Can MT reliably be used to translate this data?
- Can all this translated data help to train engines?
- Where do professional linguists fit in?
- How is quality controlled? Is quality important?
- How will data be aggregated and pulled from the myriad of sources?
- How will all that content be converted so that it is all in a translatable format?
- What about the crowd / cloud (non-professional bilinguals translating content, à la Twitter and Wikipedia)?
The translation industry is going to play a crucial and strategic role here. It will help drive answers to these questions, and help clients resolve questions around how to manage multilingual big data. Small data is gone, as are small solutions. Data is going to get bigger and bigger and bigger, and companies must find ways to use and control it in order to compete, innovate, and get ahead. Technology, clearly, is the key. And this will happen in dozens of world languages.
If you have any thoughts, questions or answers on Big Data and Big Language, start a discussion below. And see you in Dublin in early June!