Research.

"I will now proceed to decode."

The last mile in machine translation.


At Language Weaver, we have a long and storied history of research and development in the field of Natural Language Processing. Our multi-faceted, multi-national team conducts state-of-the-art research with the short-term aim of advancing the science and the longer-term goal of introducing this work into the tools and technologies that help our customers better understand their content and create new content more effectively.

Some of the areas in which our team is actively carrying out research and development include:

  • Neural Machine Translation 
  • MT Quality Estimation 
  • Multilingual Summarization 
  • Named Entity Recognition 
  • Sentiment Analysis 
  • Text Generation 
  • Text Simplification and Paraphrasing 
  • Question Answering 
  • Topic and Style Analysis 

We regularly attend and speak at conferences, and publish our work at well-known venues such as NAACL, (E)ACL, EMNLP, MT Summit, and others. You can see some of our selected publications below.

Life at Language Weaver

The best aspect of working at Language Weaver is that it’s never dull! Our team is never stuck working on the same task or researching the same topic for long, because we are always working with new clients on new data, interesting languages, and wide-ranging domains and applications.

There is always the chance to refine and broaden skillsets, trying out new techniques to solve real-world problems for customers who are processing and translating billions of words each year. Because our team comes from such a broad range of backgrounds, we learn a lot from each other too.

With bases in Los Angeles, Cluj-Napoca, Dublin, and other locations in Europe, our scientists, engineers, and linguists form a dynamic, energetic team with a strong grounding in NLP and a willingness to broaden horizons. Between us, we speak almost as many languages as our MT engines can translate!

In addition to our day-to-day work, we also have a weekly reading group where we present our own research and other leading papers in the field. On top of that, we publish a weekly blog, “The Neural MT Weekly”, read by thousands of readers each week!

Interested in joining our team? Contact us!

Publications 

Selected Publications:    

2021:    

Roemmele, M., Sidhpura, D., DeNeefe, S. and Tsou, L. (2021). AnswerQuest: A System for Generating Question-Answer Items from Multi-Paragraph Documents. 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021), Demo Track.

2020:    

Saunders, D., Feely, W. and Byrne, B. (2020). Inference-only sub-character decomposition improves translation of unseen logographic characters. Proceedings of the 7th Workshop on Asian Translation.

2019:    

Feely, W., Hasler, E. and de Gispert, A. (2019). Controlling Japanese Honorifics in English-to-Japanese Neural Machine Translation. Proceedings of the 6th Workshop on Asian Translation.

Saunders, D., Stahlberg, F., de Gispert, A. and Byrne, B. (2019). Domain Adaptive Inference for Neural Machine Translation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL).    

Roemmele, M. (2019). Identifying Sensible Lexical Relations in Generated Stories. Workshop on Narrative Understanding at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2019).

2018:    

Iglesias, G., Tambellini, W., de Gispert, A., Hasler, E. and Byrne, B. (2018). Accelerating NMT Batched Beam Decoding with LMBR Posteriors for Deployment. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT).    

Hasler, E., de Gispert, A., Iglesias, G. and Byrne, B. (2018). Neural Machine Translation Decoding with Terminology Constraints. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT).

Saunders, D., Stahlberg, F., de Gispert, A. and Byrne, B. (2018). Multi-representation Ensembles and Delayed SGD Updates Improve Syntax-based NMT. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL).

Stahlberg, F., de Gispert, A. and Byrne, B. (2018). The University of Cambridge's Machine Translation Systems for WMT18. Proceedings of the Conference of Machine Translation (WMT).    

2017:    

Hasler, E., de Gispert, A., Stahlberg, F., Waite, A. and Byrne, B. (2017). Source sentence simplification for statistical machine translation. Computer Speech & Language, vol. 45, pp. 221-235.

Stahlberg, F., de Gispert, A., Hasler, E. and Byrne, B. (2017). Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL).    

Hasler, E., Stahlberg, F., Tomalin, M., de Gispert, A. and Byrne, B. (2017). A Comparison of Neural Models for Word Ordering. International Conference on Natural Language Generation (INLG).

2015  

Gispert, A., Iglesias, G., & Byrne, W. (2015) Fast and Accurate Preordering for SMT using Neural Networks, North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Dreyer, M., & Graehl, J. (2015) hyp: A Toolkit for Representing, Manipulating, and Optimizing Hypergraphs, North American Chapter of the Association for Computational Linguistics: Human Language Technologies    

Dreyer, M., & Dong, D. (2015) APRO: All-Pairs Ranking Optimization for MT Tuning, North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014    

May, J., Benjira, Y., & Echihabi, A. (2014) An Arabizi-English Social Media Statistical Machine Translation System, Association for Machine Translation in the Americas

Jehl, L., Gispert, A., Hopkins, M., & Byrne, M. (2014) Source-side preordering for translation using logistic regression and depth-first branch-and-bound search, European Chapter of the Association for Computational Linguistics, (pp 239-248).

2013    

Hopkins, M., & May, J. (2013) Models of Translation Competitions. Proceedings of ACL, 2013.  

Munteanu, D. S., & Marcu, D. (2013) Exploiting Comparable Corpora. In Building and Using Comparable Corpora, Springer Publications.    

2012  

Soricut, R., Bach, N., & Wang, Z. (2012) The SDL Language Weaver Systems in the WMT12 Quality Estimation Shared Task, In Proceedings of the Seventh Workshop on Statistical Machine Translation (WMT 2012), June 2012, Montreal, Quebec, Canada.

Dreyer, M. & Marcu, D. (2012) HyTER: Meaning-Equivalent Semantics for Translation Evaluation, In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montreal, Canada.    

2011    

Hopkins, M., & May, J. (2011) Tuning as Ranking. Proceedings of EMNLP, 2011.    

Hopkins, M., Langmead, G., & Vo, T. (2011) Extraction Programs: A Unified Approach to Translation Rule Extraction. Proceedings of WMT, 2011.

2010    

Soricut, R., & Echihabi, A. (2010) TrustRank: Inducing Trust in Automatic Translations via Ranking, Association for Computational Linguistics Conference, (pp 612-621).    

Hopkins, M., & Langmead, G. (2010) SCFG Decoding Without Binarization. Proceedings of EMNLP, 2010.    

Wang, W., May, J., Knight, K., & Marcu, D. (2010) Re-Structuring, Re-Labeling, and Re-Aligning for Syntax-based Machine Translation, Computational Linguistics. (36.2).    

2009    

Hopkins, M., & Langmead, G. (2009) Cube Pruning as Heuristic Search. Proceedings of EMNLP, 2009.    

Yamada, K., & Muslea, I. (2009). Re-ranking for large-scale statistical machine translation, Learning Machine Translation, (pp 151-169)    

2007   

Wang, W., Knight, K., & Marcu, D. (2007) Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy. Proceedings of EMNLP-07, pp. 746-754, Prague.

2006    

Marcu, D., Wang, W., Echihabi, A., & Knight, K. (2006) SPMT: Statistical Machine Translation with Syntactified Target Language Phrases, Empirical Methods in Natural Language Conference, (pp 44-52).

Huang, B., & Knight, K. (2006). Relabeling Syntax Trees to Improve Syntax-Based Machine Translation Quality. Proceedings of HLT-NAACL, 2006.