cjpnudt's personal blog http://blog.sciencenet.cn/u/cjpnudt


[Reading Papers]---084 Sequence to Sequence Learning with Neural Networks

Viewed 3341 times | 2016-8-12 18:02 | Category: Research Notes

Sequence to Sequence Learning with Neural Networks


Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
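The architecture the abstract describes reduces to two LSTMs: an encoder that compresses the (reversed) source sentence into its final hidden and cell states, and a decoder initialized with those states that emits the target sequence. Below is a minimal sketch of that idea in PyTorch; the layer sizes, vocabulary sizes, and toy batch are illustrative assumptions, not the paper's configuration (the paper used 4-layer LSTMs with 1000 cells per layer and much larger vocabularies).

```python
# Minimal encoder-decoder sketch of the seq2seq idea (illustrative
# hyperparameters, not the paper's actual setup).
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # Encoder LSTM: its final hidden/cell states act as the
        # fixed-dimensional summary vector of the source sentence.
        self.encoder = nn.LSTM(emb_dim, hid_dim, num_layers=layers, batch_first=True)
        # Decoder LSTM: initialized with the encoder's summary,
        # predicts the target sequence one token at a time.
        self.decoder = nn.LSTM(emb_dim, hid_dim, num_layers=layers, batch_first=True)
        self.proj = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt_in):
        # Reverse the source sequence: the paper reports this markedly
        # improves optimization by creating short-term dependencies
        # between the first source words and the first target words.
        src_rev = torch.flip(src, dims=[1])
        _, (h, c) = self.encoder(self.src_emb(src_rev))
        out, _ = self.decoder(self.tgt_emb(tgt_in), (h, c))
        return self.proj(out)  # logits over the target vocabulary

# Toy usage: 3 source sentences of length 7, teacher-forced target
# prefixes of length 5; real training would add padding and masking.
model = Seq2Seq(src_vocab=10000, tgt_vocab=10000)
src = torch.randint(0, 10000, (3, 7))
tgt_in = torch.randint(0, 10000, (3, 5))
logits = model(src, tgt_in)
print(logits.shape)  # torch.Size([3, 5, 10000])
```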

https://blog.sciencenet.cn/blog-656867-996093.html

Previous: [Reading Papers]---083 Neural Machine Translation by Jointly Learning to Align and Translate
Next: [Reading Papers]---085 Distributed Representations of Words and Phrases and their Compositionality
