arxiv:1911.05507

Compressive Transformers for Long-Range Sequence Modelling

Published on Nov 13, 2019

Abstract

The Compressive Transformer, an attentive sequence model with memory compression, achieves top language modeling results and can be applied to speech and reinforcement learning tasks.

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
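
The idea of compressing past memories rather than discarding them can be illustrated with a minimal sketch. It assumes a fixed-size FIFO memory whose overflow is mean-pooled at a fixed rate into a second, compressed memory. Mean pooling is only one simple choice of compression function, and the names `update_memories`, `mean_pool`, `mem_size`, `comp_size`, and `rate` are hypothetical, not taken from the paper's code.

```python
from typing import List, Tuple

Vector = List[float]


def mean_pool(block: List[Vector]) -> Vector:
    """Average a block of vectors element-wise into a single vector."""
    n = len(block)
    return [sum(column) / n for column in zip(*block)]


def update_memories(
    memory: List[Vector],
    compressed: List[Vector],
    new_states: List[Vector],
    mem_size: int,
    comp_size: int,
    rate: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Append new states to a FIFO memory and compress whatever overflows.

    Illustrative assumption: compression is mean pooling over groups of
    `rate` consecutive evicted vectors.
    """
    memory = memory + new_states
    # The oldest entries beyond the memory capacity are evicted.
    overflow = max(0, len(memory) - mem_size)
    evicted, memory = memory[:overflow], memory[overflow:]
    # Pool each group of `rate` evicted vectors into one compressed vector.
    # A trailing group shorter than `rate` is averaged over its own length.
    pooled = [
        mean_pool(evicted[i:i + rate]) for i in range(0, len(evicted), rate)
    ]
    compressed = compressed + pooled
    # The compressed memory is itself bounded; drop its oldest entries.
    compressed = compressed[max(0, len(compressed) - comp_size):]
    return memory, compressed
```

With `rate = 2`, every two evicted vectors become one compressed vector, so under these assumptions the compressed memory covers twice as many past time steps per slot as the uncompressed memory.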
