Find the relevant ML papers you're missing.
Upload your draft or describe your research to surface the most relevant prior work across 106,800+ papers and see how each result compares.
Submit a paper or write a query: here's exactly what priorwork.fyi returns.
The dominant technical theme is the architectural evolution of sequence-to-sequence models for natural language processing, specifically overcoming the computational and representational limits of recurrent neural networks. The retrieved papers' shared focus on machine translation benchmarks (e.g., WMT'14) and on the complexity of sequence transduction likely explains why they were retrieved together.
Confidence: high
Modifications to self-attention to achieve linear time complexity or faster autoregressive decoding.
Papers: [R5] [R7] [R10]
Enhancements or alternatives to positional encoding and structural sequence representations.
Papers: [R2] [R8] [R9]
Predominantly empirical deep learning systems papers introducing novel architectural components, complemented by formal complexity/mathematical equivalence analyses in a few cases.
The submitted paper is the foundational baseline that serves as the direct anchor for the entire retrieved set. While the retrieved papers largely propose modifications to fix the submitted model's quadratic complexity, lack of recurrence, or static depth, the submitted paper itself defines the exact paradigm they seek to improve. It uniquely establishes the core multi-head self-attention mechanism that all subsequent papers in this cluster either optimize, hybridize, or replace.
This paper introduces an end-to-end neural phrase extraction and cross-attention mechanism to address long-distance dependency failures in standard self-attention. New work on modeling long sentences must compare against its +1.72 BLEU improvement on sequences exceeding 45 tokens.
Both papers target sequence transduction and machine translation on the WMT'14 benchmarks. This paper builds directly upon the submitted paper by adding explicit phrase pooling and cross-attention representations. Its length-based findings highlight a direct limitation in the submitted paper's standard self-attention mechanism, offering an explicit avenue for architectural improvement.
Ready to find prior work for your own research?