What are the top open-source, fine-tunable, large-context encoder-decoder models available today?

Hello everyone,

I’m looking for recommendations on fine-tuning a model for translation tasks.

Although the dataset can be reduced to around 200KB of sequences, each input sequence pair can be quite large, up to 1MB. Even though the sequences are essentially computer code, a base model pretrained on natural language might still help by providing general knowledge that improves translation performance.

I also plan to train the same model architecture from scratch and compare its performance with the fine-tuned version.
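
For context, here is roughly how I intend to set up that comparison. This is only a sketch, assuming the Hugging Face transformers library and t5-base as a stand-in checkpoint (both are my assumptions, not a final choice):

```python
# Sketch of the planned comparison: fine-tuned vs. from-scratch,
# assuming the Hugging Face transformers library.
from transformers import AutoConfig, AutoModelForSeq2SeqLM

checkpoint = "t5-base"  # placeholder; any encoder-decoder checkpoint would do

# Variant A: start from pretrained weights and fine-tune on the translation data.
finetuned_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Variant B: same architecture, randomly initialised weights,
# trained from scratch on the same data for comparison.
config = AutoConfig.from_pretrained(checkpoint)
scratch_model = AutoModelForSeq2SeqLM.from_config(config)
```

Both variants would then go through the same training loop so the only difference is the initialisation.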

Here are the model requirements:

  • Open license for research (ideally also usable commercially, though that is not required)
  • Transformer-based, with separate encoder and decoder components
  • Able to handle a substantial context length, on the order of thousands of tokens
  • Ideally with ready-to-use inference support
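
To illustrate the kind of workflow I have in mind, here is a rough sketch using LongT5 as one possible candidate that seems to fit these requirements; the checkpoint name, maximum input length, and license details are assumptions I would still need to verify:

```python
# Quick check of a long-context encoder-decoder candidate,
# assuming the Hugging Face transformers library.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "google/long-t5-tglobal-base"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

# Encode a long code snippet and run a quick generation pass.
source = "def add(a, b):\n    return a + b\n" * 200
inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=8192)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```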

Papers with Code offers state-of-the-art models for various datasets, making it a useful starting point, though it’s not always perfect.

You can explore it here: Machine Translation | Papers With Code

Additionally, a Google search can uncover other potential baseline models for transpiling, including commercial options.


Are there some good tutorials on training (toy) translation models in PyTorch?


BigScience's T0pp is one of the larger ones at 11B parameters. Hugging Face has more details: bigscience/T0pp · Hugging Face
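
A minimal sketch of trying it out, assuming the transformers and accelerate libraries and enough memory for an 11B-parameter model (the dtype and device settings here are just assumptions):

```python
# Load T0pp (an 11B T5-style encoder-decoder) and run one generation.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "bigscience/T0pp"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # halves memory use vs. fp32
    device_map="auto",           # requires accelerate to be installed
)

prompt = "Translate the following Python to JavaScript:\ndef add(a, b):\n    return a + b"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```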


How much GPU memory have you got?