Hello everyone,
I’m looking for recommendations on fine-tuning a model for translation tasks.
The dataset can be reduced to around 200k sequences, but each input sequence pair can be quite large, up to 1MB. Even though these sequences are essentially computer code, a base model pretrained on natural language might still help by providing general knowledge that transfers to the task.
I also plan to train the same model architecture from scratch and compare its performance with the fine-tuned version.
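To make that comparison concrete, here is a rough sketch of how I intend to set up the two variants. It assumes the Hugging Face Transformers library and a placeholder encoder-decoder checkpoint; neither choice is settled yet.

```python
# Sketch of the planned comparison: same architecture, two initialisations.
from transformers import AutoConfig, AutoModelForSeq2SeqLM

checkpoint = "google/long-t5-tglobal-base"  # placeholder base model, not a final choice

# Fine-tuned variant: start from the pretrained natural-language weights.
finetuned_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# From-scratch variant: identical architecture and config, randomly initialised weights.
config = AutoConfig.from_pretrained(checkpoint)
scratch_model = AutoModelForSeq2SeqLM.from_config(config)
```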
Here are the model requirements:
- Open license for research use (ideally also permitting commercial use, though that is not required)
- Transformer-based with separate encoder and decoder components
- Ability to handle a substantial context length, on the order of thousands of tokens
- Ideally comes with existing inference tooling or support
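For reference, this is roughly the fine-tuning setup I have in mind for a model that meets those requirements. It assumes LongT5 (encoder-decoder, Apache-2.0, context in the thousands of tokens) and the Hugging Face `Seq2SeqTrainer`; the data file and the "source"/"target" field names are placeholders, so please read it as a sketch rather than a working recipe:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "google/long-t5-tglobal-base"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Placeholder: JSON lines file with "source" and "target" code fields.
dataset = load_dataset("json", data_files={"train": "pairs.jsonl"})

def preprocess(batch):
    # Long inputs are truncated to the chosen context window for now.
    model_inputs = tokenizer(batch["source"], max_length=8192, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=8192, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=["source", "target"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="code-translation-longt5",
        per_device_train_batch_size=1,    # pairs can be very large, so tiny batches
        gradient_accumulation_steps=16,
        learning_rate=1e-4,
        num_train_epochs=3,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

Any pointers on base models (or reasons to prefer a different architecture) would be much appreciated.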