Hello everyone,
I’m looking for recommendations on fine-tuning a model for translation tasks.
The dataset can be reduced to around 200k sequences, but each input sequence pair can be quite large, up to 1MB. Even though these sequences are essentially computer code, a base model pretrained on natural language might still help by providing general knowledge that transfers to the task.
I also plan to train the same model architecture from scratch and compare its performance with the fine-tuned version.
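To make that comparison concrete, here is a rough sketch of how I intend to set up the two variants. It assumes the Hugging Face Transformers library and a placeholder encoder-decoder checkpoint; neither choice is settled yet.

```python
# Sketch of the planned comparison: same architecture, two initialisations.
from transformers import AutoConfig, AutoModelForSeq2SeqLM

checkpoint = "google/long-t5-tglobal-base"  # placeholder base model, not a final choice

# Fine-tuned variant: start from the pretrained natural-language weights.
finetuned_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# From-scratch variant: identical architecture and config, randomly initialised weights.
config = AutoConfig.from_pretrained(checkpoint)
scratch_model = AutoModelForSeq2SeqLM.from_config(config)
```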
Here are the model requirements:
- Open license for research use (ideally also permitting commercial use, though that is not required)
- Transformer-based with separate encoder and decoder components
- Ability to handle a substantial context length, on the order of thousands of tokens
- Ideally comes with existing inference tooling or support
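For reference, this is roughly the fine-tuning setup I have in mind for a model that meets those requirements. It assumes LongT5 (encoder-decoder, Apache-2.0, context in the thousands of tokens) and the Hugging Face `Seq2SeqTrainer`; the data file and the "source"/"target" field names are placeholders, so please read it as a sketch rather than a working recipe:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "google/long-t5-tglobal-base"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Placeholder: JSON lines file with "source" and "target" code fields.
dataset = load_dataset("json", data_files={"train": "pairs.jsonl"})

def preprocess(batch):
    # Long inputs are truncated to the chosen context window for now.
    model_inputs = tokenizer(batch["source"], max_length=8192, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=8192, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=["source", "target"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="code-translation-longt5",
        per_device_train_batch_size=1,    # pairs can be very large, so tiny batches
        gradient_accumulation_steps=16,
        learning_rate=1e-4,
        num_train_epochs=3,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

Any pointers on base models (or reasons to prefer a different architecture) would be much appreciated.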