We present ToddlerBERTa, a BabyBERTa-like language model, and explore its capabilities by training five models with different hyperparameters. We evaluate ToddlerBERTa on BLiMP, SuperGLUE, MSGS, and the Supplement benchmark from the BabyLM challenge.
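For illustration, a sweep over model sizes of this kind could be set up with the HuggingFace transformers library as in the minimal sketch below. The specific hidden sizes, layer counts, vocabulary size, and sequence length are illustrative placeholders, not the hyperparameters reported in this work.

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Hypothetical hyperparameter grid for five model sizes; the actual
# ToddlerBERTa settings are reported in the paper, these values are
# placeholders for illustration only.
MODEL_SIZES = {
    "toddlerberta-xs": dict(hidden_size=64,  num_hidden_layers=2,  num_attention_heads=2),
    "toddlerberta-s":  dict(hidden_size=128, num_hidden_layers=4,  num_attention_heads=4),
    "toddlerberta-m":  dict(hidden_size=256, num_hidden_layers=8,  num_attention_heads=8),
    "toddlerberta-l":  dict(hidden_size=512, num_hidden_layers=10, num_attention_heads=8),
    "toddlerberta-xl": dict(hidden_size=768, num_hidden_layers=12, num_attention_heads=12),
}

def build_model(name: str) -> RobertaForMaskedLM:
    """Instantiate a BabyBERTa-style RoBERTa masked LM for one grid entry."""
    size = MODEL_SIZES[name]
    config = RobertaConfig(
        vocab_size=8192,              # small BPE vocabulary (assumed)
        max_position_embeddings=130,  # short, single-sentence inputs (128 tokens + offset)
        intermediate_size=4 * size["hidden_size"],
        **size,
    )
    return RobertaForMaskedLM(config)

for name in MODEL_SIZES:
    model = build_model(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```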

Our findings show that smaller models can excel in specific tasks, while larger models perform better with substantial training data. Despite being trained on a smaller dataset, ToddlerBERTa demonstrates commendable performance, rivaling the state-of-the-art RoBERTa-base. It exhibits robust language understanding even with single-sentence pretraining, and competes with baselines that leverage broader contextual information.

Our work provides insights into the impact of hyperparameter choices and data utilization on language model performance, contributing to the advancement of compact language models and supporting informed decisions in their development.