Model Merging, Mixtures of Experts, and Towards Smaller LLMs

Load Balancing (Auxiliary Loss): an auxiliary loss term added during training to prevent the router from collapsing onto the same few experts
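One widely used form of this auxiliary loss comes from the Switch Transformer: it multiplies, per expert, the fraction of tokens routed to that expert by the mean routing probability assigned to it, so the loss is minimized when tokens are spread evenly. A minimal NumPy sketch (function name and shapes are illustrative, not from the original post):

```python
import numpy as np

def load_balancing_loss(router_logits, num_experts):
    """Switch Transformer style auxiliary load-balancing loss.

    router_logits: array of shape (num_tokens, num_experts) with raw gate scores.
    Returns N * sum_i f_i * P_i, where f_i is the fraction of tokens whose
    top-1 expert is i and P_i is the mean routing probability for expert i.
    The value is ~1.0 for perfectly balanced routing and approaches N
    when all tokens collapse onto a single expert.
    """
    # Numerically stable softmax over the expert dimension
    z = router_logits - router_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # f_i: fraction of tokens dispatched (top-1) to each expert
    top1 = probs.argmax(axis=-1)
    f = np.bincount(top1, minlength=num_experts) / len(top1)
    # P_i: mean routing probability mass on each expert
    P = probs.mean(axis=0)
    return num_experts * float(np.sum(f * P))
```

Because the loss grows as routing concentrates, adding it (scaled by a small coefficient) to the main training objective pushes the gate toward using all experts.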