In a previous post, we saw how tying embeddings can destabilize training when the data do not satisfy certain assumptions (see here). In this post, we explore a simple idea to get the best of both worlds: an early-training boost from tied embeddings and late-training stability from untied ones. This was a research idea I had in mind; it did not work as well as expected, so I decided to share it here.
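To make the idea concrete, here is a minimal, framework-free sketch (the class and method names are hypothetical, purely for illustration). Tying means the output head and the input embedding are the same matrix object; untying mid-training means copying the shared weights so the two can diverge afterwards. In a real framework like PyTorch, tying is usually done by assigning the same weight tensor to both modules, but the mechanism is the same.

```python
class TinyLM:
    """Toy model holding only the two matrices relevant to weight tying."""

    def __init__(self, vocab_size, dim):
        # Input embedding: one row of `dim` floats per token.
        self.embed = [[0.0] * dim for _ in range(vocab_size)]
        # Tied at initialization: the output head *is* the embedding matrix.
        self.head = self.embed

    def tied(self):
        # Tying is identity of the underlying object, not equality of values.
        return self.head is self.embed

    def untie(self):
        # Deep-copy the rows so the head and embedding train independently
        # from this point on (the "late training stability" phase).
        self.head = [row[:] for row in self.embed]


model = TinyLM(vocab_size=4, dim=2)
assert model.tied()       # early training: updates to one matrix affect both
model.untie()
assert not model.tied()   # late training: the two matrices are now independent
```

The switch from tied to untied is value-preserving at the moment of the copy: both matrices start the untied phase with identical weights, and only subsequent gradient updates let them drift apart.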