Welcome to Disarray

Hello and welcome to my blog! As a postdoctoral researcher in machine learning and artificial intelligence, I’m here to share my journey, insights, and notes with you. I strive to be as clear and grounded as possible, and I hope you’ll find something interesting and valuable in my posts.

Semantics of LLM, Weight Tying, and a story.
This summer, I was lucky enough to have a paper accepted to ICML 2024 as a spotlight poster. I told this story on Reddit (you can see the original post here). However, now that I have a blog, I thought it would be nice to keep it here as well. Enough with the introduction: let's get to the story behind the paper, titled "By Tying Embeddings You Are Assuming the Distributional Hypothesis".
Date: 26 September, 2024 | Estimated Reading Time: ~11 min

Superposition, Phase Diagrams, and Regularization.
For anyone who has read through Toy Models of Superposition, this is mostly a rehash of the ideas presented in the third section of Elhage et al.'s work. However, to keep things new, I will introduce a regularization term to the loss function. This will allow us to explore the phase diagram in a novel scenario. The analysis suggests that regularization inhibits superposition.
Date: 19 September, 2024 | Estimated Reading Time: ~15 min

T-Free: Sparse Embedding Representations.
In this post, I will discuss T-Free, a recent paper from Aleph Alpha that introduces a new method for learning embedding representations that promises to be both memory-efficient and computationally efficient. Of course, this post is based on their original paper. However, I will share some of my own insights and thoughts on their work.
Date: 6 September, 2024 | Estimated Reading Time: ~15 min

Population Minimizer of The Categorical Cross Entropy Loss
In this post, I will explore how to find the population minimizer of the conditional risk associated with Categorical Cross Entropy. We will use only fundamental probability principles, so that the derivation is accessible to anyone with a basic understanding of probability. The only analytical tool needed is the method of Lagrange multipliers for constrained optimization problems.
Date: 29 August, 2024 | Estimated Reading Time: 10-30 min