Welcome to Disarray

Hello and welcome to my blog! As a postdoctoral researcher in machine learning and artificial intelligence, I’m here to share my journey, insights, and notes with you. I strive to be as clear and grounded as possible, and I hope you’ll find something interesting and valuable in my posts.

Mid-Training Untying: A Fix That Barely Fixes Anything.
In a previous post, we saw how tying embeddings can destabilize training when the data do not satisfy certain assumptions (see here). In this post, we explore a simple idea to get the best of both worlds: an early-training boost from tied embeddings and late-training stability from untied ones. This was a research idea I had in mind; it did not work as well as expected, so I decided to share it here, along with a minimal sketch of the switch below.
Date: 06 March, 2025 | Estimated Reading Time: ~5 min
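Since the entry above describes the idea only at a high level, here is a minimal PyTorch sketch of what switching from tied to untied embeddings mid-training could look like. The TinyLM module, the untie_at step count, and the dummy objective are illustrative assumptions on my part, not the setup from the post.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy model that starts with tied input/output embeddings."""

    def __init__(self, vocab_size: int = 1000, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.backbone = nn.Linear(d_model, d_model)  # stand-in for a real backbone
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        self.head.weight = self.embed.weight  # tie: head shares the embedding matrix

    def untie(self) -> None:
        """Swap the shared matrix for an independent copy of its current values."""
        self.head.weight = nn.Parameter(self.embed.weight.detach().clone())

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(self.embed(tokens)))

model = TinyLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
untie_at = 500  # hypothetical switch point, in optimizer steps

for step in range(1000):
    if step == untie_at:
        model.untie()
        # the freshly created head parameter must be registered with the optimizer
        optimizer.add_param_group({"params": [model.head.weight]})
    tokens = torch.randint(0, 1000, (8, 16))  # random batch as a dummy objective
    logits = model(tokens)
    loss = nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), tokens.view(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The detach-and-clone keeps the logits unchanged at the moment of untying, so only the gradient updates diverge afterwards.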

Chaos Theory with Differential Topology (Part II).
This is the second part of our series on chaos theory. In this post, we will introduce the concepts of differentiable manifolds and tangent spaces and use them to study the differentiation of maps between manifolds. Finally, we will bring everything together with a simple example. The central definition is previewed just below.
Date: 29 December, 2024 | Estimated Reading Time: ~20 min
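As a taste of that central definition, here is the differential (pushforward) of a smooth map between manifolds, stated in the chart-free form I assume here, with tangent vectors viewed as derivations:

```latex
% differential of a smooth map f : M \to N at a point p \in M
df_p : T_p M \to T_{f(p)} N,
\qquad
df_p(v)(g) = v(g \circ f)
\quad \text{for } v \in T_p M,\ g \in C^{\infty}(N).
```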

Chaos Theory with Differential Topology (Part I).
A brief introduction to chaos theory from a differential topology perspective, from a guy who is studying these topics for fun. I will share some of the things I learned from Alligood et al. (1998). Here we cover the basics of differential topology and the fundamental definitions of dynamical systems; one of those definitions is previewed below.
Date: 15 December, 2024 | Estimated Reading Time: ~20-30 min
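To fix ideas before diving in, here is one of the basic definitions the post builds on: a discrete-time dynamical system as an iterated map, in notation close to Alligood et al.'s; the choice of symbols is mine.

```latex
% a discrete-time dynamical system and the forward orbit of a point x_0
f : X \to X,
\qquad
\mathcal{O}^{+}(x_0) = \{\, x_0,\ f(x_0),\ f^{2}(x_0),\ \dots \,\},
\quad f^{n} = \underbrace{f \circ \cdots \circ f}_{n \text{ times}}.
```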

Semantics of LLM, Weight Tying, and a story.
This summer, I was lucky enough to be accepted to ICML 2024 as a spotlight poster. I told this story on Reddit (you can see the original post here). However, now that I have a blog, I thought it would be nice to keep it here as well. Well, enough with the introduction; let's get to the story behind the paper titled By Tying Embeddings You Are Assuming the Distributional Hypothesis.
Date: 26 September, 2024 | Estimated Reading Time: ~11 min

Superposition, Phase Diagrams, and Regularization.
This one is for anyone who has read through the toy models of superposition; it is mostly a rehash of the ideas presented in the third section of Elhage et al.'s work. To keep things new, however, I will introduce a regularization term to the loss function, which lets us explore the phase diagram in a novel scenario. The analysis suggests that regularization inhibits superposition. A minimal sketch of the regularized objective follows below.
Date: 19 September, 2024 | Estimated Reading Time: ~15 min
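For readers who want the shape of the experiment, below is a minimal sketch of the toy reconstruction loss from Elhage et al. with a penalty bolted on. The L2 form of the penalty, the sparsity level, and the strength lam are placeholder choices of mine; the post works with its own regularization term.

```python
import torch

# Toy model of superposition: n_features compressed into m_hidden < n_features
# dimensions and reconstructed as ReLU(W^T W x + b), following Elhage et al.
n_features, m_hidden = 20, 5
W = torch.randn(m_hidden, n_features, requires_grad=True)
b = torch.zeros(n_features, requires_grad=True)
importance = 0.9 ** torch.arange(n_features).float()  # decaying importances
lam = 1e-2  # regularization strength -- a placeholder value

optimizer = torch.optim.Adam([W, b], lr=1e-2)
for step in range(5_000):
    # sparse features in [0, 1]: each coordinate is active with probability 0.05
    active = (torch.rand(1024, n_features) < 0.05).float()
    x = torch.rand(1024, n_features) * active
    x_hat = torch.relu(x @ W.T @ W + b)
    recon = (importance * (x - x_hat) ** 2).sum(dim=-1).mean()
    reg = lam * (W ** 2).sum()  # stand-in L2 penalty on the weights
    optimizer.zero_grad()
    (recon + reg).backward()
    optimizer.step()
```

Sweeping lam against the sparsity level would trace out a phase diagram of the kind the post explores.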

T-Free: Sparse Embedding Representations.
In this post, I will discuss T-Free, a recent paper from Aleph Alpha that introduces a new method for learning embedding representations, one that promises to be both memory- and compute-efficient. This post is, of course, based on their original paper, but I will share some of my own insights and thoughts on their work. A toy sketch of the core idea follows below.
Date: 06 September, 2024 | Estimated Reading Time: ~15 min
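To give a flavor of what sparse embedding representations mean here, below is a toy sketch of hashing a word's character trigrams into a few rows of a shared embedding table and summing them. The hash function, table size, and padding scheme are my simplifications for illustration, not the paper's exact construction.

```python
import hashlib

import torch

TABLE_ROWS = 8192  # rows in the shared embedding table -- an illustrative choice
D_MODEL = 64

table = torch.nn.Embedding(TABLE_ROWS, D_MODEL)

def trigram_rows(word: str) -> list[int]:
    """Hash each character trigram of the whitespace-padded word to a row index."""
    padded = f" {word} "
    trigrams = [padded[i:i + 3] for i in range(len(padded) - 2)]
    return [int(hashlib.md5(t.encode()).hexdigest(), 16) % TABLE_ROWS
            for t in trigrams]

def embed_word(word: str) -> torch.Tensor:
    """Sum the rows activated by the word's trigrams (a sparse multi-hot pattern)."""
    rows = torch.tensor(trigram_rows(word))
    return table(rows).sum(dim=0)

vec = embed_word("hello")  # one d_model vector built from a few active rows
```

Because every word maps to a handful of shared rows, the table can stay small regardless of vocabulary size, which is where the promised memory savings come from.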

Population Minimizer of The Categorical Cross Entropy Loss.
In this post, I will explore how to find the population minimizer of the conditional risk associated with the categorical cross-entropy loss. We use only fundamental probability principles, so the derivation stays accessible to anyone with a basic understanding of probability. The only analytical tool needed is the method of Lagrange multipliers for constrained optimization; the key step is sketched below.
Date: 29 August, 2024 | Estimated Reading Time: ~10-30 min
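As a preview of the punchline: over the probability simplex, the conditional cross-entropy risk is minimized by the true conditional class probabilities. In the notation I assume here (p_k for the true conditionals, q_k for the prediction), the Lagrange-multiplier step reads:

```latex
\begin{align}
  % conditional risk of a prediction q, with p_k = P(Y = k \mid X = x)
  R(q \mid x) &= -\sum_{k=1}^{K} p_k \log q_k ,
    \qquad \text{s.t. } \sum_{k=1}^{K} q_k = 1, \\
  % Lagrangian and its stationarity condition in each coordinate
  \mathcal{L}(q, \lambda) &= -\sum_{k=1}^{K} p_k \log q_k
    + \lambda \Big( \sum_{k=1}^{K} q_k - 1 \Big), \\
  \frac{\partial \mathcal{L}}{\partial q_k} &= -\frac{p_k}{q_k} + \lambda = 0
    \;\Longrightarrow\; q_k = \frac{p_k}{\lambda}, \\
  % the constraint forces \lambda = \sum_k p_k = 1, hence
  q_k^{\ast} &= p_k .
\end{align}
```

That is, the population minimizer of the categorical cross-entropy is the conditional distribution itself; the post walks through this step by step.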