<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://rifaki.me/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>weekly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/portrait.jpg</image:loc>
      <image:title>Mouhssine Rifaki</image:title>
      <image:caption>Reinforcement learning researcher and PhD student at Imperial College London. Bachelor's in mathematics from the Sorbonne; master's from the MVA program at ENS Paris-Saclay.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://rifaki.me/publications/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://rifaki.me/projects/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://rifaki.me/posts/adversarial-examples/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/adversarial-fgsm.png</image:loc>
      <image:title>Adversarial examples and the robustness story that survived</image:title>
      <image:caption>Adversarial examples were not just a security bug. They exposed a mismatch between predictive features and human-aligned robust features.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/diffusion-models/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/diffusion-forward-reverse.png</image:loc>
      <image:title>Score matching, diffusion models, and why noise became a generator</image:title>
      <image:caption>Diffusion models work because denoising estimates the score of noisy data distributions, turning generation into reverse-time dynamics.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/double-descent/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/belkin-fig1.png</image:loc>
      <image:title>Double descent: the U that became a W and what held up at scale</image:title>
      <image:caption>A reading of what double descent actually is, what the label-noise caveat does to the story, and what held up at scale.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/edge-of-stability/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/cohen-fig1.png</image:loc>
      <image:title>The edge of stability and what it does to the flat-minima story</image:title>
      <image:caption>A reading of Cohen's edge-of-stability result, Arora's theory, and what the phenomenon does to older narratives about SGD and flat minima.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/flat-minima/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/keskar-fig1.png</image:loc>
      <image:title>On flat minima and the argument that won't die</image:title>
      <image:caption>Why the flat minima debate keeps restarting.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/grokking/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/grokking-fig1.png</image:loc>
      <image:title>Grokking: a puzzle and four explanations that might be the same thing</image:title>
      <image:caption>A reading of the grokking phenomenon in deep learning.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/implicit-bias/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/implicit-bias-margin.png</image:loc>
      <image:title>Implicit bias in gradient descent and the max-margin ghost</image:title>
      <image:caption>Why unregularized gradient descent can still select a particular classifier, and why the clean linear result only partially transfers to deep networks.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/information-bottleneck/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/tishby-fig2.png</image:loc>
      <image:title>Reading the information bottleneck paper with the benefit of hindsight</image:title>
      <image:caption>A reading of Tishby's information bottleneck paper after the pushback.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/lottery-tickets/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/frankle-fig3.png</image:loc>
      <image:title>The lottery ticket hypothesis: what it claimed and what remained after replication</image:title>
      <image:caption>A reading of the original claim, the rewinding fix, and what remained after the replication critique.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/mode-connectivity/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/mode-connectivity-paths.png</image:loc>
      <image:title>Mode connectivity and the case for one wide basin</image:title>
      <image:caption>Low-loss paths, permutation symmetries, linear mode connectivity, and what the single-basin picture does and does not prove.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/neural-scaling-laws/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/kaplan-fig1.png</image:loc>
      <image:title>Scaling laws from Kaplan to Chinchilla to broken laws and what still holds</image:title>
      <image:caption>A reading of the scaling laws arc, what each paper claimed, and what still holds at current scale.</image:caption>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/posts/neural-tangent-kernel/</loc>
    <lastmod>2026-04-26</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/posts/img/ntk-linearization.png</image:loc>
      <image:title>The neural tangent kernel and the cost of freezing features</image:title>
      <image:caption>The NTK made neural-network training analyzable by freezing the features. That was the insight, and also the limitation.</image:caption>
    </image:image>
  </url>
</urlset>
