<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://rifaki.me/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>weekly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/portrait.jpg</image:loc>
      <image:caption>Reinforcement learning researcher. Visiting researcher at the NYU Tandon EMERGE Lab. Incoming PhD candidate at Imperial College London. Bachelor's in mathematics from the Sorbonne; master's from the MVA program at ENS Paris-Saclay.</image:caption>
      <image:title>Mouhssine Rifaki</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://rifaki.me/publications/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://rifaki.me/projects/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>monthly</changefreq>
  </url>
  <url>
    <loc>https://rifaki.me/notes/adversarial-examples/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/adversarial-fgsm.png</image:loc>
      <image:caption>FGSM, PGD, Madry's saddle-point formulation, the robust-features view, and the robustness-accuracy trade-off.</image:caption>
      <image:title>Adversarial examples</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/diffusion-models/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/diffusion-forward-reverse.png</image:loc>
      <image:caption>Denoising score matching, the SDE view, Karras's design-space disentanglement, and where diffusion sits relative to flow matching and consistency models.</image:caption>
      <image:title>Score matching and diffusion</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/double-descent/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/belkin-fig1.png</image:loc>
      <image:caption>Belkin's bias-variance picture, Nakkiran's three axes, the label-noise caveat, and what the phenomenon does and does not say at scale.</image:caption>
      <image:title>Double descent</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/edge-of-stability/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/cohen-fig1.png</image:loc>
      <image:caption>Cohen's edge-of-stability finding, Arora's analysis, and what it does to the older flat-minima story.</image:caption>
      <image:title>The edge of stability</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/flat-minima/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/keskar-fig1.png</image:loc>
      <image:caption>Hochreiter and Schmidhuber, Keskar's small-batch result, Dinh's reparameterization objection, SAM, and where the flatness-generalization debate currently sits.</image:caption>
      <image:title>On flat minima</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/grokking/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/grokking-fig1.png</image:loc>
      <image:caption>Power's modular-arithmetic finding, Nanda's circuit-level analysis, and the four candidate explanations that may all be the same mechanism.</image:caption>
      <image:title>Four explanations for grokking</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/implicit-bias/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/implicit-bias-margin.png</image:loc>
      <image:caption>Soudry's max-margin result for linear models, the geometry of optimizer choice, and how cleanly the linear case does and does not transfer to deep networks.</image:caption>
      <image:title>The implicit-bias program</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/information-bottleneck/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/tishby-fig2.png</image:loc>
      <image:caption>The original Tishby claim, Saxe's reply on activation choice, Goldfeld's estimator critique, and what survives.</image:caption>
      <image:title>Reading Tishby's information bottleneck</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/lottery-tickets/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/frankle-fig3.png</image:loc>
      <image:caption>Frankle and Carbin's original procedure, Liu's rebuttal, the rewinding fix, and what holds up after the exchange.</image:caption>
      <image:title>Lottery ticket hypothesis</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/mode-connectivity/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/mode-connectivity-paths.png</image:loc>
      <image:caption>Garipov and Draxler's curved low-loss paths, the permutation turn, linear mode connectivity, and what the geometry does and does not prove.</image:caption>
      <image:title>Mode connectivity</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/neural-scaling-laws/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/kaplan-fig1.png</image:loc>
      <image:caption>Kaplan's original fit, Chinchilla's correction, Caballero's broken-power-law alternative, and which of these predictions the labs actually rely on.</image:caption>
      <image:title>Kaplan, Chinchilla, and broken laws</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://rifaki.me/notes/neural-tangent-kernel/</loc>
    <lastmod>2026-04-28</lastmod>
    <changefreq>yearly</changefreq>
    <image:image>
      <image:loc>https://rifaki.me/notes/img/ntk-linearization.png</image:loc>
      <image:caption>Jacot's infinite-width limit, the lazy-training regime, what the NTK explains, and what its lack of feature learning leaves on the table.</image:caption>
      <image:title>The neural tangent kernel</image:title>
    </image:image>
  </url>
</urlset>
