diff --git a/qualtran/bloqs/mod_arithmetic/qualtran_modular_arithmetics.ipynb b/qualtran/bloqs/mod_arithmetic/qualtran_modular_arithmetics.ipynb
new file mode 100644
index 0000000000..9933691522
--- /dev/null
+++ b/qualtran/bloqs/mod_arithmetic/qualtran_modular_arithmetics.ipynb
@@ -0,0 +1,392 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<h1 style=\"text-align: center;\">Quantum Modular Arithmetic in Qualtran</h1>\n",
+    "\n",
+    "<p style=\"text-align: center;\"><a href=\\\"mailto:noureldinyosri@gmail.com\\\">Noureldin Yosri</a></p>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "wMDM6DC6FJlO"
+   },
+   "source": [
+    "# Introduction\n",
+    "\n",
+    "Modular arithmetic is the backbone of number theortic and cryptographic algorithms such as factoring and elliptic curve cryptography. Qualtran provides a library of the state of the art constructions for peforming these operations on a fault tolerant quantum computer.\n",
+    "\n",
+    "This notebook shows the different operations, their constructions and computational costs. The following table lists the different operations, their costs and source of the construction.\n",
+    "\n",
+    "| operation                      | Bloq                                                                               | Qualtran Toffoli Cost     | Toffoli Cost From [Figure 8](https://arxiv.org/abs/2306.08585)   | references                                                                                                                                                                                                                                                                                                                                                                                                                  |\n",
+    "|:-------------------------------|:-----------------------------------------------------------------------------------|:--------------------------|:-----------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n",
+    "| modular addition               | `ModAdd`                                | $4 n - 1$                 | $4 n$                                                            | [How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates](https://arxiv.org/abs/2306.08585)                                                                                                                                                                                                                                                                                                  |\n",
+    "| modular controlled addition    | `CModAdd`                               | $5 n + 1$                 | $5 n$                                                            | [How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates](https://arxiv.org/abs/2306.08585)                                                                                                                                                                                                                                                                                                  |\n",
+    "| modular negation               | `ModNeg`                             | $3 n - 3$                 | $2 n$                                                            | [How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates](https://arxiv.org/abs/2306.08585)                                                                                                                                                                                                                                                                                                  |\n",
+    "| modular controlled negation    | `CModNeg`                            | $3 n - 2$                 | $3 n$                                                            | [How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates](https://arxiv.org/abs/2306.08585)                                                                                                                                                                                                                                                                                                  |\n",
+    "| modular subtraction            | `ModSub`                             | $6 n - 3$                 | $6 n$                                                            | [How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates](https://arxiv.org/abs/2306.08585)                                                                                                                                                                                                                                                                                                  |\n",
+    "| modular controlled subtraction | `CModSub`                            | $7 n - 1$                 | $7 n$                                                            | [How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates](https://arxiv.org/abs/2306.08585)                                                                                                                                                                                                                                                                                                  |\n",
+    "| modular doubling               | `ModDbl`                          | $2 n + 1$                 | $2 n$                                                            | [How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates](https://arxiv.org/abs/2306.08585)                                                                                                                                                                                                                                                                                                  |\n",
+    "| modular multiplication         | `DirtyOutOfPlaceMontgomeryModMul` | $2.25 n^{2} +$$ 7.25 n - 1$ | $2.25 n^{2} + 9 n$                                               | - [Performance Analysis of a Repetition Cat Code Architecture: Computing 256-bit Elliptic Curve Logarithm in 9 Hours with 126 133 Cat Qubits](https://arxiv.org/abs/2302.06639)</br>- [How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates](https://arxiv.org/abs/2306.08585)                                                                                                            |\n",
+    "| modular multiplicative inverse | `KaliskiModInverse`                     | $26 n^{2} + 9 n - 1$      | $26 n^{2} + 2 n$                                                 | - [Performance Analysis of a Repetition Cat Code Architecture: Computing 256-bit Elliptic Curve Logarithm in 9 Hours with 126 133 Cat Qubits](https://arxiv.org/abs/2302.06639)</br>- [Improved quantum circuits for elliptic curve discrete logarithms](https://arxiv.org/abs/2001.09580)</br>- [How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates](https://arxiv.org/abs/2306.08585) |\n",
+    "\n",
+    "\n",
+    "\n",
+    "\n",
+    "If you compare qualtran's costs with the costs from Figure 8 of [How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates](https://arxiv.org/abs/2306.08585) you will find that generally we match the coefficient of the leading term while having smaller coefficients for the lower terms and we also list the constants. The two exception to this are modular negation where we have a higher leading coefficient ($3$ vs $2$) and modular inversion where we match the leading coefficient but have a higher coefficient for the lower term ($9$ vs $2$).\n",
+    "\n",
+    "### Modular Negatition\n",
+    "The strucutre of the construction of this operation consists of two operations that cost $n$ toffolis between two n-bit toffolis resulting in $3n$ toffolis. The authors of [How to compute a 256-bit elliptic curve private key with only 50 million Toffoli gates](https://arxiv.org/abs/2306.08585) suggests using measurement based uncomputation (see. [Halving the cost of quantum addition](https://quantum-journal.org/papers/q-2018-06-18-74/#)) to turn the last n-bit toffoli into measurements + cliffords thus reducing the cost to $2n$. However, measurement based uncomputation doesn't strictly apply here since the control register is modified between the initial `And` computation and its adjoint/uncomputation, this leads to a mismatch between the value in the register and the value in the hidden ancillas with the overall effect being the introduction of random phase flips. That is the final state will have some coefficients multiplied by $-1$ randomly. We chose not to introduce this effect into our construction.\n",
+    "\n",
+    "\n",
+    "### Modular Inversion\n",
+    "Our cost for this operation $26n^2+9n-1$ which differs from their cost of $26n^2+2n$ by $7n-1$. The reason of this, is the constants that they omitted for the other operations. This operation consists of a loop that repeats the same circuit $2n$ times, thus the constants of that inner circuit contribute to the coefficient of $n$ since they get multiplied by $2n$. This bloq shows Qualtran's ability to compute accurate resource estimates."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "eR9WcJmytsr4"
+   },
+   "source": [
+    "## Assumptions\n",
+    "1. Certain operations (e.g. modular doubling) assume the modulus is odd. This is because most applications (e.g. factoring) have this assumption and because the construction becomes simpler.\n",
+    "1.  All operations assume their inputs are valid. For example modular inversion assumes that all the input states have an inverse and modular multiplication assumes that its inputs are $\\in [1, \\mod)$ that is they exclude $0$.\n",
+    "1. In literature modular operations are assumed to become identity when they apply to states outside their valid range. For example, the circuit applying modular negation modulu $m$ with $n$ bits is defined as\n",
+    "$$\n",
+    "\\mathbb{O} \\ket{x} →\n",
+    "\\begin{cases}\n",
+    "\\ket{- x \\mod m} & 0 \\leq x < m \\\\\n",
+    "\\ket{x} & p < x < 2^n\n",
+    "\\end{cases}\n",
+    "$$\n",
+    "This is done to ensure that the circuit is unitary. However, constructions always assume that their inputs are valid as explained above.\n",
+    "\n",
+    "\n",
+    "\n",
+    "## Correctness checks for constructions\n",
+    "Qualtran provides a suite of tools to check the correctness of bloq construction/decomposition. Out of them, we highlight the classical simulation check. This check is very important to arithmetic bloqs since it ensure that if the input is a single classical state then final state is the expected single classical state with one of the limitations of this test being that it can't detect problems with phases.\n",
+    "\n",
+    "Even with this limitation the classical simulation check was able to detect problems with constructions from the literature. For the example, the use of measurement based uncomputation to lower the cost of modular negation in [How to compute a 256-bit elliptic curve private key\n",
+    "with only 50 million Toffoli gates](https://arxiv.org/pdf/2306.08585) where it is not applicable.\n",
+    "\n",
+    "A second test is checking the matrix of the construction which we calculate using tensor network contraction. Since all modular arithmetic function are reversible functions, their matrices are permutation matrices. More preciesly the submatrix for the valid input range (e.g. $[0, \\mod) × [0, \\mod)$ for modular negation) should be a permutation matrix."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "metadata": {
+    "id": "eJ--ENSj4fZq"
+   },
+   "outputs": [],
+   "source": [
+    "import qualtran\n",
+    "import cirq\n",
+    "import sympy\n",
+    "import numpy as np\n",
+    "\n",
+    "from qualtran import QMontgomeryUInt\n",
+    "from qualtran.drawing import show_bloq, show_call_graph\n",
+    "import as qma\n",
+    "from qualtran.resource_counting import get_cost_value, QECGatesCost"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "5SjFPIudBVJr"
+   },
+   "source": [
+    "### Addition\n",
+    "Quantum modular addition performs the transformation $\\ket{x} \\ket{y} → \\ket{x} \\ket{(x+y) \\mod m}$\n",
+    "\n",
+    "In qualtran the bloq that represents this operation is `ModAdd`. The circuit construction for this operation is a translation of the classical program:\n",
+    "\n",
+    "```python\n",
+    "def add_mod(x, y, m):\n",
+    "    y += x  # normal in-place addition y := x + y.\n",
+    "    y -= m  # subtract constant y := x + y - m.\n",
+    "    c = y < 0  # c := (x + y - m < 0)\n",
+    "    if c:\n",
+    "      y += m  # y := x + y\n",
+    "    else:\n",
+    "      #  y := x + y - m  which is >= 0\n",
+    "    c ^= y > x  # Regardless of what happened in the condition above c := 1\n",
+    "    c ^= 1  # c := 0\n",
+    "    return x, y\n",
+    "```\n",
+    "\n",
+    "Which means we will do:\n",
+    "1. one addition\n",
+    "1. two addition with a constant\n",
+    "1. one bit flip\n",
+    "\n",
+    "Note that the check $y < 0$ is done by simply checking the overflow bit.\n",
+    "\n",
+    "Which can be seen the call graph below"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/",
+     "height": 563
+    },
+    "id": "2YIWlwoOBSZE",
+    "outputId": "0c797205-fa6c-480c-f1a7-2cdedfbb7ddb"
+   },
+   "outputs": [],
+   "source": [
+    "bitsize = 3\n",
+    "mod = 7\n",
+    "\n",
+    "add = qma.ModAdd(bitsize, mod)\n",
+    "show_bloq(add)\n",
+    "show_call_graph(add, max_depth=1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "pUDsoRpBQ4uC",
+    "outputId": "d5bf5307-b48b-444c-933c-dab2cc8dd46d"
+   },
+   "outputs": [],
+   "source": [
+    "# To check the correctness we first do a classical simulation check\n",
+    "for x in range(mod):\n",
+    "    for y in range(mod):\n",
+    "        assert add.call_classically(x=x, y=y) == (x, (x + y) % mod)\n",
+    "print('passed classical check :D')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "8O9RbxF2PLjd",
+    "outputId": "f09683aa-6536-461e-fec2-16273637c311"
+   },
+   "outputs": [],
+   "source": [
+    "# Secondly we check that the construction/decomposition also passes the classical check.\n",
+    "add_decom = add.decompose_bloq()\n",
+    "for x in range(mod):\n",
+    "    for y in range(mod):\n",
+    "        assert add_decom.call_classically(x=x, y=y) == (x, (x + y) % mod)\n",
+    "print('passed classical check :D')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "rV0e_lJfRqJW",
+    "outputId": "807768f6-45f9-4221-b3da-9c5b2ba09ad4"
+   },
+   "outputs": [],
+   "source": [
+    "# Thirdly we check that the unitary doesn't add any phases.\n",
+    "matrix = add.tensor_contract()\n",
+    "eps = 1e-6\n",
+    "is_zero = np.abs(matrix) < eps\n",
+    "is_one = np.abs(matrix - 1) < eps\n",
+    "assert np.all(is_zero | is_one)  # all entries are ~0 or ~1\n",
+    "print('passed phase check :D')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "EfseX_p6SMKD",
+    "outputId": "f2330d6b-99e9-42b0-a29f-ee8dcc3bbc1f"
+   },
+   "outputs": [],
+   "source": [
+    "# Finally we check the toffoli cost\n",
+    "# According to table 5 in https://arxiv.org/pdf/2306.08585 this toffoli cost should be 4n up to a constant.\n",
+    "symbolic_n, symbolic_mod = sympy.symbols('n m')\n",
+    "cost = get_cost_value(qma.ModAdd(symbolic_n, symbolic_mod), QECGatesCost()).total_toffoli_only()\n",
+    "print('toffoli cost:', cost)\n",
+    "const = cost.subs(symbolic_n, 0)\n",
+    "assert cost - const == 4 * symbolic_n\n",
+    "print('passed symbolic cost check :D')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "metadata": {
+    "id": "egBCKvtqSYh7"
+   },
+   "outputs": [],
+   "source": [
+    "# To help with other operations we create a test suite\n",
+    "\n",
+    "\n",
+    "def check_no_phase_change(blq, eps=1e-6):\n",
+    "    matrix = blq.tensor_contract()\n",
+    "    is_zero = np.abs(matrix) < eps\n",
+    "    is_one = np.abs(matrix - 1) < eps\n",
+    "    assert np.all(is_zero | is_one)  # all entries are ~0 or ~1\n",
+    "    print('passed phase check :D')\n",
+    "\n",
+    "\n",
+    "def check_symbolic_cost(blq, formula):\n",
+    "    cost = get_cost_value(blq, QECGatesCost()).total_toffoli_only()\n",
+    "    print(f'toffoli cost of {blq}:', cost)\n",
+    "    print(f'expect: {formula}')\n",
+    "    const = cost.subs(symbolic_n, 0)\n",
+    "    assert cost - const == formula\n",
+    "    print('passed symbolic cost check :D')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "gI5bdDX5WV3M"
+   },
+   "source": [
+    "## Controlled Modular Addition\n",
+    "\n",
+    "It's often useful to have special constructions of the controlled version of an operation. That is because controlling every suboperation of the original construction -while being correct- is expensive since it turns every CNOT into a toffoli and every toffoli ladder of size $n$ to a toffoli ladder of size $n+1$.\n",
+    "\n",
+    "For the case of controlled modular addition, it is enough to control the first addition and comparision.\n",
+    "\n",
+    "```python\n",
+    "def controlled_add_mod(ctr, x, y, m):\n",
+    "    if ctrl:\n",
+    "      y += x  # normal in-place addition.\n",
+    "    y -= m  # subtract constant.\n",
+    "    c = y < 0\n",
+    "    if c:\n",
+    "      y += m\n",
+    "    if ctrl:\n",
+    "      c ^= y > x  # after this line `c = 1`\n",
+    "    c ^= 1\n",
+    "    return x, y\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/",
+     "height": 635
+    },
+    "id": "Tyjk06Z0T0pU",
+    "outputId": "b4d9c42e-be86-4749-c31f-8a68c832e6b9"
+   },
+   "outputs": [],
+   "source": [
+    "cadd = qma.CModAdd(qualtran.QMontgomeryUInt(bitsize), mod)\n",
+    "show_bloq(cadd)\n",
+    "show_call_graph(cadd, max_depth=1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "ug1x6YyhaGnN",
+    "outputId": "6d350c03-9af0-4a66-eaec-8a8252803e4d"
+   },
+   "outputs": [],
+   "source": [
+    "for ctrl in range(2):\n",
+    "    for x in range(mod):\n",
+    "        for y in range(mod):\n",
+    "            want = (x + y) % mod if ctrl else y\n",
+    "            assert cadd.call_classically(ctrl=ctrl, x=x, y=y) == (ctrl, x, want)\n",
+    "print('cadd passed classical check :D')\n",
+    "cadd_decom = cadd.decompose_bloq()\n",
+    "for ctrl in range(2):\n",
+    "    for x in range(mod):\n",
+    "        for y in range(mod):\n",
+    "            want = (x + y) % mod if ctrl else y\n",
+    "            assert cadd_decom.call_classically(ctrl=ctrl, x=x, y=y) == (ctrl, x, want)\n",
+    "print('construction of cadd passed classical check :D')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "pPqkrKz-T_Nk",
+    "outputId": "d144d5f8-0ad3-4636-fd00-ec94c2b96404"
+   },
+   "outputs": [],
+   "source": [
+    "check_no_phase_change(cadd)\n",
+    "check_symbolic_cost(qma.CModAdd(QMontgomeryUInt(symbolic_n), symbolic_mod), 5 * symbolic_n)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "Khhh79zybVtG"
+   },
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "gniHK1YVWRkM"
+   },
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "hJa4jCVIbMIR"
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}