diff --git a/assign2.ipynb b/assign2.ipynb new file mode 100644 index 0000000..cff42c6 --- /dev/null +++ b/assign2.ipynb @@ -0,0 +1,373 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "ef047b81-6907-4758-a295-7a61093769e1", + "metadata": {}, + "source": [ + "\n", + "## Q1.1\n", + "Given :\n", + "\n", + "- The received value $ x $ follows a normal distribution: $ x \\sim N(\\theta, 4) $\n", + "- The prior distribution of $\\theta$ is: $ \\theta \\sim N(5, 9) $\n", + "- We receive $ x = 6 $\n", + "\n", + "We need to find the posterior distribution $ p(\\theta \\mid x) $.\n", + "\n", + "The general formula for Bayesian updating with normal distributions is:\n", + "\n", + "\n", + "$\\mu_{\\text{post}} = \\frac{\\sigma_x^2 \\mu_0 + \\sigma_0^2 x}{\\sigma_x^2 + \\sigma_0^2}$\n", + "\n", + "$\\sigma_{\\text{post}}^2 = \\frac{\\sigma_x^2 \\sigma_0^2}{\\sigma_x^2 + \\sigma_0^2}$\n", + "Where:\n", + " $ \\mu_0 $ and $ \\sigma_0^2 $ are the mean and variance of the prior distribution of $\\theta$\n", + " $ x $ and $ \\sigma_x^2 $ are the observed value and the variance of the likelihood distribution\n", + "\n", + "- Prior mean $ \\mu_0 = 5 $\n", + "- Prior variance $ \\sigma_0^2 = 9 $\n", + "- Likelihood variance $ \\sigma_x^2 = 4 $\n", + "- Observed value $ x = 6 $\n", + "\n", + "Posterior Variance\n", + "\n", + "$$\n", + "\\sigma_{\\text{post}}^2 = \\frac{\\sigma_x^2 \\sigma_0^2}{\\sigma_x^2 + \\sigma_0^2} = \\frac{4 \\cdot 9}{4 + 9} = \\frac{36}{13}\n", + "$$\n", + "\n", + "Posterior Mean\n", + "\n", + "$$\n", + "\\mu_{\\text{post}} = \\frac{\\sigma_x^2 \\mu_0 + \\sigma_0^2 x}{\\sigma_x^2 + \\sigma_0^2} = \\frac{4 \\cdot 5 + 9 \\cdot 6}{4 + 9} = \\frac{20 + 54}{13} = \\frac{74}{13}\n", + "$$\n", + "\n", + "Result\n", + "\n", + "The posterior distribution $ p(\\theta \\mid x) $ is:\n", + "\n", + "$\n", + "\\theta \\mid x \\sim N\\left(\\frac{74}{13}, \\frac{36}{13}\\right)\n", + "$\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": 
"936c0e40-8365-4275-b4f1-1a7aa717ed9e", + "metadata": {}, + "source": [ + "## Q 1.4" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1ac147a2-fc0e-49d6-992b-02ef5951133f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Posterior Mean for Randall Vard: 87.93650793650794\n", + "Posterior Mean for Mary1: 130.15873015873015\n" + ] + } + ], + "source": [ + "# Given values\n", + "mu_0 = 100 # prior mean\n", + "variance_0 = 152 # prior variance\n", + "variance_1 = 100 # variance of likelihood\n", + "observed_scores = {\n", + " \"Randall Vard\": 80,\n", + " \"Mary1\": 150\n", + "}\n", + "\n", + "# Function to calculate the posterior mean\n", + "def calculate_posterior_mean(x, mu_0, variance_0, variance_1):\n", + " return (variance_0 * x + variance_1 * mu_0) / (variance_0 + variance_1)\n", + "\n", + "# Calculate posterior means\n", + "posterior_means = {name: calculate_posterior_mean(score, mu_0, variance_0, variance_1) for name, score in observed_scores.items()}\n", + "\n", + "# Print results\n", + "for name, mean in posterior_means.items():\n", + " print(f\"Posterior Mean for {name}: {mean}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "0cdc1e1c-9ef2-446d-8bb5-a107d2ac1ddb", + "metadata": {}, + "source": [ + "## Q3" + ] + }, + { + "cell_type": "markdown", + "id": "35bfc948-4d21-4cd0-9d6c-058dd1fc060d", + "metadata": {}, + "source": [ + "\n", + "\n", + "In logistic regression for binary classification, the likelihood function for the dataset \\((X, y)\\) given the parameters \\(\\beta\\) is defined as:\n", + "\n", + "$$\n", + "P(y | X, \\beta) = \\prod_{i=1}^n P(y_i | X_i, \\beta)\n", + "$$\n", + "\n", + "where $P(y_i | X_i, \\beta)$ is given by the logistic function:\n", + "\n", + "$$\n", + "P(y_i = 1 | X_i, \\beta) = \\sigma(X_i \\beta) = \\frac{1}{1 + \\exp(-X_i \\beta)}\n", + "$$\n", + "\n", + "and\n", + "\n", + "$$\n", + "P(y_i = 0 | X_i, \\beta) = 1 - \\sigma(X_i \\beta) = 
\\frac{\\exp(-X_i \\beta)}{1 + \\exp(-X_i \\beta)}\n", + "$$\n", + "\n", + "\n", + "\n", + "Assume a Gaussian prior on the parameters $\\beta$:\n", + "\n", + "$$\n", + "P(\\beta) = \\mathcal{N}(\\beta | 0, \\sigma^2 I)\n", + "$$\n", + "\n", + "where $0$ is the mean vector and $\\sigma^2 I$ is the covariance matrix with $I$ being the identity matrix.\n", + "\n", + "\n", + "using Bayes’ theorem, the posterior distribution is proportional to the product of the likelihood and the prior:\n", + "\n", + "$$\n", + "P(\\beta | X, y) \\propto P(y | X, \\beta) P(\\beta)\n", + "$$\n", + "\n", + "Taking the logarithm to get the log-posterior (which simplifies the product to a sum), we have:\n", + "\n", + "$$\n", + "\\log P(\\beta | X, y) = \\log P(y | X, \\beta) + \\log P(\\beta) + \\text{constant}\n", + "$$\n", + "\n", + "The log-likelihood $\\log P(y | X, \\beta)$ is:\n", + "\n", + "$$\n", + "\\log P(y | X, \\beta) = \\sum_{i=1}^n \\left[ y_i \\log \\sigma(X_i \\beta) + (1 - y_i) \\log (1 - \\sigma(X_i \\beta)) \\right]\n", + "$$\n", + "\n", + "The log-prior $\\log P(\\beta)$ is:\n", + "\n", + "$$\n", + "\\log P(\\beta) = -\\frac{1}{2\\sigma^2} \\beta^T \\beta + \\text{constant}\n", + "$$\n", + "\n", + "\n", + "\n", + "The MAP estimate maximizes the log-posterior:\n", + "\n", + "$$\n", + "\\hat{\\beta}_{MAP} = \\arg\\max_{\\beta} \\left( \\log P(y | X, \\beta) + \\log P(\\beta) \\right)\n", + "$$\n", + "\n", + "This is equivalent to minimizing the negative log-posterior:\n", + "\n", + "$$\n", + "\\hat{\\beta}_{MAP} = \\arg\\min_{\\beta} \\left( -\\log P(y | X, \\beta) - \\log P(\\beta) \\right)\n", + "$$\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "11e17932-30f3-4c89-95a9-f03aed5a9f14", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "VC Dimension for Axis-Aligned Rectangle in 2D:\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "VC Dimension for Linear Function in d Dimensions:\n", + "Dimension 1: VC Dimension = 2\n", + "Dimension 2: VC Dimension = 3\n", + "Dimension 3: VC Dimension = 4\n", + "Dimension 4: VC Dimension = 5\n", + "Dimension 5: VC Dimension = 6\n", + "\n", + "VC Dimension for Constant Function:\n", + "VC Dimension = 0\n" + ] + } + ], + "source": [ + "# Import necessary libraries\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from itertools import combinations\n", + "\n", + "\n", + "def plot_rectangle(points, labels, rect=None, title=\"\"):\n", + " fig, ax = plt.subplots()\n", + " for i, (point, label) in enumerate(zip(points, labels)):\n", + " ax.scatter(*point, color='red' if label else 'blue')\n", + " ax.annotate(f'{i}', (point[0] + 0.05, point[1] + 0.05), fontsize=12)\n", + " if rect:\n", + " (xmin, xmax, ymin, ymax) = rect\n", + " ax.add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, fill=None, edgecolor='black', linewidth=2))\n", + " ax.set_title(title)\n", + " plt.xlim(-1, 5)\n", + " plt.ylim(-1, 5)\n", + " plt.show()\n", + "\n", + "# Example points\n", + "points = np.array([(1, 1), (2, 2), (3, 3), (4, 4)])\n", + "\n", + "# All possible labelings for 2 points\n", + "labelings_2 = list(combinations([0, 1], 2))\n", + "print(\"VC Dimension for Axis-Aligned Rectangle in 2D:\")\n", + "\n", + "# Shattering check for 2 points\n", + "for labels in labelings_2:\n", + " plot_rectangle(points[:2], labels, title=f\"Labels: {labels}\")\n", + "\n", + "# All possible labelings for 4 points\n", + "labelings_4 = list(combinations([0, 1], 4))\n", + "\n", + "# Shattering check for 4 points (only showing a subset for clarity)\n", + "for i, labels in enumerate(labelings_4[:3]):\n", + " plot_rectangle(points, labels, rect=(1, 3.5, 1, 3.5), title=f\"Labels: {labels}\")\n", + "\n", + "# VC Dimension for Linear 
Function in d Dimensions\n", + "def linear_vc_dimension(d):\n", + " return d + 1\n", + "\n", + "# Example dimensions\n", + "dimensions = [1, 2, 3, 4, 5]\n", + "\n", + "print(\"\\nVC Dimension for Linear Function in d Dimensions:\")\n", + "for d in dimensions:\n", + " print(f\"Dimension {d}: VC Dimension = {linear_vc_dimension(d)}\")\n", + "\n", + "# VC Dimension for Constant Function\n", + "def constant_vc_dimension():\n", + " return 0\n", + "\n", + "print(\"\\nVC Dimension for Constant Function:\")\n", + "print(f\"VC Dimension = {constant_vc_dimension()}\")\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "6256bef9-276d-429f-8314-d53fb14067b3", + "metadata": {}, + "source": [ + "## Q2\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "2aa9121f-d873-472e-9b1e-6a8338d91ae6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True Mean: 5, MLE Mean: 5.0386641116446516\n", + "True Variance: 4, MLE Variance: 3.831619958926069\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Generate random dataset\n", + "np.random.seed(42) \n", + "mu_true = 5\n", + "sigma_true = 2\n", + "data = np.random.normal(mu_true, sigma_true, 1000)\n", + "mu_mle = np.mean(data)\n", + "sigma_mle = np.var(data)\n", + "\n", + "print(f\"True Mean: {mu_true}, MLE Mean: {mu_mle}\")\n", + "print(f\"True Variance: {sigma_true**2}, MLE Variance: {sigma_mle}\")\n", + "\n", + "# Plot histogram of the data and the estimated Gaussian\n", + "plt.hist(data, bins=100, density=True)\n", + "xmin, xmax = plt.xlim()\n", + "x = np.linspace(xmin, xmax, 100)\n", + "p = (1 / (np.sqrt(2 * np.pi * sigma_mle))) * np.exp(-0.5 * ((x - mu_mle) ** 2 / sigma_mle))\n", + "plt.plot(x, p, 'k', linewidth=2, label=f'Estimated Gaussian mu ={mu_mle:.2f}, sigma^2={sigma_mle:.2f})')\n", + "plt.legend()\n", + "plt.show()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd217430-cd7a-464b-971c-da5e0b9dcf51", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/q1,4,6.ipynb b/q1,4,6.ipynb new file mode 100644 index 0000000..8d3eeff --- /dev/null +++ b/q1,4,6.ipynb @@ -0,0 +1,661 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 10, + "id": "fd40996c-bf8b-4957-ae03-605b00edb01b", + "metadata": {}, + "outputs": [], + "source": [ + "import scipy as sc\n", + "import numpy as np\n", + "import pandas as pd\n", + "import random" + ] + }, + { + "cell_type": "code", + 
"execution_count": 1, + "id": "296de93a-e1ba-457e-9d8d-e13830e453ad", + "metadata": {}, + "outputs": [], + "source": [ + "from matplotlib import pyplot as plt" + ] + }, + { + "cell_type": "markdown", + "id": "291a16cf-d0e2-4ed5-8ab4-4240a41c36b3", + "metadata": {}, + "source": [ + "q1 b first \n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "4d7fdc90-9b84-4477-b417-1f930bde5e74", + "metadata": {}, + "outputs": [], + "source": [ + "# setting the value of n and p\n", + "n = 20\n", + "p = 0.23\n", + "dist_b = [sc.stats.binom.pmf(r, n, p) for r in range(n+1)]" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "85130d67-4172-4a6c-9463-d3ed4245f674", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0.005368024674737596,\n", + " 0.032068718836094724,\n", + " 0.09100019565826885,\n", + " 0.16309125975118294,\n", + " 0.207041177151664,\n", + " 0.19789909919951268,\n", + " 0.14778179485677884,\n", + " 0.08828522809625762,\n", + " 0.042852732468800336,\n", + " 0.017066889121773288,\n", + " 0.005607692140011227,\n", + " 0.0015227499317621994,\n", + " 0.00034113553666101246,\n", + " 6.270623251311313e-05,\n", + " 9.36521654416624e-06,\n", + " 1.1189609377445386e-06,\n", + " 1.0444846415634899e-07,\n", + " 7.340915739025293e-09,\n", + " 3.654568441506096e-10,\n", + " 1.1490782522848984e-11,\n", + " 1.7161558313345875e-13]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dist_b" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "56c7cf4c-ad95-4f29-b757-b337fa35fd58", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.bar(range(n+1), dist_b, width = 0.7)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "11fcd57d-d1b4-4571-9e4a-14cf2d620ae0", + "metadata": {}, + "outputs": [], + "source": [ + "mean, var = sc.stats.binom.stats(n,p)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "58c7f453-8e42-4d8a-8351-05ed27793515", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4.6000000000000005" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "mean" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "1887f411-6398-41e0-8b7a-f76b47339482", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3.5420000000000007" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "var" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "942dde0a-8613-4f53-b5fc-97821b1876e3", + "metadata": {}, + "outputs": [], + "source": [ + "dist_p = [sc.stats.poisson.pmf(r, n*p) for r in range(n+1)]" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "97aed44d-31fc-4a5c-b903-7c2539553683", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.bar(range(n+1), dist_p)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "3dc864ea-3fd4-4495-ada2-37b6183ef316", + "metadata": {}, + "outputs": [], + "source": [ + "mean_p, var_p = sc.stats.poisson.stats(n*p)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "ea697cc1-2f16-475a-9593-ba88c53ae313", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4.6000000000000005" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "mean_p" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "13028652-8874-4636-9770-17dd21b07b66", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4.6000000000000005" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "var_p" + ] + }, + { + "cell_type": "markdown", + "id": "5ef08c7f-bc7c-4f3c-b3da-d7202f8c7937", + "metadata": {}, + "source": [ + "thus we can say mean is same but varriance differs alot" + ] + }, + { + "cell_type": "markdown", + "id": "6b9c7336-0215-48d5-ad39-13504010da3f", + "metadata": {}, + "source": [ + "##Q1 b second ##\n" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "2ba0639d-3f6f-424a-9718-7abc2f1900f7", + "metadata": {}, + "outputs": [], + "source": [ + "n = 300000\n", + "p = .000001" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "eb41aa75-fde9-4fc9-a5f7-27b13fc4a218", + "metadata": {}, + "outputs": [], + "source": [ + "dist_b = pd.Series(sc.stats.binom.pmf(r,n,p) for r in range(n+1))" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "12e42311-99fe-4c0e-a2d7-a5d3ddc23d8e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "count 300001.000000\n", + "mean 0.000003\n", + "std 0.001413\n", + "min 0.000000\n", + "25% 0.000000\n", + "50% 
0.000000\n", + "75% 0.000000\n", + "max 0.740818\n", + "dtype: float64" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dist_b.describe()" + ] + }, + { + "cell_type": "markdown", + "id": "301c54fe-371e-41e1-bd08-e58c6702857b", + "metadata": {}, + "source": [ + "Let's repeat this for the Poisson distribution with lambda = n*p." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "b082dc3e-eb40-4c59-8b20-dc7b6afa8d98", + "metadata": {}, + "outputs": [], + "source": [ + "lamba = n*p" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "92ffa6e7-ce74-4dbc-a4f5-632e50f710e8", + "metadata": {}, + "outputs": [], + "source": [ + "dist_p = pd.Series(sc.stats.poisson.pmf(r, lamba) for r in range(n+1))" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "5ef2a006-1b43-456b-baed-7500f0eed719", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "count 300001.000000\n", + "mean 0.000003\n", + "std 0.001413\n", + "min 0.000000\n", + "25% 0.000000\n", + "50% 0.000000\n", + "75% 0.000000\n", + "max 0.740818\n", + "dtype: float64" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dist_p.describe()" + ] + }, + { + "cell_type": "markdown", + "id": "7d2ec100-06ef-4920-8bd4-b669dcf02826", + "metadata": {}, + "source": [ + "Now the two sets of probabilities agree to the displayed precision. This is expected: the binomial distribution converges to the Poisson as n tends to infinity and p tends to zero while n*p stays fixed at a moderate value (here 0.3)." + ] + }, + { + "cell_type": "markdown", + "id": "c7c993b9-718b-4b41-923f-fb5d0ce841be", + "metadata": {}, + "source": [ + "# Q4 #" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "dee42ea5-8a45-4ac3-a670-0be8d43d7473", + "metadata": {}, + "outputs": [], + "source": [ + "data = {'x=2':2/27,\n", + " 'x=3':3/27,\n", + " 'x=1':1/27,\n", + " 'x=1':1/27,\n", + " 'x=7':7/27,\n", + " 'x=0':0/27,\n", + " 'x=4':4/27,\n", + " 'x=5':9/27}\n", + "\n", + "# Create a 
DataFrame\n", + "df = pd.DataFrame(data, index = ['P(X) = x'])" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "9fa6affb-bbf0-4208-8b9a-a557348f6636", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x=2x=3x=1x=7x=0x=4x=5
P(X) = x0.0740740.1111110.0370370.2592590.00.1481480.333333
\n", + "
" + ], + "text/plain": [ + " x=2 x=3 x=1 x=7 x=0 x=4 x=5\n", + "P(X) = x 0.074074 0.111111 0.037037 0.259259 0.0 0.148148 0.333333" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "markdown", + "id": "8e56a174-6888-4346-a7b4-ee7d94ea855c", + "metadata": {}, + "source": [ + "now lets take random samples of size 10000 and calculate mean and varrience of them 10000 times each " + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "584dbb7e-39a1-42da-b9de-34a20c7ebeb6", + "metadata": {}, + "outputs": [], + "source": [ + "ListOfMeans = []\n", + "for i in range(10000):\n", + " ListOfSamples = []\n", + " for j in range(10000):\n", + " sample = random.sample([2,3,1,1,7,0,4,5], 1)\n", + " ListOfSamples.append(sample[0])\n", + " ListOfMeans.append(pd.Series(ListOfSamples).mean())\n", + " ListOfSamples = []" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "931f0e36-9629-4137-938c-c11a0dc3b976", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(array([ 1., 1., 0., 0., 2., 1., 1., 2., 0., 0., 2.,\n", + " 3., 5., 12., 7., 8., 9., 14., 15., 20., 22., 24.,\n", + " 32., 34., 41., 42., 53., 63., 66., 74., 91., 107., 111.,\n", + " 124., 170., 145., 167., 189., 188., 217., 248., 216., 250., 281.,\n", + " 274., 299., 283., 293., 290., 273., 310., 336., 267., 289., 302.,\n", + " 273., 296., 256., 225., 265., 222., 212., 199., 213., 177., 159.,\n", + " 155., 125., 109., 120., 88., 89., 85., 69., 58., 44., 43.,\n", + " 46., 30., 33., 24., 23., 14., 22., 8., 5., 8., 8.,\n", + " 4., 4., 6., 2., 1., 1., 2., 1., 0., 0., 1.,\n", + " 1.]),\n", + " array([2.7905 , 2.792166, 2.793832, 2.795498, 2.797164, 2.79883 ,\n", + " 2.800496, 2.802162, 2.803828, 2.805494, 2.80716 , 2.808826,\n", + " 2.810492, 2.812158, 2.813824, 2.81549 , 2.817156, 2.818822,\n", + " 2.820488, 2.822154, 2.82382 , 2.825486, 2.827152, 2.828818,\n", + " 2.830484, 2.83215 , 
2.833816, 2.835482, 2.837148, 2.838814,\n", + " 2.84048 , 2.842146, 2.843812, 2.845478, 2.847144, 2.84881 ,\n", + " 2.850476, 2.852142, 2.853808, 2.855474, 2.85714 , 2.858806,\n", + " 2.860472, 2.862138, 2.863804, 2.86547 , 2.867136, 2.868802,\n", + " 2.870468, 2.872134, 2.8738 , 2.875466, 2.877132, 2.878798,\n", + " 2.880464, 2.88213 , 2.883796, 2.885462, 2.887128, 2.888794,\n", + " 2.89046 , 2.892126, 2.893792, 2.895458, 2.897124, 2.89879 ,\n", + " 2.900456, 2.902122, 2.903788, 2.905454, 2.90712 , 2.908786,\n", + " 2.910452, 2.912118, 2.913784, 2.91545 , 2.917116, 2.918782,\n", + " 2.920448, 2.922114, 2.92378 , 2.925446, 2.927112, 2.928778,\n", + " 2.930444, 2.93211 , 2.933776, 2.935442, 2.937108, 2.938774,\n", + " 2.94044 , 2.942106, 2.943772, 2.945438, 2.947104, 2.94877 ,\n", + " 2.950436, 2.952102, 2.953768, 2.955434, 2.9571 ]),\n", + " )" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "plt.hist(ListOfMeans, bins = 100)" + ] + }, + { + "cell_type": "markdown", + "id": "600de5cc-7b1b-4fc1-9731-79150ee147ee", + "metadata": {}, + "source": [ + "resulted graph is nearly same as normal distribution " + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "2d7779a5-4008-4bf2-8e69-44eb8b99590f", + "metadata": {}, + "outputs": [], + "source": [ + "mean1 = 4.52\n", + "stddev1 = 4.496" + ] + }, + { + "cell_type": "markdown", + "id": "55bb6b2d-716c-4648-8c0d-18fb231a7c3b", + "metadata": {}, + "source": [ + "# Q6 #" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "473ed2a8-5c5a-4c1a-9a02-7faacc99dbb9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Z-score: 28.27076228190531\n", + "P-value: 1.24609375\n", + "P-value for p = 0.497: 1.231284540041297\n" + ] + } + ], + "source": [ + "from scipy.stats import binom\n", + "\n", + "\n", + "x_bar = 4.97\n", + "n = 10\n", + "p_null = 0.5\n", + "se = (p_null * (1 - p_null) / n)**0.5\n", + "z = (x_bar - p_null) / se\n", + "p_value = 2 * (1 - binom.cdf(int(x_bar), n, p_null))\n", + "print(\"Z-score:\", z)\n", + "print(\"P-value:\", p_value)\n", + "p_alt = 0.497\n", + "p_value_alt = 2 * (1 - binom.cdf(int(x_bar), n, p_alt))\n", + "print(\"P-value for p = 0.497:\", p_value_alt)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43630cea-2e17-4ee0-947f-9c96a9a0a0a5", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + 
"nbformat_minor": 5 +} diff --git a/q2.tex b/q2.tex new file mode 100644 index 0000000..0ac1bd2 --- /dev/null +++ b/q2.tex @@ -0,0 +1,31 @@ +\documentclass{article} +\usepackage{amsmath} + +\begin{document} + + + +Q2 \\ +To find $\text{Cov}(Z, W)$ + +\begin{enumerate} + \item Mean of $Z$ and $W$ + + $E[Z] = E[XY^3 + X^3Y] = E[X]E[Y^3] + E[X^3]E[Y] = 0$ (since $X$ and $Y$ are independent, the expectation of each product factorizes, and $E[X] = E[Y] = 0$). + $E[W] = E[XY + X^2Y + XY^2] = 0$. + + + + \item Covariance of $Z$ and $W$: + \begin{align*} + \text{Cov}(Z, W) &= E[ZW] - E[Z]E[W] \\ + &= E[(XY^3 + X^3Y)(XY + X^2Y + XY^2)] - (0)(0) \\ + &= E[X^2Y^4 + X^3Y^4 + X^2Y^5 + X^4Y^2 + X^5Y^2 + X^4Y^3] \\ + &= E[X^2]E[Y^4] + E[X^4]E[Y^2] \quad \text{(assuming the odd moments of $X$ and $Y$ vanish)} \\ + &= (1)(3) + (3)(1) = 3 + 3 = 6 + \end{align*} +\end{enumerate} + +Therefore, $\text{Cov}(Z, W) = 6$. + +\end{document} diff --git a/q3.tex b/q3.tex new file mode 100644 index 0000000..9b4279c --- /dev/null +++ b/q3.tex @@ -0,0 +1,48 @@ +\documentclass{article} +\usepackage{amsmath} +\usepackage{amsfonts} + +\begin{document} + + +\author{} +\date{} + + + + + +\section*{Proof} +Let $S = X_1 + X_2 + \ldots + X_n$, where each $X_i$ has mean $\mu$ and finite variance $\sigma^2$. By Chebyshev's inequality applied to $S/n$, +\[ +P\left(\left|\frac{S}{n} - \mu\right| \geq \epsilon\right) \leq \frac{\text{Var}(S)}{n^2 \epsilon^2}. +\] + +Since the random variables $X_i$ are i.i.d., +\[ +\text{Var}(S) = \text{Var}(X_1) + \text{Var}(X_2) + \ldots + \text{Var}(X_n) = n \sigma^2. +\] + +Substituting this into the inequality above, +\[ +P\left(\left|\frac{S}{n} - \mu\right| \geq \epsilon\right) \leq \frac{\sigma^2}{n \epsilon^2}. +\] + +To match the standard form of Chebyshev's inequality, we choose $k\sigma_{\bar{X}}$ as the distance from the mean $\mu$, where $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ is the standard deviation of the sample mean: +\[ +k\frac{\sigma}{\sqrt{n}} = \epsilon \Rightarrow k = \frac{\sqrt{n} \epsilon}{\sigma}. +\] + +Therefore, we have: +\[ +P\left(\left|\frac{S}{n} - \mu\right| \geq \epsilon\right) \leq \frac{\sigma^2}{n \epsilon^2} = \frac{1}{k^2}. 
+\] + +As $n$ approaches infinity, the probability tends to zero: +\[ +\lim_{n \to \infty} P\left(\left|\frac{S}{n} - \mu\right| \geq \epsilon\right) = 0. +\] + +This proves the Weak Law of Large Numbers. + +\end{document}