diff --git a/.gitignore b/.gitignore index 9c3a086e..f1844f00 100644 --- a/.gitignore +++ b/.gitignore @@ -330,3 +330,5 @@ ASALocalRun/ .mfractor/ **/.ipynb_checkpoints/** **/Kqlmagic_temp_files/** +**/.mypy_cache/** +**/kqlmagic/** diff --git a/A Getting Started Guide For Azure Sentinel ML Notebooks.ipynb b/A Getting Started Guide For Azure Sentinel ML Notebooks.ipynb new file mode 100644 index 00000000..441129f1 --- /dev/null +++ b/A Getting Started Guide For Azure Sentinel ML Notebooks.ipynb @@ -0,0 +1,888 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Getting Started with Azure ML Notebooks and Azure Sentinel\n", + "**Notebook Version:** 1.0
\n", + " **Python Version:** Python 3.6 (including Python 3.6 - AzureML)
\n", + " **Required Packages**:
\n", + " **Platforms Supported**:\n", + " - Azure Notebooks Free Compute\n", + " - Azure Notebooks DSVM\n", + " - OS Independent\n", + "\n", + "**Data Sources Required**:\n", + " - Log Analytics - SigninLogs (Optional)\n", + " - VirusTotal\n", + " - MaxMind\n", + " \n", + " \n", + "This notebook takes you through the basics needed to get started with Azure Notebooks and Azure Sentinel, and how to perform the basic actions of data acquisition, data enrichment, data analysis, and data visualization. These actions are the building blocks of threat hunting with notebooks and are useful to understand before running more complex notebooks. This notebook only lightly covers each topic but includes 'learn more' sections to provide you with the resources to dive deeper into each of these topics. \n", + "\n", + "This notebook assumes that you are running this in an Azure Notebooks environment, however it will work in other Jupyter environments.\n", + "\n", + "**Note:**\n", + "This notebook uses SigninLogs from your Azure Sentinel Workspace. If you are not yet collecting SigninLogs, configure this connector in the Azure Sentinel portal before running this notebook.\n", + "This notebook also uses the VirusTotal API for data enrichment; for this you will require an API key, which can be obtained by signing up for a free [VirusTotal community account](https://www.virustotal.com/gui/join-us)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## What is a Jupyter notebook?\n", + "You are currently reading a Jupyter notebook. [Jupyter](http://jupyter.org/) is an interactive development and data manipulation environment presented in a browser. Using Jupyter you can create documents, called Notebooks. 
These documents are made up of cells that contain interactive code, alongside that code's output, and other items such as text and images (what you are looking at now is a cell of Markdown text).\n", + "\n", + "The name, Jupyter, comes from the core programming languages that it supports: Julia, Python, and R. Whilst you can use any of these languages, we are going to use Python in this notebook; in addition, the notebooks that come with Azure Sentinel are all written in Python. Whilst there are pros and cons to each language, Python is a well-established language that has a large number of materials and libraries well suited for data analysis and security investigation, making it ideal for our needs.\n", + "\n", + "### Learn more:\n", + " - The [Infosec Jupyter Book](https://infosecjupyterbook.com/introduction.html) has more details on the technical workings of Jupyter.\n", + " - [The Jupyter Project documentation](https://jupyter.org/documentation)\n", + "\n", + "---\n", + "## How to use a Jupyter notebook?\n", + "To use a Jupyter notebook you need a Jupyter server that will render the notebook and execute the code within it. This can take the form of a local [Jupyter installation](https://pypi.org/project/jupyter/), or a remotely hosted version such as [Azure Notebooks](https://notebooks.azure.com/). If you are reading this it is highly likely that you already have a Jupyter server that this notebook is using.\n", + "You can learn more about installing and running your own Jupyter server [here](https://realpython.com/jupyter-notebook-introduction/).\n", + "\n", + "### Using Azure Notebooks\n", + "If you accessed this notebook from Azure Sentinel, you are probably using Azure Notebooks to run this notebook. Azure Notebooks runs in the same way as a local Jupyter server, except with the additional features of integrated project management and file storage. 
When you open a notebook in Azure Notebooks the user interface is nearly identical to a standard Jupyter notebook experience.\n", + "\n", + "Before you can start running code in a notebook you need to make sure that it is connected to a Jupyter server and you have the correct type of kernel configured. For this notebook we are going to be using Python 3.6, hopefully Azure Notebooks has already loaded this kernel for you - you can check this by looking at the top left corner of the screen where you should see the currently connected kernel. \n", + "\n", + "![KernelIssue](https://github.com/Azure/Azure-Sentinel-Notebooks/raw/master/images/nb_img1.png)\n", + "\n", + "If this does not read Python 3.6 you can select the correct kernel by selecting Kernel > Change kernel from the top menu and clicking Python 3.6.\n", + "\n", + "> **Note**: the notebook works with Python 3.6, 3.7 or later. If you are using this notebook in Azure ML or another Jupyter environment you can choose any kernel that supports Python 3.6 or later\n", + "\n", + "![KernelPicker](https://github.com/Azure/Azure-Sentinel-Notebooks/raw/master/images/nb_img2.png)\n", + "\n", + "Once you have done this you should be ready to move onto a code cell.\n", + "> **Tip**: You can identify which cells are code by selecting them and looking at the drop down box at the center of the top menu. It will either read 'Code' (for interactive code cells), 'Markdown' (for Markdown text cells like this one), or RawNBConvert (these are just raw data and not interpreted by Jupyter - they can be used by tools that process notebook files, such as *nbconvert* to render the data into HTML or LaTeX). 
\n", + "\n", + "If you click on the cell below you should see this box change to 'Code'.\n", + "\n", + "### Learn More:\n", + "More details on Azure Notebooks can be found in the [Azure Notebooks documentation](https://docs.microsoft.com/en-us/azure/notebooks/) and the [Azure Sentinel documentation](https://docs.microsoft.com/en-us/azure/sentinel/notebooks).\n", + "\n", + "---\n", + "## Running code\n", + "Once you have selected a code cell you can run it by clicking the run button at the menu bar at the top, or by pressing Ctrl+Enter.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# This is our first code cell, it contains basic Python code.\n", + "# You can run a code cell by selecting it and clicking the Run button in the top menu, or by pressing Shift + Enter.\n", + "# Once you run a code cell any output from that code will be displayed directly below it.\n", + "print(\"Congratulations you just ran this code cell\")\n", + "y = 2+2\n", + "print(\"2 + 2 =\", y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Variables set within a code cell persist between cells meaning you can chain cells together" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y + 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn More : \n", + " - The [Infosec Jupyter Book](https://infosecjupyterbook.com/) provides an infosec specific intro to Python.\n", + " - [Real Python](https://realpython.com/) is a comprehensive set of Python learnings and tutorials.\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you understand the basics we can move onto more complex code.\n", + "\n", + "---\n", + "## Setting up the environment\n", + "Code cells behave in the same way your code would in other environments, so you need to remember about common coding practices such as variable initialization and library imports. \n", + "Before we execute more complex code we need to make sure the required packages are installed and libraries imported. At the top of many of the Azure Sentinel notebooks you will see large cells that will check kernel versions and then install and import all the libraries we are going to be using in the notebook, make sure you run this before running other cells in the notebook.\n", + "If you are running notebooks locally or via dedicated compute in Azure Notebooks library installs will persist but this is not the case with Azure Notebooks free tier, so you will need to install each time you run. Even if running in a static environment imports are required for each run so make sure you run this cell regardless." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "import os\n", + "import sys\n", + "import warnings\n", + "from IPython.display import display, HTML, Markdown\n", + "\n", + "REQ_PYTHON_VER=(3, 6)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", + "\n", + "display(HTML(\"

Starting Notebook setup...

\"))\n", + "# If you did not clone the entire Azure-Sentinel-Notebooks repo you may not have this file\n", + "if Path(\"./utils/nb_check.py\").is_file():\n", + " from utils.nb_check import check_python_ver, check_mp_ver\n", + "\n", + " check_python_ver(min_py_ver=REQ_PYTHON_VER)\n", + " try:\n", + " check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\n", + " except ImportError:\n", + " !pip install --user --upgrade msticpy\n", + " if \"msticpy\" in sys.modules:\n", + " importlib.reload(sys.modules[\"msticpy\"])\n", + " else:\n", + " import msticpy\n", + " check_mp_ver(MSTICPY_REQ_VERSION)\n", + " \n", + "from msticpy.nbtools import nbinit\n", + "nbinit.init_notebook(\n", + " namespace=globals(),\n", + " extra_imports=[\"ipwhois, IPWhois, pyyaml\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Configuration\n", + "Once we have set up our Jupyter environment with the libraries that we'll use in the notebook, we need to make sure we have some configuration in place. Some of the notebook components need addtional configuration to connect to external services (e.g. API keys to retrieve Threat Intelligence data). This includes configuration for connection to our Azure Sentinel workspace, as well as some threat intelligence providers we will use later.\n", + "The easiest way to handle the configuration for these services is to store them in a msticpyconfig file (`msticpyconfig.yaml`). More details on msticpyconfig can be found here: https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html\n", + "\n", + "### Learn more: \n", + "- In this notebook we will setup the basic config we need to get started. If you need a more complete walk-through we have a separate notebook to help you: https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Azure-Sentinel-Notebooks GitHub repo contains an template msticpyconfig file ready to be populated. If you have run this notebook before you may have a msticpyconfig file already populated, the cell below allows you to checks if this file. If your config file does not contain details under Azure Sentinel > Workspaces, or TIProviders the following cells will populate these for you.
\n", + "If you want to see an example of what a populated msticpyconfig file should look like, a sample is included in the repo as msticpyconfig-sample.yaml." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import yaml\n", + "def print_config():\n", + " with open('msticpyconfig.yaml') as f:\n", + " data = yaml.load(f, Loader=yaml.FullLoader)\n", + " print(yaml.dump(data))\n", + "try:\n", + " print_config()\n", + "except FileNotFoundError:\n", + " print(\"No msticpyconfig.yaml was found in your current directory.\")\n", + " print(\"We are downloading a template file for you.\")\n", + " import urllib.request\n", + " urllib.request.urlretrieve(\"https://raw.githubusercontent.com/Azure/Azure-Sentinel-Notebooks/master/msticpyconfig.yaml\", \"msticpyconfig.yaml\")\n", + " print_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you do not have a msticpyconfig file we can populate one for you. Before you do this you will need a few things.\n", + "\n", + "The first is the Workspace ID and Tenant ID of the Azure Sentinel Workspace you wish to connect to.\n", + "\n", + " - You can get the workspace ID by opening Azure Sentinel in the [Azure Portal](https://portal.azure.com) and selecting Settings > Workspace Settings. Your Workspace ID is displayed near the top of this page.\n", + "\n", + "- You can get your tenant ID (also referred to as the organization or directory ID) via [Azure Active Directory](https://docs.microsoft.com/en-us/onedrive/find-your-office-365-tenant-id)\n", + "\n", + "We are going to use [VirusTotal](https://www.virustotal.com) to enrich our Azure Sentinel data. 
For this you will need a VirusTotal API key, which can be obtained for free (as a personal key) via the [VirusTotal](https://developers.virustotal.com/v3.0/reference#getting-started) website.\n", + "We are using VirusTotal for this notebook but we also support a range of other threat intelligence providers: https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html\n", + "

\n", + "In addition we are going to plot IP address locations on a map, in order to do this we are going to use [MaxMind](https://www.maxmind.com) to geolocate IP addresses which requires an API key. You can sign up for a free account and API key at https://www.maxmind.com/en/geolite2/signup. \n", + "

\n", + "Once you have these required items run the cell below and you will be prompted to enter these elements:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "ws_id = nbwidgets.GetEnvironmentKey(env_var='WORKSPACE_ID',\n", + " prompt='Please enter your Log Analytics Workspace Id:', auto_display=True)\n", + "ten_id = nbwidgets.GetEnvironmentKey(env_var='TENANT_ID',\n", + " prompt='Please enter your Log Analytics Tenant Id:', auto_display=True)\n", + "vt_key = nbwidgets.GetEnvironmentKey(env_var='VT_KEY',\n", + " prompt='Please enter your VirusTotal API Key:', auto_display=True)\n", + "mm_key = nbwidgets.GetEnvironmentKey(env_var='MM_KEY',\n", + " prompt='Please enter your MaxMind API Key:', auto_display=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " The cell below will now populate a msticpyconfig file with these values:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import yaml\n", + "with open(\"msticpyconfig.yaml\") as config:\n", + " data = yaml.load(config, Loader=yaml.Loader)\n", + "data['AzureSentinel']\n", + "\n", + "workspace = {\"Default\":{\"WorkspaceId\": ws_id.value, \"TenantId\": ten_id.value}}\n", + "ti = {\"VirusTotal\":{\"Args\": {\"AuthKey\" : vt_key.value}, \"Primary\" : True, \"Provider\": \"VirusTotal\"}}\n", + "other_prov = {\"GeoIPLite\" : {\"Args\" : {\"AuthKey\" : mm_key.value, \"DBFolder\" : \"~/msticpy\"}, \"Provider\" : \"GeoLiteLookup\"}}\n", + "data['AzureSentinel']['Workspaces'] = workspace\n", + "data['TIProviders'] = ti\n", + "data['OtherProviders'] = other_prov\n", + "\n", + "with open(\"msticpyconfig.yaml\", 'w') as config:\n", + " yaml.dump(data, config)\n", + " \n", + "print(\"msticpyconfig.yaml updated\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now validate our configuration is correct." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from msticpy.common.pkg_config import refresh_config, validate_config\n", + "refresh_config()\n", + "validate_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **Note**: you may see warnings for missing providers when running this cell.\n", + "> This is not an issue as we will not be using all providers in this notebook.\n", + "> So long as you get the message \"No errors found.\" you are OK to proceed.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Getting Data\n", + "Now that we have configured the details necessary to connect to Azure Sentinel we can go ahead and get some data. We will do this with `QueryProvider()` from MSTICpy. \n", + "You can use the `QueryProvider` class to connect to different data sources such as MDATP, the Security Graph API, and the one we will use here, Azure Sentinel. \n", + "\n", + "### Learn more:\n", + " - More details on configuring and using QueryProviders can be found in the [MSTICpy Documentation](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html#instantiating-a-query-provider).\n", + "

" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For now, we are going to set up a QueryProvider for Azure Sentinel, pass it the details for our workspace that we just stored in the msticpyconfig file, and connect. The connection process will ask us to authenticate to our Azure Sentinel workspace via [device authorization](https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-device-code) with our Azure credentials. You can do this by clicking the device login code button that appears as the output of the next cell, or by navigating to https://microsoft.com/devicelogin and manually entering the code. Note that this authentication persists with the kernel you are using with the notebook, so if you restart the kernel you will need to re-authenticate.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Initalize a QueryProvider for Azure Sentinel\n", + "qry_prov = QueryProvider(\"LogAnalytics\")\n", + "\n", + "# Get the Azure Sentinel workspace details from msticpyconfig\n", + "try:\n", + " ws_config = WorkspaceConfig()\n", + " md(\"Workspace details collected from config file\")\n", + "except:\n", + " raise(\"No workspace settings are configured, please run the cells above to configure these.\")\n", + " \n", + "# Connect to Azure Sentinel with our QueryProvider and config details\n", + "# ws_config.code_connect_str is a feature of MSTICpy that creates the required connection string from details in our msticpyconfig\n", + "qry_prov.connect(connection_str=ws_config.code_connect_str)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have connected we can query Azure Sentinel for data, but before we do that we need to understand what data is avalaible to query. 
The QueryProvider object provides a way to get a list of tables, as well as each table's columns:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get list of tables in our Workspace\n", + "display(qry_prov.schema_tables[:5]) # We are outputting only the first 5 tables for brevity\n", + "# Get list of tables and their columns\n", + "qry_prov.schema['SigninLogs'] # We are only displaying the columns for SigninLogs for brevity" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "MSTICpy includes a number of built-in queries that you can run.
\n", + "You can list available queries with .list_queries() and get specific details about a query by calling it with \"?\" as a parameter." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get a list of available queries\n", + "qry_prov.list_queries()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get details about a query\n", + "qry_prov.Azure.list_all_signins_geo(\"?\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can then run the query by calling it with the required parameters:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import datetime, timedelta\n", + "# set our query end time as now\n", + "end = datetime.now()\n", + "# set our query start time as 1 hour ago\n", + "start = end - timedelta(hours=1)\n", + "# run query with specified start and end times\n", + "logons_df = qry_prov.Azure.list_all_signins_geo(start=start, end=end)\n", + "# display first 5 rows of any results\n", + "logons_df.head() # If you have no data you will just see the column headings displayed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another way to run queries is to pass a KQL query as a string to the query provider. This runs the query against the workspace connected to above and returns the data in a [Pandas DataFrame](https://pandas.pydata.org/). We will look at working with Pandas in a bit more detail later." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define our query\n", + "test_query = \"\"\"\n", + "SigninLogs\n", + "| where TimeGenerated > ago(7d)\n", + "| take 10\n", + "\"\"\"\n", + "\n", + "# Pass that query to our QueryProvider\n", + "test_df = qry_prov.exec_query(test_query)\n", + "\n", + "# Check that we have some data\n", + "if isinstance(test_df, pd.DataFrame) and not test_df.empty:\n", + " # .head() returns the first 5 rows of our results DataFrame\n", + " display(test_df.head())\n", + "# If there is no data load some sample data to use instead\n", + "else:\n", + " md(\"You don't appear to have any SigninLogs - we will load sample data for you to use.\")\n", + " qry_prov = QueryProvider(\"LocalData\", data_paths=[\"nbdemo/data/\"], query_paths=[\"nbdemo/data/\"])\n", + " logons_df = qry_prov.Azure.list_all_signins_geo()\n", + " display(logons_df.head())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn more:\n", + " - You can learn more about the MSTICpy pre-defined queries in the [MSTICpy Documentation](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html#running-an-pre-defined-query)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Pandas\n", + "Our query results are returned in the form of a Pandas DataFrame. DataFrames are a core component of the Azure Sentinel notebooks and of MSTICpy and are used for both input and output formats.\n", + "Pandas DataFrames are incredibly versatile data structures with a lot of useful features. We will cover a small number of them here and we recommend that you check out the Learn more section to learn more about Pandas features.\n", + "
\n", + "
\n", + "### Displaying a DataFrame:\n", + "The first thing we want to do is display our DataFrame. You can either just run it or explicitly display it by calling `display(df)`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# For this section we are going to create a DataFrame from data we have saved in a csv file\n", + "df = pd.read_csv(\"https://raw.githubusercontent.com/microsoft/msticpy/master/tests/testdata/host_logons.csv\", index_col=[0])\n", + "# Display our DataFrame\n", + "df # or display(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **Note**: if the dataframe variable (`df` in the example above) is the last statement in a \n", + "> code cell, Jupyter will automatically display it without using the `display()` function. \n", + "> However, if you want to display a DataFrame in the middle of \n", + "> other code in a cell you must use the `display()` function.\n", + "\n", + "You may not want to display the whole DataFrame and instead display only a selection of items. There are numerous ways to do this and the cells below show some of the most widely used functions." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Display the first 2 rows using head(): \", \"bold\")\n", + "display(df.head(2)) # we don't need to call display here but just for illustration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Display the 3rd row using iloc[]: \", \"bold\")\n", + "df.iloc[2]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Show the column names in the DataFrame \", \"bold\")\n", + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Display just the TimeGenerated and TenantId columns: \", \"bold\")\n", + "df[[\"TimeGenerated\", \"TenantId\"]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also choose to select a subsection of our DataFrame based on the contents of the DataFrame:\n", + "\n", + "> **Tip**: the syntax in these examples is using a technique called *boolean indexing*. \n", + ">
`df[<boolean expression>]`\n", + "> returns all rows in the dataframe where the boolean expression is True\n", + ">
In the first example we are telling pandas to return all rows where the column value of\n", + "> 'TargetUserName' matches 'MSTICAdmin'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Display only rows where TargetUserName value is 'MSTICAdmin': \", \"bold\")\n", + "df[df['TargetUserName']==\"MSTICAdmin\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Display rows where TargetUserName is either MSTICAdmin or adm1nistrator:\", \"bold\")\n", + "display(df[df['TargetUserName'].isin(['adm1nistrator', 'MSTICAdmin'])])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our DataFrame can also be extended to add new columns with additional data if required:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df[\"NewCol\"] = \"Look at my new data!\"\n", + "display(df[[\"TenantId\",\"Account\", \"TimeGenerated\", \"NewCol\"]].head(2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn more:\n", + "There is a lot more you can do with Pandas; the links below provide some useful resources:\n", + " - [Getting started with Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html)\n", + " - [Infosec Jupyter Book intro to Pandas](https://infosecjupyterbook.com/notebooks/tutorials/03_intro_to_pandas.html)\n", + " - [A great list of Pandas hints and tricks](https://www.dataschool.io/python-pandas-tips-and-tricks/)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Enriching data\n", + "\n", + "Now that we have seen how to query for data and do some basic manipulation, we can look at enriching this data with additional data sources. 
For this we are going to use an external threat intelligence provider to give us some more details about an IP address we have in our dataset using the [MSTICpy TIProvider](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html) feature." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import datetime, timedelta\n", + "# Check if we have logon data already and if not get some\n", + "if not isinstance(logons_df, pd.DataFrame) or logons_df.empty:\n", + " # set our query end time as now\n", + " end = datetime.now()\n", + " # set our query start time as 1 day ago\n", + " start = end - timedelta(days=1)\n", + " # run query with specified start and end times\n", + " logons_df = qry_prov.Azure.list_all_signins_geo(start=start, end=end)\n", + " \n", + "# Create our TI provider\n", + "ti = TILookup()\n", + "# Get the first logon IP address from our dataset (iloc is zero-based)\n", + "ip = logons_df.iloc[0]['IPAddress']\n", + "# Look up the IP in VirusTotal\n", + "ti_resp = ti.lookup_ioc(ip, providers=[\"VirusTotal\"])\n", + "\n", + "# Format our results as a DataFrame\n", + "ti_resp = ti.result_to_df(ti_resp)\n", + "display(ti_resp)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using the [Pandas apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) feature we can get results for all the IP addresses in our data set and add the lookup severity score as a new column in our DataFrame for easier reference." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Take the IP address in each row, look it up against TI and return the severity score\n", + "def lookup_res(row):\n", + " ip = row['IPAddress']\n", + " resp = ti.lookup_ioc(ip, providers=[\"VirusTotal\"])\n", + " resp = ti.result_to_df(resp)\n", + " return resp[\"Severity\"].iloc[0]\n", + "\n", + "# Take the first 3 rows of data and copy them into a new DataFrame\n", + "enrich_logons_df = logons_df.iloc[:3].copy()\n", + "# Create a new column called TIRisk and populate that with the TI severity score of the IP Address in that row\n", + "enrich_logons_df['TIRisk'] = enrich_logons_df.apply(lookup_res, axis=1)\n", + "# Display a subset of columns from our DataFrame\n", + "display(enrich_logons_df[[\"TimeGenerated\", \"ResultType\", \"UserPrincipalName\", \"IPAddress\", \"TIRisk\"]])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn more:\n", + "MSTICpy includes further threat intelligence capabilities as well as other data enrichment options. More details on these can be found in the [documentation](https://msticpy.readthedocs.io/en/latest/DataEnrichment.html)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Analyzing data\n", + "With the data we have collected we may wish to perform some analysis on it in order to better understand it. MSTICpy includes a number of features to help with this, and there are a vast array of other data analysis capabilities available via Python ranging from simple processes to complex ML models. We will start here by keeping it simple and look at how we can decode some Base64 encoded command line strings we have in order to allow us to understand their content." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from msticpy.sectools import base64unpack as b64\n", + "# Take our encoded PowerShell command\n", + "b64_cmd = \"powershell.exe -encodedCommand SW52b2tlLVdlYlJlcXVlc3QgaHR0cHM6Ly9jb250b3NvLmNvbS9tYWx3YXJlIC1PdXRGaWxlIEM6XG1hbHdhcmUuZXhl\"\n", + "# Unpack the Base64 encoded elements\n", + "unpack_txt = b64.unpack(input_string=b64_cmd)\n", + "# Display our results and transform for easier reading\n", + "unpack_txt[1].T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also use MSTICpy to extract Indicators of Compromise (IoCs) from a dataset; this makes it easy to extract and match on a set of IoCs within our data. In the example below we take a US Cybersecurity & Infrastructure Security Agency (CISA) report and extract all domains listed in the report:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "# Set up our IoCExtract object\n", + "ioc_extractor = iocextract.IoCExtract()\n", + "# Download our threat report\n", + "data = requests.get(\"https://www.us-cert.gov/sites/default/files/publications/AA20-099A_WHITE.stix.xml\")\n", + "# Extract domains listed in our report\n", + "iocs = ioc_extractor.extract(data.text, ioc_types=\"dns\")['dns']\n", + "# Display the first 5 iocs found in our report\n", + "list(iocs)[:5]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn more:\n", + "There is a wide range of options when it comes to data analysis in notebooks using Python. 
Here are some useful resources to get you started:\n", + " - [MSTICpy DataAnalysis documentation](https://msticpy.readthedocs.io/en/latest/DataAnalysis.html)\n", + " - Scikit-Learn is a popular Python ML data analysis library, which has a useful [tutorial](https://scikit-learn.org/stable/tutorial/basic/tutorial.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Visualizing data\n", + "Visualizing data can provide an excellent way to analyze data and identify patterns and anomalies. Python has a wide range of data visualization capabilities, each of which has its own benefits and drawbacks. We will look at some basic capabilities as well as the built-in visualizations in MSTICpy.\n", + "


\n", + "**Basic Graphs**
\n", + "Pandas and Matplotlib provide the easiest way to produce simple plots of data:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "vis_q = \"\"\"\n", + "SigninLogs\n", + "| where TimeGenerated > ago(7d)\n", + "| sample 5\"\"\"\n", + "\n", + "# Try and query for data but if using sample data load that instead\n", + "try:\n", + "    vis_data = qry_prov.exec_query(vis_q)\n", + "except FileNotFoundError:\n", + "    vis_data = logons_df\n", + "\n", + "# Check we have some data in our results and if not use previously used dataset\n", + "if not isinstance(vis_data, pd.DataFrame) or vis_data.empty:\n", + "    vis_data = logons_df\n", + "\n", + "# Plot up to the first 5 IP addresses\n", + "vis_data.head()[\"IPAddress\"].value_counts().plot.bar(\n", + "    title=\"IP prevalence\", legend=False\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pie_df = vis_data.copy()\n", + "# If we have lots of data just plot the first 5 rows\n", + "pie_df.head()['IPAddress'].value_counts().plot.pie(legend=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn more:\n", + " - The [Infosec Jupyterbook](https://infosecjupyterbook.com/) includes a section on data visualization.\n", + " - [Bokeh Library Documentation](https://bokeh.org/)\n", + " - [Matplotlib tutorial](https://matplotlib.org/3.2.0/tutorials/index.html)\n", + " - [Seaborn visualization library tutorial](https://seaborn.pydata.org/tutorial.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Conclusion\n", + "This notebook has shown you the basics of using notebooks and Azure Sentinel for security investigations. There are many more things possible using notebooks and we strongly encourage you to read the material we have referenced in the learn more sections in this notebook. 
You can also explore the other Azure Sentinel notebooks in order to take advantage of the pre-built hunting logic, and understand other analysis techniques that are possible.
\n", + "### Appendix:\n", + " - [Jupyter Notebooks: An Introduction](https://realpython.com/jupyter-notebook-introduction/)\n", + " - [Threat Hunting in the cloud with Azure Notebooks](https://medium.com/@maarten.goet/threat-hunting-in-the-cloud-with-azure-notebooks-supercharge-your-hunting-skills-using-jupyter-8d69218e7ca0)\n", + " - [MSTICpy documentation](https://msticpy.readthedocs.io/)\n", + " - [Azure Sentinel Notebooks documentation](https://docs.microsoft.com/en-us/azure/sentinel/notebooks)\n", + " - [The Infosec Jupyterbook](https://infosecjupyterbook.com/introduction.html)\n", + " - [Linux Host Explorer Notebook walkthrough](https://techcommunity.microsoft.com/t5/azure-sentinel/explorer-notebook-series-the-linux-host-explorer/ba-p/1138273)\n", + " - [Why use Jupyter for Security Investigations](https://techcommunity.microsoft.com/t5/azure-sentinel/why-use-jupyter-for-security-investigations/ba-p/475729)\n", + " - [Security Investigations with Azure Sentinel & Notebooks](https://techcommunity.microsoft.com/t5/azure-sentinel/security-investigation-with-azure-sentinel-and-jupyter-notebooks/ba-p/432921)\n", + " - [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html)\n", + " - [Bokeh Documentation](https://docs.bokeh.org/en/latest/)" + ] + } + ], + "metadata": { + "hide_input": false, + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false + 
}, + "varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "types_to_exclude": [ + "module", + "function", + "builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/A Getting Started Guide For Azure Sentinel Notebooks.ipynb b/A Getting Started Guide For Azure Sentinel Notebooks.ipynb new file mode 100644 index 00000000..6c52ce2b --- /dev/null +++ b/A Getting Started Guide For Azure Sentinel Notebooks.ipynb @@ -0,0 +1,947 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Getting Started with Azure Notebooks and Azure Sentinel\n", + "**Notebook Version:** 1.0
\n", + " **Python Version:** Python 3.6 (including Python 3.6 - AzureML)
\n", + " **Required Packages**:
\n", + " **Platforms Supported**:\n", + " - Azure Notebooks Free Compute\n", + " - Azure Notebooks DSVM\n", + " - OS Independent\n", + "\n", + "**Data Sources Required**:\n", + " - Log Analytics - SigninLogs (Optional)\n", + " - VirusTotal\n", + " - MaxMind\n", + " \n", + " \n", + "This notebook takes you through the basics needed to get started with Azure Notebooks and Azure Sentinel, and how to perform the basic actions of data acquisition, data enrichment, data analysis, and data visualization. These actions are the building blocks of threat hunting with notebooks and are useful to understand before running more complex notebooks. This notebook only lightly covers each topic but includes 'learn more' sections to provide you with the resources to dive deeper into each of these topics. \n", + "\n", + "This notebook assumes that you are running this in an Azure Notebooks environment; however, it will work in other Jupyter environments.\n", + "\n", + "**Note:**\n", + "This notebook uses SigninLogs from your Azure Sentinel Workspace. If you are not yet collecting SigninLogs, configure this connector in the Azure Sentinel portal before running this notebook.\n", + "This notebook also uses the VirusTotal API for data enrichment. For this you will require an API key, which can be obtained by signing up for a free [VirusTotal community account](https://www.virustotal.com/gui/join-us)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## What is a Jupyter notebook?\n", + "You are currently reading a Jupyter notebook. [Jupyter](http://jupyter.org/) is an interactive development and data manipulation environment presented in a browser. Using Jupyter you can create documents, called Notebooks. 
These documents are made up of cells that contain interactive code, alongside that code's output, and other items such as text and images (what you are looking at now is a cell of Markdown text).\n", + "\n", + "The name, Jupyter, comes from the core programming languages that it supports: Julia, Python, and R. Whilst you can use any of these languages we are going to use Python in this notebook. In addition, the notebooks that come with Azure Sentinel are all written in Python. Whilst there are pros and cons to each language, Python is a well-established language that has a large number of materials and libraries well suited for data analysis and security investigation, making it ideal for our needs.\n", + "\n", + "### Learn more:\n", + " - The [Infosec Jupyter Book](https://infosecjupyterbook.com/introduction.html) has more details on the technical working of Jupyter.\n", + " - [The Jupyter Project documentation](https://jupyter.org/documentation)\n", + "\n", + "---\n", + "## How to use a Jupyter notebook?\n", + "To use a Jupyter notebook you need a Jupyter server that will render the notebook and execute the code within it. This can take the form of a local [Jupyter installation](https://pypi.org/project/jupyter/), or a remotely hosted version such as [Azure Notebooks](https://notebooks.azure.com/). If you are reading this it is highly likely that you already have a Jupyter server that this notebook is using.\n", + "You can learn more about installing and running your own Jupyter server [here](https://realpython.com/jupyter-notebook-introduction/).\n", + "\n", + "### Using Azure Notebooks\n", + "If you accessed this notebook from Azure Sentinel, you are probably using Azure Notebooks to run this notebook. Azure Notebooks runs in the same way as a local Jupyter server, except with the additional features of integrated project management and file storage. 
When you open a notebook in Azure Notebooks the user interface is nearly identical to a standard Jupyter notebook experience.\n", + "\n", + "Before you can start running code in a notebook you need to make sure that it is connected to a Jupyter server and you have the correct type of kernel configured. For this notebook we are going to be using Python 3.6, hopefully Azure Notebooks has already loaded this kernel for you - you can check this by looking at the top left corner of the screen where you should see the currently connected kernel. \n", + "\n", + "![KernelIssue](https://github.com/Azure/Azure-Sentinel-Notebooks/raw/master/images/nb_img1.png)\n", + "\n", + "If this does not read Python 3.6 you can select the correct kernel by selecting Kernel > Change kernel from the top menu and clicking Python 3.6.\n", + "\n", + "> **Note**: the notebook works with Python 3.6, 3.7 or later. If you are using this notebook in Azure ML or another Jupyter environment you can choose any kernel that supports Python 3.6 or later\n", + "\n", + "![KernelPicker](https://github.com/Azure/Azure-Sentinel-Notebooks/raw/master/images/nb_img2.png)\n", + "\n", + "Once you have done this you should be ready to move onto a code cell.\n", + "> **Tip**: You can identify which cells are code by selecting them and looking at the drop down box at the center of the top menu. It will either read 'Code' (for interactive code cells), 'Markdown' (for Markdown text cells like this one), or RawNBConvert (these are just raw data and not interpreted by Jupyter - they can be used by tools that process notebook files, such as *nbconvert* to render the data into HTML or LaTeX). 
\n", + "\n", + "If you click on the cell below you should see this box change to 'Code'.\n", + "\n", + "### Learn More:\n", + "More details on Azure Notebooks can be found in the [Azure Notebooks documentation](https://docs.microsoft.com/en-us/azure/notebooks/) and the [Azure Sentinel documentation](https://docs.microsoft.com/en-us/azure/sentinel/notebooks).\n", + "\n", + "---\n", + "## Running code\n", + "Once you have selected a code cell you can run it by clicking the run button at the menu bar at the top, or by pressing Ctrl+Enter.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# This is our first code cell, it contains basic Python code.\n", + "# You can run a code cell by selecting it and clicking the Run button in the top menu, or by pressing Shift + Enter.\n", + "# Once you run a code cell any output from that code will be displayed directly below it.\n", + "print(\"Congratulations you just ran this code cell\")\n", + "y = 2+2\n", + "print(\"2 + 2 =\", y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Variables set within a code cell persist between cells meaning you can chain cells together" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y + 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn More : \n", + " - The [Infosec Jupyter Book](https://infosecjupyterbook.com/) provides an infosec specific intro to Python.\n", + " - [Real Python](https://realpython.com/) is a comprehensive set of Python learnings and tutorials.\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you understand the basics we can move onto more complex code.\n", + "\n", + "---\n", + "## Setting up the environment\n", + "Code cells behave in the same way your code would in other environments, so you need to remember about common coding practices such as variable initialization and library imports. \n", + "Before we execute more complex code we need to make sure the required packages are installed and libraries imported. At the top of many of the Azure Sentinel notebooks you will see large cells that will check kernel versions and then install and import all the libraries we are going to be using in the notebook, make sure you run this before running other cells in the notebook.\n", + "If you are running notebooks locally or via dedicated compute in Azure Notebooks library installs will persist but this is not the case with Azure Notebooks free tier, so you will need to install each time you run. Even if running in a static environment imports are required for each run so make sure you run this cell regardless." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "import os\n", + "import sys\n", + "import warnings\n", + "from IPython.display import display, HTML, Markdown\n", + "\n", + "REQ_PYTHON_VER=(3, 6)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", + "\n", + "display(HTML(\"

Starting Notebook setup...

\"))\n", + "# If you did not clone the entire Azure-Sentinel-Notebooks repo you may not have this file\n", + "if Path(\"./utils/nb_check.py\").is_file():\n", + "    from utils.nb_check import check_python_ver, check_mp_ver\n", + "\n", + "    check_python_ver(min_py_ver=REQ_PYTHON_VER)\n", + "    try:\n", + "        check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\n", + "    except ImportError:\n", + "        !pip install --user --upgrade msticpy\n", + "        if \"msticpy\" in sys.modules:\n", + "            importlib.reload(sys.modules[\"msticpy\"])\n", + "        else:\n", + "            import msticpy\n", + "        check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\n", + " \n", + "from msticpy.nbtools import nbinit\n", + "nbinit.init_notebook(\n", + "    namespace=globals(),\n", + "    extra_imports=[\"ipwhois, IPWhois, pyyaml\"]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Configuration\n", + "Once we have set up our Jupyter environment with the libraries that we'll use in the notebook, we need to make sure we have some configuration in place. Some of the notebook components need additional configuration to connect to external services (e.g. API keys to retrieve Threat Intelligence data). This includes configuration for connection to our Azure Sentinel workspace, as well as some threat intelligence providers we will use later.\n", + "The easiest way to handle the configuration for these services is to store them in a msticpyconfig file (`msticpyconfig.yaml`). More details on msticpyconfig can be found here: https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html\n", + "\n", + "### Learn more: \n", + "- In this notebook we will set up the basic config we need to get started. If you need a more complete walk-through we have a separate notebook to help you: https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Azure-Sentinel-Notebooks GitHub repo contains a template msticpyconfig file ready to be populated. If you have run this notebook before you may have a msticpyconfig file already populated; the cell below checks whether this file exists and prints its contents. If your config file does not contain details under Azure Sentinel > Workspaces, or TIProviders, the following cells will populate these for you.
\n", + "If you want to see an example of what a populated msticpyconfig file should look like, a sample is included in the repo as msticpyconfig-sample.yaml." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import yaml\n", + "def print_config():\n", + "    with open('msticpyconfig.yaml') as f:\n", + "        data = yaml.load(f, Loader=yaml.FullLoader)\n", + "        print(yaml.dump(data))\n", + "try:\n", + "    print_config()\n", + "except FileNotFoundError:\n", + "    print(\"No msticpyconfig.yaml was found in your current directory.\")\n", + "    print(\"We are downloading a template file for you.\")\n", + "    import urllib.request\n", + "    urllib.request.urlretrieve(\"https://raw.githubusercontent.com/Azure/Azure-Sentinel-Notebooks/master/msticpyconfig.yaml\", \"msticpyconfig.yaml\")\n", + "    print_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you do not have a msticpyconfig file we can populate one for you. Before you do this you will need a few things.\n", + "\n", + "The first is the Workspace ID and Tenant ID of the Azure Sentinel Workspace you wish to connect to.\n", + "\n", + " - You can get the workspace ID by opening Azure Sentinel in the [Azure Portal](https://portal.azure.com) and selecting Settings > Workspace Settings. Your Workspace ID is displayed near the top of this page.\n", + "\n", + "- You can get your tenant ID (also referred to as the organization or directory ID) via [Azure Active Directory](https://docs.microsoft.com/en-us/onedrive/find-your-office-365-tenant-id)\n", + "\n", + "We are going to use [VirusTotal](https://www.virustotal.com) to enrich our Azure Sentinel data. 
For this you will need a VirusTotal API key, which can be obtained for free (as a personal key) via the [VirusTotal](https://developers.virustotal.com/v3.0/reference#getting-started) website.\n", + "We are using VirusTotal for this notebook but we also support a range of other threat intelligence providers: https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html\n", + "

\n", + "In addition, we are going to plot IP address locations on a map. To do this we are going to use [MaxMind](https://www.maxmind.com) to geolocate IP addresses, which requires an API key. You can sign up for a free account and API key at https://www.maxmind.com/en/geolite2/signup. \n", + "

\n", + "Once you have these required items run the cell below and you will be prompted to enter these elements:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "ws_id = nbwidgets.GetEnvironmentKey(env_var='WORKSPACE_ID',\n", + "    prompt='Please enter your Log Analytics Workspace Id:', auto_display=True)\n", + "ten_id = nbwidgets.GetEnvironmentKey(env_var='TENANT_ID',\n", + "    prompt='Please enter your Log Analytics Tenant Id:', auto_display=True)\n", + "vt_key = nbwidgets.GetEnvironmentKey(env_var='VT_KEY',\n", + "    prompt='Please enter your VirusTotal API Key:', auto_display=True)\n", + "mm_key = nbwidgets.GetEnvironmentKey(env_var='MM_KEY',\n", + "    prompt='Please enter your MaxMind API Key:', auto_display=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " The cell below will now populate a msticpyconfig file with these values:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import yaml\n", + "with open(\"msticpyconfig.yaml\") as config:\n", + "    data = yaml.load(config, Loader=yaml.Loader)\n", + "data['AzureSentinel']\n", + "\n", + "workspace = {\"Default\":{\"WorkspaceId\": ws_id.value, \"TenantId\": ten_id.value}}\n", + "ti = {\"VirusTotal\":{\"Args\": {\"AuthKey\" : vt_key.value}, \"Primary\" : True, \"Provider\": \"VirusTotal\"}}\n", + "other_prov = {\"GeoIPLite\" : {\"Args\" : {\"AuthKey\" : mm_key.value, \"DBFolder\" : \"~/msticpy\"}, \"Provider\" : \"GeoLiteLookup\"}}\n", + "data['AzureSentinel']['Workspaces'] = workspace\n", + "data['TIProviders'] = ti\n", + "data['OtherProviders'] = other_prov\n", + "\n", + "with open(\"msticpyconfig.yaml\", 'w') as config:\n", + "    yaml.dump(data, config)\n", + " \n", + "print(\"msticpyconfig.yaml updated\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now validate that our configuration is correct."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from msticpy.common.pkg_config import refresh_config, validate_config\n", + "refresh_config()\n", + "validate_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **Note** you may see warnings for missing providers when running this cell.\n", + "> This is not an issue as we will not be using all providers in this notebook.\n", + "> So long as you get the message \"No errors found.\" you are OK to proceed.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Getting Data\n", + "Now that we have configured the details necessary to connect to Azure Sentinel we can go ahead and get some data. We will do this with `QueryProvider()` from MSTICpy. \n", + "You can use the `QueryProvider` class to connect to different data sources such as MDATP, the Security Graph API, and the one we will use here, Azure Sentinel. \n", + "\n", + "### Learn more:\n", + " - More details on configuring and using QueryProviders can be found in the [MSTICpy Documentation](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html#instantiating-a-query-provider).\n", + "

" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For now, we are going to set up a QueryProvider for Azure Sentinel, pass it the details for our workspace that we just stored in the msticpyconfig file, and connect. The connection process will ask us to authenticate to our Azure Sentinel workspace via [device authorization](https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-device-code) with our Azure credentials. You can do this by clicking the device login code button that appears as the output of the next cell, or by navigating to https://microsoft.com/devicelogin and manually entering the code. Note that this authentication persists with the kernel you are using with the notebook, so if you restart the kernel you will need to re-authenticate.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize a QueryProvider for Azure Sentinel\n", + "qry_prov = QueryProvider(\"LogAnalytics\")\n", + "\n", + "# Get the Azure Sentinel workspace details from msticpyconfig\n", + "try:\n", + "    ws_config = WorkspaceConfig()\n", + "    md(\"Workspace details collected from config file\")\n", + "except Exception as err:\n", + "    raise RuntimeError(\"No workspace settings are configured, please run the cells above to configure these.\") from err\n", + " \n", + "# Connect to Azure Sentinel with our QueryProvider and config details\n", + "# ws_config.code_connect_str is a feature of MSTICpy that creates the required connection string from details in our msticpyconfig\n", + "qry_prov.connect(connection_str=ws_config.code_connect_str)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have connected we can query Azure Sentinel for data, but before we do that we need to understand what data is available to query. 
The QueryProvider object provides a way to get a list of tables as well as the columns of each table:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get list of tables in our Workspace\n", + "display(qry_prov.schema_tables[:5]) # We are outputting only the first 5 tables for brevity\n", + "# Get list of tables and their columns\n", + "qry_prov.schema['SigninLogs'] # We are only displaying the columns for SigninLogs for brevity" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "MSTICpy includes a number of built-in queries that you can run.
\n", + "You can list available queries with .list_queries() and get specific details about a query by calling it with \"?\" as a parameter" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get a list of available queries\n", + "qry_prov.list_queries()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get details about a query\n", + "qry_prov.Azure.list_all_signins_geo(\"?\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can then run the query by calling it with the required parameters:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import datetime, timedelta\n", + "# set our query end time as now\n", + "end = datetime.now()\n", + "# set our query start time as 1 hour ago\n", + "start = end - timedelta(hours=1)\n", + "# run query with specified start and end times\n", + "logons_df = qry_prov.Azure.list_all_signins_geo(start=start, end=end)\n", + "# display first 5 rows of any results\n", + "logons_df.head() # If you have no data you will just see the column headings displayed" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another way to run queries is to pass a KQL query as a string to the query provider. This will run the query against the workspace connected to above, and will return the data in a [Pandas DataFrame](https://pandas.pydata.org/). We will look at working with Pandas in a bit more detail later."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-06-26T19:27:44.779558Z", + "start_time": "2020-06-26T19:27:44.569079Z" + } + }, + "outputs": [], + "source": [ + "# Define our query\n", + "test_query = \"\"\"\n", + "SigninLogs\n", + "| where TimeGenerated > ago(7d)\n", + "| take 10\n", + "\"\"\"\n", + "\n", + "# Pass that query to our QueryProvider\n", + "test_df = qry_prov.exec_query(test_query)\n", + "\n", + "# Check that we have some data\n", + "if isinstance(test_df, pd.DataFrame) and not test_df.empty:\n", + "    # .head() returns the first 5 rows of our results DataFrame\n", + "    display(test_df.head())\n", + "# If there is no data load some sample data to use instead\n", + "else:\n", + "    md(\"You don't appear to have any SigninLogs - we will load sample data for you to use.\")\n", + "    qry_prov = QueryProvider(\"LocalData\", data_paths=[\"nbdemo/data/\"], query_paths=[\"nbdemo/data/\"])\n", + "    logons_df = qry_prov.Azure.list_all_signins_geo()\n", + "    display(logons_df.head())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn more:\n", + " - You can learn more about the MSTICpy pre-defined queries in the [MSTICpy Documentation](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html#running-an-pre-defined-query)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Pandas\n", + "Our query results are returned in the form of a Pandas DataFrame. DataFrames are a core component of the Azure Sentinel notebooks and of MSTICpy and are used for both input and output formats.\n", + "Pandas DataFrames are incredibly versatile data structures with a lot of useful features. We will cover a small number of them here and we recommend that you check out the Learn more section to learn more about Pandas features.\n", + "
\n", + "
\n", + "### Displaying a DataFrame:\n", + "The first thing we want to do is display our DataFrame. You can either just run it or explicitly display it by calling `display(df)`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# For this section we are going to create a DataFrame from data we have saved in a csv file\n", + "df = pd.read_csv(\"https://raw.githubusercontent.com/microsoft/msticpy/master/tests/testdata/host_logons.csv\", index_col=[0])\n", + "# Display our DataFrame\n", + "df # or display(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **Note** if the dataframe variable (`df` in the example above) is the last statement in a \n", + "> code cell, Jupyter will automatically display it without using the `display()` function. \n", + "> However, if you want to display a DataFrame in the middle of \n", + "> other code in a cell you must use the `display()` function.\n", + "\n", + "You may not want to display the whole DataFrame and instead display only a selection of items. There are numerous ways to do this and the cell below shows some of the most widely used functions."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Display the first 2 rows using head(): \", \"bold\")\n", + "display(df.head(2)) # we don't need to call display here but just for illustration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Display the 3rd row using iloc[]: \", \"bold\")\n", + "df.iloc[2]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Show the column names in the DataFrame \", \"bold\")\n", + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Display just the TimeGenerated and TenantId columns: \", \"bold\")\n", + "df[[\"TimeGenerated\", \"TenantId\"]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also choose to select a subsection of our DataFrame based on the contents of the DataFrame:\n", + "\n", + "> **Tip**: the syntax in these examples is using a technique called *boolean indexing*. \n", + ">
`df[boolean_expression]`\n", + "> returns all rows in the dataframe where the boolean expression is True\n", + ">
In the first example we are telling pandas to return all rows where the column value of\n", + "> 'TargetUserName' matches 'MSTICAdmin'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Display only rows where TargetUserName value is 'MSTICAdmin': \", \"bold\")\n", + "df[df['TargetUserName']==\"MSTICAdmin\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "md(\"Display rows where TargetUserName is either MSTICAdmin or adm1nistrator:\", \"bold\")\n", + "display(df[df['TargetUserName'].isin(['adm1nistrator', 'MSTICAdmin'])])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our DataFrame can also be extended to add new columns with additional data if required:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df[\"NewCol\"] = \"Look at my new data!\"\n", + "display(df[[\"TenantId\",\"Account\", \"TimeGenerated\", \"NewCol\"]].head(2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn more:\n", + "There is a lot more you can do with Pandas, the links below provide some useful resources:\n", + " - [Getting started with Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html)\n", + " - [Infosec Jupyterbook intro to Pandas](https://infosecjupyterbook.com/notebooks/tutorials/03_intro_to_pandas.html)\n", + " - [A great list of Pandas hints and tricks](https://www.dataschool.io/python-pandas-tips-and-tricks/)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Enriching data\n", + "\n", + "Now that we have seen how to query for data and do some basic manipulation, we can look at enriching this data with additional data sources. 
For this we are going to use an external threat intelligence provider to give us some more details about an IP address we have in our dataset using the [MSTICpy TIProvider](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html) feature." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import datetime, timedelta\n", + "# Check if we have logon data already and if not get some\n", + "if not isinstance(logons_df, pd.DataFrame) or logons_df.empty:\n", + " # set our query end time as now\n", + " end = datetime.now()\n", + " # set our query start time as 1 day ago\n", + " start = end - timedelta(days=1)\n", + " # run query with specified start and end times\n", + " logons_df = qry_prov.Azure.list_all_signins_geo(start=start, end=end)\n", + " \n", + "# Create our TI provider\n", + "ti = TILookup()\n", + "# Get the first logon IP address from our dataset\n", + "ip = logons_df.iloc[0]['IPAddress']\n", + "# Look up the IP in VirusTotal\n", + "ti_resp = ti.lookup_ioc(ip, providers=[\"VirusTotal\"])\n", + "\n", + "# Format our results as a DataFrame\n", + "ti_resp = ti.result_to_df(ti_resp)\n", + "display(ti_resp)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using the [Pandas apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) feature, we can get results for all the IP addresses in our dataset and add the lookup severity score as a new column in our DataFrame for easier reference." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Take the IP address in each row, look it up against TI and return the severity score\n", + "def lookup_res(row):\n", + " ip = row['IPAddress']\n", + " resp = ti.lookup_ioc(ip, providers=[\"VirusTotal\"])\n", + " resp = ti.result_to_df(resp)\n", + " return resp[\"Severity\"].iloc[0]\n", + "\n", + "# Take the first 3 rows of data and copy them into a new DataFrame\n", + "enrich_logons_df = logons_df.iloc[:3].copy()\n", + "# Create a new column called TIRisk and populate that with the TI severity score of the IP Address in that row\n", + "enrich_logons_df['TIRisk'] = enrich_logons_df.apply(lookup_res, axis=1)\n", + "# Display a subset of columns from our DataFrame\n", + "enrich_logons_df[[\"TimeGenerated\", \"ResultType\", \"UserPrincipalName\", \"IPAddress\", \"TIRisk\"]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn more:\n", + "MSTICpy includes further threat intelligence capabilities as well as other data enrichment options. More details on these can be found in the [documentation](https://msticpy.readthedocs.io/en/latest/DataEnrichment.html)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Analyzing data\n", + "With the data we have collected we may wish to perform some analysis on it in order to better understand it. MSTICpy includes a number of features to help with this, and there is a vast array of other data analysis capabilities available via Python, ranging from simple processes to complex ML models. We will start here by keeping it simple and look at how we can decode some Base64 encoded command line strings so that we can understand their content." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from msticpy.sectools import base64unpack as b64\n", + "# Take our encoded PowerShell command\n", + "b64_cmd = \"powershell.exe -encodedCommand SW52b2tlLVdlYlJlcXVlc3QgaHR0cHM6Ly9jb250b3NvLmNvbS9tYWx3YXJlIC1PdXRGaWxlIEM6XG1hbHdhcmUuZXhl\"\n", + "# Unpack the Base64 encoded elements\n", + "unpack_txt = b64.unpack(input_string=b64_cmd)\n", + "# Display our results and transform for easier reading\n", + "unpack_txt[1].T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also use MSTICpy to extract Indicators of Compromise (IoCs) from a dataset. This makes it easy to extract and match on a set of IoCs within our data. In the example below we take a US Cybersecurity & Infrastructure Security Agency (CISA) report and extract all domains listed in the report:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "# Set up our IoCExtract object\n", + "ioc_extractor = iocextract.IoCExtract()\n", + "# Download our threat report\n", + "data = requests.get(\"https://www.us-cert.gov/sites/default/files/publications/AA20-099A_WHITE.stix.xml\")\n", + "# Extract domains listed in our report\n", + "iocs = ioc_extractor.extract(data.text, ioc_types=\"dns\")['dns']\n", + "# Display the first 5 IoCs found in our report\n", + "list(iocs)[:5]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn more:\n", + "There is a wide range of options when it comes to data analysis in notebooks using Python. 
Here are some useful resources to get you started:\n", + " - [MSTICpy DataAnalysis documentation](https://msticpy.readthedocs.io/en/latest/DataAnalysis.html)\n", + " - Scikit-Learn is a popular Python ML data analysis library, which has a useful [tutorial](https://scikit-learn.org/stable/tutorial/basic/tutorial.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Visualizing data\n", + "Visualizing data can provide an excellent way to analyze data and identify patterns and anomalies. Python has a wide range of data visualization capabilities, each of which has its own benefits and drawbacks. We will look at some basic capabilities as well as the built-in visualizations in MSTICpy.\n", + "


\n", + "**Basic Graphs**
\n", + "Pandas and Matplotlib provide the simplest way to produce basic plots of data:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "vis_q = \"\"\"\n", + "SigninLogs\n", + "| where TimeGenerated > ago(7d)\n", + "| sample 5\"\"\"\n", + "\n", + "# Try to query for data, but if using sample data load that instead\n", + "try:\n", + " vis_data = qry_prov.exec_query(vis_q)\n", + "except FileNotFoundError:\n", + " vis_data = logons_df\n", + "\n", + "# Check we have some data in our results and if not use the previously used dataset\n", + "if not isinstance(vis_data, pd.DataFrame) or vis_data.empty:\n", + " vis_data = logons_df\n", + "\n", + "# Plot up to the first 5 IP addresses\n", + "vis_data.head()['IPAddress'].value_counts().plot.bar(title=\"IP prevalence\", legend=False)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pie_df = vis_data.copy()\n", + " # If we have lots of data just plot the first 5 rows\n", + "pie_df.head()['IPAddress'].value_counts().plot.pie(legend=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Bokeh](https://bokeh.org/) is a powerful visualization library that allows you to create complex, interactive visualizations. MSTICpy includes a number of pre-built visualizations using Bokeh, including a timeline feature that can be used to represent events over time. You can interact with the timeline by zooming and panning, using the range selector, as well as hovering over data points to see more details." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import datetime, timedelta\n", + "# Check if we have logon data already and if not get some\n", + "if not isinstance(logons_df, pd.DataFrame) or logons_df.empty:\n", + " # set our query end time as now\n", + " end = datetime.now()\n", + " # set our query start time as 1 day ago\n", + " start = end - timedelta(days=1)\n", + " # run query with specified start and end times\n", + " logons_df = qry_prov.Azure.list_all_signins_geo(start=start, end=end)\n", + " \n", + "display(timeline.display_timeline(logons_df.head(10), source_columns=[\"TimeGenerated\", \"ResultType\", \"UserPrincipalName\", \"IPAddress\"], group_by=\"AppDisplayName\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "MSTICpy also includes a feature to allow you to map locations. This can be particularly useful when looking at the distribution of remote network connections or other events. Below we plot the locations of remote logons observed in our Azure AD data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from msticpy.sectools.ip_utils import convert_to_ip_entities\n", + "from msticpy.nbtools.foliummap import FoliumMap, get_map_center\n", + "\n", + "# Convert our IP addresses in string format into IP address entities\n", + "ip_entity = entityschema.IpAddress()\n", + "ip_list = [convert_to_ip_entities(i)[0] for i in logons_df['IPAddress'].head(10)]\n", + " \n", + "# Get center location of all IP locations to center the map on\n", + "location = get_map_center(ip_list)\n", + "logon_map = FoliumMap(location=location, zoom_start=4)\n", + "\n", + "# Add location markers to our map and display it\n", + "if len(ip_list) > 0:\n", + " logon_map.add_ip_cluster(ip_entities=ip_list)\n", + "display(logon_map.folium_map)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Learn more:\n", + " - The [Infosec Jupyterbook](https://infosecjupyterbook.com/) includes a section on data visualization.\n", + " - [Bokeh Library Documentation](https://bokeh.org/)\n", + " - [Matplotlib tutorial](https://matplotlib.org/3.2.0/tutorials/index.html)\n", + " - [Seaborn visualization library tutorial](https://seaborn.pydata.org/tutorial.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Conclusion\n", + "This notebook has shown you the basics of using notebooks and Azure Sentinel for security investigations. Much more is possible using notebooks, and we strongly encourage you to read the material referenced in the 'Learn more' sections of this notebook. You can also explore the other Azure Sentinel notebooks in order to take advantage of the pre-built hunting logic, and understand other analysis techniques that are possible.
\n", + "### Appendix:\n", + " - [Jupyter Notebooks: An Introduction](https://realpython.com/jupyter-notebook-introduction/)\n", + " - [Threat Hunting in the cloud with Azure Notebooks](https://medium.com/@maarten.goet/threat-hunting-in-the-cloud-with-azure-notebooks-supercharge-your-hunting-skills-using-jupyter-8d69218e7ca0)\n", + " - [MSTICpy documentation](https://msticpy.readthedocs.io/)\n", + " - [Azure Sentinel Notebooks documentation](https://docs.microsoft.com/en-us/azure/sentinel/notebooks)\n", + " - [The Infosec Jupyterbook](https://infosecjupyterbook.com/introduction.html)\n", + " - [Linux Host Explorer Notebook walkthrough](https://techcommunity.microsoft.com/t5/azure-sentinel/explorer-notebook-series-the-linux-host-explorer/ba-p/1138273)\n", + " - [Why use Jupyter for Security Investigations](https://techcommunity.microsoft.com/t5/azure-sentinel/why-use-jupyter-for-security-investigations/ba-p/475729)\n", + " - [Security Investigtions with Azure Sentinel & Notebooks](https://techcommunity.microsoft.com/t5/azure-sentinel/security-investigation-with-azure-sentinel-and-jupyter-notebooks/ba-p/432921)\n", + " - [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html)\n", + " - [Bokeh Documentation](https://docs.bokeh.org/en/latest/)" + ] + } + ], + "metadata": { + "hide_input": false, + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false + 
}, + "varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "types_to_exclude": [ + "module", + "function", + "builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/ConfiguringNotebookEnvironment.ipynb b/ConfiguringNotebookEnvironment.ipynb index e004a45f..85963bf7 100644 --- a/ConfiguringNotebookEnvironment.ipynb +++ b/ConfiguringNotebookEnvironment.ipynb @@ -30,7 +30,9 @@ "### Creating a virtual environment\n", "If you are running these notebooks locally, it is a good idea to create a clean virtual python environment, before installing any of the packages . This will prevent installed packages conflicting with versions that you may need for other applications.\n", "\n", - "For standard python use the `virtualenv` command. For Conda use the `conda env` command. In both cases be sure to activate the environment before running jupyter using `activate {my_env_name}`.\n", + "For standard python use the [`venv`](https://docs.python.org/3/library/venv.html?highlight=venv) command. \n", + "For Conda use the [`conda env`](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) command. 
\n", + "In both cases be sure to activate the environment before running jupyter using `activate {my_env_name}` or `conda activate {my_env_name}`.\n", "\n", "\n", "### Using Requirements.txt\n", @@ -49,12 +51,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T00:47:41.219073Z", - "start_time": "2019-10-31T00:47:41.213073Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Run this cell to view requirements.txt\n", @@ -85,7 +82,10 @@ "pip install pkg_name --user --upgrade\n", "```\n", "\n", - "This will avoid permission errors by installing into your user folder." + "This will avoid permission errors by installing into your user folder.\n", + "\n", + "> **Note**: the use of the `--user` option is usually not required in a Conda environment \n", + "> since the Python site packages are normally already installed in a per-user folder." ] }, { @@ -93,57 +93,69 @@ "metadata": {}, "source": [ "### Install Packages from this Notebook\n", - "The first time this cell runs for a new Azure Notebooks project or other Python environment it will take several minutes to download and install the packages. In subsequent runs it should run quickly and confirm that package dependencies are already installed. Unless you want to upgrade the packages you can feel free to skip execution of the next cell.\n", - "\n", - "If you see any import failures (```ImportError```) in the notebooks, please re-run this notebook and answer 'y' when prompted, then re-run the cell where the import failure occurred.\n", - "\n", - "Note you may see some warnings about incompatibility with certain packages. This should not affect the functionality of this notebook but you may need to upgrade the packages producing the warnings to a more recent version." + "The first time this cell runs for a new Azure ML or Azure Notebooks notebook or other Python environment it will do the following things:\n", + "1. 
Check the kernel version to ensure that a Python 3.6 or later kernel is running\n", + "2. Check the msticpy version - if this is not installed or the version installed is older than the required version (in `REQ_MSTICPY_VER`)\n", + " it will attempt to install a new version (you will be prompted whether you want to do this).\n", + " The install can take several minutes depending on the versions of packages that you already have installed.\n", + " \n", + " > **Note:** These two steps are run from a local Python module - this is available in the Azure-Sentinel-Notebooks repo.\n", + " > If you do not have this locally, download it from [here](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/utils/nb_check.py) and\n", + " > put a copy in a `utils` subfolder of your current directory.\n", + " \n", + "3. Once *msticpy* is installed and imported, the `init_notebook` function is run. This:\n", + " - imports common modules used in the notebook\n", + " - installs additional packages\n", + " - sets some global options\n", + " \n", + "> **Note:** In subsequent runs, this cell should run quickly since you will already have the required packages installed.\n", + "\n", + "\n", + "> **Warning:** you may see some warnings about incompatibility with certain packages. This should not affect the functionality of this notebook but you may need to upgrade the packages producing the warnings to a more recent version." 
] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ + "from pathlib import Path\n", + "import os\n", "import sys\n", "import warnings\n", - "\n", - "warnings.filterwarnings(\"ignore\",category=DeprecationWarning)\n", - "\n", - "MIN_REQ_PYTHON = (3,6)\n", - "if sys.version_info < MIN_REQ_PYTHON:\n", - " print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')\n", - " print('or later is selected as the active kernel.')\n", - " sys.exit(\"Python %s.%s or later is required.\\n\" % MIN_REQ_PYTHON)\n", - "\n", - "# Package Installs - try to avoid if they are already installed\n", - "try:\n", - " import msticpy.sectools as sectools\n", - " import Kqlmagic\n", - " from dns import reversename, resolver\n", - " from ipwhois import IPWhois\n", - " import folium\n", - " \n", - " print('If you answer \"n\" this cell will exit with an error in order to avoid the pip install calls,')\n", - " print('This error can safely be ignored.')\n", - " resp = input('msticpy and Kqlmagic packages are already loaded. Do you want to re-install? (y/n)')\n", - " if resp.strip().lower() != 'y':\n", - " sys.exit('pip install aborted - you may skip this error and continue.')\n", - " else:\n", - " print('After installation has completed, restart the current kernel and run '\n", - " 'the notebook again skipping this cell.')\n", - "except ImportError:\n", - " pass\n", - "\n", - "print('\\nPlease wait. Installing required packages. This may take a few minutes...')\n", - "!pip install --user -r requirements.txt \n", - "\n", - "print('To ensure that the latest versions of the installed libraries '\n", - " 'are used, please restart the current kernel and run '\n", - " 'the notebook again skipping this cell.')" + "from IPython.display import display, HTML, Markdown\n", + "\n", + "REQ_PYTHON_VER=(3, 6)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", + "\n", + "display(HTML(\"

Starting Notebook setup...

\"))\n", + "if Path(\"./utils/nb_check.py\").is_file():\n", + " from utils.nb_check import check_python_ver, check_mp_ver\n", + " check_python_ver(min_py_ver=REQ_PYTHON_VER)\n", + " try:\n", + " check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\n", + " except ImportError:\n", + " !pip install --upgrade msticpy\n", + " if \"msticpy\" in sys.modules:\n", + " importlib.reload(sys.modules[\"msticpy\"])\n", + " else:\n", + " import msticpy\n", + " check_mp_ver(REQ_MSTICPY_VER)\n", + " \n", + "extra_imports = [\n", + " \"msticpy.nbtools, observationlist\",\n", + " \"msticpy.sectools, domain_utils\",\n", + " \"pyvis.network, Network\",\n", + "]\n", + "# If not using Azure Notebooks, install msticpy with\n", + "# !pip install msticpy\n", + "from msticpy.nbtools import nbinit\n", + "nbinit.init_notebook(\n", + " namespace=globals(),\n", + " extra_imports=[\"ipwhois, IPWhois\"],\n", + " additional_packages=[\"pyvis\", \"python-whois\"],\n", + ");" ] }, { @@ -159,6 +171,17 @@ "metadata": {}, "source": [ "## Creating your `config.json`\n", + "When you start a notebook from Azure Sentinel for the first time it will create a `config.json` file in\n", + "your notebooks folder. This should be populated with your workspace and tenant IDs needed to \n", + "authenticate to Azure Sentinel.\n", + "\n", + "If you are using notebooks in a different environment you may need to create a `config.json` or `msticpyconfig.yaml` (see below)\n", + "to supply this information to your notebook.\n", + "\n", + "Form more information see this [msticpy Package Configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)\n", + "\n", + "---\n", + "\n", "If you need to create or modify your config.json you can run the following cell.\n", "\n", "You will need the subscription and workspace IDs for your Azure Sentinel Workspace. 
These can be found here in the Azure Sentinel portal as shown below.\n", @@ -172,13 +195,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T00:51:46.650354Z", - "start_time": "2019-10-31T00:51:46.611399Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "import requests\n", @@ -260,27 +277,39 @@ "metadata": {}, "source": [ "## `msticpyconfig.yaml` Configuration File\n", - "Before you can use the msticpy TILookup class you need to configure your TI provider settings.\n", "\n", - "You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. This file is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n", - "For more details on msticpy configuration see the [msticpy documentation](https://msticpy.readthedocs.io/en/latest/msticpyconfig.html).\n", + "`config.json` provides some basic configuration for connecting to your Azure Sentinel workspace. \n", + "However, there are many features that require additional configuration information. Some examples are:\n", + "- Threat Intelligence Provider connection information\n", + "- GeoIP connection information\n", + "- Keyvault configuration for storing secrets remotely\n", + "- MDATP and Azure API connection information.\n", + "- Connection information for multiple Azure Sentinel workspaces.\n", + "\n", + "Settings for these are stored in the `msticpyconfig.yaml` file. 
This file is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n", + "Form more information about *msticpy* configuration see [msticpy Package Configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html).\n", + "\n", + "The most commonly-used sections are described below.\n", + "\n", "\n", "### Threat Intelligence Provider Setup\n", "For more information on the msticpy Threat Intel lookup class see the [documentation here](https://msticpy.readthedocs.io/en/latest/TIProviders.html).\n", "\n", "Primary providers are used by default. Secondary providers are not run by default but can be invoked by using the `providers` parameter to `lookup_ioc()` or `lookup_iocs()`. Set the `Primary` config setting to `True` or `False` for each provider ID according to how you want to use them. The `providers` parameter should be a list of strings identifying the provider(s) to use. \n", "\n", - "The provider ID is given by the `Provider:` setting for each of the TI providers - do not alter this value.\n", - "\n", - "Delete or comment out the section for any TI Providers that you do not wish to use.\n", + "- The provider ID is given by the `Provider:` setting for each of the TI providers - do not alter this value.\n", + "- Delete or comment out the section for any TI Providers that you do not wish to use.\n", + "- For most providers you will usually need to supply an authorization (API) key and in some cases a user ID for each provider.\n", + "- For the Azure Sentinel TI provider, you will need the workspace ID and tenant ID and will need to authenticate in order \n", + " to access the data (although if you have an existing authenticated connection with the same workspace/tenant, this connection will be re-used).\n", "\n", - "For most providers you will usually need to supply an authorization (API) key and in some cases a user ID for each provider.\n", + "If you need to create a new 
msticpyconfig.yaml file, run the \"Create a new mstipyconfig.yaml\" cell below.\n", "\n", - "For the Azure Sentinel TI provider, you will need the workspace ID and tenant ID and will need to authenticate in order to access the data (although if you have an existing authenticated connection with the same workspace/tenant, this connection will be re-used).\n", + "**Warning** - this will overwrite a file of the same name in the current directory\n", "\n", - "If you need to create a config file, run the \"Create a new mstipyconfig.yaml\" cell below.\n", - "\n", - "**Warning** - this will overwrite a file of the same name in the current directory\n", + "### GeoIP Providers\n", + "Like the TI providers these services normally need an API key to access. You can read more about configuration\n", + "the supported providers here. [msticpy GeoIP Providers](https://msticpy.readthedocs.io/en/latest/data_acquisition/GeoIPLookups.html)\n", "\n", "### Browshot Setup\n", "The functionality to screenshot a URL in msticpy.sectools.domain_utils relies on a service called BrowShot (https://browshot.com/). An API key is required to use this service and it needs to be defined in the `msticpyconfig` file as well. As this is not a threat intelligence provider it doesn't not fall under the `TIProviders` section of `msticpyconfig` but instead sits alone. See the cell below for example configuration." @@ -296,12 +325,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-27T02:03:13.536170Z", - "start_time": "2020-02-27T02:03:13.530188Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "%pfile msticpyconfig.yaml\n" @@ -312,18 +336,30 @@ "metadata": {}, "source": [ "### Create a new `msticpyconfig.yaml`\n", - "If you need to create a msticpyconfig from scratch, edit the cell below uncommenting the sections you need and adding the correct values.\n", + "If you need to create a msticpyconfig from scratch, edit the cell below:\n", + "1. 
Uncommenting the first line to enable the %%writefile magic instruction\n", + "2. Edit sections you need and adding the correct values.\n", + "3. Delete the other sections or leave commented out\n", "\n", + "\n", + "Guidelines:\n", "- Usually you will only need the `default` workspace in the `AzureSentinel/Workspaces` section\n", "- You can add TI Provider auth/API keys to the relevant sections (either as text or\n", - " stored in an environment variable)\n", + " stored in an environment variable - the OTX entry shows the former and the XForce entry\n", + " shows an example of the latter syntax)\n", "- Delete the providers/sections that you do not need.\n", "- Usually one or more TI providers and one GeoIP provider will be needed for most notebooks\n", - "\n", - ">

** WARNING **

\n", - "> Executing the following cell will overwrite the contents of the cell to any existing\n", - "> `msticpyconfig.yaml`.
\n", - ">

Do not run this cell if you have existing configuration settings

" + "- For a single string (e.g. API key) it does not matter whether the string is quoted or not\n", + "- For TI Providers, setting `Primary: True` means that the provider will be used in the\n", + " default set of providers every time you do a TI Lookup.\n", + "\n", + ">

** WARNING **

\n", + "> Executing the following cell will write the contents of the cell to any existing\n", + "> `msticpyconfig.yaml` in the current folder, overwriting any settings that you have\n", + "> in there.\n", + "> If you have a current msticpyconfig.yaml, go to the next cell to read in this file\n", + "> and edit this with your settings.
\n", + ">

Do not run this cell if you have existing configuration settings!

" ] }, { @@ -332,7 +368,7 @@ "metadata": {}, "outputs": [], "source": [ - "%%writefile msticpyconfig.yaml\n", + "#%%writefile msticpyconfig.yaml\n", "\n", "AzureSentinel:\n", " #Workspaces:\n", @@ -424,12 +460,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-27T02:06:09.736798Z", - "start_time": "2020-02-27T02:06:09.732799Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "%load path/to/your/msticpyconfig.yaml" @@ -438,8 +469,8 @@ ], "metadata": { "hide_input": false, + "history": [], "kernelspec": { - "display_name": "Python 3.6", "language": "python", "name": "python36" }, @@ -453,7 +484,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.10" }, "toc": { "base_numbering": 1, @@ -468,6 +499,7 @@ "toc_section_display": true, "toc_window_display": true }, + "uuid": "75a9aa0a-6ec9-4b1a-a5f0-fc14ed6f8fab", "varInspector": { "cols": { "lenName": 16, @@ -496,13 +528,6 @@ "_Feature" ], "window_display": false - }, - "widgets": { - "application/vnd.jupyter.widget-state+json": { - "state": {}, - "version_major": 2, - "version_minor": 0 - } } }, "nbformat": 4, diff --git a/Entity Explorer - Account.ipynb b/Entity Explorer - Account.ipynb index 1c4bf399..7658ed17 100644 --- a/Entity Explorer - Account.ipynb +++ b/Entity Explorer - Account.ipynb @@ -33,7 +33,7 @@ }, "source": [ "

Contents

\n", - "" + "" ] }, { @@ -73,12 +73,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-28T21:13:24.369073Z", - "start_time": "2020-02-28T21:13:24.260137Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", @@ -93,7 +88,6 @@ "display(HTML(\"

Starting Notebook setup...

\"))\n", "if Path(\"./utils/nb_check.py\").is_file():\n", " from utils.nb_check import check_python_ver, check_mp_ver\n", - "\n", " check_python_ver(min_py_ver=REQ_PYTHON_VER)\n", " try:\n", " check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\n", @@ -105,6 +99,9 @@ " import msticpy\n", " check_mp_ver(REQ_MSTICPY_VER)\n", " \n", + "\n", + "# If not using Azure Notebooks, install msticpy with\n", + "# !pip install msticpy\n", "from msticpy.nbtools import nbinit\n", "nbinit.init_notebook(\n", " namespace=globals(),\n", @@ -143,12 +140,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:20:45.697714Z", - "start_time": "2019-10-31T21:20:27.173805Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Authentication\n", @@ -200,12 +192,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:20:50.717317Z", - "start_time": "2019-10-31T21:20:50.710320Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "WIDGET_DEFAULTS = {\n", @@ -219,12 +206,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:20:55.324557Z", - "start_time": "2019-10-31T21:20:55.285577Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "query_times = nbwidgets.QueryTime(units='day', max_before=200, before=5, max_after=7)\n", @@ -234,12 +216,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:21:05.153861Z", - "start_time": "2019-10-31T21:21:05.149864Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Set up function to allow easy reference to common parameters for queries\n", @@ -263,12 +240,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:21:15.909785Z", - "start_time": "2019-10-31T21:21:07.674690Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# KQL query for 
full text search of IP address and display all datatypes \n", @@ -277,8 +249,7 @@ "| where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", "| summarize RowCount=count() by Table=$table\n", "'''.format(**acct_query_params())\n", - "%kql -query datasource_status\n", - "datasource_status_df = _kql_raw_result_.to_dataframe()\n", + "datasource_status_df = qry_prov.exec_query(datasource_status)\n", "\n", "#Display result as transposed matrix of datatypes availabel to query for the query period \n", "if len(datasource_status_df) > 0:\n", @@ -309,12 +280,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:21:27.840285Z", - "start_time": "2019-10-31T21:21:20.047711Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# AAD\n", @@ -406,12 +372,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:21:39.187410Z", - "start_time": "2019-10-31T21:21:39.119455Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "from collections import namedtuple\n", @@ -474,22 +435,35 @@ "\n", "def display_activity(selected_item):\n", " acct, source = selected_account(selected_item)\n", - " utils.md(f\"{acct} (source: {source})\", \"bold\")\n", + " outputs = []\n", + " title = HTML(f\"{acct} (source: {source})\")\n", + " outputs.append(title)\n", " if source == \"LinuxHostLogon\":\n", - " display(linux_logon_df[linux_logon_df[\"AccountName\"] == acct]\n", - " .sort_values(\"TimeGenerated\", ascending=True))\n", + " outputs.append(\n", + " linux_logon_df[linux_logon_df[\"AccountName\"] == acct]\n", + " .sort_values(\"TimeGenerated\", ascending=True)\n", + " )\n", " if source == \"WindowsHostLogon\":\n", - " display(win_logon_df[win_logon_df[\"TargetUserName\"] == acct]\n", - " .sort_values(\"TimeGenerated\", ascending=True))\n", + " outputs.append(\n", + " win_logon_df[win_logon_df[\"TargetUserName\"] == acct]\n", + " 
.sort_values(\"TimeGenerated\", ascending=True)\n", + " )\n", " if source == \"AADLogon\":\n", - " display(aad_signin_df[aad_signin_df[\"UserPrincipalName\"] == acct]\n", - " .sort_values(\"TimeGenerated\", ascending=True))\n", + " outputs.append(\n", + " aad_signin_df[aad_signin_df[\"UserPrincipalName\"] == acct]\n", + " .sort_values(\"TimeGenerated\", ascending=True)\n", + " )\n", " if source == \"AzureActivity\":\n", - " display(azure_activity_df[azure_activity_df[\"UserPrincipalName\"] == acct]\n", - " .sort_values(\"TimeGenerated\", ascending=True))\n", + " outputs.append(\n", + " azure_activity_df[azure_activity_df[\"UserPrincipalName\"] == acct]\n", + " .sort_values(\"TimeGenerated\", ascending=True)\n", + " )\n", " if source == \"O365Activity\":\n", - " display(o365_activity_df[o365_activity_df[\"UserId\"] == acct]\n", - " .sort_values(\"TimeGenerated\", ascending=True))\n", + " outputs.append(\n", + " o365_activity_df[o365_activity_df[\"UserId\"] == acct]\n", + " .sort_values(\"TimeGenerated\", ascending=True)\n", + " )\n", + " return outputs\n", "\n", "def selected_account(selected_acct):\n", " if not selected_acct:\n", @@ -508,12 +482,7 @@ }, { "cell_type": "markdown", - "metadata": { - "ExecuteTime": { - "end_time": "2019-09-22T22:43:08.721257Z", - "start_time": "2019-09-22T22:43:08.686275Z" - } - }, + "metadata": {}, "source": [ "## Related Alerts and Hunting Bookmarks\n", "### Alerts\n", @@ -523,12 +492,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:21:59.287988Z", - "start_time": "2019-10-31T21:21:57.846245Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "account_name, account_source = selected_account(select_acct.value)\n", @@ -564,7 +528,7 @@ "def disp_full_alert(alert):\n", " global related_alert\n", " related_alert = SecurityAlert(alert)\n", - " nbdisplay.display_alert(related_alert, show_entities=True)\n", + " return nbdisplay.format_alert(related_alert, 
show_entities=True)\n", "\n", "if related_alerts is not None and not related_alerts.empty:\n", " related_alerts[\"CompromisedEntity\"] = related_alerts[\"src_accountname\"]\n", @@ -587,13 +551,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:22:05.399690Z", - "start_time": "2019-10-31T21:22:04.264621Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "acct_name = acct_query_params()[\"account_name\"]\n", @@ -625,7 +583,7 @@ " display(Markdown(\"No related bookmarks found.\"))\n", "\n", "def disp_bookmark(bookmark_id):\n", - " display(related_bkmark_df[related_bkmark_df[\"BookmarkId\"] == bookmark_id].T)\n", + " return related_bkmark_df[related_bkmark_df[\"BookmarkId\"] == bookmark_id].T\n", "\n", "if related_bkmark_df is not None and not related_bkmark_df.empty:\n", " display(Markdown(\"### Click on bookmark to view details.\"))\n", @@ -650,12 +608,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:39:01.948639Z", - "start_time": "2019-10-31T21:39:01.037012Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Function definitions used below\n", @@ -762,12 +715,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:38:34.770088Z", - "start_time": "2019-10-31T21:38:31.168203Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "md(\"Fetching logon data...\")\n", @@ -788,12 +736,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:29:02.043177Z", - "start_time": "2019-10-31T21:29:01.895284Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "logon_summary = (all_win_logons\n", @@ -836,13 +779,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:32:07.264120Z", - "start_time": "2019-10-31T21:30:25.991169Z" - }, - 
"scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "ti_results, all_win_logons_ti, src_ip_addrs_win = check_ip_ti(df=all_win_logons, ip_col=\"IpAddress\")\n", @@ -869,12 +806,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:32:54.522511Z", - "start_time": "2019-10-31T21:32:54.163885Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "all_win_logons_geo = check_geo_whois(src_ip_addrs_win, all_win_logons, \"IpAddress\")\n", @@ -902,12 +834,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-30T18:59:39.221208Z", - "start_time": "2019-10-30T18:59:37.439830Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "related_host_alerts = []\n", @@ -945,12 +872,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-30T19:00:08.147249Z", - "start_time": "2019-10-30T19:00:07.109628Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "ip_list = \",\".join(list(all_win_logons[\"IpAddress\"].unique()))\n", @@ -984,12 +906,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-30T19:00:12.305509Z", - "start_time": "2019-10-30T19:00:11.427623Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "related_host_bkmks = []\n", @@ -1032,12 +949,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-17T01:12:09.414523Z", - "start_time": "2019-10-17T01:12:06.50592Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "md(\"Fetching logon data...\")\n", @@ -1056,12 +968,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-18T00:37:26.703212Z", - "start_time": "2019-10-18T00:37:26.610352Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "logon_summary = (all_lx_logons\n", @@ -1104,13 +1011,7 @@ { 
"cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-18T00:34:57.729768Z", - "start_time": "2019-10-18T00:34:56.96394Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "ti_results_lx, all_lx_logons_ti, src_ip_addrs_lx = check_ip_ti(df=all_lx_logons, ip_col=\"SourceIP\")\n", @@ -1137,12 +1038,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-18T00:36:38.966326Z", - "start_time": "2019-10-18T00:36:38.921842Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "all_lx_logons_geo = check_geo_whois(src_ip_addrs_lx, all_lx_logons, \"SourceIP\")\n", @@ -1171,12 +1067,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-18T02:35:11.6751Z", - "start_time": "2019-10-18T02:35:06.398572Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "related_host_alerts = []\n", @@ -1214,12 +1105,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-18T02:38:57.430531Z", - "start_time": "2019-10-18T02:38:51.274039Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "ip_list = \",\".join(list(all_lx_logons[\"SourceIP\"].unique()))\n", @@ -1253,12 +1139,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-17T01:37:22.034764Z", - "start_time": "2019-10-17T01:37:19.091489Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "related_host_bkmks = []\n", @@ -1302,13 +1183,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:39:11.590768Z", - "start_time": "2019-10-31T21:39:07.163786Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "md(\"Fetching Azure/Office data...\")\n", @@ -1351,12 +1226,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - 
"end_time": "2019-10-31T21:39:15.869712Z", - "start_time": "2019-10-31T21:39:15.268413Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "az_all_data = pd.concat([aad_signin_df, azure_activity_df, o365_activity_df], sort=False)\n", @@ -1378,12 +1248,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:39:25.409627Z", - "start_time": "2019-10-31T21:39:25.384637Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "(az_all_data\n", @@ -1419,13 +1284,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T21:56:21.766666Z", - "start_time": "2019-10-31T21:39:35.923317Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "ti_results_az, all_az_ti, src_ip_addrs_az = check_ip_ti(df=az_all_data, ip_col=\"IPAddress\")\n", @@ -1451,12 +1310,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T00:00:36.920040Z", - "start_time": "2019-10-31T00:00:36.771150Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "all_az_geo = check_geo_whois(src_ip_addrs_az.iloc[0:50], az_all_data, \"IPAddress\")\n", @@ -1485,12 +1339,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T00:01:17.332134Z", - "start_time": "2019-10-31T00:01:10.488413Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "ip_list = \",\".join(list(src_ip_addrs_az[\"IPAddress\"].unique()))\n", @@ -1527,13 +1376,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-18T03:26:11.596865Z", - "start_time": "2019-10-18T03:26:11.58687Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "print('List of current DataFrames in Notebook')\n", @@ -1655,13 +1498,6 @@ "_Feature" ], "window_display": false - }, - "widgets": { - "application/vnd.jupyter.widget-state+json": { - 
"state": {}, - "version_major": 2, - "version_minor": 0 - } } }, "nbformat": 4, diff --git a/Entity Explorer - Domain & URL.ipynb b/Entity Explorer - Domain & URL.ipynb index 57c27622..246f6b74 100644 --- a/Entity Explorer - Domain & URL.ipynb +++ b/Entity Explorer - Domain & URL.ipynb @@ -10,7 +10,7 @@ }, "source": [ "# Entity Explorer - Domain and URL\n", - " <details>\n", + "
\n", "  Details...\n", "\n", " **Notebook Version:** 1.0
\n", @@ -25,7 +25,7 @@ " - Log Analytics - Syslog, SecurityEvent, DnsEvents, CommonSecurityLog, AzureNetworkAnalytics_CL
\n", "**TI Proviers Used**\n", " - VirusTotal, Open Page Rank, BrowShot(all required for certain elements), AlienVault OTX, IBM XForce (optional) - all providers require accounts and API keys\n", - " </details>\n", + "
\n", "\n", "This Notebooks brings together a series of tools and techniques to enable threat hunting within the context of a domain name or URL that has been identified as of interest. It provides a series of techniques to assist in determining whether a domain or URL is malicious. Once this has been established it provides an overview of the scope of the domain or URL across an environment, along with indicators of areas for further investigation such as hosts of interest. " ] @@ -37,14 +37,14 @@ }, "source": [ "

Table of Contents

\n", - "" + "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Hunting Hypothesis: \n", + "## Hunting Hypothesis: \n", "Our broad initial hunting hypothesis is that a particular Linux host in our environment\n", "has been compromised, we will need to hunt from a range of different positions to\n", "validate or disprove this hypothesis." @@ -91,10 +91,10 @@ "import os\n", "import sys\n", "import warnings\n", - "from IPython.display import display, HTML, Markdown\n", + "from IPython.display import display, HTML, Markdown, Image\n", "\n", "REQ_PYTHON_VER=(3, 6)\n", - "REQ_MSTICPY_VER=(0, 5, 0)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", "\n", "display(HTML(\"

Starting Notebook setup...

\"))\n", "if Path(\"./utils/nb_check.py\").is_file():\n", @@ -111,6 +111,9 @@ " import msticpy\n", " check_mp_ver(REQ_MSTICPY_VER)\n", " \n", + "\n", + "# If not using Azure Notebooks, install msticpy with\n", + "# !pip install msticpy\n", "from msticpy.nbtools import nbinit\n", "extra_imports = [\n", " \"msticpy.nbtools, observationlist\",\n", @@ -139,7 +142,7 @@ }, "source": [ "### Get WorkspaceId and Authenticate to Log Analytics\n", - "<details>\n", + "
\n", "  Details...\n", "If you are using user/device authentication, run the following cell. \n", "- Click the 'Copy code to clipboard and authenticate' button.\n", @@ -159,7 +162,7 @@ "Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
\n", "On successful authentication you should see a ```popup schema``` button.\n", "To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n", - "</details>" + "
" ] }, { @@ -209,7 +212,9 @@ "qry_prov = QueryProvider('LogAnalytics')\n", "la_connection_string = f'loganalytics://code().tenant(\"{ten_id}\").workspace(\"{ws_id}\")'\n", "qry_prov.connect(connection_str=f'{la_connection_string}')\n", - "tilookup = TILookup()" + "tilookup = TILookup()\n", + "tilookup.reload_providers()\n", + "tilookup.provider_status" ] }, { @@ -219,7 +224,7 @@ "#### Authentication and Configuration Problems\n", "\n", "
\n", - "<details>\n", + "
\n", " Click for details about configuring your authentication parameters\n", " \n", "The notebook is expecting your Azure Sentinel Tenant ID and Workspace ID to be configured in one of the following places:\n", @@ -231,7 +236,7 @@ "```%pfile config.json```\n", "\n", "For help with setting up your `msticpyconfig.yaml` see the [Setup](#Setup) section at the end of this notebook and the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)\n", - "</details>" + "
" ] }, { @@ -239,7 +244,7 @@ "metadata": {}, "source": [ "## Select the domain or URL you wish to investigate\n", - "Enter the domain or URL you wish to investigate." + "Enter the domain or URL you wish to investigate. e.g. www.microsoft.com/index.html" ] }, { @@ -309,8 +314,12 @@ "### Threat Intelligence\n", "As a first step we want to establish if this domain or URL is known to to be malicious by our Threat Intelligence providers.\n", "\n", - "#### msticpyconfig.yaml configuration file\n", - "You can configure primary and secondary TI providers and any required parameters in the msticpyconfig.yaml file. This is read from the current directory or you can set an environment variable (MSTICPYCONFIG) pointing to its location. To configure this file see the ConfigureNotebookEnvironment notebook." + "#### `msticpyconfig.yaml` configuration File\n", + "You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. This is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n", + "\n", + "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb) and [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file). \n", + "\n", + "For Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)" ] }, { @@ -363,7 +372,11 @@ "### Domain analysis\n", "To build up a fuller picture of the domain we can use whois, and other data sources to gather pertinent data. 
Indicators such as registration data, domain entropy, and registration details can provide indicators that a domain is not legitimate in nature.\n", "\n", - "This cell uses the Open Page Rank API (https://www.domcop.com/openpagerank/) - in order to use this you need to add your API key to your `msticpyconfig.yaml` configuration file (as you did for other TI providers). Please see the `ConfigureNotebookEnvironment` notebook for more details on this." + "This cell uses the Open Page Rank API (https://www.domcop.com/openpagerank/) - in order to use this you need to add your API key to your `msticpyconfig.yaml` configuration file (as you did for other TI providers). \n", + "\n", + "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb) and [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file). \n", + "\n", + "For Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)" ] }, { @@ -467,7 +480,7 @@ }, "source": [ "### TLS Cert Details\n", - "Does the domain have an associated tls certificate and if so is that certificate blacklisted by abuse.ch?\n", + "Does the domain have an associated tls certificate and if so is that certificate in the malicious certs list held by abuse.ch?\n", "Details such as the certificate's subject and issuer can also provide indicators as to the domains nature." 
] }, @@ -487,23 +500,23 @@ "else:\n", " scope = domain\n", "\n", - "# See if TLS cert is in abuse.ch blacklist and get cert details\n", - "result, x509 = dom_val.ssl_blacklisted(scope)\n", + "# See if TLS cert is in abuse.ch malicious certs list and get cert details\n", + "result, x509 = dom_val.in_abuse_list(scope)\n", "\n", "if x509 is not None:\n", " cert_df = pd.DataFrame({\"SN\" :[x509.serial_number],\n", " \"Subject\":[[(i.value) for i in x509.subject]],\n", " \"Issuer\": [[(i.value) for i in x509.issuer]],\n", " \"Expired\": [x509.not_valid_after],\n", - " \"In SSLBL?\": result})\n", + " \"InAbuseList\": result})\n", "\n", " display(cert_df.T)\n", " summary.add_observation(caption=\"TLS Summary\", description=f\"Summary of TLS certificate for {domain}\", data=cert_df)\n", - " md(\"If 'In SSLBL?' is True this shows that the SSL certificate figerprint appeared in the abuse.ch blacklist\")\n", + " md(\"If 'InAbuseList' is True this shows that the SSL certificate fingerprint appeared in the abuse.ch list\")\n", " graph_items.append((domain,result))\n", "\n", "else:\n", - " md(\"No Blacklisted TLS certificate was found.\")" + " md(\"No TLS certificate was found in abuse.ch lists.\")" ] }, { @@ -514,7 +527,11 @@ "What IP address is assocatiated with this domain, what do we know about that IP?\n", "What other domains have been associated with this IP, and is it a known ToR exit node?\n", "\n", - "In order to use this ToR lookup functionality of MSTICpy you need to configure it as a provider in your `msticpyconfig.yaml` configuration file. No API key is required to use this functionality. Please see the `ConfigureNotebookEnvironment` notebook for more details on this." + "In order to use this ToR lookup functionality of MSTICpy you need to configure it as a provider in your `msticpyconfig.yaml` configuration file. No API key is required to use this functionality. 
\n", + "\n", + "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb) and [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file). \n", + "\n", + "For Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)" ] }, { @@ -593,7 +610,11 @@ "### Site Screenshot\n", "Using https://browshot.com/ return a screenshot of the domain or url being investigated. This can help us identify if the site is a phishing portal.\n", "\n", - "As with other external providers you need an API key to use the BrowShot service, and have the provider configured in your `msticpyconfig.yaml` file. Please see the `ConfigureNotebookEnvironment` notebook for more details on this." + "As with other external providers you need an API key to use the BrowShot service, and have the provider configured in your `msticpyconfig.yaml` file. \n", + "\n", + "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb) and [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file). 
\n", + "\n", + "For Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)" ] }, { @@ -844,7 +865,7 @@ "# Show selected alert when selected\n", "if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:\n", " display(Markdown('### Click on alert to view details.'))\n", - " rel_alert_select = nbwidgets.AlertSelector(alerts=related_alerts,\n", + " rel_alert_select = nbwidgets.SelectAlert(alerts=related_alerts,\n", " action=show_full_alert)\n", " rel_alert_select.display()\n", "else:\n", @@ -1317,18 +1338,6 @@ "md(f\"URL: {url}\", \"bold\")\n", "summary.display_observations()" ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configuration\n", - "\n", - "### `msticpyconfig.yaml` configuration File\n", - "You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. This is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n", - "\n", - "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)" - ] } ], "metadata": { @@ -1382,10 +1391,10 @@ "height": "calc(100% - 180px)", "left": "10px", "top": "150px", - "width": "512px" + "width": "352.33px" }, "toc_section_display": true, - "toc_window_display": false + "toc_window_display": true }, "varInspector": { "cols": { diff --git a/Entity Explorer - IP Address.ipynb b/Entity Explorer - IP Address.ipynb index ab67e17a..e129706d 100644 --- a/Entity Explorer - IP Address.ipynb +++ b/Entity Explorer - IP Address.ipynb @@ -1,1972 +1,1975 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Title: IP Explorer\n", - "<details>\n", - "  Details...\n", - " \n", - "**Notebook Version:** 1.0
\n", - "**Python Version:** Python 3.7 (including Python 3.6 - AzureML)
\n", - "**Required Packages**: kqlmagic, msticpy, pandas, numpy, matplotlib, networkx, ipywidgets, ipython, scikit_learn, dnspython, ipwhois, folium, holoviews
\n", - "**Platforms Supported**:\n", - "- Azure Notebooks Free Compute\n", - "- Azure Notebooks DSVM\n", - "- OS Independent\n", - "\n", - "**Data Sources Required**:\n", - "- Log Analytics \n", - " - Heartbeat\n", - " - SecurityAlert\n", - " - SecurityEvent\n", - " - AzureNetworkAnalytics_CL\n", - " \n", - "- (Optional) \n", - " - VirusTotal (with API key)\n", - " - Alienvault OTX (with API key) \n", - " - IBM Xforce (with API key) \n", - " - CommonSecurityLog\n", - "</details>\n", - "\n", - "\n", - "Brings together a series of queries and visualizations to help you assess the security state of an IP address. It works with both internal addresses and public addresses. \n", - "
For internal addresses it focuses on traffic patterns and behavior of the host using that IP address. For public IPs it lets you perform threat intelligence lookups, passive dns, whois and other checks. \n", - "
It also allows you to examine any network traffic between the external IP address and your resources." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "toc": true - }, - "source": [ - "

Table of Contents

\n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "## Hunting Hypothesis\n", - "Our broad initial hunting hypothesis is that a we have received IP address entity which is suspected to be compromized internal host or external public address to whom internal hosts are communicating in malicious manner, we will need to hunt from a range of different positions to validate or disprove this hypothesis.\n", - "\n", - "Before you start hunting please run the cells in Setup at the bottom of this Notebook." - ] - }, - { - "attachments": { - "ipexplorer-mindmapv2.PNG": { - "image/png": "" - } - }, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### IP Explorer Mindmap\n", - "Below mindmap diagram shows hunting workflow depending upon the type of IP address provided\n", - "\n", - "![ipexplorer-mindmapv2.PNG](attachment:ipexplorer-mindmapv2.PNG)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "### Notebook initialization\n", - "The next cell:\n", - "- Checks for the correct Python version\n", - "- Checks versions and optionally installs required packages\n", - "- Imports the required packages into the notebook\n", - "- Sets a number of configuration options.\n", - "\n", - "This should complete without errors. 
If you encounter errors or warnings look at the following two notebooks:\n", - "- [TroubleShootingNotebooks](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/TroubleShootingNotebooks.ipynb)\n", - "- [ConfiguringNotebookEnvironment](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)\n", - "\n", - "If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:\n", - "- [Run TroubleShootingNotebooks](./TroubleShootingNotebooks.ipynb)\n", - "- [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)\n", - "\n", - "You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. \n", - "There are more details about this in the `ConfiguringNotebookEnvironment` notebook and in these documents:\n", - "- [msticpy configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)\n", - "- [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file)\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:01:51.949751Z", - "start_time": "2020-05-15T23:01:51.909753Z" - } - }, - "outputs": [], - "source": [ - "from pathlib import Path\n", - "import os\n", - "import sys\n", - "import warnings\n", - "from IPython.display import display, HTML, Markdown\n", - "\n", - "REQ_PYTHON_VER=(3, 6)\n", - "REQ_MSTICPY_VER=(0, 5, 0)\n", - "\n", - "display(HTML(\"

Starting Notebook setup...

\"))\n", - "if Path(\"./utils/nb_check.py\").is_file():\n", - " from utils.nb_check import check_python_ver, check_mp_ver\n", - "\n", - " check_python_ver(min_py_ver=REQ_PYTHON_VER)\n", - " try:\n", - " check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\n", - " except ImportError:\n", - " !pip install --upgrade msticpy\n", - " if \"msticpy\" in sys.modules:\n", - " importlib.reload(sys.modules[\"msticpy\"])\n", - " else:\n", - " import msticpy\n", - " check_mp_ver(REQ_MSTICPY_VER)\n", - " \n", - "from msticpy.nbtools import nbinit\n", - "extra_imports = [\n", - " \"msticpy.nbtools.entityschema, IpAddress\",\n", - " \"msticpy.nbtools.entityschema, GeoLocation\",\n", - " \"msticpy.sectools.ip_utils, create_ip_record\",\n", - " \"msticpy.sectools.ip_utils, get_ip_type\",\n", - " \"msticpy.sectools.ip_utils, get_whois_info\",\n", - "]\n", - "nbinit.init_notebook(\n", - " namespace=globals(),\n", - " extra_imports=extra_imports,\n", - ");\n", - "WIDGET_DEFAULTS = {\n", - " \"layout\": widgets.Layout(width=\"95%\"),\n", - " \"style\": {\"description_width\": \"initial\"},\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### Get WorkspaceId and Authenticate to Log Analytics \n", - "<details>\n", - "  Details...\n", - "If you are using user/device authentication, run the following cell. \n", - "- Click the 'Copy code to clipboard and authenticate' button.\n", - "- This will pop up an Azure Active Directory authentication dialog (in a new tab or browser window). The device code will have been copied to the clipboard. \n", - "- Select the text box and paste (Ctrl-V/Cmd-V) the copied value. 
\n", - "- You should then be redirected to a user authentication page where you should authenticate with a user account that has permission to query your Log Analytics workspace.\n", - "\n", - "Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:\n", - "```\n", - "%kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(client_id).clientsecret(client_secret)\n", - "```\n", - "instead of\n", - "```\n", - "%kql loganalytics://code().workspace(WORKSPACE_ID)\n", - "```\n", - "\n", - "Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
\n", - "On successful authentication you should see a ```popup schema``` button.\n", - "To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n", - "</details>" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:02:52.662562Z", - "start_time": "2020-05-15T23:02:52.653563Z" - } - }, - "outputs": [], - "source": [ - "#See if we have an Azure Sentinel Workspace defined in our config file, if not let the user specify Workspace and Tenant IDs\n", - "from msticpy.nbtools.wsconfig import WorkspaceConfig\n", - "ws_config = WorkspaceConfig()\n", - "try:\n", - " ws_id = ws_config['workspace_id']\n", - " ten_id = ws_config['tenant_id']\n", - " config = True\n", - " md(\"Workspace details collected from config file\")\n", - "except KeyError:\n", - " md(('Please go to your Log Analytics workspace, copy the workspace ID'\n", - " ' and/or tenant Id and paste here to enable connection to the workspace and querying of it..
'))\n", - " ws_id_wgt = nbwidgets.GetEnvironmentKey(env_var='WORKSPACE_ID',\n", - " prompt='Please enter your Log Analytics Workspace Id:', auto_display=True)\n", - " ten_id_wgt = nbwidgets.GetEnvironmentKey(env_var='TENANT_ID',\n", - " prompt='Please enter your Log Analytics Tenant Id:', auto_display=True)\n", - " config = False" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:03:22.552179Z", - "start_time": "2020-05-15T23:02:56.043852Z" - } - }, - "outputs": [], - "source": [ - "# Authentication\n", - "qry_prov = QueryProvider(data_environment=\"LogAnalytics\")\n", - "qry_prov.connect(connection_str=ws_config.code_connect_str)\n", - "table_index = qry_prov.schema_tables" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "## Enter the IP Address and query time window\n", - "\n", - "Type the IP address you want to search for and the time bounds over which to search.\n", - "\n", - "You can specify the IP address value in the widget, e.g. 
192.168.1.1" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:03:22.632179Z", - "start_time": "2020-05-15T23:03:22.619179Z" - } - }, - "outputs": [], - "source": [ - "ipaddr_text = widgets.Text(\n", - " description=\"Enter the IP Address to search for:\", **WIDGET_DEFAULTS\n", - ")\n", - "display(ipaddr_text)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:03:56.698491Z", - "start_time": "2020-05-15T23:03:56.631491Z" - } - }, - "outputs": [], - "source": [ - "query_times = nbwidgets.QueryTime(units=\"day\", max_before=20, before=5, max_after=7)\n", - "query_times.display()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:04:05.784278Z", - "start_time": "2020-05-15T23:04:05.776278Z" - } - }, - "outputs": [], - "source": [ - "# Set up function to allow easy reference to common parameters for queries throughout the notebook\n", - "def ipaddr_query_params():\n", - " return {\n", - " \"start\": query_times.start,\n", - " \"end\": query_times.end,\n", - " \"ip_address\": ipaddr_text.value.strip()\n", - " }" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "## Determine IP Address Type" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:04:47.927548Z", - "start_time": "2020-05-15T23:04:43.963316Z" - } - }, - "outputs": [], - "source": [ - "ipaddr_type = get_ip_type(ipaddr_query_params()['ip_address'])\n", - "\n", - "md(f'Depending on the IP Address origin, different sections of this notebook are applicable', styles=[\"bold\", \"large\"])\n", - "md(f'Please follow either the Internal IP Address or External IP Address sections based on the recommendation below', styles=[\"bold\"])\n", - "\n", - "#Get details from Heartbeat 
table for the given IP Address and Time Parameters\n", - "heartbeat_df = qry_prov.Heartbeat.get_info_by_ipaddress(**ipaddr_query_params())\n", - "\n", - "# Set hostname retrieved from Heartbeat table if available\n", - "if not heartbeat_df.empty:\n", - " hostname = heartbeat_df[\"Computer\"][0]\n", - "else:\n", - " hostname = \"\"\n", - " \n", - "if not heartbeat_df.empty:\n", - " ipaddr_origin = \"Internal\"\n", - " md(f'IP Address type based on subnet: {ipaddr_type} & IP Address Owner based on available logs : {ipaddr_origin}', styles=[\"blue\",\"bold\"])\n", - " display(Markdown('#### Recommendation - Go to section [InternalIP](#goto_internalIP)'))\n", - "elif ipaddr_type==\"Private\" and heartbeat_df.empty:\n", - " ipaddr_origin = \"Unknown\"\n", - " md(f'IP Address type based on subnet: {ipaddr_type} & IP Address Owner based on available logs : {ipaddr_origin}', styles=[\"blue\",\"bold\"])\n", - " display(Markdown('#### Recommendation - Go to section [InternalIP](#goto_internalIP)'))\n", - "else:\n", - " ipaddr_origin = \"External\"\n", - " md(f'IP Address type based on subnet: {ipaddr_type} & IP Address Owner based on available logs : {ipaddr_origin}', styles=[\"blue\",\"bold\"])\n", - " display(Markdown('#### Recommendation - Go to section [ExternalIP](#goto_externalIP)'))\n", - " \n", - "#Populate related IP addresses for the calculated hostname\n", - "az_net_df = pd.DataFrame()\n", - "if \"AzureNetworkAnalytics_CL\" in table_index:\n", - " aznet_query = f\"\"\"\n", - " AzureNetworkAnalytics_CL | where ResourceType == 'NetworkInterface' \n", - " | where SubType_s == \"Topology\" \n", - " | search \\'{ipaddr_text.value}\\' \n", - " | where TimeGenerated >= datetime({query_times.start}) \n", - " | where TimeGenerated <= datetime({query_times.end}) \n", - " | where VirtualMachine_s has '{hostname}' \n", - " | top 1 by TimeGenerated desc \n", - " | project PrivateIPAddresses = PrivateIPAddresses_s, PublicIPAddresses = PublicIPAddresses_s\"\"\"\n", - " az_net_df 
= qry_prov.exec_query(query=aznet_query)\n", - " \n", - "# Create IP Entity record using available dataframes or input ip address if nothing present\n", - "if az_net_df.empty and heartbeat_df.empty:\n", - " ip_entity = IpAddress()\n", - " ip_entity['Address'] = ipaddr_query_params()['ip_address']\n", - " ip_entity['Type'] = 'ipaddress'\n", - " ip_entity['OSType'] = 'Unknown'\n", - " md('No Heartbeat data or network topology data found')\n", - "elif not heartbeat_df.empty:\n", - " if az_net_df.empty:\n", - " ip_entity = create_ip_record(\n", - " heartbeat_df=heartbeat_df)\n", - " else:\n", - " ip_entity = create_ip_record(\n", - " heartbeat_df=heartbeat_df, az_net_df=az_net_df)\n", - "#Display IP Entity\n", - "md(\"Displaying IP Entity\", styles=[\"green\",\"bold\"])\n", - "print(ip_entity)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "## External IP" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### GeoIP Lookups for External IP Addresses" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-04-27T08:33:37.478812Z", - "start_time": "2020-04-27T08:33:37.470173Z" - } - }, - "outputs": [], - "source": [ - "# msticpy geoip module to retrieve Geo Location for Public IP addresses\n", - "# To force this lookup for an Internal public IP, replace 'and' with 'or' in the if condition\n", - "if ipaddr_type == \"Public\" and ipaddr_origin == \"External\" :\n", - " iplocation = GeoLiteLookup()\n", - "\n", - " loc_results, ext_ip_entity = iplocation.lookup_ip(ip_address=ipaddr_query_params()['ip_address'])\n", - " md(\n", - " 'Geo Location for the IP Address ::', styles=[\"bold\",\"green\"]\n", - " )\n", - " print(ext_ip_entity[0])\n", - "else:\n", - " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": 
[ - "[Contents](#toc)\n", - "### Whois Registrars for External IP Addresses" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-04-27T08:33:39.572115Z", - "start_time": "2020-04-27T08:33:39.566009Z" - } - }, - "outputs": [], - "source": [ - "# ipwhois module to retrieve whois registrar for Public IP addresses\n", - "# To force this lookup for an Internal public IP, replace 'and' with 'or' in the if condition\n", - "if ipaddr_type == \"Public\" and ipaddr_origin == \"External\" :\n", - " from ipwhois import IPWhois\n", - "\n", - " whois = IPWhois(ipaddr_query_params()['ip_address'])\n", - " whois_result = whois.lookup_whois()\n", - " if whois_result:\n", - " md(f'Whois Registrar Info ::', styles=[\"bold\",\"green\"])\n", - " display(whois_result)\n", - " else:\n", - " md(\n", - " f'No whois records available', styles=[\"bold\",\"orange\"]\n", - " )\n", - "else:\n", - " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### Open Source and Azure Sentinel ThreatIntel Lookups" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Configure your TI Provider settings\n", - "If you have not used threat intelligence lookups before you will need to supply API keys for the \n", - "TI Providers that you want to use. 
Please see the section on configuring [msticpyconfig.yaml](#msticpyconfig.yaml-configuration-File)\n", - "\n", - "Then reload provider settings:\n", - "```\n", - "mylookup = TILookup()\n", - "mylookup.reload_provider_settings()\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-04-27T08:33:43.562087Z", - "start_time": "2020-04-27T08:33:43.554830Z" - }, - "scrolled": true - }, - "outputs": [], - "source": [ - "# To force ThreatIntel lookup for an Internal public IP, replace 'and' with 'or' in the if condition\n", - "if ipaddr_type == \"Public\" and ipaddr_origin == \"External\" :\n", - " mylookup = TILookup()\n", - " mylookup.loaded_providers\n", - " resp = mylookup.lookup_ioc(observable=ipaddr_query_params()['ip_address'], ioc_type=\"ipv4\")\n", - " md(f'ThreatIntel Lookup for IP ::', styles=[\"bold\",\"green\"])\n", - " display(mylookup.result_to_df(resp).T)\n", - "else:\n", - " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### Passive DNS lookups for External IP Addresses" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-04-27T08:33:45.838706Z", - "start_time": "2020-04-27T08:33:45.829919Z" - } - }, - "outputs": [], - "source": [ - "# To force Passive DNS lookup for an Internal public IP, replace 'and' with 'or' in the if condition\n", - "if ipaddr_type == \"Public\" and ipaddr_origin == \"External\" :\n", - " # retrieve passive dns from TI Providers\n", - " pdns = mylookup.lookup_ioc(\n", - " observable=ipaddr_query_params()['ip_address'],\n", - " ioc_type=\"ipv4\",\n", - " ioc_query_type=\"passivedns\",\n", - " providers=[\"XForce\"],\n", - " )\n", - " pdns_df = mylookup.result_to_df(pdns)\n", - " if not pdns_df.empty and pdns_df[\"RawResult\"][0] and \"RDNS\" in 
pdns_df[\"RawResult\"][0]:\n", - " pdnsdomains = pdns_df[\"RawResult\"][0][\"RDNS\"]\n", - " md(\n", - " f'Passive DNS domains for IP: {pdnsdomains}', styles=[\"bold\",\"green\"]\n", - " )\n", - " display(mylookup.result_to_df(pdns).T)\n", - " else:\n", - " md(\n", - " 'No passive domains found from the providers', styles=[\"bold\",\"orange\"]\n", - " )\n", - "else:\n", - " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "## Internal IP Address" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### Data Sources available to query related to IP" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:04:59.773853Z", - "start_time": "2020-05-15T23:04:53.039482Z" - } - }, - "outputs": [], - "source": [ - "if ipaddr_origin in [\"Internal\",\"Unknown\"]:\n", - " # KQL query for full text search of IP address and display all datatypes populated for the time period\n", - " datasource_status = \"\"\"\n", - " search \\'{ip_address}\\' or \\'{hostname}\\'\n", - " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", - " | summarize RowCount=count() by Table=$table\n", - " \"\"\".format(\n", - " **ipaddr_query_params(), hostname=hostname\n", - " )\n", - " %kql -query datasource_status\n", - " datasource_status_df = _kql_raw_result_.to_dataframe()\n", - "\n", - " # Display the datatypes available to query for the query period\n", - " if not datasource_status_df.empty:\n", - " available_datasets = datasource_status_df['Table'].values\n", - " md(\"Datasources available to query for IP ::\", styles=[\"green\",\"bold\"])\n", - " display(datasource_status_df)\n", - " else:\n", - " md_warn(\"No datasources contain given IP address for the query period\")\n", - "else:\n", 
- " md(f'Analysis section Not Applicable since IP address type is: {ipaddr_type}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### Check if IP is assigned to multiple hostnames" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:05:03.895367Z", - "start_time": "2020-05-15T23:05:02.486243Z" - } - }, - "outputs": [], - "source": [ - "if ipaddr_origin == \"Internal\" or not datasource_status_df.empty:\n", - " # Get single event - try process creation\n", - " if ip_entity['OSType'] =='Windows':\n", - " if \"SecurityEvent\" not in available_datasets:\n", - " raise ValueError(\"No Windows event log data available in the workspace\")\n", - " host_name = None\n", - " matching_hosts_df = qry_prov.WindowsSecurity.list_host_processes(\n", - " query_times, host_name=hostname, add_query_items=\"| distinct Computer\"\n", - " )\n", - " elif ip_entity['OSType'] =='Linux':\n", - " if \"Syslog\" not in available_datasets:\n", - " raise ValueError(\"No Linux syslog data available in the workspace\")\n", - " else:\n", - " linux_syslog_query = f\"\"\" Syslog | where TimeGenerated >= datetime({query_times.start}) | where TimeGenerated <= datetime({query_times.end}) | where HostIP == '{ipaddr_text.value}' | distinct Computer \"\"\"\n", - " matching_hosts_df = qry_prov.exec_query(query=linux_syslog_query)\n", - "\n", - " if len(matching_hosts_df) > 1:\n", - " print(f\"Multiple matches for '{hostname}'. 
Please select a host from the list.\")\n", - " choose_host = nbwidgets.SelectString(\n", - " item_list=list(matching_hosts_df[\"Computer\"].values),\n", - " description=\"Select the host.\",\n", - " auto_display=True,\n", - " )\n", - " elif not matching_hosts_df.empty:\n", - " host_name = matching_hosts_df[\"Computer\"].iloc[0]\n", - " print(f\"Unique host found for IP: {host_name}\")\n", - "elif datasource_status_df.empty:\n", - " md_warn(\"No datasources contain given IP address for the query period\")\n", - "else: \n", - " md(f'Analysis section Not Applicable since IP address type is : {ipaddr_type}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### System Info" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:05:07.346683Z", - "start_time": "2020-05-15T23:05:07.330684Z" - } - }, - "outputs": [], - "source": [ - "# Retrieving System info from internal table if IP address is not Public\n", - "if ipaddr_origin == \"Internal\" and not heartbeat_df.empty:\n", - " md(\n", - " 'System Info retrieved from Heartbeat table ::', styles=[\"green\",\"bold\"]\n", - " )\n", - " display(heartbeat_df.T)\n", - "else:\n", - " md_warn(\n", - " 'No records available in Heartbeat table'\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### ServiceMap - Get List of Services for Host" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:05:10.389939Z", - "start_time": "2020-05-15T23:05:10.369939Z" - } - }, - "outputs": [], - "source": [ - "if ipaddr_origin == \"Internal\":\n", - " if \"ServiceMapProcess_CL\" not in available_datasets:\n", - " md_warn(\"ServiceMap data is not enabled\")\n", - " md(\n", - " f\"Enable ServiceMap Solution from Azure marketplace:
\"\n", - " +\"https://docs.microsoft.com/en-us/azure/azure-monitor/insights/service-map#enable-service-map\",\n", - " styles=[\"bold\"]\n", - " )\n", - "\n", - " else:\n", - " servicemap_proc_query = \"\"\"\n", - " ServiceMapProcess_CL\n", - " | where Computer == \\'{hostname}\\'\n", - " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", - " | project Computer, Services_s, DisplayName_s, ExecutableName_s , ExecutablePath_s \n", - " \"\"\".format(\n", - " hostname=hostname, **ipaddr_query_params()\n", - " )\n", - "\n", - " %kql -query servicemap_proc_query\n", - " servicemap_proc_df = _kql_raw_result_.to_dataframe()\n", - " display(servicemap_proc_df)\n", - "else:\n", - " md(f'Analysis section Not Applicable since IP address type is {ipaddr_type}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Related Alerts" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:05:14.185177Z", - "start_time": "2020-05-15T23:05:14.123178Z" - } - }, - "outputs": [], - "source": [ - "ra_query_times = nbwidgets.QueryTime(\n", - " units=\"day\",\n", - " origin_time=query_times.origin_time,\n", - " max_before=28,\n", - " max_after=5,\n", - " before=5,\n", - " auto_display=True,\n", - ")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Visualization - Timeline of Related Alerts" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:05:19.536611Z", - "start_time": "2020-05-15T23:05:17.943028Z" - } - }, - "outputs": [], - "source": [ - "#Provide hostname if present to the query\n", - "if hostname:\n", - " md(f\"Searching for alerts related to {hostname}...\")\n", - " related_alerts = qry_prov.SecurityAlert.list_related_alerts(\n", - " ra_query_times, host_name=hostname\n", - " )\n", - "else:\n", - " md(f\"Searching for 
alerts related to ip address(es) {ipaddr_query_params()['ip_address']}\")\n", - " related_alerts = qry_prov.SecurityAlert.list_alerts_for_ip(\n", - " ra_query_times, source_ip_list=ipaddr_query_params()['ip_address']\n", - " )\n", - "\n", - "\n", - "def print_related_alerts(alertDict, entityType, entityName):\n", - " if len(alertDict) > 0:\n", - " md(\n", - " f\"Found {len(alertDict)} different alert types related to this {entityType} (`{entityName}`)\",styles=[\"bold\",\"orange\"]\n", - " )\n", - " for (k, v) in alertDict.items():\n", - " print(f\"- {k}, # Alerts: {v}\")\n", - " else:\n", - " print(f\"No alerts for {entityType} entity `{entityName}`\")\n", - "\n", - "\n", - "if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:\n", - " host_alert_items = (\n", - " related_alerts[[\"AlertName\", \"TimeGenerated\"]]\n", - " .groupby(\"AlertName\")\n", - " .TimeGenerated.agg(\"count\")\n", - " .to_dict()\n", - " )\n", - " print_related_alerts(host_alert_items, \"host\", hostname)\n", - " nbdisplay.display_timeline(\n", - " data=related_alerts, title=\"Alerts\", source_columns=[\"AlertName\"], height=200\n", - " )\n", - "else:\n", - " md(\"No related alerts found.\",styles=[\"bold\",\"green\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - " ### Browse List of Related Alerts\n", - " Select an Alert to view details" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:05:24.530275Z", - "start_time": "2020-05-15T23:05:24.464277Z" - } - }, - "outputs": [], - "source": [ - "def disp_full_alert(alert):\n", - " global related_alert\n", - " related_alert = SecurityAlert(alert)\n", - " nbdisplay.display_alert(related_alert, show_entities=True)\n", - "\n", - "recenter_wgt = widgets.Checkbox(\n", - " value=True,\n", - " description='Center subsequent query times round selected Alert?',\n", - " disabled=False,\n", - " **WIDGET_DEFAULTS\n", - ")\n", - "if 
related_alerts is not None and not related_alerts.empty:\n", - " related_alerts[\"CompromisedEntity\"] = related_alerts[\"Computer\"]\n", - " md(\"Click on alert to view details.\", styles=[\"bold\"])\n", - " display(recenter_wgt)\n", - " rel_alert_select = nbwidgets.AlertSelector(\n", - " alerts=related_alerts,\n", - " action=disp_full_alert,\n", - " )\n", - " rel_alert_select.display()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "## Related Hosts\n", - "**Hypothesis:** That an attacker has gained access to the host, compromised credentials for its accounts, and is moving laterally through the network to gain access to more hosts.\n", - "\n", - "This section lists the hosts related to the IP address being investigated. If you wish to expand the scope of the hunt and investigate each host in detail, we recommend using the **Host Explorer Notebook (include link).**\n", - "\n", - "#### __NOTE - the following sections are only relevant for Internal IP Addresses.__" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### Visualization - Networkx Graph" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:05:30.863302Z", - "start_time": "2020-05-15T23:05:29.870080Z" - } - }, - "outputs": [], - "source": [ - "import networkx as nx\n", - "if ipaddr_origin == \"Internal\":\n", - " # Retrieve related hosts from SecurityEvent table for Windows OS\n", - " if ip_entity['OSType'] =='Windows':\n", - " if \"SecurityEvent\" not in available_datasets:\n", - " raise ValueError(\"No Windows event log data available in the workspace\")\n", - " else:\n", - " related_hosts = \"\"\"\n", - " SecurityEvent\n", - " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", - " | where IpAddress == \\'{ip_address}\\' or Computer == \\'{hostname}\\' \n", - " | summarize count() by 
Computer, IpAddress\n", - " \"\"\".format(\n", - " **ipaddr_query_params(), hostname=hostname\n", - " )\n", - " %kql -query related_hosts\n", - " related_hosts_df = _kql_raw_result_.to_dataframe()\n", - "\n", - " elif ip_entity['OSType'] =='Linux':\n", - " if \"Syslog\" not in available_datasets:\n", - " raise ValueError(\"No Linux syslog data available in the workspace\")\n", - " else:\n", - " related_hosts_df = qry_prov.LinuxSyslog.list_logons_for_source_ip(invest_times, ip_address=ipaddr_query_params()['ip_address'],add_query_items='extend IpAddress = HostIP | summarize count() by Computer, IpAddress')\n", - "\n", - " # Displaying networkx - static graph. for interactive graph uncomment and run next block of code.\n", - " plt.figure(10, figsize=(22, 14))\n", - " g = nx.from_pandas_edgelist(related_hosts_df, \"IpAddress\", \"Computer\")\n", - " md('Entity Relationship Graph - Related Hosts :: ',styles=[\"bold\",\"green\"])\n", - " nx.draw_circular(g, with_labels=True, size=40, font_size=12, font_color=\"blue\")\n", - "\n", - "\n", - " # Uncomment below cells if you want to dispaly interactive graphs using Pyvis library, Azure notebook free tier may not render the graph correctly.\n", - " # logonpyvis_graph = Network(notebook=True, height=\"750px\", width=\"100%\", bgcolor=\"#222222\", font_color=\"white\")\n", - "\n", - " # # set the physics layout of the network\n", - " # logonpyvis_graph.barnes_hut()\n", - "\n", - " # sources = related_hosts_df['Computer']\n", - " # targets = related_hosts_df['IpAddress']\n", - " # weights = related_hosts_df['count_']\n", - "\n", - " # edge_data = zip(sources, targets, weights)\n", - "\n", - " # for e in edge_data:\n", - " # src = e[0]\n", - " # dst = e[1]\n", - " # w = e[2]\n", - "\n", - " # logonpyvis_graph.add_node(src, src, title=src)\n", - " # logonpyvis_graph.add_node(dst, dst, title=dst)\n", - " # logonpyvis_graph.add_edge(src, dst, value=w)\n", - "\n", - " # neighbor_map = logonpyvis_graph.get_adj_list()\n", - "\n", - " 
# # add neighbor data to node hover data\n", - " # for node in logonpyvis_graph.nodes:\n", - " # node[\"title\"] += \" Neighbors:
\" + \"
\".join(neighbor_map[node[\"id\"]])\n", - " # node[\"value\"] = len(neighbor_map[node[\"id\"]]) \n", - "\n", - " # logonpyvis_graph.show(\"hostlogonpyvis_graph.html\")\n", - "else:\n", - " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "## Related Accounts\n", - "**Hypothesis:** That an attacker has gained access to the host, compromized credentials for the accounts on it and laterally moving to the network gaining access to more accounts.\n", - "\n", - "This section provides related accounts of IP address which is being investigated. .If you wish to expand the scope of hunting then investigate each accounts in detail, it is recommended that to use the **Account Explorer Notebook (include link).**" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ExecuteTime": { - "end_time": "2019-09-10T20:12:42.022358Z", - "start_time": "2019-09-10T20:12:42.010961Z" - } - }, - "source": [ - "[Contents](#toc)\n", - "### Visualization - Networkx Graph" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:05:36.951055Z", - "start_time": "2020-05-15T23:05:35.741976Z" - } - }, - "outputs": [], - "source": [ - "if ipaddr_origin == \"Internal\":\n", - " # Retrived relatd accounts from SecurityEvent table for Windows OS\n", - " if ip_entity['OSType'] =='Windows':\n", - " if \"SecurityEvent\" not in available_datasets:\n", - " raise ValueError(\"No Windows event log data available in the workspace\")\n", - " else:\n", - " related_accounts = \"\"\"\n", - " SecurityEvent\n", - " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", - " | where IpAddress == \\'{ip_address}\\' or Computer == \\'{hostname}\\' \n", - " | summarize count() by Account, Computer\n", - " \"\"\".format(\n", - " **ipaddr_query_params(), 
hostname=hostname\n", - " )\n", - " %kql -query related_accounts\n", - " related_accounts_df = _kql_raw_result_.to_dataframe()\n", - "\n", - " elif ip_entity['OSType'] =='Linux':\n", - " if \"Syslog\" not in available_datasets:\n", - " raise ValueError(\"No Linux syslog data available in the workspace\")\n", - " else:\n", - " related_accounts_df = qry_prov.LinuxSyslog.list_logons_for_source_ip(invest_times, ip_address=ipaddr_query_params()['ip_address'],add_query_items='extend Account = AccountName | summarize count() by Account, Computer')\n", - "\n", - "\n", - " # Uncomment- below cells if above visualization does not render - Networkx connected Graph\n", - " plt.figure(10, figsize=(22, 14))\n", - " g = nx.from_pandas_edgelist(related_accounts_df, \"Computer\", \"Account\")\n", - " md('Entity Relationship Graph - Related Accounts :: ',styles=[\"bold\",\"green\"])\n", - " nx.draw_circular(g, with_labels=True, size=40, font_size=12, font_color=\"blue\")\n", - "\n", - " # Uncomment below cells if you want to display interactive graphs using Pyvis library, Azure notebook free tier may not render the graph correctly.\n", - " # acclogon_pyvisgraph = Network(notebook=True, height=\"750px\", width=\"100%\", bgcolor=\"#222222\", font_color=\"white\")\n", - "\n", - " # # set the physics layout of the network\n", - " # acclogon_pyvisgraph.barnes_hut()\n", - "\n", - "\n", - " # sources = related_accounts_df['Computer']\n", - " # targets = related_accounts_df['Account']\n", - " # weights = related_accounts_df['count_']\n", - "\n", - " # edge_data = zip(sources, targets, weights)\n", - "\n", - " # for e in edge_data:\n", - " # src = e[0]\n", - " # dst = e[1]\n", - " # w = e[2]\n", - "\n", - " # acclogon_pyvisgraph.add_node(src, src, title=src)\n", - " # acclogon_pyvisgraph.add_node(dst, dst, title=dst)\n", - " # acclogon_pyvisgraph.add_edge(src, dst, value=w)\n", - "\n", - " # neighbor_map = acclogon_pyvisgraph.get_adj_list()\n", - "\n", - " # # add neighbor data to node hover 
data\n", - " # for node in acclogon_pyvisgraph.nodes:\n", - " # node[\"title\"] += \" Neighbors:
\" + \"
\".join(neighbor_map[node[\"id\"]])\n", - " # node[\"value\"] = len(neighbor_map[node[\"id\"]]) # this value attrribute for the node affects node size\n", - "\n", - " # acclogon_pyvisgraph.show(\"accountlogonpyvis_graph.html\")\n", - "else:\n", - " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ExecuteTime": { - "end_time": "2019-08-30T15:50:05.854226Z", - "start_time": "2019-08-30T15:50:04.517392Z" - } - }, - "source": [ - "[Contents](#toc)\n", - "## Logon Summary for Related Entities\n", - "**Hypothesis:** By analyzing logon activities of the related entities, we can identify change in logon patterns and narrow down the entities to few suspicious logon patterns.\n", - "\n", - "This section provides various visualization of logon attributes such as \n", - "- Weekly Failed Logon trend\n", - "- Logon Types \n", - "- Logon Processes\n", - "\n", - "If you wish to expand the scope of hunting then investigate specific host in detail, it is recommended that to use the **Host Explorer Notebook (include link).**" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ExecuteTime": { - "end_time": "2019-09-10T20:18:33.673179Z", - "start_time": "2019-09-10T20:18:33.670042Z" - } - }, - "source": [ - "[Contents](#toc)\n", - "### HeatMap for Weekly failed logons" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:05:46.615934Z", - "start_time": "2020-05-15T23:05:44.570772Z" - } - }, - "outputs": [], - "source": [ - "if ipaddr_origin == \"Internal\":\n", - " # Retrived related accounts from SecurityEvent table for Windows OS\n", - " if ip_entity['OSType'] =='Windows':\n", - " if \"SecurityEvent\" not in available_datasets:\n", - " raise ValueError(\"No Windows event log data available in the workspace\")\n", - " else:\n", - " failed_logons = \"\"\"\n", - " SecurityEvent\n", - " | 
where EventID in (4624,4625) | where IpAddress == \\'{ip_address}\\' or Computer == \\'{hostname}\\' \n", - " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", - " | extend DayofWeek = case(dayofweek(TimeGenerated) == time(1.00:00:00), \"Monday\", \n", - " dayofweek(TimeGenerated) == time(2.00:00:00), \"Tuesday\",\n", - " dayofweek(TimeGenerated) == time(3.00:00:00), \"Wednesday\",\n", - " dayofweek(TimeGenerated) == time(4.00:00:00), \"Thursday\",\n", - " dayofweek(TimeGenerated) == time(5.00:00:00), \"Friday\",\n", - " dayofweek(TimeGenerated) == time(6.00:00:00), \"Saturday\",\n", - " \"Sunday\")\n", - " | summarize LogonCount=count() by DayofWeek, HourOfDay=format_datetime(bin(TimeGenerated,1h),'HH:mm')\n", - " \"\"\".format(\n", - " **ipaddr_query_params(), hostname=hostname\n", - " )\n", - " %kql -query failed_logons\n", - " failed_logons_df = _kql_raw_result_.to_dataframe()\n", - "\n", - " elif ip_entity['OSType'] =='Linux':\n", - " if \"Syslog\" not in available_datasets:\n", - " raise ValueError(\"No Linux syslog data available in the workspace\")\n", - " else: \n", - " failed_logons_df = qry_prov.LinuxSyslog.user_logon(invest_times, account_name ='', add_query_items=f\"\"\"| where HostIP == '{ipaddr_text.value}' |extend Account = AccountName | extend DayofWeek = case(dayofweek(TimeGenerated) == time(1.00:00:00), \"Monday\", dayofweek(TimeGenerated) == time(2.00:00:00), \"Tuesday\",\n", - " dayofweek(TimeGenerated) == time(3.00:00:00), \"Wednesday\",\n", - " dayofweek(TimeGenerated) == time(4.00:00:00), \"Thursday\",\n", - " dayofweek(TimeGenerated) == time(5.00:00:00), \"Friday\",\n", - " dayofweek(TimeGenerated) == time(6.00:00:00), \"Saturday\", \"Sunday\") | summarize LogonCount=count() by DayofWeek, HourOfDay=format_datetime(bin(TimeGenerated,1h),'HH:mm')\"\"\")\n", - "\n", - " # Plotting heatmap using seaborn library if there are failed logons\n", - " if len(failed_logons_df) > 0:\n", - " df_pivot = (\n", - " 
failed_logons_df.reset_index()\n", - " .pivot_table(index=\"DayofWeek\", columns=\"HourOfDay\", values=\"LogonCount\")\n", - " .fillna(0)\n", - " )\n", - " display(\n", - " Markdown(\n", - " f'### Heatmap - Weekly Failed Logon Trend :: '\n", - " )\n", - " )\n", - " f, ax = plt.subplots(figsize=(16, 8))\n", - " hm1 = sns.heatmap(df_pivot, cmap=\"YlGnBu\", ax=ax)\n", - " plt.xticks(rotation=45)\n", - " plt.yticks(rotation=30)\n", - " else:\n", - " linux_logons=qry_prov.LinuxSyslog.list_logons_for_source_ip(**ipaddr_query_params())\n", - " failed_logons = (logon_events[logon_events['LogonResult'] == 'Failure'])\n", - "else:\n", - " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### Host Logons Timeline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:05:49.256466Z", - "start_time": "2020-05-15T23:05:49.190460Z" - } - }, - "outputs": [], - "source": [ - "# set the origin time to the time of our alert\n", - "try:\n", - " origin_time = (related_alert.TimeGenerated \n", - " if recenter_wgt.value \n", - " else query_times.origin_time)\n", - "except NameError:\n", - " origin_time = query_times.origin_time\n", - " \n", - "logon_query_times = nbwidgets.QueryTime(\n", - " units=\"day\",\n", - " origin_time=origin_time,\n", - " before=5,\n", - " after=1,\n", - " max_before=20,\n", - " max_after=20,\n", - ")\n", - "logon_query_times.display()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:05:55.096129Z", - "start_time": "2020-05-15T23:05:52.661823Z" - } - }, - "outputs": [], - "source": [ - "if ipaddr_origin == \"Internal\":\n", - " host_logons = qry_prov.WindowsSecurity.list_host_logons(\n", - " logon_query_times, host_name=hostname\n", - " )\n", - "\n", - " if 
host_logons is not None and not host_logons.empty:\n", - " display(Markdown(\"### Logon timeline.\"))\n", - " tooltip_cols = [\n", - " \"TargetUserName\",\n", - " \"TargetDomainName\",\n", - " \"SubjectUserName\",\n", - " \"SubjectDomainName\",\n", - " \"LogonType\",\n", - " \"IpAddress\",\n", - " ]\n", - " nbdisplay.display_timeline(\n", - " data=host_logons,\n", - " group_by=\"TargetUserName\",\n", - " source_columns=tooltip_cols,\n", - " legend=\"right\", yaxis=True\n", - " )\n", - "\n", - " display(Markdown(\"### Counts of logon events by logon type.\"))\n", - " display(Markdown(\"Min counts for each logon type highlighted.\"))\n", - " logon_by_type = (\n", - " host_logons[[\"Account\", \"LogonType\", \"EventID\"]]\n", - " .astype({'LogonType': 'int32'})\n", - " .merge(right=pd.Series(data=nbdisplay._WIN_LOGON_TYPE_MAP, name=\"LogonTypeDesc\"),\n", - " left_on=\"LogonType\", right_index=True)\n", - " .drop(columns=\"LogonType\")\n", - " .groupby([\"Account\", \"LogonTypeDesc\"])\n", - " .count()\n", - " .unstack()\n", - " .rename(columns={\"EventID\": \"LogonCount\"})\n", - " .fillna(0)\n", - " .style\n", - " .background_gradient(cmap=\"viridis\", low=0.5, high=0)\n", - " .format(\"{0:0>3.0f}\")\n", - " )\n", - " display(logon_by_type)\n", - " else:\n", - " display(Markdown(\"No logon events found for host.\"))\n", - "else:\n", - " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Failed Logons Timeline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:06:01.493064Z", - "start_time": "2020-05-15T23:05:59.580819Z" - }, - "scrolled": true - }, - "outputs": [], - "source": [ - "if ipaddr_origin == \"Internal\":\n", - " failedLogons = qry_prov.WindowsSecurity.list_host_logon_failures(\n", - " logon_query_times, host_name=ip_entity.hostname\n", - " )\n", 
- " if failedLogons.empty:\n", - " print(\"No logon failures recorded for this host between \",\n", - " f\" {logon_query_times.start} and {logon_query_times.end}\"\n", - " )\n", - " else:\n", - " nbdisplay.display_timeline(\n", - " data=host_logons.query('TargetLogonId != \"0x3e7\"'),\n", - " overlay_data=failedLogons,\n", - " alert=related_alert,\n", - " title=\"Logons (blue=user-success, green=failed)\",\n", - " source_columns=tooltip_cols,\n", - " height=200,\n", - " )\n", - " display(failedLogons\n", - " .astype({'LogonType': 'int32'})\n", - " .merge(right=pd.Series(data=nbdisplay._WIN_LOGON_TYPE_MAP, name=\"LogonTypeDesc\"),\n", - " left_on=\"LogonType\", right_index=True)\n", - " [['Account', 'EventID', 'TimeGenerated',\n", - " 'Computer', 'SubjectUserName', 'SubjectDomainName',\n", - " 'TargetUserName', 'TargetDomainName',\n", - " 'LogonTypeDesc','IpAddress', 'WorkstationName'\n", - " ]])\n", - "else:\n", - " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ExecuteTime": { - "end_time": "2019-08-30T15:52:54.700099Z", - "start_time": "2019-08-30T15:52:54.661189Z" - } - }, - "source": [ - "[Contents](#toc)\n", - "## Network Connection Analysis\n", - "\n", - "**Hypothesis:** That an attacker is remotely communicating with the host in order to compromise the host or for outbound communication to C2 for data exfiltration purposes after compromising the host.\n", - "\n", - "This section provides an overview of network activity to and from the host during hunting time frame, the purpose of this is for the identification of anomalous network traffic. If you wish to investigate a specific IP in detail it is recommended that to use another instance of this notebook with each IP addresses." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### Network Check Communications with Other Hosts" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:06:06.486183Z", - "start_time": "2020-05-15T23:06:06.429184Z" - } - }, - "outputs": [], - "source": [ - "ip_q_times = nbwidgets.QueryTime(\n", - " label=\"Set time bounds for network queries\",\n", - " units=\"day\",\n", - " max_before=28,\n", - " before=2,\n", - " after=5,\n", - " max_after=28,\n", - " origin_time=logon_query_times.origin_time\n", - ")\n", - "ip_q_times.display()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### Query Flows by IP Address" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:06:22.160247Z", - "start_time": "2020-05-15T23:06:10.292782Z" - } - }, - "outputs": [], - "source": [ - "if \"AzureNetworkAnalytics_CL\" not in available_datasets:\n", - " md_warn(\"No network flow data available.\")\n", - " md(\"Please skip the remainder of this section and go to [Time-Series-Anomalies](#Outbound-Data-transfer-Time-Series-Anomalies)\")\n", - " az_net_comms_df = None\n", - "else:\n", - " all_host_ips = (\n", - " ip_entity['private_ips'] + ip_entity['public_ips']\n", - " )\n", - " host_ips = [i.Address for i in all_host_ips]\n", - "\n", - " az_net_comms_df = qry_prov.Network.list_azure_network_flows_by_ip(\n", - " ip_q_times, ip_address_list=host_ips\n", - " )\n", - "\n", - " if isinstance(az_net_comms_df, pd.DataFrame) and not az_net_comms_df.empty:\n", - " az_net_comms_df['TotalAllowedFlows'] = az_net_comms_df['AllowedOutFlows'] + az_net_comms_df['AllowedInFlows']\n", - " nbdisplay.display_timeline(\n", - " data=az_net_comms_df,\n", - " group_by=\"L7Protocol\",\n", - " title=\"Network Flows by Protocol\",\n", - " 
time_column=\"FlowStartTime\",\n", - " source_columns=[\"FlowType\", \"AllExtIPs\", \"L7Protocol\", \"FlowDirection\"],\n", - " height=300,\n", - " legend=\"right\",\n", - " yaxis=True\n", - " )\n", - " nbdisplay.display_timeline(\n", - " data=az_net_comms_df,\n", - " group_by=\"FlowDirection\",\n", - " title=\"Network Flows by Direction\",\n", - " time_column=\"FlowStartTime\",\n", - " source_columns=[\"FlowType\", \"AllExtIPs\", \"L7Protocol\", \"FlowDirection\"],\n", - " height=300,\n", - " legend=\"right\",\n", - " yaxis=True\n", - " )\n", - " else:\n", - " md_warn(\"No network data for specified time range.\")\n", - " md(\"Please skip the remainder of this section and go to [Time-Series-Anomalies](#Outbound-Data-transfer-Time-Series-Anomalies)\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:06:50.373391Z", - "start_time": "2020-05-15T23:06:50.084392Z" - } - }, - "outputs": [], - "source": [ - "try:\n", - " flow_plot = nbdisplay.display_timeline_values(\n", - " data=az_net_comms_df,\n", - " group_by=\"L7Protocol\",\n", - " source_columns=[\"FlowType\", \n", - " \"AllExtIPs\", \n", - " \"L7Protocol\", \n", - " \"FlowDirection\", \n", - " \"TotalAllowedFlows\"],\n", - " time_column=\"FlowStartTime\",\n", - " y=\"TotalAllowedFlows\",\n", - " legend=\"right\",\n", - " height=500,\n", - " kind=[\"vbar\", \"circle\"],\n", - " );\n", - "except NameError as err:\n", - " md(f\"Error Occured, Make sure to execute previous cells in notebook: {err}\",styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:06:55.928554Z", - "start_time": "2020-05-15T23:06:55.790553Z" - } - }, - "outputs": [], - "source": [ - "try:\n", - " if az_net_comms_df is not None and not az_net_comms_df.empty:\n", - " cm = sns.light_palette(\"green\", as_cmap=True)\n", - "\n", - " cols = [\n", - " \"VMName\",\n", - " 
\"VMIPAddress\",\n", - " \"PublicIPs\",\n", - " \"SrcIP\",\n", - " \"DestIP\",\n", - " \"L4Protocol\",\n", - " \"L7Protocol\",\n", - " \"DestPort\",\n", - " \"FlowDirection\",\n", - " \"AllExtIPs\",\n", - " \"TotalAllowedFlows\",\n", - " ]\n", - " flow_index = az_net_comms_df[cols].copy()\n", - "\n", - " def get_source_ip(row):\n", - " if row.FlowDirection == \"O\":\n", - " return row.VMIPAddress if row.VMIPAddress else row.SrcIP\n", - " else:\n", - " return row.AllExtIPs if row.AllExtIPs else row.DestIP\n", - "\n", - " def get_dest_ip(row):\n", - " if row.FlowDirection == \"O\":\n", - " return row.AllExtIPs if row.AllExtIPs else row.DestIP\n", - " else:\n", - " return row.VMIPAddress if row.VMIPAddress else row.SrcIP\n", - " \n", - " flow_index[\"source\"] = flow_index.apply(get_source_ip, axis=1)\n", - " flow_index[\"dest\"] = flow_index.apply(get_dest_ip, axis=1)\n", - " display(flow_index)\n", - "\n", - " # Uncomment to view flow_index results\n", - " # with warnings.catch_warnings():\n", - " # warnings.simplefilter(\"ignore\")\n", - " # display(\n", - " # flow_index[\n", - " # [\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\", \"TotalAllowedFlows\"]\n", - " # ]\n", - " # .groupby([\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\"])\n", - " # .sum()\n", - " # .reset_index()\n", - " # .style.bar(subset=[\"TotalAllowedFlows\"], color=\"#d65f5f\")\n", - " # )\n", - "except NameError as err:\n", - " md(f\"Error Occured, Make sure to execute previous cells in notebook: {err}\",styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "### Bulk whois lookup " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:08:00.744206Z", - "start_time": "2020-05-15T23:07:01.493951Z" - } - }, - "outputs": [], - "source": [ - "# Bulk WHOIS lookup function\n", - "from functools import lru_cache\n", - "from ipwhois import 
IPWhois\n", - "from ipaddress import ip_address\n", - "\n", - "try:\n", - " # Add ASN informatio from Whois\n", - " flows_df = (\n", - " flow_index[[\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\", \"TotalAllowedFlows\"]]\n", - " .groupby([\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\"])\n", - " .sum()\n", - " .reset_index()\n", - " )\n", - "\n", - " num_ips = len(flows_df[\"source\"].unique()) + len(flows_df[\"dest\"].unique())\n", - " print(f\"Performing WhoIs lookups for {num_ips} IPs \", end=\"\")\n", - " #flows_df = flows_df.assign(DestASN=\"\", DestASNFull=\"\", SourceASN=\"\", SourceASNFull=\"\")\n", - " flows_df[\"DestASN\"] = flows_df.apply(lambda x: get_whois_info(x.dest, True), axis=1)\n", - " flows_df[\"SourceASN\"] = flows_df.apply(lambda x: get_whois_info(x.source, True), axis=1)\n", - " print(\"done\")\n", - "\n", - " # Split the tuple returned by get_whois_info into separate columns\n", - " flows_df[\"DestASNFull\"] = flows_df.apply(lambda x: x.DestASN[1], axis=1)\n", - " flows_df[\"DestASN\"] = flows_df.apply(lambda x: x.DestASN[0], axis=1)\n", - " flows_df[\"SourceASNFull\"] = flows_df.apply(lambda x: x.SourceASN[1], axis=1)\n", - " flows_df[\"SourceASN\"] = flows_df.apply(lambda x: x.SourceASN[0], axis=1)\n", - "\n", - " our_host_asns = [get_whois_info(ip.Address)[0] for ip in ip_entity.public_ips]\n", - " md(f\"Host {ip_entity.hostname} ASNs:\", \"bold\")\n", - " md(str(our_host_asns))\n", - "\n", - " flow_sum_df = flows_df.groupby([\"DestASN\", \"SourceASN\"]).agg(\n", - " TotalAllowedFlows=pd.NamedAgg(column=\"TotalAllowedFlows\", aggfunc=\"sum\"),\n", - " L7Protocols=pd.NamedAgg(column=\"L7Protocol\", aggfunc=lambda x: x.unique().tolist()),\n", - " source_ips=pd.NamedAgg(column=\"source\", aggfunc=lambda x: x.unique().tolist()),\n", - " dest_ips=pd.NamedAgg(column=\"dest\", aggfunc=lambda x: x.unique().tolist()),\n", - " ).reset_index()\n", - " flow_sum_df\n", - "except NameError as err:\n", - " md(f\"Error Occured, Make sure 
to execute previous cells in notebook: {err}\",styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Choose ASNs/IPs to Check for Threat Intel Reports\n", - "Choose from the list of Selected ASNs for the IPs you wish to check on.\n", - "The Source list is been pre-populated with all ASNs found in the network flow summary.\n", - "\n", - "As an example, we've populated the `Selected` list with the ASNs that have the lowest number of flows to and from the host. We also remove the ASN that matches the ASN of the host we are investigating.\n", - "\n", - "Please edit this list, using flow summary data above as a guide and leaving only ASNs that you are suspicious about. Typicially these would be ones with relatively low `TotalAllowedFlows` and possibly with unusual `L7Protocols`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:08:01.347207Z", - "start_time": "2020-05-15T23:08:01.287206Z" - } - }, - "outputs": [], - "source": [ - "try:\n", - " if isinstance(flow_sum_df, pd.DataFrame) and not flow_sum_df.empty:\n", - " all_asns = list(flow_sum_df[\"DestASN\"].unique()) + list(flow_sum_df[\"SourceASN\"].unique())\n", - " all_asns = set(all_asns) - set([\"private address\"])\n", - "\n", - " # Select the ASNs in the 25th percentile (lowest number of flows)\n", - " quant_25pc = flow_sum_df[\"TotalAllowedFlows\"].quantile(q=[0.25]).iat[0]\n", - " quant_25pc_df = flow_sum_df[flow_sum_df[\"TotalAllowedFlows\"] <= quant_25pc]\n", - " other_asns = list(quant_25pc_df[\"DestASN\"].unique()) + list(quant_25pc_df[\"SourceASN\"].unique())\n", - " other_asns = set(other_asns) - set(our_host_asns)\n", - " md(\"Choose IPs from Selected ASNs to look up for Threat Intel.\", \"bold\")\n", - " sel_asn = nbwidgets.SelectSubset(source_items=all_asns, default_selected=other_asns)\n", - "except NameError as err:\n", - " md(f\"Error Occured, Make sure to execute 
previous cells in notebook: {err}\",styles=[\"bold\",\"red\"])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:08:14.516746Z", - "start_time": "2020-05-15T23:08:01.935205Z" - } - }, - "outputs": [], - "source": [ - "try:\n", - " if isinstance(flow_sum_df, pd.DataFrame) and not flow_sum_df.empty:\n", - " ti_lookup = TILookup()\n", - " from itertools import chain\n", - " dest_ips = set(chain.from_iterable(flow_sum_df[flow_sum_df[\"DestASN\"].isin(sel_asn.selected_items)][\"dest_ips\"]))\n", - " src_ips = set(chain.from_iterable(flow_sum_df[flow_sum_df[\"SourceASN\"].isin(sel_asn.selected_items)][\"source_ips\"]))\n", - " selected_ips = dest_ips | src_ips\n", - " print(f\"{len(selected_ips)} unique IPs in selected ASNs\")\n", - "\n", - " # Add the IoCType to save cost of inferring each item\n", - " selected_ip_dict = {ip: \"ipv4\" for ip in selected_ips}\n", - " ti_results = ti_lookup.lookup_iocs(data=selected_ip_dict)\n", - "\n", - " print(f\"{len(ti_results)} results received.\")\n", - "\n", - " # ti_results_pos = ti_results[ti_results[\"Severity\"] > 0]\n", - " #####\n", - " # WARNING - faking results for illustration purposes\n", - " #####\n", - " ti_results_pos = ti_results.sample(n=2)\n", - "\n", - " print(f\"{len(ti_results_pos)} positive results found.\")\n", - "\n", - "\n", - " if not ti_results_pos.empty:\n", - " src_pos = flows_df.merge(ti_results_pos, left_on=\"source\", right_on=\"Ioc\")\n", - " dest_pos = flows_df.merge(ti_results_pos, left_on=\"dest\", right_on=\"Ioc\")\n", - " ti_ip_results = pd.concat([src_pos, dest_pos])\n", - " md_warn(\"Positive Threat Intel Results found for the following flows\")\n", - " md(\"Please examine these IP flows using the IP Explorer notebook.\", \"bold, large\")\n", - " display(ti_ip_results)\n", - "except NameError as err:\n", - " md(f\"Error Occured, Make sure to execute previous cells in notebook: {err}\",styles=[\"bold\",\"red\"])" - ] 
- }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - " ### GeoIP Map of External IPs" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:08:16.023912Z", - "start_time": "2020-05-15T23:08:15.611915Z" - } - }, - "outputs": [], - "source": [ - "iplocation = GeoLiteLookup()\n", - "def format_ip_entity(row, ip_col):\n", - " ip_entity = entities.IpAddress(Address=row[ip_col])\n", - " iplocation.lookup_ip(ip_entity=ip_entity)\n", - " ip_entity.AdditionalData[\"protocol\"] = row.L7Protocol\n", - " if \"severity\" in row:\n", - " ip_entity.AdditionalData[\"threat severity\"] = row[\"severity\"]\n", - " if \"Details\" in row:\n", - " ip_entity.AdditionalData[\"threat details\"] = row[\"Details\"]\n", - " return ip_entity\n", - "\n", - "# from msticpy.nbtools.foliummap import FoliumMap\n", - "folium_map = FoliumMap()\n", - "if az_net_comms_df is None or az_net_comms_df.empty:\n", - " print(\"No network flow data available.\")\n", - "else:\n", - " # Get the flow records for all flows not in the TI results\n", - " selected_out = flows_df[flows_df[\"DestASN\"].isin(sel_asn.selected_items)]\n", - " selected_out = selected_out[~selected_out[\"dest\"].isin(ti_ip_results[\"Ioc\"])]\n", - " if selected_out.empty:\n", - " ips_out = []\n", - " else:\n", - " ips_out = list(selected_out.apply(lambda x: format_ip_entity(x, \"dest\"), axis=1))\n", - " \n", - " selected_in = flows_df[flows_df[\"SourceASN\"].isin(sel_asn.selected_items)]\n", - " selected_in = selected_in[~selected_in[\"source\"].isin(ti_ip_results[\"Ioc\"])]\n", - " if selected_in.empty:\n", - " ips_in = []\n", - " else:\n", - " ips_in = list(selected_in.apply(lambda x: format_ip_entity(x, \"source\"), axis=1))\n", - "\n", - " ips_threats = list(ti_ip_results.apply(lambda x: format_ip_entity(x, \"Ioc\"), axis=1))\n", - "\n", - " display(HTML(\"

External IP Addresses communicating with host

\"))\n", - " display(HTML(\"Numbered circles indicate multiple items - click to expand\"))\n", - " display(HTML(\"Location markers:
Blue = outbound, Purple = inbound, Green = Host, Red = Threats\"))\n", - "\n", - " icon_props = {\"color\": \"green\"}\n", - " for ips in ip_entity.public_ips:\n", - " ips.AdditionalData[\"host\"] = ip_entity.hostname\n", - " folium_map.add_ip_cluster(ip_entities=ip_entity.public_ips, **icon_props)\n", - " icon_props = {\"color\": \"blue\"}\n", - " folium_map.add_ip_cluster(ip_entities=ips_out, **icon_props)\n", - " icon_props = {\"color\": \"purple\"}\n", - " folium_map.add_ip_cluster(ip_entities=ips_in, **icon_props)\n", - " icon_props = {\"color\": \"red\"}\n", - " folium_map.add_ip_cluster(ip_entities=ips_threats, **icon_props)\n", - " \n", - " display(folium_map)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "ExecuteTime": { - "end_time": "2019-09-05T18:03:37.980223Z", - "start_time": "2019-09-05T18:03:37.804856Z" - } - }, - "source": [ - "[Contents](#toc)\n", - "### Outbound Data transfer Time Series Anomalies" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This section will look into the network datasources to check outbound data transfer trends. \n", - "You can also use time series analysis using below built-in KQL query example to analyze anamalous data transfer trends.below example shows sample dataset trends comparing with actual vs baseline traffic trends." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:08:42.937737Z", - "start_time": "2020-05-15T23:08:41.794266Z" - } - }, - "outputs": [], - "source": [ - "if \"VMConnection\" in table_index or \"CommonSecurityLog\" in table_index:\n", - " # KQL query for full text search of IP address and display all datatypes\n", - " dataxfer_stats = \"\"\"\n", - " union isfuzzy=true\n", - " (\n", - " CommonSecurityLog \n", - " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", - " | where isnotempty(DestinationIP) and isnotempty(SourceIP)\n", - " | where SourceIP == \\'{ip_address}\\'\n", - " | extend SentBytesinKB = (SentBytes / 1024), ReceivedBytesinKB = (ReceivedBytes / 1024)\n", - " | summarize DailyCount = count(), ListOfDestPorts = make_set(DestinationPort), TotalSentBytesinKB = sum(SentBytesinKB), TotalReceivedBytesinKB = sum(ReceivedBytesinKB) by SourceIP, DestinationIP, DeviceVendor, bin(TimeGenerated,1d)\n", - " | project DeviceVendor, TimeGenerated, SourceIP, DestinationIP, ListOfDestPorts, TotalSentBytesinKB, TotalReceivedBytesinKB \n", - " ),\n", - " (\n", - " VMConnection \n", - " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end}) \n", - " | where isnotempty(DestinationIp) and isnotempty(SourceIp)\n", - " | where SourceIp == \\'{ip_address}\\'\n", - " | extend DeviceVendor = \"VMConnection\", SourceIP = SourceIp, DestinationIP = DestinationIp\n", - " | extend SentBytesinKB = (BytesSent / 1024), ReceivedBytesinKB = (BytesReceived / 1024)\n", - " | summarize DailyCount = count(), ListOfDestPorts = make_set(DestinationPort), TotalSentBytesinKB = sum(SentBytesinKB),TotalReceivedBytesinKB = sum(ReceivedBytesinKB) by SourceIP, DestinationIP, DeviceVendor, bin(TimeGenerated,1d)\n", - " | project DeviceVendor, TimeGenerated, SourceIP, DestinationIP, ListOfDestPorts, TotalSentBytesinKB, TotalReceivedBytesinKB \n", - " )\n", - " 
\"\"\".format(**ipaddr_query_params())\n", - " %kql -query dataxfer_stats\n", - " dataxfer_stats_df = _kql_raw_result_.to_dataframe()\n", - "\n", - "#Display result as transposed matrix of datatypes availabel to query for the query period\n", - "if len(dataxfer_stats_df) > 0:\n", - " md(\n", - " 'Data transfer daily stats for IP ::', styles=[\"bold\",\"green\"]\n", - " )\n", - " #display(dataxfer_stats_df)\n", - "else:\n", - " md_warn(\n", - " f'No Data transfer logs found for the query period'\n", - " )\n", - " #####\n", - " # WARNING - faking results for illustration purposes\n", - " #####\n", - "md(\n", - " 'Visualizing time series data transfer on dummy dataset for demonstration ::', styles=[\"bold\",\"green\"]\n", - " )\n", - "\n", - "#Generating graph based on dummy dataset in custom table representing Flow records outbound data transfer\n", - "timechartquery = \"\"\"\n", - "let TimeSeriesData = PaloAltoBytesSent_CL\n", - "| extend TimeGenerated = todatetime(EventTime_s), TotalBytesSent = todouble(TotalBytesSent_s) \n", - "| summarize TimeGenerated=make_list(TimeGenerated, 10000),TotalBytesSent=make_list(TotalBytesSent, 10000) by deviceVendor_s\n", - "| project TimeGenerated, TotalBytesSent;\n", - "TimeSeriesData\n", - "| extend (baseline,seasonal,trend,residual) = series_decompose(TotalBytesSent)\n", - "| mv-expand TotalBytesSent to typeof(double), TimeGenerated to typeof(datetime), baseline to typeof(long), seasonal to typeof(long), trend to typeof(long), residual to typeof(long)\n", - "| project TimeGenerated, TotalBytesSent, baseline\n", - "| render timechart with (title=\"Palo Alto Outbound Data Transfer Time Series decomposition\")\n", - "\"\"\"\n", - "%kql -query timechartquery" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Conclusion" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### List of Suspicious Activities/ Observables/Hunting bookmarks\n", - "- Suspicious alerts for the IP\n", - "- 
Anamalous Failed Logon trend on few days at 04:00 AM\n", - "- Anamalous spike in traffic logs on http\n", - "- Positive TI Hit from Open source feeds.\n", - "- Unusual data transfer deviating from normal baseline." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[Contents](#toc)\n", - "## Appendices" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Available DataFrames" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-04-02T10:00:41.436112Z", - "start_time": "2020-04-02T10:00:41.426605Z" - } - }, - "outputs": [], - "source": [ - "print('List of current DataFrames in Notebook')\n", - "print('-' * 50)\n", - "current_vars = list(locals().keys())\n", - "for var_name in current_vars:\n", - " if isinstance(locals()[var_name], pd.DataFrame) and not var_name.startswith('_'):\n", - " print(var_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Saving Data to Excel\n", - "To save the contents of a pandas DataFrame to an Excel spreadsheet\n", - "use the following syntax\n", - "```\n", - "writer = pd.ExcelWriter('myWorksheet.xlsx')\n", - "my_data_frame.to_excel(writer,'Sheet1')\n", - "writer.save()\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configuration\n", - "\n", - "### `msticpyconfig.yaml` configuration File\n", - "You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. 
This is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n", - "\n", - "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)" - ] - } - ], - "metadata": { - "hide_input": false, - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.7" - }, - "latex_envs": { - "LaTeX_envs_menu_present": true, - "autoclose": false, - "autocomplete": true, - "bibliofile": "biblio.bib", - "cite_by": "apalike", - "current_citInitial": 1, - "eqLabelWithNumbers": true, - "eqNumInitial": 1, - "hotkeys": { - "equation": "Ctrl-E", - "itemize": "Ctrl-I" - }, - "labels_anchors": false, - "latex_user_defs": false, - "report_style_numbering": false, - "user_envs_cfg": false - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": true, - "sideBar": true, - "skip_h1_title": true, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": true, - "toc_position": { - "height": "calc(100% - 180px)", - "left": "10px", - "top": "150px", - "width": "299px" - }, - "toc_section_display": true, - "toc_window_display": true - }, - "varInspector": { - "cols": { - "lenName": 16, - "lenType": 16, - "lenVar": 40 - }, - "kernels_config": { - "python": { - "delete_cmd_postfix": "", - "delete_cmd_prefix": "del ", - "library": "var_list.py", - "varRefreshCmd": "print(var_dic_list())" - }, - "r": { - "delete_cmd_postfix": ") ", - "delete_cmd_prefix": "rm(", - "library": "var_list.r", - "varRefreshCmd": "cat(var_dic_list()) " - } - }, - "position": { - "height": "400px", - "left": "1549px", - "right": 
"20px", - "top": "120px", - "width": "351px" - }, - "types_to_exclude": [ - "module", - "function", - "builtin_function_or_method", - "instance", - "_Feature" - ], - "window_display": false - }, - "widgets": { - "application/vnd.jupyter.widget-state+json": { - "state": {}, - "version_major": 2, - "version_minor": 0 - } - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Title: IP Explorer\n", + "
\n", + "  Details...\n", + " \n", + "**Notebook Version:** 1.0
\n", + "**Python Version:** Python 3.7 (including Python 3.6 - AzureML)
\n", + "**Required Packages**: kqlmagic, msticpy, pandas, numpy, matplotlib, networkx, ipywidgets, ipython, scikit_learn, dnspython, ipwhois, folium, holoviews
\n", + "**Platforms Supported**:\n", + "- Azure Notebooks Free Compute\n", + "- Azure Notebooks DSVM\n", + "- OS Independent\n", + "\n", + "**Data Sources Required**:\n", + "- Log Analytics \n", + " - Heartbeat\n", + " - SecurityAlert\n", + " - SecurityEvent\n", + " - AzureNetworkAnalytics_CL\n", + " \n", + "- (Optional) \n", + " - VirusTotal (with API key)\n", + " - AlienVault OTX (with API key) \n", + " - IBM X-Force (with API key) \n", + " - CommonSecurityLog\n", + "
\n", + "\n", + "\n", + "This notebook brings together a series of queries and visualizations to help you assess the security state of an IP address. It works with both internal and public addresses. \n", + "
For internal addresses it focuses on traffic patterns and behavior of the host using that IP address. \n", + "
For public IPs it lets you perform threat intelligence lookups, passive DNS, WHOIS and other checks. \n", + "
It also allows you to examine any network traffic between the external IP address and your resources." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "toc": true + }, + "source": [ + "

Table of Contents

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "## Hunting Hypothesis\n", + "Our broad initial hunting hypothesis is that a we have received IP address entity which is suspected to be compromized internal host or external public address to whom internal hosts are communicating in malicious manner, we will need to hunt from a range of different positions to validate or disprove this hypothesis.\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### IP Explorer Mindmap\n", + "Below mindmap diagram shows hunting workflow depending upon the type of IP address provided\n", + "\n", + "![IPExplorerMindMap](https://github.com/Azure/Azure-Sentinel-Notebooks/raw/master/images/nb_ipexplorer-mindmap.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "### Notebook initialization\n", + "The next cell:\n", + "- Checks for the correct Python version\n", + "- Checks versions and optionally installs required packages\n", + "- Imports the required packages into the notebook\n", + "- Sets a number of configuration options.\n", + "\n", + "This should complete without errors. 
If you encounter errors or warnings, look at the following two notebooks:\n", + "- [TroubleShootingNotebooks](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/TroubleShootingNotebooks.ipynb)\n", + "- [ConfiguringNotebookEnvironment](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)\n", + "\n", + "If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:\n", + "- [Run TroubleShootingNotebooks](./TroubleShootingNotebooks.ipynb)\n", + "- [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)\n", + "\n", + "You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. \n", + "There are more details about this in the `ConfiguringNotebookEnvironment` notebook and in these documents:\n", + "- [msticpy configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)\n", + "- [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:01:51.949751Z", + "start_time": "2020-05-15T23:01:51.909753Z" + } + }, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "import importlib\n", + "import os\n", + "import sys\n", + "import warnings\n", + "from IPython.display import display, HTML, Markdown\n", + "\n", + "REQ_PYTHON_VER=(3, 6)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", + "\n", + "display(HTML(\"

Starting Notebook setup...

\"))\n", + "if Path(\"./utils/nb_check.py\").is_file():\n", + " from utils.nb_check import check_python_ver, check_mp_ver\n", + "\n", + " check_python_ver(min_py_ver=REQ_PYTHON_VER)\n", + " try:\n", + " check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\n", + " except ImportError:\n", + " !pip install --upgrade msticpy\n", + " if \"msticpy\" in sys.modules:\n", + " importlib.reload(sys.modules[\"msticpy\"])\n", + " else:\n", + " import msticpy\n", + " check_mp_ver(REQ_MSTICPY_VER)\n", + " \n", + "\n", + "# If not using Azure Notebooks, install msticpy with\n", + "# !pip install msticpy\n", + "from msticpy.nbtools import nbinit\n", + "extra_imports = [\n", + " \"msticpy.nbtools.entityschema, IpAddress\",\n", + " \"msticpy.nbtools.entityschema, GeoLocation\",\n", + " \"msticpy.sectools.ip_utils, create_ip_record\",\n", + " \"msticpy.sectools.ip_utils, get_ip_type\",\n", + " \"msticpy.sectools.ip_utils, get_whois_info\",\n", + "]\n", + "nbinit.init_notebook(\n", + " namespace=globals(),\n", + " extra_imports=extra_imports,\n", + ");\n", + "WIDGET_DEFAULTS = {\n", + " \"layout\": widgets.Layout(width=\"95%\"),\n", + " \"style\": {\"description_width\": \"initial\"},\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### Get WorkspaceId and Authenticate to Log Analytics \n", + "
\n", + "  Details...\n", + "If you are using user/device authentication, run the following cell. \n", + "- Click the 'Copy code to clipboard and authenticate' button.\n", + "- This will pop up an Azure Active Directory authentication dialog (in a new tab or browser window). The device code will have been copied to the clipboard. \n", + "- Select the text box and paste (Ctrl-V/Cmd-V) the copied value. \n", + "- You should then be redirected to a user authentication page where you should authenticate with a user account that has permission to query your Log Analytics workspace.\n", + "\n", + "Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:\n", + "```\n", + "%kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(client_id).clientsecret(client_secret)\n", + "```\n", + "instead of\n", + "```\n", + "%kql loganalytics://code().workspace(WORKSPACE_ID)\n", + "```\n", + "\n", + "Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
\n", + "On successful authentication you should see a ```popup schema``` button.\n", + "To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:02:52.662562Z", + "start_time": "2020-05-15T23:02:52.653563Z" + } + }, + "outputs": [], + "source": [ + "#See if we have an Azure Sentinel Workspace defined in our config file, if not let the user specify Workspace and Tenant IDs\n", + "from msticpy.nbtools.wsconfig import WorkspaceConfig\n", + "ws_config = WorkspaceConfig()\n", + "try:\n", + " ws_id = ws_config['workspace_id']\n", + " ten_id = ws_config['tenant_id']\n", + " config = True\n", + " md(\"Workspace details collected from config file\")\n", + "except KeyError:\n", + " md(('Please go to your Log Analytics workspace, copy the workspace ID'\n", + " ' and/or tenant Id and paste here to enable connection to the workspace and querying of it..
'))\n", + " ws_id_wgt = nbwidgets.GetEnvironmentKey(env_var='WORKSPACE_ID',\n", + " prompt='Please enter your Log Analytics Workspace Id:', auto_display=True)\n", + " ten_id_wgt = nbwidgets.GetEnvironmentKey(env_var='TENANT_ID',\n", + " prompt='Please enter your Log Analytics Tenant Id:', auto_display=True)\n", + " config = False" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:03:22.552179Z", + "start_time": "2020-05-15T23:02:56.043852Z" + } + }, + "outputs": [], + "source": [ + "# Authentication\n", + "qry_prov = QueryProvider(data_environment=\"LogAnalytics\")\n", + "qry_prov.connect(connection_str=ws_config.code_connect_str)\n", + "table_index = qry_prov.schema_tables" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "## Enter the IP Address and query time window\n", + "\n", + "Type the IP address you want to search for and the time bounds over which to search.\n", + "\n", + "You can specify the IP address value in the widget, e.g. 
192.168.1.1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:03:22.632179Z", + "start_time": "2020-05-15T23:03:22.619179Z" + } + }, + "outputs": [], + "source": [ + "ipaddr_text = widgets.Text(\n", + " description=\"Enter the IP Address to search for:\", **WIDGET_DEFAULTS\n", + ")\n", + "display(ipaddr_text)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:03:56.698491Z", + "start_time": "2020-05-15T23:03:56.631491Z" + } + }, + "outputs": [], + "source": [ + "query_times = nbwidgets.QueryTime(units=\"day\", max_before=20, before=5, max_after=7)\n", + "query_times.display()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "## Determine IP Address Type" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:04:47.927548Z", + "start_time": "2020-05-15T23:04:43.963316Z" + } + }, + "outputs": [], + "source": [ + "# Set up function to allow easy reference to common parameters for queries throughout the notebook\n", + "def ipaddr_query_params():\n", + " return {\n", + " \"start\": query_times.start,\n", + " \"end\": query_times.end,\n", + " \"ip_address\": ipaddr_text.value.strip()\n", + " }\n", + "\n", + "ipaddr_type = get_ip_type(ipaddr_query_params()['ip_address'])\n", + "\n", + "md(f'Depending on the IP Address origin, different sections of this notebook are applicable', styles=[\"bold\", \"large\"])\n", + "md(f'Please follow either the Internal IP Address or External IP Address sections based on the recommendation below', styles=[\"bold\"])\n", + "\n", + "# Get details from Heartbeat table for the given IP Address and Time Parameters\n", + "heartbeat_df = qry_prov.Heartbeat.get_info_by_ipaddress(**ipaddr_query_params())\n", + "\n", + "# Set hostname retrieved from Heartbeat table if available\n", + 
"if not heartbeat_df.empty:\n", + " hostname = heartbeat_df[\"Computer\"][0]\n", + "else:\n", + " hostname = \"\"\n", + " \n", + "if not heartbeat_df.empty:\n", + " ipaddr_origin = \"Internal\"\n", + " md(f'IP Address type based on subnet: {ipaddr_type} & IP Address Owner based on available logs : {ipaddr_origin}', styles=[\"blue\",\"bold\"])\n", + " display(Markdown('#### Recommendation - Go to section [InternalIP](#goto_internalIP)'))\n", + "elif ipaddr_type==\"Private\" and heartbeat_df.empty:\n", + " ipaddr_origin = \"Unknown\"\n", + " md(f'IP Address type based on subnet: {ipaddr_type} & IP Address Owner based on available logs : {ipaddr_origin}', styles=[\"blue\",\"bold\"])\n", + " display(Markdown('#### Recommendation - Go to section [InternalIP](#goto_internalIP)'))\n", + "else:\n", + " ipaddr_origin = \"External\"\n", + " md(f'IP Address type based on subnet: {ipaddr_type} & IP Address Owner based on available logs : {ipaddr_origin}', styles=[\"blue\",\"bold\"])\n", + " display(Markdown('#### Recommendation - Go to section [ExternalIP](#goto_externalIP)'))\n", + " \n", + "#Populate related IP addresses for the calculated hostname\n", + "az_net_df = pd.DataFrame()\n", + "if \"AzureNetworkAnalytics_CL\" in table_index:\n", + " aznet_query = f\"\"\"\n", + " AzureNetworkAnalytics_CL | where ResourceType == 'NetworkInterface' \n", + " | where SubType_s == \"Topology\" \n", + " | search \\'{ipaddr_text.value}\\' \n", + " | where TimeGenerated >= datetime({query_times.start}) \n", + " | where TimeGenerated <= datetime({query_times.end}) \n", + " | where VirtualMachine_s has '{hostname}' \n", + " | top 1 by TimeGenerated desc \n", + " | project PrivateIPAddresses = PrivateIPAddresses_s, PublicIPAddresses = PublicIPAddresses_s\"\"\"\n", + " az_net_df = qry_prov.exec_query(query=aznet_query)\n", + " \n", + "# Create IP Entity record using available dataframes or input ip address if nothing present\n", + "if az_net_df.empty and heartbeat_df.empty:\n", + " ip_entity = 
IpAddress()\n", + " ip_entity['Address'] = ipaddr_query_params()['ip_address']\n", + " ip_entity['Type'] = 'ipaddress'\n", + " ip_entity['OSType'] = 'Unknown'\n", + " md('No Heartbeat Data and Network topology data found')\n", + "elif not heartbeat_df.empty:\n", + " if az_net_df.empty:\n", + " ip_entity = create_ip_record(\n", + " heartbeat_df=heartbeat_df)\n", + " else:\n", + " ip_entity = create_ip_record(\n", + " heartbeat_df=heartbeat_df, az_net_df=az_net_df)\n", + "#Display IP Entity\n", + "md(\"Displaying IP Entity\", styles=[\"green\",\"bold\"])\n", + "print(ip_entity)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## External IP" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### GeoIP Lookups for External IP Addresses" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-04-27T08:33:37.478812Z", + "start_time": "2020-04-27T08:33:37.470173Z" + } + }, + "outputs": [], + "source": [ + "# msticpy- geoip module to retrieving Geo Location for Public IP addresses\n", + "# To force Threatinel lookup for Internal public IP, replace and with or in if condition\n", + "if ipaddr_type == \"Public\" and ipaddr_origin == \"External\" :\n", + " iplocation = GeoLiteLookup()\n", + "\n", + " loc_results, ext_ip_entity = iplocation.lookup_ip(ip_address=ipaddr_query_params()['ip_address'])\n", + " md(\n", + " 'Geo Location for the IP Address ::', styles=[\"bold\",\"green\"]\n", + " )\n", + " print(ext_ip_entity[0])\n", + "else:\n", + " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### Whois Registrars for External IP Addresses" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": 
"2020-04-27T08:33:39.572115Z", + "start_time": "2020-04-27T08:33:39.566009Z" + } + }, + "outputs": [], + "source": [ + "# ipwhois module to retrieve whois registrar for Public IP addresses\n", + "# To force Threatinel lookup for Internal public IP, replace and with or in if condition\n", + "if ipaddr_type == \"Public\" and ipaddr_origin == \"External\" :\n", + " from ipwhois import IPWhois\n", + "\n", + " whois = IPWhois(ipaddr_query_params()['ip_address'])\n", + " whois_result = whois.lookup_whois()\n", + " if whois_result:\n", + " md(f'Whois Registrar Info ::', styles=[\"bold\",\"green\"])\n", + " display(whois_result)\n", + " else:\n", + " md(\n", + " f'No whois records available', styles=[\"bold\",\"orange\"]\n", + " )\n", + "else:\n", + " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### Opensource and Azure Sentinel ThreatIntel Lookups" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Configure your TI Provider settings\n", + "If you have not used threat intelligence lookups before you will need to supply API keys for the \n", + "TI Providers that you want to use. 
Please see the section on configuring [msticpyconfig.yaml](#msticpyconfig.yaml-configuration-File)\n", + "\n", + "Then reload provider settings:\n", + "```\n", + "mylookup = TILookup()\n", + "mylookup.reload_provider_settings()\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-04-27T08:33:43.562087Z", + "start_time": "2020-04-27T08:33:43.554830Z" + }, + "scrolled": true + }, + "outputs": [], + "source": [ + "# To force Threatinel lookup for Internal public IP, replace and with or in if condition\n", + "if ipaddr_type == \"Public\" and ipaddr_origin == \"External\" :\n", + " mylookup = TILookup()\n", + " mylookup.loaded_providers\n", + " resp = mylookup.lookup_ioc(observable=ipaddr_query_params()['ip_address'], ioc_type=\"ipv4\")\n", + " md(f'ThreatIntel Lookup for IP ::', styles=[\"bold\",\"green\"])\n", + " display(mylookup.result_to_df(resp).T)\n", + "else:\n", + " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### Passive DNS lookups for External IP Addresses" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-04-27T08:33:45.838706Z", + "start_time": "2020-04-27T08:33:45.829919Z" + } + }, + "outputs": [], + "source": [ + "# To force Passive DNS lookup for Internal public IP, change and with or in if\n", + "if ipaddr_type == \"Public\" and ipaddr_origin == \"External\" :\n", + " # retrieve passive dns from TI Providers\n", + " pdns = mylookup.lookup_ioc(\n", + " observable=ipaddr_query_params()['ip_address'],\n", + " ioc_type=\"ipv4\",\n", + " ioc_query_type=\"passivedns\",\n", + " providers=[\"XForce\"],\n", + " )\n", + " pdns_df = mylookup.result_to_df(pdns)\n", + " if not pdns_df.empty and pdns_df[\"RawResult\"][0] and \"RDNS\" in 
pdns_df[\"RawResult\"][0]:\n", + " pdnsdomains = pdns_df[\"RawResult\"][0][\"RDNS\"]\n", + " md(\n", + " f'Passive DNS domains for IP: {pdnsdomains}', styles=[\"bold\",\"green\"]\n", + " )\n", + " display(mylookup.result_to_df(pdns).T)\n", + " else:\n", + " md(\n", + " 'No passive domains found from the providers', styles=[\"bold\",\"orange\"]\n", + " )\n", + "else:\n", + " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Internal IP Address" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### Data Sources available to query related to IP" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:04:59.773853Z", + "start_time": "2020-05-15T23:04:53.039482Z" + } + }, + "outputs": [], + "source": [ + "if ipaddr_origin in [\"Internal\",\"Unknown\"]:\n", + " # KQL query for full text search of IP address and display all datatypes populated for the time period\n", + " datasource_status = \"\"\"\n", + " search \\'{ip_address}\\' or \\'{hostname}\\'\n", + " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", + " | summarize RowCount=count() by Table=$table\n", + " \"\"\".format(\n", + " **ipaddr_query_params(), hostname=hostname\n", + " )\n", + " datasource_status_df = qry_prov.exec_query(datasource_status)\n", + "\n", + " # Display the datatypes available to query for the query period, with row counts\n", + " if not datasource_status_df.empty:\n", + " available_datasets = datasource_status_df['Table'].values\n", + " md(\"Datasources available to query for IP ::\", styles=[\"green\",\"bold\"])\n", + " display(datasource_status_df)\n", + " else:\n", + " md_warn(\"No datasources contain given IP address for the query period\")\n", + "else:\n", + " md(f'Analysis section Not 
Applicable since IP address type is: {ipaddr_origin}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### Check if IP is assigned to multiple hostnames" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:05:03.895367Z", + "start_time": "2020-05-15T23:05:02.486243Z" + } + }, + "outputs": [], + "source": [ + "if ipaddr_origin == \"Internal\" or not datasource_status_df.empty:\n", + " # Get single event - try process creation\n", + " if ip_entity['OSType'] =='Windows':\n", + " if \"SecurityEvent\" not in available_datasets:\n", + " raise ValueError(\"No Windows event log data available in the workspace\")\n", + " host_name = None\n", + " matching_hosts_df = qry_prov.WindowsSecurity.list_host_processes(\n", + " query_times, host_name=hostname, add_query_items=\"| distinct Computer\"\n", + " )\n", + " elif ip_entity['OSType'] =='Linux':\n", + " if \"Syslog\" not in available_datasets:\n", + " raise ValueError(\"No Linux syslog data available in the workspace\")\n", + " else:\n", + " linux_syslog_query = f\"\"\" Syslog | where TimeGenerated >= datetime({query_times.start}) | where TimeGenerated <= datetime({query_times.end}) | where HostIP == '{ipaddr_text.value}' | distinct Computer \"\"\"\n", + " matching_hosts_df = qry_prov.exec_query(query=linux_syslog_query)\n", + "\n", + " if len(matching_hosts_df) > 1:\n", + " print(f\"Multiple matches for '{hostname}'. 
Please select a host from the list.\")\n", + " choose_host = nbwidgets.SelectString(\n", + " item_list=list(matching_hosts_df[\"Computer\"].values),\n", + " description=\"Select the host.\",\n", + " auto_display=True,\n", + " )\n", + " elif not matching_hosts_df.empty:\n", + " host_name = matching_hosts_df[\"Computer\"].iloc[0]\n", + " print(f\"Unique host found for IP: {hostname}\")\n", + "elif datasource_status_df.empty:\n", + " md_warn(\"No datasources contain given IP address for the query period\")\n", + "else: \n", + " md(f'Analysis section Not Applicable since IP address type is : {ipaddr_origin}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### System Info" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:05:07.346683Z", + "start_time": "2020-05-15T23:05:07.330684Z" + } + }, + "outputs": [], + "source": [ + "# Retrieving System info from internal table if IP address is not Public\n", + "if ipaddr_origin == \"Internal\" and not heartbeat_df.empty:\n", + " md(\n", + " 'System Info retrieved from Heartbeat table ::', styles=[\"green\",\"bold\"]\n", + " )\n", + " display(heartbeat_df.T)\n", + "else:\n", + " md_warn(\n", + " 'No records available in HeartBeat table'\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### ServiceMap - Get List of Services for Host" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:05:10.389939Z", + "start_time": "2020-05-15T23:05:10.369939Z" + } + }, + "outputs": [], + "source": [ + "if ipaddr_origin == \"Internal\":\n", + " if \"ServiceMapProcess_CL\" not in available_datasets:\n", + " md_warn(\"ServiceMap data is not enabled\")\n", + " md(\n", + " f\"Enable ServiceMap Solution from Azure marketplace:
\"\n", + " +\"https://docs.microsoft.com/en-us/azure/azure-monitor/insights/service-map#enable-service-map\",\n", + " styles=[\"bold\"]\n", + " )\n", + "\n", + " else:\n", + " servicemap_proc_query = \"\"\"\n", + " ServiceMapProcess_CL\n", + " | where Computer == \\'{hostname}\\'\n", + " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", + " | project Computer, Services_s, DisplayName_s, ExecutableName_s , ExecutablePath_s \n", + " \"\"\".format(\n", + " hostname=hostname, **ipaddr_query_params()\n", + " )\n", + "\n", + " servicemap_proc_df = qry_prov.exec_query(servicemap_proc_query)\n", + " display(servicemap_proc_df)\n", + "else:\n", + " md(f'Analysis section Not Applicable since IP address type is {ipaddr_type}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Related Alerts" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:05:14.185177Z", + "start_time": "2020-05-15T23:05:14.123178Z" + } + }, + "outputs": [], + "source": [ + "ra_query_times = nbwidgets.QueryTime(\n", + " units=\"day\",\n", + " origin_time=query_times.origin_time,\n", + " max_before=28,\n", + " max_after=5,\n", + " before=5,\n", + " auto_display=True,\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualization - Timeline of Related Alerts" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:05:19.536611Z", + "start_time": "2020-05-15T23:05:17.943028Z" + } + }, + "outputs": [], + "source": [ + "#Provide hostname if present to the query\n", + "if hostname:\n", + " md(f\"Searching for alerts related to {hostname}...\")\n", + " related_alerts = qry_prov.SecurityAlert.list_related_alerts(\n", + " ra_query_times, host_name=hostname\n", + " )\n", + "else:\n", + " md(f\"Searching for alerts related to ip address(es) 
{ipaddr_query_params()['ip_address']}\")\n", + " related_alerts = qry_prov.SecurityAlert.list_alerts_for_ip(\n", + " ra_query_times, source_ip_list=ipaddr_query_params()['ip_address']\n", + " )\n", + "\n", + "\n", + "def print_related_alerts(alertDict, entityType, entityName):\n", + " if len(alertDict) > 0:\n", + " md(\n", + " f\"Found {len(alertDict)} different alert types related to this {entityType} (`{entityName}`)\",styles=[\"bold\",\"orange\"]\n", + " )\n", + " for (k, v) in alertDict.items():\n", + " print(f\"- {k}, # Alerts: {v}\")\n", + " else:\n", + " print(f\"No alerts for {entityType} entity `{entityName}`\")\n", + "\n", + "\n", + "if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:\n", + " host_alert_items = (\n", + " related_alerts[[\"AlertName\", \"TimeGenerated\"]]\n", + " .groupby(\"AlertName\")\n", + " .TimeGenerated.agg(\"count\")\n", + " .to_dict()\n", + " )\n", + " print_related_alerts(host_alert_items, \"host\", hostname)\n", + " nbdisplay.display_timeline(\n", + " data=related_alerts, title=\"Alerts\", source_columns=[\"AlertName\"], height=200\n", + " )\n", + "else:\n", + " md(\"No related alerts found.\",styles=[\"bold\",\"green\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " ### Browse List of Related Alerts\n", + " Select an Alert to view details" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:05:24.530275Z", + "start_time": "2020-05-15T23:05:24.464277Z" + } + }, + "outputs": [], + "source": [ + "def disp_full_alert(alert):\n", + " global related_alert\n", + " related_alert = SecurityAlert(alert)\n", + " nbdisplay.display_alert(related_alert, show_entities=True)\n", + "\n", + "recenter_wgt = widgets.Checkbox(\n", + " value=True,\n", + " description='Center subsequent query times round selected Alert?',\n", + " disabled=False,\n", + " **WIDGET_DEFAULTS\n", + ")\n", + "if related_alerts is not None and not 
related_alerts.empty:\n", + " related_alerts[\"CompromisedEntity\"] = related_alerts[\"Computer\"]\n", + " md(\"Click on alert to view details.\", styles=[\"bold\"])\n", + " display(recenter_wgt)\n", + " rel_alert_select = nbwidgets.SelectAlert(\n", + " alerts=related_alerts,\n", + " action=disp_full_alert,\n", + " )\n", + " rel_alert_select.display()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "## Related Hosts\n", + "**Hypothesis:** That an attacker has gained access to the host, compromised credentials for its accounts, and is moving laterally through the network to gain access to more hosts.\n", + "\n", + "This section shows the hosts related to the IP address under investigation. If you wish to expand the scope of the hunt and investigate each host in detail, we recommend using the **Host Explorer (Windows/Linux)** notebooks:\n", + " - [Entity Explorer - Windows Host](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/Entity%20Explorer%20-%20Windows%20Host.ipynb)\n", + " - [Entity Explorer - Linux Host](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/Entity%20Explorer%20-%20Linux%20Host.ipynb)\n", + " \n", + "If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:\n", + " - [Run Entity Explorer - Windows Host](./Entity%20Explorer%20-%20Windows%20Host.ipynb)\n", + " - [Run Entity Explorer - Linux Host](./Entity%20Explorer%20-%20Linux%20Host.ipynb)\n", + "\n", + "#### __NOTE - the following sections are only relevant for Internal IP Addresses.__" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### Visualization - Networkx Graph" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:05:30.863302Z", + "start_time": "2020-05-15T23:05:29.870080Z" + } + }, + "outputs": [], + "source": [ + 
"import networkx as nx\n", + "if ipaddr_origin == \"Internal\":\n", + " # Retrieve related hosts from SecurityEvent table for Windows OS\n", + " if ip_entity['OSType'] =='Windows':\n", + " if \"SecurityEvent\" not in available_datasets:\n", + " raise ValueError(\"No Windows event log data available in the workspace\")\n", + " else:\n", + " related_hosts = \"\"\"\n", + " SecurityEvent\n", + " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", + " | where IpAddress == \\'{ip_address}\\' or Computer == \\'{hostname}\\' \n", + " | summarize count() by Computer, IpAddress\n", + " \"\"\".format(\n", + " **ipaddr_query_params(), hostname=hostname\n", + " )\n", + "\n", + " related_hosts_df = qry_prov.exec_query(related_hosts)\n", + "\n", + " elif ip_entity['OSType'] =='Linux':\n", + " if \"Syslog\" not in available_datasets:\n", + " raise ValueError(\"No Linux syslog data available in the workspace\")\n", + " else:\n", + " related_hosts_df = qry_prov.LinuxSyslog.list_logons_for_source_ip(query_times, ip_address=ipaddr_query_params()['ip_address'],add_query_items='| extend IpAddress = HostIP | summarize count() by Computer, IpAddress')\n", + "\n", + " # Displaying networkx - static graph. 
For an interactive graph, uncomment and run the next block of code.\n", + " plt.figure(10, figsize=(22, 14))\n", + " g = nx.from_pandas_edgelist(related_hosts_df, \"IpAddress\", \"Computer\")\n", + " md('Entity Relationship Graph - Related Hosts :: ',styles=[\"bold\",\"green\"])\n", + " nx.draw_circular(g, with_labels=True, size=40, font_size=12, font_color=\"blue\")\n", + "\n", + "\n", + " # Uncomment the lines below to display interactive graphs using the Pyvis library. The Azure Notebooks free tier may not render the graph correctly.\n", + " # logonpyvis_graph = Network(notebook=True, height=\"750px\", width=\"100%\", bgcolor=\"#222222\", font_color=\"white\")\n", + "\n", + " # # set the physics layout of the network\n", + " # logonpyvis_graph.barnes_hut()\n", + "\n", + " # sources = related_hosts_df['Computer']\n", + " # targets = related_hosts_df['IpAddress']\n", + " # weights = related_hosts_df['count_']\n", + "\n", + " # edge_data = zip(sources, targets, weights)\n", + "\n", + " # for e in edge_data:\n", + " # src = e[0]\n", + " # dst = e[1]\n", + " # w = e[2]\n", + "\n", + " # logonpyvis_graph.add_node(src, src, title=src)\n", + " # logonpyvis_graph.add_node(dst, dst, title=dst)\n", + " # logonpyvis_graph.add_edge(src, dst, value=w)\n", + "\n", + " # neighbor_map = logonpyvis_graph.get_adj_list()\n", + "\n", + " # # add neighbor data to node hover data\n", + " # for node in logonpyvis_graph.nodes:\n", + " # node[\"title\"] += \" Neighbors:<br>\" + \"<br>\".join(neighbor_map[node[\"id\"]])\n", + " # node[\"value\"] = len(neighbor_map[node[\"id\"]]) \n", + "\n", + " # logonpyvis_graph.show(\"hostlogonpyvis_graph.html\")\n", + "else:\n", + " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "## Related Accounts\n", + "**Hypothesis:** That an attacker has gained access to the host, compromised credentials for the accounts on it, and is moving laterally through the network to gain access to more accounts.\n", + "\n", + "This section shows the accounts related to the IP address under investigation. If you wish to expand the scope of the hunt and investigate each account in detail, we recommend using the **Account Explorer** notebook:\n", + " - [Entity Explorer - Account](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/Entity%20Explorer%20-%20Account.ipynb)\n", + "\n", + "If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:\n", + " - [Run Entity Explorer - Account](./Entity%20Explorer%20-%20Account.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "ExecuteTime": { + "end_time": "2019-09-10T20:12:42.022358Z", + "start_time": "2019-09-10T20:12:42.010961Z" + } + }, + "source": [ + "[Contents](#toc)\n", + "### Visualization - Networkx Graph" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:05:36.951055Z", + "start_time": "2020-05-15T23:05:35.741976Z" + } + }, + "outputs": [], + "source": [ + "if ipaddr_origin == \"Internal\":\n", + " # Retrieve related accounts from SecurityEvent table for Windows OS\n", + " if ip_entity['OSType'] =='Windows':\n", + " if \"SecurityEvent\" not in available_datasets:\n", + " raise ValueError(\"No Windows event log data available in the workspace\")\n", 
+ " else:\n", + " related_accounts = \"\"\"\n", + " SecurityEvent\n", + " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", + " | where IpAddress == \\'{ip_address}\\' or Computer == \\'{hostname}\\' \n", + " | summarize count() by Account, Computer\n", + " \"\"\".format(\n", + " **ipaddr_query_params(), hostname=hostname\n", + " )\n", + " related_accounts_df = qry_prov.exec_query(related_accounts)\n", + "\n", + " elif ip_entity['OSType'] =='Linux':\n", + " if \"Syslog\" not in available_datasets:\n", + " raise ValueError(\"No Linux syslog data available in the workspace\")\n", + " else:\n", + " related_accounts_df = qry_prov.LinuxSyslog.list_logons_for_source_ip(query_times, ip_address=ipaddr_query_params()['ip_address'],add_query_items='| extend Account = AccountName | summarize count() by Account, Computer')\n", + "\n", + "\n", + " # Display a static networkx graph of the related accounts\n", + " plt.figure(10, figsize=(22, 14))\n", + " g = nx.from_pandas_edgelist(related_accounts_df, \"Computer\", \"Account\")\n", + " md('Entity Relationship Graph - Related Accounts :: ',styles=[\"bold\",\"green\"])\n", + " nx.draw_circular(g, with_labels=True, size=40, font_size=12, font_color=\"blue\")\n", + "\n", + " # Uncomment the lines below to display interactive graphs using the Pyvis library. The Azure Notebooks free tier may not render the graph correctly.\n", + " # acclogon_pyvisgraph = Network(notebook=True, height=\"750px\", width=\"100%\", bgcolor=\"#222222\", font_color=\"white\")\n", + "\n", + " # # set the physics layout of the network\n", + " # acclogon_pyvisgraph.barnes_hut()\n", + "\n", + "\n", + " # sources = related_accounts_df['Computer']\n", + " # targets = related_accounts_df['Account']\n", + " # weights = related_accounts_df['count_']\n", + "\n", + " # edge_data = zip(sources, targets, weights)\n", + "\n", + " # for e in edge_data:\n", + " # src = e[0]\n", + " # dst = e[1]\n", + " # w = 
e[2]\n", + "\n", + " # acclogon_pyvisgraph.add_node(src, src, title=src)\n", + " # acclogon_pyvisgraph.add_node(dst, dst, title=dst)\n", + " # acclogon_pyvisgraph.add_edge(src, dst, value=w)\n", + "\n", + " # neighbor_map = acclogon_pyvisgraph.get_adj_list()\n", + "\n", + " # # add neighbor data to node hover data\n", + " # for node in acclogon_pyvisgraph.nodes:\n", + " # node[\"title\"] += \" Neighbors:
\" + \"
\".join(neighbor_map[node[\"id\"]])\n", + " # node[\"value\"] = len(neighbor_map[node[\"id\"]]) # this value attribute for the node affects node size\n", + "\n", + " # acclogon_pyvisgraph.show(\"accountlogonpyvis_graph.html\")\n", + "else:\n", + " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "ExecuteTime": { + "end_time": "2019-08-30T15:50:05.854226Z", + "start_time": "2019-08-30T15:50:04.517392Z" + } + }, + "source": [ + "[Contents](#toc)\n", + "## Logon Summary for Related Entities\n", + "**Hypothesis:** By analyzing logon activities of the related entities, we can identify changes in logon patterns and narrow the entities down to a few with suspicious logon patterns.\n", + "\n", + "This section provides various visualizations of logon attributes, such as:\n", + "- Weekly Failed Logon trend\n", + "- Logon Types \n", + "- Logon Processes\n", + "\n", + "If you wish to expand the scope of hunting and investigate a specific host in detail, it is recommended to use the **Host Explorer (Windows/Linux)**:\n", + "\n", + " - [Entity Explorer - Windows Host](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/Entity%20Explorer%20-%20Windows%20Host.ipynb)\n", + " - [Entity Explorer - Linux Host](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/Entity%20Explorer%20-%20Linux%20Host.ipynb)\n", + " \n", + "If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:\n", + " - [Run Entity Explorer - Windows Host](./Entity%20Explorer%20-%20Windows%20Host.ipynb)\n", + " - [Run Entity Explorer - Linux Host](./Entity%20Explorer%20-%20Linux%20Host.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "ExecuteTime": { + "end_time": "2019-09-10T20:18:33.673179Z", + "start_time": "2019-09-10T20:18:33.670042Z" + } + }, + "source": [ + "[Contents](#toc)\n", + "### 
HeatMap for Weekly failed logons" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:05:46.615934Z", + "start_time": "2020-05-15T23:05:44.570772Z" + } + }, + "outputs": [], + "source": [ + "if ipaddr_origin == \"Internal\":\n", + " # Retrived related accounts from SecurityEvent table for Windows OS\n", + " if ip_entity['OSType'] =='Windows':\n", + " if \"SecurityEvent\" not in available_datasets:\n", + " raise ValueError(\"No Windows event log data available in the workspace\")\n", + " else:\n", + " failed_logons = \"\"\"\n", + " SecurityEvent\n", + " | where EventID in (4624,4625) | where IpAddress == \\'{ip_address}\\' or Computer == \\'{hostname}\\' \n", + " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", + " | extend DayofWeek = case(dayofweek(TimeGenerated) == time(1.00:00:00), \"Monday\", \n", + " dayofweek(TimeGenerated) == time(2.00:00:00), \"Tuesday\",\n", + " dayofweek(TimeGenerated) == time(3.00:00:00), \"Wednesday\",\n", + " dayofweek(TimeGenerated) == time(4.00:00:00), \"Thursday\",\n", + " dayofweek(TimeGenerated) == time(5.00:00:00), \"Friday\",\n", + " dayofweek(TimeGenerated) == time(6.00:00:00), \"Saturday\",\n", + " \"Sunday\")\n", + " | summarize LogonCount=count() by DayofWeek, HourOfDay=format_datetime(bin(TimeGenerated,1h),'HH:mm')\n", + " \"\"\".format(\n", + " **ipaddr_query_params(), hostname=hostname\n", + " )\n", + " failed_logons_df = qry_prov.exec_query(failed_logons)\n", + "\n", + " elif ip_entity['OSType'] =='Linux':\n", + " if \"Syslog\" not in available_datasets:\n", + " raise ValueError(\"No Linux syslog data available in the workspace\")\n", + " else: \n", + " failed_logons_df = qry_prov.LinuxSyslog.user_logon(invest_times, account_name ='', add_query_items=\"\"\"| where HostIP == '{ipaddr_text.value}' |extend Account = AccountName | extend DayofWeek = case(dayofweek(TimeGenerated) == time(1.00:00:00), \"Monday\", 
dayofweek(TimeGenerated) == time(2.00:00:00), \"Tuesday\",\n", + " dayofweek(TimeGenerated) == time(3.00:00:00), \"Wednesday\",\n", + " dayofweek(TimeGenerated) == time(4.00:00:00), \"Thursday\",\n", + " dayofweek(TimeGenerated) == time(5.00:00:00), \"Friday\",\n", + " dayofweek(TimeGenerated) == time(6.00:00:00), \"Saturday\", \"Sunday\") | summarize LogonCount=count() by DayofWeek, HourOfDay=format_datetime(bin(TimeGenerated,1h),'HH:mm')\"\"\")\n", + "\n", + " # Plot a heatmap using the seaborn library if there are failed logons\n", + " if len(failed_logons_df) > 0:\n", + " df_pivot = (\n", + " failed_logons_df.reset_index()\n", + " .pivot_table(index=\"DayofWeek\", columns=\"HourOfDay\", values=\"LogonCount\")\n", + " .fillna(0)\n", + " )\n", + " display(\n", + " Markdown(\n", + " f'### Heatmap - Weekly Failed Logon Trend :: '\n", + " )\n", + " )\n", + " f, ax = plt.subplots(figsize=(16, 8))\n", + " hm1 = sns.heatmap(df_pivot, cmap=\"YlGnBu\", ax=ax)\n", + " plt.xticks(rotation=45)\n", + " plt.yticks(rotation=30)\n", + " else:\n", + " linux_logons=qry_prov.LinuxSyslog.list_logons_for_source_ip(**ipaddr_query_params())\n", + " failed_logons = (logon_events[logon_events['LogonResult'] == 'Failure'])\n", + "else:\n", + " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### Host Logons Timeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:05:49.256466Z", + "start_time": "2020-05-15T23:05:49.190460Z" + } + }, + "outputs": [], + "source": [ + "# set the origin time to the time of our alert\n", + "try:\n", + " origin_time = (related_alert.TimeGenerated \n", + " if recenter_wgt.value \n", + " else query_times.origin_time)\n", + "except NameError:\n", + " origin_time = query_times.origin_time\n", + " \n", + "logon_query_times = 
nbwidgets.QueryTime(\n", + " units=\"day\",\n", + " origin_time=origin_time,\n", + " before=5,\n", + " after=1,\n", + " max_before=20,\n", + " max_after=20,\n", + ")\n", + "logon_query_times.display()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:05:55.096129Z", + "start_time": "2020-05-15T23:05:52.661823Z" + } + }, + "outputs": [], + "source": [ + "if ipaddr_origin == \"Internal\":\n", + " host_logons = qry_prov.WindowsSecurity.list_host_logons(\n", + " logon_query_times, host_name=hostname\n", + " )\n", + "\n", + " if host_logons is not None and not host_logons.empty:\n", + " display(Markdown(\"### Logon timeline.\"))\n", + " tooltip_cols = [\n", + " \"TargetUserName\",\n", + " \"TargetDomainName\",\n", + " \"SubjectUserName\",\n", + " \"SubjectDomainName\",\n", + " \"LogonType\",\n", + " \"IpAddress\",\n", + " ]\n", + " nbdisplay.display_timeline(\n", + " data=host_logons,\n", + " group_by=\"TargetUserName\",\n", + " source_columns=tooltip_cols,\n", + " legend=\"right\", yaxis=True\n", + " )\n", + "\n", + " display(Markdown(\"### Counts of logon events by logon type.\"))\n", + " display(Markdown(\"Min counts for each logon type highlighted.\"))\n", + " logon_by_type = (\n", + " host_logons[[\"Account\", \"LogonType\", \"EventID\"]]\n", + " .astype({'LogonType': 'int32'})\n", + " .merge(right=pd.Series(data=nbdisplay._WIN_LOGON_TYPE_MAP, name=\"LogonTypeDesc\"),\n", + " left_on=\"LogonType\", right_index=True)\n", + " .drop(columns=\"LogonType\")\n", + " .groupby([\"Account\", \"LogonTypeDesc\"])\n", + " .count()\n", + " .unstack()\n", + " .rename(columns={\"EventID\": \"LogonCount\"})\n", + " .fillna(0)\n", + " .style\n", + " .background_gradient(cmap=\"viridis\", low=0.5, high=0)\n", + " .format(\"{0:0>3.0f}\")\n", + " )\n", + " display(logon_by_type)\n", + " else:\n", + " display(Markdown(\"No logon events found for host.\"))\n", + "else:\n", + " md(f'Analysis section Not 
Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Failed Logons Timeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:06:01.493064Z", + "start_time": "2020-05-15T23:05:59.580819Z" + }, + "scrolled": true + }, + "outputs": [], + "source": [ + "if ipaddr_origin == \"Internal\":\n", + " failedLogons = qry_prov.WindowsSecurity.list_host_logon_failures(\n", + " logon_query_times, host_name=ip_entity.hostname\n", + " )\n", + " if failedLogons.empty:\n", + " print(\"No logon failures recorded for this host between \",\n", + " f\" {logon_query_times.start} and {logon_query_times.end}\"\n", + " )\n", + " else:\n", + " nbdisplay.display_timeline(\n", + " data=host_logons.query('TargetLogonId != \"0x3e7\"'),\n", + " overlay_data=failedLogons,\n", + " alert=related_alert,\n", + " title=\"Logons (blue=user-success, green=failed)\",\n", + " source_columns=tooltip_cols,\n", + " height=200,\n", + " )\n", + " display(failedLogons\n", + " .astype({'LogonType': 'int32'})\n", + " .merge(right=pd.Series(data=nbdisplay._WIN_LOGON_TYPE_MAP, name=\"LogonTypeDesc\"),\n", + " left_on=\"LogonType\", right_index=True)\n", + " [['Account', 'EventID', 'TimeGenerated',\n", + " 'Computer', 'SubjectUserName', 'SubjectDomainName',\n", + " 'TargetUserName', 'TargetDomainName',\n", + " 'LogonTypeDesc','IpAddress', 'WorkstationName'\n", + " ]])\n", + "else:\n", + " md(f'Analysis section Not Applicable since IP address owner is {ipaddr_origin}', styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "ExecuteTime": { + "end_time": "2019-08-30T15:52:54.700099Z", + "start_time": "2019-08-30T15:52:54.661189Z" + } + }, + "source": [ + "[Contents](#toc)\n", + "## Network Connection Analysis\n", + "\n", + "**Hypothesis:** That an attacker is remotely communicating with the host in order 
to compromise the host or for outbound communication to C2 for data exfiltration purposes after compromising the host.\n", + "\n", + "This section provides an overview of network activity to and from the host during the hunting time frame; its purpose is to identify anomalous network traffic. If you wish to investigate a specific IP in detail, it is recommended to run another instance of this notebook for each IP address.\n", + "\n", + "> Note: this query can return a lot of data for active hosts\n", + "> If your query times out, try reducing the time range or breaking the analysis\n", + "> into chunks" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### Network Check Communications with Other Hosts" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:06:06.486183Z", + "start_time": "2020-05-15T23:06:06.429184Z" + } + }, + "outputs": [], + "source": [ + "ip_q_times = nbwidgets.QueryTime(\n", + " label=\"Set time bounds for network queries\",\n", + " units=\"hour\",\n", + " max_before=120,\n", + " before=5,\n", + " after=5,\n", + " max_after=60,\n", + " origin_time=logon_query_times.origin_time\n", + ")\n", + "ip_q_times.display()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### Query Flows by IP Address" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:06:22.160247Z", + "start_time": "2020-05-15T23:06:10.292782Z" + } + }, + "outputs": [], + "source": [ + "if \"AzureNetworkAnalytics_CL\" not in available_datasets:\n", + " md_warn(\"No network flow data available.\")\n", + " md(\"Please skip the remainder of this section and go to [Time-Series-Anomalies](#Outbound-Data-transfer-Time-Series-Anomalies)\")\n", + " az_net_comms_df = None\n", + "else:\n", + " all_host_ips = (\n", + " 
ip_entity['private_ips'] + ip_entity['public_ips']\n", + " )\n", + " host_ips = [i.Address for i in all_host_ips]\n", + "\n", + " az_net_comms_df = qry_prov.Network.list_azure_network_flows_by_ip(\n", + " ip_q_times, ip_address_list=host_ips\n", + " )\n", + "\n", + " if isinstance(az_net_comms_df, pd.DataFrame) and not az_net_comms_df.empty:\n", + " az_net_comms_df['TotalAllowedFlows'] = az_net_comms_df['AllowedOutFlows'] + az_net_comms_df['AllowedInFlows']\n", + " nbdisplay.display_timeline(\n", + " data=az_net_comms_df,\n", + " group_by=\"L7Protocol\",\n", + " title=\"Network Flows by Protocol\",\n", + " time_column=\"FlowStartTime\",\n", + " source_columns=[\"FlowType\", \"AllExtIPs\", \"L7Protocol\", \"FlowDirection\"],\n", + " height=300,\n", + " legend=\"right\",\n", + " yaxis=True\n", + " )\n", + " nbdisplay.display_timeline(\n", + " data=az_net_comms_df,\n", + " group_by=\"FlowDirection\",\n", + " title=\"Network Flows by Direction\",\n", + " time_column=\"FlowStartTime\",\n", + " source_columns=[\"FlowType\", \"AllExtIPs\", \"L7Protocol\", \"FlowDirection\"],\n", + " height=300,\n", + " legend=\"right\",\n", + " yaxis=True\n", + " )\n", + " else:\n", + " md_warn(\"No network data for specified time range.\")\n", + " md(\"Please skip the remainder of this section and go to [Time-Series-Anomalies](#Outbound-Data-transfer-Time-Series-Anomalies)\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:06:50.373391Z", + "start_time": "2020-05-15T23:06:50.084392Z" + } + }, + "outputs": [], + "source": [ + "try:\n", + " flow_plot = nbdisplay.display_timeline_values(\n", + " data=az_net_comms_df,\n", + " group_by=\"L7Protocol\",\n", + " source_columns=[\"FlowType\", \n", + " \"AllExtIPs\", \n", + " \"L7Protocol\", \n", + " \"FlowDirection\", \n", + " \"TotalAllowedFlows\"],\n", + " time_column=\"FlowStartTime\",\n", + " y=\"TotalAllowedFlows\",\n", + " legend=\"right\",\n", + " 
height=500,\n", + " kind=[\"vbar\", \"circle\"],\n", + " );\n", + "except NameError as err:\n", + " md(f\"Error Occured, Make sure to execute previous cells in notebook: {err}\",styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:06:55.928554Z", + "start_time": "2020-05-15T23:06:55.790553Z" + } + }, + "outputs": [], + "source": [ + "try:\n", + " if az_net_comms_df is not None and not az_net_comms_df.empty:\n", + " cm = sns.light_palette(\"green\", as_cmap=True)\n", + "\n", + " cols = [\n", + " \"VMName\",\n", + " \"VMIPAddress\",\n", + " \"PublicIPs\",\n", + " \"SrcIP\",\n", + " \"DestIP\",\n", + " \"L4Protocol\",\n", + " \"L7Protocol\",\n", + " \"DestPort\",\n", + " \"FlowDirection\",\n", + " \"AllExtIPs\",\n", + " \"TotalAllowedFlows\",\n", + " ]\n", + " flow_index = az_net_comms_df[cols].copy()\n", + "\n", + " def get_source_ip(row):\n", + " if row.FlowDirection == \"O\":\n", + " return row.VMIPAddress if row.VMIPAddress else row.SrcIP\n", + " else:\n", + " return row.AllExtIPs if row.AllExtIPs else row.DestIP\n", + "\n", + " def get_dest_ip(row):\n", + " if row.FlowDirection == \"O\":\n", + " return row.AllExtIPs if row.AllExtIPs else row.DestIP\n", + " else:\n", + " return row.VMIPAddress if row.VMIPAddress else row.SrcIP\n", + " \n", + " flow_index[\"source\"] = flow_index.apply(get_source_ip, axis=1)\n", + " flow_index[\"dest\"] = flow_index.apply(get_dest_ip, axis=1)\n", + " display(flow_index)\n", + "\n", + " # Uncomment to view flow_index results\n", + " # with warnings.catch_warnings():\n", + " # warnings.simplefilter(\"ignore\")\n", + " # display(\n", + " # flow_index[\n", + " # [\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\", \"TotalAllowedFlows\"]\n", + " # ]\n", + " # .groupby([\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\"])\n", + " # .sum()\n", + " # .reset_index()\n", + " # .style.bar(subset=[\"TotalAllowedFlows\"], 
color=\"#d65f5f\")\n", + " # )\n", + "except NameError as err:\n", + " md(f\"Error Occured, Make sure to execute previous cells in notebook: {err}\",styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "### Bulk whois lookup " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:08:00.744206Z", + "start_time": "2020-05-15T23:07:01.493951Z" + } + }, + "outputs": [], + "source": [ + "# Bulk WHOIS lookup function\n", + "from functools import lru_cache\n", + "from ipwhois import IPWhois\n", + "from ipaddress import ip_address\n", + "\n", + "try:\n", + " # Add ASN informatio from Whois\n", + " flows_df = (\n", + " flow_index[[\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\", \"TotalAllowedFlows\"]]\n", + " .groupby([\"source\", \"dest\", \"L7Protocol\", \"FlowDirection\"])\n", + " .sum()\n", + " .reset_index()\n", + " )\n", + "\n", + " num_ips = len(flows_df[\"source\"].unique()) + len(flows_df[\"dest\"].unique())\n", + " print(f\"Performing WhoIs lookups for {num_ips} IPs \", end=\"\")\n", + " #flows_df = flows_df.assign(DestASN=\"\", DestASNFull=\"\", SourceASN=\"\", SourceASNFull=\"\")\n", + " flows_df[\"DestASN\"] = flows_df.apply(lambda x: get_whois_info(x.dest, True), axis=1)\n", + " flows_df[\"SourceASN\"] = flows_df.apply(lambda x: get_whois_info(x.source, True), axis=1)\n", + " print(\"done\")\n", + "\n", + " # Split the tuple returned by get_whois_info into separate columns\n", + " flows_df[\"DestASNFull\"] = flows_df.apply(lambda x: x.DestASN[1], axis=1)\n", + " flows_df[\"DestASN\"] = flows_df.apply(lambda x: x.DestASN[0], axis=1)\n", + " flows_df[\"SourceASNFull\"] = flows_df.apply(lambda x: x.SourceASN[1], axis=1)\n", + " flows_df[\"SourceASN\"] = flows_df.apply(lambda x: x.SourceASN[0], axis=1)\n", + "\n", + " our_host_asns = [get_whois_info(ip.Address)[0] for ip in ip_entity.public_ips]\n", + " 
md(f\"Host {ip_entity.hostname} ASNs:\", \"bold\")\n", + " md(str(our_host_asns))\n", + "\n", + " flow_sum_df = flows_df.groupby([\"DestASN\", \"SourceASN\"]).agg(\n", + " TotalAllowedFlows=pd.NamedAgg(column=\"TotalAllowedFlows\", aggfunc=\"sum\"),\n", + " L7Protocols=pd.NamedAgg(column=\"L7Protocol\", aggfunc=lambda x: x.unique().tolist()),\n", + " source_ips=pd.NamedAgg(column=\"source\", aggfunc=lambda x: x.unique().tolist()),\n", + " dest_ips=pd.NamedAgg(column=\"dest\", aggfunc=lambda x: x.unique().tolist()),\n", + " ).reset_index()\n", + " display(flow_sum_df)\n", + "except NameError as err:\n", + " md(f\"Error Occured, Make sure to execute previous cells in notebook: {err}\",styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Choose ASNs/IPs to Check for Threat Intel Reports\n", + "Choose from the list of Selected ASNs for the IPs you wish to check on.\n", + "The Source list has been pre-populated with all ASNs found in the network flow summary.\n", + "\n", + "As an example, we've populated the `Selected` list with the ASNs that have the lowest number of flows to and from the host. We also remove the ASN that matches the ASN of the host we are investigating.\n", + "\n", + "Please edit this list, using the flow summary data above as a guide and leaving only ASNs that you are suspicious about. Typically these would be ones with relatively low `TotalAllowedFlows` and possibly with unusual `L7Protocols`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:08:01.347207Z", + "start_time": "2020-05-15T23:08:01.287206Z" + } + }, + "outputs": [], + "source": [ + "try:\n", + " if isinstance(flow_sum_df, pd.DataFrame) and not flow_sum_df.empty:\n", + " all_asns = list(flow_sum_df[\"DestASN\"].unique()) + list(flow_sum_df[\"SourceASN\"].unique())\n", + " all_asns = set(all_asns) - set([\"private address\"])\n", + "\n", + " # Select the ASNs in the 25th percentile (lowest number of flows)\n", + " quant_25pc = flow_sum_df[\"TotalAllowedFlows\"].quantile(q=[0.25]).iat[0]\n", + " quant_25pc_df = flow_sum_df[flow_sum_df[\"TotalAllowedFlows\"] <= quant_25pc]\n", + " other_asns = list(quant_25pc_df[\"DestASN\"].unique()) + list(quant_25pc_df[\"SourceASN\"].unique())\n", + " other_asns = set(other_asns) - set(our_host_asns)\n", + " md(\"Choose IPs from Selected ASNs to look up for Threat Intel.\", \"bold\")\n", + " sel_asn = nbwidgets.SelectSubset(source_items=all_asns, default_selected=other_asns)\n", + "except NameError as err:\n", + " md(f\"Error Occured, Make sure to execute previous cells in notebook: {err}\",styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:08:14.516746Z", + "start_time": "2020-05-15T23:08:01.935205Z" + } + }, + "outputs": [], + "source": [ + "try:\n", + " if isinstance(flow_sum_df, pd.DataFrame) and not flow_sum_df.empty:\n", + " ti_lookup = TILookup()\n", + " from itertools import chain\n", + " dest_ips = set(chain.from_iterable(flow_sum_df[flow_sum_df[\"DestASN\"].isin(sel_asn.selected_items)][\"dest_ips\"]))\n", + " src_ips = set(chain.from_iterable(flow_sum_df[flow_sum_df[\"SourceASN\"].isin(sel_asn.selected_items)][\"source_ips\"]))\n", + " selected_ips = dest_ips | src_ips\n", + " print(f\"{len(selected_ips)} unique IPs in selected ASNs\")\n", + "\n", + " # Add the IoCType 
to save cost of inferring each item\n", + " selected_ip_dict = {ip: \"ipv4\" for ip in selected_ips}\n", + " ti_results = ti_lookup.lookup_iocs(data=selected_ip_dict)\n", + "\n", + " print(f\"{len(ti_results)} results received.\")\n", + "\n", + " # ti_results_pos = ti_results[ti_results[\"Severity\"] > 0]\n", + " #####\n", + " # WARNING - faking results for illustration purposes\n", + " #####\n", + " ti_results_pos = ti_results.sample(n=2)\n", + "\n", + " print(f\"{len(ti_results_pos)} positive results found.\")\n", + "\n", + "\n", + " if not ti_results_pos.empty:\n", + " src_pos = flows_df.merge(ti_results_pos, left_on=\"source\", right_on=\"Ioc\")\n", + " dest_pos = flows_df.merge(ti_results_pos, left_on=\"dest\", right_on=\"Ioc\")\n", + " ti_ip_results = pd.concat([src_pos, dest_pos])\n", + " md_warn(\"Positive Threat Intel Results found for the following flows\")\n", + " md(\"Please examine these IP flows using the IP Explorer notebook.\", \"bold, large\")\n", + " display(ti_ip_results)\n", + "except NameError as err:\n", + " md(f\"Error Occured, Make sure to execute previous cells in notebook: {err}\",styles=[\"bold\",\"red\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " ### GeoIP Map of External IPs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:08:16.023912Z", + "start_time": "2020-05-15T23:08:15.611915Z" + } + }, + "outputs": [], + "source": [ + "iplocation = GeoLiteLookup()\n", + "def format_ip_entity(row, ip_col):\n", + " ip_entity = entities.IpAddress(Address=row[ip_col])\n", + " iplocation.lookup_ip(ip_entity=ip_entity)\n", + " ip_entity.AdditionalData[\"protocol\"] = row.L7Protocol\n", + " if \"severity\" in row:\n", + " ip_entity.AdditionalData[\"threat severity\"] = row[\"severity\"]\n", + " if \"Details\" in row:\n", + " ip_entity.AdditionalData[\"threat details\"] = row[\"Details\"]\n", + " return ip_entity\n", + "\n", + "# from 
msticpy.nbtools.foliummap import FoliumMap\n", + "folium_map = FoliumMap()\n", + "if az_net_comms_df is None or az_net_comms_df.empty:\n", + " print(\"No network flow data available.\")\n", + "else:\n", + " # Get the flow records for all flows not in the TI results\n", + " selected_out = flows_df[flows_df[\"DestASN\"].isin(sel_asn.selected_items)]\n", + " selected_out = selected_out[~selected_out[\"dest\"].isin(ti_ip_results[\"Ioc\"])]\n", + " if selected_out.empty:\n", + " ips_out = []\n", + " else:\n", + " ips_out = list(selected_out.apply(lambda x: format_ip_entity(x, \"dest\"), axis=1))\n", + " \n", + " selected_in = flows_df[flows_df[\"SourceASN\"].isin(sel_asn.selected_items)]\n", + " selected_in = selected_in[~selected_in[\"source\"].isin(ti_ip_results[\"Ioc\"])]\n", + " if selected_in.empty:\n", + " ips_in = []\n", + " else:\n", + " ips_in = list(selected_in.apply(lambda x: format_ip_entity(x, \"source\"), axis=1))\n", + "\n", + " ips_threats = list(ti_ip_results.apply(lambda x: format_ip_entity(x, \"Ioc\"), axis=1))\n", + "\n", + " display(HTML(\"

External IP Addresses communicating with host

\"))\n", + " display(HTML(\"Numbered circles indicate multiple items - click to expand\"))\n", + " display(HTML(\"Location markers:
Blue = outbound, Purple = inbound, Green = Host, Red = Threats\"))\n", + "\n", + " icon_props = {\"color\": \"green\"}\n", + " for ips in ip_entity.public_ips:\n", + " ips.AdditionalData[\"host\"] = ip_entity.hostname\n", + " folium_map.add_ip_cluster(ip_entities=ip_entity.public_ips, **icon_props)\n", + " icon_props = {\"color\": \"blue\"}\n", + " folium_map.add_ip_cluster(ip_entities=ips_out, **icon_props)\n", + " icon_props = {\"color\": \"purple\"}\n", + " folium_map.add_ip_cluster(ip_entities=ips_in, **icon_props)\n", + " icon_props = {\"color\": \"red\"}\n", + " folium_map.add_ip_cluster(ip_entities=ips_threats, **icon_props)\n", + " \n", + " display(folium_map)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "ExecuteTime": { + "end_time": "2019-09-05T18:03:37.980223Z", + "start_time": "2019-09-05T18:03:37.804856Z" + } + }, + "source": [ + "[Contents](#toc)\n", + "### Outbound Data transfer Time Series Anomalies" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This section looks at the network data sources to check outbound data transfer trends. \n", + "You can also apply time series analysis, using the built-in KQL query example below, to find anomalous data transfer trends. The example below uses a sample dataset and compares actual traffic against the baseline trend." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-05-15T23:08:42.937737Z", + "start_time": "2020-05-15T23:08:41.794266Z" + } + }, + "outputs": [], + "source": [ + "if \"VMConnection\" in table_index or \"CommonSecurityLog\" in table_index:\n", + " # KQL query for daily outbound data transfer stats for the IP address\n", + " dataxfer_stats = \"\"\"\n", + " union isfuzzy=true\n", + " (\n", + " CommonSecurityLog \n", + " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n", + " | where isnotempty(DestinationIP) and isnotempty(SourceIP)\n", + " | where SourceIP == \\'{ip_address}\\'\n", + " | extend SentBytesinKB = (SentBytes / 1024), ReceivedBytesinKB = (ReceivedBytes / 1024)\n", + " | summarize DailyCount = count(), ListOfDestPorts = make_set(DestinationPort), TotalSentBytesinKB = sum(SentBytesinKB), TotalReceivedBytesinKB = sum(ReceivedBytesinKB) by SourceIP, DestinationIP, DeviceVendor, bin(TimeGenerated,1d)\n", + " | project DeviceVendor, TimeGenerated, SourceIP, DestinationIP, ListOfDestPorts, TotalSentBytesinKB, TotalReceivedBytesinKB \n", + " ),\n", + " (\n", + " VMConnection \n", + " | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end}) \n", + " | where isnotempty(DestinationIp) and isnotempty(SourceIp)\n", + " | where SourceIp == \\'{ip_address}\\'\n", + " | extend DeviceVendor = \"VMConnection\", SourceIP = SourceIp, DestinationIP = DestinationIp\n", + " | extend SentBytesinKB = (BytesSent / 1024), ReceivedBytesinKB = (BytesReceived / 1024)\n", + " | summarize DailyCount = count(), ListOfDestPorts = make_set(DestinationPort), TotalSentBytesinKB = sum(SentBytesinKB),TotalReceivedBytesinKB = sum(ReceivedBytesinKB) by SourceIP, DestinationIP, DeviceVendor, bin(TimeGenerated,1d)\n", + " | project DeviceVendor, TimeGenerated, SourceIP, DestinationIP, ListOfDestPorts, TotalSentBytesinKB, TotalReceivedBytesinKB \n", + " )\n", + " 
\"\"\".format(**ipaddr_query_params())\n", + "\n", + " dataxfer_stats_df = qry_prov.exec_query(dataxfer_stats)\n", + "\n", + "# Display daily data transfer stats for the query period\n", + "if len(dataxfer_stats_df) > 0:\n", + " md(\n", + " 'Data transfer daily stats for IP ::', styles=[\"bold\",\"green\"]\n", + " )\n", + " display(dataxfer_stats_df)\n", + "else:\n", + " md_warn(\n", + " f'No Data transfer logs found for the query period'\n", + " )\n", + " #####\n", + " # WARNING - faking results for illustration purposes\n", + " #####\n", + "md(\n", + " 'Visualizing time series data transfer on dummy dataset for demonstration ::', styles=[\"bold\",\"green\"]\n", + " )\n", + "\n", + "# Generating graph based on dummy dataset in custom table representing Flow records outbound data transfer\n", + "timechartquery = \"\"\"\n", + "let TimeSeriesData = PaloAltoBytesSent_CL\n", + "| extend TimeGenerated = todatetime(EventTime_s), TotalBytesSent = todouble(TotalBytesSent_s) \n", + "| summarize TimeGenerated=make_list(TimeGenerated, 10000),TotalBytesSent=make_list(TotalBytesSent, 10000) by deviceVendor_s\n", + "| project TimeGenerated, TotalBytesSent;\n", + "TimeSeriesData\n", + "| extend (baseline,seasonal,trend,residual) = series_decompose(TotalBytesSent)\n", + "| mv-expand TotalBytesSent to typeof(double), TimeGenerated to typeof(datetime), baseline to typeof(long), seasonal to typeof(long), trend to typeof(long), residual to typeof(long)\n", + "| project TimeGenerated, TotalBytesSent, baseline\n", + "| render timechart with (title=\"Palo Alto Outbound Data Transfer Time Series decomposition\")\n", + "\"\"\"\n", + "%kql -query timechartquery" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### List of Suspicious Activities/ Observables/Hunting bookmarks\n", + "- Suspicious alerts for the IP\n", + "- Anomalous Failed Logon 
trend on a few days at 04:00 AM\n", + "- Anomalous spike in HTTP traffic logs\n", + "- Positive TI hit from open-source feeds.\n", + "- Unusual data transfer deviating from normal baseline." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Contents](#toc)\n", + "## Appendices" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Available DataFrames" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-04-02T10:00:41.436112Z", + "start_time": "2020-04-02T10:00:41.426605Z" + } + }, + "outputs": [], + "source": [ + "print('List of current DataFrames in Notebook')\n", + "print('-' * 50)\n", + "current_vars = list(locals().keys())\n", + "for var_name in current_vars:\n", + " if isinstance(locals()[var_name], pd.DataFrame) and not var_name.startswith('_'):\n", + " print(var_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Saving Data to Excel\n", + "To save the contents of a pandas DataFrame to an Excel spreadsheet\n", + "use the following syntax\n", + "```\n", + "writer = pd.ExcelWriter('myWorksheet.xlsx')\n", + "my_data_frame.to_excel(writer, sheet_name='Sheet1')\n", + "writer.close()\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configuration\n", + "\n", + "### `msticpyconfig.yaml` configuration File\n", + "You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. 
This is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n", + "\n", + "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)" + ] + } + ], + "metadata": { + "hide_input": false, + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + }, + "latex_envs": { + "LaTeX_envs_menu_present": true, + "autoclose": false, + "autocomplete": true, + "bibliofile": "biblio.bib", + "cite_by": "apalike", + "current_citInitial": 1, + "eqLabelWithNumbers": true, + "eqNumInitial": 1, + "hotkeys": { + "equation": "Ctrl-E", + "itemize": "Ctrl-I" + }, + "labels_anchors": false, + "latex_user_defs": false, + "report_style_numbering": false, + "user_envs_cfg": false + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": true, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": true, + "toc_position": { + "height": "calc(100% - 180px)", + "left": "10px", + "top": "150px", + "width": "299px" + }, + "toc_section_display": true, + "toc_window_display": true + }, + "varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "position": { + "height": "400px", + "left": "1549px", + "right": 
"20px", + "top": "120px", + "width": "351px" + }, + "types_to_exclude": [ + "module", + "function", + "builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/Entity Explorer - Linux Host.ipynb b/Entity Explorer - Linux Host.ipynb index 316682b8..ce2f688d 100644 --- a/Entity Explorer - Linux Host.ipynb +++ b/Entity Explorer - Linux Host.ipynb @@ -8,7 +8,7 @@ "
\n", "  Details...\n", "\n", - " **Notebook Version:** 1.0
\n", + " **Notebook Version:** 1.1
\n", " **Python Version:** Python 3.6 (including Python 3.6 - AzureML)
\n", " **Required Packages**: kqlmagic, msticpy, pandas, pandas_bokeh, numpy, matplotlib, networkx, seaborn, datetime, ipywidgets, ipython, dnspython, ipwhois, folium, maxminddb_geolite2
\n", " **Platforms Supported**:\n", @@ -31,7 +31,7 @@ }, "source": [ "

Table of Contents

\n", - "" + "" ] }, { @@ -73,8 +73,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-15T23:58:35.616818Z", - "start_time": "2020-05-15T23:58:35.383819Z" + "end_time": "2020-06-24T01:51:59.386590Z", + "start_time": "2020-06-24T01:51:55.136591Z" } }, "outputs": [], @@ -86,7 +86,7 @@ "from IPython.display import display, HTML, Markdown\n", "\n", "REQ_PYTHON_VER=(3, 6)\n", - "REQ_MSTICPY_VER=(0, 5, 0)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", "\n", "display(HTML(\"

Starting Notebook setup...

\"))\n", "if Path(\"./utils/nb_check.py\").is_file():\n", @@ -103,20 +103,39 @@ " import msticpy\n", " check_mp_ver(REQ_MSTICPY_VER)\n", " \n", + "\n", + "# If not using Azure Notebooks, install msticpy with\n", + "# !pip install msticpy\n", "from msticpy.nbtools import nbinit\n", "extra_imports = [\n", " \"msticpy.nbtools, observationlist\",\n", " \"msticpy.nbtools.foliummap, get_map_center\",\n", + " \"msticpy.common.exceptions, MsticpyException\",\n", + " \"msticpy.sectools.syslog_utils, create_host_record\",\n", + " \"msticpy.sectools.syslog_utils, cluster_syslog_logons_df\",\n", + " \"msticpy.sectools.syslog_utils, risky_sudo_sessions\",\n", + " \"msticpy.sectools.ip_utils, convert_to_ip_entities\",\n", + " \"msticpy.sectools, auditdextract\",\n", + " \"msticpy.sectools.cmd_line, risky_cmd_line\",\n", " \"pyvis.network, Network\",\n", " \"re\",\n", + " \"math, pi\",\n", " \"ipwhois, IPWhois\",\n", - " \"pandas_bokeh\",\n", + " \"bokeh.plotting, show\",\n", + " \"bokeh.plotting, Row\",\n", + " \"bokeh.models, ColumnDataSource\",\n", + " \"bokeh.models, FactorRange\",\n", + " \"bokeh.transform, factor_cmap\",\n", + " \"bokeh.transform, cumsum\",\n", " \"bokeh.palettes, viridis\",\n", " \"dns, reversename\",\n", - " \"dns, resolver\"\n", + " \"dns, resolver\",\n", + " \"ipaddress, ip_address\",\n", + " \"functools, lru_cache\",\n", + " \"datetime,,dt\"\n", "]\n", "additional_packages = [\n", - " \"oauthlib\", \"pyvis\", \"python-whois\", \"pandas_bokeh\"\n", + " \"oauthlib\", \"pyvis\", \"python-whois\"\n", "]\n", "nbinit.init_notebook(\n", " namespace=globals(),\n", @@ -128,12 +147,7 @@ " \"layout\": widgets.Layout(width=\"95%\"),\n", " \"style\": {\"description_width\": \"initial\"},\n", "}\n", - "\n", - "from msticpy.sectools import auditdextract\n", - "from msticpy.sectools.cmd_line import *\n", - "from msticpy.sectools.ip_utils import convert_to_ip_entities\n", - "from msticpy.sectools.syslog_utils import *\n", - "from msticpy.sectools.syslog_utils import 
create_host_record, cluster_syslog_logons_df, risky_sudo_sessions\n" + "from bokeh.plotting import figure" ] }, { @@ -173,8 +187,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-15T23:58:49.319665Z", - "start_time": "2020-05-15T23:58:49.309664Z" + "end_time": "2020-06-24T01:51:59.434663Z", + "start_time": "2020-06-24T01:51:59.420592Z" } }, "outputs": [], @@ -185,12 +199,12 @@ "try:\n", " ws_id = ws_config['workspace_id']\n", " ten_id = ws_config['tenant_id']\n", - " display(HTML(\"Workspace details collected from config file\"))\n", + " md(\"Workspace details collected from config file\")\n", " config = True\n", "except:\n", - " display(HTML('Please go to your Log Analytics workspace, copy the workspace ID'\n", - " ' and/or tenant Id and paste here to enable connection to the workspace and querying of it..
'))\n", - " ws_id = mnbwidgets.GetEnvironmentKey(env_var='WORKSPACE_ID',\n", + " md('Please go to your Log Analytics workspace, copy the workspace ID'\n", + " ' and/or tenant Id and paste here to enable connection to the workspace and querying of it..
')\n", + " ws_id = nbwidgets.GetEnvironmentKey(env_var='WORKSPACE_ID',\n", " prompt='Please enter your Log Analytics Workspace Id:', auto_display=True)\n", " ten_id = nbwidgets.GetEnvironmentKey(env_var='TENANT_ID',\n", " prompt='Please enter your Log Analytics Tenant Id:', auto_display=True)\n", @@ -202,8 +216,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-15T23:59:18.090694Z", - "start_time": "2020-05-15T23:58:52.515672Z" + "end_time": "2020-06-24T01:52:41.282988Z", + "start_time": "2020-06-24T01:52:00.925257Z" } }, "outputs": [], @@ -213,7 +227,7 @@ " ws_id = ws_id.value\n", " ten_id = ten_id.value\n", "qry_prov = QueryProvider('LogAnalytics')\n", - "qry_prov.connect(connection_str=ws_config.code_connect_str)\n" + "qry_prov.connect(connection_str=ws_config.code_connect_str)" ] }, { @@ -229,14 +243,14 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-15T23:59:18.217691Z", - "start_time": "2020-05-15T23:59:18.155693Z" + "end_time": "2020-06-24T01:52:41.392989Z", + "start_time": "2020-06-24T01:52:41.334990Z" } }, "outputs": [], "source": [ "query_times = nbwidgets.QueryTime(units='day',\n", - " max_before=20, max_after=1, before=3)\n", + " max_before=14, max_after=1, before=1)\n", "query_times.display()" ] }, @@ -251,53 +265,19 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:59:30.667719Z", - "start_time": "2020-05-15T23:59:30.645721Z" - } - }, - "outputs": [], - "source": [ - "host_text = widgets.Text(\n", - " description=\"Enter the Host name to search for:\", **WIDGET_DEFAULTS\n", - ")\n", - "display(host_text)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:59:47.945809Z", - "start_time": "2020-05-15T23:59:45.525517Z" - } - }, + "metadata": {}, "outputs": [], "source": [ - "hostname = None\n", - "items = []\n", - "hosts_query = f\"\"\" Syslog | 
where TimeGenerated between (datetime({query_times.start}) .. datetime({query_times.end})) \n", - " | where Computer contains \"{host_text.value}\" | distinct Computer | limit 490000\"\"\"\n", - "print(\"Collecting details on avaliable hosts...\")\n", - "hosts_df = qry_prov._query_provider.query(query=hosts_query)\n", - "if isinstance(hosts_df, pd.DataFrame) and not hosts_df.empty:\n", - " items = hosts_df[\"Computer\"].unique().tolist()\n", - "\n", - "if len(items) > 1:\n", - " print(f\"Multiple matches for '{host_text.value}'. Please select a host from the list.\")\n", - " choose_host = nbwidgets.SelectString(\n", - " item_list=items,\n", - " description=\"Select the host.\",\n", - " auto_display=True,\n", - " )\n", - " \n", - "elif not hosts_df.empty:\n", - " hostname = items[0]\n", - " md(f\"Unique host found: {hostname}\")\n", + "#Get a list of hosts with syslog data in our hunting timegframe to provide easy selection\n", + "syslog_query = f\"\"\"Syslog | where TimeGenerated between (datetime({query_times.start}) .. datetime({query_times.end})) | summarize by Computer\"\"\"\n", + "md(\"Collecting avaliable host details...\")\n", + "hosts_list = qry_prov._query_provider.query(query=syslog_query)\n", + "if isinstance(hosts_list, pd.DataFrame) and not hosts_list.empty:\n", + " hosts = hosts_list[\"Computer\"].unique().tolist()\n", + " host_text = nbwidgets.SelectString(description='Select host to investigate: ', \n", + " item_list=hosts, width='75%', auto_display=True)\n", "else:\n", - " md(f\"Host not found: {host_text.value}\")" + " display(md(\"There are no hosts with syslog data in this time period to investigate\"))" ] }, { @@ -311,91 +291,81 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T00:00:21.714769Z", - "start_time": "2020-05-15T23:59:56.711836Z" - } - }, + "metadata": {}, "outputs": [], "source": [ - "print(\"Collecting host details. 
This may take a few minutes...\")\n", - "if not hostname:\n", - " hostname = choose_host.value\n", + "hostname=host_text.value\n", + "az_net_df = None\n", "# Collect data on the host\n", - "syslog_query = f\"\"\" Syslog | where TimeGenerated between (datetime({query_times.start}) .. datetime({query_times.end})) \n", - " | where Computer contains \"{hostname}\" \"\"\"\n", - "all_syslog = qry_prov.exec_query(query=syslog_query)\n", - "syslog_data = all_syslog[all_syslog['Computer'] == f'{hostname}']\n", - "heartbeat_query = f\"\"\"Heartbeat | where TimeGenerated >= datetime({query_times.start}) | where TimeGenerated <= datetime({query_times.end})| where Computer == '{hostname}' | top 1 by TimeGenerated desc nulls last\"\"\"\n", - "if \"AzureNetworkAnalytics_CL\" in qry_prov.schema:\n", - " aznet_query = f\"\"\"AzureNetworkAnalytics_CL | where TimeGenerated >= datetime({query_times.start}) | where TimeGenerated <= datetime({query_times.end}) | where VirtualMachine_s has '{hostname}' | where ResourceType == 'NetworkInterface' | top 1 by TimeGenerated desc | project PrivateIPAddresses = PrivateIPAddresses_s, PublicIPAddresses = PublicIPAddresses_s\"\"\"\n", - " az_net_df = qry_prov.exec_query(query=aznet_query)\n", - "host_hb = qry_prov.exec_query(query=heartbeat_query)\n", - "\n", - "# Create host entity record, with Azure network data if any is avaliable\n", - "if isinstance(az_net_df, pd.DataFrame):\n", - " host_entity = create_host_record(\n", - " syslog_df=syslog_data, heartbeat_df=host_hb, az_net_df=az_net_df)\n", + "all_syslog_query = f\"Syslog | where TimeGenerated between (datetime({query_times.start}) .. 
datetime({query_times.end})) | where Computer =~ '{hostname}'\"\"\"\n", + "all_syslog_data = qry_prov.exec_query(all_syslog_query)\n", + "if isinstance(all_syslog_data, pd.DataFrame) and not all_syslog_data.empty:\n", + " heartbeat_query = f\"\"\"Heartbeat | where TimeGenerated >= datetime({query_times.start}) | where TimeGenerated <= datetime({query_times.end})| where Computer == '{hostname}' | top 1 by TimeGenerated desc nulls last\"\"\"\n", + " if \"AzureNetworkAnalytics_CL\" in qry_prov.schema:\n", + " aznet_query = f\"\"\"AzureNetworkAnalytics_CL | where TimeGenerated >= datetime({query_times.start}) | where TimeGenerated <= datetime({query_times.end}) | where VirtualMachine_s has '{hostname}' | where ResourceType == 'NetworkInterface' | top 1 by TimeGenerated desc | project PrivateIPAddresses = PrivateIPAddresses_s, PublicIPAddresses = PublicIPAddresses_s\"\"\"\n", + " print(\"Getting network data...\")\n", + " az_net_df = qry_prov.exec_query(query=aznet_query)\n", + " print(\"Getting host data...\")\n", + " host_hb = qry_prov.exec_query(query=heartbeat_query)\n", + "\n", + " # Create host entity record, with Azure network data if any is avaliable\n", + " if az_net_df is not None and isinstance(az_net_df, pd.DataFrame) and not az_net_df.empty:\n", + " host_entity = create_host_record(syslog_df=all_syslog_data, heartbeat_df=host_hb, az_net_df=az_net_df)\n", + " else:\n", + " host_entity = create_host_record(syslog_df=all_syslog_data, heartbeat_df=host_hb)\n", + "\n", + " md(\n", + " \"Host Details
\"\n", + " f\"Hostname: {host_entity.computer}
\"\n", + " f\"OS: {host_entity.OSType} {host_entity.OSName}
\"\n", + " f\"IP Address: {host_entity.IPAddress.Address}
\"\n", + " f\"Location: {host_entity.IPAddress.Location.CountryName}
\"\n", + " f\"Installed Applications: {host_entity.Applications}
\"\n", + " )\n", "else:\n", - " host_entity = create_host_record(\n", - " syslog_df=syslog_data, heartbeat_df=host_hb)\n", - "\n", - "display(\n", - " Markdown(\n", - " \"***Host Details***\\n\\n\"\n", - " f\"**Hostname**: {host_entity.computer} \\n\\n\"\n", - " f\"**OS**: {host_entity.OSType} {host_entity.OSName}\\n\\n\"\n", - " f\"**IP Address**: {host_entity.IPAddress.Address}\\n\\n\"\n", - " f\"**Location**: {host_entity.IPAddress.Location.CountryName}\\n\\n\"\n", - " f\"**Installed Applications**: {host_entity.Applications}\\n\\n\"\n", - " )\n", - ")\n", - "rel_alert_select = None\n", - "sudo_events = None" + " md_warn(\"No Syslog data found, check hostname and timeframe.\")\n", + " md(\"The data query may be timing out, consider reducing the timeframe size.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Host Alerts\n", - "This section provides an overview of any security alerts in Azure Sentinel related to this host, this will help scope and guide our hunt." + "### Host Alerts & Bookmarks\n", + "This section provides an overview of any security alerts or Hunting Bookmarks in Azure Sentinel related to this host, this will help scope and guide our hunt." 
] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T00:00:27.549794Z", - "start_time": "2020-05-16T00:00:25.664615Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "related_alerts = qry_prov.SecurityAlert.list_related_alerts(\n", " query_times, host_name=hostname)\n", - "\n", + "realted_bookmarks = qry_prov.AzureSentinel.list_bookmarks_for_entity(query_times, entity_id=hostname)\n", "if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:\n", " host_alert_items = (related_alerts[['AlertName', 'TimeGenerated']]\n", " .groupby('AlertName').TimeGenerated.agg('count').to_dict())\n", "\n", " def print_related_alerts(alertDict, entityType, entityName):\n", " if len(alertDict) > 0:\n", - " md(f\"### Found {len(alertDict)} different alert types related to this {entityType} (\\'{entityName}\\')\")\n", + " md(f\"Found {len(alertDict)} different alert types related to this {entityType} (\\'{entityName}\\')\")\n", " for (k, v) in alertDict.items():\n", " md(f\"- {k}, Count of alerts: {v}\")\n", " else:\n", " md(f\"No alerts for {entityType} entity \\'{entityName}\\'\")\n", "\n", - "\n", - "# Display alerts on timeline to aid in visual grouping\n", " print_related_alerts(host_alert_items, 'host', host_entity.HostName)\n", - " x = nbdisplay.display_timeline(\n", + " nbdisplay.display_timeline(\n", " data=related_alerts, source_columns=[\"AlertName\"], title=\"Host alerts over time\", height=300, color=\"red\")\n", "else:\n", - " md('No related alerts found.')" + " md('No related alerts found.')\n", + " \n", + "if isinstance(realted_bookmarks, pd.DataFrame) and not realted_bookmarks.empty:\n", + " nbdisplay.display_timeline(data=realted_bookmarks, source_columns=[\"BookmarkName\"], height=200, color=\"orange\", title=\"Host bookmarks over time\",)\n", + "else:\n", + " md('No related bookmarks found.')" ] }, { @@ -403,8 +373,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - 
"end_time": "2020-05-16T00:00:31.720767Z", - "start_time": "2020-05-16T00:00:31.664768Z" + "end_time": "2020-06-24T01:53:31.887372Z", + "start_time": "2020-06-24T01:53:31.826372Z" } }, "outputs": [], @@ -421,7 +391,7 @@ "if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:\n", " related_alerts['CompromisedEntity'] = related_alerts['Computer']\n", " md('### Click on alert to view details.')\n", - " rel_alert_select = nbwidgets.AlertSelector(alerts=related_alerts,\n", + " rel_alert_select = nbwidgets.SelectAlert(alerts=related_alerts,\n", " action=show_full_alert)\n", " rel_alert_select.display()\n", "else:\n", @@ -441,8 +411,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:01:56.991980Z", - "start_time": "2020-05-16T00:01:56.936979Z" + "end_time": "2020-06-24T01:53:32.233372Z", + "start_time": "2020-06-24T01:53:32.172372Z" } }, "outputs": [], @@ -453,8 +423,8 @@ " start = rel_alert_select.selected_alert['TimeGenerated']\n", "\n", "# Set new investigation time windows based on the selected alert\n", - "invest_times = nbwidgets.QueryTime(units='hours',\n", - " max_before=24, max_after=12, before=6, origin_time=start)\n", + "invest_times = nbwidgets.QueryTime(\n", + " units='day', max_before=24, max_after=12, before=1, after=1, origin_time=start)\n", "invest_times.display()" ] }, @@ -472,6 +442,27 @@ "You can choose to start below with a hunt in host logon events or choose to jump to one of the other sections listed above. The order in which you choose to run each of these major sections doesn't matter, they are each self contained. You may also choose to rerun sections based on your findings from running other sections." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This notebook uses external threat intelligence sources to enrich data. The next cell loads the TILookup class.\n", + "> **Note**: to use TILookup you will need configuration settings in your msticpyconfig.yaml\n", + ">
see [TIProviders documentation](https://msticpy.readthedocs.io/en/latest/TIProviders.html)\n", + ">
and [Configuring Notebook Environment notebook](./ConfiguringNotebookEnvironment.ipynb)\n", + ">
or [ConfiguringNotebookEnvironment (GitHub static view)](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tilookup = TILookup()\n", + "md(\"Threat intelligence provider loading complete.\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -485,89 +476,91 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T00:02:12.485265Z", - "start_time": "2020-05-16T00:02:10.553617Z" - } - }, + "metadata": {}, "outputs": [], "source": [ + "\n", "# Collect logon events for this, seperate them into sucessful and unsucessful and cluster sucessful one into sessions\n", - "logon_events = qry_prov.LinuxSyslog.user_logon(invest_times, host_name=hostname)\n", + "logon_events = qry_prov.LinuxSyslog.user_logon(start=invest_times.start, end=invest_times.end, host_name=hostname)\n", "remote_logons = None\n", "failed_logons = None\n", - "logon_sessions_df = None\n", + "\n", "if isinstance(logon_events, pd.DataFrame) and not logon_events.empty:\n", - " try:\n", - " remote_logons = (logon_events[logon_events['LogonResult'] == 'Success'])\n", - " failed_logons = (logon_events[logon_events['LogonResult'] == 'Failure'])\n", - " logon_sessions_df = cluster_syslog_logons_df(logon_events)\n", - " except:\n", - " print(\"No logon sessions in this timeframe\")\n", + " remote_logons = (logon_events[logon_events['LogonResult'] == 'Success'])\n", + " failed_logons = (logon_events[logon_events['LogonResult'] == 'Failure'])\n", "else:\n", " print(\"No logon events in this timeframe\")\n", "\n", "\n", - "if (remote_logons is not None and not remote_logons.empty) or (failed_logons is not None and not failed_logons.empty):\n", - " # Provide a timeline of sucessful and failed logon attempts to aid identification of potential brute force attacks\n", + "if not remote_logons.empty or 
not failed_logons.empty:\n", + "#Provide a timeline of sucessful and failed logon attempts to aid identification of potential brute force attacks\n", " display(Markdown('### Timeline of sucessful host logons.'))\n", - " tl_data = {\"Remote Logons\": {\"data\": remote_logons, \"source_columns\": ['User', 'ProcessName', 'SourceIP'], \"color\": \"Green\"},\n", - " \"Failed Logons\": {\"data\": failed_logons, \"source_columns\": ['User', 'ProcessName', 'SourceIP'], \"time_column\": \"TimeGenerated\", \"color\": \"Red\"}}\n", - " logon_timeline = nbdisplay.display_timeline(\n", - " data=tl_data, height=300, alert=rel_alert_select.selected_alert)\n", - " palette = viridis(5)\n", - " # Graph out failed/sucessful logons by account and by logon process\n", - " all_df = pd.DataFrame(dict(successful=remote_logons['ProcessName'].value_counts(\n", - " ), failed=failed_logons['ProcessName'].value_counts())).fillna(0)\n", - " fail_data = pd.value_counts(failed_logons['User'].values, sort=True).head(\n", - " 10).reset_index(name='value').rename(columns={'User': 'Count'})\n", - " fail_pie = None\n", - " sucess_pie = None\n", - " if not fail_data.empty:\n", - " fail_pie = fail_data.plot_bokeh.pie(x='index', y=\"value\", colormap=palette,\n", - " show_figure=False, title=\"Relative Frequencies of Failed Logons by Account\")\n", - " sucess_data = pd.value_counts(remote_logons['User'].values, sort=False).reset_index(\n", - " name='value').rename(columns={'User': 'Count'})\n", - " if not sucess_data.empty:\n", - " sucess_pie = sucess_data.plot_bokeh.pie(x='index', colormap=palette, y=\"value\",\n", - " show_figure=False, title=\"Relative Frequencies of Sucessful Logons by Account\")\n", + " tooltip_cols = ['User', 'ProcessName', 'SourceIP']\n", + " if rel_alert_select is not None:\n", + " logon_timeline = nbdisplay.display_timeline(data=remote_logons, overlay_data=failed_logons, source_columns=tooltip_cols, height=200, overlay_color=\"red\", alert = rel_alert_select.selected_alert)\n", 
+ " else:\n", + " logon_timeline = nbdisplay.display_timeline(data=remote_logons, overlay_data=failed_logons, source_columns=tooltip_cols, height=200, overlay_color=\"red\")\n", + " display(Markdown('Key:

Sucessful logons

Failed Logon Attempts (via su)

')) \n", + "\n", + " all_df = pd.DataFrame(dict(successful= remote_logons['ProcessName'].value_counts(), failed = failed_logons['ProcessName'].value_counts())).fillna(0)\n", + " fail_data = pd.value_counts(failed_logons['User'].values, sort=True).head(10).reset_index(name='value').rename(columns={'User':'Count'})\n", + " fail_data['angle'] = fail_data['value']/fail_data['value'].sum() * 2*pi\n", + " fail_data['color'] = viridis(len(fail_data))\n", + " fp = figure(plot_height=350, plot_width=450, title=\"Relative Frequencies of Failed Logons by Account\", toolbar_location=None, tools=\"hover\", tooltips=\"@index: @value\")\n", + " fp.wedge(x=0, y=1, radius=0.5, start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'), line_color=\"white\", fill_color='color', legend='index', source=fail_data)\n", + "\n", + " sucess_data = pd.value_counts(remote_logons['User'].values, sort=False).reset_index(name='value').rename(columns={'User':'Count'})\n", + " sucess_data['angle'] = sucess_data['value']/sucess_data['value'].sum() * 2*pi\n", + " sucess_data['color'] = viridis(len(sucess_data))\n", + " sp = figure(plot_height=350, width=450, title=\"Relative Frequencies of Sucessful Logons by Account\", toolbar_location=None, tools=\"hover\", tooltips=\"@index: @value\")\n", + " sp.wedge(x=0, y=1, radius=0.5, start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'), line_color=\"white\", fill_color='color', legend='index', source=sucess_data)\n", + "\n", + " fp.axis.axis_label=None\n", + " fp.axis.visible=False\n", + " fp.grid.grid_line_color = None\n", + " sp.axis.axis_label=None\n", + " sp.axis.visible=False\n", + " sp.grid.grid_line_color = None\n", + "\n", + "\n", " processes = all_df.index.values.tolist()\n", - " fail_sucess_data = pd.DataFrame({'processes': processes,\n", - " 'sucess': all_df['successful'].values.tolist(),\n", - " 'failure': all_df['failed'].values.tolist()})\n", - "\n", - " process_bar = fail_sucess_data.plot_bokeh.bar(\n", - 
" x=\"processes\", colormap=palette, show_figure=False, title=\"Failed and Sucessful logon attempts by process\")\n", - " pandas_bokeh.plot_grid(\n", - " [[fail_pie, sucess_pie], [process_bar]], plot_width=450, plot_height=300)\n", - "\n", - " # Convert logon IPs to IP entities in order to get location\n", - " ip_entity = entityschema.IpAddress()\n", - " #Is there a better way to do this rather than reseting the list each time.\n", - " ip_list = []\n", - " for ip_logon in remote_logons['SourceIP']:\n", - " ip_list.extend(convert_to_ip_entities(ip_logon))\n", - " ip_fail_list = []\n", - " for ip_fail in failed_logons['SourceIP']:\n", - " ip_fail_list.extend(convert_to_ip_entities(ip_fail))\n", - "\n", - " # Get center location of all IP locaitons to set map default\n", + " results = all_df.columns.values.tolist()\n", + " fail_sucess_data = {'processes' :processes,\n", + " 'sucess' : all_df['successful'].values.tolist(),\n", + " 'failure': all_df['failed'].values.tolist()}\n", + "\n", + " palette = viridis(2)\n", + " x = [ (process, result) for process in processes for result in results ]\n", + " counts = sum(zip(fail_sucess_data['sucess'], fail_sucess_data['failure']), ()) \n", + " source = ColumnDataSource(data=dict(x=x, counts=counts))\n", + " b = figure(x_range=FactorRange(*x), plot_height=350, plot_width=450, title=\"Failed and Sucessful logon attempts by process\",\n", + " toolbar_location=None, tools=\"\", y_minor_ticks=2)\n", + " b.vbar(x='x', top='counts', width=0.9, source=source, line_color=\"white\",\n", + " fill_color=factor_cmap('x', palette=palette, factors=results, start=1, end=2))\n", + " b.y_range.start = 0\n", + " b.x_range.range_padding = 0.1\n", + " b.xaxis.major_label_orientation = 1\n", + " b.xgrid.grid_line_color = None\n", + "\n", + " show(Row(sp,fp,b))\n", + "\n", + " ip_list = [convert_to_ip_entities(i)[0] for i in remote_logons['SourceIP']]\n", + " ip_fail_list = [convert_to_ip_entities(i)[0] for i in failed_logons['SourceIP']]\n", + " 
\n", " location = get_map_center(ip_list + ip_fail_list)\n", - " folium_map = FoliumMap(location=location, zoom_start=4)\n", - "\n", - " # Map logon locations to allow for identification of anomolous locations\n", + " folium_map = FoliumMap(location = location, zoom_start=1.4)\n", + " #Map logon locations to allow for identification of anomolous locations\n", " if len(ip_fail_list) > 0:\n", - " display(HTML('

Map of Originating Location of Logon Attempts

'))\n", + " md('

Map of Originating Location of Logon Attempts

')\n", " icon_props = {'color': 'red'}\n", " folium_map.add_ip_cluster(ip_entities=ip_fail_list, **icon_props)\n", " if len(ip_list) > 0:\n", " icon_props = {'color': 'green'}\n", " folium_map.add_ip_cluster(ip_entities=ip_list, **icon_props)\n", - " display(folium_map.folium_map)\n", - " display(Markdown('

Warning: the folium mapping library '\n", + " display(folium_map.folium_map)\n", + " md('

Warning: the folium mapping library '\n", " 'does not display correctly in some browsers.


'\n", - " 'If you see a blank image please retry with a different browser.'))" + " 'If you see a blank image please retry with a different browser.') \n" ] }, { @@ -583,22 +576,24 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:02:19.836782Z", - "start_time": "2020-05-16T00:02:19.732781Z" + "end_time": "2020-06-24T01:53:38.073770Z", + "start_time": "2020-06-24T01:53:37.978770Z" } }, "outputs": [], "source": [ - "import datetime as dt\n", - "def to_utc(time):\n", - " ts = (time - np.datetime64('1970-01-01T00:00:00')) / np.timedelta64(1, 's')\n", - " time = dt.datetime.utcfromtimestamp(ts) \n", - " return time\n", + "logon_sessions_df = None\n", + "try:\n", + " print(\"Clustering logon sessions...\")\n", + " logon_sessions_df = cluster_syslog_logons_df(logon_events)\n", + "except Exception as err:\n", + " print(f\"Error clustering logons: {err}\")\n", + "\n", "if logon_sessions_df is not None:\n", " logon_sessions_df[\"Alerts during session?\"] = np.nan\n", " # check if any alerts occur during logon window.\n", - " logon_sessions_df['Start (UTC)'] = [(to_utc(time) - dt.timedelta(seconds=5)) for time in logon_sessions_df['Start']]\n", - " logon_sessions_df['End (UTC)'] = [(to_utc(time) + dt.timedelta(seconds=5)) for time in logon_sessions_df['End']]\n", + " logon_sessions_df['Start (UTC)'] = [(time - dt.timedelta(seconds=5)) for time in logon_sessions_df['Start']]\n", + " logon_sessions_df['End (UTC)'] = [(time + dt.timedelta(seconds=5)) for time in logon_sessions_df['End']]\n", "\n", " for TimeGenerated in related_alerts['TimeGenerated']:\n", " logon_sessions_df.loc[(TimeGenerated >= logon_sessions_df['Start (UTC)']) & (TimeGenerated <= logon_sessions_df['End (UTC)']), \"Alerts during session?\"] = \"Yes\"\n", @@ -631,14 +626,16 @@ " display(logon_sessions_df[['User','Start (UTC)', 'End (UTC)', 'Alerts during session?', 'Sucessful to failed logon ratio', 'Root?']]\n", " .style.applymap(color_cells).hide_index())\n", "\n", - " 
logon_items = logon_sessions_df[['User','Start (UTC)', 'End (UTC)']].to_string(header=False,\n", - " index=False,\n", - " index_names=False).split('\\n')\n", - " logon_sessions_df[\"Key\"] = logon_items\n", + " logon_items = (\n", + " logon_sessions_df[['User','Start (UTC)', 'End (UTC)']]\n", + " .to_string(header=False, index=False, index_names=False)\n", + " .split('\\n')\n", + " )\n", + " logon_sessions_df[\"Key\"] = logon_items \n", " logon_sessions_df.set_index('Key', inplace=True)\n", " logon_dict = logon_sessions_df[['User','Start (UTC)', 'End (UTC)']].to_dict('index')\n", "\n", - " logon_selection = nbwidgets.SelectString(description='Select logon session to investigate: ',\n", + " logon_selection = nbwidgets.SelectItem(description='Select logon session to investigate: ',\n", " item_dict=logon_dict , width='80%', auto_display=True)\n", "else:\n", " md(\"No logon sessions during this timeframe\")" @@ -656,17 +653,16 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:02:33.394888Z", - "start_time": "2020-05-16T00:02:24.256128Z" + "end_time": "2020-06-24T01:53:44.059818Z", + "start_time": "2020-06-24T01:53:40.909226Z" } }, "outputs": [], "source": [ "def view_syslog(selected_facility):\n", - " display(syslog_events.query('Facility == @selected_facility'))\n", + " return [syslog_events.query('Facility == @selected_facility')]\n", "\n", "# Produce a summary of user modification actions taken\n", - "def action_count(x):\n", " if \"Add\" in x:\n", " return len(add_events.replace(\"\", np.nan).dropna(subset=['User'])['User'].unique().tolist())\n", " elif \"Modify\" in x:\n", @@ -675,6 +671,10 @@ " return len(del_events.replace(\"\", np.nan).dropna(subset=['User'])['User'].unique().tolist())\n", " else:\n", " return \"\"\n", + "\n", + "crn_tl_data = {}\n", + "user_tl_data = {}\n", + "sudo_tl_data = {}\n", "sudo_sessions = None\n", "tooltip_cols = ['SyslogMessage']\n", "if logon_sessions_df is not None:\n", @@ -686,31 +686,24 @@ " 
start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)\n", " sudo_events = qry_prov.LinuxSyslog.sudo_activity(\n", " start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host, user=session.Account)\n", - "\n", + " \n", " if isinstance(sudo_events, pd.DataFrame) and not sudo_events.empty:\n", - " sudo_events[['Command', 'CommandCall']].replace('', np.nan, inplace=True)\n", " try:\n", - " sudo_sessions = cluster_syslog_logons_df(logon_events=(sudo_events))\n", - " except:\n", + " sudo_sessions = cluster_syslog_logons_df(logon_events=sudo_events)\n", + " except MsticpyException:\n", " pass\n", "\n", " # Display summary of cron activity in session\n", " cron_events = qry_prov.LinuxSyslog.cron_activity(\n", " start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)\n", - " if not isinstance(cron_events, pd.DataFrame):\n", - " display(HTML(\n", - " f'

No Cron activity for {session.Host} between {session.StartTimeUtc} and {session.EndTimeUtc}

'))\n", - " crn_tl_data = {}\n", + " if not isinstance(cron_events, pd.DataFrame) or cron_events.empty:\n", + " md(f'

No Cron activity for {session.Host} between {session.StartTimeUtc} and {session.EndTimeUtc}

')\n", " else:\n", - "\n", " cron_events['CMD'].replace('', np.nan, inplace=True)\n", - "\n", " crn_tl_data = {\"Cron Exections\": {\"data\": cron_events[['TimeGenerated', 'CMD', 'CronUser', 'SyslogMessage']].dropna(), \"source_columns\": tooltip_cols, \"color\": \"Blue\"},\n", " \"Cron Edits\": {\"data\": cron_events.loc[cron_events['SyslogMessage'].str.contains('EDIT')], \"source_columns\": tooltip_cols, \"color\": \"Green\"}}\n", - "\n", - " display(HTML('

Most common commands run by cron:

'))\n", - " display(HTML(\n", - " 'This shows how often each cron job was exected within the specified time window'))\n", + " md('

Most common commands run by cron:

')\n", + " md('This shows how often each cron job was exected within the specified time window')\n", " cron_commands = (cron_events[['EventTime', 'CMD']]\n", " .groupby(['CMD']).count()\n", " .dropna()\n", @@ -723,10 +716,8 @@ " # Display summary of user and group creations, deletions and modifications during the session\n", " user_activity = qry_prov.LinuxSyslog.user_group_activity(\n", " start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)\n", - "\n", - " if not isinstance(user_activity, pd.DataFrame) and not use_activity.empty:\n", - " display(HTML(\n", - " f' No user or group moidifcations for {session.Host} between {session.StartTimeUtc} and {session.EndTimeUtc}'))\n", + " if not isinstance(user_activity, pd.DataFrame) or user_activity.empty:\n", + " md(f'

No user or group modifications for {session.Host} between {session.StartTimeUtc} and {session.EndTimeUtc}

')\n", " else:\n", " add_events = user_activity[user_activity['UserGroupAction'].str.contains(\n", " 'Add')]\n", @@ -736,12 +727,11 @@ " 'Modify')]\n", " user_activity['Count'] = user_activity.groupby('UserGroupAction')['UserGroupAction'].transform('count')\n", " if add_events.empty and del_events.empty and mod_events.empty:\n", - " display(HTML('

Users and groups added or deleted:'))\n", - " display(HTML(\n", - " f'No users or groups were added or deleted on {host_entity.HostName} between {query_times.start} and {query_times.end}'))\n", + " md('

Users and groups added or deleted:')\n", + " md(f'No users or groups were added or deleted on {host_entity.HostName} between {query_times.start} and {query_times.end}')\n", " user_tl_data = {}\n", " else:\n", - " display(HTML(\"

Users added, modified or deleted

\"))\n", + " md(\"

Users added, modified or deleted

\")\n", " display(user_activity[['UserGroupAction','Count']].drop_duplicates().style.hide_index())\n", " account_actions = pd.DataFrame({\"User Additions\": [add_events.replace(\"\", np.nan).dropna(subset=['User'])['User'].unique().tolist()],\n", " \"User Modifications\": [mod_events.replace(\"\", np.nan).dropna(subset=['User'])['User'].unique().tolist()],\n", @@ -750,54 +740,60 @@ " user_tl_data = {\"User adds\": {\"data\": add_events, \"source_columns\": tooltip_cols, \"color\": \"Orange\"},\n", " \"User deletes\": {\"data\": del_events, \"source_columns\": tooltip_cols, \"color\": \"Red\"},\n", " \"User modfications\": {\"data\": mod_events, \"source_columns\": tooltip_cols, \"color\": \"Grey\"}}\n", + " \n", " # Display sudo activity during session\n", - " if sudo_sessions is None:\n", - " md(f\"No Sudo sessions for {session.Host} between {logon_selection.value.get('Start (UTC)')} and {logon_selection.value.get('End (UTC)')}\")\n", - " sudo_tl_data = {}\n", + " if not isinstance(sudo_sessions, pd.DataFrame) or sudo_sessions.empty:\n", + " md(f\"

No Sudo sessions for {session.Host} between {logon_selection.value.get('Start (UTC)')} and {logon_selection.value.get('End (UTC)')}

\")\n", + " sudo_tl_data = {}\n", + " else:\n", + " sudo_start = sudo_events[sudo_events[\"SyslogMessage\"].str.contains(\n", + " \"pam_unix.+session opened\")].rename(columns={\"Sudoer\": \"User\"})\n", + " sudo_tl_data = {\"Host logons\": {\"data\": remote_logons, \"source_columns\": tooltip_cols, \"color\": \"Cyan\"},\n", + " \"Sudo sessions\": {\"data\": sudo_start, \"source_columns\": tooltip_cols, \"color\": \"Purple\"}}\n", + " try:\n", + " risky_actions = cmd_line.risky_cmd_line(events=sudo_events, log_type=\"Syslog\")\n", + " suspicious_events = cmd_speed(\n", + " cmd_events=sudo_events, time=60, events=2, cmd_field=\"Command\")\n", + " except:\n", + " risky_actions = None\n", + " suspicious_events = None\n", + " if risky_actions is None and suspicious_events is None:\n", + " pass\n", " else:\n", - " sudo_start = sudo_events[sudo_events[\"SyslogMessage\"].str.contains(\n", - " \"pam_unix.+session opened\")].rename(columns={\"Sudoer\": \"User\"})\n", - " sudo_tl_data = {\"Host logons\": {\"data\": remote_logons, \"source_columns\": tooltip_cols, \"color\": \"Cyan\"},\n", - " \"Sudo sessions\": {\"data\": sudo_start, \"source_columns\": tooltip_cols, \"color\": \"Purple\"}}\n", - " try:\n", - " risky_actions = cmd_line.risky_cmd_line(events=sudo_events, log_type=\"Syslog\")\n", - " suspicious_events = cmd_speed(\n", - " cmd_events=sudo_events, time=60, events=2, cmd_field=\"Command\")\n", - " except:\n", - " risky_actions = None\n", - " suspicious_events = None\n", - " if risky_actions is None and suspicious_events is None:\n", - " pass\n", + " risky_sessions = risky_sudo_sessions(\n", + " risky_actions=risky_actions, sudo_sessions=sudo_sessions, suspicious_actions=suspicious_events)\n", + " for key in risky_sessions:\n", + " if key in sudo_sessions:\n", + " sudo_sessions[f\"{key} - {risky_sessions[key]}\"] = sudo_sessions.pop(\n", + " key)\n", + " \n", + " if isinstance(sudo_events, pd.DataFrame):\n", + " sudo_events_val = sudo_events[['EventTime', 
'CommandCall']][sudo_events['CommandCall']!=\"\"].dropna(how='any', subset=['CommandCall'])\n", + " if sudo_events_val.empty:\n", + " md(f\"No sucessful sudo activity for {hostname} between {logon_selection.value.get('Start (UTC)')} and {logon_selection.value.get('End (UTC)')}\")\n", " else:\n", - " risky_sessions = risky_sudo_sessions(\n", - " risky_actions=risky_actions, sudo_sessions=sudo_sessions, suspicious_actions=suspicious_events)\n", - " for key in risky_sessions:\n", - " if key in sudo_sessions:\n", - " sudo_sessions[f\"{key} - {risky_sessions[key]}\"] = sudo_sessions.pop(\n", - " key)\n", - "\n", - " if sudo_events.empty:\n", - " md(f\"No sucessful sudo activity for {hostname} between {logon_selection.value.get('Start (UTC)')} and {logon_selection.value.get('End (UTC)')}\")\n", + " sudo_events.replace(\"\", np.nan, inplace=True)\n", + " md('

Frequency of sudo commands

')\n", + " md('This shows how many times each command has been run with sudo. /bin/bash is usally associated with the use of \"sudo -i\"')\n", + " sudo_commands = (sudo_events[['EventTime', 'CommandCall']]\n", + " .groupby(['CommandCall'])\n", + " .count()\n", + " .dropna()\n", + " .style\n", + " .set_table_attributes('width=900px, text-align=center')\n", + " .background_gradient(cmap='Reds', low=.5, high=1)\n", + " .format(\"{0:0>3.0f}\"))\n", + " display(sudo_commands)\n", " else:\n", - " sudo_events.replace(\"\", np.nan, inplace=True)\n", - " display(HTML('

Frequency of sudo commands

'))\n", - " display(HTML('This shows how many times each command has been run with sudo. /bin/bash is usally associated with the use of \"sudo -i\"'))\n", - " sudo_commands = (sudo_events[['EventTime', 'CommandCall']]\n", - " .groupby(['CommandCall'])\n", - " .count()\n", - " .dropna()\n", - " .style\n", - " .set_table_attributes('width=900px, text-align=center')\n", - " .background_gradient(cmap='Reds', low=.5, high=1)\n", - " .format(\"{0:0>3.0f}\"))\n", - " display(sudo_commands)\n", + " md(f\"No sucessful sudo activity for {hostname} between {logon_selection.value.get('Start (UTC)')} and {logon_selection.value.get('End (UTC)')}\") \n", "\n", " # Display a timeline of all activity during session\n", " crn_tl_data.update(user_tl_data)\n", " crn_tl_data.update(sudo_tl_data)\n", - " display(HTML('

Session Timeline.

'))\n", - " nbdisplay.display_timeline(\n", - " data=crn_tl_data, title='Session Timeline', height=300)\n", + " if crn_tl_data:\n", + " md('

Session Timeline.

')\n", + " nbdisplay.display_timeline(\n", + " data=crn_tl_data, title='Session Timeline', height=300)\n", "else:\n", " md(\"No logon sessions during this timeframe\")" ] @@ -815,13 +811,13 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:04:55.506355Z", - "start_time": "2020-05-16T00:04:52.200963Z" + "end_time": "2020-06-24T01:53:47.432915Z", + "start_time": "2020-06-24T01:53:45.628367Z" } }, "outputs": [], "source": [ - "if logon_sessions_df is not None:\n", + "if isinstance(logon_sessions_df, pd.DataFrame) and not logon_sessions_df.empty:\n", " #Return syslog data and present it to the use for investigation\n", " session_syslog = qry_prov.LinuxSyslog.all_syslog(\n", " start=session.StartTimeUtc, end=session.EndTimeUtc, host_name=session.Host)\n", @@ -831,15 +827,13 @@ "\n", "\n", " def view_sudo(selected_cmd):\n", - " display(sudo_events.query('CommandCall == @selected_cmd')[\n", - " ['TimeGenerated', 'SyslogMessage', 'Sudoer', 'SudoTo', 'Command', 'CommandCall']])\n", + " return [sudo_events.query('CommandCall == @selected_cmd')[\n", + " ['TimeGenerated', 'SyslogMessage', 'Sudoer', 'SudoTo', 'Command', 'CommandCall']]]\n", "\n", " # Show syslog messages associated with selected sudo command\n", - " display(HTML(\"

View all messages assocated with a sudo command

\"))\n", + " md(\"

View all messages associated with a sudo command

\")\n", " items = sudo_events['CommandCall'].dropna().unique().tolist()\n", - " cmd_w = widgets.Dropdown(\n", - " options=items, description='Select sudo command facility to examine', disabled=False, **WIDGET_DEFAULTS)\n", - " display(widgets.interactive(view_sudo, selected_cmd=cmd_w))\n", + " display(nbwidgets.SelectItem(item_list=items, action=view_sudo))\n", "else:\n", " md(\"No logon sessions during this timeframe\")" ] @@ -849,19 +843,17 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:04:56.376353Z", - "start_time": "2020-05-16T00:04:56.284354Z" + "end_time": "2020-06-24T01:53:48.221915Z", + "start_time": "2020-06-24T01:53:48.175914Z" } }, "outputs": [], "source": [ - "if logon_sessions_df is not None:\n", + "if isinstance(logon_sessions_df, pd.DataFrame) and not logon_sessions_df.empty:\n", " # Display syslog messages from the session witht he facility selected\n", " items = syslog_events['Facility'].dropna().unique().tolist()\n", - " display(HTML(\"

View all messages assocated with a syslog facility

\"))\n", - " sess_w = widgets.Dropdown(\n", - " options=items, description='Select syslog facility to examine', disabled=False, **WIDGET_DEFAULTS)\n", - " display(widgets.interactive(view_syslog, selected_facility=sess_w))\n", + " md(\"

View all messages associated with a syslog facility

\")\n", + " display(nbwidgets.SelectItem(item_list=items, action=view_syslog))\n", "else:\n", " md(\"No logon sessions during this timeframe\")" ] @@ -878,13 +870,13 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:05:06.090478Z", - "start_time": "2020-05-16T00:04:57.481284Z" + "end_time": "2020-06-24T01:53:51.672525Z", + "start_time": "2020-06-24T01:53:50.175953Z" } }, "outputs": [], "source": [ - "if logon_sessions_df is not None:\n", + "if isinstance(logon_sessions_df, pd.DataFrame) and not logon_sessions_df.empty:\n", " display(HTML(\"

Process Trees from session

\"))\n", " print(\"Building process tree, this may take some time...\")\n", " # Find the table with auditd data in\n", @@ -934,8 +926,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:05:14.828475Z", - "start_time": "2020-05-16T00:05:14.794474Z" + "end_time": "2020-06-24T01:53:55.462637Z", + "start_time": "2020-06-24T01:53:55.422637Z" } }, "outputs": [], @@ -948,45 +940,26 @@ " sudo_sessions.set_index('Key', inplace=True)\n", " sudo_dict = sudo_sessions[['User','Start', 'End']].to_dict('index')\n", "\n", - " sudo_selection = nbwidgets.SelectString(description='Select sudo session to investigate: ',\n", + " sudo_selection = nbwidgets.SelectItem(description='Select sudo session to investigate: ',\n", " item_dict=sudo_dict, width='100%', height='300px', auto_display=True)\n", "else:\n", " sudo_selection = None\n", " md(\"No logon sessions during this timeframe\")" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Load TILookup class\n", - "> **Note**: to use TILookup you will need configuration settings in your msticpyconfig.yaml\n", - ">
see [TIProviders documenation](https://msticpy.readthedocs.io/en/latest/TIProviders.html)\n", - ">
and [Configuring Notebook Environment notebook](./ConfiguringNotebookEnvironment.ipynb)\n", - ">
or [ConfiguringNotebookEnvironment (GitHub static view)](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tilookup = TILookup()" - ] - }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:05:25.237213Z", - "start_time": "2020-05-16T00:05:21.813842Z" + "end_time": "2020-06-24T01:57:23.902023Z", + "start_time": "2020-06-24T01:57:21.856481Z" } }, "outputs": [], "source": [ "#Collect data associated with the sudo session selected\n", + "sudo_events = None\n", "from msticpy.sectools.tiproviders.ti_provider_base import TISeverity\n", "\n", "def ti_check_sev(severity, threshold):\n", @@ -1034,7 +1007,7 @@ " display(sudo_events[sudo_events['SyslogMessage'].str.contains(\n", " ioc)][['TimeGenerated', 'SyslogMessage']])\n", " else:\n", - " md(\"No IoC patterns found in Syslog Messages.\")\n", + " md(\"No IoC patterns found in Syslog Messages.\")\n", " else:\n", " md('No sudo messages for this session')\n", "\n", @@ -1074,8 +1047,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:05:40.283882Z", - "start_time": "2020-05-16T00:05:38.659404Z" + "end_time": "2020-06-24T01:57:32.366086Z", + "start_time": "2020-06-24T01:57:31.372985Z" } }, "outputs": [], @@ -1093,7 +1066,7 @@ "\n", "# Pick Users\n", "if not logon_events.empty:\n", - " user_select = nbwidgets.SelectString(description='Select user to investigate: ',\n", + " user_select = nbwidgets.SelectItem(description='Select user to investigate: ',\n", " item_list=all_users, width='75%', auto_display=True)\n", "else:\n", " md(\"There was no user activity in the timeframe specified.\")\n", @@ -1105,8 +1078,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:05:43.703387Z", - "start_time": "2020-05-16T00:05:41.635697Z" + "end_time": 
"2020-06-24T01:57:35.805460Z", + "start_time": "2020-06-24T01:57:33.955397Z" } }, "outputs": [], @@ -1114,18 +1087,18 @@ "folium_user_map = FoliumMap()\n", "\n", "def view_sudo(cmd):\n", - " display(user_sudo_hold.query('CommandCall == @cmd')[\n", - " ['TimeGenerated', 'HostName', 'Command', 'CommandCall', 'SyslogMessage']])\n", + " return [user_sudo_hold.query('CommandCall == @cmd')[\n", + " ['TimeGenerated', 'HostName', 'Command', 'CommandCall', 'SyslogMessage']]]\n", "user_sudo_hold = None\n", "if user_select is not None:\n", " # Get all syslog relating to these users\n", " username = user_select.value\n", - " user_events = all_syslog[all_syslog['SyslogMessage'].str.contains(username)]\n", + " user_events = all_syslog_data[all_syslog_data['SyslogMessage'].str.contains(username)]\n", " logon_sessions = cluster_syslog_logons_df(logon_events)\n", "\n", " # Display all logons associated with the user\n", - " display(HTML(f\"

User Logon Activity for {username}

\"))\n", - " user_logon_events = logon_events.loc[logon_events['User'] == username]\n", + " md(f\"

User Logon Activity for {username}

\")\n", + " user_logon_events = logon_events[logon_events['User'] == username]\n", " try:\n", " user_logon_sessions = cluster_syslog_logons_df(user_logon_events)\n", " except:\n", @@ -1141,7 +1114,7 @@ " for _, row in logon_sessions_df.iterrows():\n", " end = row['End']\n", " user_sudo_events = qry_prov.LinuxSyslog.sudo_activity(start=user_remote_logons.sort_values(\n", - " by='TimeGenerated')['TimeGenerated'].head(1).values[0], end=end, host_name=hostname, user=username)\n", + " by='TimeGenerated')['TimeGenerated'].iloc[0], end=end, host_name=hostname, user=username)\n", " else: \n", " user_sudo_events = None\n", "\n", @@ -1162,75 +1135,66 @@ "\n", " nbdisplay.display_timeline(\n", " data=user_tl_data, title=\"User logon timeline\", height=300)\n", + " \n", + " all_user_df = pd.DataFrame(dict(successful= user_remote_logons['ProcessName'].value_counts(), failed = user_failed_logons['ProcessName'].value_counts())).fillna(0)\n", + " processes = all_user_df.index.values.tolist()\n", + " results = all_user_df.columns.values.tolist()\n", + " user_fail_sucess_data = {'processes' :processes,\n", + " 'sucess' : all_user_df['successful'].values.tolist(),\n", + " 'failure': all_user_df['failed'].values.tolist()}\n", "\n", " palette = viridis(2)\n", - " # Graph out failed/sucessful logons by account and by logon process\n", - " all_user_df = pd.DataFrame(dict(successful=user_remote_logons['ProcessName'].value_counts(\n", - " ), failed=user_failed_logons['ProcessName'].value_counts())).fillna(0).T\n", - "\n", - " user_processes = all_user_df.columns.values.tolist()\n", - "\n", - " fail_sucess_user_data = pd.DataFrame({'processes': user_processes,\n", - " 'sucess': all_user_df.loc['successful'].values.tolist(),\n", - " 'failure': all_user_df.loc['failed'].astype(int).values.tolist()})\n", - "\n", - " user_process_bar = fail_sucess_user_data.plot_bokeh.bar(\n", - " x=\"processes\", colormap=palette, show_figure=False, title=\"Failed and Sucessful logon attempts by process\")\n", 
- " user_logons = pd.DataFrame({\"Sucessful Logons\" : [int(all_user_df.loc['successful'].sum())],\n", - " \"Failed Logons\" : [int(all_user_df.loc['failed'].sum())]}).T\n", - "\n", - " user_ratio_pie =user_logons.plot_bokeh.pie(colormap = palette,\n", - " show_figure = False, title = \"Relative Frequencies of Sucessful Logons by Account\")\n", - "\n", - " pandas_bokeh.plot_grid([[user_ratio_pie, user_process_bar], \n", - " []], plot_width = 450, plot_height = 300)\n", - "\n", - "\n", - " # Convert logon IPs to IP entities in order to get location\n", - " ip_entity = entityschema.IpAddress()\n", + " x = [ (process, result) for process in processes for result in results ]\n", + " counts = sum(zip(user_fail_sucess_data['sucess'], fail_sucess_data['failure']), ()) \n", + " source = ColumnDataSource(data=dict(x=x, counts=counts))\n", + " b = figure(x_range=FactorRange(*x), plot_height=350, plot_width=450, title=\"Failed and Sucessful logon attempts by process\",\n", + " toolbar_location=None, tools=\"\", y_minor_ticks=2)\n", + " b.vbar(x='x', top='counts', width=0.9, source=source, line_color=\"white\",\n", + " fill_color=factor_cmap('x', palette=palette, factors=results, start=1, end=2))\n", + " b.y_range.start = 0\n", + " b.x_range.range_padding = 0.1\n", + " b.xaxis.major_label_orientation = 1\n", + " b.xgrid.grid_line_color = None\n", + " user_logons = pd.DataFrame({\"Sucessful Logons\" : [int(all_user_df['successful'].sum())],\n", + " \"Failed Logons\" : [int(all_user_df['failed'].sum())]}).T\n", + " user_logon_data = pd.value_counts(user_logon_events['LogonResult'].values, sort=True).head(10).reset_index(name='value').rename(columns={'User':'Count'})\n", + " user_logon_data = user_logon_data[user_logon_data['index']!=\"Unknown\"].copy()\n", + " user_logon_data['angle'] = user_logon_data['value']/user_logon_data['value'].sum() * 2*pi\n", + " user_logon_data['color'] = viridis(len(user_logon_data))\n", + " p = figure(plot_height=350, plot_width=450, 
title=\"Relative Frequencies of Failed Logons by Account\", toolbar_location=None, tools=\"hover\", tooltips=\"@index: @value\")\n", + " p.axis.visible = False\n", + " p.xgrid.visible = False\n", + " p.ygrid.visible = False\n", + " p.wedge(x=0, y=1, radius=0.5, start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'), line_color=\"white\", fill_color='color', legend='index', source=user_logon_data)\n", + " show(Row(p,b)) \n", " \n", - " user_ip_list = []\n", - " for ip_logon in user_remote_logons['SourceIP']:\n", - " user_ip_list.extend(convert_to_ip_entities(ip_logon))\n", - " user_ip_fail_list = []\n", - " for ip_logon in user_failed_logons['SourceIP']:\n", - " user_ip_fail_list.extend(convert_to_ip_entities(ip_logon))\n", - " \n", - " folium_user_map=FoliumMap(location=location, zoom_start=3)\n", - " if not user_ip_list and not user_ip_fail_list:\n", - " print(\"No user events\")\n", - " elif not user_ip_list and user_ip_fail_list:\n", - " icon_props={'color': 'red'}\n", - " folium_user_map.add_ip_cluster(ip_entities=user_ip_fail_list, **icon_props)\n", - " elif not user_ip_fail_list and user_ip_list:\n", - " icon_props = {'color': 'green'}\n", - " folium_user_map.add_ip_cluster(ip_entities=user_ip_list, **icon_props)\n", - " else:\n", + " user_ip_list = [convert_to_ip_entities(i)[0] for i in user_remote_logons['SourceIP']]\n", + " user_ip_fail_list = [convert_to_ip_entities(i)[0] for i in user_failed_logons['SourceIP']]\n", + " \n", + " user_location = get_map_center(ip_list + ip_fail_list)\n", + " user_folium_map = FoliumMap(location = location, zoom_start=1.4)\n", + " #Map logon locations to allow for identification of anomolous locations\n", + " if len(ip_fail_list) > 0:\n", + " md('

Map of Originating Location of Logon Attempts

')\n", " icon_props = {'color': 'red'}\n", - " folium_user_map.add_ip_cluster(ip_entities=user_ip_fail_list, **icon_props)\n", + " user_folium_map.add_ip_cluster(ip_entities=user_ip_fail_list, **icon_props)\n", + " if len(ip_list) > 0:\n", " icon_props = {'color': 'green'}\n", - " folium_user_map.add_ip_cluster(ip_entities=user_ip_list, **icon_props)\n", - "\n", - " folium_user_map.center_map()\n", - " display(HTML('

Map of Originating Location of Logon Attempts

'))\n", - " display(folium_user_map)\n", - " display(Markdown('

Warning: the folium mapping library '\n", + " user_folium_map.add_ip_cluster(ip_entities=user_ip_list, **icon_props)\n", + " display(user_folium_map.folium_map)\n", + " md('

Warning: the folium mapping library '\n", " 'does not display correctly in some browsers.


'\n", - " 'If you see a blank image please retry with a different browser.'))\n", - "\n", - "\n", - "\n", + " 'If you see a blank image please retry with a different browser.') \n", + " \n", " #Display sudo activity of the user \n", " if not isinstance(user_sudo_events, pd.DataFrame) or user_sudo_events.empty:\n", - " display(HTML(f\"No sucessful sudo activity for {username}\"))\n", + " md(f\"

No successful sudo activity for {username}

\")\n", " else:\n", " user_sudo_hold = user_sudo_events\n", " user_sudo_commands = (user_sudo_events[['EventTime', 'CommandCall']].replace('', np.nan).groupby(['CommandCall']).count().dropna().style.set_table_attributes('width=900px, text-align=center').background_gradient(cmap='Reds', low=.5, high=1).format(\"{0:0>3.0f}\"))\n", " display(user_sudo_commands)\n", - " display(HTML(\"Select a sudo command to investigate in more detail\"))\n", - " cmd = widgets.Dropdown(options=user_sudo_events['CommandCall'].replace(\n", - " '', np.nan).dropna().unique().tolist(), description='Cmd:', disabled=False)\n", - " display(widgets.interactive(view_sudo, cmd=cmd))\n", + " md(\"Select a sudo command to investigate in more detail\")\n", + " display(nbwidgets.SelectItem(item_list=items, action=view_sudo))\n", "else:\n", " md(\"No user session selected\")" ] @@ -1240,15 +1204,15 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:05:50.586733Z", - "start_time": "2020-05-16T00:05:50.564733Z" + "end_time": "2020-06-24T01:57:41.495503Z", + "start_time": "2020-06-24T01:57:41.474501Z" } }, "outputs": [], "source": [ "# If the user has sudo activity extract and IOCs from the logs and look them up in TI feeds\n", - "if user_sudo_hold is not None or user_sudo_hold is not isinstance(user_sudo_hold, pd.DataFrame) or user_sudo_hold.empty:\n", - " print(f\"No sudo messages data\")\n", + "if not isinstance(user_sudo_hold, pd.DataFrame) or user_sudo_hold.empty:\n", + " md(f\"No sudo messages data\")\n", "else:\n", " # Extract IOCs\n", " ioc_extractor = iocextract.IoCExtract()\n", @@ -1260,7 +1224,7 @@ " ioc_types=['ipv4', 'ipv6', 'dns', 'url', 'md5_hash', 'sha1_hash', 'sha256_hash'])\n", " if len(ioc_df) > 0:\n", " ioc_count = len(ioc_df[[\"IoCType\", \"Observable\"]].drop_duplicates())\n", - " display(HTML(f\"Found {ioc_count} IOCs\"))\n", + " md(f\"Found {ioc_count} IOCs\")\n", " ti_resps = tilookup.lookup_iocs(data=ioc_df[[\"IoCType\", 
\"Observable\"]].drop_duplicates(\n", " ).reset_index(), obs_col='Observable', ioc_type_col='IoCType')\n", " i = 0\n", @@ -1272,13 +1236,13 @@ " i += 1\n", " else:\n", " i += 1\n", - " display(HTML(f\"Found {len(ti_hits)} IoCs in Threat Intelligence\"))\n", + " md(f\"Found {len(ti_hits)} IoCs in Threat Intelligence\")\n", " for ioc in ti_hits:\n", - " display(HTML(f\"Messages containing IoC found in TI feed: {ioc}\"))\n", + " md(f\"Messages containing IoC found in TI feed: {ioc}\")\n", " display(user_sudo_hold[user_sudo_hold['SyslogMessage'].str.contains(\n", " ioc)][['TimeGenerated', 'SyslogMessage']])\n", " else:\n", - " display(HTML(\"No IoC patterns found in Syslog Message.\"))" + " md(\"No IoC patterns found in Syslog Message.\")" ] }, { @@ -1308,14 +1272,14 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:05:57.509366Z", - "start_time": "2020-05-16T00:05:57.455366Z" + "end_time": "2020-06-24T01:57:45.323865Z", + "start_time": "2020-06-24T01:57:45.274865Z" } }, "outputs": [], "source": [ "# Get list of Applications\n", - "apps = all_syslog['ProcessName'].replace('', np.nan).dropna().unique().tolist()\n", + "apps = all_syslog_data['ProcessName'].replace('', np.nan).dropna().unique().tolist()\n", "system_apps = ['sudo', 'CRON', 'systemd-resolved', 'snapd',\n", " '50-motd-news', 'systemd-logind', 'dbus-deamon', 'crontab']\n", "if len(host_entity.Applications) > 0:\n", @@ -1323,7 +1287,7 @@ " installed_apps.extend(x for x in apps if x not in system_apps)\n", "\n", " # Pick Applications\n", - " app_select = nbwidgets.SelectString(description='Select sudo session to investigate: ',\n", + " app_select = nbwidgets.SelectItem(description='Select sudo session to investigate: ',\n", " item_list=installed_apps, width='75%', auto_display=True)\n", "else:\n", " display(HTML(\"No applications other than stand OS applications present\"))" @@ -1334,78 +1298,35 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": 
"2020-05-16T00:06:05.972032Z", - "start_time": "2020-05-16T00:06:05.838032Z" + "end_time": "2020-06-24T01:57:51.258753Z", + "start_time": "2020-06-24T01:57:51.149753Z" } }, "outputs": [], "source": [ - "from bokeh.models import ColumnDataSource, RangeTool\n", - "from bokeh.plotting import figure, show, output_notebook\n", - "from bokeh.layouts import column\n", - "output_notebook()\n", "# Get all syslog relating to these Applications\n", "app = app_select.value\n", - "app_data = all_syslog.loc[all_syslog['ProcessName'] == app]\n", + "app_data = all_syslog_data[all_syslog_data['ProcessName'] == app].copy()\n", "\n", "# App log volume over time\n", "if isinstance(app_data, pd.DataFrame) and not app_data.empty:\n", " app_data_volume = app_data.set_index(\n", " \"TimeGenerated\").resample('5T').count()\n", - " source = ColumnDataSource(\n", - " data=dict(date=app_data_volume.index, count=app_data_volume['SyslogMessage']))\n", - " p = figure(plot_height=300, plot_width=900, tools=\"xpan\", toolbar_location=None,\n", - " x_axis_type=\"datetime\", x_axis_location=\"above\", y_minor_ticks=2,\n", - " title=\"Application syslog volume over time\",\n", - " background_fill_color=\"#efefef\", x_range=(app_data_volume.index[int(len(app_data_volume.index)*.33)], app_data_volume.index[int(len(app_data_volume.index)*.66)]))\n", - " p.line('date', 'count', source=source)\n", - " p.yaxis.axis_label = 'Message volume'\n", - " select = figure(title=\"Drag the middle and edges of the selection box to change the range above\",\n", - " plot_height=130, plot_width=900, y_range=p.y_range,\n", - " x_axis_type=\"datetime\", y_axis_type=None,\n", - " tools=\"\", toolbar_location=None, background_fill_color=\"#efefef\")\n", - " range_tool = RangeTool(x_range=p.x_range)\n", - " range_tool.overlay.fill_color = \"navy\"\n", - " range_tool.overlay.fill_alpha = 0.2\n", - " select.line('date', 'count', source=source)\n", - " select.ygrid.grid_line_color = None\n", - " select.add_tools(range_tool)\n", 
- " select.toolbar.active_multi = range_tool\n", - " show(column(p, select))\n", + " app_data_volume.reset_index(level=0, inplace=True)\n", + " app_data_volume.rename(columns={\"TenantId\" : \"NoOfLogMessages\"}, inplace=True)\n", + " nbdisplay.display_timeline_values(data=app_data_volume, y='NoOfLogMessages', source_columns=['NoOfLogMessages'], title=f\"{app} log volume over time\") \n", + " \n", " app_high_sev = app_data[app_data['SeverityLevel'].isin(\n", " ['emerg', 'alert', 'crit', 'err', 'warning'])]\n", - " if app_high_sev.empty:\n", - " print(f\"No high severity syslog messages for {app}\")\n", - " else:\n", - " app_high_sev = app_high_sev.set_index(\n", + " if isinstance(app_high_sev, pd.DataFrame) and not app_high_sev.empty:\n", + " app_hs_volume = app_high_sev.set_index(\n", " \"TimeGenerated\").resample('5T').count()\n", - " hs_source = ColumnDataSource(\n", - " data=dict(date=app_high_sev.index, count=app_high_sev['SyslogMessage']))\n", - " hs_p = figure(plot_height=300, plot_width=900, tools=\"xpan\", toolbar_location=None,\n", - " x_axis_type=\"datetime\", x_axis_location=\"above\", y_minor_ticks=2,\n", - " title=\"High Severity application syslog volume over time\",\n", - " background_fill_color=\"#FCF1CB\", x_range=(app_high_sev.index[int(len(app_high_sev.index)*.33)], app_high_sev.index[int(len(app_high_sev.index)*.66)]), y_range=(0, app_data_volume['SyslogMessage'].max()))\n", - " hs_p.line('date', 'count', source=hs_source, line_color='red')\n", - " hs_p.yaxis.axis_label = 'Message volume'\n", - " hs_select = figure(title=\"Drag the middle and edges of the selection box to change the range above\",\n", - " plot_height=130, plot_width=900, y_range=hs_p.y_range,\n", - " x_axis_type=\"datetime\", y_axis_type=None,\n", - " tools=\"\", toolbar_location=None, background_fill_color=\"#FCF1CB\")\n", - " hs_range_tool = RangeTool(x_range=hs_p.x_range)\n", - " hs_range_tool.overlay.fill_color = \"orange\"\n", - " hs_range_tool.overlay.fill_alpha = 0.2\n", 
- " hs_select.line('date', 'count', source=hs_source, line_color='red')\n", - " hs_select.ygrid.grid_line_color = None\n", - " hs_select.add_tools(hs_range_tool)\n", - " hs_select.toolbar.active_multi = hs_range_tool\n", - " show(column(hs_p, hs_select))\n", - "else:\n", - " display(HTML(\"No data for this application\"))\n", - "# Check for mallicious stuff\n", + " app_hs_volume.reset_index(level=0, inplace=True)\n", + " app_hs_volume.rename(columns={\"TenantId\" : \"NoOfLogMessages\"}, inplace=True)\n", + " nbdisplay.display_timeline_values(data=app_hs_volume, y='NoOfLogMessages', source_columns=['NoOfLogMessages'], title=f\"{app} high severity log volume over time\") \n", + "\n", "risky_messages = risky_cmd_line(events=app_data, log_type=\"Syslog\", cmd_field=\"SyslogMessage\")\n", - "if not risky_messages:\n", - " pass\n", - "else:\n", + "if risky_messages:\n", " print(risky_messages)" ] }, @@ -1422,8 +1343,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:06:27.934589Z", - "start_time": "2020-05-16T00:06:27.885591Z" + "end_time": "2020-06-24T01:59:29.756566Z", + "start_time": "2020-06-24T01:59:29.702565Z" } }, "outputs": [], @@ -1444,8 +1365,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:07:32.371549Z", - "start_time": "2020-05-16T00:06:33.042022Z" + "end_time": "2020-06-24T02:01:09.922496Z", + "start_time": "2020-06-24T02:00:19.315827Z" } }, "outputs": [], @@ -1453,6 +1374,7 @@ "audit_table = None\n", "app_audit_data = None\n", "app = app_select.value\n", + "process_tree_data = None\n", "regex = '.*audit.*\\_cl?'\n", "# Find the table with auditd data in and collect the data\n", "matches = ((re.match(regex, key, re.IGNORECASE)) for key in qry_prov.schema)\n", @@ -1485,12 +1407,6 @@ " )\n", " response = (input(\"Y/N\") or \"N\")\n", " \n", - "# app_audit_query = f\"\"\"{audit_table} \n", - "# | where TimeGenerated >= datetime({proc_invest_times.start}) \n", - "# | where 
TimeGenerated <= datetime({proc_invest_times.end}) \n", - "# | where Computer == '{hostname}'\n", - "# | where RawData contains \"sshd\"\n", - "# \"\"\"\n", " if (\n", " (count_check['count_'].iloc[0] < 100000)\n", " or (count_check['count_'].iloc[0] > 100000\n", @@ -1506,7 +1422,7 @@ " data=audit_data\n", " )\n", " \n", - " process_tree = auditdextract.generate_process_tree(audit_data=audit_events)\n", + " process_tree_data = auditdextract.generate_process_tree(audit_data=audit_events)\n", " plot_lim = 1000\n", " if len(process_tree) > plot_lim:\n", " md_warn(f\"More than {plot_lim} processes to plot, limiting to top {plot_lim}.\")\n", @@ -1515,11 +1431,13 @@ " process_tree.mp_process_tree.plot(legend_col=\"exe\")\n", " size = audit_events.size\n", " print(f\"Collected {size} rows of data\")\n", + " else:\n", + " md(\"No audit events avalaible\")\n", " else:\n", " print(\"Resize query window\")\n", " \n", "else:\n", - " display(HTML(\"No audit events avalaible\"))" + " md(\"No audit events avalaible\")" ] }, { @@ -1527,37 +1445,27 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:07:41.084063Z", - "start_time": "2020-05-16T00:07:40.662062Z" + "end_time": "2020-06-24T02:01:43.252644Z", + "start_time": "2020-06-24T02:01:42.969634Z" } }, "outputs": [], "source": [ - "display(HTML(f\"

Process tree for {app}

\"))\n", - "#Generate process tree with auditd data around the selected process\n", - "# from msticpy.sectools import auditdextract\n", - "# if isinstance(audit_events, pd.DataFrame) and not audit_events.empty:\n", - "# audit_events = auditdextract.extract_events_to_df(\n", - "# data=app_audit_data, input_column='RawData')\n", - "# if not audit_events[audit_events[\"exe\"].str.contains(app, na=False)].empty:\n", - "# procs = auditdextract.cluster_auditd_processes(audit_data=audit_events, app=app)\n", - "# display(Markdown(f'{len(procs)} process events'))\n", - "# process_tree = auditdextract.generate_process_tree(audit_data = audit_events, processes = procs)\n", - "# nbdisplay.display_process_tree(process_tree)\n", - "\n", - "# else:\n", - "# display(f\"No process tree data avaliable for {app}\")\n", - "# process_tree = None\n", - "if not process_tree[process_tree[\"exe\"].str.contains(app, na=False)].empty: \n", - " app_roots = process_tree[process_tree[\"exe\"].str.contains(app)].apply(lambda x: ptree.get_root(process_tree, x), axis=1)\n", - " trees = []\n", - " for root in app_roots[\"source_index\"].unique():\n", - " trees.append(process_tree[process_tree[\"path\"].str.startswith(root)])\n", - " app_proc_trees = pd.concat(trees)\n", - " app_proc_trees.mp_process_tree.plot(legend_col=\"exe\", show_table=True)\n", + "md(f\"

Process tree for {app}

\")\n", + "if process_tree_data is not None:\n", + " process_tree_df = process_tree_data[process_tree_data[\"exe\"].str.contains(app, na=False)].copy()\n", + " if not process_tree_df.empty: \n", + " app_roots = process_tree_data.apply(lambda x: ptree.get_root(process_tree_data, x), axis=1)\n", + " trees = []\n", + " for root in app_roots[\"source_index\"].unique():\n", + " trees.append(process_tree_data[process_tree_data[\"path\"].str.startswith(root)])\n", + " app_proc_trees = pd.concat(trees)\n", + " app_proc_trees.mp_process_tree.plot(legend_col=\"exe\", show_table=True)\n", + " else:\n", + " display(f\"No process tree data avaliable for {app}\")\n", + " process_tree = None\n", "else:\n", - " display(f\"No process tree data avaliable for {app}\")\n", - " process_tree = None" + " md(\"No data avaliable to build process tree\")" ] }, { @@ -1573,8 +1481,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:08:25.051755Z", - "start_time": "2020-05-16T00:08:03.604431Z" + "end_time": "2020-06-24T02:01:50.058394Z", + "start_time": "2020-06-24T02:01:49.715903Z" } }, "outputs": [], @@ -1589,7 +1497,7 @@ " ioc_types=['ipv4', 'ipv6', 'dns', 'url',\n", " 'md5_hash', 'sha1_hash', 'sha256_hash'])\n", "\n", - "if process_tree is not None and not process_tree.empty:\n", + "if process_tree_data is not None and not process_tree_data.empty:\n", " app_process_tree = app_proc_trees.dropna(subset=['cmdline'])\n", " audit_ioc_df = ioc_extractor.extract(data=app_process_tree,\n", " columns=['cmdline'],\n", @@ -1620,7 +1528,7 @@ " display(app_data[app_data['SyslogMessage'].str.contains(\n", " ioc)][['TimeGenerated', 'SyslogMessage']])\n", "else:\n", - " display(Markdown(\"### No IoC patterns found in Syslog Message.\"))" + " md(\"

No IoC patterns found in Syslog Message.

\")" ] }, { @@ -1653,8 +1561,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:08:58.079873Z", - "start_time": "2020-05-16T00:08:41.302405Z" + "end_time": "2020-06-24T02:02:21.843587Z", + "start_time": "2020-06-24T02:02:11.835821Z" } }, "outputs": [], @@ -1663,7 +1571,7 @@ "ioc_extractor = iocextract.IoCExtract()\n", "os_family = host_entity.OSType if host_entity.OSType else 'Linux'\n", "print('Finding IP Addresses this may take a few minutes.......')\n", - "syslog_ips = ioc_extractor.extract(data=syslog_data,\n", + "syslog_ips = ioc_extractor.extract(data=all_syslog_data,\n", " columns=['SyslogMessage'],\n", " os_family=os_family,\n", " ioc_types=['ipv4', 'ipv6'])\n", @@ -1692,7 +1600,7 @@ " IPs = syslog_ips[['IoCType', 'Observable']].drop_duplicates('Observable')\n", " display(f\"Found {len(IPs)} IP Addresses assoicated with the host\")\n", "else:\n", - " display(Markdown(\"### No IoC patterns found in Syslog Message.\"))\n", + " md(\"### No IoC patterns found in Syslog Message.\")\n", " \n", "if az_ips is not None:\n", " ips = az_ips['PublicIps'].drop_duplicates(\n", @@ -1723,7 +1631,7 @@ " 'FlowType', 'AllExtIPs', 'L7Protocol', 'FlowDirection'],\n", " height=300)\n", "else:\n", - " print('No Azure network data for specified time range.')" + " md('

No Azure network data for specified time range.

')" ] }, { @@ -1740,17 +1648,12 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:09:14.771199Z", - "start_time": "2020-05-16T00:09:04.783620Z" + "end_time": "2020-06-24T02:02:28.305211Z", + "start_time": "2020-06-24T02:02:27.707241Z" } }, "outputs": [], "source": [ - "\n", - "from functools import lru_cache\n", - "from ipwhois import IPWhois\n", - "from ipaddress import ip_address\n", - "\n", "#Lookup each IP in whois data and extract the ASN\n", "@lru_cache(maxsize=1024)\n", "def whois_desc(ip_lookup, progress=False):\n", @@ -1793,14 +1696,15 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:09:26.062183Z", - "start_time": "2020-05-16T00:09:26.045183Z" + "end_time": "2020-06-24T02:03:09.018331Z", + "start_time": "2020-06-24T02:03:08.996333Z" } }, "outputs": [], "source": [ "# For every IP associated with the selected ASN look them up in TI feeds\n", "ip_invest_list = None\n", + "ip_selection = None\n", "for ASN in selection.value:\n", " if ip_invest_list is None:\n", " ip_invest_list = (IP_ASN[IP_ASN[\"ASN\"] == ASN]['IPs'].tolist())\n", @@ -1815,7 +1719,7 @@ " ti_hits = []\n", " while i < len(ti_resps):\n", " if ti_resps['Details'][i]['pulse_count'] > 0:\n", - " ti_hits.append(ti_resps['IoC'][i])\n", + " ti_hits.append(ti_resps['Ioc'][i])\n", " i += 1\n", " else:\n", " i += 1\n", @@ -1826,9 +1730,8 @@ " #Show IPs found in TI feeds for further investigation \n", " if len(ioc_ip_list) > 0: \n", " display(HTML(\"Select an IP whcih appeared in TI to investigate further\"))\n", - " ip_selection = nbwidgets.SelectString(description='Select IP Address to investigate: ', item_list = ioc_ip_list, width='95%', auto_display=True)\n", - " else: \n", - " ip_selection = None\n", + " ip_selection = nbwidgets.SelectItem(description='Select IP Address to investigate: ', item_list = ioc_ip_list, width='95%', auto_display=True)\n", + " \n", "else:\n", " md(\"No IPs to investigate\")" ] @@ -1838,8 +1741,8 
@@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:09:28.758531Z", - "start_time": "2020-05-16T00:09:28.740532Z" + "end_time": "2020-06-24T02:03:11.613331Z", + "start_time": "2020-06-24T02:03:11.600332Z" } }, "outputs": [], @@ -1847,13 +1750,13 @@ "# Get all syslog for the IPs\n", "if ip_selection is not None:\n", " display(HTML(\"Syslog data associated with this IP Address\"))\n", - " sys_hits = all_syslog[all_syslog['SyslogMessage'].str.contains(\n", + " sys_hits = all_syslog_data[all_syslog_data['SyslogMessage'].str.contains(\n", " ip_selection.value)]\n", " display(sys_hits)\n", " os_family = host_entity.OSType if host_entity.OSType else 'Linux'\n", "\n", " display(HTML(\"TI result for this IP Address\"))\n", - " display(ti_resps[ti_resps['IoC'] == ip_selection.value])\n", + " display(ti_resps[ti_resps['Ioc'] == ip_selection.value])\n", "else:\n", " md(\"No IP address selected\")" ] diff --git a/Entity Explorer - Windows Host.ipynb b/Entity Explorer - Windows Host.ipynb index b5f03c2d..edd29f8f 100644 --- a/Entity Explorer - Windows Host.ipynb +++ b/Entity Explorer - Windows Host.ipynb @@ -5,7 +5,7 @@ "metadata": {}, "source": [ " # Windows Host Explorer\n", - " <details>\n", + "
\n", "  Details...\n", "\n", " **Notebook Version:** 1.0
\n", @@ -19,7 +19,7 @@ " **Data Sources Required**:\n", " - Log Analytics - SecurityAlert, SecurityEvent (EventIDs 4688 and 4624/25), AzureNetworkAnalytics_CL, Heartbeat\n", " - (Optional) - VirusTotal, AlienVault OTX, IBM XForce, Open Page Rank, (all require accounts and API keys)\n", - " </details>\n", + "
\n", "\n", " Brings together a series of queries and visualizations to help you determine the security state of the Windows host or virtual machine that you are investigating.\n" ] @@ -63,12 +63,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:27:18.623464Z", - "start_time": "2020-05-15T23:27:15.156160Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", @@ -78,7 +73,7 @@ "from IPython.display import display, HTML, Markdown\n", "\n", "REQ_PYTHON_VER=(3, 6)\n", - "REQ_MSTICPY_VER=(0, 5, 0)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", "\n", "display(HTML(\"

Starting Notebook setup...

\"))\n", "if Path(\"./utils/nb_check.py\").is_file():\n", @@ -95,6 +90,9 @@ " import msticpy\n", " check_mp_ver(REQ_MSTICPY_VER)\n", " \n", + "\n", + "# If not using Azure Notebooks, install msticpy with\n", + "# !pip install msticpy\n", "from msticpy.nbtools import nbinit\n", "nbinit.init_notebook(\n", " namespace=globals(),\n", @@ -107,7 +105,7 @@ "metadata": {}, "source": [ " ## Get WorkspaceId and Authenticate to Azure Sentinel\n", - " <details>\n", + "
\n", " Details...\n", " If you are using user/device authentication, run the following cell.\n", " - Click the 'Copy code to clipboard and authenticate' button.\n", @@ -127,18 +125,13 @@ " Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
\n", " On successful authentication you should see a ```popup schema``` button.\n", " To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n", - " </details>" + "
" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:27:22.847608Z", - "start_time": "2020-05-15T23:27:22.839609Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "#See if we have an Azure Sentinel Workspace defined in our config file, if not let the user specify Workspace and Tenant IDs\n", @@ -166,12 +159,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:28:39.796803Z", - "start_time": "2020-05-15T23:27:27.080209Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "if config is False:\n", @@ -185,12 +173,7 @@ }, { "cell_type": "markdown", - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-31T23:37:18.211230Z", - "start_time": "2019-10-31T23:37:18.204259Z" - } - }, + "metadata": {}, "source": [ "### Authentication and Configuration Problems\n", "\n", @@ -222,12 +205,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:28:41.610484Z", - "start_time": "2020-05-15T23:28:41.598485Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "host_text = widgets.Text(\n", @@ -239,12 +217,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:28:46.198826Z", - "start_time": "2020-05-15T23:28:46.144827Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "query_times = nbwidgets.QueryTime(units=\"day\", max_before=20, before=5, max_after=1)\n", @@ -254,12 +227,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:28:58.158859Z", - "start_time": "2020-05-15T23:28:55.922817Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Get single event - try process creation\n", @@ -271,7 +239,7 @@ ")\n", "if len(matching_hosts_df) > 1:\n", " print(f\"Multiple matches for '{host_text.value}'. 
Please select a host from the list.\")\n", - " choose_host = nbwidgets.SelectString(\n", + " choose_host = nbwidgets.SelectItem(\n", " item_list=list(matching_hosts_df[\"Computer\"].values),\n", " description=\"Select the host.\",\n", " auto_display=True,\n", @@ -286,12 +254,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:29:12.506439Z", - "start_time": "2020-05-15T23:29:01.493356Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "if not host_name:\n", @@ -393,12 +356,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:29:12.712441Z", - "start_time": "2020-05-15T23:29:12.667444Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "ra_query_times = nbwidgets.QueryTime(\n", @@ -414,12 +372,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:29:15.961693Z", - "start_time": "2020-05-15T23:29:14.578094Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "\n", @@ -467,18 +420,13 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:29:19.615661Z", - "start_time": "2020-05-15T23:29:19.546661Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "def disp_full_alert(alert):\n", " global related_alert\n", " related_alert = SecurityAlert(alert)\n", - " nbdisplay.display_alert(related_alert, show_entities=True)\n", + " return nbdisplay.format_alert(related_alert, show_entities=True)\n", "\n", "recenter_wgt = widgets.Checkbox(\n", " value=True,\n", @@ -490,7 +438,7 @@ " related_alerts[\"CompromisedEntity\"] = related_alerts[\"Computer\"]\n", " display(Markdown(\"### Click on alert to view details.\"))\n", " display(recenter_wgt)\n", - " rel_alert_select = nbwidgets.AlertSelector(\n", + " rel_alert_select = nbwidgets.SelectAlert(\n", " alerts=related_alerts,\n", " action=disp_full_alert,\n", " )\n", @@ 
-510,12 +458,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:29:25.409590Z", - "start_time": "2020-05-15T23:29:25.359585Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# set the origin time to the time of our alert\n", @@ -543,12 +486,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:29:29.150066Z", - "start_time": "2020-05-15T23:29:27.337759Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "host_logons = qry_prov.WindowsSecurity.list_host_logons(\n", @@ -605,12 +543,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:29:32.494051Z", - "start_time": "2020-05-15T23:29:31.042487Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "failedLogons = qry_prov.WindowsSecurity.list_host_logon_failures(\n", @@ -651,12 +584,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:29:35.995808Z", - "start_time": "2020-05-15T23:29:35.834809Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "if not failedLogons.empty:\n", @@ -695,12 +623,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:29:47.340949Z", - "start_time": "2020-05-15T23:29:38.308340Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "md(f\"Collecting Windows Event Logs for {host_entity.HostName}, this may take a few minutes...\")\n", @@ -752,23 +675,18 @@ "For events that you want to look at in more detail you can parse out the full EventData field (containing all fields of the original event). The `parse_event_data` function below does that - transforming the EventData XML into a dictionary of property/value pairs). The `expand_event_properties` function takes this dictionary and transforms into columns in the output DataFrame.\n", "\n", "
\n", - "<details>\n", + "
\n", "  More details...\n", "You can do this for multiple event types in a single pass but, depending on the schema of each event, you may end up with a lot of sparsely populated columns. E.g. suppose EventID 1 has EventData fields A, B and C and EventID 2 has fields A, D, E. If you parse both IDs you will end up with a DataFrame with columns A, B, C, D and E with contents populated only for the rows with corresponding data.\n", "\n", "We recommend that you process batches of related event types (e.g. all user account change events) that have similar sets of fields to keep the output DataFrame manageable.\n", - "</details>" + "
" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:29:58.968325Z", - "start_time": "2020-05-15T23:29:58.080325Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Function to convert EventData XML into dictionary and\n", @@ -836,12 +754,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:30:02.077311Z", - "start_time": "2020-05-15T23:30:01.842315Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Get a full list of Windows Security Events\n", @@ -882,12 +795,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-11-01T19:38:42.155265Z", - "start_time": "2019-11-01T19:38:42.104295Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# populate actual events IDs to select from\n", @@ -906,12 +814,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-11-01T19:38:48.430726Z", - "start_time": "2019-11-01T19:38:48.412764Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "col_names = ['TimeGenerated', 'Account', 'AccountType',\n", @@ -945,12 +848,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:30:15.514844Z", - "start_time": "2020-05-15T23:30:14.697818Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "from msticpy.sectools.eventcluster import (\n", @@ -999,12 +897,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:30:17.480349Z", - "start_time": "2020-05-15T23:30:17.426340Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "import re\n", @@ -1032,17 +925,11 @@ " return host_logons.query(\"TargetUserName == @acct and LogonType == @logon_type\")\n", "\n", "\n", - "# Create an Output widget to show the Logon Details\n", - "w_output = widgets.Output(layout={\"border\": 
\"1px solid black\"})\n", - "\n", - "\n", "def show_logon(idx):\n", - " w_output.clear_output()\n", - " with w_output:\n", - " nbdisplay.display_logon_data(pd.DataFrame(clus_logons.loc[idx]).T)\n", + " return nbdisplay.format_logon(pd.DataFrame(clus_logons.loc[idx]).T)\n", "\n", "\n", - "logon_wgt = nbwidgets.SelectString(\n", + "logon_wgt = nbwidgets.SelectItem(\n", " description=\"Select logon cluster to examine\",\n", " item_dict=dist_logons,\n", " action=show_logon,\n", @@ -1050,11 +937,7 @@ " width=\"100%\",\n", " auto_display=True,\n", ")\n", - "display(w_output)\n", - "# Display the first item on first view\n", - "top_item = next(iter(dist_logons.values()))\n", - "with w_output:\n", - " nbdisplay.display_logon_data(pd.DataFrame(clus_logons.loc[top_item]).T)\n" + "\n" ] }, { @@ -1071,12 +954,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:30:31.264572Z", - "start_time": "2020-05-15T23:30:31.213572Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# set the origin time to start at the first logon in our set\n", @@ -1108,12 +986,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:30:56.847458Z", - "start_time": "2020-05-15T23:30:36.409635Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "from msticpy.sectools.eventcluster import dbcluster_events, add_process_features\n", @@ -1195,12 +1068,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:31:02.979872Z", - "start_time": "2020-05-15T23:31:02.675871Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Display process timeline for 75% percentile rarest scores\n", @@ -1234,12 +1102,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:31:09.344452Z", - "start_time": "2020-05-15T23:31:09.280062Z" - } - }, + "metadata": {}, "outputs": [], 
"source": [ "def view_logon_sess(logon_id=\"\"):\n", @@ -1254,7 +1117,7 @@ " sess_procs = procs_with_cluster.query(\"SubjectLogonId == @logon_id\")[\n", " [\"NewProcessName\", \"CommandLine\", \"SubjectLogonId\", \"ClusterSize\"]\n", " ].drop_duplicates()\n", - " display(sess_procs)\n", + " return sess_procs\n", "\n", "sessions = list(process_rarity\n", " .sort_values(\"Rarity\", ascending=False)\n", @@ -1267,7 +1130,7 @@ " **WIDGET_DEFAULTS,\n", ")\n", "display(all_procs)\n", - "logon_wgt = nbwidgets.SelectString(\n", + "logon_wgt = nbwidgets.SelectItem(\n", " description=\"Select logon session to examine\",\n", " item_dict={label: val for label, val in sessions},\n", " height=\"300px\",\n", @@ -1295,15 +1158,10 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-19T01:18:41.70951Z", - "start_time": "2019-10-19T01:18:41.686523Z" - } - }, + "metadata": {}, "outputs": [], "source": [ - "logon_wgt2 = nbwidgets.SelectString(\n", + "logon_wgt2 = nbwidgets.SelectItem(\n", " description=\"Select logon cluster to examine\",\n", " item_dict=dist_logons,\n", " height=\"200px\",\n", @@ -1328,12 +1186,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-19T01:18:55.502417Z", - "start_time": "2019-10-19T01:18:55.450467Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "selected_logon_cluster = get_selected_logon_cluster(logon_wgt2.value)\n", @@ -1453,12 +1306,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-09-12T22:34:30.885408Z", - "start_time": "2019-09-12T22:34:30.882411Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Use this to search all process commandlines\n", @@ -1474,12 +1322,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:31:26.280399Z", - "start_time": "2020-05-15T23:31:25.106400Z" - } - }, + "metadata": {}, 
"outputs": [], "source": [ "selected_tgt_logon = selected_logon[\"TargetUserName\"].iat[0]\n", @@ -1514,7 +1357,7 @@ "metadata": {}, "source": [ " ## If any Base64 encoded strings, decode and search for IoCs in the results.\n", - " <details>\n", + "
\n", "  Details...\n", " This section looks for base64 encoded strings within the data - this is a common way of hiding attacker intent. It attempts to decode any strings that look like base64. Additionally, if the base64 decode operation returns any items that look like a base64 encoded string or file, a gzipped binary sequence, a zipped or tar archive, it will attempt to extract the contents before searching for potentially interesting IoC observables within the decoded data.\n", "\n", @@ -1535,18 +1378,13 @@ " - printable_bytes - printable version of input_bytes as a string of \\xNN values\n", " - src_index - the index of the row in the input dataframe from which the data came.\n", " - full_decoded_string - the full decoded string with any decoded replacements. This is only really useful for top-level items, since nested items will only show the 'full' string representing the child fragment.\n", - " </details>" + "
" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:31:27.893142Z", - "start_time": "2020-05-15T23:31:27.880144Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "\n", @@ -1610,12 +1448,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:31:36.912323Z", - "start_time": "2020-05-15T23:31:30.600559Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "ti_lookup = TILookup()\n", @@ -1631,12 +1464,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:31:47.445775Z", - "start_time": "2020-05-15T23:31:37.910324Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "iocs_to_check = (ioc_df[ioc_df[\"Observable\"].isin(ioc_ss.selected_items)]\n", @@ -1656,12 +1484,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:32:07.424166Z", - "start_time": "2020-05-15T23:32:07.379166Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "ip_q_times = nbwidgets.QueryTime(\n", @@ -1687,12 +1510,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:32:15.750114Z", - "start_time": "2020-05-15T23:32:09.049085Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "if \"AzureNetworkAnalytics_CL\" not in table_index:\n", @@ -1740,12 +1558,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:32:17.381114Z", - "start_time": "2020-05-15T23:32:17.067117Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "if az_net_comms_df is not None and not az_net_comms_df.empty:\n", @@ -1779,12 +1592,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:32:18.693115Z", - "start_time": "2020-05-15T23:32:18.586117Z" - } - }, + "metadata": {}, 
"outputs": [], "source": [ "if az_net_comms_df is not None and not az_net_comms_df.empty:\n", @@ -1840,12 +1648,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:33:03.907521Z", - "start_time": "2020-05-15T23:32:26.377680Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# WHOIS lookup function\n", @@ -1932,13 +1735,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:33:05.240520Z", - "start_time": "2020-05-15T23:33:05.179518Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "all_asns = list(flow_sum_df[\"DestASN\"].unique()) + list(flow_sum_df[\"SourceASN\"].unique())\n", @@ -1956,13 +1753,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-15T23:33:20.587377Z", - "start_time": "2020-05-15T23:33:06.602518Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "from itertools import chain\n", @@ -2007,12 +1798,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2019-10-19T01:25:26.845038Z", - "start_time": "2019-10-19T01:25:26.778039Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "def format_ip_entity(row, ip_col):\n", @@ -2124,6 +1910,7 @@ "metadata": { "file_extension": ".py", "hide_input": false, + "history": [], "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -2181,6 +1968,7 @@ "toc_section_display": true, "toc_window_display": true }, + "uuid": "752d7f6a-d842-43cc-b46d-4d9e9a2c1160", "varInspector": { "cols": { "lenName": 16, @@ -2217,14 +2005,7 @@ ], "window_display": false }, - "version": 3, - "widgets": { - "application/vnd.jupyter.widget-state+json": { - "state": {}, - "version_major": 2, - "version_minor": 0 - } - } + "version": 3 }, "nbformat": 4, "nbformat_minor": 4 diff --git a/Getting Started with Azure Sentinel Notebooks.ipynb 
b/Getting Started with Azure Sentinel Notebooks.ipynb deleted file mode 100644 index 01e2c737..00000000 --- a/Getting Started with Azure Sentinel Notebooks.ipynb +++ /dev/null @@ -1,953 +0,0 @@ -{ - "cells": [ - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "# Getting Started with Azure Notebooks and Azure Sentinel\n", - "**Notebook Version:** 1.0
\n", - " **Python Version:** Python 3.6 (including Python 3.6 - AzureML)
\n", - " **Required Packages**:
\n", - " **Platforms Supported**:\n", - " - Azure Notebooks Free Compute\n", - " - Azure Notebooks DSVM\n", - " - OS Independent\n", - "\n", - "**Data Sources Required**:\n", - " - Log Analytics - SiginLogs (Optional)\n", - " - VirusTotal\n", - " - MaxMind\n", - " \n", - " \n", - "This notebook takes you through the basics needed to get started with Azure Notebooks and Azure Sentinel, and how to perform the basic actions of data acquisition, data enrichment, data analysis, and data visualization. These actions are the building blocks of threat hunting with notebooks and are useful to understand before running more complex notebooks. This notebook only lightly covers each topic but includes 'learn more' sections to provide you with the resource to deep dive into each of these topics. \n", - "\n", - "This notebook assumes that you are running this in an Azure Notebooks environment, however it will work in other Jupyter environments.\n", - "\n", - "**Note:**\n", - "This notebooks uses SigninLogs from your Azure Sentinel Workspace. If you are not yet collecting SigninLogs configure this connector in the Azure Sentinel portal before running this notebook.\n", - "This notebook also uses the VirusTotal API for data enrichment, for this you will require an API key which can be obtained by signing up for a free [VirusTotal community account](\"https://www.virustotal.com/gui/join-us\")\n" - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "---\n", - "## What is a Jupyter notebook?\n", - "You are currently reading a Jupyter notebook. [Jupyter](http://jupyter.org/) is an interactive development and data manipulation environment presented in a browser. Using Jupyter you can create documents, called Notebooks. 
These documents are made up of cells that contain interactive code, alongside that code's output, and other items such as text and images (what you are looking at now is a cell of Markdown text).\n", - "\n", - "The name, Jupyter, comes from the core supported programming languages that it supports: Julia, Python, and R. Whilst you can use any of these languages we are going to use Python in this notebook, in addition the notebooks that come with Azure Sentinel are all written in Python. Whilst there are pros, and cons to each language Python is a well-established language that has a large number of materials and libraries well suited for data analysis and security investigation, making it ideal for our needs.\n", - "\n", - "### Learn more:\n", - " - The [Infosec Jupyter Book](\"https://infosecjupyterbook.com/introduction.html\") has more details on the technical working of Jupyter.\n", - " - [The Jupyter Project documentation](\"https://jupyter.org/documentation\")\n", - "\n", - "---\n", - "## How to use a Jupyter notebook?\n", - "To use a Jupyter notebook you need a Jupyter server that will render the notebook and execute the code within it. This can take the form of a local [Jupyter installation](https://pypi.org/project/jupyter/), or a remotely hosted version such as [Azure Notebooks](https://notebooks.azure.com/). If you are reading this it is highly likely that you already have a Jupyter server that this notebook is using.\n", - "You can learn more about installing and running your own Jupyter server [here](https://realpython.com/jupyter-notebook-introduction/).\n", - "\n", - "### Using Azure Notebooks\n", - "If you accessed this notebook from Azure Sentinel, you are probably using Azure Notebooks to run this notebook. Azure Notebooks runs in the same way that a local Jupyter server with, except with the additional feature of integrated project management and file storage. 
When you open a notebook in Azure Notebooks the user interface is nearly identical to a standard Jupyter notebook experience.\n", - "\n", - "Before you can start running code in a notebook you need to make sure that it is connected to a Jupyter server and you have the correct type of kernel configured. For this notebook we are going to be using Python 3.6, hopefully Azure Notebooks has already loaded this kernel for you - you can check this by looking at the top left corner of the screen where you should see the currently connected kernel. \n", - "\n", - "![KernelIssue](./images/nb_img1.png)\n", - "\n", - "If this does not read Python 3.6 you can select the correct kernel by selecting Kernel > Change kernel from the top menu and clicking Python 3.6.\n", - "\n", - "> **Note**: the notebook works with Python 3.6, 3.7 or later. If you are using this notebook in Azure ML or another Jupyter environment you can choose any kernel that supports Python 3.6 or later\n", - "\n", - "![KernelPicker](./images/nb_img2.png)\n", - "\n", - "Once you have done this you should be ready to move onto a code cell.\n", - "> **Tip**: You can identify which cells are code by selecting them and looking at the drop down box at the center of the top menu. It will either read 'Code' (for interactive code cells), 'Markdown' (for Markdown text cells like this one), or RawNBConvert (these are just raw data and not interpreted by Jupyter - they can be used by tools that process notebook files, such as *nbconvert* to render the data into HTML or LaTeX). 
\n", - "\n", - "If you click on the cell below you should see this box change to 'Code'.\n", - "\n", - "### Learn More:\n", - "More details on Azure Notebooks can be found in the [Azure Notebooks documentation](https://docs.microsoft.com/en-us/azure/notebooks/) and the [Azure Sentinel documentation](https://docs.microsoft.com/en-us/azure/sentinel/notebooks).\n", - "\n", - "---\n", - "## Running code\n", - "Once you have selected a code cell you can run it by clicking the run button at the menu bar at the top, or by pressing Ctrl+Enter.\n" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "# This is our first code cell, it contains basic Python code.\n", - "# You can run a code cell by selecting it and clicking the Run button in the top menu, or by pressing Shift + Enter.\n", - "# Once you run a code cell any output from that code will be displayed directly below it.\n", - "print(\"Congratulations you just ran this code cell\")\n", - "y = 2+2\n", - "print(\"2 + 2 =\", y)" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "Variables set within a code cell persist between cells meaning you can chain cells together" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "y + 2" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "### Learn More : \n", - " - The [Infosec Jupyter Book](\"https://infosecjupyterbook.com/\") provides an infosec specific intro to Python.\n", - " - [Real Pyhton](\"https://realpython.com/\") is a comprehensive set of Python learnings and tutorials.\n", - "
\n", - "
" - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "Now that you understand the basics we can move onto more complex code.\n", - "\n", - "---\n", - "## Setting up the environment\n", - "Code cells behave in the same way your code would in other environments, so you need to remember about common coding practices such as variable initialization and library imports. \n", - "Before we execute more complex code we need to make sure the required packages are installed and libraries imported. At the top of many of the Azure Sentinel notebooks you will see large cells that will check kernel versions and then install and import all the libraries we are going to be using in the notebook, make sure you run this before running other cells in the notebook.\n", - "If you are running notebooks locally or via dedicated compute in Azure Notebooks library installs will persist but this is not the case with Azure Notebooks free tier, so you will need to install each time you run. Even if running in a static environment imports are required for each run so make sure you run this cell regardless." - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "from pathlib import Path\n", - "import os\n", - "import sys\n", - "import warnings\n", - "from IPython.display import display, HTML, Markdown\n", - "\n", - "REQ_PYTHON_VER=(3, 6)\n", - "REQ_MSTICPY_VER=(0, 5, 0)\n", - "\n", - "display(HTML(\"

Starting Notebook setup...

\"))\n", - "# If you did not clone the entire Azure-Sentinel-Notebooks repo you may not have this file\n", - "if Path(\"./utils/nb_check.py\").is_file():\n", - " from utils.nb_check import check_python_ver, check_mp_ver\n", - "\n", - " check_python_ver(min_py_ver=REQ_PYTHON_VER)\n", - " try:\n", - " check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\n", - " except ImportError:\n", - " !pip install --upgrade msticpy\n", - " if \"msticpy\" in sys.modules:\n", - " importlib.reload(sys.modules[\"msticpy\"])\n", - " else:\n", - " import msticpy\n", - " check_mp_ver(MSTICPY_REQ_VERSION)\n", - " \n", - "from msticpy.nbtools import nbinit\n", - "nbinit.init_notebook(\n", - " namespace=globals(),\n", - " extra_imports=[\"ipwhois, IPWhois, pyyaml\"]\n", - ")" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "---\n", - "## Configuration\n", - "Once we have set up our Jupyter environment with the libraries that we'll use in the notebook, we need to make sure we have some configuration in place. Some of the notebook components need addtional configuration to connect to external services (e.g. API keys to retrieve Threat Intelligence data). This includes configuration for connection to our Azure Sentinel workspace, as well as some threat intelligence providers we will use later.\n", - "The easiest way to handle the configuration for these services is to store them in a msticpyconfig file (`msticpyconfig.yaml`). More details on msticpyconfig can be found here: https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html\n", - "\n", - "### Learn more: \n", - "- In this notebook we will setup the basic config we need to get started. If you need a more complete walk-through we have a separate notebook to help you: https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb\n", - "
\n", - "
" - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "The Azure-Sentinel-Notebooks GitHub repo contains an template msticpyconfig file ready to be populated. If you have run this notebook before you may have a msticpyconfig file already populated, the cell below allows you to checks if this file. If your config file does not contain details under Azure Sentinel > Workspaces, or TIProviders the following cells will populate these for you.
\n", - "If you want to see an example of what a populated msticpyconfig file should look like a samples is included in the repo as msticpyconfig-sample.yaml." - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "%pfile msticpyconfig.yaml" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "If you do not have and msticpyconfig file we can populate one for you. Before you do this you will need a few things.\n", - "\n", - "The first is the Workspace ID and Tenant ID of the Azure Sentinel Workspace you wish to connect to.\n", - "\n", - " - You can get the workspace ID by opening Azure Sentinel in the [Azure Portal](\"https://portal.azure.com\") and selecting Settings > Workspace Settings. Your Workspace ID is displayed near the top of this page.\n", - "\n", - "- You can get your tenant ID (also referred to organization or directory ID) via [Azure Active Directory](\"https://docs.microsoft.com/en-us/onedrive/find-your-office-365-tenant-id\")\n", - "\n", - "We are going to use [VirusTotal](\"https://www.virustotal.com\") to enrich our Azure Sentinel data. For this you will need a VirusTotal API key, one of these can be obtained for free (as a personnal key) via the [VirusTotal](\"https://developers.virustotal.com/v3.0/reference#getting-started\") website.\n", - "We are using VirusTotal for this notebook but we also support a range of other threat intelligence providers: https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html\n", - "

\n", - "In addition we are going to plot IP address locations on a map, in order to do this we are going to use [MaxMind](\"https://www.maxmind.com\") to geolocate IP addresses which requires an API key. You can sign up for a free account and API key at https://www.maxmind.com/en/geolite2/signup. \n", - "

\n", - "Once you have these required items run the cell below and you will prompted to enter these elements:" - ] - }, - { - "metadata": { - "trusted": true, - "scrolled": true - }, - "cell_type": "code", - "source": [ - "ws_id = nbwidgets.GetEnvironmentKey(env_var='WORKSPACE_ID',\n", - " prompt='Please enter your Log Analytics Workspace Id:', auto_display=True)\n", - "ten_id = nbwidgets.GetEnvironmentKey(env_var='TENANT_ID',\n", - " prompt='Please enter your Log Analytics Tenant Id:', auto_display=True)\n", - "vt_key = nbwidgets.GetEnvironmentKey(env_var='VT_KEY',\n", - " prompt='Please enter your VirusTotal API Key:', auto_display=True)\n", - "mm_key = nbwidgets.GetEnvironmentKey(env_var='MM_KEY',\n", - " prompt='Please enter your MaxMind API Key:', auto_display=True)" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - " The cell below will now populate a msticpyconfig file with these values:" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "import yaml\n", - "with open(\"msticpyconfig.yaml\") as config:\n", - " data = yaml.load(config, Loader=yaml.Loader)\n", - "data['AzureSentinel']\n", - "\n", - "workspace = {\"Default\":{\"WorkspaceId\": ws_id.value, \"TenantId\": ten_id.value}}\n", - "ti = {\"VirusTotal\":{\"Args\": {\"AuthKey\" : vt_key.value}, \"Primary\" : True, \"Provider\": \"VirusTotal\"}}\n", - "other_prov = {\"GeoIPLite\" : {\"Args\" : {\"AuthKey\" : mm_key.value, \"DBFolder\" : \"~/msticpy\"}, \"Provider\" : \"GeoLiteLookup\"}}\n", - "data['AzureSentinel']['Workspaces'] = workspace\n", - "data['TIProviders'] = ti\n", - "data['OtherProviders'] = other_prov\n", - "\n", - "with open(\"msticpyconfig.yaml\", 'w') as config:\n", - " yaml.dump(data, config)\n", - " \n", - "print(\"msticpyconfig.yaml updated\")" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "We can now validate 
our configuration is correct." - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "from msticpy.common.pkg_config import refresh_config, validate_config\n", - "refresh_config()\n", - "validate_config()" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "> **Note**: you may see warnings for missing providers when running this cell.\n", - "> This is not an issue, as we will not be using all providers in this notebook.\n", - "> As long as you get the message \"No errors found.\" you are OK to proceed.\n" - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "---\n", - "## Getting Data\n", - "Now that we have configured the details necessary to connect to Azure Sentinel we can go ahead and get some data. We will do this with `QueryProvider()` from MSTICpy. \n", - "You can use the `QueryProvider` class to connect to different data sources such as MDATP, the Security Graph API, and the one we will use here, Azure Sentinel. \n", - "\n", - "### Learn more:\n", - " - More details on configuring and using QueryProviders can be found in the [MSTICpy Documentation](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html#instantiating-a-query-provider).\n", - "

" - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "For now, we are going to set up a QueryProvider for Azure Sentinel, pass it the details for our workspace that we just stored in the msticpyconfig file, and connect. The connection process will ask us to authenticate to our Azure Sentinel workspace via [device authorization](\"https://docs.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-device-code\") with our Azure credentials. You can do this by clicking the device login code button that appears as the output of the next cell, or by navigating to https://microsoft.com/devicelogin and manually entering the code. Note that this authentication persists with the kernel you are using with the notebook, so if you restart the kernel you will need to re-authenticate.\n" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "# Initalize a QueryProvider for Azure Sentinel\n", - "qry_prov = QueryProvider(\"LogAnalytics\")\n", - "\n", - "# Get the Azure Sentinel workspace details from msticpyconfig\n", - "try:\n", - " ws_config = WorkspaceConfig()\n", - " md(\"Workspace details collected from config file\")\n", - "except:\n", - " raise(\"No workspace settings are configured, please run the cells above to configure these.\")\n", - " \n", - "# Connect to Azure Sentinel with our QueryProvider and config details\n", - "# ws_config.code_connect_str is a feature of MSTICpy that creates the required connection string from details in our msticpyconfig\n", - "qry_prov.connect(connection_str=ws_config.code_connect_str)" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "Now that we have connected we can query Azure Sentinel for data, but before we do that we need to understand what data is avalaible to query. 
The QueryProvider object provides a way to get a list of tables, as well as the columns of each table:" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "# Get list of tables in our Workspace\n", - "display(qry_prov.schema_tables[:5]) # We are outputting only the first 5 tables for brevity\n", - "# Get list of tables and their columns\n", - "qry_prov.schema['SigninLogs'] # We are only displaying the columns for SigninLogs for brevity" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "MSTICpy includes a number of built-in queries that you can run.
\n", - "You can list available queries with .list_queries() and get specific details about a query by calling it with \"?\" as a parameter" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "# Get a list of avaliable queries\n", - "qry_prov.list_queries()" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "# Get details about a query\n", - "qry_prov.Azure.list_all_signins_geo(\"?\")" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "You can then run the query by calling it with the required parameters:" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "from datetime import datetime, timedelta\n", - "# set our query end time as now\n", - "end = datetime.now()\n", - "# set our query start time as 1 hour ago\n", - "start = end - timedelta(hours=1)\n", - "# run query with specified start and end times\n", - "logons_df = qry_prov.Azure.list_all_signins_geo(start=start, end=end)\n", - "# display first 5 rows of any results\n", - "logons_df.head() # If you have no data you will just see the column headings displayed" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "Another way to run queries is to pass a string format of a KQL query to the query provider, this will run the query against the workspace connected to above, and will return the data in a [Pandas DataFrame](\"https://pandas.pydata.org/\"). We will look at working with Pandas in a bit more detail later." 
- ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "# Define our query\n", - "test_query = \"\"\"\n", - "SigninLogs\n", - "| where TimeGenerated > ago(7d)\n", - "| take 10\n", - "\"\"\"\n", - "\n", - "# Pass that query to our QueryProvider\n", - "test_df = qry_prov.exec_query(test_query)\n", - "\n", - "# Check that we have some data\n", - "if isinstance(test_df, pd.DataFrame) and not test_df.empty:\n", - " # .head() returns the first 5 rows of our results DataFrame\n", - " display(test_df.head())\n", - "# If there is no data, load some sample data to use instead\n", - "else:\n", - " md(\"You don't appear to have any SigninLogs - we will load sample data for you to use.\")\n", - " qry_prov = QueryProvider(\"LocalData\", data_paths=[\"nbdemo/data/\"], query_paths=[\"nbdemo/data/\"])\n", - " logons_df = qry_prov.Azure.list_all_signins_geo()\n", - " display(logons_df.head())" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "### Learn more:\n", - " - You can learn more about the MSTICpy pre-defined queries in the [MSTICpy Documentation](https://msticpy.readthedocs.io/en/latest/data_acquisition/DataProviders.html#running-an-pre-defined-queryl)" - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "---\n", - "## Pandas\n", - "Our query results are returned in the form of a Pandas DataFrame. DataFrames are a core component of the Azure Sentinel notebooks and of MSTICpy, and are used for both input and output formats.\n", - "Pandas DataFrames are incredibly versatile data structures with a lot of useful features. We will cover a small number of them here, and we recommend that you check out the Learn more section to find out more about Pandas features.\n", - "
\n", - "
\n", - "### Displaying a DataFrame:\n", - "The first thing we want to do is display our DataFrame. You can either just run it or explicity display it by calling `display(df)`." - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "# For this section we are going to create a DataFrame from data we have saved in a csv file\n", - "df = pd.read_csv(\"https://raw.githubusercontent.com/microsoft/msticpy/master/tests/testdata/host_logons.csv\", index_col=[0] )\n", - "# Display our DataFrame\n", - "df # or display(df)" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "> **Note** if the dataframe variable (`df` in the example above) is the last statement in a \n", - "> code cell, Jupyter will automatically display it without using the `display()` function. \n", - "> However, if you want to display a DataFrame in the middle of \n", - "> other code in a cell you must use the `display()` function.\n", - "\n", - "You may not want to display the whole DataFrame and instead display only a selection of items. There are numerous ways to do this and the cell below shows some of the most widely used functions." 
- ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "md(\"Display the first 2 rows using head(): \", \"bold\")\n", - "display(df.head(2)) # display() is not needed here; it is used just for illustration" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "md(\"Display the 3rd row using iloc[]: \", \"bold\")\n", - "df.iloc[2] # iloc is zero-based, so index 2 is the 3rd row" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "md(\"Show the column names in the DataFrame \", \"bold\")\n", - "df.columns" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "md(\"Display just the TimeGenerated and TenantId columns: \", \"bold\")\n", - "df[[\"TimeGenerated\", \"TenantId\"]]" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "We can also choose to select a subsection of our DataFrame based on the contents of the DataFrame:\n", - "\n", - "> **Tip**: the syntax in these examples uses a technique called *boolean indexing*. \n", - ">
`df[]`\n", - "> returns all rows in the dataframe where the boolean expression is True\n", - ">
In the first example we are telling pandas to return all rows where the column value of\n", - "> 'TargetUserName' matches 'MSTICAdmin'" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "md(\"Display only rows where TargetUserName value is 'MSTICAdmin': \", \"bold\")\n", - "df[df['TargetUserName']==\"MSTICAdmin\"]" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "md(\"Display rows where TargetUserName is either MSTICAdmin or adm1nistrator:\", \"bold\")\n", - "display(df[df['TargetUserName'].isin(['adm1nistrator', 'MSTICAdmin'])])" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "Our DataFrame can also be extended to add new columns with additional data if required:" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "df[\"NewCol\"] = \"Look at my new data!\"\n", - "display(df[[\"TenantId\",\"Account\", \"TimeGenerated\", \"NewCol\"]].head(2))" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "### Learn more:\n", - "There is a lot more you can do with Pandas; the links below provide some useful resources:\n", - " - [Getting started with Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/index.html)\n", - " - [Infosec Jupyter Book intro to Pandas](https://infosecjupyterbook.com/notebooks/tutorials/03_intro_to_pandas.html)\n", - " - [A great list of Pandas hints and tricks](https://www.dataschool.io/python-pandas-tips-and-tricks/)" - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "---\n", - "## Enriching data\n", - "\n", - "Now that we have seen how to query for data and do some basic manipulation, we can look at enriching this data with additional data sources. 
For this we are going to use an external threat intelligence provider to give us some more details about an IP address we have in our dataset using the [MSTICpy TIProvider](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html) feature." - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "from datetime import datetime, timedelta\n", - "# Check if we have logon data already and if not get some\n", - "if not isinstance(logons_df, pd.DataFrame) or logons_df.empty:\n", - " # set our query end time as now\n", - " end = datetime.now()\n", - " # set our query start time as 1 day ago\n", - " start = end - timedelta(days=1)\n", - " # run query with specified start and end times\n", - " logons_df = qry_prov.Azure.list_all_signins_geo(start=start, end=end)\n", - " \n", - "# Create our TI provider\n", - "ti = TILookup()\n", - "# Get the first logon IP address from our dataset (iloc is zero-based)\n", - "ip = logons_df.iloc[0]['IPAddress']\n", - "# Look up the IP in VirusTotal\n", - "ti_resp = ti.lookup_ioc(ip, providers=[\"VirusTotal\"])\n", - "\n", - "# Format our results as a DataFrame\n", - "ti_resp = ti.result_to_df(ti_resp)\n", - "display(ti_resp)" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "Using the [Pandas apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) feature we can get results for all the IP addresses in our data set and add the lookup severity score as a new column in our DataFrame for easier reference." 
- ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "# Take the IP address in each row, look it up against TI and return the severity score\n", - "def lookup_res(row):\n", - " ip = row['IPAddress']\n", - " resp = ti.lookup_ioc(ip, providers=[\"VirusTotal\"])\n", - " resp = ti.result_to_df(resp)\n", - " return resp[\"Severity\"].iloc[0]\n", - "\n", - "# Take the first 3 rows of data and copy them into a new DataFrame\n", - "enrich_logons_df = logons_df.iloc[:3].copy()\n", - "# Create a new column called TIRisk and populate that with the TI severity score of the IP Address in that row\n", - "enrich_logons_df['TIRisk'] = enrich_logons_df.apply(lookup_res, axis=1)\n", - "# Display a subset of columns from our DataFrame\n", - "enrich_logons_df[[\"TimeGenerated\", \"ResultType\", \"UserPrincipalName\", \"IPAddress\", \"TIRisk\"]]" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "### Learn more:\n", - "MSTICpy includes further threat intelligence capabilities as well as other data enrichment options. More details on these can be found in the [documentation](https://msticpy.readthedocs.io/en/latest/DataEnrichment.html)." - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "---\n", - "## Analyzing data\n", - "With the data we have collected we may wish to perform some analysis on it in order to better understand it. MSTICpy includes a number of features to help with this, and there is a vast array of other data analysis capabilities available via Python, ranging from simple processes to complex ML models. We will start here by keeping it simple and look at how we can decode some Base64 encoded command line strings in order to understand their content." 
- ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "from msticpy.sectools import base64unpack as b64\n", - "# Take our encoded Powershell Command\n", - "b64_cmd = \"powershell.exe -encodedCommand SW52b2tlLVdlYlJlcXVlc3QgaHR0cHM6Ly9jb250b3NvLmNvbS9tYWx3YXJlIC1PdXRGaWxlIEM6XG1hbHdhcmUuZXhl\"\n", - "# Unpack the Base64 encoded elements\n", - "unpack_txt = b64.unpack(input_string=b64_cmd)\n", - "# Display our results and transform for easier reading\n", - "unpack_txt[1].T" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "We can also use MSTICpy to extract Indicators of Compromise (IoCs) from a dataset, this makes it easy to extract and match on a set of IoCs within our data. In the example below we take a US Cybersecurity & Infrastructure Security Agency (CISA) report and extract all domains listed in the report:" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "import requests\n", - "# Set up our IoCExtract oject\n", - "ioc_extractor = iocextract.IoCExtract()\n", - "# Download our threat report\n", - "data = requests.get(\"https://www.us-cert.gov/sites/default/files/publications/AA20-099A_WHITE.stix.xml\")\n", - "# Extract domains listed in our report\n", - "iocs = ioc_extractor.extract(data.text, ioc_types=\"dns\")['dns']\n", - "# Display the first 5 iocs found in our report\n", - "list(iocs)[:5]" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "### Learn more:\n", - "There are a wide range of options when it comes to data analysis in notebooks using Python. 
Here are some useful resources to get you started:\n", - " - [MSTICpy DataAnalysis documentation](https://msticpy.readthedocs.io/en/latest/DataAnalysis.html)\n", - " - Scikit-Learn is a popular Python ML data analysis library, which has a useful [tutorial](https://scikit-learn.org/stable/tutorial/basic/tutorial.html)" - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "---\n", - "## Visualizing data\n", - "Visualizing data can be an excellent way to analyse it and to identify patterns and anomalies. Python has a wide range of data visualization capabilities, each of which has its own benefits and drawbacks. We will look at some basic capabilities as well as the built-in visualizations in MSTICpy.\n", - "


\n", - "**Basic Graphs**
\n", - "Pandas and Matplotlib provide the easiest and simplest way to produce simple plots of data:" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "vis_q = \"\"\"\n", - "SigninLogs\n", - "| where TimeGenerated > ago(7d)\n", - "| sample 5\"\"\"\n", - "\n", - "# Try and query for data but if using sample data load that instead\n", - "try:\n", - " vis_data = qry_prov.exec_query(vis_q)\n", - "except FileNotFoundError:\n", - " vis_data = logons_df\n", - "\n", - "# Check we have some data in our results and if not use previously used dataset\n", - "if not isinstance(vis_data, pd.DataFrame) or vis_data.empty:\n", - " vis_data = logons_df\n", - "\n", - "# Plot up to the first 5 IP addresses\n", - "vis_data.head()['IPAddress'].value_counts().plot.bar(title=\"IP prevelence\", legend=False)\n" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "pie_df = vis_data.copy()\n", - " # If we have lots of data just plot the first 5 rows\n", - "pie_df.head()['IPAddress'].value_counts().plot.pie(legend=True)" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "[Bokeh](\"https://bokeh.org/\") is a powerful visualization library that allows you to create complex, interactive visualizations. MSTICpy includes a number of pre-built visualizations using Bokeh including a timeline feature that can be used to represent events over time. You can interact with the timeline by zooming and panning, using the range selector, as well as hovering over data points to see more details." 
- ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "from datetime import datetime, timedelta\n", - "# Check if we have logon data already and if not get some\n", - "if not isinstance(logons_df, pd.DataFrame) or logons_df.empty:\n", - " # set our query end time as now\n", - " end = datetime.now()\n", - " # set our query start time as 1 day ago\n", - " start = end - timedelta(days=1)\n", - " # run query with specified start and end times\n", - " logons_df = qry_prov.Azure.list_all_signins_geo(start=start, end=end)\n", - " \n", - "display(timeline.display_timeline(logons_df.head(10), source_columns=[\"TimeGenerated\", \"ResultType\", \"UserPrincipalName\", \"IPAddress\"], group_by=\"AppDisplayName\"))" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "MSTICpy also includes a feature that allows you to map locations. This can be particularly useful when looking at the distribution of remote network connections or other events. Below we plot the locations of remote logons observed in our Azure AD data." 
- ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [ - "from msticpy.sectools.ip_utils import convert_to_ip_entities\n", - "from msticpy.nbtools.foliummap import FoliumMap, get_map_center\n", - "\n", - "# Convert our IP addresses in string format into an ip address entity\n", - "ip_entity = entityschema.IpAddress()\n", - "ip_list = [convert_to_ip_entities(i)[0] for i in logons_df['IPAddress'].head(10)]\n", - " \n", - "# Get center location of all IP locations to center the map on\n", - "location = get_map_center(ip_list)\n", - "logon_map = FoliumMap(location=location, zoom_start=4)\n", - "\n", - "# Add location markers to our map and display it\n", - "if len(ip_list) > 0:\n", - " logon_map.add_ip_cluster(ip_entities=ip_list)\n", - "display(logon_map.folium_map)" - ], - "execution_count": null, - "outputs": [] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "### Learn more:\n", - " - The [Infosec Jupyter Book](https://infosecjupyterbook.com/) includes a section on data visualization.\n", - " - [Bokeh Library Documentation](https://bokeh.org/)\n", - " - [Matplotlib tutorial](https://matplotlib.org/3.2.0/tutorials/index.html)\n", - " - [Seaborn visualization library tutorial](https://seaborn.pydata.org/tutorial.html)" - ] - }, - { - "metadata": {}, - "cell_type": "markdown", - "source": [ - "---\n", - "## Conclusion\n", - "This notebook has shown you the basics of using notebooks and Azure Sentinel for security investigations. There are many more things possible using notebooks, and we strongly encourage you to read the material referenced in the Learn more sections of this notebook. You can also explore the other Azure Sentinel notebooks in order to take advantage of the pre-built hunting logic and understand other analysis techniques that are possible.
\n", - "### Appendix:\n", - " - [Jupyter Notebooks: An Introduction](https://realpython.com/jupyter-notebook-introduction/)\n", - " - [Threat Hunting in the cloud with Azure Notebooks](https://medium.com/@maarten.goet/threat-hunting-in-the-cloud-with-azure-notebooks-supercharge-your-hunting-skills-using-jupyter-8d69218e7ca0)\n", - " - [MSTICpy documentation](https://msticpy.readthedocs.io/)\n", - " - [Azure Sentinel Notebooks documentation](https://docs.microsoft.com/en-us/azure/sentinel/notebooks)\n", - " - [The Infosec Jupyterbook](https://infosecjupyterbook.com/introduction.html)\n", - " - [Linux Host Explorer Notebook walkthrough](https://techcommunity.microsoft.com/t5/azure-sentinel/explorer-notebook-series-the-linux-host-explorer/ba-p/1138273)\n", - " - [Why use Jupyter for Security Investigations](https://techcommunity.microsoft.com/t5/azure-sentinel/why-use-jupyter-for-security-investigations/ba-p/475729)\n", - " - [Security Investigations with Azure Sentinel & Notebooks](https://techcommunity.microsoft.com/t5/azure-sentinel/security-investigation-with-azure-sentinel-and-jupyter-notebooks/ba-p/432921)\n", - " - [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html)\n", - " - [Bokeh Documentation](https://docs.bokeh.org/en/latest/)" - ] - }, - { - "metadata": { - "trusted": true - }, - "cell_type": "code", - "source": [], - "execution_count": null, - "outputs": [] - } - ], - "metadata": { - "kernelspec": { - "name": "python36", - "display_name": "Python 3.6", - "language": "python" - }, - "widgets": { - "application/vnd.jupyter.widget-state+json": { - "state": {}, - "version_major": 2, - "version_minor": 0 - } - }, - "language_info": { - "mimetype": "text/x-python", - "nbconvert_exporter": "python", - "name": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6", - "file_extension": ".py", - "codemirror_mode": { - "version": 3, - "name": "ipython" - } - } - }, - "nbformat": 
4, - "nbformat_minor": 4 -} \ No newline at end of file diff --git a/Guided Hunting - Anomalous Office365 Exchange Sessions.ipynb b/Guided Hunting - Anomalous Office365 Exchange Sessions.ipynb index 89b10bdd..935acf4e 100644 --- a/Guided Hunting - Anomalous Office365 Exchange Sessions.ipynb +++ b/Guided Hunting - Anomalous Office365 Exchange Sessions.ipynb @@ -90,7 +90,7 @@ "from IPython.display import display, HTML, Markdown\n", "\n", "REQ_PYTHON_VER=(3, 6)\n", - "REQ_MSTICPY_VER=(0, 5, 0)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", "\n", "display(HTML(\"

Starting Notebook setup...

\"))\n", "if Path(\"./utils/nb_check.py\").is_file():\n", @@ -527,9 +527,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3.6", "language": "python", - "name": "python3" + "name": "python36" }, "language_info": { "codemirror_mode": { @@ -541,7 +541,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.0" + "version": "3.6.7" } }, "nbformat": 4, diff --git a/Guided Hunting - Office365-Exploring.ipynb b/Guided Hunting - Office365-Exploring.ipynb index 93af9d2b..018ee7b9 100644 --- a/Guided Hunting - Office365-Exploring.ipynb +++ b/Guided Hunting - Office365-Exploring.ipynb @@ -5,7 +5,7 @@ "metadata": {}, "source": [ "# Title: Office 365 Explorer\n", - "<details>\n", + "
\n", "  Details...\n", "**Notebook Version:** 1.0
\n", "**Python Version:** Python 3.6 (including Python 3.6 - AzureML)
\n", @@ -18,7 +18,7 @@ "**Data Sources Required**:\n", "- Log Analytics - OfficeActivity, IPLocation, Azure Network Analytics\n", "\n", - "</details>\n", + "
\n", "\n", "Brings together a series of queries and visualizations to help you investigate the security status of Office 365 subscription and individual user activities.\n", "- The first section focuses on Tenant-Wide data queries and analysis\n", @@ -30,34 +30,12 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "toc": true + }, "source": [ - "\n", - "# Table of Contents\n", - "- [Setup and Authenticate](#setup)\n", - "- [Office 365 Activity](#o365)\n", - " - [Tenant-wide Information](#tenant_info)\n", - " - [AAD Operations - Account Modifications](#aad_ops)\n", - " - [Logon Anomalies](#logon_anomalies)\n", - " - [Activity Summary](#activity_summary)\n", - " - [Variability of IP Address for users](#ip_variability)\n", - " - [Accounts with multiple IPs and Geolocations](#acct_multi_geo)\n", - " - [User Logons with > N IP Address](#acct_multi_ips)\n", - " - [Operation Types by Location and IP](#ip_op_matrix)\n", - " - [Geolocation Map of Client IPs](#geo_map_tenant)\n", - " - [Distinct User Agent Strings in Use](#distinct_uas)\n", - " - [Graphical Activity Timeline](#op_timeline)\n", - " - [Users With largest Activity Type Count](#user_activity_counts)\n", - " - [Office User Investigation](#o365_user_inv)\n", - " - [Activity Summary](#user_act_summary)\n", - " - [Operation Breakdown for User](#user_op_count)\n", - " - [IP Count for Different User Operations](#user_ip_counts)\n", - " - [Activity Timeline](#user_act_timeline)\n", - " - [User IP GeoMap](#user_geomap)\n", - " - [Check for User IPs in Azure Network Flow Data](#ips_in_azure)\n", - " - [Rare Combinations of Country/UserAgent/Operation Type](#o365_cluster)\n", - "- [Appendices](#appendices)\n", - " - [Saving data to Excel](#appendices)\n" + "

Table of Contents

\n", + "" ] }, { @@ -65,7 +43,7 @@ "metadata": {}, "source": [ "---\n", - "### Notebook initialization\n", + "## Notebook initialization\n", "The next cell:\n", "- Checks for the correct Python version\n", "- Checks versions and optionally installs required packages\n", @@ -91,8 +69,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:58:22.794036Z", - "start_time": "2020-05-16T00:58:18.510870Z" + "end_time": "2020-06-26T23:59:19.478614Z", + "start_time": "2020-06-26T23:59:17.250092Z" } }, "outputs": [], @@ -104,7 +82,7 @@ "from IPython.display import display, HTML, Markdown\n", "\n", "REQ_PYTHON_VER=(3, 6)\n", - "REQ_MSTICPY_VER=(0, 5, 0)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", "\n", "display(HTML(\"

Starting Notebook setup...

\"))\n", "if Path(\"./utils/nb_check.py\").is_file():\n", @@ -121,6 +99,9 @@ " import msticpy\n", " check_mp_ver(REQ_MSTICPY_VER)\n", " \n", + "\n", + "# If not using Azure Notebooks, install msticpy with\n", + "# !pip install msticpy\n", "from msticpy.nbtools import nbinit\n", "extra_imports = [\n", " \"dns, reversename\",\n", @@ -148,7 +129,7 @@ }, "source": [ "### Get WorkspaceId and Authenticate to Log Analytics \n", - "<details>\n", + "
\n", "  Details...\n", "If you are using user/device authentication, run the following cell. \n", "- Click the 'Copy code to clipboard and authenticate' button.\n", @@ -168,7 +149,7 @@ "Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
\n", "On successful authentication you should see a ```popup schema``` button.\n", "To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n", - "</details>" + "
" ] }, { @@ -176,23 +157,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:57:50.958560Z", - "start_time": "2020-05-16T00:57:50.945560Z" - } - }, - "outputs": [], - "source": [ - "# To list configured workspaces run WorkspaceConfig.list_workspaces()\n", - "# WorkspaceConfig.list_workspaces()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T01:07:40.087301Z", - "start_time": "2020-05-16T00:58:30.041315Z" + "end_time": "2020-06-26T23:59:54.546697Z", + "start_time": "2020-06-26T23:59:27.512604Z" }, "tags": [ "todo" @@ -207,19 +173,31 @@ "table_index = qry_prov.schema_tables" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configuration\n", + "\n", + "#### `msticpyconfig.yaml` configuration File\n", + "You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. This is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n", + "\n", + "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Contents](#contents)\n", - "# Office 365 Activity" + "## Office 365 Activity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Log Analytics Queries" + "### Log Analytics Queries and query time window" ] }, { @@ -227,8 +205,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T00:13:44.706841Z", - "start_time": "2020-05-16T00:13:44.691841Z" + "end_time": "2020-06-27T00:32:09.095417Z", + "start_time": "2020-06-27T00:32:09.027664Z" } }, "outputs": [], @@ -237,7 +215,9 @@ " display(Markdown('

Warning. Office Data not available.


'\n", " 'Either Office 365 data has not been imported into the workspace or'\n", " ' the OfficeActivity table is empty.
'\n", - " 'This workbook is not useable with the current workspace.'))" + " 'This workbook is not useable with the current workspace.'))\n", + "else:\n", + " md('Office Activity table has records available for hunting')" ] }, { @@ -245,8 +225,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:49:50.212900Z", - "start_time": "2020-05-16T01:49:50.137902Z" + "end_time": "2020-06-27T00:00:02.384265Z", + "start_time": "2020-06-27T00:00:02.302204Z" } }, "outputs": [], @@ -263,8 +243,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:52:48.738859Z", - "start_time": "2020-05-16T01:52:48.733859Z" + "end_time": "2020-06-27T00:00:07.416773Z", + "start_time": "2020-06-27T00:00:07.406273Z" } }, "outputs": [], @@ -309,8 +289,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:52:58.127604Z", - "start_time": "2020-05-16T01:52:52.998622Z" + "end_time": "2020-06-27T00:00:19.867160Z", + "start_time": "2020-06-27T00:00:17.317726Z" } }, "outputs": [], @@ -342,8 +322,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:17:05.322665Z", - "start_time": "2020-05-16T01:17:05.298668Z" + "end_time": "2020-06-27T00:00:27.023842Z", + "start_time": "2020-06-27T00:00:26.990387Z" } }, "outputs": [], @@ -362,21 +342,21 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:21:12.670387Z", - "start_time": "2020-05-16T01:21:03.701387Z" + "end_time": "2020-06-27T00:02:38.220632Z", + "start_time": "2020-06-27T00:02:38.196469Z" } }, "outputs": [], "source": [ "import math\n", "multi_ip_users = unique_ip_op_ua[unique_ip_op_ua[\"ClientIPCount\"] > 1]\n", - "if len(unique_ip_op_ua) > 0:\n", + "if len(multi_ip_users) > 0:\n", " height = max(math.log10(len(multi_ip_users.UserId.unique())) * 10, 8)\n", " aspect = 10 / height\n", " user_ip_op = sns.catplot(x=\"ClientIPCount\", y=\"UserId\", hue='Operation', 
data=multi_ip_users, height=height, aspect=aspect)\n", " md('Variability of IP Address Usage by user')\n", "else:\n", - " md('No IP Addresses')" + " md('No users with multiple IP addresses')" ] }, { @@ -392,8 +372,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:26:11.150486Z", - "start_time": "2020-05-16T01:26:10.871487Z" + "end_time": "2020-06-27T00:02:48.941771Z", + "start_time": "2020-06-27T00:02:44.876978Z" } }, "outputs": [], @@ -455,8 +435,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:26:26.039785Z", - "start_time": "2020-05-16T01:26:26.025787Z" + "end_time": "2020-06-27T00:02:53.533522Z", + "start_time": "2020-06-27T00:02:53.520578Z" } }, "outputs": [], @@ -478,8 +458,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:27:22.043604Z", - "start_time": "2020-05-16T01:26:39.421506Z" + "end_time": "2020-06-27T00:03:01.285844Z", + "start_time": "2020-06-27T00:02:58.782751Z" } }, "outputs": [], @@ -560,8 +540,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:27:44.018242Z", - "start_time": "2020-05-16T01:27:42.735241Z" + "end_time": "2020-06-27T00:03:05.569063Z", + "start_time": "2020-06-27T00:03:05.057550Z" } }, "outputs": [], @@ -606,8 +586,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:27:55.044908Z", - "start_time": "2020-05-16T01:27:51.212910Z" + "end_time": "2020-06-27T00:03:11.829518Z", + "start_time": "2020-06-27T00:03:09.986019Z" } }, "outputs": [], @@ -634,8 +614,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:35:06.031920Z", - "start_time": "2020-05-16T01:35:05.353920Z" + "end_time": "2020-06-27T00:03:15.304316Z", + "start_time": "2020-06-27T00:03:15.175812Z" } }, "outputs": [], @@ -654,10 +634,13 @@ " 'MailboxLogin']\n", " office_ops = office_ops[office_ops.Operation.isin(limit_op_types)]\n", " \n", - " 
sns.catplot(data=office_ops, y='Account', x='OperationCount', \n", - " hue='Operation', aspect=2)\n", - " display(office_ops.pivot_table('OperationCount', index=['Account'], \n", - " columns='Operation')) #.style.bar(color='orange', align='mid'))" + " if len(office_ops) > 0:\n", + " sns.catplot(data=office_ops, y='Account', x='OperationCount', \n", + " hue='Operation', aspect=2)\n", + " display(office_ops.pivot_table('OperationCount', index=['Account'], \n", + " columns='Operation')) #.style.bar(color='orange', align='mid'))\n", + " else:\n", + " md('no categorical data to plot')" ] }, { @@ -665,8 +648,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:35:21.811305Z", - "start_time": "2020-05-16T01:35:21.567307Z" + "end_time": "2020-06-27T00:05:58.742811Z", + "start_time": "2020-06-27T00:05:58.694824Z" } }, "outputs": [], @@ -695,8 +678,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:35:36.322501Z", - "start_time": "2020-05-16T01:35:36.272495Z" + "end_time": "2020-06-27T00:06:02.119250Z", + "start_time": "2020-06-27T00:06:02.032121Z" } }, "outputs": [], @@ -711,15 +694,15 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:35:42.273892Z", - "start_time": "2020-05-16T01:35:42.112892Z" + "end_time": "2020-06-27T00:06:03.023848Z", + "start_time": "2020-06-27T00:06:02.983335Z" } }, "outputs": [], "source": [ "distinct_users = office_ops_df[['UserId']].sort_values('UserId')['UserId'].str.lower().drop_duplicates().tolist()\n", "distinct_users\n", - "user_select = nbwidgets.SelectString(description='Select User Id', item_list=distinct_users, auto_display=True)\n", + "user_select = nbwidgets.SelectItem(description='Select User Id', item_list=distinct_users, auto_display=True)\n", " # (items=distinct_users)" ] }, @@ -736,8 +719,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:36:57.613215Z", - "start_time": 
"2020-05-16T01:36:28.890742Z" + "end_time": "2020-06-27T00:06:10.366715Z", + "start_time": "2020-06-27T00:06:08.663607Z" } }, "outputs": [], @@ -776,8 +759,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:37:09.274863Z", - "start_time": "2020-05-16T01:37:07.067302Z" + "end_time": "2020-06-27T00:06:15.221357Z", + "start_time": "2020-06-27T00:06:14.703434Z" } }, "outputs": [], @@ -804,8 +787,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:37:24.368273Z", - "start_time": "2020-05-16T01:37:24.066273Z" + "end_time": "2020-06-27T00:06:18.926471Z", + "start_time": "2020-06-27T00:06:18.687117Z" } }, "outputs": [], @@ -832,8 +815,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:39:53.519748Z", - "start_time": "2020-05-16T01:39:53.048747Z" + "end_time": "2020-06-27T00:06:23.188010Z", + "start_time": "2020-06-27T00:06:22.814167Z" } }, "outputs": [], @@ -859,8 +842,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:39:59.097697Z", - "start_time": "2020-05-16T01:39:58.876675Z" + "end_time": "2020-06-27T00:06:27.528551Z", + "start_time": "2020-06-27T00:06:27.470982Z" } }, "outputs": [], @@ -905,8 +888,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:46:23.169433Z", - "start_time": "2020-05-16T01:46:23.150433Z" + "end_time": "2020-06-27T00:06:33.023802Z", + "start_time": "2020-06-27T00:06:33.019298Z" } }, "outputs": [], @@ -925,8 +908,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:44:23.645211Z", - "start_time": "2020-05-16T01:44:20.392588Z" + "end_time": "2020-06-27T00:06:39.582431Z", + "start_time": "2020-06-27T00:06:37.614815Z" } }, "outputs": [], @@ -1022,8 +1005,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "end_time": "2020-05-16T01:47:49.358082Z", - "start_time": "2020-05-16T01:47:14.392043Z" + "end_time": 
"2020-06-27T00:07:07.310221Z", + "start_time": "2020-06-27T00:07:07.124636Z" } }, "outputs": [], @@ -1083,18 +1066,6 @@ "# Appendices" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configuration\n", - "\n", - "### `msticpyconfig.yaml` configuration File\n", - "You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. This is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.\n", - "\n", - "To configure this file see the [ConfigureNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -1188,14 +1159,14 @@ "number_sections": false, "sideBar": true, "skip_h1_title": true, - "title_cell": "Table of Contents2", + "title_cell": "Table of Contents", "title_sidebar": "Contents", - "toc_cell": false, + "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", - "width": "351px" + "width": "230.17px" }, "toc_section_display": true, "toc_window_display": true diff --git a/Guided Investigation - Anomaly Lookup.ipynb b/Guided Investigation - Anomaly Lookup.ipynb index d00b9ff8..22c40a31 100644 --- a/Guided Investigation - Anomaly Lookup.ipynb +++ b/Guided Investigation - Anomaly Lookup.ipynb @@ -154,7 +154,7 @@ "nbconvert_exporter": "python", "name": "python", "pygments_lexer": "ipython3", - "version": "3.6.6", + "version": "3.6.7", "file_extension": ".py", "codemirror_mode": { "version": 3, diff --git a/Guided Investigation - MDATP Webshell Alerts.ipynb b/Guided Investigation - MDATP Webshell Alerts.ipynb index b60a8cda..ffde034e 100644 --- a/Guided Investigation - MDATP Webshell Alerts.ipynb +++ b/Guided Investigation - MDATP Webshell Alerts.ipynb @@ -75,7 +75,7 @@ "from IPython.display import display, HTML, Markdown\n", "\n", "REQ_PYTHON_VER=(3, 6)\n", - 
"REQ_MSTICPY_VER=(0, 5, 0)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", "\n", "display(HTML(\"

Starting Notebook setup...

\"))\n", "if Path(\"./utils/nb_check.py\").is_file():\n", @@ -90,8 +90,11 @@ " importlib.reload(sys.modules[\"msticpy\"])\n", " else:\n", " import msticpy\n", - " check_mp_ver(MSTICPY_REQ_VERSION)\n", + " check_mp_ver(REQ_MSTICPY_VER)\n", " \n", + "\n", + "# If not using Azure Notebooks, install msticpy with\n", + "# !pip install msticpy\n", "from msticpy.nbtools import nbinit\n", "nbinit.init_notebook(\n", " namespace=globals(),\n", @@ -272,9 +275,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "scrolled": false - }, + "metadata": {}, "outputs": [], "source": [ "command_investigation_query = f'''\n", @@ -478,9 +479,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "scrolled": false - }, + "metadata": {}, "outputs": [], "source": [ "attackerip = iis_data['AttackerIP']\n", @@ -721,9 +720,9 @@ ], "metadata": { "kernelspec": { - "display_name": "webshell_investigation", + "display_name": "Python 3.6", "language": "python", - "name": "webshell_investigation" + "name": "python36" }, "language_info": { "codemirror_mode": { @@ -735,7 +734,14 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.2" + "version": "3.6.7" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } } }, "nbformat": 4, diff --git a/Guided Investigation - Process-Alerts.ipynb b/Guided Investigation - Process-Alerts.ipynb index fdcf6362..64ab2c4a 100644 --- a/Guided Investigation - Process-Alerts.ipynb +++ b/Guided Investigation - Process-Alerts.ipynb @@ -5,7 +5,7 @@ "metadata": {}, "source": [ "# Alert Investigation - Windows Process Alerts\n", - "<details>\n", + "
\n", "  Details...\n", "**Notebook Version:** 1.1
\n", "\n", @@ -18,7 +18,7 @@ " - OTX (https://otx.alienvault.com/)\n", " - VirusTotal (https://www.virustotal.com/)\n", " - XForce (https://www.ibm.com/security/xforce)\n", - "</details>\n", + "
\n", "\n", "This notebook is intended for triage and investigation of security alerts related to process execution. It is specifically targeted at alerts triggered by suspicious process activity on Windows hosts. " ] @@ -73,12 +73,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:10:54.830029Z", - "start_time": "2020-05-16T02:10:50.255047Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", @@ -88,7 +83,7 @@ "from IPython.display import display, HTML, Markdown\n", "\n", "REQ_PYTHON_VER=(3, 6)\n", - "REQ_MSTICPY_VER=(0, 5, 0)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", "\n", "display(HTML(\"

Starting Notebook setup...

\"))\n", "if Path(\"./utils/nb_check.py\").is_file():\n", @@ -105,6 +100,9 @@ " import msticpy\n", " check_mp_ver(REQ_MSTICPY_VER)\n", " \n", + "\n", + "# If not using Azure Notebooks, install msticpy with\n", + "# !pip install msticpy\n", "from msticpy.nbtools import nbinit\n", "\n", "nbinit.init_notebook(namespace=globals());" @@ -116,7 +114,7 @@ "source": [ "[Contents](#toc)\n", "### Get WorkspaceId and Authenticate to Log Analytics \n", - "<details>\n", + "
\n", "  Details...\n", "If you are using user/device authentication, run the following cell. \n", "- Click the 'Copy code to clipboard and authenticate' button.\n", @@ -136,7 +134,7 @@ "Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
\n", "On successful authentication you should see a ```popup schema``` button.\n", "To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n", - "</details>" + "
" ] }, { @@ -152,12 +150,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:11:20.062429Z", - "start_time": "2020-05-16T02:10:57.597153Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Authentication\n", @@ -180,12 +173,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:11:21.408605Z", - "start_time": "2020-05-16T02:11:21.347607Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "alert_q_times = nbwidgets.QueryTime(\n", @@ -197,12 +185,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:14:07.827669Z", - "start_time": "2020-05-16T02:14:02.907988Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "alert_list = qry_prov.SecurityAlert.list_alerts(\n", @@ -237,16 +220,11 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:14:19.661858Z", - "start_time": "2020-05-16T02:14:19.614856Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "get_alert = None\n", - "alert_select = nbwidgets.AlertSelector(alerts=alert_list, action=nbdisplay.display_alert)\n", + "alert_select = nbwidgets.SelectAlert(alerts=alert_list, action=nbdisplay.format_alert)\n", "alert_select.display()" ] }, @@ -266,12 +244,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:14:38.285057Z", - "start_time": "2020-05-16T02:14:38.264058Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Extract entities and properties into a SecurityAlert class\n", @@ -295,12 +268,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:14:41.613471Z", - "start_time": "2020-05-16T02:14:41.251019Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Draw the graph using Networkx/Matplotlib\n", @@ -327,12 +295,7 @@ 
{ "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:14:46.099473Z", - "start_time": "2020-05-16T02:14:46.032476Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# set the origin time to the time of our alert\n", @@ -344,12 +307,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:14:50.654284Z", - "start_time": "2020-05-16T02:14:48.146513Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "if not security_alert.primary_host:\n", @@ -407,12 +365,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:14:57.587177Z", - "start_time": "2020-05-16T02:14:57.110178Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Draw a graph of this (add to entity graph)\n", @@ -438,23 +391,18 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:15:03.233694Z", - "start_time": "2020-05-16T02:15:03.182692Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "def disp_full_alert(alert):\n", " global related_alert\n", " related_alert = SecurityAlert(alert)\n", - " nbdisplay.display_alert(related_alert, show_entities=True)\n", + " return nbdisplay.format_alert(related_alert, show_entities=True)\n", "\n", "if related_alerts is not None and not related_alerts.empty:\n", " related_alerts['CompromisedEntity'] = related_alerts['Computer']\n", " print('Selected alert is available as \\'related_alert\\' variable.')\n", - " rel_alert_select = nbwidgets.AlertSelector(alerts=related_alerts, action=disp_full_alert)\n", + " rel_alert_select = nbwidgets.SelectAlert(alerts=related_alerts, action=disp_full_alert)\n", " rel_alert_select.display()\n", "else:\n", " md('No related alerts found.', styles=[\"bold\",\"green\"])" @@ -493,13 +441,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": 
"2020-05-16T02:15:11.683710Z", - "start_time": "2020-05-16T02:15:11.632710Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "# set the origin time to the time of our alert\n", @@ -510,12 +452,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:18:46.782959Z", - "start_time": "2020-05-16T02:18:43.483058Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "if security_alert.data_family.name != \"WindowsSecurity\":\n", @@ -588,13 +525,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:19:16.391952Z", - "start_time": "2020-05-16T02:19:16.269953Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "# Show timeline of events\n", @@ -636,12 +567,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:19:46.286319Z", - "start_time": "2020-05-16T02:19:44.261867Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "from msticpy.sectools.eventcluster import dbcluster_events, add_process_features\n", @@ -700,12 +626,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:19:52.943265Z", - "start_time": "2020-05-16T02:19:51.149746Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Looking at the variability of commandlines and process image paths\n", @@ -727,13 +648,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:20:00.562015Z", - "start_time": "2020-05-16T02:19:57.711493Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "if 'clus_events' in locals() and not clus_events.empty:\n", @@ -748,12 +663,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:20:03.856018Z", - "start_time": "2020-05-16T02:20:03.843018Z" 
- } - }, + "metadata": {}, "outputs": [], "source": [ "# Look at clusters for individual process names\n", @@ -777,12 +687,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:20:06.531933Z", - "start_time": "2020-05-16T02:20:06.352934Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Show timeline of events - clustered events\n", @@ -807,12 +712,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:20:12.962663Z", - "start_time": "2020-05-16T02:20:11.763218Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "process = security_alert.primary_process\n", @@ -848,12 +748,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:20:15.419954Z", - "start_time": "2020-05-16T02:20:15.359955Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "ioc_extractor = IoCExtract()\n", @@ -897,12 +792,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:20:19.183556Z", - "start_time": "2020-05-16T02:20:19.165556Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "if source_processes is not None:\n", @@ -945,12 +835,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:20:32.758293Z", - "start_time": "2020-05-16T02:20:22.825297Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "from msticpy.sectools.tiproviders.ti_provider_base import TISeverity\n", @@ -984,13 +869,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:21:17.223752Z", - "start_time": "2020-05-16T02:21:17.162753Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "# set the origin time to the time of our alert\n", @@ -1003,12 +882,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": 
{ - "ExecuteTime": { - "end_time": "2020-05-16T02:21:22.127025Z", - "start_time": "2020-05-16T02:21:19.300028Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# This query needs a commandline parameter which isn't supplied\n", @@ -1067,13 +941,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:21:24.995564Z", - "start_time": "2020-05-16T02:21:24.938564Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "# set the origin time to the time of our alert\n", @@ -1097,12 +965,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:21:30.636724Z", - "start_time": "2020-05-16T02:21:28.813548Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "logon_id = security_alert.get_logon_id()\n", @@ -1131,12 +994,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:21:33.574131Z", - "start_time": "2020-05-16T02:21:31.528726Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "from msticpy.sectools.eventcluster import dbcluster_events, add_process_features, _string_score\n", @@ -1176,76 +1034,24 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:21:39.050530Z", - "start_time": "2020-05-16T02:21:39.008528Z" - } - }, + "metadata": {}, "outputs": [], "source": [ - "select_logon_type = widgets.Select(options=nbdisplay._WIN_LOGON_TYPE_MAP.values(), layout=widgets.Layout(height=\"200px\"))\n", - "num_items = widgets.IntSlider(min=1, max=200, value=10, description=\"# logons\")\n", - "df_output1 = widgets.Output()\n", - "df_output2 = widgets.Output()\n", - "\n", - "def display_logons(host_logons, order_column, number_shown, output, title, ascending=True):\n", - " pivot_df = (\n", - " host_logons[[\"Account\", \"LogonType\", \"EventID\"]]\n", - " .astype({'LogonType': 'int32'})\n", - " 
.merge(right=pd.Series(data=nbdisplay._WIN_LOGON_TYPE_MAP, name=\"LogonTypeDesc\"),\n", - " left_on=\"LogonType\", right_index=True)\n", - " .drop(columns=\"LogonType\")\n", - " .groupby([\"Account\", \"LogonTypeDesc\"])\n", - " .count()\n", - " .unstack()\n", - " .rename(columns={\"EventID\": \"LogonCount\"})\n", - " .fillna(0)\n", - " )\n", - " \n", - " with output:\n", - " if ('LogonCount', order_column) in pivot_df.columns:\n", - " md(title)\n", - " display(\n", - " pivot_df\n", - " [pivot_df[(\"LogonCount\", order_column)] > 0]\n", - " .sort_values((\"LogonCount\", order_column), ascending=ascending)\n", - " .head(number_shown)\n", - " .style\n", - " .background_gradient(cmap=\"viridis\", low=0.5, high=0)\n", - " .format(\"{0:0>3.0f}\")\n", - " )\n", - " else:\n", - " md(f\"No logons of type {order_column}\")\n", - "\n", - " \n", - "def show_logons(evt):\n", - " del evt\n", - " logon_type = select_logon_type.value\n", - " list_size = num_items.value\n", - " df_output1.clear_output()\n", - " df_output2.clear_output()\n", - " display_logons(\n", - " host_logons,\n", - " order_column=logon_type, \n", - " number_shown=list_size, \n", - " output=df_output1, \n", - " title=\"Most Frequent Logons\",\n", - " ascending=False,\n", - " )\n", - " display_logons(\n", - " host_logons,\n", - " order_column=logon_type, \n", - " number_shown=list_size, \n", - " output=df_output2, \n", - " title=\"Rarest Logons\",\n", - " ascending=True\n", - " )\n", - " \n", - "select_logon_type.observe(show_logons, names=\"value\")\n", - "ctrls = widgets.HBox([select_logon_type, num_items])\n", - "outputs = widgets.HBox([df_output1, df_output2])\n", - "display(widgets.VBox([ctrls, outputs]))" + "display(\n", + " host_logons[[\"Account\", \"LogonType\", \"EventID\"]]\n", + " .astype({'LogonType': 'int32'})\n", + " .merge(right=pd.Series(data=nbdisplay._WIN_LOGON_TYPE_MAP, name=\"LogonTypeDesc\"),\n", + " left_on=\"LogonType\", right_index=True)\n", + " .drop(columns=\"LogonType\")\n", + " 
.groupby([\"Account\", \"LogonTypeDesc\"])\n", + " .count()\n", + " .unstack()\n", + " .rename(columns={\"EventID\": \"LogonCount\"})\n", + " .fillna(0)\n", + " .style\n", + " .background_gradient(cmap=\"viridis\", low=0.5, high=0)\n", + " .format(\"{0:0>3.0f}\")\n", + ")" ] }, { @@ -1262,12 +1068,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:21:52.253650Z", - "start_time": "2020-05-16T02:21:50.510974Z" - }, - "hidden": true, - "scrolled": true + "hidden": true }, "outputs": [], "source": [ @@ -1282,9 +1083,20 @@ " \n", "\n", "if failedLogons is None or failedLogons.empty:\n", - " md(f'No logon failures recorded for this host between {security_alert.StartTimeUtc} and {security_alert.EndTimeUtc}', styles=[\"bold\",\"blue\"])\n", + " md(\n", + " f\"\"\"\n", + " No logon failures recorded for this host between\n", + " {security_alert.StartTimeUtc} and {security_alert.EndTimeUtc}.\n", + " \"\"\",\n", + " styles=[\"bold\",\"blue\"]\n", + " )\n", "else:\n", - " md('Failed Logons observed for the host:')\n", + " md(\n", + " f\"\"\"Failed Logons observed for the host between \n", + " {security_alert.StartTimeUtc} and {security_alert.EndTimeUtc} :\n", + " \"\"\",\n", + " styles=[\"bold\"]\n", + " )\n", " display(failedLogons)" ] }, @@ -1301,12 +1113,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-13T20:45:29.434121Z", - "start_time": "2020-02-13T20:45:29.427150Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "print('List of current DataFrames in Notebook')\n", @@ -1363,11 +1170,11 @@ ], "metadata": { "hide_input": false, + "history": [], "kernel_info": { "name": "python3" }, "kernelspec": { - "display_name": "Python 3.6", "language": "python", "name": "python36" }, @@ -1381,7 +1188,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.10" }, "latex_envs": { 
"LaTeX_envs_menu_present": true, @@ -1422,6 +1229,7 @@ "toc_section_display": true, "toc_window_display": true }, + "uuid": "3e8f75f2-2a40-4170-b5f5-c5cec2a483c0", "varInspector": { "cols": { "lenName": 16, @@ -1457,13 +1265,6 @@ "_Feature" ], "window_display": false - }, - "widgets": { - "application/vnd.jupyter.widget-state+json": { - "state": {}, - "version_major": 2, - "version_minor": 0 - } } }, "nbformat": 4, diff --git a/Guided Triage - Alerts.ipynb b/Guided Triage - Alerts.ipynb index d7934ba0..e2e84870 100644 --- a/Guided Triage - Alerts.ipynb +++ b/Guided Triage - Alerts.ipynb @@ -7,16 +7,16 @@ }, "source": [ "

Table of Contents

\n", - "" + "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Example Alert Triage Notebook\n", + "# Alert Triage Notebook\n", "\n", - "**Notebook Version:** 1.0
\n", + "**Notebook Version:** 1.1
\n", "**Python Version:** Python 3.6 (including Python 3.6 - AzureML)
\n", "**Data Sources Required:** SecurityAlerts
\n", "\n", @@ -32,9 +32,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Notebook Setup\n", + "---\n", + "### Notebook initialization\n", + "The next cell:\n", + "- Checks for the correct Python version\n", + "- Checks versions and optionally installs required packages\n", + "- Imports the required packages into the notebook\n", + "- Sets a number of configuration options.\n", "\n", - "If this is your first time running this Notebook please run the cell below before proceeding to ensure you have the required packages installed correctly. " + "This should complete without errors. If you encounter errors or warnings look at the following two notebooks:\n", + "- [TroubleShootingNotebooks](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/TroubleShootingNotebooks.ipynb)\n", + "- [ConfiguringNotebookEnvironment](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)\n", + "\n", + "If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:\n", + "- [Run TroubleShootingNotebooks](./TroubleShootingNotebooks.ipynb)\n", + "- [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)\n", + "\n", + "You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. 
\n", + "There are more details about this in the `ConfiguringNotebookEnvironment` notebook and in these documents:\n", + "- [msticpy configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)\n", + "- [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file)" ] }, { @@ -48,27 +65,54 @@ }, "outputs": [], "source": [ - "#Check that the Notebook kernel is Pytyhon 3.6\n", + "from pathlib import Path\n", + "import os\n", "import sys\n", - "MIN_REQ_PYTHON = (3,6)\n", - "if sys.version_info < MIN_REQ_PYTHON:\n", - " print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')\n", - " print('or later is selected as the active kernel.')\n", - " sys.exit(\"Python %s.%s or later is required.\\n\" % MIN_REQ_PYTHON)\n", + "import warnings\n", + "from IPython.display import display, HTML, Markdown\n", + "\n", + "REQ_PYTHON_VER=(3, 6)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", + "\n", + "display(HTML(\"

Starting Notebook setup...

\"))\n", + "if Path(\"./utils/nb_check.py\").is_file():\n", + " from utils.nb_check import check_python_ver, check_mp_ver\n", + " check_python_ver(min_py_ver=REQ_PYTHON_VER)\n", + " try:\n", + " check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\n", + " except ImportError:\n", + " !pip install --upgrade msticpy\n", + " import importlib\n", + " if \"msticpy\" in sys.modules:\n", + " importlib.reload(sys.modules[\"msticpy\"])\n", + " else:\n", + " import msticpy\n", + " check_mp_ver(REQ_MSTICPY_VER)\n", + " \n", + "\n", + "# If not using Azure Notebooks, install msticpy with\n", + "# !pip install msticpy\n", + "from msticpy.nbtools import nbinit\n", + "extra_imports = [\n", + " \"whois\", \n", + " \"datetime,,dt\",\n", + " \"msticpy.nbtools.foliummap, get_center_ip_entities\",\n", + " \"msticpy.nbtools.observationlist, Observations\",\n", + " \"msticpy.nbtools.observationlist, Observation\",\n", + " \"msticpy.sectools.ip_utils, convert_to_ip_entities\"\n", + "]\n", + "nbinit.init_notebook(\n", + " namespace=globals(),\n", + " additional_packages=[\"IPWhois\", \"tldextract\", \"python-whois\"],\n", + " extra_imports=extra_imports,\n", + ")\n", "\n", - "# Install required packages for this Notebook\n", - "!pip install msticpy --upgrade --user\n", - "!pip install python-whois --user --upgrade\n", - "!pip install tqdm --upgrade --user\n", - "!pip install IPWhois --upgrade --user\n", - "!pip install tldextract --upgrade --user" + "pd.options.mode.chained_assignment = None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Import the required packages and initialize a set of required entities and properties:" + "Initialize TI and Observation list" ] }, { @@ -83,33 +127,10 @@ }, "outputs": [], "source": [ - "#Import required packages\n", - "print('Importing python packages....')\n", - "import whois\n", - "import numpy as np\n", - "import datetime as dt\n", - "import ipywidgets as widgets\n", - "import pandas as pd\n", - "print('Importing msticpy packages...')\n", - "from msticpy.sectools import 
*\n", - "from msticpy.nbtools import *\n", - "pd.set_option('display.max_rows', 100)\n", - "pd.set_option('display.max_columns', 50)\n", - "pd.set_option('display.max_colwidth', 100)\n", - "%env KQLMAGIC_LOAD_MODE=silent\n", - "WIDGET_DEFAULTS = {'layout': widgets.Layout(width=\"900px\"),\n", - " 'style': {'description_width': 'initial'}}\n", - "from msticpy.nbtools.foliummap import FoliumMap, get_center_ip_entities\n", - "from msticpy.nbtools.observationlist import Observations, Observation\n", + "# Initialize observations and TI modules\n", "summary = Observations()\n", - "from msticpy.data.data_providers import QueryProvider\n", "ti = TILookup()\n", - "from msticpy.nbtools.utility import md, md_warn\n", - "pd.options.mode.chained_assignment = None\n", - "from msticpy.nbtools.wsconfig import WorkspaceConfig\n", - "from msticpy.sectools.ip_utils import convert_to_ip_entities\n", - "\n", - "print('Imports complete')" + "print('Observation summary and TILookup loaded.')" ] }, { @@ -425,9 +446,10 @@ "#Display full alert details when selected\n", "def show_full_alert(selected_alert):\n", " global security_alert, alert_ip_entities\n", + " output = []\n", " security_alert = SecurityAlert(\n", " rel_alert_select.selected_alert)\n", - " nbdisplay.display_alert(security_alert, show_entities=True) \n", + " output.append(nbdisplay.format_alert(security_alert, show_entities=True))\n", " ioc_list = []\n", " if security_alert['Entities'] is not None:\n", " for entity in security_alert['Entities']:\n", @@ -437,7 +459,7 @@ " ioc_list.append(entity['Url'])\n", " if len(ioc_list) > 0:\n", " ti_data = ti.lookup_iocs(data=ioc_list, providers=ti_prov_use)\n", - " display(ti_data[['Ioc','IocType','Provider','Result','Severity','Details']].reset_index().style.applymap(color_cells).hide_index())\n", + " output.append(ti_data[['Ioc','IocType','Provider','Result','Severity','Details']].reset_index().style.applymap(color_cells).hide_index())\n", " ti_ips = ti_data[ti_data['IocType'] == 
'ipv4']\n", " # If we have IP entities try and plot these on a map\n", " if not ti_ips.empty:\n", @@ -445,18 +467,21 @@ " center = get_center_ip_entities(ip_ents)\n", " ip_map = FoliumMap(location=center, zoom_start=4)\n", " ip_map.add_ip_cluster(ip_ents, color='red')\n", - " display(ip_map)\n", + " output.append(ip_map)\n", + " else:\n", + " output.append(\"\")\n", " else:\n", - " md(\"No IoCs\")\n", + " output.append(\"No IoCs\")\n", " else:\n", - " md(\"No IoCs\")\n", + " output.append(\"No Entities with IoCs\")\n", + " return output\n", " \n", "# Show selected alert when selected\n", "if isinstance(alerts, pd.DataFrame) and not alerts.empty:\n", " ti_data = None\n", " md('Click on alert to view details.', \"large\")\n", - " rel_alert_select = nbwidgets.AlertSelector(alerts=selected_alert_type,\n", - " action=show_full_alert)\n", + " rel_alert_select = nbwidgets.SelectAlert(alerts=selected_alert_type,\n", + " action=show_full_alert)\n", " rel_alert_select.display()\n", " # Add alert details to summary.\n", " if ti_data is not None:\n", @@ -609,7 +634,7 @@ ], "metadata": { "hide_input": false, -"kernelspec": { + "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" diff --git a/Notebook Template.ipynb b/Notebook Template.ipynb index d7eb9da1..4eb9bd7a 100644 --- a/Notebook Template.ipynb +++ b/Notebook Template.ipynb @@ -63,12 +63,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:00:38.505687Z", - "start_time": "2020-05-16T02:00:31.727307Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", @@ -78,7 +73,7 @@ "from IPython.display import display, HTML, Markdown\n", "\n", "REQ_PYTHON_VER=(3, 6)\n", - "REQ_MSTICPY_VER=(0, 5, 0)\n", + "REQ_MSTICPY_VER=(0, 6, 0)\n", "\n", "display(HTML(\"

Starting Notebook setup...

\"))\n", "if Path(\"./utils/nb_check.py\").is_file():\n", @@ -95,6 +90,9 @@ " import msticpy\n", " check_mp_ver(REQ_MSTICPY_VER)\n", " \n", + "\n", + "# If not using Azure Notebooks, install msticpy with\n", + "# !pip install msticpy\n", "from msticpy.nbtools import nbinit\n", "nbinit.init_notebook(\n", " namespace=globals(),\n", @@ -108,7 +106,7 @@ "source": [ "[Contents](#toc)\n", "### Get WorkspaceId and Authenticate to Log Analytics \n", - "<details>\n", + "
\n", " Details...\n", "If you are using user/device authentication, run the following cell. \n", "- Click the 'Copy code to clipboard and authenticate' button.\n", @@ -128,7 +126,7 @@ "Note: you may occasionally see a JavaScript error displayed at the end of the authentication - you can safely ignore this.
\n", "On successful authentication you should see a ```popup schema``` button.\n", "To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.\n", - "</details>" + "
" ] }, { @@ -148,12 +146,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:01:11.022700Z", - "start_time": "2020-05-16T02:00:49.394760Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "# Authentication\n", @@ -166,12 +159,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:03:12.112983Z", - "start_time": "2020-05-16T02:03:12.055984Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "query_scope = nbwidgets.QueryTime(auto_display=True)" @@ -187,12 +175,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-05-16T02:03:25.227614Z", - "start_time": "2020-05-16T02:03:21.291120Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "qry_prov.SecurityAlert.list_alerts(query_scope)" @@ -277,13 +260,6 @@ "_Feature" ], "window_display": false - }, - "widgets": { - "application/vnd.jupyter.widget-state+json": { - "state": {}, - "version_major": 2, - "version_minor": 0 - } } }, "nbformat": 4, diff --git a/Sample-Notebooks/Example - Guided Hunting - Office365-Exploring.ipynb b/Sample-Notebooks/Example - Guided Hunting - Office365-Exploring.ipynb index 07b38b3f..2fae3078 100644 --- a/Sample-Notebooks/Example - Guided Hunting - Office365-Exploring.ipynb +++ b/Sample-Notebooks/Example - Guided Hunting - Office365-Exploring.ipynb @@ -7721,7 +7721,7 @@ "source": [ "distinct_users = office_ops_df[['UserId']].sort_values('UserId')['UserId'].str.lower().drop_duplicates().tolist()\n", "distinct_users\n", - "user_select = mas.SelectString(description='Select User Id', item_list=distinct_users, auto_display=True)\n", + "user_select = mas.SelectItem(description='Select User Id', item_list=distinct_users, auto_display=True)\n", " # (items=distinct_users)" ] }, diff --git a/Sample-Notebooks/Example - Guided Investigation - Process-Alerts.ipynb b/Sample-Notebooks/Example - Guided Investigation - 
Process-Alerts.ipynb index 8e94e5f3..23b1ea93 100644 --- a/Sample-Notebooks/Example - Guided Investigation - Process-Alerts.ipynb +++ b/Sample-Notebooks/Example - Guided Investigation - Process-Alerts.ipynb @@ -1074,7 +1074,7 @@ } ], "source": [ - "alert_select = mas.AlertSelector(alerts=alert_list, action=nbdisp.display_alert)\n", + "alert_select = mas.SelectAlert(alerts=alert_list, action=nbdisp.display_alert)\n", "alert_select.display()" ] }, @@ -1989,7 +1989,7 @@ "if related_alerts is not None and not related_alerts.empty:\n", " related_alerts['CompromisedEntity'] = related_alerts['Computer']\n", " print('Selected alert is available as \\'related_alert\\' variable.')\n", - " rel_alert_select = mas.AlertSelector(alerts=related_alerts, action=disp_full_alert)\n", + " rel_alert_select = mas.SelectAlert(alerts=related_alerts, action=disp_full_alert)\n", " rel_alert_select.display()\n", "else:\n", " display(Markdown('No related alerts found.'))" diff --git a/Sample-Notebooks/Example - Step-by-Step Linux-Windows-Office Investigation.ipynb b/Sample-Notebooks/Example - Step-by-Step Linux-Windows-Office Investigation.ipynb index abc0f263..ae1bfe0c 100644 --- a/Sample-Notebooks/Example - Step-by-Step Linux-Windows-Office Investigation.ipynb +++ b/Sample-Notebooks/Example - Step-by-Step Linux-Windows-Office Investigation.ipynb @@ -1244,7 +1244,7 @@ " global security_alert\n", " security_alert = nbtools.SecurityAlert(alert_select.selected_alert)\n", " nbtools.disp.display_alert(security_alert, show_entities=True)\n", - "alert_select = nbtools.AlertSelector(alerts=alert_list, action=show_full_alert)\n", + "alert_select = nbtools.SelectAlert(alerts=alert_list, action=show_full_alert)\n", "alert_select.display()" ] }, @@ -8635,7 +8635,7 @@ " return host_logons.query('TargetUserName == @acct and LogonType == @logon_type'\n", " ' and TargetLogonId == @logon_id')\n", " \n", - "logon_wgt = nbtools.SelectString(description='Select logon cluster to examine', \n", + "logon_wgt = 
nbtools.SelectItem(description='Select logon cluster to examine', \n", " item_list=items, height='200px', width='100%', auto_display=True)" ] }, @@ -11874,4 +11874,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file diff --git a/SigmaRuleImporter.ipynb b/SigmaRuleImporter.ipynb index 3e0ac4bd..2864c332 100644 --- a/SigmaRuleImporter.ipynb +++ b/SigmaRuleImporter.ipynb @@ -1,589 +1,644 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Import and convert Neo23x0 Sigma scripts\n", - "ianhelle@microsoft.com\n", - "\n", - "This notebook is a is a quick and dirty Sigma to Log Analytics converter.\n", - "It uses the modules from sigmac package to do the conversion.\n", - "\n", - "Only a subset of the Sigma rules are convertible currently. Failure to convert\n", - "could be for one or more of these reasons:\n", - "- known limitations of the converter\n", - "- mismatch between the syntax expressible in Sigma and KQL\n", - "- data sources referenced in Sigma rules do not yet exist in Azure Sentinel\n", - "\n", - "The sigmac tool is downloadable as a package from PyPi but since we are downloading\n", - "the rules from the repo, we also copy and import the package from the repo source.\n", - "\n", - "After conversion you can use an interactive browser to step through the rules and\n", - "view (and copy/save) the KQL equivalents. 
You can also take the conversion results and \n", - "use them in another way (e.g.bulk save to files).\n", - "\n", - "The notebook is all somewhat experimental and offered as-is without any guarantees" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download and unzip the Sigma repo" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "# Download the repo ZIP\n", - "sigma_git_url = 'https://github.com/Neo23x0/sigma/archive/master.zip'\n", - "r = requests.get(sigma_git_url)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from ipywidgets import widgets, Layout\n", - "import os\n", - "from pathlib import Path\n", - "def_path = Path.joinpath(Path(os.getcwd()), \"sigma\")\n", - "path_wgt = widgets.Text(value=str(def_path), \n", - " description='Path to extract to zipped repo files: ', \n", - " layout=Layout(width='50%'),\n", - " style={'description_width': 'initial'})\n", - "path_wgt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import zipfile\n", - "import io\n", - "repo_zip = io.BytesIO(r.content)\n", - "\n", - "zip_archive = zipfile.ZipFile(repo_zip, mode='r')\n", - "zip_archive.extractall(path=path_wgt.value)\n", - "RULES_REL_PATH = 'sigma-master/rules'\n", - "rules_root = Path(path_wgt.value) / RULES_REL_PATH" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Check that we have the files\n", - "You should see a folder with folders such as application, apt, windows..." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "%ls {rules_root}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Convert Sigma Files to Log Analytics Kql queries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": false - }, - "outputs": [], - "source": [ - "# Read the Sigma YAML file paths into a dict and make a\n", - "# a copy for the target Kql queries\n", - "from pathlib import Path\n", - "from collections import defaultdict\n", - "import copy\n", - "\n", - "def get_rule_files(rules_root):\n", - " file_dict = defaultdict(dict)\n", - " for file in Path(rules_root).resolve().rglob(\"*.yml\"):\n", - " rel_path = Path(file).relative_to(rules_root)\n", - " path_key = '.'.join(rel_path.parent.parts)\n", - " file_dict[path_key][rel_path.name] = file\n", - " return file_dict\n", - " \n", - "sigma_dict = get_rule_files(rules_root)\n", - "kql_dict = copy.deepcopy(sigma_dict)\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Add downloaded sigmac tool to sys.path and import Sigmac functions\n", - "import os\n", - "import sys\n", - "module_path = os.path.abspath(os.path.join('sigma/sigma-master/tools'))\n", - "if module_path not in sys.path:\n", - " sys.path.append(module_path)\n", - "from sigma.parser.collection import SigmaCollectionParser\n", - "from sigma.parser.exceptions import SigmaCollectionParseError, SigmaParseError\n", - "from sigma.configuration import SigmaConfiguration, SigmaConfigurationChain\n", - "from sigma.config.exceptions import SigmaConfigParseError, SigmaRuleFilterParseException\n", - "from sigma.filter import SigmaRuleFilter\n", - "import sigma.backends.discovery as backends\n", - "from sigma.backends.base import BackendOptions\n", - "from sigma.backends.exceptions import BackendError, NotSupportedError, 
PartialMatchError, FullMatchError" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": false - }, - "outputs": [], - "source": [ - "# Sigma to Log Analytics Conversion\n", - "import yaml\n", - "_LA_MAPPINGS = '''\n", - "fieldmappings:\n", - " Image: NewProcessName\n", - " ParentImage: ParentProcessName\n", - " ParentCommandLine: NO_MAPPING\n", - "'''\n", - "\n", - "NOT_CONVERTIBLE = 'Not convertible'\n", - "\n", - "def sigma_to_la(file_path):\n", - " with open(file_path, 'r') as input_file:\n", - " try:\n", - " sigmaconfigs = SigmaConfigurationChain()\n", - " sigmaconfig = SigmaConfiguration(_LA_MAPPINGS)\n", - " sigmaconfigs.append(sigmaconfig)\n", - " backend_options = BackendOptions(None, None)\n", - " backend = backends.getBackend('ala')(sigmaconfigs, backend_options)\n", - " parser = SigmaCollectionParser(input_file, sigmaconfigs, None)\n", - " results = parser.generate(backend)\n", - " kql_result = ''\n", - " for result in results:\n", - " kql_result += result\n", - " except (NotImplementedError, NotSupportedError):\n", - " kql_result = NOT_CONVERTIBLE\n", - " input_file.seek(0,0)\n", - " sigma_txt = input_file.read()\n", - " if not kql_result == NOT_CONVERTIBLE:\n", - " try:\n", - " kql_header = \"\\n\".join(get_sigma_properties(sigma_txt))\n", - " kql_result = kql_header + \"\\n\" + kql_result\n", - " except Exception as e:\n", - " print(\"exception reading sigma YAML: \", e)\n", - " print(sigma_txt, kql_result, sep='\\n')\n", - " return sigma_txt, kql_result\n", - "\n", - "sigma_keys = ['title', 'description', 'tags', 'status', \n", - " 'author', 'logsource', 'falsepositives', 'level']\n", - "\n", - "def get_sigma_properties(sigma_rule):\n", - " sigma_docs = yaml.load_all(sigma_rule, Loader=yaml.SafeLoader)\n", - " sigma_rule_dict = next(sigma_docs)\n", - " for prop in sigma_keys:\n", - " yield get_property(prop, sigma_rule_dict)\n", - "\n", - "def get_property(name, sigma_rule_dict):\n", - " sig_prop = 
sigma_rule_dict.get(name, 'na')\n", - " if isinstance(sig_prop, dict):\n", - " sig_prop = ' '.join([f\"{k}: {v}\" for k, v in sig_prop.items()])\n", - " return f\"// {name}: {sig_prop}\"\n", - " \n", - " \n", - "_KQL_FILTERS = {\n", - " 'date': ' | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end}) ',\n", - " 'host': ' | where Computer has {host_name} '\n", - "}\n", - "\n", - "def insert_at(source, insert, find_sub):\n", - " pos = source.find(find_sub)\n", - " if pos != -1:\n", - " return source[:pos] + insert + source[pos:]\n", - " else:\n", - " return source + insert\n", - " \n", - "def add_filter_clauses(source, **kwargs):\n", - " if \"{\" in source or \"}\" in source:\n", - " source = (\"// Warning: embedded braces in source. Please edit if necessary.\\n\"\n", - " + source)\n", - " source = source.replace('{', '{{').replace('}', '}}')\n", - " if kwargs.get('host', False):\n", - " source = insert_at(source, _KQL_FILTERS['host'], '|')\n", - " if kwargs.get('date', False):\n", - " source = insert_at(source, _KQL_FILTERS['date'], '|')\n", - " return source\n", - "\n", - "\n", - "# Run the conversion\n", - "conv_counter = {}\n", - "for categ, sources in sigma_dict.items():\n", - " src_converted = 0\n", - " for file_name, file_path in sources.items():\n", - " sigma, kql = sigma_to_la(file_path)\n", - " kql_dict[categ][file_name] = (sigma, kql)\n", - " if not kql == NOT_CONVERTIBLE:\n", - " src_converted += 1\n", - " conv_counter[categ] = (len(sources), src_converted)\n", - " \n", - "print(\"Conversion statistics\")\n", - "print(\"-\" * len(\"Conversion statistics\"))\n", - "print('\\n'.join([f'{categ}: rules: {counter[0]}, converted: {counter[1]}'\n", - " for categ, counter in conv_counter.items()]))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Display the results in an interactive browser" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": false - }, - "outputs": 
[], - "source": [ - "from ipywidgets import widgets, Layout\n", - "\n", - "# Browser Functions\n", - "def on_cat_value_change(change):\n", - " queries_w.options = kql_dict[change['new']].keys()\n", - " queries_w.value = queries_w.options[0]\n", - "\n", - "def on_query_value_change(change):\n", - " if view_qry_check.value:\n", - " qry_text = kql_dict[sub_cats_w.value][queries_w.value][1]\n", - " if \"Not convertible\" not in qry_text:\n", - " qry_text = add_filter_clauses(qry_text,\n", - " date=add_date_filter_check.value,\n", - " host=add_host_filter_check.value)\n", - " query_text_w.value = qry_text.replace('|', '\\n|')\n", - " orig_text_w.value = kql_dict[sub_cats_w.value][queries_w.value][0]\n", - "\n", - "def on_view_query_value_change(change):\n", - " vis = 'visible' if view_qry_check.value else 'hidden'\n", - " on_query_value_change(None)\n", - " query_text_w.layout.visibility = vis\n", - " orig_text_w.layout.visibility = vis\n", - "\n", - "# Function defs for ExecuteQuery cell below\n", - "def click_exec_hqry(b):\n", - " global qry_results\n", - " query_name = queries_w.value\n", - " query_cat = sub_cats_w.value\n", - " query_text = query_text_w.value\n", - " query_text = query_text.format(**qry_wgt.query_params)\n", - "\n", - " disp_results(query_text)\n", - " \n", - "def disp_results(query_text):\n", - " out_wgt.clear_output()\n", - " with out_wgt:\n", - " print(\"Running query...\", end=' ')\n", - " qry_results = execute_kql_query(query_text)\n", - " print(f'done. 
{len(qry_results)} rows returned.')\n", - " display(qry_results)\n", - " \n", - "exec_hqry_button = widgets.Button(description=\"Execute query..\")\n", - "out_wgt = widgets.Output() #layout=Layout(width='100%', height='200px', visiblity='visible'))\n", - "exec_hqry_button.on_click(click_exec_hqry)\n", - "\n", - "# Browser widget setup\n", - "categories = list(sorted(kql_dict.keys()))\n", - "sub_cats_w = widgets.Select(options=categories, \n", - " description='Category : ',\n", - " layout=Layout(width='30%', height='120px'),\n", - " style = {'description_width': 'initial'})\n", - "\n", - "queries_w = widgets.Select(options = kql_dict[categories[0]].keys(),\n", - " description='Query : ',\n", - " layout=Layout(width='30%', height='120px'),\n", - " style = {'description_width': 'initial'})\n", - "\n", - "query_text_w = widgets.Textarea(\n", - " value='',\n", - " description='Kql Query:',\n", - " layout=Layout(width='100%', height='300px', visiblity='hidden'),\n", - " disabled=False)\n", - "orig_text_w = widgets.Textarea(\n", - " value='',\n", - " description='Sigma Query:',\n", - " layout=Layout(width='100%', height='250px', visiblity='hidden'),\n", - " disabled=False)\n", - "\n", - "query_text_w.layout.visibility = 'hidden'\n", - "orig_text_w.layout.visibility = 'hidden'\n", - "sub_cats_w.observe(on_cat_value_change, names='value')\n", - "queries_w.observe(on_query_value_change, names='value')\n", - "\n", - "view_qry_check = widgets.Checkbox(description=\"View query\", value=True)\n", - "add_date_filter_check = widgets.Checkbox(description=\"Add date filter\", value=False)\n", - "add_host_filter_check = widgets.Checkbox(description=\"Add host filter\", value=False)\n", - "\n", - "view_qry_check.observe(on_view_query_value_change, names='value')\n", - "add_date_filter_check.observe(on_view_query_value_change, names='value')\n", - "add_host_filter_check.observe(on_view_query_value_change, names='value')\n", - "# view_qry_button.on_click(click_exec_hqry)\n", - "# 
display(exec_hqry_button);\n", - "\n", - "vbox_opts = widgets.VBox([view_qry_check, add_date_filter_check, add_host_filter_check])\n", - "hbox = widgets.HBox([sub_cats_w, queries_w, vbox_opts])\n", - "vbox = widgets.VBox([hbox, orig_text_w, query_text_w])\n", - "on_view_query_value_change(None)\n", - "display(vbox)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Click the `Execute query` button to run the currently display query\n", - "**Notes:**\n", - "- To run the queries, first authenticate to Log Analytics (scroll down and execute remaining cells in the notebook)\n", - "- If you added a date filter to the query set the date range below" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "from msticpy.nbtools.nbwidgets import QueryTime\n", - "qry_wgt = QueryTime(units='days', before=5, after=0, max_before=30, max_after=10)\n", - "vbox = widgets.VBox([exec_hqry_button, out_wgt])\n", - "display(vbox)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Set Query Time bounds" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "qry_wgt.display()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Authenticate to Azure Sentinel" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def clean_kql_comments(query_string):\n", - " \"\"\"Cleans\"\"\"\n", - " import re\n", - " return re.sub(r'(//[^\\n]+)', '', query_string, re.MULTILINE).replace('\\n', '').strip()\n", - "\n", - "def execute_kql_query(query_string):\n", - " if not query_string or len(query_string.strip()) == 0:\n", - " print('No query supplied')\n", - " return None\n", - " src_query = clean_kql_comments(query_string)\n", - " result = get_ipython().run_cell_magic('kql', line='', 
cell=src_query)\n", - " \n", - " if result is not None and result.completion_query_info['StatusCode'] == 0:\n", - " results_frame = result.to_dataframe()\n", - " return results_frame\n", - " return []" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "from msticpy.nbtools.wsconfig import WorkspaceConfig\n", - "from msticpy.nbtools import kql, GetEnvironmentKey\n", - "\n", - "ws_config_file = 'config.json'\n", - "try:\n", - " ws_config = WorkspaceConfig(ws_config_file)\n", - " print('Found config file')\n", - " for cf_item in ['tenant_id', 'subscription_id', 'resource_group', 'workspace_id', 'workspace_name']:\n", - " print(cf_item, ws_config[cf_item])\n", - "except:\n", - " ws_config = None\n", - "\n", - "ws_id = GetEnvironmentKey(env_var='WORKSPACE_ID',\n", - " prompt='Log Analytics Workspace Id:')\n", - "if ws_config:\n", - " ws_id.value = ws_config['workspace_id']\n", - "ws_id.display()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "try:\n", - " WORKSPACE_ID = select_ws.value\n", - "except NameError:\n", - " try:\n", - " WORKSPACE_ID = ws_id.value\n", - " except NameError:\n", - " WORKSPACE_ID = None\n", - " \n", - "if not WORKSPACE_ID:\n", - " raise ValueError('No workspace selected.')\n", - "\n", - "kql.load_kql_magic()\n", - "\n", - "%kql loganalytics://code().workspace(WORKSPACE_ID)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Save All Converted Files" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "path_save_wgt = widgets.Text(value=str(def_path) + \"_kql_out\",\n", - " description='Path to save KQL files: ',\n", - " layout=Layout(width='50%'),\n", - " style={'description_width': 'initial'})\n", - "path_save_wgt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], 
- "source": [ - "root = Path(path_save_wgt.value)\n", - "root.mkdir(exist_ok=True)\n", - "for categ, kql_files in kql_dict.items():\n", - " sub_dir = root.joinpath(categ)\n", - " \n", - " for file_name, contents in kql_files.items():\n", - " kql_txt = contents[1]\n", - " if not kql_txt == NOT_CONVERTIBLE:\n", - " sub_dir.mkdir(exist_ok=True)\n", - " file_path = sub_dir.joinpath(file_name.replace('.yml', '.kql'))\n", - " with open(file_path, 'w') as output_file:\n", - " output_file.write(kql_txt)\n", - " print(f\"Saved {file_path}\")\n" - ] - } - ], - "metadata": { - "hide_input": false, - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.5" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": false, - "sideBar": true, - "skip_h1_title": false, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": false - }, - "varInspector": { - "cols": { - "lenName": 16, - "lenType": 16, - "lenVar": 40 - }, - "kernels_config": { - "python": { - "delete_cmd_postfix": "", - "delete_cmd_prefix": "del ", - "library": "var_list.py", - "varRefreshCmd": "print(var_dic_list())" + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Import and convert Neo23x0 Sigma scripts\n", + "ianhelle@microsoft.com\n", + "\n", + "This notebook is a is a quick and dirty Sigma to Log Analytics converter.\n", + "It uses the modules from sigmac package to do the conversion.\n", + "\n", + "Only a subset of the Sigma rules are convertible currently. 
Failure to convert\n", + "could be for one or more of these reasons:\n", + "- known limitations of the converter\n", + "- mismatch between the syntax expressible in Sigma and KQL\n", + "- data sources referenced in Sigma rules do not yet exist in Azure Sentinel\n", + "\n", + "The sigmac tool is downloadable as a package from PyPi but since we are downloading\n", + "the rules from the repo, we also copy and import the package from the repo source.\n", + "\n", + "After conversion you can use an interactive browser to step through the rules and\n", + "view (and copy/save) the KQL equivalents. You can also take the conversion results and \n", + "use them in another way (e.g.bulk save to files).\n", + "\n", + "The notebook is all somewhat experimental and offered as-is without any guarantees" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "## Download and unzip the Sigma repo" + ], + "metadata": {} }, - "r": { - "delete_cmd_postfix": ") ", - "delete_cmd_prefix": "rm(", - "library": "var_list.r", - "varRefreshCmd": "cat(var_dic_list()) " + { + "cell_type": "code", + "source": [ + "from pathlib import Path\r\n", + "import sys\r\n", + "from IPython.display import display, HTML, Markdown\r\n", + "\r\n", + "REQ_PYTHON_VER=(3, 6)\r\n", + "REQ_MSTICPY_VER=(0, 6, 0)\r\n", + "\r\n", + "display(HTML(\"

Starting Notebook setup...

\"))\r\n", + "if Path(\"./utils/nb_check.py\").is_file():\r\n", + " from utils.nb_check import check_python_ver, check_mp_ver\r\n", + "\r\n", + " check_python_ver(min_py_ver=REQ_PYTHON_VER)\r\n", + " try:\r\n", + " check_mp_ver(min_msticpy_ver=REQ_MSTICPY_VER)\r\n", + " except ImportError:\r\n", + " !pip install --upgrade msticpy\r\n", + " if \"msticpy\" in sys.modules:\r\n", + " importlib.reload(sys.modules[\"msticpy\"])\r\n", + " else:\r\n", + " import msticpy\r\n", + " check_mp_ver(REQ_MSTICPY_VER)\r\n", + "\r\n", + "# If not using Azure Notebooks, install msticpy with\r\n", + "# !pip install msticpy\r\n", + "\r\n", + "from msticpy.nbtools import nbinit\r\n", + "nbinit.init_notebook(namespace=globals());" + ], + "outputs": [], + "execution_count": null, + "metadata": { + "collapsed": true, + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + } + }, + { + "cell_type": "code", + "source": [ + "import requests\n", + "# Download the repo ZIP\n", + "sigma_git_url = 'https://github.com/Neo23x0/sigma/archive/master.zip'\n", + "r = requests.get(sigma_git_url)" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "from ipywidgets import widgets, Layout\n", + "import os\n", + "from pathlib import Path\n", + "def_path = Path.joinpath(Path(os.getcwd()), \"sigma\")\n", + "path_wgt = widgets.Text(value=str(def_path), \n", + " description='Path to extract to zipped repo files: ', \n", + " layout=Layout(width='50%'),\n", + " style={'description_width': 'initial'})\n", + "path_wgt" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "RULES_REL_PATH = 'sigma-master/rules'\r\n", + "rules_root = Path(path_wgt.value) / RULES_REL_PATH" + ], + "outputs": [], + "execution_count": null, + "metadata": { + "collapsed": true, + "jupyter": { + "source_hidden": false, + "outputs_hidden": 
false + }, + "nteract": { + "transient": { + "deleting": false + } + } + } + }, + { + "cell_type": "code", + "source": [ + "import zipfile\n", + "import io\n", + "repo_zip = io.BytesIO(r.content)\n", + "\n", + "zip_archive = zipfile.ZipFile(repo_zip, mode='r')\n", + "zip_archive.extractall(path=path_wgt.value)\n" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "### Check that we have the files\n", + "You should see a folder with folders such as application, apt, windows..." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "%ls {rules_root}" + ], + "outputs": [], + "execution_count": null, + "metadata": { + "scrolled": true + } + }, + { + "cell_type": "markdown", + "source": [ + "## Convert Sigma Files to Log Analytics Kql queries" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "# Read the Sigma YAML file paths into a dict and make a\n", + "# a copy for the target Kql queries\n", + "from pathlib import Path\n", + "from collections import defaultdict\n", + "import copy\n", + "\n", + "def get_rule_files(rules_root):\n", + " file_dict = defaultdict(dict)\n", + " for file in Path(rules_root).resolve().rglob(\"*.yml\"):\n", + " rel_path = Path(file).relative_to(rules_root)\n", + " path_key = '.'.join(rel_path.parent.parts)\n", + " file_dict[path_key][rel_path.name] = file\n", + " return file_dict\n", + " \n", + "sigma_dict = get_rule_files(rules_root)\n", + "kql_dict = copy.deepcopy(sigma_dict)\n" + ], + "outputs": [], + "execution_count": null, + "metadata": { + "scrolled": false + } + }, + { + "cell_type": "code", + "source": [ + "# Add downloaded sigmac tool to sys.path and import Sigmac functions\n", + "import os\n", + "import sys\n", + "module_path = os.path.abspath(os.path.join('sigma/sigma-master/tools'))\n", + "if module_path not in sys.path:\n", + " sys.path.append(module_path)\n", + "from sigma.parser.collection import SigmaCollectionParser\n", + 
"from sigma.parser.exceptions import SigmaCollectionParseError, SigmaParseError\n", + "from sigma.configuration import SigmaConfiguration, SigmaConfigurationChain\n", + "from sigma.config.exceptions import SigmaConfigParseError, SigmaRuleFilterParseException\n", + "from sigma.filter import SigmaRuleFilter\n", + "import sigma.backends.discovery as backends\n", + "from sigma.backends.base import BackendOptions\n", + "from sigma.backends.exceptions import BackendError, NotSupportedError, PartialMatchError, FullMatchError" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "# Sigma to Log Analytics Conversion\n", + "import yaml\n", + "_LA_MAPPINGS = '''\n", + "fieldmappings:\n", + " Image: NewProcessName\n", + " ParentImage: ParentProcessName\n", + " ParentCommandLine: NO_MAPPING\n", + "'''\n", + "\n", + "NOT_CONVERTIBLE = 'Not convertible'\n", + "\n", + "def sigma_to_la(file_path):\n", + " with open(file_path, 'r') as input_file:\n", + " try:\n", + " sigmaconfigs = SigmaConfigurationChain()\n", + " sigmaconfig = SigmaConfiguration(_LA_MAPPINGS)\n", + " sigmaconfigs.append(sigmaconfig)\n", + " backend_options = BackendOptions(None, None)\n", + " backend = backends.getBackend('ala')(sigmaconfigs, backend_options)\n", + " parser = SigmaCollectionParser(input_file, sigmaconfigs, None)\n", + " results = parser.generate(backend)\n", + " kql_result = ''\n", + " for result in results:\n", + " kql_result += result\n", + " except (NotImplementedError, NotSupportedError, TypeError):\n", + " kql_result = NOT_CONVERTIBLE\n", + " input_file.seek(0,0)\n", + " sigma_txt = input_file.read()\n", + " if not kql_result == NOT_CONVERTIBLE:\n", + " try:\n", + " kql_header = \"\\n\".join(get_sigma_properties(sigma_txt))\n", + " kql_result = kql_header + \"\\n\" + kql_result\n", + " except Exception as e:\n", + " print(\"exception reading sigma YAML: \", e)\n", + " print(sigma_txt, kql_result, sep='\\n')\n", + " return sigma_txt, 
kql_result\n", + "\n", + "sigma_keys = ['title', 'description', 'tags', 'status', \n", + " 'author', 'logsource', 'falsepositives', 'level']\n", + "\n", + "def get_sigma_properties(sigma_rule):\n", + " sigma_docs = yaml.load_all(sigma_rule, Loader=yaml.SafeLoader)\n", + " sigma_rule_dict = next(sigma_docs)\n", + " for prop in sigma_keys:\n", + " yield get_property(prop, sigma_rule_dict)\n", + "\n", + "def get_property(name, sigma_rule_dict):\n", + " sig_prop = sigma_rule_dict.get(name, 'na')\n", + " if isinstance(sig_prop, dict):\n", + " sig_prop = ' '.join([f\"{k}: {v}\" for k, v in sig_prop.items()])\n", + " return f\"// {name}: {sig_prop}\"\n", + " \n", + " \n", + "_KQL_FILTERS = {\n", + " 'date': ' | where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end}) ',\n", + " 'host': ' | where Computer has {host_name} '\n", + "}\n", + "\n", + "def insert_at(source, insert, find_sub):\n", + " pos = source.find(find_sub)\n", + " if pos != -1:\n", + " return source[:pos] + insert + source[pos:]\n", + " else:\n", + " return source + insert\n", + " \n", + "def add_filter_clauses(source, **kwargs):\n", + " if \"{\" in source or \"}\" in source:\n", + " source = (\"// Warning: embedded braces in source. 
Please edit if necessary.\\n\"\n", + " + source)\n", + " source = source.replace('{', '{{').replace('}', '}}')\n", + " if kwargs.get('host', False):\n", + " source = insert_at(source, _KQL_FILTERS['host'], '|')\n", + " if kwargs.get('date', False):\n", + " source = insert_at(source, _KQL_FILTERS['date'], '|')\n", + " return source\n", + "\n", + "\n", + "# Run the conversion\n", + "print(\"Converting rules\")\n", + "conv_counter = {}\n", + "for categ, sources in sigma_dict.items():\n", + " src_converted = 0\n", + " print(\"\\n\", categ, end=\"\")\n", + " for file_name, file_path in sources.items():\n", + " try:\n", + " sigma, kql = sigma_to_la(file_path)\n", + " print(\".\", end=\"\")\n", + " except:\n", + " print(f\"Error converting {file_name} ({file_path})\")\n", + " continue\n", + " kql_dict[categ][file_name] = (sigma, kql)\n", + " if not kql == NOT_CONVERTIBLE:\n", + " src_converted += 1\n", + " conv_counter[categ] = (len(sources), src_converted)\n", + "\n", + "print(\"\\nConversion statistics\")\n", + "print(\"-\" * len(\"Conversion statistics\"))\n", + "print('\\n'.join([f'{categ}: rules: {counter[0]}, converted: {counter[1]}'\n", + " for categ, counter in conv_counter.items()]))" + ], + "outputs": [], + "execution_count": null, + "metadata": { + "scrolled": false + } + }, + { + "cell_type": "markdown", + "source": [ + "## Display the results in an interactive browser" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "from ipywidgets import widgets, Layout\n", + "\n", + "# Browser Functions\n", + "def on_cat_value_change(change):\n", + " queries_w.options = kql_dict[change['new']].keys()\n", + " queries_w.value = queries_w.options[0]\n", + "\n", + "def on_query_value_change(change):\n", + " if view_qry_check.value:\n", + " qry_text = kql_dict[sub_cats_w.value][queries_w.value][1]\n", + " if \"Not convertible\" not in qry_text:\n", + " qry_text = add_filter_clauses(qry_text,\n", + " date=add_date_filter_check.value,\n", + " 
host=add_host_filter_check.value)\n", + " query_text_w.value = qry_text.replace('|', '\\n|')\n", + " orig_text_w.value = kql_dict[sub_cats_w.value][queries_w.value][0]\n", + "\n", + "def on_view_query_value_change(change):\n", + " vis = 'visible' if view_qry_check.value else 'hidden'\n", + " on_query_value_change(None)\n", + " query_text_w.layout.visibility = vis\n", + " orig_text_w.layout.visibility = vis\n", + "\n", + "# Function defs for ExecuteQuery cell below\n", + "def click_exec_hqry(b):\n", + " global qry_results\n", + " query_name = queries_w.value\n", + " query_cat = sub_cats_w.value\n", + " query_text = query_text_w.value\n", + " query_text = query_text.format(**qry_wgt.query_params)\n", + "\n", + " disp_results(query_text)\n", + " \n", + "def disp_results(query_text):\n", + " out_wgt.clear_output()\n", + " with out_wgt:\n", + " print(\"Running query...\", end=' ')\n", + " qry_results = execute_kql_query(query_text)\n", + " print(f'done. {len(qry_results)} rows returned.')\n", + " display(qry_results)\n", + " \n", + "exec_hqry_button = widgets.Button(description=\"Execute query..\")\n", + "out_wgt = widgets.Output() #layout=Layout(width='100%', height='200px', visiblity='visible'))\n", + "exec_hqry_button.on_click(click_exec_hqry)\n", + "\n", + "# Browser widget setup\n", + "categories = list(sorted(kql_dict.keys()))\n", + "sub_cats_w = widgets.Select(options=categories, \n", + " description='Category : ',\n", + " layout=Layout(width='30%', height='120px'),\n", + " style = {'description_width': 'initial'})\n", + "\n", + "queries_w = widgets.Select(options = kql_dict[categories[0]].keys(),\n", + " description='Query : ',\n", + " layout=Layout(width='30%', height='120px'),\n", + " style = {'description_width': 'initial'})\n", + "\n", + "query_text_w = widgets.Textarea(\n", + " value='',\n", + " description='Kql Query:',\n", + " layout=Layout(width='100%', height='300px', visiblity='hidden'),\n", + " disabled=False)\n", + "orig_text_w = 
widgets.Textarea(\n", + " value='',\n", + " description='Sigma Query:',\n", + " layout=Layout(width='100%', height='250px', visiblity='hidden'),\n", + " disabled=False)\n", + "\n", + "query_text_w.layout.visibility = 'hidden'\n", + "orig_text_w.layout.visibility = 'hidden'\n", + "sub_cats_w.observe(on_cat_value_change, names='value')\n", + "queries_w.observe(on_query_value_change, names='value')\n", + "\n", + "view_qry_check = widgets.Checkbox(description=\"View query\", value=True)\n", + "add_date_filter_check = widgets.Checkbox(description=\"Add date filter\", value=False)\n", + "add_host_filter_check = widgets.Checkbox(description=\"Add host filter\", value=False)\n", + "\n", + "view_qry_check.observe(on_view_query_value_change, names='value')\n", + "add_date_filter_check.observe(on_view_query_value_change, names='value')\n", + "add_host_filter_check.observe(on_view_query_value_change, names='value')\n", + "# view_qry_button.on_click(click_exec_hqry)\n", + "# display(exec_hqry_button);\n", + "\n", + "vbox_opts = widgets.VBox([view_qry_check, add_date_filter_check, add_host_filter_check])\n", + "hbox = widgets.HBox([sub_cats_w, queries_w, vbox_opts])\n", + "vbox = widgets.VBox([hbox, orig_text_w, query_text_w])\n", + "on_view_query_value_change(None)\n", + "display(vbox)" + ], + "outputs": [], + "execution_count": null, + "metadata": { + "scrolled": false, + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "## Click the `Execute query` button below to run the currently display query\n", + "**Notes:**\n", + "- To run the queries, first authenticate to Azure Sentinel\n", + "- If you added a date filter to the query set the date range below in the control below" + ], + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "### Authenticate to Azure Sentinel and Set Query Time bounds" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "from msticpy.nbtools.nbwidgets import QueryTime\r\n", + "from 
IPython.display import display\r\n", + "from msticpy.data import QueryProvider\r\n", + "from msticpy.common.wsconfig import WorkspaceConfig\r\n", + "ws_config = WorkspaceConfig()\r\n", + "qry_prov = QueryProvider(\"LogAnalytics\")\r\n", + "qry_prov.connect(ws_config.code_connect_str)\r\n", + "\r\n", + "exec_hqry_button = widgets.Button(description=\"Execute Query\")\r\n", + "exec_hqry_button.on_click(exec_query_btn)\r\n", + "\r\n", + "qry_wgt = QueryTime(units='days', before=5, after=0, max_before=30, max_after=10)\r\n", + "\r\n", + "vbox = widgets.VBox([exec_hqry_button, out_wgt])\r\n", + "display(qry_wgt)" + ], + "outputs": [], + "execution_count": null, + "metadata": { + "collapsed": true, + "jupyter": { + "source_hidden": false, + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Execute the Query" + ], + "metadata": { + "nteract": { + "transient": { + "deleting": false + } + } + } + }, + { + "cell_type": "code", + "source": [ + "def clean_kql_comments(query_string):\r\n", + " \"\"\"Cleans\"\"\"\r\n", + " import re\r\n", + " return re.sub(r'(//[^\\n]+)', '', query_string, re.MULTILINE).replace('\\n', '').strip()\r\n", + "\r\n", + "def execute_kql_query(query_string):\r\n", + " if not query_string or len(query_string.strip()) == 0:\r\n", + " print('No query supplied')\r\n", + " return None\r\n", + " src_query = clean_kql_comments(query_string)\r\n", + " src_query = src_query.format(start=qry_wgt.start, end=qry_wgt.end)\r\n", + " result = qry_prov.exec_query(src_query)\r\n", + " \r\n", + " return result\r\n", + "\r\n", + "disp_result = display(display_id=True)\r\n", + "\r\n", + "def exec_query_btn(btn):\r\n", + " query = query_text_w.value\r\n", + " result = execute_kql_query(query)\r\n", + " disp_result.update(result)\r\n", + "\r\n", + "display(exec_hqry_button)" + ], + "outputs": [], + "execution_count": null, + "metadata": { + "scrolled": true + } + }, + { + 
"cell_type": "markdown", + "source": [ + "## Save All Converted Files" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "path_save_wgt = widgets.Text(value=str(def_path) + \"_kql_out\",\n", + " description='Path to save KQL files: ',\n", + " layout=Layout(width='50%'),\n", + " style={'description_width': 'initial'})\n", + "path_save_wgt" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "root = Path(path_save_wgt.value)\n", + "root.mkdir(exist_ok=True)\n", + "for categ, kql_files in kql_dict.items():\n", + " sub_dir = root.joinpath(categ)\n", + " \n", + " for file_name, contents in kql_files.items():\n", + " kql_txt = contents[1]\n", + " if not kql_txt == NOT_CONVERTIBLE:\n", + " sub_dir.mkdir(exist_ok=True)\n", + " file_path = sub_dir.joinpath(file_name.replace('.yml', '.kql'))\n", + " with open(file_path, 'w') as output_file:\n", + " output_file.write(kql_txt)\n", + " print(f\"Saved {file_path}\")\n" + ], + "outputs": [], + "execution_count": null, + "metadata": {} } - }, - "types_to_exclude": [ - "module", - "function", - "builtin_function_or_method", - "instance", - "_Feature" - ], - "window_display": false - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + ], + "metadata": { + "hide_input": false, + "kernelspec": { + "name": "python3-azureml", + "language": "python", + "display_name": "Python 3.6 - AzureML" + }, + "language_info": { + "name": "python", + "version": "3.6.9", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + }, + "toc": { + "toc_position": {}, + "skip_h1_title": false, + "number_sections": false, + "title_cell": "Table of Contents", + "toc_window_display": false, + "base_numbering": 1, + "toc_section_display": true, + "title_sidebar": "Contents", + "toc_cell": false, + "nav_menu": {}, + "sideBar": true + }, + 
"varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "types_to_exclude": [ + "module", + "function", + "builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + }, + "kernel_info": { + "name": "python3-azureml" + }, + "nteract": { + "version": "nteract-front-end@1.0.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/TroubleShootingNotebooks.ipynb b/TroubleShootingNotebooks.ipynb index 74890402..c0bd9751 100644 --- a/TroubleShootingNotebooks.ipynb +++ b/TroubleShootingNotebooks.ipynb @@ -33,12 +33,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-27T00:34:21.030512Z", - "start_time": "2020-02-27T00:34:21.016520Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "import sys\n", @@ -98,13 +93,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-27T00:34:26.670210Z", - "start_time": "2020-02-27T00:34:21.032510Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "import importlib\n", @@ -112,7 +101,7 @@ "import warnings\n", "from IPython.display import display, HTML, Markdown\n", "\n", - "MSTICPY_REQ_VERSION = (0, 2, 7)\n", + "REQ_PYTHON_VER = (0, 2, 7)\n", "display(Markdown(\"#### Checking msticpy...\"))\n", "warn_mssg = []\n", "err_mssg = []\n", @@ -147,7 +136,7 @@ " \n", " else:\n", " setup_warn(\"msticpy missing or out-of-date.\")\n", - " display(Markdown(\"Please run `pip install --user --upgrade msticpy` to upgrade/install msticpy\"))\n", + " display(Markdown(\"Please run `pip install --upgrade 
msticpy` to upgrade/install msticpy\"))\n", " " ] }, @@ -164,12 +153,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-27T00:34:26.683205Z", - "start_time": "2020-02-27T00:34:26.671186Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "display(Markdown(\"#### Checking pandas...\"))\n", @@ -190,7 +174,7 @@ "if need_update:\n", " resp = input(\"Install the package now? (y/n)\")\n", " if resp.casefold().startswith(\"y\"):\n", - " !pip install --user --upgrade pandas\n", + " !pip install --upgrade pandas\n", " if \"pandas\" in sys.modules:\n", " importlib.reload(pd)\n", " else:\n", @@ -199,17 +183,12 @@ " \n", " else:\n", " setup_warn(\"pandas missing or out-of-date.\")\n", - " display(Markdown(\"Please run `pip install --user --upgrade pandas` to upgrade/install pandas\"))" + " display(Markdown(\"Please run `pip install --upgrade pandas` to upgrade/install pandas\"))" ] }, { "cell_type": "markdown", - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-24T19:41:28.539607Z", - "start_time": "2020-02-24T19:41:28.536637Z" - } - }, + "metadata": {}, "source": [ "## Workspace Configuration Check\n", "\n", @@ -275,13 +254,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-27T00:34:26.714184Z", - "start_time": "2020-02-27T00:34:26.684179Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "import os\n", @@ -370,12 +343,7 @@ }, { "cell_type": "markdown", - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-24T19:41:42.199685Z", - "start_time": "2020-02-24T19:41:42.196662Z" - } - }, + "metadata": {}, "source": [ "# msticpy Configuration\n", "\n", @@ -399,13 +367,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-27T00:34:26.742145Z", - "start_time": "2020-02-27T00:34:26.715162Z" - }, - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ "import 
os\n", @@ -483,12 +445,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "ExecuteTime": { - "end_time": "2020-02-27T00:34:26.758139Z", - "start_time": "2020-02-27T00:34:26.745144Z" - } - }, + "metadata": {}, "outputs": [], "source": [ "if errors:\n", @@ -522,7 +479,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.5" + "version": "3.6.7" }, "toc": { "base_numbering": 1, diff --git a/aznbsetup.sh b/aznbsetup.sh index 417b5f5c..43f0f501 100644 --- a/aznbsetup.sh +++ b/aznbsetup.sh @@ -1,7 +1,17 @@ #!/bin/bash -# Activate environment -source /home/nbuser/anaconda3_501/bin/activate +# Activate environment (add this to the end of .bashrc) +source ~/anaconda3_501/bin/activate +echo >> ~.bashrc +echo source ~/anaconda3_501/bin/activate >> .bashrc +echo Started environment setup +date +touch ~/.mpnb.lock # pip -pip install -r /home/nbuser/library/requirements.txt \ No newline at end of file +pip install --upgrade pip +pip install --disable-pip-version-check -r ~/library/requirements.txt + +rm -f ~/.mpnb.lock +echo Environment setup complete +date \ No newline at end of file diff --git a/images/nb_ipexplorer-mindmap.png b/images/nb_ipexplorer-mindmap.png new file mode 100644 index 00000000..77e6abab Binary files /dev/null and b/images/nb_ipexplorer-mindmap.png differ diff --git a/requirements.txt b/requirements.txt index 45fd7eae..cbc0127c 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1 @@ -msticpy>=0.5.0 -Kqlmagic>=0.1.106 -pandas>=0.25 +msticpy>=0.6.0 diff --git a/utils/check_nb_kernel.py b/utils/check_nb_kernel.py index c0aaba8a..5c6a65fb 100644 --- a/utils/check_nb_kernel.py +++ b/utils/check_nb_kernel.py @@ -1,47 +1,217 @@ +# ------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. See License.txt in the project root for +# license information. 
+# -------------------------------------------------------------------------- +""" +Checker/Updater for Notebook kernelspec versions. + +check_nb_kernel.py CMD [-h] [--path PATH] [--target TARGET] [--verbose] + +CMD is one of: + {check, list, update} (default is "check") + + list - shows list of internal kernelspecs that can be used + check - checks the notebook or notebooks for conformance to kernelspecs + update - updates notebook or notebooks to target kernelspec + +optional arguments: + -h, --help show this help message and exit + --path PATH, -p PATH Path to search for notebooks. Can be a single file, + a directory path or a 'glob'-compatible wildcard. + (e.g. "*" for all files in current folder, "**/*" + for all files in folder and subfolders) + Defaults to current directory. + --target TARGET, -t TARGET + Target kernel spec to check or set. + Required for 'update' command + --verbose, -v Show details of all checked notebooks. Otherwise + only list notebooks with errors or updated notebooks. + +Notes +----- + +If CMD is 'update' you must specify a kernelspec target. The updated +notebook is written to the same name as the input. The old version is +saved as {input-notebook-name}.{previous-kernelspec-name} +If CMD is 'check', target is optional and it reports any notebooks +with kernelspecs different to internal kernelspecs (view with 'list' command) +as errors.
+ +""" import argparse from pathlib import Path +from typing import Optional, Iterable +import sys + import nbformat -PY36_KERNEL = {"name": ["python36", "python3"], "display_name": ["Python 3.6", "Python 3"], 'language': 'python'} +IP_KERNEL_SPEC = { + "python36": { + "name": "python36", + "language": "python", + "display_name": "Python 3.6", + }, + "python3": {"name": "python3", "language": "python", "display_name": "Python 3"}, + "python3-azureml": { + "name": "python3-azureml", + "language": "python", + "display_name": "Python 3.6 - AzureML", + }, +} + + +def check_notebooks(nb_path: str, k_tgts: Iterable[str], verbose: bool = False): + """Check notebooks for valid kernelspec.""" + err_count = 0 + good_count = 0 + for nbook in _get_notebook_paths(nb_path): + if ".ipynb_checkpoints" in str(nbook): + continue + nb_obj = nbformat.read(str(nbook), as_version=4.0) + kernelspec = nb_obj.get("metadata", {}).get("kernelspec", None) + if not kernelspec: + print("Error: no kernel information.") + continue + nb_ok = False + for config in k_tgts: + tgt_spec = IP_KERNEL_SPEC[config] + for k_name, k_item in kernelspec.items(): + if tgt_spec[k_name] != k_item: + break + else: + nb_ok = True + if not nb_ok: + err_count += 1 + _print_nb_header(nbook) + print("ERROR - Invalid kernelspec '" f"{kernelspec.get('name')}" "'") + print(" ", kernelspec, "\n") + continue + if verbose: + _print_nb_header(nbook) + print(f"{kernelspec['name']} ok\n") + good_count += 1 + print(f"{good_count} with no errors, {err_count} with errors") + + +def _get_notebook_paths(nb_path: str): + """Generate notebook paths.""" + if "*" in nb_path: + for glob_path in Path().glob(nb_path): + if glob_path.is_file() and glob_path.suffix.casefold() == ".ipynb": + yield glob_path + elif Path(nb_path).is_dir(): + yield from Path(nb_path).glob("*.ipynb") + elif Path(nb_path).is_file(): + yield Path(nb_path) + +def _print_nb_header(nbook_path): + print(str(nbook_path.name)) + print("-" * len(str(nbook_path.name))) + 
print(str(nbook_path.resolve())) -def check_notebooks(nb_path): - notebooks = Path(nb_path).glob("**/*.ipynb") - for nb_path in notebooks: - if ".ipynb_checkpoints" in str(nb_path): +def set_kernelspec(nb_path: str, k_tgt: str, verbose: bool = False): + """Update specified notebooks to `k_tgt` kernelspec.""" + changed_count = 0 + good_count = 0 + for nbook in _get_notebook_paths(nb_path): + if ".ipynb_checkpoints" in str(nbook): continue - nb = nbformat.read(str(nb_path), as_version=4.0) - kernelspec = nb.get("metadata", {}).get("kernelspec", None) - print(str(nb_path)) - print("-" * len(str(nb_path))) - nb_ok = True - for config in PY36_KERNEL: - if not kernelspec: - print("no kernel information.") - if not kernelspec[config] in PY36_KERNEL[config]: - print("Incorrect value in", config, end=". ") - print(f"Should be: '{PY36_KERNEL[config]}' Found:'{kernelspec[config]}'") - nb_ok = False - if nb_ok: - print(f"{kernelspec['name']} ok")) - else: - print() - - + with open(str(nbook), "r") as nb_read: + nb_obj = nbformat.read(nb_read, as_version=4.0) + kernelspec = nb_obj.get("metadata", {}).get("kernelspec", None) + current_kspec_name = kernelspec.get("name") + if not kernelspec: + print("Error: no kernel information.") + continue + updated = False + tgt_spec = IP_KERNEL_SPEC[k_tgt] + for k_name, k_item in kernelspec.items(): + if tgt_spec[k_name] != k_item: + updated = True + kernelspec[k_name] = tgt_spec[k_name] + if updated: + changed_count += 1 + _print_nb_header(nbook) + print( + f"Kernelspec updated from '{current_kspec_name}' to '" + f"{kernelspec.get('name')}" + "'" + ) + print(" ", kernelspec, "\n") + nbook.rename(f"{str(nbook)}.{current_kspec_name}") + nbformat.write(nb_obj, str(nbook)) + continue + if verbose: + _print_nb_header(nbook) + print(f"{kernelspec['name']} ok\n") + good_count += 1 + print(f"{good_count} with no changes, {changed_count} updated") + + def _add_script_args(): parser = argparse.ArgumentParser(description="Notebook kernelspec 
checker.") + parser.add_argument( + "cmd", default="check", type=str, choices=["check", "list", "update"], + ) parser.add_argument( "--path", "-p", default=".", required=False, help="Path search for notebooks." ) + parser.add_argument( + "--target", "-t", required=False, help="Target kernel spec to check or set." + ) + parser.add_argument( + "--verbose", + "-v", + action="store_true", + help="Show details of all checked notebooks.", + ) return parser +def _view_targets(): + print("Valid targets:") + for kernel, settings in IP_KERNEL_SPEC.items(): + print(f"{kernel}:") + print(" ", settings) + + # pylint: disable=invalid-name if __name__ == "__main__": arg_parser = _add_script_args() args = arg_parser.parse_args() - check_notebooks(args.path) - + if args.cmd == "list": + _view_targets() + sys.exit(0) + + krnl_tgt: Optional[str] = None + if args.target: + krnl_tgt = args.target + if krnl_tgt not in IP_KERNEL_SPEC: + print("'target' must be a valid kernelspec definition") + print("Valid kernel specs:") + _view_targets() + sys.exit(1) + + if krnl_tgt is not None: + krnl_tgts = [krnl_tgt] + else: + krnl_tgts = list(IP_KERNEL_SPEC.keys()) + + if not args.path: + print("check and update commands need a 'path' parameter.") + sys.exit(1) + if args.cmd == "check": + check_notebooks(args.path, krnl_tgts, verbose=args.verbose) + sys.exit(0) + + if args.cmd == "update": + if not krnl_tgt: + print("A kernel target must be specified with 'update'.") + sys.exit(1) + set_kernelspec(args.path, krnl_tgt, verbose=args.verbose) + sys.exit(0) diff --git a/utils/nb_check.py b/utils/nb_check.py index 1d5509d2..b9627141 100644 --- a/utils/nb_check.py +++ b/utils/nb_check.py @@ -4,32 +4,29 @@ # license information. 
# -------------------------------------------------------------------------- """Checker for Python and msticpy versions.""" -import importlib import os import sys -import warnings -from IPython.display import display, HTML, Markdown +from IPython.display import display, HTML + -warn_mssg = [] -err_mssg = [] MISSING_PKG_ERR = """ -

The package '{package}' is not +

The package '{package}' is not installed or has an incorrect version

Please install this now

""" MIN_PYTHON_VER_DEF = (3, 6) -MSTICPY_REQ_VERSION = (0, 5, 0) +MSTICPY_REQ_VERSION = (0, 5, 2) def check_python_ver(min_py_ver=MIN_PYTHON_VER_DEF): """ - Checks the current version of the Python kernel. - + Check the current version of the Python kernel. + Parameters ---------- min_py_ver : Tuple[int, int] Minimum Python version - + Raises ------ RuntimeError @@ -38,18 +35,22 @@ def check_python_ver(min_py_ver=MIN_PYTHON_VER_DEF): """ display(HTML("Checking Python kernel version...")) if sys.version_info < min_py_ver: - display(HTML( - """ + display( + HTML( + """

This notebook requires a different notebook (Python) kernel version.

- From the Notebook menu (above), choose Kernel then + From the Notebook menu (above), choose Kernel then Change Kernel... from the menu.
Select a Python %s.%s (or later) version kernel and then re-run this cell.

- """ % min_py_ver - )) - display(HTML( """ + % min_py_ver + ) + ) + display( + HTML( + """ Please see the TroubleShootingNotebooks in this folder for more information


@@ -58,21 +59,26 @@ def check_python_ver(min_py_ver=MIN_PYTHON_VER_DEF): ) raise RuntimeError("Python %s.%s or later kernel is required." % min_py_ver) - display(HTML( - "Python kernel version %s.%s.%s OK" % ( - sys.version_info[0], sys.version_info[1], sys.version_info[2] + display( + HTML( + "Python kernel version %s.%s.%s OK" + % (sys.version_info[0], sys.version_info[1], sys.version_info[2]) ) - )) + ) + + _check_nteract() + +# pylint: disable=import-outside-toplevel def check_mp_ver(min_msticpy_ver=MSTICPY_REQ_VERSION): """ - Checks the current version of . - + Check and optionally update the current version of msticpy. + Parameters ---------- min_py_ver : Tuple[int, int] Minimum Python version - + Raises ------ RuntimeError @@ -82,26 +88,49 @@ def check_mp_ver(min_msticpy_ver=MSTICPY_REQ_VERSION): display(HTML("Checking msticpy version...")) try: import msticpy + wrong_ver_err = "msticpy %s.%s.%s or later is needed." % min_msticpy_ver mp_version = tuple([int(v) for v in msticpy.__version__.split(".")]) if mp_version < min_msticpy_ver: - raise ImportError("msticpy %s.%s.%s or later is needed." % min_msticpy_ver) - + raise ImportError(wrong_ver_err) + except ImportError: display(HTML(MISSING_PKG_ERR.format(package="msticpy"))) - resp = input("Install? (y/n)") + resp = input("Install? (y/n)") # nosec if resp.casefold().startswith("y"): raise ImportError("Install msticpy") - else: - display(HTML( + + display( + HTML( """ -

The notebook cannot be run without - the correct version of '%s' (%s.%s.%s or later) -

- Please see the - TroubleShootingNotebooks - in this folder for more information


- """ % ("msticpy", *min_msticpy_ver) - ) +

The notebook cannot be run without + the correct version of '%s' (%s.%s.%s or later) +

+ Please see the + TroubleShootingNotebooks + in this folder for more information


+ """ + % ("msticpy", *min_msticpy_ver) ) - raise RuntimeError("msticpy %s.%s.%s or later is required." % min_msticpy_ver) - display(HTML("msticpy version %s.%s.%s OK" % mp_version)) \ No newline at end of file + ) + raise RuntimeError(wrong_ver_err) + + display(HTML("msticpy version %s.%s.%s OK" % mp_version)) + + +_NTERACT_MSSG = """ +Azure ML detected
+It looks like this notebook is running in an Azure Machine Learning workspace. +If you are using the AzureML native notebook interface +(i.e. not Jupyter or JupyterLab) we need to adjust a +setting for the UI to behave properly. +Ignoring or answering "n" will not affect the functionality of the notebook +but you may see some extraneous UI elements being displayed. +""" + + +def _check_nteract(): + if os.environ.get("USER", "").casefold() == "azureuser": + display(HTML(_NTERACT_MSSG)) + set_app = input("Configure for Azure ML Notebooks? (y/n)") # nosec + if set_app.casefold().startswith("y"): + os.environ["KQLMAGIC_NOTEBOOK_APP"] = "nteract"
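
Review note on the version check in `check_mp_ver`: it splits `msticpy.__version__` on "." and compares the resulting tuple of ints with `MSTICPY_REQ_VERSION`. The sketch below shows that comparison in isolation. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of `nb_check.py`. Stopping at the first non-numeric component (for example a `dev1` suffix) is an assumption added here for illustration; the code in the diff calls `int()` on every component.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components of a dotted version string.

    Hypothetical helper: parsing stops at the first component that is
    not a plain decimal number, so "0.5.2.dev1" yields (0, 5, 2).
    """
    parts = []
    for item in version.split("."):
        if not item.isdecimal():
            break
        parts.append(int(item))
    return tuple(parts)


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    """Compare element by element, as Python does for tuples."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    # Integer tuples order 0.10.0 after 0.5.2; a plain string
    # comparison would get this wrong.
    print(meets_minimum("0.10.0", (0, 5, 2)))
    print(meets_minimum("0.5", (0, 5, 2)))
```

Comparing integer tuples rather than raw strings is what keeps a two-digit component such as `10` ordered correctly after `5`. A shorter tuple such as `(0, 5)` compares as less than `(0, 5, 2)`.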