{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1 align=\"center\">Smart Home Sensor Analysis</h1>\n",
    "<p>The ‘Household Power Consumption‘ dataset is a multivariate time series dataset that describes the electricity consumption for a single household for last few months. The  dataset is modeled after household consumption dataset available here - \n",
    "https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption\n",
    "\n",
    "\n",
    "It is a multivariate series comprised of seven variables (besides the date and time); they are:\n",
    "\n",
    "<b>global_active_power:</b> The total active power consumed by the household (kilowatts).<br>\n",
    "<b>global_reactive_power:</b> The total reactive power consumed by the household (kilowatts).<br>\n",
    "<b>voltage:</b> Average voltage (volts).<br>\n",
    "<b>global_intensity:</b> Average current intensity (amps).<br>\n",
    "<b>sub_metering_1:</b> Active energy for kitchen (watt-hours of active energy).<br>\n",
    "<b>sub_metering_2:</b> Active energy for laundry (watt-hours of active energy).<br>\n",
    "<b>sub_metering_3:</b> Active energy for climate control systems (watt-hours of active energy).<br>\n",
    "\n",
    "<p> In the following section, we will analyze and predict hourly power consumption using DeepAR on SageMaker. The purpose of this exercise is to demonstrate Sagemaker integration with IoT Analytics and not to focus on training the right model to generate accurate prediction.\n",
    "<p>For more information see the DeepAR [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html) or [paper](https://arxiv.org/abs/1704.04110), "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "%matplotlib inline\n",
    "\n",
    "import sys\n",
    "from urllib.request import urlretrieve\n",
    "import zipfile\n",
    "from dateutil.parser import parse\n",
    "import json\n",
    "from random import shuffle\n",
    "import random\n",
    "import datetime\n",
    "import os\n",
    "\n",
    "import boto3\n",
    "import s3fs\n",
    "import sagemaker\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from ipywidgets import interact, interactive, fixed, interact_manual\n",
    "import ipywidgets as widgets\n",
    "from ipywidgets import IntSlider, FloatSlider, Checkbox\n",
    "\n",
    "from sagemaker import get_execution_role\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# set random seeds for reproducibility\n",
    "np.random.seed(42)\n",
    "random.seed(42)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "sagemaker_session = sagemaker.Session()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before starting, we can override the default values for the following:\n",
    "- The S3 bucket and prefix that you want to use for training and model data. This should be within the same region as the Notebook Instance, training, and hosting.\n",
    "- The IAM role arn used to give training and hosting access to your data. See the documentation for how to create these."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "s3_bucket = sagemaker.Session().default_bucket()# replace with an existing bucket if needed\n",
    "\n",
    "s3_prefix = 'iot-analytics-demo-notebook'    # prefix used for all data stored within the bucket\n",
    "\n",
    "role = get_execution_role()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "region = sagemaker_session.boto_region_name\n",
    "s3_output_path = \"s3://{}/{}/output\".format(s3_bucket, s3_prefix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we configure the container image to be used for the region that we are running in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "image_name = sagemaker.amazon.amazon_estimator.get_image_uri(region, \"forecasting-deepar\", \"latest\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import  dataset from the IotAnalytics database and upload it to S3 to make it available for Sagemaker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "## reading from the IotAnalytics database\n",
    "import pandas as pd\n",
    "from datetime import datetime\n",
    "from numpy import isnan\n",
    "import pandas as pd\n",
    "from datetime import datetime\n",
    "from numpy import isnan\n",
    "\n",
    "\n",
    "bucket='iotareinvent18'\n",
    "data_key = 'inputdata.csv'\n",
    "data_location = 's3://{}/{}'.format(bucket, data_key)\n",
    "\n",
    "\n",
    "# Function to convert date columns into a single timestamp field\n",
    "#def parse(x):\n",
    "#    return datetime.strptime(x, '%Y %m %d %H')\n",
    "def parse(x):\n",
    "    t= pd.to_datetime(str(x)) \n",
    "    timestring = t.strftime('%Y.%m.%d %H:%M:%S')\n",
    "    return t\n",
    "\n",
    "def fill_missing(values):\n",
    "    one_day = 60 * 24\n",
    "    for row in range(values.shape[0]):\n",
    "        for col in range(values.shape[1]):\n",
    "            if np.isnan(values[row, col]):\n",
    "                values[row, col] = values[row - one_day, col]\n",
    " \n",
    "# load sample data\n",
    "\n",
    "#n = sum(1 for line in open(data_location)) - 1 #number of records in file (excludes header)\n",
    "n=244207 #total observations\n",
    "s = 10000 #desired sample size\n",
    "sampling = sorted(random.sample(range(1,n+1),n-s)) #the 0-indexed header will not be included in the skip list\n",
    "\n",
    "dataset = pd.read_csv(data_location,  header=0, skiprows=sampling ,low_memory=False, infer_datetime_format=True, date_parser = parse)\n",
    "dataset['__dt'] = dataset['timestamp']\n",
    "dataset['cost'] = (dataset['sub_metering_1'] +dataset['sub_metering_2'] + dataset['sub_metering_3'])*1.5\n",
    "dataset['timestamp'] = pd.to_datetime(dataset['timestamp'])\n",
    "dataset.set_index('timestamp', inplace=True)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "print(\"#####Examine the dataset\")\n",
    "print(\"dataset shape \",dataset.shape)\n",
    "print(dataset.sample(5))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h3>Explore the data</h3>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "\n",
    "cor_cols = [\"global_active_power\",\"global_reactive_power\", \"global_intensity\",\"voltage\",\"sub_metering_1\",\"sub_metering_2\",\"sub_metering_3\",\"cost\"]\n",
    "\n",
    "\n",
    "# plot correlation matrix\n",
    "plt.figure(figsize=(10,5))\n",
    "\n",
    "plt.matshow(dataset.loc[:, cor_cols].corr(), fignum=1)\n",
    "plt.xticks(range(len(cor_cols)), cor_cols, rotation=90)\n",
    "plt.yticks(range(len(cor_cols)), cor_cols)\n",
    "plt.colorbar()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p><b>Correlation Plot</b>\n",
    "<p>The correlation plot, we can observe that Sub Meters 1 and 2 show some correlation with each other. Sub meter 3 has less correlation with Sub meter 1 and 2. Since cost is derived by aggregating Sub Meter 1, 2 and 3, we will retain only cost column to train the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "##updating data to dataset\n",
    "analysis_cols = [\"global_active_power\",\"global_reactive_power\", \"global_intensity\",\"voltage\",\"cost\"]\n",
    "\n",
    "# select the value columns in the DataFrame to compare\n",
    "dataset= dataset.loc[:, analysis_cols]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we load and parse the dataset and convert it to a collection of Pandas time series, which makes common time series operations such as indexing by time periods or resampling much easier. Here we want to forecast longer periods (one week) and resample the data to a granularity of every hour."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "num_timeseries = dataset.shape[1]\n",
    "data_kw = dataset.resample('2H').sum()\n",
    "timeseries = []\n",
    "for i in range(num_timeseries):\n",
    "    timeseries.append(np.trim_zeros(data_kw.iloc[:,i], trim='f'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train and Test splits\n",
    "\n",
    "Often times one is interested in evaluating the model or tuning its hyperparameters by looking at error metrics on a hold-out test set. Here we split the available data into train and test sets for evaluating the trained model. For standard machine learning tasks such as classification and regression, one typically obtains this split by randomly separating examples into train and test sets. However, in forecasting it is important to do this train/test split based on time rather than by time series.\n",
    "\n",
    "In this example, we will reserve the last section of each of the time series for evalutation purpose and use only the first part as training data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# we use 1 (changed 2 to 1) hour frequency for the time series\n",
    "freq = '2H' \n",
    "\n",
    "# we predict for 7 days\n",
    "prediction_length = 7 * 12  ##original 7*12\n",
    "\n",
    "# we also use 7 days as context length, this is the number of state updates accomplished before making predictions\n",
    "context_length = 7 * 12 ##original 7*12 due to 2H sample"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, we assume that the minimum date is the first day sensors started recording the dataset. We split the dataset 70, 30 with 70% of the data retained for training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "dailyGroups = dataset.resample('D').sum() ## Gathering unique days for dataset split\n",
    "dailyGroups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "\n",
    "startTrainDate = dailyGroups.index.min()\n",
    "\n",
    "maxDate=dailyGroups.index.max()\n",
    "\n",
    "\n",
    "traingSplit=round(dailyGroups.shape[0]*.7)\n",
    "\n",
    "\n",
    "splitDate=dailyGroups.index[traingSplit]\n",
    "\n",
    "\n",
    "start_dataset = pd.Timestamp(startTrainDate, freq=freq)\n",
    "end_training = pd.Timestamp(splitDate, freq=freq)\n",
    "\n",
    "print(\"###### Confirm the datasets\")\n",
    "print(\"startTrainDate, maxDate, traingSplit, splitDate\", startTrainDate, maxDate, traingSplit, splitDate)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The DeepAR JSON input format represents each time series as a JSON object. In the simplest case each time series just consists of a start time stamp (``start``) and a list of values (``target``). For more complex cases, DeepAR also supports the fields ``dynamic_feat`` for time-series features and ``cat`` for categorical features, which we will use  later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "training_data = [\n",
    "    {\n",
    "        \"start\": str(start_dataset),\n",
    "        \"target\": ts[start_dataset:end_training - pd.Timedelta(1, unit='D')].tolist()\n",
    "         # We use -1 days, because pandas indexing includes the upper bound \n",
    "    }\n",
    "    for ts in timeseries\n",
    "]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As test data, we will consider time series extending beyond the training range: these will be used for computing test scores, by using the trained model to forecast their trailing 7 days, and comparing predictions with actual values.\n",
    "To evaluate our model performance on more than one week, we generate test data that extends to 1, 2, 3, 4 weeks beyond the training range. This way we perform *rolling evaluation* of our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "num_test_windows = 4\n",
    "\n",
    "test_data = [\n",
    "    {\n",
    "        \"start\": str(start_dataset),\n",
    "        \"target\": ts[start_dataset:end_training + pd.Timedelta(k, unit='D') * prediction_length].tolist()\n",
    "    }\n",
    "    for k in range(1, num_test_windows + 1) \n",
    "    for ts in timeseries\n",
    "]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now write the dictionary to the `jsonlines` file format that DeepAR understands (it also supports gzipped jsonlines and parquet)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "def write_dicts_to_file(path, data):\n",
    "    with open(path, 'wb') as fp:\n",
    "        for d in data:\n",
    "            fp.write(json.dumps(d).encode(\"utf-8\"))\n",
    "            fp.write(\"\\n\".encode('utf-8'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "write_dicts_to_file(\"train.json\", training_data)\n",
    "write_dicts_to_file(\"test.json\", test_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have the data files locally, let us copy them to S3 where DeepAR can access them. Depending on your connection, this may take a couple of minutes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "s3 = boto3.resource('s3')\n",
    "def copy_to_s3(local_file, s3_path, override=True):\n",
    "    assert s3_path.startswith('s3://')\n",
    "    split = s3_path.split('/')\n",
    "    bucket = split[2]\n",
    "    path = '/'.join(split[3:])\n",
    "    buk = s3.Bucket(bucket)\n",
    "    \n",
    "    if len(list(buk.objects.filter(Prefix=path))) > 0:\n",
    "        if not override:\n",
    "            print('File s3://{}/{} already exists.\\nSet override to upload anyway.\\n'.format(s3_bucket, s3_path))\n",
    "            return\n",
    "        else:\n",
    "            print('Overwriting existing file')\n",
    "    with open(local_file, 'rb') as data:\n",
    "        print('Uploading file to {}'.format(s3_path))\n",
    "        buk.put_object(Key=path, Body=data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "copy_to_s3(\"train.json\", s3_output_path + \"/train/train.json\")\n",
    "copy_to_s3(\"test.json\", s3_output_path + \"/test/test.json\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's have a look to what we just wrote to S3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "s3filesystem = s3fs.S3FileSystem()\n",
    "with s3filesystem.open(s3_output_path + \"/train/train.json\", 'rb') as fp:\n",
    "    print(fp.readline().decode(\"utf-8\")[:100] + \"...\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are all set with our dataset processing, we can now call DeepAR to train a model and generate predictions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train a model\n",
    "\n",
    "Here we define the estimator that will launch the training job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "estimator = sagemaker.estimator.Estimator(\n",
    "    sagemaker_session=sagemaker_session,\n",
    "    image_name=image_name,\n",
    "    image_uri=image_name,\n",
    "    role=role,\n",
    "    train_instance_count=1,\n",
    "    train_instance_type='ml.c4.2xlarge',\n",
    "    base_job_name='deepar-electricity-demo',\n",
    "    output_path=s3_output_path\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we need to set the hyperparameters for the training job. For example frequency of the time series used, number of data points the model will look at in the past, number of predicted data points. The other hyperparameters concern the model to train (number of layers, number of cells per layer, likelihood function) and the training options (number of epochs, batch size, learning rate...). We use default parameters for every optional parameter in this case (you can always use [Sagemaker Automated Model Tuning](https://aws.amazon.com/blogs/aws/sagemaker-automatic-model-tuning/) to tune them)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "hyperparameters = {\n",
    "    \"time_freq\": freq,\n",
    "    \"epochs\": \"40\",\n",
    "    \"early_stopping_patience\": \"40\",\n",
    "    \"mini_batch_size\": \"64\",\n",
    "    \"learning_rate\": \"1E-4\",\n",
    "    \"context_length\": str(context_length),\n",
    "    \"prediction_length\": str(prediction_length)\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "estimator.set_hyperparameters(**hyperparameters)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are ready to launch the training job. SageMaker will start an EC2 instance, download the data from S3, start training the model and save the trained model.\n",
    "\n",
    "If you provide the `test` data channel as we do in this example, DeepAR will also calculate accuracy metrics for the trained model on this test. This is done by predicting the last `prediction_length` points of each time-series in the test set and comparing this to the actual value of the time-series. \n",
    "\n",
    "**Note:** the next cell may take a few minutes to complete, depending on data size, model complexity, training options."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "data_channels = {\n",
    "    \"train\": \"{}/train/\".format(s3_output_path),\n",
    "    \"test\": \"{}/test/\".format(s3_output_path)\n",
    "}\n",
    "\n",
    "estimator.fit(inputs=data_channels, wait=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since you pass a test set in this example, accuracy metrics for the forecast are computed and logged (see bottom of the log).\n",
    "You can find the definition of these metrics from [our documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html). You can use these to optimize the parameters and tune your model or use SageMaker's [Automated Model Tuning service](https://aws.amazon.com/blogs/aws/sagemaker-automatic-model-tuning/) to tune the model for you."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create endpoint and predictor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have a trained model, we can use it to perform predictions by deploying it to an endpoint.\n",
    "\n",
    "**Note: Remember to delete the endpoint after running this experiment. A cell at the very bottom of this notebook will do that: make sure you run it at the end.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To query the endpoint and perform predictions, we can define the following utility class: this allows making requests using `pandas.Series` objects rather than raw JSON strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "class DeepARPredictor(sagemaker.predictor.RealTimePredictor):\n",
    "    \n",
    "    def __init__(self, *args, **kwargs):\n",
    "        super().__init__(*args, content_type=sagemaker.content_types.CONTENT_TYPE_JSON, **kwargs)\n",
    "        \n",
    "    def predict(self, ts, cat=None, dynamic_feat=None, \n",
    "                num_samples=100, return_samples=False, quantiles=[\"0.1\", \"0.5\", \"0.9\"]):\n",
    "        \"\"\"Requests the prediction of for the time series listed in `ts`, each with the (optional)\n",
    "        corresponding category listed in `cat`.\n",
    "        \n",
    "        ts -- `pandas.Series` object, the time series to predict\n",
    "        cat -- integer, the group associated to the time series (default: None)\n",
    "        num_samples -- integer, number of samples to compute at prediction time (default: 100)\n",
    "        return_samples -- boolean indicating whether to include samples in the response (default: False)\n",
    "        quantiles -- list of strings specifying the quantiles to compute (default: [\"0.1\", \"0.5\", \"0.9\"])\n",
    "        \n",
    "        Return value: list of `pandas.DataFrame` objects, each containing the predictions\n",
    "        \"\"\"\n",
    "        prediction_time = ts.index[-1] + pd.Timedelta(1, unit='D')\n",
    "\n",
    "        quantiles = [str(q) for q in quantiles]\n",
    "        req = self.__encode_request(ts, cat, dynamic_feat, num_samples, return_samples, quantiles)\n",
    "        res = super(DeepARPredictor, self).predict(req)\n",
    "        return self.__decode_response(res, ts.index.freq, prediction_time, return_samples)\n",
    "    \n",
    "    def __encode_request(self, ts, cat, dynamic_feat, num_samples, return_samples, quantiles):\n",
    "        instance = series_to_dict(ts, cat if cat is not None else None, dynamic_feat if dynamic_feat else None)\n",
    "        configuration = {\n",
    "            \"num_samples\": num_samples,\n",
    "            \"output_types\": [\"quantiles\", \"samples\"] if return_samples else [\"quantiles\"],\n",
    "            \"quantiles\": quantiles\n",
    "        }\n",
    "        \n",
    "        http_request_data = {\n",
    "            \"instances\": [instance],\n",
    "            \"configuration\": configuration\n",
    "        }\n",
    "        \n",
    "        return json.dumps(http_request_data).encode('utf-8')\n",
    "    \n",
    "    def __decode_response(self, response, freq, prediction_time, return_samples):\n",
    "        # we only sent one time series so we only receive one in return\n",
    "        # however, if possible one will pass multiple time series as predictions will then be faster\n",
    "        predictions = json.loads(response.decode('utf-8'))['predictions'][0]\n",
    "        prediction_length = len(next(iter(predictions['quantiles'].values())))\n",
    "        #prediction_index = pd.DatetimeIndex(start=prediction_time, freq=freq, periods=prediction_length)\n",
    "        prediction_index = pd.date_range(start=prediction_time, freq=freq, periods=prediction_length)\n",
    "        if return_samples:\n",
    "            dict_of_samples = {'sample_' + str(i): s for i, s in enumerate(predictions['samples'])}\n",
    "        else:\n",
    "            dict_of_samples = {}\n",
    "        return pd.DataFrame(data={**predictions['quantiles'], **dict_of_samples}, index=prediction_index)\n",
    "\n",
    "    def set_frequency(self, freq):\n",
    "        self.freq = freq\n",
    "        \n",
    "def encode_target(ts):\n",
    "    return [x if np.isfinite(x) else \"NaN\" for x in ts]        \n",
    "\n",
    "def series_to_dict(ts, cat=None, dynamic_feat=None):\n",
    "    \"\"\"Given a pandas.Series object, returns a dictionary encoding the time series.\n",
    "\n",
    "    ts -- a pands.Series object with the target time series\n",
    "    cat -- an integer indicating the time series category\n",
    "\n",
    "    Return value: a dictionary\n",
    "    \"\"\"\n",
    "    obj = {\"start\": str(ts.index[0]), \"target\": encode_target(ts)}\n",
    "    if cat is not None:\n",
    "        obj[\"cat\"] = cat\n",
    "    if dynamic_feat is not None:\n",
    "        obj[\"dynamic_feat\"] = dynamic_feat        \n",
    "    return obj"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can deploy the model and create and endpoint that can be queried using our custom DeepARPredictor class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "predictor = estimator.deploy(\n",
    "    initial_instance_count=1,\n",
    "    instance_type='ml.m4.xlarge',\n",
    "    predictor_cls=DeepARPredictor)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Make predictions "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can use the `predictor` object to generate predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "respDF=predictor.predict(ts=timeseries[4], quantiles=[0.90])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Printing predicted cost for next 10 hours  at 0.9 quantile. Please note, the data is generated using Device simulators and not recorded by actual sensors. Predicted cost can sometime display negative numbers, which will not be the case in case of real life scenario. Power company will never credit money for low consumption!!!!! "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "respDF.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete endpoints"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "predictor.delete_endpoint()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_mxnet_p36",
   "language": "python",
   "name": "conda_mxnet_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  },
  "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.  Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
