{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d2973a0ad7677b5b",
   "metadata": {},
   "source": [
    "# Step #5\n",
    "\n",
    "## Ensemble predictions from RealMLP and TabPFN models"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5987a0b227a3826c",
   "metadata": {},
   "source": [
    "**Last update: August 15, 2025**\n",
    "\n",
    "AI Assistance: Claude.AI (Anthropic) is used for documentation, code \n",
    "restructuring, and performance optimization."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c5366cf8f34f07e",
   "metadata": {},
   "source": [
    "This program is free software: you can redistribute it and/or modify\n",
    "it under the terms of the GNU General Public License as published by\n",
    "the Free Software Foundation, either version 3 of the License, or\n",
    "(at your option) any later version.\n",
    "\n",
    "This program is distributed in the hope that it will be useful,\n",
    "but WITHOUT ANY WARRANTY; without even the implied warranty of\n",
    "MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n",
    "GNU General Public License for more details.\n",
    "\n",
    "You should have received a copy of the GNU General Public License\n",
    "along with this program.  If not, see <https://www.gnu.org/licenses/>."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c5abe5121606f77",
   "metadata": {},
   "source": [
    "**Overall Strategy**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45083d8951beac15",
   "metadata": {},
   "source": [
    "Step 1: Preprocess the data and engineer new features.\n",
    "\n",
    "Step 2: Use AutoGluon to generate out-of-fold (OOF) predictions for each target\n",
    "separately. These predictions serve as additional input features in steps 3 and 4.\n",
    "\n",
    "Step 3: Train the RealMLP model on the processed input (step 1) plus the ten\n",
    "AutoGluon OOF predictions (step 2). These additional features are intended to\n",
    "let the model exploit correlations among the targets.\n",
    "\n",
    "Step 4: Same as step 3, but with the TabPFN (v2) model.\n",
    "\n",
    "**Step 5: Combine predictions from RealMLP (step 3) and TabPFN (step 4), taking each target from one of the two models.**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68a1a63e6e5092cb",
   "metadata": {},
   "source": [
    "**Imports**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "100baff0b8383211",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import os\n",
    "import random"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f170a91814769ceb",
   "metadata": {},
   "source": [
    "**Set Random Seeds**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39dcd83e2b4ef016",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set random seed for reproducibility\n",
    "random.seed(7)\n",
    "np.random.seed(7)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d15d96e4b1676ac",
   "metadata": {},
   "source": [
    "**Input & Output Directories**"
   ]
  },
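  {
   "cell_type": "code",
   "execution_count": null,
   "id": "681d1d749825dde8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Project root (machine-specific; adjust to your environment)\n",
    "ROOT_DIR = '/data/Sukanta/Works_AIML/2025_SHELL_FuelProperty/'\n",
    "# Data directory (not used in this step)\n",
    "DATA_DIR = ROOT_DIR + 'DATA/'\n",
    "# Directory holding the per-model submission files and the ensemble output\n",
    "ExtractedDATA_DIR = ROOT_DIR + 'ExtractedDATA/'"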
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b59c605839a3144",
   "metadata": {},
   "source": [
    "**Load Predictions from RealMLP and TabPFN**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d3084869b85a0cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"=== LOADING PREDICTIONS ===\")\n",
    "\n",
    "# Load RealMLP predictions\n",
    "df_realmlp = pd.read_csv(ExtractedDATA_DIR + 'RealMLP_submission.csv')\n",
    "print(f\"RealMLP predictions shape: {df_realmlp.shape}\")\n",
    "print(f\"RealMLP columns: {list(df_realmlp.columns)}\")\n",
    "\n",
    "# Load TabPFN predictions\n",
    "df_tabpfn = pd.read_csv(ExtractedDATA_DIR + 'TabPFN_submission.csv')\n",
    "print(f\"TabPFN predictions shape: {df_tabpfn.shape}\")\n",
    "print(f\"TabPFN columns: {list(df_tabpfn.columns)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "486f2780abc0f39a",
   "metadata": {},
   "source": [
    "**Create Ensemble Predictions**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9f0c1c49247e970",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\n=== CREATING ENSEMBLE PREDICTIONS ===\")\n",
    "\n",
    "# Columns are assigned by row index, so both files must list the same IDs\n",
    "# in the same order.\n",
    "if not df_realmlp['ID'].equals(df_tabpfn['ID']):\n",
    "    raise ValueError(\"RealMLP and TabPFN files list different IDs\")\n",
    "\n",
    "# Initialize the ensemble dataframe with the shared ID column\n",
    "df_ensemble = df_realmlp[['ID']].copy()\n",
    "\n",
    "# Use TabPFN for targets 1-4 and RealMLP for targets 5-10\n",
    "for target in range(1, 11):\n",
    "    column_name = f'BlendProperty{target}'\n",
    "    if target <= 4:\n",
    "        model_name, df_source = 'TabPFN', df_tabpfn\n",
    "    else:\n",
    "        model_name, df_source = 'RealMLP', df_realmlp\n",
    "\n",
    "    df_ensemble[column_name] = df_source[column_name].copy()\n",
    "    print(f\"Target {target}: Using {model_name} predictions\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "393420edc437e694",
   "metadata": {},
   "source": [
    "**Save Ensemble Predictions**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "initial_id",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"\\n=== SAVING ENSEMBLE PREDICTIONS ===\")\n",
    "\n",
    "ensemble_file = ExtractedDATA_DIR + 'Ensemble_submission.csv'\n",
    "df_ensemble.to_csv(ensemble_file, index=False)\n",
    "print(f\"Ensemble predictions saved to {ensemble_file} (shape: {df_ensemble.shape})\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}