{ "cells": [ { "cell_type": "markdown", "id": "d2973a0ad7677b5b", "metadata": {}, "source": [ "# Step #5\n", "\n", "## Ensemble predictions from RealMLP and TabPFN models" ] }, { "cell_type": "markdown", "id": "5987a0b227a3826c", "metadata": {}, "source": [ "**Last update: August 15, 2025**\n", "\n", "AI Assistance: Claude.AI (Anthropic) is used for documentation, code \n", "restructuring, and performance optimization." ] }, { "cell_type": "markdown", "id": "4c5366cf8f34f07e", "metadata": {}, "source": [ "This program is free software: you can redistribute it and/or modify\n", "it under the terms of the GNU General Public License as published by\n", "the Free Software Foundation, either version 3 of the License, or\n", "(at your option) any later version.\n", "\n", "This program is distributed in the hope that it will be useful,\n", "but WITHOUT ANY WARRANTY; without even the implied warranty of\n", "MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n", "GNU General Public License for more details.\n", "\n", "You should have received a copy of the GNU General Public License\n", "along with this program. If not, see ." ] }, { "cell_type": "markdown", "id": "6c5abe5121606f77", "metadata": {}, "source": [ "**Overall Strategy**" ] }, { "cell_type": "markdown", "id": "45083d8951beac15", "metadata": {}, "source": [ "Step 1: Preprocess and engineer new features. \n", "\n", "Step 2: Use AutoGluon to generate OOF predictions for each target separately.\n", "These predictions will be used as additional input features in steps 3 and 4.\n", "\n", "Step 3: Train the RealMLP model with processed input (step 1) + ten\n", "AutoGluon-OOFs (step 2). 
These additional features will capture the correlation\n", "among targets effectively.\n", "\n", "Step 4: Similar to step 3 except use the TabPFN (v2) model.\n", "\n", "**Step 5: Combine predictions from RealMLP (step 3) and TabPFN (step 4).**" ] }, { "cell_type": "markdown", "id": "68a1a63e6e5092cb", "metadata": {}, "source": [ "**Imports**" ] }, { "cell_type": "code", "execution_count": null, "id": "100baff0b8383211", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import os\n", "import random" ] }, { "cell_type": "markdown", "id": "f170a91814769ceb", "metadata": {}, "source": [ "**Set Random Seeds**" ] }, { "cell_type": "code", "execution_count": null, "id": "39dcd83e2b4ef016", "metadata": {}, "outputs": [], "source": [ "# Set random seed for reproducibility\n", "random.seed(7)\n", "np.random.seed(7)" ] }, { "cell_type": "markdown", "id": "7d15d96e4b1676ac", "metadata": {}, "source": [ "**Input & Output Directories**" ] }, { "cell_type": "code", "execution_count": null, "id": "681d1d749825dde8", "metadata": {}, "outputs": [], "source": [ "ROOT_DIR = '/data/Sukanta/Works_AIML/2025_SHELL_FuelProperty/'\n", "DATA_DIR = ROOT_DIR + 'DATA/'\n", "ExtractedDATA_DIR = ROOT_DIR + 'ExtractedDATA/'" ] }, { "cell_type": "markdown", "id": "6b59c605839a3144", "metadata": {}, "source": [ "**Load Predictions from RealMLP and TabPFN**" ] }, { "cell_type": "code", "execution_count": null, "id": "6d3084869b85a0cd", "metadata": {}, "outputs": [], "source": [ "print(\"=== LOADING PREDICTIONS ===\")\n", "\n", "# Load RealMLP predictions\n", "df_realmlp = pd.read_csv(ExtractedDATA_DIR + 'RealMLP_submission.csv')\n", "print(f\"RealMLP predictions shape: {df_realmlp.shape}\")\n", "print(f\"RealMLP columns: {list(df_realmlp.columns)}\")\n", "\n", "# Load TabPFN predictions\n", "df_tabpfn = pd.read_csv(ExtractedDATA_DIR + 'TabPFN_submission.csv')\n", "print(f\"TabPFN predictions shape: {df_tabpfn.shape}\")\n", "print(f\"TabPFN columns: 
{list(df_tabpfn.columns)}\")" ] }, { "cell_type": "markdown", "id": "486f2780abc0f39a", "metadata": {}, "source": [ "**Create Ensemble Predictions**" ] }, { "cell_type": "code", "execution_count": null, "id": "a9f0c1c49247e970", "metadata": {}, "outputs": [], "source": [ "print(\"\\n=== CREATING ENSEMBLE PREDICTIONS ===\")\n", "\n", "# Initialize ensemble dataframe\n", "df_ensemble = pd.DataFrame()\n", "df_ensemble['ID'] = df_realmlp['ID'].copy()\n", "\n", "# Use TabPFN for targets 1-4, RealMLP for targets 5-10\n", "for target in range(1, 11):\n", "    column_name = f'BlendProperty{target}'\n", "\n", "    if target <= 4:\n", "        # Use TabPFN for targets 1-4\n", "        df_ensemble[column_name] = df_tabpfn[column_name].copy()\n", "        print(f\"Target {target}: Using TabPFN predictions\")\n", "    else:\n", "        # Use RealMLP for targets 5-10\n", "        df_ensemble[column_name] = df_realmlp[column_name].copy()\n", "        print(f\"Target {target}: Using RealMLP predictions\")" ] }, { "cell_type": "markdown", "id": "393420edc437e694", "metadata": {}, "source": [ "**Save Ensemble Predictions**" ] }, { "cell_type": "code", "execution_count": null, "id": "initial_id", "metadata": {}, "outputs": [], "source": [ "print(\"\\n=== SAVING ENSEMBLE PREDICTIONS ===\")\n", "\n", "ensemble_file = ExtractedDATA_DIR + 'Ensemble_submission.csv'\n", "df_ensemble.to_csv(ensemble_file, index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 5 }