Step #5

Ensemble predictions from RealMLP and TabPFN models

Last update: August 15, 2025

AI Assistance: Claude.AI (Anthropic) is used for documentation, code restructuring, and performance optimization.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Overall Strategy

Step 1: Preprocess and engineer new features.

Step 2: Use AutoGluon to generate out-of-fold (OOF) predictions for each target separately. These predictions are used as additional input features in steps 3 and 4.

Step 3: Train the RealMLP model on the processed input (step 1) plus the ten AutoGluon OOF predictions (step 2). Because each OOF feature is a prediction of one target, these additional features let the model exploit correlations among the targets.

Step 4: Same as step 3, but with the TabPFN (v2) model.

Step 5: Combine predictions from RealMLP (step 3) and TabPFN (step 4) by taking each target from one of the two models.
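The OOF stacking idea behind steps 2 to 4 can be illustrated with a minimal sketch. It is not the pipeline's actual code: a least-squares linear model stands in for AutoGluon, the data is synthetic, and the function name `oof_predictions` is hypothetical. Each row's OOF prediction comes from a model that never saw that row, so the stacked features can be fed to a second-stage model without leaking the training labels.

```python
import numpy as np

def oof_predictions(X, y, n_splits=5, seed=7):
    """Out-of-fold predictions from a least-squares linear model.

    The linear model is only a stand-in for AutoGluon (assumption).
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    Xb = np.column_stack([X, np.ones(len(X))])  # append intercept column
    oof = np.empty(len(X), dtype=float)
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        # Fit on the training folds only, then predict the held-out fold
        coef, *_ = np.linalg.lstsq(Xb[train_idx], y[train_idx], rcond=None)
        oof[val_idx] = Xb[val_idx] @ coef
    return oof

# Synthetic data: 100 rows, 3 features, 2 targets
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
weights = np.array([[1.0, 0.5], [2.0, -1.0], [0.5, 2.0]])
Y = X @ weights + 0.1 * rng.normal(size=(100, 2))

# One OOF column per target, appended to the original features
oof_features = np.column_stack([oof_predictions(X, Y[:, t]) for t in range(Y.shape[1])])
X_stacked = np.hstack([X, oof_features])
print(X_stacked.shape)  # (100, 5)
```

In the strategy above there would be ten OOF columns, one per target, rather than the two used here.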

Imports

[ ]:
import numpy as np
import pandas as pd
import os
import random

Set Random Seeds

[ ]:
# Set random seed for reproducibility
random.seed(7)
np.random.seed(7)

Input & Output Directories

[ ]:
ROOT_DIR = '/data/Sukanta/Works_AIML/2025_SHELL_FuelProperty/'
DATA_DIR = ROOT_DIR + 'DATA/'
ExtractedDATA_DIR = ROOT_DIR + 'ExtractedDATA/'

Load Predictions from RealMLP and TabPFN

[ ]:
print("=== LOADING PREDICTIONS ===")

# Load RealMLP predictions
df_realmlp = pd.read_csv(ExtractedDATA_DIR + 'RealMLP_submission.csv')
print(f"RealMLP predictions shape: {df_realmlp.shape}")
print(f"RealMLP columns: {list(df_realmlp.columns)}")

# Load TabPFN predictions
df_tabpfn = pd.read_csv(ExtractedDATA_DIR + 'TabPFN_submission.csv')
print(f"TabPFN predictions shape: {df_tabpfn.shape}")
print(f"TabPFN columns: {list(df_tabpfn.columns)}")
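The two submission files are combined column by column, which is only valid if their rows refer to the same IDs in the same order. If that cannot be guaranteed, one option is to reorder one frame to match the other before combining. The sketch below uses toy data and a hypothetical helper `align_to`; it assumes IDs are unique within each file.

```python
import pandas as pd

def align_to(reference, other, id_col="ID"):
    """Return `other` with its rows reordered to follow the IDs of `reference`.

    Assumes IDs are unique in both frames (assumption, not checked in the source).
    """
    if set(reference[id_col]) != set(other[id_col]):
        raise ValueError("The two files do not contain the same set of IDs")
    # Index by ID, pick rows in the reference order, then restore the ID column
    return other.set_index(id_col).loc[reference[id_col].tolist()].reset_index()

# Toy frames: same IDs, different row order
toy_realmlp = pd.DataFrame({"ID": [1, 2, 3], "BlendProperty1": [0.1, 0.2, 0.3]})
toy_tabpfn = pd.DataFrame({"ID": [3, 1, 2], "BlendProperty1": [0.9, 0.7, 0.8]})

toy_aligned = align_to(toy_realmlp, toy_tabpfn)
print(toy_aligned)
```

After alignment, row `i` of both frames describes the same sample, so per-column copying is safe.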

Create Ensemble Predictions

[ ]:
print("\n=== CREATING ENSEMBLE PREDICTIONS ===")

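# Columns are copied row by row, so both files must list the same IDs in the same order
assert df_realmlp['ID'].equals(df_tabpfn['ID']), "ID mismatch between RealMLP and TabPFN files"

# Initialize ensemble dataframe
df_ensemble = pd.DataFrame()
df_ensemble['ID'] = df_realmlp['ID'].copy()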

# Use TabPFN for targets 1-4, RealMLP for targets 5-10
for target in range(1, 11):
    column_name = f'BlendProperty{target}'

    if target <= 4:
        # Use TabPFN for targets 1-4
        df_ensemble[column_name] = df_tabpfn[column_name].copy()
        print(f"Target {target}: Using TabPFN predictions")
    else:
        # Use RealMLP for targets 5-10
        df_ensemble[column_name] = df_realmlp[column_name].copy()
        print(f"Target {target}: Using RealMLP predictions")
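The same per-target selection can also be written as a lookup table, which keeps the model choice for each target in one place and makes it easy to change. This is a sketch on toy data, not a replacement for the cell above; `SOURCE_BY_TARGET` and `select_per_target` are hypothetical names, and the mapping simply mirrors the rule used in this notebook (TabPFN for targets 1-4, RealMLP for targets 5-10).

```python
import pandas as pd

TARGETS = [f"BlendProperty{i}" for i in range(1, 11)]

# Which model supplies each target (mirrors the rule in this notebook)
SOURCE_BY_TARGET = {
    name: ("tabpfn" if i <= 4 else "realmlp")
    for i, name in enumerate(TARGETS, start=1)
}

def select_per_target(frames, source_by_target, id_col="ID"):
    """Build an ensemble frame by taking each target column from its assigned model."""
    first = next(iter(frames.values()))
    out = pd.DataFrame({id_col: first[id_col].to_numpy()})
    for target, source in source_by_target.items():
        out[target] = frames[source][target].to_numpy()
    return out

# Toy predictions: TabPFN predicts 1.0 everywhere, RealMLP predicts 2.0
toy_ids = [101, 102]
toy_tabpfn = pd.DataFrame({"ID": toy_ids, **{t: [1.0, 1.0] for t in TARGETS}})
toy_realmlp = pd.DataFrame({"ID": toy_ids, **{t: [2.0, 2.0] for t in TARGETS}})

toy_ensemble = select_per_target(
    {"tabpfn": toy_tabpfn, "realmlp": toy_realmlp}, SOURCE_BY_TARGET
)
print(toy_ensemble.iloc[0].tolist())
```

The helper copies values positionally, so it assumes the input frames are already aligned on ID.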

Save Ensemble Predictions

[ ]:
print("\n=== SAVING ENSEMBLE PREDICTIONS ===")

ensemble_file = ExtractedDATA_DIR + 'Ensemble_submission.csv'
df_ensemble.to_csv(ensemble_file, index=False)
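A quick way to confirm that the saved file is what was intended is to read it back and compare it with the in-memory frame. The sketch below does this on toy data in a temporary directory, so it does not touch the real `Ensemble_submission.csv`; the toy values are illustrative only.

```python
import os
import tempfile

import pandas as pd

toy_ensemble = pd.DataFrame({"ID": [1, 2], "BlendProperty1": [0.25, 0.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Ensemble_submission.csv")
    toy_ensemble.to_csv(path, index=False)  # same call as in the cell above
    reloaded = pd.read_csv(path)

# Column names, shape, and values should survive the round trip
round_trip_ok = (
    list(reloaded.columns) == list(toy_ensemble.columns)
    and reloaded.shape == toy_ensemble.shape
    and not reloaded.isna().any().any()
)
print(f"Round trip OK: {round_trip_ok}")
```

Writing with `index=False` matters here: without it, the reloaded file would contain an extra unnamed index column.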