
Layered Feature Engineering for E-Commerce Purchase Prediction: A Hierarchical Evaluation on Taobao User Behavior Datasets (Open Access)

Authors
Suo, Liqiu; Xia, Lin; Chung, Yoona; Kim, Eunchan
Issue Date
Feb-2026
Publisher
Tech Science Press
Keywords
e-commerce platform; feature importance; Hierarchical feature engineering; purchase prediction; Taobao; user behavior dataset
Citation
Computers, Materials and Continua, v.87, no.1, pp 1 - 25
Pages
25
Indexed
SCIE
SCOPUS
Journal Title
Computers, Materials and Continua
Volume
87
Number
1
Start Page
1
End Page
25
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/213264
DOI
10.32604/cmc.2025.076329
ISSN
1546-2218
1546-2226
Abstract
Accurate purchase prediction in e-commerce critically depends on the quality of behavioral features. This paper proposes a layered and interpretable feature engineering framework that organizes user signals into three layers: Basic, Conversion & Stability (efficiency and volatility across actions), and Advanced Interactions & Activity (cross-behavior synergies and intensity). Using real logs from Taobao (Alibaba’s primary e-commerce platform), comprising 57,976 records for 10,203 users collected from 25 November to 03 December 2017, we conducted a hierarchical, layer-wise evaluation that holds data splits and hyperparameters fixed while varying only the feature set, which quantifies each layer’s marginal contribution. Across logistic regression (LR), decision tree, random forest, XGBoost, and CatBoost models with stratified 5-fold cross-validation, performance improved monotonically from Basic to Conversion & Stability to Advanced features. With LR, F1 increased from 0.613 (Basic) to 0.962 (Advanced); the boosted models achieved an AUC of 0.995 and an F1 score of up to 0.983. Calibration and precision–recall analyses indicated strong ranking quality, although we acknowledge potential dataset and period biases given the short (9-day) window. By making feature contributions measurable and reproducible, the framework complements model-centric advances and offers a transparent blueprint for production-grade behavioral modeling. The code and processed artifacts are publicly available, and future work will extend the validation to longer, seasonal datasets and to hybrid approaches that combine automated feature learning with domain-driven design.
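The layer-wise protocol described in the abstract (one fixed set of stratified splits, fixed model settings, only the feature set changing) can be sketched in dependency-free Python. This is a minimal sketch under stated assumptions, not the authors' released code: the column names inside each layer are hypothetical placeholders, the layers are assumed to be added cumulatively (Basic, then Basic plus Conversion & Stability, then all three), and a nearest-centroid classifier is swapped in as a stand-in for the paper's LR, tree, and boosted models so the example needs no third-party libraries.

```python
import random
from statistics import mean

# Layer names follow the abstract; the column names are HYPOTHETICAL
# placeholders, not the paper's actual feature definitions.
LAYERS = [
    ("Basic", ["pv_count", "cart_count"]),
    ("Conversion & Stability", ["cart_per_pv", "daily_pv_std"]),
    ("Advanced Interactions & Activity", ["cart_x_fav", "actions_per_day"]),
]


def stratified_folds(labels, k=5, seed=42):
    """Assign each row to one of k folds so class ratios stay similar per fold."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = pos % k  # round-robin within each class
    return fold_of


def nearest_centroid_fit(X, y):
    """Stand-in model: one mean vector per class (no hyperparameters, no scaling)."""
    centroids = {}
    for cls in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, X):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X]


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive (purchase) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def layerwise_f1(rows, labels, k=5, seed=42):
    """Mean F1 per cumulative layer; folds are built once and reused for every layer."""
    folds = stratified_folds(labels, k, seed)
    results, columns = {}, []
    for name, layer_cols in LAYERS:
        columns = columns + layer_cols  # assumed cumulative layering
        X = [[r[c] for c in columns] for r in rows]
        scores = []
        for f in range(k):
            train = [i for i in range(len(rows)) if folds[i] != f]
            test = [i for i in range(len(rows)) if folds[i] == f]
            model = nearest_centroid_fit([X[i] for i in train], [labels[i] for i in train])
            preds = nearest_centroid_predict(model, [X[i] for i in test])
            scores.append(f1_score([labels[i] for i in test], preds))
        results[name] = mean(scores)
    return results
```

Because the fold assignment is computed once and the stand-in model has no tunable settings, any change in the per-layer score comes only from the added columns. Swapping in LR or a boosted model would change only the fit/predict pair; with distance-based or regularized models, features would normally be standardized first, which this sketch omits.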
Appears in Collections
Seoul College of Engineering > Seoul Department of Information Systems > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Eunchan
COLLEGE OF ENGINEERING (DEPARTMENT OF INFORMATION SYSTEMS)