Kaggle M5 Competition Part 2 - Modeling
Table of Contents:
Kaggle M5 Competition Part 2 - Modeling
5. Feature Engineering
#5.1 Label Encoding
# Encode the id, department, category, store, and state columns as integer codes
import time
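The encoding cell itself is not shown above; a minimal sketch of the idea, using a toy frame with two of the M5 categorical columns (the loop over the real column list is an assumption):

```python
import pandas as pd

# Toy frame standing in for the M5 data
df = pd.DataFrame({
    'store_id': ['CA_1', 'CA_2', 'CA_1', 'TX_1'],
    'cat_id':   ['FOODS', 'HOBBIES', 'FOODS', 'FOODS'],
})

# Convert each categorical column to integer codes to shrink memory;
# pandas assigns codes in sorted category order (CA_1=0, CA_2=1, TX_1=2)
for col in ['store_id', 'cat_id']:
    df[col] = df[col].astype('category').cat.codes.astype('int8')

print(df['store_id'].tolist())  # [0, 1, 0, 2]
```

Storing the codes as `int8`/`int16` is what lets the full 58-million-row frame in `df.info()` below fit in a few gigabytes.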
#5.2 Introduce Lags
# Add lag columns for 'sold'
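The lag cell is not shown; a sketch of how the `sold_lag_*` columns in `df.info()` below can be built with a grouped shift (the toy data and the reduced lag list are for illustration; the notebook uses lags 1, 2, 3, 6, 12, 24, 36):

```python
import numpy as np
import pandas as pd

# Toy per-item series
df = pd.DataFrame({
    'id':   ['a'] * 5 + ['b'] * 5,
    'sold': [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

# Shift within each item's history so a row never sees another item's sales
for lag in [1, 2, 3]:
    df[f'sold_lag_{lag}'] = (
        df.groupby('id')['sold'].shift(lag).astype(np.float16)
    )
```

The first `lag` rows of each group are NaN, which is why the notebook later drops the early `d` range before saving.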
#5.3 Mean Encoding
%time
Wall time: 1 ms
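The mean-encoding cell is not shown; a sketch of the pattern behind columns like `item_sold_avg` and `store_item_sold_avg` in `df.info()` below, using a grouped `transform('mean')` (toy data; the real notebook repeats this for many key combinations):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'item_id': [0, 0, 1, 1],
    'sold':    [2.0, 4.0, 10.0, 30.0],
})

# Mean encoding: attach the average of the target over each key as a feature
df['item_sold_avg'] = (
    df.groupby('item_id')['sold'].transform('mean').astype(np.float16)
)
```

`transform` broadcasts the group mean back to every row, so the result aligns with the original frame without a merge.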
#5.4 Rolling Window Statistics
df['rolling_sold_mean'] = df.groupby(['id', 'item_id', 'dept_id', 'cat_id', 'store_id', 'state_id'])['sold'].transform(lambda x: x.rolling(window=7).mean()).astype(np.float16)
#5.5 Expanding Window Statistics
df['expanding_sold_mean'] = df.groupby(['id', 'item_id', 'dept_id', 'cat_id', 'store_id', 'state_id'])['sold'].transform(lambda x: x.expanding(2).mean()).astype(np.float16)
#5.6 Trends
# The selling trend is kept simple: it only compares whether sales are above or below the average.
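The trend cell is not shown; one plausible reading of "above or below the average" is the item's own average minus the global average, so the sign tells you the direction (the column names and this exact formula are assumptions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id':   [0, 0, 1, 1],
    'sold': [1.0, 3.0, 5.0, 7.0],
})

# Per-item average vs. the overall average: positive means the item
# sells above the global mean, negative means below it
df['item_sold_avg'] = df.groupby('id')['sold'].transform('mean')
df['selling_trend'] = (df['item_sold_avg'] - df['sold'].mean()).astype(np.float16)
```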
#5.7 Save the data
# The added lags leave many empty rows up to d 35, so that period is excluded.
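The filtering cell is not shown; a sketch of dropping the early days whose lag features are NaN (the cutoff of 36 follows from the longest lag of 36 days; the `d` column here is a toy stand-in):

```python
import pandas as pd

df = pd.DataFrame({'d': range(1, 41), 'sold': 0})

# The longest lag is 36 days, so rows through d = 35 carry NaN lag
# features; keep only days from d = 36 onward
df = df[df['d'] >= 36]
```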
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 58967660 entries, 1067150 to 60034809
Data columns (total 43 columns):
id int16
item_id int16
dept_id int8
cat_id int8
store_id int8
state_id int8
d int16
sold int16
wm_yr_wk int16
weekday int8
wday int8
month int8
year int16
event_name_1 int8
event_type_1 int8
event_name_2 int8
event_type_2 int8
snap_CA int8
snap_TX int8
snap_WI int8
sell_price float16
sold_lag_1 float16
sold_lag_2 float16
sold_lag_3 float16
sold_lag_6 float16
sold_lag_12 float16
sold_lag_24 float16
sold_lag_36 float16
iteam_sold_avg float16
state_sold_avg float16
store_sold_avg float16
cat_sold_avg float16
dept_sold_avg float16
cat_dept_sold_avg float16
store_item_sold_avg float16
cat_item_sold_avg float16
dept_item_sold_avg float16
state_store_sold_avg float16
state_store_cat_sold_avg float16
store_cat_dept_sold_avg float16
rolling_sold_mean float16
expanding_sold_mean float16
selling_trend float16
dtypes: float16(23), int16(6), int8(14)
memory usage: 4.4 GB
df.to_pickle('data.pkl')
6. Modeling and Prediction
import time
%time
Wall time: 0 ns
#Get the store ids
*****Prediction for Store: CA_1*****
Training until validation scores don't improve for 20 rounds
[20] training's rmse: 0.843923 training's l2: 0.712206 valid_1's rmse: 0.556612 valid_1's l2: 0.309817
[40] training's rmse: 0.805702 training's l2: 0.649156 valid_1's rmse: 0.536648 valid_1's l2: 0.287992
[60] training's rmse: 0.782521 training's l2: 0.612339 valid_1's rmse: 0.529075 valid_1's l2: 0.27992
[80] training's rmse: 0.765509 training's l2: 0.586004 valid_1's rmse: 0.519001 valid_1's l2: 0.269362
[100] training's rmse: 0.746824 training's l2: 0.557746 valid_1's rmse: 0.516391 valid_1's l2: 0.26666
[120] training's rmse: 0.736669 training's l2: 0.542682 valid_1's rmse: 0.512239 valid_1's l2: 0.262389
[140] training's rmse: 0.725183 training's l2: 0.52589 valid_1's rmse: 0.507517 valid_1's l2: 0.257574
[160] training's rmse: 0.71879 training's l2: 0.516659 valid_1's rmse: 0.503054 valid_1's l2: 0.253063
[180] training's rmse: 0.713246 training's l2: 0.508719 valid_1's rmse: 0.501668 valid_1's l2: 0.25167
Early stopping, best iteration is:
[177] training's rmse: 0.713815 training's l2: 0.509531 valid_1's rmse: 0.501194 valid_1's l2: 0.251195
*****Prediction for Store: CA_2*****
Training until validation scores don't improve for 20 rounds
[20] training's rmse: 0.509193 training's l2: 0.259277 valid_1's rmse: 0.488679 valid_1's l2: 0.238808
[40] training's rmse: 0.476985 training's l2: 0.227515 valid_1's rmse: 0.481392 valid_1's l2: 0.231738
[60] training's rmse: 0.459124 training's l2: 0.210795 valid_1's rmse: 0.469844 valid_1's l2: 0.220753
[80] training's rmse: 0.446454 training's l2: 0.199321 valid_1's rmse: 0.466131 valid_1's l2: 0.217278
[100] training's rmse: 0.44062 training's l2: 0.194146 valid_1's rmse: 0.465138 valid_1's l2: 0.216353
[120] training's rmse: 0.435579 training's l2: 0.189729 valid_1's rmse: 0.462275 valid_1's l2: 0.213698
[140] training's rmse: 0.433312 training's l2: 0.187759 valid_1's rmse: 0.46174 valid_1's l2: 0.213204
[160] training's rmse: 0.430487 training's l2: 0.185319 valid_1's rmse: 0.461825 valid_1's l2: 0.213283
Early stopping, best iteration is:
[149] training's rmse: 0.431706 training's l2: 0.18637 valid_1's rmse: 0.461223 valid_1's l2: 0.212727
*****Prediction for Store: CA_3*****
Training until validation scores don't improve for 20 rounds
[20] training's rmse: 1.31768 training's l2: 1.73629 valid_1's rmse: 0.620532 valid_1's l2: 0.38506
[40] training's rmse: 1.25016 training's l2: 1.56289 valid_1's rmse: 0.599518 valid_1's l2: 0.359422
[60] training's rmse: 1.21357 training's l2: 1.47275 valid_1's rmse: 0.583401 valid_1's l2: 0.340357
[80] training's rmse: 1.18962 training's l2: 1.41519 valid_1's rmse: 0.580415 valid_1's l2: 0.336882
[100] training's rmse: 1.16704 training's l2: 1.36198 valid_1's rmse: 0.573824 valid_1's l2: 0.329274
Early stopping, best iteration is:
[83] training's rmse: 1.18341 training's l2: 1.40046 valid_1's rmse: 0.571149 valid_1's l2: 0.326211
*****Prediction for Store: CA_4*****
Training until validation scores don't improve for 20 rounds
[20] training's rmse: 0.379545 training's l2: 0.144055 valid_1's rmse: 0.306421 valid_1's l2: 0.0938936
[40] training's rmse: 0.362723 training's l2: 0.131568 valid_1's rmse: 0.296737 valid_1's l2: 0.0880528
[60] training's rmse: 0.352526 training's l2: 0.124275 valid_1's rmse: 0.286469 valid_1's l2: 0.0820644
[80] training's rmse: 0.347152 training's l2: 0.120515 valid_1's rmse: 0.283419 valid_1's l2: 0.0803261
[100] training's rmse: 0.342128 training's l2: 0.117052 valid_1's rmse: 0.279012 valid_1's l2: 0.0778477
[120] training's rmse: 0.339248 training's l2: 0.115089 valid_1's rmse: 0.27756 valid_1's l2: 0.0770398
[140] training's rmse: 0.336076 training's l2: 0.112947 valid_1's rmse: 0.27745 valid_1's l2: 0.0769786
Early stopping, best iteration is:
[129] training's rmse: 0.337326 training's l2: 0.113789 valid_1's rmse: 0.276789 valid_1's l2: 0.0766123
*****Prediction for Store: TX_1*****
Training until validation scores don't improve for 20 rounds
[20] training's rmse: 0.779231 training's l2: 0.607202 valid_1's rmse: 0.495078 valid_1's l2: 0.245102
[40] training's rmse: 0.734945 training's l2: 0.540143 valid_1's rmse: 0.477927 valid_1's l2: 0.228414
[60] training's rmse: 0.715 training's l2: 0.511225 valid_1's rmse: 0.474993 valid_1's l2: 0.225618
[80] training's rmse: 0.700945 training's l2: 0.491324 valid_1's rmse: 0.471686 valid_1's l2: 0.222487
[100] training's rmse: 0.688138 training's l2: 0.473534 valid_1's rmse: 0.469721 valid_1's l2: 0.220638
[120] training's rmse: 0.671506 training's l2: 0.45092 valid_1's rmse: 0.468799 valid_1's l2: 0.219772
Early stopping, best iteration is:
[111] training's rmse: 0.678168 training's l2: 0.459912 valid_1's rmse: 0.466017 valid_1's l2: 0.217172
*****Prediction for Store: TX_2*****
Training until validation scores don't improve for 20 rounds
[20] training's rmse: 0.949797 training's l2: 0.902115 valid_1's rmse: 0.519843 valid_1's l2: 0.270237
[40] training's rmse: 0.901254 training's l2: 0.812259 valid_1's rmse: 0.50753 valid_1's l2: 0.257587
[60] training's rmse: 0.860935 training's l2: 0.741208 valid_1's rmse: 0.496691 valid_1's l2: 0.246702
[80] training's rmse: 0.837279 training's l2: 0.701036 valid_1's rmse: 0.500869 valid_1's l2: 0.25087
Early stopping, best iteration is:
[60] training's rmse: 0.860935 training's l2: 0.741208 valid_1's rmse: 0.496691 valid_1's l2: 0.246702
*****Prediction for Store: TX_3*****
Training until validation scores don't improve for 20 rounds
[20] training's rmse: 0.741642 training's l2: 0.550033 valid_1's rmse: 0.569192 valid_1's l2: 0.323979
[40] training's rmse: 0.71047 training's l2: 0.504767 valid_1's rmse: 0.557032 valid_1's l2: 0.310284
[60] training's rmse: 0.68682 training's l2: 0.471721 valid_1's rmse: 0.546532 valid_1's l2: 0.298697
[80] training's rmse: 0.672727 training's l2: 0.452562 valid_1's rmse: 0.541006 valid_1's l2: 0.292688
[100] training's rmse: 0.66163 training's l2: 0.437754 valid_1's rmse: 0.539347 valid_1's l2: 0.290895
[120] training's rmse: 0.650395 training's l2: 0.423014 valid_1's rmse: 0.534985 valid_1's l2: 0.286208
[140] training's rmse: 0.645165 training's l2: 0.416238 valid_1's rmse: 0.532259 valid_1's l2: 0.2833
Early stopping, best iteration is:
[132] training's rmse: 0.646645 training's l2: 0.418149 valid_1's rmse: 0.531403 valid_1's l2: 0.28239
*****Prediction for Store: WI_1*****
Training until validation scores don't improve for 20 rounds
[20] training's rmse: 0.40387 training's l2: 0.163111 valid_1's rmse: 0.351971 valid_1's l2: 0.123884
[40] training's rmse: 0.379547 training's l2: 0.144056 valid_1's rmse: 0.339714 valid_1's l2: 0.115405
[60] training's rmse: 0.370228 training's l2: 0.137069 valid_1's rmse: 0.338534 valid_1's l2: 0.114605
[80] training's rmse: 0.362681 training's l2: 0.131537 valid_1's rmse: 0.335793 valid_1's l2: 0.112757
Early stopping, best iteration is:
[75] training's rmse: 0.363574 training's l2: 0.132186 valid_1's rmse: 0.335287 valid_1's l2: 0.112418
*****Prediction for Store: WI_2*****
Training until validation scores don't improve for 20 rounds
[20] training's rmse: 0.798844 training's l2: 0.638151 valid_1's rmse: 0.99757 valid_1's l2: 0.995147
[40] training's rmse: 0.75986 training's l2: 0.577388 valid_1's rmse: 0.979328 valid_1's l2: 0.959084
[60] training's rmse: 0.729671 training's l2: 0.53242 valid_1's rmse: 0.968394 valid_1's l2: 0.937787
Early stopping, best iteration is:
[57] training's rmse: 0.732588 training's l2: 0.536685 valid_1's rmse: 0.967836 valid_1's l2: 0.936707
*****Prediction for Store: WI_3*****
Training until validation scores don't improve for 20 rounds
[20] training's rmse: 0.803068 training's l2: 0.644919 valid_1's rmse: 0.580289 valid_1's l2: 0.336735
[40] training's rmse: 0.762335 training's l2: 0.581154 valid_1's rmse: 0.573159 valid_1's l2: 0.328512
[60] training's rmse: 0.739142 training's l2: 0.546331 valid_1's rmse: 0.566164 valid_1's l2: 0.320541
Early stopping, best iteration is:
[51] training's rmse: 0.748455 training's l2: 0.560184 valid_1's rmse: 0.563976 valid_1's l2: 0.318069