Models and scores
The model definitions can be found in scripts/model_lib.py.
model1: LGBM with 83 features (76 numerical, 7 categorical).
model2: Keras with 27 features (18 numerical, 9 categorical). You can see the network structure in model.png.
| model | private score | public score |
Feature engineering and scripts
Most of these features have already been discussed on the Kaggle forum.
time to next click
time bucket count (split time into multiple intervals, and count the number of buckets in which the IP appears)
target encoding: WoE (weight of evidence)
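The three features above can be sketched roughly as follows. This is a minimal, standard-library illustration and not the code in scripts/: the toy `clicks` rows, the function names, the one-hour bucket width, the choice of IP as the encoded column, and the `smoothing` constant are all assumptions made for the example.

```python
import math
from collections import defaultdict

# Hypothetical click log: (ip, click_time_in_seconds, is_attributed).
clicks = [
    ("1.1.1.1", 0, 0),
    ("1.1.1.1", 30, 0),
    ("2.2.2.2", 45, 1),
    ("1.1.1.1", 4000, 1),
    ("2.2.2.2", 50, 0),
]

def time_to_next_click(rows):
    """Seconds until the same IP clicks again; None for its last click."""
    out = [None] * len(rows)
    next_time = {}  # ip -> time of the click that follows the current one
    # Walk backwards in time so each row sees the click that comes after it.
    order = sorted(range(len(rows)), key=lambda i: rows[i][1], reverse=True)
    for i in order:
        ip, t, _ = rows[i]
        if ip in next_time:
            out[i] = next_time[ip] - t
        next_time[ip] = t
    return out

def time_bucket_count(rows, bucket_seconds):
    """Per IP, the number of distinct time buckets that contain a click."""
    buckets = defaultdict(set)
    for ip, t, _ in rows:
        buckets[ip].add(t // bucket_seconds)
    return {ip: len(b) for ip, b in buckets.items()}

def woe_encode(rows, smoothing=0.5):
    """Weight of evidence per IP: ln(P(ip | positive) / P(ip | negative)).

    The additive smoothing (an assumption here) avoids log(0) for IPs
    that have only positive or only negative rows.
    """
    pos = defaultdict(float)
    neg = defaultdict(float)
    for ip, _, y in rows:
        (pos if y else neg)[ip] += 1
    ips = set(pos) | set(neg)
    total_pos = sum(pos.values()) + smoothing * len(ips)
    total_neg = sum(neg.values()) + smoothing * len(ips)
    return {
        ip: math.log(((pos[ip] + smoothing) / total_pos)
                     / ((neg[ip] + smoothing) / total_neg))
        for ip in ips
    }

next_click = time_to_next_click(clicks)
bucket_counts = time_bucket_count(clicks, bucket_seconds=3600)
woe = woe_encode(clicks)
```

In a real pipeline the WoE table would normally be fitted on training rows only (or out of fold) so that the target does not leak into the feature.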
Features will be calculated once and saved to disk.
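A compute-once-and-cache pattern like this could look as follows. It is only a sketch of the idea: the helper name `cached_feature`, the pickle format, and the temporary cache directory are assumptions for illustration, and the actual scripts may store features differently.

```python
import os
import pickle
import tempfile

def cached_feature(name, compute, cache_dir):
    """Load the feature from disk if it exists; otherwise compute and save it."""
    path = os.path.join(cache_dir, name + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    values = compute()
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(values, f)
    return values

calls = []  # records how many times the expensive function actually runs

def expensive():
    calls.append(1)
    return [1, 2, 3]

cache_dir = tempfile.mkdtemp()
first = cached_feature("demo_feature", expensive, cache_dir)   # computed, then saved
second = cached_feature("demo_feature", expensive, cache_dir)  # loaded from disk
```

With this layout, rerunning a model script after a crash only pays for features that are not yet on disk.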
Feature importance from LGBM can be found in importance.txt.
I used the following environment:
- Memory: 256GB RAM, 256GB SWAP
- CPU: 20 cores, 2.10GHz
- GPU: 1080Ti
How to run
First, put sample_submission.csv, test.csv, test_supplement.csv, and train.csv to
Then run the shell scripts as follows:
$ cd scripts/
Output prediction files will be written to the csv directory.
Feature extraction (run_mk_feats.sh) took about one day.
Building model1 (run_mk_model1.sh) needs a large amount of memory (~256GB), sorry.
A GPU is required to build model2 (run_mk_model2.sh).