有这么一句话在业界广泛流传:数据和特征决定了机器学习的上限,而模型和算法只是逼近这个上限而已。那特征工程到底是什么呢?顾名思义,其本质是一项工程活动,目的是最大限度地从原始数据中提取特征以供算法和模型使用。特征工程是一个非常重要的课题,是机器学习中不可缺少的一部分,机器学习实践的成功与否很大程度上取决于如何特征工程。
The process of feature engineering
- Brainstorming or Testing features;
- Deciding what features to create;
- Creating features;
- Checking how the features work with your model;
- Improving your features if needed;
- Go back to brainstorming/creating more features until the work is done.
参考:
Original link:https://izhangzhihao.github.io//2017/11/26/Advanced-Feature-Engineering/