Feature Engineering of DateTime Variables for Data Science, Machine Learning
Learn how to make more meaningful features from DateTime type variables to be used by Machine Learning Models

INTRODUCTION
DateTime fields require feature engineering to turn them from raw data into insightful information that our Machine Learning models can use. This post is divided into three parts, plus a bonus section at the end. We will use a combination of built-in pandas and NumPy functions, as well as our own functions, to extract useful features.
- Part 1 — Extract Date / Time Components
- Part 2 — Create Boolean Flags
- Part 3 — Calculate Date / Time Differences
- Bonus — Feature Engineering in 2 lines of code using fast_ml
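As a rough preview of what the three parts cover, here is a minimal sketch using plain pandas. The DataFrame and its column names (`registration_dt`, `first_order_dt`) are hypothetical, invented only for illustration, and the derived feature names are likewise assumptions rather than a fixed convention.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration only.
df = pd.DataFrame({
    "registration_dt": pd.to_datetime(["2021-01-01 09:30:00", "2021-01-02 22:15:00"]),
    "first_order_dt": pd.to_datetime(["2021-01-03 10:00:00", "2021-01-02 23:45:00"]),
})

# Part 1: extract date / time components through the .dt accessor
df["reg_year"] = df["registration_dt"].dt.year
df["reg_month"] = df["registration_dt"].dt.month
df["reg_hour"] = df["registration_dt"].dt.hour
df["reg_dayofweek"] = df["registration_dt"].dt.dayofweek  # Monday=0 ... Sunday=6

# Part 2: create a boolean flag (Saturday=5, Sunday=6)
df["reg_is_weekend"] = df["registration_dt"].dt.dayofweek >= 5

# Part 3: calculate a date / time difference, expressed in hours
delta = df["first_order_dt"] - df["registration_dt"]
df["hours_to_first_order"] = delta.dt.total_seconds() / 3600

print(df)
```

Each part adds ordinary numeric or boolean columns, which is the form most Machine Learning models expect as input.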
BACKGROUND
Whenever I have worked with e-commerce data, the dataset has contained DateTime columns in one form or another:
- User registration date-time
- User login date-time
- Transaction date-time
- Disputed transaction date-time
- … and many more
At first glance, a date field gives us nothing more than a specific point on a timeline. But DateTime fields are potential treasure troves of information, and when used correctly they are immensely powerful for uncovering patterns.
As a Data Scientist, your job is to bring insight to the table, and for that you need to ask the right questions. For example:
- Ques 1 — When do you see most carts getting created?
- Ques 2 — When do you see most carts getting abandoned?
- Ques 3 — When do you see the most fraudulent transactions?
- Ques 4 — When do the most users subscribe?
- Ques 5 — When are certain items purchased most often?
- Ques 6 — How many days/hours after registration does a user place their first order?
- Ques 7 — After how many days of inactivity does a customer never return to your site?
- … etc
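A "when" question such as Ques 1 usually comes down to extracting a time component and counting rows per value. The sketch below assumes a hypothetical cart log with a `cart_created_dt` column; the data is invented purely to show the pattern.

```python
import pandas as pd

# Hypothetical cart-creation log; the column name and values are assumptions.
carts = pd.DataFrame({
    "cart_created_dt": pd.to_datetime([
        "2021-03-01 20:05:00",
        "2021-03-01 20:40:00",
        "2021-03-02 09:15:00",
        "2021-03-06 20:30:00",
    ])
})

# Count carts per hour of day, ordered by hour
carts_per_hour = carts["cart_created_dt"].dt.hour.value_counts().sort_index()

# Hour of day with the most cart creations
busiest_hour = carts_per_hour.idxmax()

# The same idea at day-of-week granularity
carts_per_day = carts["cart_created_dt"].dt.day_name().value_counts()

print(carts_per_hour)
print("Busiest hour:", busiest_hour)
print(carts_per_day)
```

The same pattern applies to abandoned carts, fraudulent transactions, or subscriptions by swapping in the relevant DateTime column.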