
Feature Engineering of DateTime Variables for Data Science, Machine Learning

Learn how to make more meaningful features from DateTime type variables to be used by Machine Learning Models

Samarth Agrawal

Published in

TDS Archive

7 min read · Apr 28, 2021

Feature Engineering of DateTime Variables. Image by Author.

INTRODUCTION

DateTime fields require Feature Engineering to turn them from raw data into insightful information that our Machine Learning Models can use. This post is divided into 3 parts, plus a Bonus section at the end. We will use a combination of built-in pandas and NumPy functions, as well as our own functions, to extract useful features.

Part 1 — Extract Date / Time Components
Part 2 — Create Boolean Flags
Part 3 — Calculate Date / Time Differences
Bonus — Feature Engineering in 2 lines of code using fast_ml
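To make the three parts concrete, here is a minimal sketch of one feature from each family using plain pandas. The DataFrame, its column names (`registration_dt`, `transaction_dt`) and the sample values are hypothetical, chosen only for illustration; the full post may build these features differently.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "registration_dt": pd.to_datetime(["2021-01-01 09:30:00", "2021-02-13 22:15:00"]),
    "transaction_dt": pd.to_datetime(["2021-01-04 10:00:00", "2021-02-14 01:45:00"]),
})

# Part 1: extract date/time components with the .dt accessor
df["txn_year"] = df["transaction_dt"].dt.year
df["txn_month"] = df["transaction_dt"].dt.month
df["txn_hour"] = df["transaction_dt"].dt.hour
df["txn_dayofweek"] = df["transaction_dt"].dt.dayofweek  # Monday=0 ... Sunday=6

# Part 2: create boolean flags
df["txn_is_weekend"] = df["transaction_dt"].dt.dayofweek >= 5
df["txn_is_month_start"] = df["transaction_dt"].dt.is_month_start

# Part 3: calculate date/time differences (here, in hours)
delta = df["transaction_dt"] - df["registration_dt"]
df["hours_since_registration"] = delta.dt.total_seconds() / 3600

print(df[["txn_hour", "txn_dayofweek", "txn_is_weekend", "hours_since_registration"]])
```

Each new column is an ordinary numeric or boolean feature, so it can be passed to a model directly, unlike the raw timestamp.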

BACKGROUND

Whenever I have worked on e-commerce data, the dataset has contained DateTime columns in one form or another:

User registration date-time
User login date-time
Transaction date-time
Disputed transaction date-time
… and many more

At first glance, a date field gives us nothing more than a specific point on a timeline. But DateTime fields are potential treasure troves of information. Used correctly, they are immensely powerful for uncovering patterns.

As a Data Scientist, your job is to bring insight to the table, and for that you need to ask the right questions. For example:

Ques 1 — When do you see most carts getting created?
Ques 2 — When do you see most carts getting abandoned?
Ques 3 — When do you see the most fraudulent transactions?
Ques 4 — When do the most users subscribe?
Ques 5 — When are certain items purchased most often?
Ques 6 — How many days/hours after registration does a user place their first order?
Ques 7 — After how many days of inactivity does a customer never return to your site?
… etc
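Questions like these translate fairly directly into DateTime operations. The sketch below is one possible way to approach a "when does X happen most" question and the registration-to-first-order question with pandas. The `users` and `orders` tables, their column names and their values are hypothetical stand-ins, not data from this post.

```python
import pandas as pd

# Hypothetical toy tables; names and values are assumptions for illustration.
users = pd.DataFrame({
    "user_id": [1, 2],
    "registration_dt": pd.to_datetime(["2021-03-01 08:00", "2021-03-02 12:00"]),
})
orders = pd.DataFrame({
    "user_id": [1, 1, 2],
    "order_dt": pd.to_datetime(
        ["2021-03-03 20:00", "2021-03-10 20:30", "2021-03-02 18:00"]
    ),
})

# "When do we see the most ...?" -> count events per hour of day
orders_per_hour = orders["order_dt"].dt.hour.value_counts()
busiest_hour = orders_per_hour.idxmax()

# "How long after registration is the first order placed?"
first_order = orders.groupby("user_id")["order_dt"].min().reset_index()
merged = users.merge(first_order, on="user_id", how="left")
merged["hours_to_first_order"] = (
    (merged["order_dt"] - merged["registration_dt"]).dt.total_seconds() / 3600
)

print(busiest_hour)
print(merged[["user_id", "hours_to_first_order"]])
```

The same pattern (extract a component, then aggregate; or subtract two timestamps) applies to carts, subscriptions or disputed transactions once the relevant DateTime column is available.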
