This notebook was designed for the Nurse Care Activity Recognition Challenge, with the aim of providing basic knowledge of Human Activity Recognition (HAR) from accelerometer data.

It was created by Le Nhat Tan.


Exploring the data

First, we load the data and label files. In this tutorial, we use the training data of only one user to reduce the overall processing time.
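A minimal sketch of this loading step with pandas is shown here. The in-memory CSV text stands in for the real files so the example runs on its own; the coordinate column names `x`, `y`, `z`, the activity names, and the user id are placeholders, not values taken from the challenge dataset.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for the real CSV files (assumption: the real files
# would be read with pd.read_csv("<your path>") instead).
acc_csv = io.StringIO(
    "subject_id,datetime,x,y,z\n"
    "1,2018-02-02T09:41:02.850+09:00,0.12,-0.98,0.05\n"
    "1,2018-02-02T09:41:02.900+09:00,0.10,-0.97,0.07\n"
    "2,2018-02-02T09:41:03.000+09:00,0.30,-0.90,0.01\n"
)
label_csv = io.StringIO(
    "id,user_id,activity_type_id,activity_type,target_id,activity2user_id,start,finish\n"
    "10,1,2,activity_a,5,100,2018-02-02 09:40:00+09:00,2018-02-02 09:45:00+09:00\n"
    "11,2,4,activity_b,6,101,2018-02-02 09:50:00+09:00,2018-02-02 09:55:00+09:00\n"
)

acc = pd.read_csv(acc_csv)
labels = pd.read_csv(label_csv)

# Keep a single user to shorten the rest of the pipeline.
USER_ID = 1  # hypothetical user id
acc_user = acc[acc["subject_id"] == USER_ID]

# A quick look at the first rows of each table.
print(acc_user.head())
print(labels.head())
```

If each user's data already lives in its own file, the filtering line is unnecessary and only that one file needs to be read.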

Let's check what information the data contains.

We can see that the data file contains 5 columns: subject_id, datetime, and 3 coordinates of the accelerometer data.

The label file has 8 columns: id (label id), user_id, activity_type_id, activity_type (name), target_id (patient), activity2user_id, and the start and finish timestamps of the activity.

Visualizing the data

Check the bar plot of labels
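One possible way to draw such a bar plot is to count the rows per `activity_type` and plot the counts. The label values below are synthetic placeholders, and the axis titles are only a suggestion.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic label table (placeholder activity names).
labels = pd.DataFrame({"activity_type": ["a", "b", "a", "c", "a", "b"]})

# Number of labelled activities per class, most frequent first.
counts = labels["activity_type"].value_counts()

# One bar per activity class; in a notebook the figure renders inline.
ax = counts.plot.bar()
ax.set_xlabel("activity_type")
ax.set_ylabel("number of labelled activities")
plt.tight_layout()
```

A plot like this makes class imbalance visible early, which matters later when choosing how to split and evaluate.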

Pre-processing the data

Rows with missing or duplicated values are dropped. There are also other ways to handle missing values; which one to use depends on your pipeline.

All rows are then sorted by datetime.

In the label file, rows with NaN or duplicated values are also dropped.

Now we keep only the labels of the user we are working with.
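The cleaning steps so far could be sketched as follows. The tables are synthetic, the coordinate column names `x`, `y`, `z` and the user id are assumptions, and dropping rows is only one of several reasonable strategies.

```python
import numpy as np
import pandas as pd

# Synthetic accelerometer rows: one exact duplicate and one row with a NaN.
acc = pd.DataFrame({
    "subject_id": [1, 1, 1, 1],
    "datetime": pd.to_datetime(
        [
            "2018-02-02T09:41:03+09:00",
            "2018-02-02T09:41:01+09:00",
            "2018-02-02T09:41:01+09:00",
            "2018-02-02T09:41:02+09:00",
        ],
        utc=True,
    ),
    "x": [0.3, 0.1, 0.1, np.nan],
    "y": [0.0, 0.2, 0.2, 0.1],
    "z": [1.0, 0.9, 0.9, 0.8],
})

# Drop incomplete rows, drop exact duplicates, then order by time.
acc_clean = acc.dropna().drop_duplicates().sort_values("datetime")

# Synthetic labels: one duplicate, one missing finish time, one other user.
labels = pd.DataFrame({
    "id": [10, 10, 11, 12],
    "user_id": [1, 1, 1, 2],
    "activity_type_id": [2, 2, 4, 4],
    "start": ["09:40", "09:40", "09:50", "10:00"],
    "finish": ["09:45", "09:45", None, "10:05"],
})
labels_clean = labels.dropna().drop_duplicates()

# Keep only the labels that belong to the user whose data we loaded.
USER_ID = 1  # hypothetical user id
labels_user = labels_clean[labels_clean["user_id"] == USER_ID]
```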

We convert all timestamp columns to the same data type.
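A sketch of this conversion, assuming the timestamps arrive as strings with a UTC offset: `pd.to_datetime(..., utc=True)` turns each column into timezone-aware timestamps in UTC, so sensor times and label times can be compared directly. The string formats below are illustrative and may differ from the real files.

```python
import pandas as pd

# Synthetic rows; the two tables deliberately use different string formats.
acc = pd.DataFrame({
    "datetime": [
        "2018-02-02T09:41:02.850+09:00",
        "2018-02-02T09:46:00.000+09:00",
    ],
})
labels = pd.DataFrame({
    "start": ["2018-02-02 09:40:00+09:00"],
    "finish": ["2018-02-02 09:45:00+09:00"],
})

# Parse every timestamp column into timezone-aware UTC timestamps.
acc["datetime"] = pd.to_datetime(acc["datetime"], utc=True)
for col in ("start", "finish"):
    labels[col] = pd.to_datetime(labels[col], utc=True)

# With a common type, sensor times can be compared against label boundaries.
in_window = acc["datetime"].between(labels["start"].iloc[0], labels["finish"].iloc[0])
print(in_window.tolist())
```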

We can check how the data changed after pre-processing.


We reset the index values.

Segment the data by the timestamps given in the label file.

Each segment window is extracted using the start and finish time of an activity in the label file.
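The segmentation could look roughly like the sketch below: for each label row, select the accelerometer samples whose timestamp falls between `start` and `finish`. Treating both boundaries as inclusive and skipping labels with no samples are assumptions of this sketch, and the data is synthetic.

```python
import pandas as pd

# Ten synthetic samples, one per second, starting at midnight UTC.
t0 = pd.Timestamp("2018-02-02", tz="UTC")
acc = pd.DataFrame({
    "datetime": t0 + pd.to_timedelta(list(range(10)), unit="s"),
    "x": list(range(10)),
    "y": [0.0] * 10,
    "z": [1.0] * 10,
})

# Two synthetic labelled activities.
labels = pd.DataFrame({
    "id": [10, 11],
    "activity_type_id": [2, 4],
    "start": pd.to_datetime(["2018-02-02 00:00:02", "2018-02-02 00:00:06"], utc=True),
    "finish": pd.to_datetime(["2018-02-02 00:00:04", "2018-02-02 00:00:08"], utc=True),
})

# Collect one (activity_type_id, window) pair per labelled activity.
segments = []
for row in labels.itertuples(index=False):
    mask = (acc["datetime"] >= row.start) & (acc["datetime"] <= row.finish)
    window = acc.loc[mask]
    if not window.empty:  # skip labels with no sensor data in range
        segments.append((row.activity_type_id, window))
```

Because windows follow the labelled activities, their lengths vary; the summary features in the next step reduce each one to a fixed-size vector.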

Feature Extraction

In this tutorial, we extract 4 main features for each of the 3 coordinates: standard deviation (STD), average, max, and min.
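As an illustration, the four statistics per axis can be computed with NumPy, giving 12 values per segment. The helper name `extract_features` and the column names `x`, `y`, `z` are hypothetical, and the population standard deviation (NumPy's default, `ddof=0`) is an assumption.

```python
import pandas as pd

def extract_features(window):
    """Return std, mean, max and min of x, y, z as a flat dict (12 values)."""
    feats = {}
    for axis in ("x", "y", "z"):
        values = window[axis].to_numpy(dtype=float)
        feats[f"{axis}_std"] = values.std()   # population std (ddof=0)
        feats[f"{axis}_mean"] = values.mean()
        feats[f"{axis}_max"] = values.max()
        feats[f"{axis}_min"] = values.min()
    return feats

# A synthetic three-sample segment.
window = pd.DataFrame({
    "x": [1.0, 2.0, 3.0],
    "y": [0.0, 0.0, 0.0],
    "z": [-1.0, 1.0, -1.0],
})
features = extract_features(window)
print(features)
```

Applying the helper to every segment and stacking the resulting dicts with `pd.DataFrame` yields one feature row per labelled activity.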


A Random Forest model is used in this tutorial.

Divide the data into train and test sets to evaluate the results.

Train the model

Check the results
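The split, training, and evaluation steps might be sketched with scikit-learn as below. The feature matrix is random synthetic data standing in for the 12 extracted features, and the 70/30 stratified split, the number of trees, and the chosen metrics are assumptions rather than the notebook's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 200 segments x 12 features, two well-separated classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 12)) + y[:, None] * 3.0

# Hold out 30% of the segments, keeping class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit the Random Forest on the training part only.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-out segments.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("accuracy:", accuracy)
print(classification_report(y_test, y_pred))
```

On real data with imbalanced activity classes, the per-class precision and recall in the report are usually more informative than overall accuracy.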

Now it's your turn! Let's analyze the challenge dataset!