This notebook was created by Paula Lago for the cooking challenge competition.

Exploring the data

In [55]:
import pandas as pd
import glob
import numpy as np
%matplotlib inline  
# enable plots to be shown in the notebook

First, we will read one file and explore the data. We will use glob to list all the files in a folder: glob returns every file name that matches a pattern string. In this case, we want all files with a .csv extension inside the right_arm folder.

In [56]:
data_folder = 'right_arm/*.csv' 
files = glob.glob(data_folder)

The files variable now contains the paths of all files that matched the pattern.

In [57]:
print(len(files))
print(files[:10]) #Print the first ten file names
288
['right_arm/subject3_file_680.csv', 'right_arm/subject3_file_694.csv', 'right_arm/subject3_file_482.csv', 'right_arm/subject1_file_248.csv', 'right_arm/subject2_file_687.csv', 'right_arm/subject3_file_284.csv', 'right_arm/subject2_file_877.csv', 'right_arm/subject1_file_300.csv', 'right_arm/subject2_file_650.csv', 'right_arm/subject2_file_136.csv']
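Note that glob does not guarantee any particular order, so files[0] may refer to a different file on another machine or run. If you want the notebook to be reproducible, one simple option is to sort the list first:

In [ ]:
files = sorted(files)  # sort alphabetically so files[0] is the same on every run
print(files[:3])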

Let's read one of the files and see what it contains.

In [58]:
arm_data = pd.read_csv(files[0])
arm_data.plot(subplots=True)
Out[58]:
[Plot output: three subplots, one per column of the right-arm data]
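To get a better idea of what the file contains beyond the plot, we can look at its shape, the first few rows and some summary statistics. This is a generic sketch; the exact column names depend on how the challenge data was exported.

In [ ]:
print(arm_data.shape)       # number of samples and number of columns
print(arm_data.head())      # first few rows
print(arm_data.describe())  # basic statistics for each column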

Let's read the labels file to identify the activity recorded in this file. We will first read the file into a single column.

In [59]:
labels = pd.read_csv("labels.txt", sep=' ', header=None)
labels.head()
Out[59]:
0
0 subject1_file_111,fruitsalad,Take,Add,Mix,
1 subject1_file_113,cereal,Take,
2 subject1_file_116,cereal,Take,Open,Cut,Peel,
3 subject1_file_119,sandwich,Take,
4 subject1_file_131,sandwich,Cut,other,

We will now split each row into the file identifier, the macro activity, and the micro activities using the str.split method.

In [60]:
labels = labels[0].str.split(",", n=2, expand=True)
labels.columns = ['file_id', 'macro', 'micro'] #give names to the columns
labels.index = labels['file_id'] #use the file id as index to make it searchable by file_id
labels.head()
Out[60]:
file_id macro micro
file_id
subject1_file_111 subject1_file_111 fruitsalad Take,Add,Mix,
subject1_file_113 subject1_file_113 cereal Take,
subject1_file_116 subject1_file_116 cereal Take,Open,Cut,Peel,
subject1_file_119 subject1_file_119 sandwich Take,
subject1_file_131 subject1_file_131 sandwich Cut,other,
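With the labels in a proper DataFrame we can quickly get an overview of the dataset, for example how many files belong to each macro activity:

In [ ]:
labels['macro'].value_counts()  # number of files per macro activity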

Now, let's see what the activities are for the file we read. We need the file id, which is the file name without the folder prefix and without the .csv extension.

In [61]:
file_id = files[0][files[0].find("/")+1:files[0].find(".")]
print(file_id)
subject3_file_680
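Extracting the id with find works here, but it is fragile if a path contains extra dots or a different separator (for example on Windows). A slightly more robust alternative uses os.path:

In [ ]:
import os
file_id = os.path.splitext(os.path.basename(files[0]))[0]  # drop the folder and the .csv extension
print(file_id)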
In [62]:
labels.loc[file_id]
Out[62]:
file_id    subject3_file_680
macro               sandwich
micro         Put,Cut,other,
Name: subject3_file_680, dtype: object

The file corresponds to making a sandwich, and there are three micro activities: Put, Cut, and other. Let's look at the data from the other sensors for the same file.

In [63]:
hip_data = pd.read_csv("left_hip/"+file_id+".csv")
hip_data.plot(subplots=True)
Out[63]:
[Plot output: three subplots, one per column of the left-hip data]
In [64]:
lwrist_data = pd.read_csv("left_wrist/"+file_id+".csv")
lwrist_data.plot(subplots=True)
Out[64]:
[Plot output: three subplots, one per column of the left-wrist data]
In [65]:
rwrist_data = pd.read_csv("right_wrist/"+file_id+".csv")
rwrist_data.plot(subplots=True)
Out[65]:
[Plot output: three subplots, one per column of the right-wrist data]
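The three cells above repeat the same steps for each sensor location, so the same thing can also be written as a loop. This sketch assumes the folder names right_arm, left_hip, left_wrist and right_wrist used above.

In [ ]:
sensor_folders = ['right_arm', 'left_hip', 'left_wrist', 'right_wrist']
sensor_data = {}
for folder in sensor_folders:
    df = pd.read_csv(folder + "/" + file_id + ".csv")  # same file id, different sensor location
    sensor_data[folder] = df
    df.plot(subplots=True, title=folder)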

As you can see, the data is sometimes noisy or has missing values. You will need to decide how to handle such issues.
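There is no single right way to do this, but common options include interpolating short gaps and smoothing spikes with a rolling filter. The sketch below is one possibility; the gap limit of 10 samples and the window of 5 samples are arbitrary choices that you should tune, and in practice you may want to exclude any timestamp column before filtering.

In [ ]:
clean = arm_data.interpolate(limit=10)  # fill short gaps of missing samples by linear interpolation
clean = clean.ffill().bfill()           # fill any remaining gaps at the start or end of the recording
smoothed = clean.rolling(window=5, center=True, min_periods=1).median()  # median filter to reduce spikes
smoothed.plot(subplots=True)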