This notebook was designed for the cooking challenge competition. It was created by Paula Lago.
import pandas as pd
import glob
import numpy as np
%matplotlib inline
# enable plots to be shown in the notebook
First, we will read one file and explore the data. We use glob to list the files in a folder: it returns every path that matches a pattern. In this case, we want all files with the .csv extension inside the right_arm folder.
data_folder = 'right_arm/*.csv'
files = glob.glob(data_folder)
The files variable now contains all the files that matched our query.
print(len(files))
print(files[:10]) #Print the first ten file names
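One caveat: glob.glob returns matches in an arbitrary, filesystem-dependent order, so files[0] may not be the same file on every machine. The sketch below shows one way to make the listing reproducible by sorting it. It runs on a temporary folder with made-up file names, so it does not touch the real data.

```python
import glob
import os
import tempfile

# Throwaway folder with hypothetical file names (not the competition files).
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "right_arm")
    os.makedirs(folder)
    for name in ["b.csv", "a.csv", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()

    # Only paths ending in .csv match the pattern; sorted() fixes the order.
    demo_files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    demo_names = [os.path.basename(p) for p in demo_files]

print(demo_names)
```

In this notebook, the same idea would be `files = sorted(glob.glob(data_folder))`.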
Let's read one of the files and see what it contains.
arm_data = pd.read_csv(files[0])
arm_data.plot(subplots=True)
Let's read the labels file to identify the activity for this file. We first read each line into a single column; using a space as the separator works here on the assumption that the lines contain no spaces.
labels = pd.read_csv("labels.txt", sep=' ', header=None)
labels.head()
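We will now split each line into three columns using the split method: the file identifier, the macro activity, and the remaining text, which holds the micro activities. Passing n=2 limits the split to the first two commas, so the micro activities stay together in the last column.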
labels = labels[0].str.split(",", n=2, expand=True)
labels.columns = ['file_id', 'macro', 'micro'] #give names to the columns
labels.index = labels['file_id'] #use the file id as index to make it searchable by file_id
labels.head()
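To make the effect of n=2 concrete, here is a minimal sketch on two made-up label lines. The file ids and activity names are hypothetical, and the real labels.txt may differ. It also shows one possible next step, turning the micro column into a list per file.

```python
import io
import pandas as pd

# Hypothetical label lines: file id, macro activity, then micro activities.
raw = io.StringIO(
    "file_a,sandwich,Cut,Put,other\n"
    "file_b,cereal,Take,Pour\n"
)

# No spaces in the lines, so sep=' ' yields a single column named 0.
demo = pd.read_csv(raw, sep=' ', header=None)

# Split on the first two commas only; everything after stays in column 2.
demo = demo[0].str.split(",", n=2, expand=True)
demo.columns = ['file_id', 'macro', 'micro']
demo = demo.set_index('file_id')  # unlike the cell above, this drops the column

# Optional: one Python list of micro activities per file.
micro_lists = demo['micro'].str.split(",")
print(demo)
```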
Now, let's look up the activities for the file we read. We need the file id, which is the file name without the folder and without the .csv extension.
import os
file_id = os.path.splitext(os.path.basename(files[0]))[0]  # drop the folder and the .csv extension; works with / and \ separators
print(file_id)
labels.loc[file_id]
The file corresponds to making a sandwich, and it has three micro activities: Put, Cut and other. Let's look at the other sensors for the same file.
hip_data = pd.read_csv("left_hip/"+file_id+".csv")
hip_data.plot(subplots=True)
lwrist_data = pd.read_csv("left_wrist/"+file_id+".csv")
lwrist_data.plot(subplots=True)
rwrist_data = pd.read_csv("right_wrist/"+file_id+".csv")
rwrist_data.plot(subplots=True)
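The three cells above repeat the same pattern, so they could be folded into a small helper. The sketch below is one possible version: load_sensors and SENSORS are hypothetical names, the folder list is taken from the cells in this notebook, and skipping a sensor whose file is absent is an assumption about how you want to handle gaps. The demonstration uses made-up data in a temporary folder.

```python
import os
import tempfile
import pandas as pd

# Sensor folders used in this notebook.
SENSORS = ["right_arm", "left_hip", "left_wrist", "right_wrist"]

def load_sensors(file_id, root=".", sensors=SENSORS):
    """Return {sensor_name: DataFrame} for every sensor file that exists."""
    data = {}
    for sensor in sensors:
        path = os.path.join(root, sensor, file_id + ".csv")
        if os.path.exists(path):  # assumption: silently skip absent files
            data[sensor] = pd.read_csv(path)
    return data

# Demonstration on made-up data: only two of the four sensors are present.
with tempfile.TemporaryDirectory() as tmp:
    for sensor in ["right_arm", "left_hip"]:
        os.makedirs(os.path.join(tmp, sensor))
        pd.DataFrame({"x": [0.1, 0.2], "y": [0.0, 0.1]}).to_csv(
            os.path.join(tmp, sensor, "demo_id.csv"), index=False)
    demo_sensors = load_sensors("demo_id", root=tmp)

print(sorted(demo_sensors))
```

In the notebook this would be called as `load_sensors(file_id)`, followed by a loop that calls `.plot(subplots=True)` on each returned frame.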
As you can see, sometimes the data is noisy or missing. You need to decide how to handle such potential errors.
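As a starting point, here is a minimal sketch of one possible strategy, not a recommended solution: count the missing values, fill gaps by linear interpolation, and damp isolated spikes with a rolling median. The signal and the column name "x" are made up, and the window size of 3 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Made-up signal with one gap (NaN) and one spike (50.0).
signal = pd.DataFrame({"x": [0.0, np.nan, 2.0, 50.0, 4.0]})

# How many values are missing in each column?
n_missing = signal.isna().sum()

# Fill gaps by drawing a straight line between the neighbouring samples.
filled = signal.interpolate(method="linear", limit_direction="both")

# A centered rolling median replaces isolated spikes with a typical value.
smoothed = filled.rolling(window=3, center=True, min_periods=1).median()

print(n_missing)
print(smoothed)
```

Interpolation is only reasonable for short gaps; for long stretches of missing data, dropping the segment may be the safer choice.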