Data Description
This dataset contains recordings of ten nurses with more than three years of clinical suctioning experience and twelve nursing students from a university performing Endotracheal Suctioning (ES) on the ESTE-SIM simulation system.
There are two types of data in this dataset:
- Video: recorded from the front of the nurse, across the patient mannequin. Video data is provided only for the training set.
- Pose Skeleton (Keypoints): extracted from the videos using YOLOv7. Keypoint data is provided for both the training and testing sets.
There are a total of 9 activities in the ES procedure. All the activities are listed in the table below.
Table 1. Activities in endotracheal suctioning and their IDs
Activity class ID | Activity name |
---|---|
0 | Catheter preparation |
1 | Temporary removal of the artificial airway |
2 | Suctioning phlegm |
3 | Refitting the artificial airway |
4 | Catheter disinfection |
5 | Discarding gloves |
6 | Positioning |
7 | Auscultation |
8 | Others |
Data Structure
Subject Information
We invited ten nurses and twelve nursing students for data collection. Each participant was asked to perform the ES procedure twice. The dataset is divided into a training set (32 videos) and a submission set (12 videos). The information for the currently available subjects is given below.
Table 2. Subject information
Subject_id | Usage | Experience | Note |
---|---|---|---|
N01T1 | Training | Nurse | |
N01T2 | Training | Nurse | Same person as N01T1 |
N02T1 | Training | Nurse | |
N02T2 | Training | Nurse | Same person as N02T1 |
N03T1 | Submission | Nurse | |
N03T2 | Submission | Nurse | Same person as N03T1 |
N04T1 | Training | Nurse | |
N04T2 | Training | Nurse | Same person as N04T1 |
N05T1 | Submission | Nurse | |
N05T2 | Submission | Nurse | Same person as N05T1 |
N06T1 | Training | Nurse | |
N06T2 | Training | Nurse | Same person as N06T1 |
N07T1 | Training | Nurse | |
N07T2 | Training | Nurse | Same person as N07T1 |
N09T1 | Submission | Nurse | |
N09T2 | Submission | Nurse | Same person as N09T1 |
N11T1 | Training | Nurse | |
N11T2 | Training | Nurse | Same person as N11T1 |
N12T1 | Training | Nurse | |
N12T2 | Training | Nurse | Same person as N12T1 |
S01T1 | Training | Student | |
S01T2 | Training | Student | Same person as S01T1 |
S02T1 | Training | Student | |
S02T2 | Training | Student | Same person as S02T1 |
S03T1 | Training | Student | |
S03T2 | Training | Student | Same person as S03T1 |
S04T1 | Submission | Student | |
S04T2 | Submission | Student | Same person as S04T1 |
S05T1 | Training | Student | |
S05T2 | Training | Student | Same person as S05T1 |
S06T1 | Submission | Student | |
S06T2 | Submission | Student | Same person as S06T1 |
S07T1 | Training | Student | |
S07T2 | Training | Student | Same person as S07T1 |
S08T1 | Training | Student | |
S08T2 | Training | Student | Same person as S08T1 |
S09T1 | Training | Student | |
S09T2 | Training | Student | Same person as S09T1 |
S10T1 | Training | Student | |
S10T2 | Training | Student | Same person as S10T1 |
S11T1 | Training | Student | |
S11T2 | Training | Student | Same person as S11T1 |
S12T1 | Submission | Student | |
S12T2 | Submission | Student | Same person as S12T1 |
The dataset will be published in the following directory structure:
- ./dataset/
- ./video/
- {Training_subject_id}.MTS
- N01T2.MTS
- ...
- S11T1.MTS
- S11T2.MTS
- ./keypoints/
- {Subject_id}_keypoint.csv
- N01T2_keypoint.csv
- ...
- S12T1_keypoint.csv
- S12T2_keypoint.csv
- ./ann/
- {Training_subject_id}_ann.csv
- N01T2_ann.csv
- ...
- S11T1_ann.csv
- S11T2_ann.csv
Video
The ./video/ folder contains 32 MTS videos, one for each training subject ID in Table 2. Videos are recorded at 30 frames per second with a frame size of 1920×1080 pixels.
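If you want to verify these properties locally, the short sketch below (a minimal example, assuming OpenCV built with FFmpeg support; the file name is one of the training videos from the directory structure above) opens a video and prints its frame rate, frame size, and frame count.

```python
import cv2

# Example path; one of the training videos listed in the directory structure.
video_path = "./dataset/video/N01T2.MTS"

cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    raise IOError(f"Could not open {video_path}")

fps = cap.get(cv2.CAP_PROP_FPS)                    # expected: 30
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))     # expected: 1920
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))   # expected: 1080
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

print(f"fps={fps:.1f}, size={width}x{height}, frames={n_frames}")
cap.release()
```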
Keypoints
Inside the ./keypoints/ folder, we have provided 44 CSV files containing the x and y coordinates and confidence scores of 17 keypoints on the subject's body for each video frame. These keypoints were extracted using YOLOv7; because one row is produced per frame, the keypoint sampling rate is also 30 per second. Occasionally, other people passing by in the background appear in the frame, so we applied post-processing to the skeleton results to keep only the skeleton of the main subject. Each file contains the following columns; a minimal loading sketch follows Table 3.
Table 3. Keypoint data description ({Subject_id}_keypoint.csv)
Column name | Description of column |
---|---|
nose_x | X coordinate value of nose |
nose_y | Y coordinate value of nose |
nose_conf | Confidence value of nose |
left_eye_x | X coordinate value of left eye |
left_eye_y | Y coordinate value of left eye |
left_eye_conf | Confidence value of left eye |
right_eye_x | X coordinate value of right eye |
right_eye_y | Y coordinate value of right eye |
right_eye_conf | Confidence value of right eye |
left_ear_x | X coordinate value of left ear |
left_ear_y | Y coordinate value of left ear |
left_ear_conf | Confidence value of left ear |
right_ear_x | X coordinate value of right ear |
right_ear_y | Y coordinate value of right ear |
right_ear_conf | Confidence value of right ear |
left_shoulder_x | X coordinate value of left shoulder |
left_shoulder_y | Y coordinate value of left shoulder |
left_shoulder_conf | Confidence value of left shoulder |
right_shoulder_x | X coordinate value of right shoulder |
right_shoulder_y | Y coordinate value of right shoulder |
right_shoulder_conf | Confidence value of right shoulder |
left_elbow_x | X coordinate value of left elbow |
left_elbow_y | Y coordinate value of left elbow |
left_elbow_conf | Confidence value of left elbow |
right_elbow_x | X coordinate value of right elbow |
right_elbow_y | Y coordinate value of right elbow |
right_elbow_conf | Confidence value of right elbow |
left_wrist_x | X coordinate value of left wrist |
left_wrist_y | Y coordinate value of left wrist |
left_wrist_conf | Confidence value of left wrist |
right_wrist_x | X coordinate value of right wrist |
right_wrist_y | Y coordinate value of right wrist |
right_wrist_conf | Confidence value of right wrist |
left_hip_x | X coordinate value of left hip |
left_hip_y | Y coordinate value of left hip |
left_hip_conf | Confidence value of left hip |
right_hip_x | X coordinate value of right hip |
right_hip_y | Y coordinate value of right hip |
right_hip_conf | Confidence value of right hip |
left_knee_x | X coordinate value of left knee |
left_knee_y | Y coordinate value of left knee |
left_knee_conf | Confidence value of left knee |
right_knee_x | X coordinate value of right knee |
right_knee_y | Y coordinate value of right knee |
right_knee_conf | Confidence value of right knee |
left_ankle_x | X coordinate value of left ankle |
left_ankle_y | Y coordinate value of left ankle |
left_ankle_conf | Confidence value of left ankle |
right_ankle_x | X coordinate value of right ankle |
right_ankle_y | Y coordinate value of right ankle |
right_ankle_conf | Confidence value of right ankle |
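For convenience, here is a minimal sketch for loading one keypoint file into a (frames, 17, 3) array. It assumes pandas and NumPy are available and that the file contains exactly the 51 columns listed in Table 3; the path used is only an example.

```python
import numpy as np
import pandas as pd

# Keypoint order as listed in Table 3 (COCO-style, 17 joints).
KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def load_keypoints(csv_path: str) -> np.ndarray:
    """Return an array of shape (n_frames, 17, 3) with (x, y, conf) per joint."""
    df = pd.read_csv(csv_path)
    cols = [f"{k}_{s}" for k in KEYPOINTS for s in ("x", "y", "conf")]
    return df[cols].to_numpy().reshape(len(df), 17, 3)

# Example path; one row per frame, so 30 rows correspond to one second.
kp = load_keypoints("./dataset/keypoints/N01T2_keypoint.csv")
print(kp.shape)
```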
Annotation (ann)
Inside the ./ann/ folder, we have provided 32 CSV files containing the annotations for each video in the Training set (see Table 2). Each file contains the following columns; a sketch for expanding these intervals into per-second labels follows Table 4.
Table 4. Annotation data description ({Training_subject_id}_ann.csv)
Column name | Description |
---|---|
start_time | Start time of the activity (in seconds) |
stop_time | Stop time of the activity (in seconds) |
annotation_str | Activity name |
annotation | Activity ID (described in Table 1) |
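The annotation intervals can be expanded into one label per second to match the submission format. The sketch below is one possible way to do this; the fallback to class 8 ("Others") for seconds not covered by any interval and the rounding at interval boundaries are our assumptions, not part of the official specification.

```python
import numpy as np
import pandas as pd

def per_second_labels(ann_csv: str, duration_s: int, default_label: int = 8) -> np.ndarray:
    """Expand (start_time, stop_time, annotation) intervals into one label per second.

    Seconds not covered by any interval fall back to `default_label`
    (assumed here to be class 8, "Others").
    """
    labels = np.full(duration_s, default_label, dtype=int)
    ann = pd.read_csv(ann_csv)
    for _, row in ann.iterrows():
        start = int(np.floor(row["start_time"]))
        stop = int(np.ceil(row["stop_time"]))
        labels[start:min(stop, duration_s)] = int(row["annotation"])
    return labels

# Example usage: the video length in seconds can be derived from the keypoint
# file, since keypoints are sampled at 30 rows per second.
# y = per_second_labels("./dataset/ann/N01T2_ann.csv", duration_s=len(kp) // 30)
```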
Data Usage
You can obtain the dataset download link by registering to participate in the challenge.
Rules
We provide video and skeleton data for the Training set. The skeleton data are extracted from the videos using YOLOv7, with our post-processing applied to identify and track the main subject in each video. Because of limitations on the camera position, the skeleton data capture only certain portions of the body. In line with the challenge's objective, participants may use only the provided skeleton data for activity recognition in the testing phase. Participants must incorporate a generative model or large language model somewhere in their approach. The video data may be used only for training, or for creative exploration with generative AI.
For the testing set, only skeleton data are published. From the provided skeleton data, participants are required to propose their pipelines, predict an activity label for each second of the submission set, and submit the results as shown in the tutorial.
The submission file contains the columns detailed below; a sketch for writing this file follows the example rows.
- subjectID: Corresponding participant’s ID (see Table 2)
- timestamp: Each second
- activityID: The activity class predicted to happen in that second (see Table 1)
subjectID | timestamp | activityID |
---|---|---|
N03T1 | 00:00 - 00:01 | 0 |
N03T1 | 00:01 - 00:02 | 0 |
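A minimal sketch for writing a submission file in this format is shown below. The exact header capitalization and timestamp format should be checked against the tutorial; here we simply follow the bullet list and the sample rows above.

```python
import csv

def write_submission(path: str, predictions: dict[str, list[int]]) -> None:
    """Write per-second predictions in the format of the example rows above.

    `predictions` maps a subject ID (e.g. "N03T1") to a list of activity IDs,
    one per second. The "MM:SS - MM:SS" timestamp format follows the sample rows.
    """
    def mmss(t: int) -> str:
        return f"{t // 60:02d}:{t % 60:02d}"

    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["subjectID", "timestamp", "activityID"])
        for subject_id, labels in predictions.items():
            for sec, label in enumerate(labels):
                writer.writerow([subject_id, f"{mmss(sec)} - {mmss(sec + 1)}", label])

# Example: two seconds of predictions for subject N03T1.
write_submission("submission.csv", {"N03T1": [0, 0]})
```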
For evaluation, we will consider the F1 score together with the contents of the accompanying paper. The F1 score is averaged over all subjects (see the scoring sketch after the baseline table). Baseline results for each subject are shared in the table below.
Subject ID | Accuracy | F1 score |
---|---|---|
N03T1 | 0.50 | 0.39 |
N03T2 | 0.49 | 0.44 |
N05T1 | 0.39 | 0.28 |
N05T2 | 0.51 | 0.36 |
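As an illustration of the scoring described above, the following sketch averages a per-subject F1 score over the submission subjects. The averaging mode within each subject (macro vs. weighted) is not specified in this description, so macro averaging with scikit-learn is assumed here.

```python
from statistics import mean

from sklearn.metrics import f1_score

def challenge_score(y_true_by_subject: dict, y_pred_by_subject: dict) -> float:
    """Average a per-subject F1 score over all submission subjects.

    Both arguments map a subject ID to a list of per-second activity IDs.
    Macro averaging within each subject is an assumption, not an official rule.
    """
    per_subject = [
        f1_score(y_true_by_subject[s], y_pred_by_subject[s],
                 average="macro", zero_division=0)
        for s in y_true_by_subject
    ]
    return mean(per_subject)
```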