Module 8: plyr mean by Sex + filtering names with “i”
For this assignment, I worked with the Assignment 6 dataset in R. The dataset contains four variables: Name, Age, Sex, and Grade. The goal was to import the dataset, calculate the mean grade by category, filter the data based on a condition in the Name column, and export the results to files.
First, I imported the dataset into R using read.table() and confirmed that it loaded correctly. To quickly inspect the data, I used the head() function to display the first few rows of the dataset. This allowed me to verify that the columns and values were read properly.
The preview shows that each row represents a student and includes their name, age, sex, and grade.
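As a rough sketch of the import step: the actual file path and separator are not shown in this post, so the example below assumes a comma-separated file with a header row. The four rows are made-up stand-ins written to a temporary file, not the real Assignment 6 data.

```r
# Hypothetical stand-in for the Assignment 6 file (made-up rows).
csv_lines <- c(
  "Name,Age,Sex,Grade",
  "Irene,21,Female,90",
  "Nina,22,Female,80",
  "Paul,20,Male,70",
  "Omar,23,Male,80"
)
path <- tempfile(fileext = ".txt")
writeLines(csv_lines, path)

# header = TRUE treats the first line as column names;
# sep = "," is an assumption about the file's delimiter.
students <- read.table(path, header = TRUE, sep = ",",
                       stringsAsFactors = FALSE)

# Show the first rows to confirm columns and values were read properly.
head(students)
```

If the real file is tab- or space-delimited, only the sep argument would need to change.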
Next, I wanted to see how many students were in each category of the Sex variable. I used the table() function to count how many males and females were present in the dataset.
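A minimal sketch of the counting step, using a small made-up data frame in place of the real dataset (the names and values are hypothetical):

```r
# Made-up stand-in data with the same four columns as the assignment.
students <- data.frame(
  Name  = c("Irene", "Nina", "Paul", "Omar"),
  Age   = c(21, 22, 20, 23),
  Sex   = c("Female", "Female", "Male", "Male"),
  Grade = c(90, 80, 70, 80),
  stringsAsFactors = FALSE
)

# table() counts how many rows fall into each level of Sex.
sex_counts <- table(students$Sex)
print(sex_counts)
```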
Then I used the ddply() function from the plyr package to calculate the average grade grouped by Sex. This operation splits the dataset by the Sex variable and computes the mean of the Grade column for each group. The results show that the average grade for females is 86.94, while the average grade for males is 80.25. In this dataset, the female group's average is about 6.7 points higher.
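The grouped mean can be sketched as follows. This assumes the plyr package is installed. The data frame is a made-up stand-in, so its averages will not match the 86.94 and 80.25 reported above, and the output column name Grade.Average is a hypothetical choice.

```r
library(plyr)

# Made-up stand-in data (not the real Assignment 6 values).
students <- data.frame(
  Name  = c("Irene", "Nina", "Paul", "Omar"),
  Age   = c(21, 22, 20, 23),
  Sex   = c("Female", "Female", "Male", "Male"),
  Grade = c(90, 80, 70, 80),
  stringsAsFactors = FALSE
)

# Split by Sex, apply summarise to each piece, and combine the
# results into one data frame with one row per group.
grade_means <- ddply(students, "Sex", summarise,
                     Grade.Average = mean(Grade))
print(grade_means)
```

Using transform instead of summarise would keep every original row and add the group mean as a new column, which may be preferable if the averages need to travel with the full dataset.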
In the next step, I filtered the dataset to find students whose names contain the letter “i.” I used the subset() function along with grepl() and set ignore.case = TRUE so the search would match both uppercase and lowercase letters.
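A small sketch of the filter, again on made-up names. "Irene" contains only an uppercase "I", so it shows why ignore.case = TRUE matters here.

```r
# Made-up stand-in data (names chosen to exercise the case handling).
students <- data.frame(
  Name  = c("Irene", "Nina", "Paul", "Omar"),
  Age   = c(21, 22, 20, 23),
  Sex   = c("Female", "Female", "Male", "Male"),
  Grade = c(90, 80, 70, 80),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for each Name containing "i" or "I";
# subset() keeps only the rows where that condition holds.
names_with_i <- subset(students, grepl("i", Name, ignore.case = TRUE))
print(names_with_i)
```

Without ignore.case = TRUE, a name whose only "i" is a capital first letter would be dropped from the result.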
Finally, I exported the results to files. The dataset with the calculated grade averages was written to a file, and the filtered dataset was saved as a CSV file. Saving results this way makes it easier to share the data or open it in other programs like Excel.
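The export step might look something like the sketch below. The file names, the tab separator, and the use of a temporary directory are all assumptions, since the post does not show them. Both data frames are made-up stand-ins for the real results.

```r
# Hypothetical stand-ins for the two results produced earlier.
grade_means <- data.frame(
  Sex = c("Female", "Male"),
  Grade.Average = c(85, 75),
  stringsAsFactors = FALSE
)
names_with_i <- data.frame(
  Name  = c("Irene", "Nina"),
  Age   = c(21, 22),
  Sex   = c("Female", "Female"),
  Grade = c(90, 80),
  stringsAsFactors = FALSE
)

# Hypothetical output paths inside the session's temporary directory.
means_path <- file.path(tempdir(), "grade_means.txt")
csv_path   <- file.path(tempdir(), "names_with_i.csv")

# Tab-delimited text file; row.names = FALSE omits the row index column.
write.table(grade_means, means_path, sep = "\t", row.names = FALSE)

# Comma-separated file that Excel and similar programs can open.
write.csv(names_with_i, csv_path, row.names = FALSE)
```

Setting row.names = FALSE keeps R's automatic row numbers out of the files, which usually makes them cleaner to open elsewhere.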
Overall, this assignment helped reinforce several important R skills, including importing data, summarizing grouped data, filtering observations based on text patterns, and exporting results. These operations are common steps in data analysis workflows and are useful when working with larger datasets.