Posts

Module 10: Friedman R Package Proposal

For this assignment, I created the initial structure for an R package called Friedman. The goal of this project was to learn how R packages are built and how they organize code, documentation, and metadata in a standardized way. Before this assignment, I didn’t fully understand how packages worked behind the scenes, so this helped me see how everything is connected.

Purpose and Scope

The purpose of the Friedman package is to make it easier to reuse code for data analysis and data mining tasks. In many assignments, I find myself writing similar code over and over, especially for summarizing data or preparing results. This package is designed to group those repeated tasks into simple, reusable functions. The intended users are students and beginner R users who want a basic toolkit to help with analysis. Instead of rewriting code each time, they can call functions from the package to save time and stay organized.

Key Functions

Right now, the package includes one m...
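To make the idea of "simple, reusable functions" concrete, here is a minimal sketch of what one helper in a package like this might look like. The function name summarize_numeric() and its behavior are hypothetical and not taken from the actual Friedman package. The #' lines are roxygen2-style documentation comments, which are one common (assumed, not confirmed) way to generate a package's help pages and NAMESPACE entries.

```r
#' Summarize the numeric columns of a data frame
#'
#' Hypothetical helper for illustration only; not an actual Friedman function.
#' @param df A data frame.
#' @return A data frame with one row per numeric column (mean and sd).
#' @export
summarize_numeric <- function(df) {
  stopifnot(is.data.frame(df))
  # Pick out only the numeric columns
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  data.frame(
    variable = num_cols,
    mean = unname(vapply(df[num_cols], mean, numeric(1), na.rm = TRUE)),
    sd   = unname(vapply(df[num_cols], sd,   numeric(1), na.rm = TRUE)),
    row.names = NULL
  )
}

res <- summarize_numeric(iris)
```

In a package, a file like this would live in the R/ folder; calling summarize_numeric(iris) returns one row for each of the four numeric iris measurements and skips the Species factor.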

Module 9: Comparing Base R, Lattice, and ggplot2

For this assignment, I used the built-in iris dataset in R. I chose this dataset because it includes several numerical variables and a grouping variable, Species, which made it useful for comparing different visualization systems. I created plots using base R, lattice, and ggplot2 to see how the syntax, workflow, and final output differed across the three systems.

Base R Graphics

For the base R portion, I created a scatter plot showing the relationship between sepal length and petal length. Base R was straightforward to use for a basic graph, but it required more manual work, such as adding the legend separately.

Lattice Graphics

For the lattice portion, I used a conditioned scatter plot to separate the data by species. This made it easier to compare patterns across groups because each species appeared in its own panel.

ggplot2 Graphics

For ggplot2, I created a scatter plot with...
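The base R and lattice plots described above can be approximated roughly as follows. This is a sketch, not the exact code from the post: the colors, labels, and legend position are assumptions. The ggplot2 version is left out because the excerpt is cut off before describing it. lattice ships with R as a recommended package, so no extra install is usually needed.

```r
library(lattice)  # recommended package, normally installed along with R

# Draw to a null PDF device so the script also runs non-interactively
pdf(NULL)

# Base R: one scatter plot, colored by the integer codes of the Species factor
species_col <- as.integer(iris$Species)
plot(iris$Sepal.Length, iris$Petal.Length,
     col = species_col, pch = 19,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)",
     main = "Base R: sepal vs. petal length")
# In base R the legend is a separate call
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)

# lattice: the formula y ~ x | g draws one panel per level of g
lat_plot <- xyplot(Petal.Length ~ Sepal.Length | Species, data = iris,
                   xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
print(lat_plot)  # trellis objects are only drawn when printed

invisible(dev.off())
```

The contrast the post describes shows up directly in the code: base R needs an explicit legend() call, while lattice builds the per-species panels from the `| Species` part of the formula.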

Module 8: plyr mean by Sex + filtering names with “i”

For this assignment, I worked with the Assignment 6 dataset in R. The dataset contains four variables: Name, Age, Sex, and Grade. The goal was to import the dataset, calculate the mean grade by category, filter the data based on a condition in the Name column, and export the results to files.

First, I imported the dataset into R using read.table() and confirmed that it loaded correctly. To quickly inspect the data, I used the head() function to display the first few rows of the dataset. This allowed me to verify that the columns and values were read properly.

The preview shows that each row represents a student and includes their name, age, sex, and grade.

Next, I wanted to see how many students were in each category of the Sex variable. I used the table() function to count how many males and females were present in the dataset.

...
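The workflow above (import, preview, count, group mean, filter, export) can be sketched in a few lines. The rows below are made-up placeholders, not the Assignment 6 data, and the commented read.table() call uses a hypothetical path and separator. For the group mean, this sketch swaps in base R's aggregate() so it runs without plyr installed.

```r
# PLACEHOLDER rows only (NOT the Assignment 6 data), so the sketch runs on its own.
# With the real file, the import would look something like (path and sep assumed):
#   students <- read.table("<path to Assignment 6 file>", header = TRUE, sep = ",")
students <- data.frame(
  Name  = c("Raman", "Alice", "Bob", "Tina"),
  Age   = c(20, 21, 22, 20),
  Sex   = c("Male", "Female", "Male", "Female"),
  Grade = c(80, 90, 70, 85),
  stringsAsFactors = FALSE
)

head(students)       # preview the first rows
table(students$Sex)  # number of students in each Sex category

# Mean Grade per Sex (base-R stand-in for the plyr step)
grade_by_sex <- aggregate(Grade ~ Sex, data = students, FUN = mean)

# Keep only rows whose Name contains the letter "i" (either case)
i_names <- subset(students, grepl("i", Name, ignore.case = TRUE))

# Export both results; tempfile() avoids writing into the working directory
out_means <- tempfile(fileext = ".txt")
out_names <- tempfile(fileext = ".csv")
write.table(grade_by_sex, out_means, sep = "\t", row.names = FALSE)
write.csv(i_names, out_names, row.names = FALSE)
```

With plyr installed, `plyr::ddply(students, "Sex", plyr::summarise, mean_grade = mean(Grade))` produces the same per-group means in the style the post's title refers to.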

Module 7: R Objects S3 vs. S4

For this assignment, I used the built-in dataset mtcars from R (so I didn’t have to download anything). First I loaded it and checked the first few rows to confirm it worked.

Step 1. Data

mtcars is a data frame (32 rows × 11 columns). Since it’s a standard R dataset, it already comes with a class and many functions that work with it.

Step 2. Can a generic function be assigned to this dataset? If not, why?

A generic function is a function that chooses which method to run based on the class of the object you pass in (like print(), summary(), or plot()). For mtcars, generic functions already work because mtcars has class "data.frame" (and is a list under the hood). For example, summary(mtcars) runs the summary.data.frame() method automatically. If I tried to use a generic function that has no method for a data frame, R would fall back to the generic’s default method if one exists, and throw an error if it doesn’t. That’s basically the “why not” case: the obj...
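The dispatch behavior described above, plus the S3 vs. S4 contrast from the post's title, can be shown with a small sketch. The generics describe() and describeS4() and the class CarS4 are hypothetical names made up for this example; only class(), UseMethod(), and the methods-package functions setClass(), setGeneric(), setMethod(), and new() are standard R.

```r
library(methods)  # S4 tools; attached by default in most R sessions

class(mtcars)    # "data.frame"
is.list(mtcars)  # TRUE: a data frame is a list of equal-length columns

# --- S3: a hypothetical generic with a data.frame method and a default ---
describe <- function(x, ...) UseMethod("describe")

describe.data.frame <- function(x, ...) {
  sprintf("data frame with %d rows and %d columns", nrow(x), ncol(x))
}

# Without this default, describe(1:3) would stop with
# "no applicable method for 'describe' applied to an object of class ..."
describe.default <- function(x, ...) {
  "no specific method for this class"
}

s3_df    <- describe(mtcars)  # dispatches to describe.data.frame()
s3_other <- describe(1:3)     # no integer method, so describe.default() runs

# --- S4: a formal class with typed slots and an explicitly registered method ---
setClass("CarS4", slots = c(name = "character", mpg = "numeric"))
setGeneric("describeS4", function(obj) standardGeneric("describeS4"))
setMethod("describeS4", "CarS4", function(obj) {
  paste(obj@name, "gets", obj@mpg, "mpg")
})

rx4 <- new("CarS4", name = "Mazda RX4", mpg = mtcars["Mazda RX4", "mpg"])
s4_result <- describeS4(rx4)

# S4 checks slot types when the object is created; S3 has no such check
bad <- tryCatch(new("CarS4", name = "x", mpg = "fast"),
                error = function(e) conditionMessage(e))
```

The design difference is visible in the last lines: an S3 class is just a class attribute, so nothing stops a malformed object, while S4 rejects a character value in the numeric mpg slot at construction time.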

Module 6: Matrix Operations in R

In this assignment, I practiced basic matrix operations in R, including addition, subtraction, and creating special matrices using the diag() function. These skills are important for understanding linear algebra concepts in data analysis and statistics. The topics connect to the matrix operations discussed in The Art of R Programming and to good coding practices from R Packages.

Question 1: Consider A = matrix(c(2,0,1,3), ncol=2) and B = matrix(c(5,2,4,-1), ncol=2). a) Find A + B. b) Find A - B.

First, I created the two matrices in R.

Matrix addition and subtraction are done element by element, as long as both matrices have the same dimensions.

Question 2: Use the diag() function to build a matrix of size 4 with the values 4, 1, 2, 3 on the diagonal.

Next, I created a 4×4 matrix with values 4, 1, 2, and 3 on the diagonal.

The diag() function places the given va...
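Both questions can be reproduced with a short sketch. The matrices come straight from the question text. The result names (sum_AB, diff_AB, D, mismatch) are illustrative choices, and the last line is an added check showing what happens when the dimensions do not match.

```r
A <- matrix(c(2, 0, 1, 3), ncol = 2)   # filled column by column: rows (2, 1) and (0, 3)
B <- matrix(c(5, 2, 4, -1), ncol = 2)  # rows (5, 4) and (2, -1)

sum_AB  <- A + B  # element-wise: rows (7, 5) and (2, 2)
diff_AB <- A - B  # element-wise: rows (-3, -3) and (-2, 4)

# Question 2: 4 x 4 matrix with 4, 1, 2, 3 on the diagonal and zeros elsewhere
D <- diag(c(4, 1, 2, 3))

# Adding a 2 x 2 matrix to a 4 x 4 matrix fails because the dimensions differ
mismatch <- tryCatch(A + D, error = function(e) conditionMessage(e))
```

Because matrix() fills column by column, c(2, 0, 1, 3) puts 2 and 0 in the first column. Reading the vector row-wise instead is a common source of wrong hand-computed answers.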

Module 5: Determinant and Inverse of Matrices in R

This assignment practiced creating matrices in R and computing a determinant and inverse (when possible).

Step 1: Creating Matrices A and B

I created the matrices using matrix():

Matrix A contains the numbers 1 to 100 arranged into 10 rows, so it is a 10 × 10 square matrix. Matrix B contains the numbers 1 to 1000 arranged into 10 rows, so it is 10 × 100 and not square.

Step 2: Checking the Dimensions

To check the size of each matrix, I used:

Step 3: Finding the Determinant of A

I used the det() function to find the determinant of A:

This result shows that matrix A is singular, which means it does not have an inverse.

Step 4: Finding the Inverse of A

I attempted to find the inverse of A using:

Result: R returned an error saying that the system is computationally singular. This confirms that A does not have an inverse because its determinant is zero.

Step 5: Testing Matrix B

Since B is not a square matrix, I tested:

Result: Both commands returned errors. This...
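The steps above can be sketched as one script. The matrix definitions follow the text. Wrapping solve() and det() in tryCatch() is an added choice so the script keeps running and the error messages can be inspected. The qr() rank check is also an addition that explains why A is singular.

```r
A <- matrix(1:100, nrow = 10)   # 10 x 10
B <- matrix(1:1000, nrow = 10)  # 10 x 100

dim(A)
dim(B)

det_A  <- det(A)      # zero, or a value that is zero up to rounding error
rank_A <- qr(A)$rank  # 2: each column equals column 1 plus a constant

# solve(A) stops with a "singular" error; capture the message instead of halting
inv_A <- tryCatch(solve(A), error = function(e) conditionMessage(e))

# det() and solve() both require a square matrix, so both fail for B
det_B <- tryCatch(det(B),   error = function(e) conditionMessage(e))
inv_B <- tryCatch(solve(B), error = function(e) conditionMessage(e))
```

The rank explains the result. Column j of A is (1, ..., 10) plus 10(j - 1), so every column is a combination of just two vectors. A 10 × 10 matrix with rank 2 has determinant 0 and no inverse.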

Hospital Patient Blood Pressure Analysis: Boxplots and Histogram

For this assignment, I analyzed hospital patient data to examine how blood pressure relates to doctors’ assessments and final treatment decisions. The dataset includes 10 patients and contains information on visit frequency, blood pressure, doctor evaluations, and final care decisions. Using R, I created boxplots and a histogram to visualize patterns in the data.

Methods

I entered the dataset into R and converted the doctor assessments into numeric values. I then created:
A basic boxplot of blood pressure
Side-by-side boxplots comparing blood pressure with doctor ratings
A histogram showing the overall distribution of blood pressure

I also calculated summary statistics to support my interpretation.

Results

Boxplots

The boxplots compare blood pressure with:
First doctor rating (good vs. bad)
Second doctor assessment (low vs. high)
Final decision (low vs. high)

Histogram

The histogram shows the overall distribution of blood pressure values.

Di...
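The methods above can be sketched as follows. All values below are made-up placeholders, not the assignment's 10 patients. The 0/1 coding of the assessments is likewise an assumption, since the post does not say which numeric values were used. The second doctor assessment would follow the same pattern as the final decision.

```r
# PLACEHOLDER values for illustration only; these are NOT the assignment's patients.
bp    <- c(110, 125, 140, 95, 160, 130, 150, 105, 170, 120)
first <- c("good", "bad", "good", "good", "bad", "good", "bad", "good", "bad", "good")
final <- c("low", "high", "low", "low", "high", "low", "high", "low", "high", "low")

# Hypothetical numeric coding: "bad"/"high" -> 1, "good"/"low" -> 0
first_num <- ifelse(first == "bad", 1L, 0L)
final_num <- ifelse(final == "high", 1L, 0L)

bp_summary <- summary(bp)  # min, quartiles, median, mean, max

pdf(NULL)  # draw off-screen so the sketch runs non-interactively
boxplot(bp, main = "Blood pressure", ylab = "Blood pressure")
boxplot(bp ~ first_num, names = c("good (0)", "bad (1)"),
        xlab = "First doctor rating", ylab = "Blood pressure")
boxplot(bp ~ final_num, names = c("low (0)", "high (1)"),
        xlab = "Final decision", ylab = "Blood pressure")
hist(bp, main = "Distribution of blood pressure", xlab = "Blood pressure")
invisible(dev.off())
```

The formula form `bp ~ first_num` splits blood pressure by group and draws one box per group side by side. The groups appear in sorted order (0, then 1), which is why the names vector lists "good" before "bad".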