##Final R assignment in Intro to Statistics course, fall semester.

#+Written by Matan Horovitz (207130253) and Guy Amzaleg ()

#+We have chosen a dataset of the first 600,000 commits to the Linux Kernel Git repository - as published on Kaggle:
#+https://www.kaggle.com/datasets/philschmidt/linux-kernel-git-revision-history
#+This dataset examines commits made to the Linux kernel project over the last 12 years.
raw_kernel_commits <- read.csv("/home/shmick/linux_kernel_git_revlog.csv")

##BONUS: convert from EPOCH: as.Date(as.POSIXct(1100171890, origin = "1970-01-01"))

View(raw_kernel_commits)
##For question 1, we have chosen to examine whether the amount of changes (additions and deletions) per commit varies over time.
#+A larger amount of changes per commit is correlated with a decrease in the quality of the code - as more changes
#+are harder to track and audit - while a smaller amount of changes per commit implies stricter coding standards
#+over time.
##To examine the correlation, we made a subset of the dataset, discarding data which is irrelevant to our question,
#+and summed all changes from the same commit into an aggregate - as we are not concerned with the files changed,
#+only the total amount of changes in each individual commit.
kernel_commits <- subset(raw_kernel_commits, select = c(author_timestamp, n_additions, n_deletions))

#Make a subset of the dataset which includes the EPOCH timestamp, the additions and the deletions.

#Rename the columns of the resulting dataset to friendlier names
colnames(kernel_commits) <- c("EPOCH","Additions","Deletions")
#Sum additions and deletions into a new column, named "Total changes"
kernel_commits["Total changes"] <- kernel_commits["Additions"] + kernel_commits["Deletions"]

View(kernel_commits) #Examine the resulting dataset
#Aggregate all rows with the same EPOCH timestamp to get the total changes PER COMMIT
kernel_commits_sum <- aggregate(. ~ EPOCH, data = kernel_commits, FUN = sum)

View(kernel_commits_sum)
#Measure the correlation between EPOCH time (from oldest to newest) and the total number of changes per commit.
cor(kernel_commits_sum["EPOCH"], kernel_commits_sum["Total changes"])
##We have chosen a dataset of CPU and GPU performance trends since 2000 - as published on Kaggle:
#+https://www.kaggle.com/datasets/michaelbryantds/cpu-and-gpu-product-data

raw_perf_data <- read.csv("/home/shmick/Downloads/chip_dataset.csv")

View(raw_perf_data)
##For question 1, we have chosen to examine which type of chip has shown the greater improvement over the years - GPU chips or CPU chips.
#+As chip performance is most directly correlated with the number of transistors, we have measured the pace of development by the pace of
#+increase in transistor count.
CPU <- raw_perf_data[raw_perf_data$Type == 'CPU',]
GPU <- raw_perf_data[raw_perf_data$Type == 'GPU',]
#Sort the transistor counts (order() would return row indices, not the values themselves)
CPU_Transistor_Count <- sort(CPU$Transistors..million.)
GPU_Transistor_Count <- sort(GPU$Transistors..million.)
##Iterate over date entries
for (i in seq_along(CPU$Release.Date)) {print(i)}
##Get date
for (i in seq_along(CPU$Release.Date)) {print(CPU$Release.Date[i])}
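##The date loops above only print the entries; a sketch of how the pace of development could actually be
#+compared: fit log2(transistor count) against release year for each chip type, so the slope is roughly
#+"doublings per year". The tiny data frame below is made-up stand-in data (the real analysis would use
#+raw_perf_data instead), and the column names Release.Date and Transistors..million. are taken from the
#+code above.
perf_demo <- data.frame(
  Type = c("CPU", "CPU", "CPU", "GPU", "GPU", "GPU"),
  Release.Date = c("2000-06-01", "2005-06-01", "2010-06-01",
                   "2000-06-01", "2005-06-01", "2010-06-01"),
  Transistors..million. = c(40, 200, 1000, 25, 300, 3000)
)

growth_rate <- function(d) {
  #Extract the release year and fit a linear trend of log2(transistors) on year
  year <- as.numeric(format(as.Date(d$Release.Date), "%Y"))
  fit <- lm(log2(d$Transistors..million.) ~ year)
  unname(coef(fit)["year"])  #Slope of the fit: doublings per year
}

cpu_rate <- growth_rate(perf_demo[perf_demo$Type == "CPU", ])
gpu_rate <- growth_rate(perf_demo[perf_demo$Type == "GPU", ])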
##QUESTION 2: measure the number of columns in our dataset and calculate a permutation and a combination of
#+that number, minus two, and 3.
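##A sketch for QUESTION 2: with k = (number of columns) - 2, count the unordered selections (combinations)
#+and the ordered arrangements (permutations) of 3 of them. choose() and factorial() are built into R, and
#+a permutation is the combination multiplied by the 3! orderings. The data frame below is a made-up
#+stand-in - with the real data, k would be ncol(raw_perf_data) - 2.
demo_df <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7)
k <- ncol(demo_df) - 2                         #5 in this example
combinations_3 <- choose(k, 3)                 #k! / (3! * (k-3)!) = 10
permutations_3 <- choose(k, 3) * factorial(3)  #k! / (k-3)! = 60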