I’m working on a data analytics project and need an explanation and answer to help me learn

Posted: June 8th, 2022


Data: https://www.kaggle.com/mastmustu/income?select=test.csv
The grading rubric can be found below:
R code: 30% of assigned points
Decision/Why: 35% of assigned points
Communication of findings: 35% of assigned points
Decision/Why: Explain your reasoning behind your choice of procedure, set of variables, and so on for each question. Explain why you use the procedure/model/variable.
To exceed this criterion, describe the steps taken to implement the procedure in a non-technical way.
Communication of your findings: Explain your results in terms of training MSE, testing MSE, and prediction of the variable y. Explain why you think one model is better than the other.
To exceed this criterion, explain your model and how it predicts y in a non-technical way.
Part 1: Exploratory Data Analysis (20 points)
Check for the existence of NAs (missing data).
If necessary, classify all categorical variables except the one you are predicting as factors. Calculate the summary statistics of the entire data set.
For the numerical variables, plot box plots based on values of y. Do you see a difference between the box plots for any of the variables you choose?
For the categorical variables, plot bar charts for the different values of y. Do you see a difference between plots for any of the variables you choose?
Test/training separation: Separate your data into 80% training and 20% testing data. Do not forget to set a seed. Use the same split for the whole assignment so that the models can be compared.
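The Part 1 steps can be sketched in base R roughly as follows. This is only a minimal illustration under stated assumptions: the data frame is simulated, and the column names (age, hours, workclass, y) are hypothetical stand-ins rather than the actual columns of the Kaggle file. Dropping incomplete rows with na.omit() is one possible way to handle missing values, not the only one.

```r
set.seed(1)  # fixed seed so the split below is reproducible

# Hypothetical stand-in data; in practice, load the downloaded CSV with read.csv().
n <- 200
income <- data.frame(
  age       = round(runif(n, 18, 70)),
  hours     = round(runif(n, 10, 60)),
  workclass = sample(c("Private", "Government", "Self-employed"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
income$y <- rbinom(n, 1, plogis(-4 + 0.05 * income$age + 0.04 * income$hours))
income$age[sample(n, 5)] <- NA  # inject a few missing values for illustration

# 1. Count missing values per column, then drop incomplete rows (one option).
na_counts <- colSums(is.na(income))
income <- na.omit(income)

# 2. Convert the categorical predictor to a factor and summarise the data.
income$workclass <- factor(income$workclass)
print(summary(income))

# 3. Box plot of a numeric variable and bar chart of a categorical one, by y.
boxplot(age ~ y, data = income, main = "Age by y")
barplot(table(income$y, income$workclass), beside = TRUE, legend.text = TRUE)

# 4. 80/20 split; reuse train and test for every model that follows.
train_idx <- sample(nrow(income), size = floor(0.8 * nrow(income)))
train <- income[train_idx, ]
test  <- income[-train_idx, ]
```

Saving train_idx once and reusing it keeps the comparison between models fair, since every model is then scored on the same held-out rows.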
Part 2: Logistic Regression or LDA (15 points)
Develop a classification model where the variable y is the dependent variable, using Logistic Regression or LDA, the rest of the variables, and your training data set.
Obtain the confusion matrix and compute the testing error rate based on the logistic regression classification.
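As a rough sketch of Part 2, the following fits a logistic regression with glm() and computes the confusion matrix and testing error rate. The data are simulated and the predictors x1 and x2 are hypothetical; the 0.5 probability cutoff is a common default and an assumption here, not something the assignment specifies.

```r
set.seed(1)

# Hypothetical simulated data with a binary response y.
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2))

# 80/20 training/testing split.
train_idx <- sample(n, round(0.8 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Logistic regression of y on all remaining variables.
fit <- glm(y ~ ., data = train, family = binomial)

# Predicted probabilities on the test set, thresholded at 0.5 (assumed cutoff).
prob <- predict(fit, newdata = test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix and testing error rate.
conf_mat <- table(Predicted = factor(pred, levels = 0:1),
                  Actual    = factor(test$y, levels = 0:1))
test_error <- mean(pred != test$y)
print(conf_mat)
print(test_error)
```

The error rate is the share of off-diagonal entries in the confusion matrix, so it can also be read directly from the table.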
Part 3: KNN (15 points)
Apply a KNN classification to the training data using.
Obtain the confusion matrix and compute the testing error rate based on the KNN classification.
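One possible sketch of Part 3 uses knn() from the class package. The data are simulated, the predictors x1 and x2 are hypothetical, and k = 5 is an arbitrary illustrative choice since the assignment text does not state a value. The predictors are standardised with the training means and standard deviations because KNN is distance-based.

```r
library(class)  # provides knn()
set.seed(1)

# Hypothetical simulated data with a binary response y stored as a factor.
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2)))

# 80/20 training/testing split.
train_idx <- sample(n, round(0.8 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Standardise predictors using training statistics only.
train_x <- scale(train[, c("x1", "x2")])
test_x  <- scale(test[, c("x1", "x2")],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

# KNN classification; k = 5 is an assumed value for illustration.
pred <- knn(train = train_x, test = test_x, cl = train$y, k = 5)

# Confusion matrix and testing error rate.
conf_mat <- table(Predicted = pred, Actual = test$y)
test_error <- mean(pred != test$y)
print(conf_mat)
print(test_error)
```

In practice, several values of k would usually be tried and compared on the same split.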
Part 4: Tree-Based Model (15 points)
Apply one of the following models to your training data: Classification Tree, Random Forest, Bagging, or Boosting.
Obtain the confusion matrix and compute the testing error rate based on your chosen tree based model.
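For Part 4, a classification tree is one of the listed options; a minimal sketch with the rpart package might look like this. The data are simulated and the variable names are hypothetical. Random forest, bagging, or boosting would follow the same fit, predict, and tabulate pattern with a different fitting function.

```r
library(rpart)  # provides rpart() for classification trees
set.seed(1)

# Hypothetical simulated data with a binary response y stored as a factor.
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2)))

# 80/20 training/testing split.
train_idx <- sample(n, round(0.8 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit a classification tree on the training data.
fit <- rpart(y ~ ., data = train, method = "class")

# Predicted classes on the test set.
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix and testing error rate.
conf_mat <- table(Predicted = pred, Actual = test$y)
test_error <- mean(pred != test$y)
print(conf_mat)
print(test_error)
```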
Part 5: SVM (15 points)
Apply an SVM model to your training data.
Calculate the confusion matrix using the testing data.
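A hedged sketch of Part 5 using svm() from the e1071 package, which is assumed to be installed. The data are simulated, the variable names are hypothetical, and the radial kernel with cost = 1 is an illustrative assumption rather than a setting given in the assignment.

```r
library(e1071)  # provides svm(); assumed to be installed
set.seed(1)

# Hypothetical simulated data with a binary response y stored as a factor.
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2)))

# 80/20 training/testing split.
train_idx <- sample(n, round(0.8 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit an SVM classifier; kernel and cost are assumed values for illustration.
fit <- svm(y ~ ., data = train, kernel = "radial", cost = 1)

# Predicted classes and confusion matrix on the testing data.
pred <- predict(fit, newdata = test)
conf_mat <- table(Predicted = pred, Actual = test$y)
test_error <- mean(pred != test$y)
print(conf_mat)
print(test_error)
```

The tune() function in the same package can be used to search over cost and kernel parameters by cross-validation on the training data.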
Part 6: Conclusion (20 points)
(10 points) Based on the different classification models, which one do you think is the best model to predict y? Please consider the following in your response:
Accuracy/error rates
Do you think you can improve the model by adding any other information?
(10 points) What are your learning outcomes for this assignment? Please focus on your learning outcomes in terms of statistical learning, model interpretations, and R skills – it is up to you to include this part in your presentation or not.
Requirements: R CODE
