Key Steps of Performing Data Mining in R
August 01, 2024
Lesson 06 Exercise Review
Lesson Question!
The 8 Key Steps of a Data Mining Project
Step 1: Define the project’s goal based on available data
The project objective guides every subsequent step.
Car Data Mining Project Example: predict a car’s fuel efficiency (measured as miles per gallon, mpg) based on key factors or attributes of the car.
Example Project: a car’s fuel efficiency (mpg) is influenced by attributes such as weight (wt) and horsepower (hp).
The dataset is the mine from which these insights are extracted.
Step 2: Acquire Analysis Tools
Use R for data analysis
R includes an extensive collection of packages and functions
Example R packages:
dplyr, tidyr
Step 3: Prepare the Data
Data preparation: transformation, cleaning, and preprocessing
Filtering out missing or invalid values
Transforming and restructuring data
Cleaning and Filtering:
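is.na(), complete.cases(), filter() (filter() is from dplyr)
Data Transformation:
scale(), normalize(), log() (base R has no normalize() function; min-max normalization comes from an add-on package or a short user-defined helper)
Data Restructuring:
pivot_longer(), pivot_wider() (both from tidyr)
Data Type Conversion: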
as.numeric(), as.Date(), as.factor()
Step 4: Summarize the Data
Summarize key insights from the data to identify trends, correlations, and high-level patterns.
Techniques:
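A minimal base R sketch of the preparation and summarization steps above, assuming R's built-in mtcars data set stands in for the car data (its mpg, wt, hp, and cyl columns match the names used in this lesson). The injected missing value, the choice of which columns to scale or log-transform, and the use of summary(), cor(), and aggregate() as summarization techniques are illustrative assumptions, not the lesson's prescribed workflow.

```r
# Steps 3 and 4 in base R: prepare the data, then summarize it.
# Assumption: the built-in mtcars data set stands in for the car data.
data(mtcars)
car_data <- mtcars

# Illustrative only: mtcars has no missing values, so inject one
# to give the cleaning step something to remove.
car_data$hp[3] <- NA

# Cleaning and filtering: count missing values, keep complete rows.
n_missing <- sum(is.na(car_data))
car_data <- car_data[complete.cases(car_data), ]

# Transformation: standardize weight (mean 0, sd 1) and log-transform horsepower.
car_data$wt_scaled <- as.numeric(scale(car_data$wt))
car_data$log_hp <- log(car_data$hp)

# Type conversion: cylinder count is treated as a category, not a quantity.
car_data$cyl <- as.factor(car_data$cyl)

# Summarization: descriptive statistics, correlations, and group means.
print(summary(car_data[, c("mpg", "wt", "hp")]))
cor_matrix <- cor(car_data[, c("mpg", "wt", "hp")])
print(round(cor_matrix, 2))
print(aggregate(mpg ~ cyl, data = car_data, FUN = mean))
```

The same cleaning step could be written with dplyr as filter(car_data, !is.na(hp)), and tidyr's pivot_longer() and pivot_wider() reshape between wide and long layouts; base R is used here only so the sketch runs without extra packages.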
Step 5: Visualize the Data
Visual representation of data is crucial for interpretation: it helps communicate insights, identify relationships, and detect outliers.
Techniques:
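A sketch of these visualization goals with base R graphics, again assuming mtcars as the car data. The specific plots (a scatter plot, a box plot, and a scatter-plot matrix) are illustrative choices; the plots are written to a temporary PDF so the script also runs non-interactively.

```r
# Step 5 in base R graphics: relationships, distributions, and outliers.
# Assumption: the built-in mtcars data set stands in for the car data.
data(mtcars)

# Write the plots to a temporary PDF so the script also runs without a screen.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Relationship: scatter plot of mpg against weight (wt is in 1000 lbs in mtcars).
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "mpg vs. weight")

# Distribution and outliers: points beyond the whiskers are drawn individually.
boxplot(mtcars$hp, main = "Horsepower", ylab = "hp")

# Pairwise relationships among the three modeling variables.
pairs(mtcars[, c("mpg", "wt", "hp")])

invisible(dev.off())

# The box plot's whisker rule (1.5 times the hinge spread), returned as numbers.
hp_outliers <- boxplot.stats(mtcars$hp)$out
print(hp_outliers)   # a single value: 335
```

boxplot.stats() exposes the same rule the box plot uses to draw outlying points, which turns "detect outliers" into a value that can be filtered on.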
Step 6: Build Models
Build models to uncover hidden patterns and relationships.
Common techniques:
Linear Model (lm):
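A sketch of the model call this step describes, assuming the built-in mtcars data set (if the lesson's data is mtcars, this reproduces the intercept and coefficients listed next). The prediction for a hypothetical 3,000 lb, 150 hp car is added only to show predict().

```r
# Step 6: fit mpg ~ wt + hp and inspect the result.
# Assumption: the built-in mtcars data set stands in for the car data.
data(mtcars)

model <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(model))   # coefficients, standard errors, p-values, R-squared

coefs <- coef(model)                                  # (Intercept), wt, hp
r_squared <- summary(model)$r.squared                 # share of mpg variance explained
p_values <- summary(model)$coefficients[, "Pr(>|t|)"]

# Predict mpg for a hypothetical car: 3,000 lbs (wt = 3) with 150 hp.
new_car <- data.frame(wt = 3, hp = 150)
print(predict(model, newdata = new_car))
```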
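lm() fits a linear regression model that predicts mpg using wt (weight) and hp (horsepower).
summary(model): reports the estimated coefficients, their p-values, and R-squared.
Model Output:
Intercept: 37.23 (the predicted mpg when wt and hp are both zero; a mathematical baseline rather than a realistic car).
Coefficients:
For each one-unit increase in wt, mpg decreases by 3.88 units, holding hp constant.
For each one-unit increase in hp, mpg decreases by 0.03 units, holding wt constant.
R-squared: the proportion of the variance in mpg explained by the model.
Significance: wt and hp are significant predictors (p-values < 0.05).
Step 7: Validate the Model
Validate model accuracy and generalizability to ensure that the model performs well on unseen data and avoids overfitting.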
Techniques:
Cross-validation:
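trainControl(method = "cv", number = 10) (from the caret package) sets up 10-fold cross-validation.
train() function: trains a linear regression model (method = "lm") using wt (weight) and hp (horsepower) as independent variables to predict mpg.
Model Output: performance metrics averaged across the 10 folds, such as RMSE and R-squared (the proportion of variance in mpg explained by the model).
Purpose: estimate how well the model performs on unseen data and check for overfitting.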
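trainControl() and train() handle the fold bookkeeping automatically. To make the mechanics visible without installing caret, here is a hand-written base R version of the same 10-fold procedure, assuming mtcars as the car data. The seed, the RMSE metric, and the comparison with training RMSE are illustrative choices; this is a simplified stand-in, not a reimplementation of caret, which assigns rows to folds with its own procedure.

```r
# 10-fold cross-validation of mpg ~ wt + hp, written by hand in base R.
# Assumption: the built-in mtcars data set stands in for the car data.
data(mtcars)

set.seed(42)                                   # reproducible fold assignment
k <- 10
n <- nrow(mtcars)
folds <- sample(rep(1:k, length.out = n))      # each row gets one fold label

rmse_per_fold <- numeric(k)
for (i in 1:k) {
  train_set <- mtcars[folds != i, ]
  test_set  <- mtcars[folds == i, ]
  fit  <- lm(mpg ~ wt + hp, data = train_set)  # fit on the other k - 1 folds
  pred <- predict(fit, newdata = test_set)     # predict the held-out fold
  rmse_per_fold[i] <- sqrt(mean((test_set$mpg - pred)^2))
}
cv_rmse <- mean(rmse_per_fold)

# In-sample RMSE for comparison: a much lower value than cv_rmse hints at overfitting.
full_fit <- lm(mpg ~ wt + hp, data = mtcars)
train_rmse <- sqrt(mean(residuals(full_fit)^2))

cat("Cross-validated RMSE:", round(cv_rmse, 2), "\n")
cat("Training RMSE:       ", round(train_rmse, 2), "\n")
```

Every row is predicted exactly once by a model that never saw it, which is why the averaged error estimates performance on unseen data.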
Step 8: Interpret and Implement Results
Interpret results and apply findings in decision-making.
Key considerations:
Implementation:
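One way to turn the model's coefficients into a decision-ready statement is a what-if comparison. A sketch assuming mtcars as the car data; the two design scenarios (the same 150 hp engine, one car 500 lbs lighter) are hypothetical, and prediction intervals and confint() are added techniques rather than ones named in this lesson.

```r
# Turning the fitted model into a decision-oriented comparison.
# Assumptions: mtcars stands in for the car data; both scenarios are hypothetical.
data(mtcars)
model <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical designs: same 150 hp engine, second car 500 lbs lighter.
scenarios <- data.frame(wt = c(3.0, 2.5), hp = c(150, 150),
                        row.names = c("current", "lighter"))

# Point predictions with 95% prediction intervals for individual cars.
pred <- predict(model, newdata = scenarios, interval = "prediction", level = 0.95)
print(round(pred, 2))

# With hp fixed, the gain is 0.5 times the size of the wt coefficient.
gain <- pred[2, "fit"] - pred[1, "fit"]
cat("Predicted mpg gain from removing 500 lbs:", round(gain, 2), "\n")

# Uncertainty in the coefficients themselves, useful when reporting results.
print(round(confint(model), 3))
```

With hp held fixed, the predicted difference is 0.5 × 3.88 ≈ 1.94 mpg, the wt coefficient restated as a concrete design choice; the prediction intervals show how far an individual car could deviate from that average.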
Main Takeaways from this lecture:
8 Key Steps of Data Mining:
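Data Preparation: cleaning, transforming, and restructuring data; dplyr and tidyr streamline these tasks.
Modeling: linear models fitted with lm() uncover relationships (e.g., predicting mpg from wt and hp).
Model Validation: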
Interpretation & Implementation:
Data Mining Lab