fraisasghar/Machine-Learning-Classification-using-Python


Built with: Python, NumPy, Pandas, Matplotlib, Seaborn, scikit-learn

Introduction

This project implements machine learning classification in Python. The primary focus is on Decision Tree classifiers that predict a stock's name from two features, stock price and trading volume, evaluated with standard performance metrics and visualizations.

The project demonstrates an end-to-end machine learning workflow: data preprocessing, exploratory data analysis, model training, hyperparameter tuning, and performance evaluation using confusion matrices and classification reports.

Objectives

  1. Data Processing: Clean, preprocess, and prepare datasets for machine learning algorithms
  2. Exploratory Data Analysis: Visualize data distributions and identify patterns
  3. Model Implementation: Build and train Decision Tree classification models
  4. Performance Optimization: Apply hyperparameter tuning using GridSearchCV
  5. Evaluation: Assess model performance through various metrics and visualizations
  6. Visualization: Create insightful plots for data and model analysis

Dataset Overview

The project utilizes a dataset containing:

  • Stock Price: Numerical values representing stock prices
  • Trading Volume: Numerical values representing trading volumes
  • Stock Name: Categorical labels for different stocks (AAL, AAPL, AAP, ABBV, ABC, ABT, ACN)

Dataset Characteristics:

  • Total samples: 1,833 entries
  • Features: 2 numerical features (Stock Price, Trading Volume)
  • Target: 7 classes (stock names)
  • Data types: Float64, Int64, Object
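A dataset with this shape can be inspected as sketched below. This is only an illustration: the column names (`stock_price`, `trading_volume`, `stock_name`) and all values are assumptions, generated synthetically because the real file is not shown here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
tickers = ["AAL", "AAPL", "AAP", "ABBV", "ABC", "ABT", "ACN"]

# Synthetic stand-in for the real dataset (values are made up).
df = pd.DataFrame({
    "stock_price": rng.uniform(25, 105, size=70),                   # float feature
    "trading_volume": rng.integers(1_000_000, 6_000_000, size=70),  # integer feature
    "stock_name": np.repeat(tickers, 10),                           # categorical target
})

print(df.shape)                         # rows, columns
print(df.dtypes)                        # one dtype per column
print(df["stock_name"].value_counts())  # samples per class
```

Checking `value_counts()` early is useful because uneven class sizes affect how per-class metrics should be read later on.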

Key Insights

  • Stock Price Range: Shows significant variation from 25 to 105
  • Trading Volume: Remains constant across all quartiles (0.05)
  • Data Distribution: Stock prices display normal distribution while trading volumes show uniform distribution

Decision Tree Analysis

Model Architecture

The Decision Tree classifier was implemented to analyze stock classification based on trading volume and stock price. The tree structure shows how the model makes splitting decisions at each node to classify different stock symbols.

Tree Structure Explanation

The decision tree uses the following splitting criteria:

  • Primary Split: Trading volume threshold at 3,209,598
  • Secondary Splits: Stock price thresholds at various levels (86.73, 45.14, etc.)
  • Gini Index: Measures impurity at each node (lower values indicate better splits)
  • Samples: Number of data points at each node
  • Value Array: Distribution of stock classes [AAL, AAP, AAPL, ABBV, ABC, ABT, ACN]
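The Gini index listed above can be computed directly from a node's value array as 1 minus the sum of squared class proportions. The helper below is a minimal sketch; the function name `gini_impurity` and the example counts are illustrative and not taken from the project.

```python
import numpy as np

def gini_impurity(class_counts):
    """Gini impurity of a node: 1 - sum(p_k ** 2) over the class proportions p_k."""
    counts = np.asarray(class_counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0  # an empty node is treated as pure
    proportions = counts / total
    return float(1.0 - np.sum(proportions ** 2))

# A node holding a single class is perfectly pure.
pure = gini_impurity([40, 0, 0, 0, 0, 0, 0])

# A node with seven equally represented classes is maximally impure for 7 classes.
mixed = gini_impurity([10, 10, 10, 10, 10, 10, 10])

print(pure, mixed)
```

A pure node gives 0, and an even seven-way mix gives 1 - 1/7, which is the upper bound for a seven-class problem.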

Key Decision Points

  1. Root Node: Initial split based on trading volume
  2. Branching: Subsequent splits based on stock price thresholds
  3. Leaf Nodes: Final classification decisions with class distributions
  4. Purity Metrics: Gini index values indicate split effectiveness
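The root, branch, and leaf structure can also be printed as text with scikit-learn's `export_text`. The sketch below uses synthetic two-class data in which volume alone separates the classes; the feature names, class labels, and values are assumptions for illustration, not the project's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 60

# Synthetic data: the two classes differ only in trading volume.
volume = np.concatenate([rng.uniform(1e6, 3e6, n), rng.uniform(4e6, 6e6, n)])
price = rng.uniform(25, 105, 2 * n)
X = np.column_stack([volume, price])
y = np.array(["AAL"] * n + ["ACN"] * n)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

# One line per node: split feature and threshold, or the predicted class at a leaf.
rules = export_text(tree, feature_names=["trading_volume", "stock_price"])
print(rules)
```

Because volume separates the synthetic classes completely, the tree stops after a single split on `trading_volume`; real data would typically produce deeper trees with further price thresholds.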

Visual Representation

[Figure: the fitted decision tree, illustrating the classification logic at each node.]

Model Performance Analysis

Classification Results Comparison

This section presents the comprehensive evaluation of the Decision Tree classifier before and after hyperparameter tuning. The analysis includes precision, recall, F1-scores, and confusion matrices to assess model effectiveness.

Pre-Tuning Performance

The initial model reaches 90% overall accuracy. Key observations:

  • AAL: Perfect classification (1.00 across all metrics)
  • ABT: High performance with 0.95 precision and 0.93 F1-score
  • ACN: Lower precision (0.70) but good recall (0.88)
  • AAPL: Perfect precision but lower recall (0.33) due to limited samples
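The combination of perfect precision with low recall can be reproduced on a toy example. The labels below are invented for illustration and are not the project's predictions; they show how a rarely predicted class can score 1.00 precision while most of its samples are missed.

```python
from sklearn.metrics import classification_report

# Toy labels: three true AAPL samples, only one of which is predicted as AAPL.
y_true = ["AAPL", "AAPL", "AAPL", "ACN", "ACN", "ACN", "ACN"]
y_pred = ["AAPL", "ACN", "ACN", "ACN", "ACN", "ACN", "ACN"]

report = classification_report(y_true, y_pred, output_dict=True)

# AAPL: every AAPL prediction is correct, but two of three AAPL samples are missed.
print(report["AAPL"]["precision"], report["AAPL"]["recall"])

# ACN: all ACN samples are found, but two predictions are really AAPL.
print(report["ACN"]["precision"], report["ACN"]["recall"])
```

Here AAPL has precision 1/1 and recall 1/3, while ACN has precision 4/6 and recall 4/4, the same kind of pattern as in the per-class results above.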

Post-Tuning Performance

After hyperparameter optimization with GridSearchCV:

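  • Overall Accuracy: Improved to 91% (from 90% before tuning)
  • ABBV: Enhanced performance (0.93 F1-score)
  • ABC: Improved recall (0.93)
  • ACN: Never predicted by the tuned model, attributed to class imbalance
  • Trade-offs: The one-point accuracy gain coincides with ACN no longer being predicted, so per-class precision and recall should be read alongside overall accuracy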
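A tuning step of this kind might look like the sketch below. The parameter grid, the 5-fold setting, and the synthetic data from `make_classification` are all assumptions; the grid actually searched in the project is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class problem with two features (stand-in for price and volume).
X, y = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid: 4 * 3 * 2 = 24 candidate settings.
param_grid = {
    "max_depth": [2, 4, 6, None],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

test_acc = search.score(X_test, y_test)  # accuracy of the refit best model
print(search.best_params_, test_acc)
```

Since the search optimizes plain accuracy, it can favor settings that ignore a small class. Passing `scoring="f1_macro"` instead is one option worth trying when a class ends up with zero predictions.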

Confusion Matrix Analysis

Training Data Results

The training confusion matrix demonstrates:

  • Diagonal Dominance: Strong diagonal values indicate correct predictions
  • Class Separation: Clear distinction between most stock classes
  • Minor Misclassifications: Some confusion between similar stock patterns
  • Training Fit: Model shows good learning from training data

Testing Data Results

The testing confusion matrix reveals:

  • Generalization: Model maintains performance on unseen data
  • Consistency: Similar patterns to training matrix
  • Real-world Performance: Good applicability to new data
  • Robustness: Stable predictions across different stock classes
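Diagonal dominance can be checked numerically as well as visually. The sketch below builds a confusion matrix from invented labels (not the project's results) and derives accuracy from its diagonal.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["AAL", "ABT", "ACN"]

# Toy labels for illustration only.
y_true = ["AAL", "AAL", "ABT", "ABT", "ABT", "ACN", "ACN"]
y_pred = ["AAL", "AAL", "ABT", "ABT", "ACN", "ACN", "ABT"]

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Correct predictions lie on the diagonal, so accuracy is trace / total.
accuracy = np.trace(cm) / cm.sum()
print(accuracy)
```

Computing the same matrix for the training and the testing split and comparing the two accuracies gives a quick check on how well the model generalizes.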

Decision Boundary Visualization

Model Decision Analysis

The decision boundary plot provides a visual representation of how the Decision Tree classifier separates different stock classes in the feature space. This visualization helps understand the model's classification logic and feature importance.

Plot Interpretation

The decision boundary shows:

  • X-axis: Stock Price (used for the secondary splits)
  • Y-axis: Trading Volume (used for the root split)
  • Colored Regions: Different areas representing classified stock symbols
  • Boundary Lines: Decision thresholds learned by the model
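Colored regions of this kind are usually produced by predicting a class for every point on a dense grid over the two features. The sketch below computes such a grid on synthetic two-class data; the data, value ranges, and grid resolution are assumptions, and the plotting call itself is left out.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 50

# Synthetic data: class 0 has low prices, class 1 has high prices.
price = np.concatenate([rng.uniform(25, 60, n), rng.uniform(70, 105, n)])
volume = rng.uniform(1e6, 6e6, 2 * n)
X = np.column_stack([price, volume])
y = np.array([0] * n + [1] * n)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Dense grid over the feature space: x is price, y is volume.
xx, yy = np.meshgrid(np.linspace(25, 105, 200), np.linspace(1e6, 6e6, 200))
grid = np.column_stack([xx.ravel(), yy.ravel()])

# Predicted class for every grid point, reshaped back to the grid layout.
zz = clf.predict(grid).reshape(xx.shape)
print(zz.shape, np.unique(zz))
```

The arrays can then be passed to Matplotlib, for example `plt.contourf(xx, yy, zz)`, with the training points drawn on top. Because a decision tree splits on one feature at a time, the resulting boundaries are always axis-aligned rectangles.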

Conclusion

This project demonstrates machine learning classification for stock market analysis using Python. Through data preprocessing, exploratory analysis, and model development, the Decision Tree classifier achieved 91% accuracy after hyperparameter tuning (90% before) in classifying different stocks based on price and volume features.


About

This project applies machine learning techniques in Python to analyze data, visualize feature relationships, and perform classification using models such as Decision Tree and Support Vector Machine. Model performance is evaluated through confusion matrices and graphical analysis.
