coding-for-it/Natural-Language-SQL-Query-Engine

Natural Language SQL Query Engine

Overview

Natural Language SQL Query Engine is an intelligent analytics system that converts plain English business questions into safe, executable SQL queries and automatically generates insights with visualizations.

Instead of writing SQL manually, users can simply ask:

  • "Show sales by region"
  • "Top products by revenue"
  • "Average order value"

The system interprets intent, generates secure SQL queries, executes them on the database, and displays structured results with charts.


Problem Statement

In many organizations, non-technical stakeholders cannot write SQL queries. They depend on data teams for even basic insights, leading to:

  • Slower decision-making
  • Analyst bottlenecks
  • Reporting delays
  • Communication gaps

There is a need for a self-service analytics interface where business users can interact with data directly using natural language.


Project Goal

Build a secure and reliable Natural Language to SQL engine that:

  1. Accepts business questions in English
  2. Extracts intent, metric, and dimensions
  3. Generates structured SQL queries
  4. Validates queries for safety
  5. Executes them on the database
  6. Automatically visualizes results

All without requiring SQL knowledge.


System Architecture

High-Level Workflow

  1. User enters question
  2. NLP logic detects:
    • Intent (aggregation, ranking, trend, average)
    • Metric (sales, revenue, quantity)
    • Dimension (region, product, date)
    • Filters (time conditions)
  3. SQL query is generated
  4. Validator checks for unsafe commands
  5. Database executes safe query
  6. Results are returned
  7. Automatic visualization is created

Flow Diagram

        User Question
              |
              v
     Intent Detection (Rule-based NLP)
              |
              v
       SQL Query Generator
              |
              v
        Query Validator
          |           |
         Safe       Unsafe
          |           |
          v           v
   Execute Query    Reject
          |
          v
      Fetch Results
          |
          v
   Auto Visualization
          |
          v
        Display

Project Structure

Natural-Language-SQL-Query-Engine/
│
├── app.py                 # Streamlit user interface
├── ai_sql_generator.py    # NLP-based SQL generation logic
├── validator.py           # Query security validation
├── snowflake_connector.py # Database connection & execution
├── visualization.py       # Automatic chart selection logic
├── schema.sql             # Sample dataset schema
├── requirements.txt
└── README.md

How the System Understands Questions

This project uses structured rule-based NLP instead of uncontrolled LLM outputs.

It extracts:

  • Intent → aggregation / ranking / trend
  • Metric → revenue / sales / quantity
  • Dimension → region / product / date
  • Time filters → last month / this year

Then constructs a valid SQL query using predefined mappings.

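Because each query is assembled from a fixed set of mappings, the same question always yields the same SQL. This design aims for: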

  • Accuracy
  • Control
  • Predictable behavior
  • Enterprise reliability
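The extraction and mapping steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the keyword lists, the `SALES` table, the column names, the `parse_question` and `build_sql` helpers, and the top-5 limit are all assumptions made for the example, and time filters are left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword-to-column mappings; the real project may differ.
METRICS = {"revenue": "REVENUE", "sales": "SALES_AMOUNT", "quantity": "QUANTITY"}
DIMENSIONS = {"region": "REGION", "product": "PRODUCT_NAME", "date": "ORDER_DATE"}
TABLE = "SALES"  # assumed table name


@dataclass
class ParsedQuestion:
    intent: str               # aggregation / ranking / trend / average
    metric: str               # key into METRICS
    dimension: Optional[str]  # key into DIMENSIONS, or None


def parse_question(question: str) -> ParsedQuestion:
    """Extract intent, metric, and dimension with simple keyword rules."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    # Crude plural handling so "regions" also matches "region".
    words |= {w[:-1] for w in words if w.endswith("s")}

    if "top" in words:
        intent = "ranking"
    elif "trend" in words:
        intent = "trend"
    elif words & {"average", "avg"}:
        intent = "average"
    else:
        intent = "aggregation"

    metric = next((m for m in METRICS if m in words), "sales")  # default metric
    dimension = next((d for d in DIMENSIONS if d in words), None)
    if intent == "trend" and dimension is None:
        dimension = "date"  # a trend needs a time axis
    return ParsedQuestion(intent, metric, dimension)


def build_sql(p: ParsedQuestion) -> str:
    """Assemble a SELECT statement from the predefined mappings only."""
    agg = "AVG" if p.intent == "average" else "SUM"
    alias = f"{agg}_{p.metric.upper()}"
    measure = f"{agg}({METRICS[p.metric]}) AS {alias}"
    if p.dimension is None:
        return f"SELECT {measure} FROM {TABLE}"
    dim_col = DIMENSIONS[p.dimension]
    sql = f"SELECT {dim_col}, {measure} FROM {TABLE} GROUP BY {dim_col}"
    if p.intent == "ranking":
        sql += f" ORDER BY {alias} DESC LIMIT 5"  # top-5 is an assumption
    elif p.intent == "trend":
        sql += f" ORDER BY {dim_col}"
    return sql


print(build_sql(parse_question("top regions by revenue")))
# -> SELECT REGION, SUM(REVENUE) AS SUM_REVENUE FROM SALES GROUP BY REGION ORDER BY SUM_REVENUE DESC LIMIT 5
```

Since user text is only ever compared against dictionary keys and never copied into the SQL string, a sketch like this cannot emit anything outside the predefined column and table names.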

Example

User Question:

How many orders has each customer placed?

Generated SQL:

SELECT
  C.CUSTOMERNAME,
  COUNT(O.ORDERID) AS NumberOfOrders
FROM CUSTOMERS AS C
LEFT JOIN ORDERS AS O
  ON C.CUSTOMERID = O.CUSTOMERID
GROUP BY
  C.CUSTOMERNAME
ORDER BY
  C.CUSTOMERNAME;
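
To see what this query returns, it can be run against a tiny in-memory database. The sketch below uses Python's built-in sqlite3 module as a stand-in for Snowflake, with table definitions inferred from the column names in the query and three invented sample customers. Because the join is a LEFT JOIN and COUNT skips NULL values, a customer without orders appears with a count of 0 rather than disappearing from the result.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Snowflake (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (CUSTOMERID INTEGER PRIMARY KEY, CUSTOMERNAME TEXT);
CREATE TABLE ORDERS (ORDERID INTEGER PRIMARY KEY, CUSTOMERID INTEGER);
-- Invented sample data: Carol has no orders.
INSERT INTO CUSTOMERS VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO ORDERS VALUES (10, 1), (11, 1), (12, 2);
""")

QUERY = """
SELECT
  C.CUSTOMERNAME,
  COUNT(O.ORDERID) AS NumberOfOrders
FROM CUSTOMERS AS C
LEFT JOIN ORDERS AS O
  ON C.CUSTOMERID = O.CUSTOMERID
GROUP BY
  C.CUSTOMERNAME
ORDER BY
  C.CUSTOMERNAME;
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
# -> [('Alice', 2), ('Bob', 1), ('Carol', 0)]
```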

Security Layer (Key Feature)

The query validator blocks destructive operations such as:

  • DELETE
  • DROP
  • UPDATE
  • ALTER
  • INSERT

Only safe SELECT queries are executed.
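
A check of this kind might look like the sketch below. It is an illustration under assumptions, not the contents of validator.py: the function name `is_safe_query` is hypothetical, and the rules are deliberately conservative, so a query that merely mentions a blocked word inside a string literal is also rejected.

```python
import re

# Keywords listed in the section above.
BLOCKED = {"DELETE", "DROP", "UPDATE", "ALTER", "INSERT"}


def is_safe_query(sql: str) -> bool:
    """Return True only for a single SELECT statement with no blocked keyword."""
    statement = sql.strip().rstrip(";").strip()
    # Any remaining semicolon means more than one statement.
    if ";" in statement:
        return False
    # The statement must start with SELECT (case-insensitive).
    if not re.match(r"(?i)select\b", statement):
        return False
    # Compare whole tokens so a column such as LAST_UPDATE is not flagged.
    tokens = set(re.findall(r"\w+", statement.upper()))
    return not (tokens & BLOCKED)


print(is_safe_query("SELECT REGION, SUM(SALES) FROM SALES GROUP BY REGION;"))  # True
print(is_safe_query("SELECT 1; DROP TABLE SALES"))                             # False
```

Keyword filtering is only one layer. Connecting with a database role that has read-only privileges would enforce the same rule on the database side.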


Visualization Engine

The system automatically selects a chart type based on the structure of the result:

Result Type         Visualization
Category + Value    Bar Chart
Date + Value        Line Chart
Single Metric       KPI Display

No manual plotting required.
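
The selection rule can be sketched as a small function over the result rows. This is an assumption-laden illustration: the project itself lists Pandas and Plotly, whereas the sketch works on plain tuples, the `choose_chart` name is hypothetical, and the "table" fallback for shapes not covered by the mapping above is an addition for the example.

```python
from datetime import date
from typing import Sequence


def choose_chart(rows: Sequence[tuple]) -> str:
    """Pick a chart kind from the shape and types of the result rows."""
    if not rows:
        return "table"  # assumed fallback, not part of the original mapping
    # One row with one column: a single metric.
    if len(rows) == 1 and len(rows[0]) == 1:
        return "kpi"
    if len(rows[0]) == 2:
        first = rows[0][0]
        # datetime is a subclass of date, so both are covered here.
        if isinstance(first, date):
            return "line"
        if isinstance(first, str):
            return "bar"
    return "table"  # assumed fallback for anything else


print(choose_chart([("East", 10.0), ("West", 7.5)]))  # bar
print(choose_chart([(1250.0,)]))                       # kpi
```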


Technologies Used

  • Python
  • Streamlit
  • Pandas
  • Plotly
  • Snowflake Connector

How to Run

1. Install Dependencies

pip install -r requirements.txt

2. Run Application

streamlit run app.py

3. Ask Questions in the UI


Sample Questions

  • total sales
  • sales by region
  • top regions by revenue
  • average order value
  • sales trend over time

Business Value

This project enables:

  • Self-service analytics
  • Faster insight generation
  • Reduced analyst dependency
  • Real-time decision-making

It functions like a lightweight BI tool powered by natural language input.


Future Enhancements

  • Multi-table joins
  • CSV upload support
  • Role-based access control
  • Natural language explanation of results
  • Session memory for saved insights

Conclusion

Natural Language SQL Query Engine demonstrates how NLP, SQL generation, validation logic, and visualization can be combined to create a secure and intelligent analytics assistant for business users.

About

An AI-powered analytics web application that allows users to query a Snowflake database using natural language. The system converts user questions into SQL queries using Gemini API, executes them in Snowflake, and displays results with visualization.
