Expert guide to using OpenAI's ChatGPT to write SQL queries
Using ChatGPT to generate SQL queries can be a game-changer, especially for those who might not be SQL experts. ChatGPT quickly translates your request into SQL code with just a simple description of what you want to achieve. This can save you valuable time spent on manual query writing and debugging. It’s like having an SQL expert on standby, helping bridge the gap for beginners while speeding up the workflow for seasoned developers.
Another advantage of using ChatGPT for SQL queries is its ability to assist learning and problem-solving. Getting stuck on syntax or logic errors can be frustrating when learning SQL. ChatGPT can generate queries and explain each part, helping you understand the structure and purpose behind the code. For experienced users, it can provide quick solutions to complex problems, offering different query options you might not have initially considered. This blend of support, speed, and flexibility makes ChatGPT a valuable companion for anyone working with databases.
Advantages:
Speed: Quickly generate SQL queries, speeding up the development process and allowing for faster iterations.
Learning Tool: Beginners can use chatbots like ChatGPT as a helpful learning resource, providing query examples and verifying SQL structures.
Adaptability: ChatGPT can handle standard SQL and different flavours like MySQL, PostgreSQL, and even SQL Server. For more specific insights, check out our guide on best practices when using ChatGPT with SQL Server. We are always surprised to see how easily it can switch between dialects on request.
Best Results with Detailed Context: To achieve the most accurate results, it’s often necessary to provide the AI with the database schema directly. The more context it has, the better the generated query will align with the specific requirements.
Disadvantages:
Limited Default Context: GPT doesn't have a built-in understanding of your database schema by default and will attempt to answer your query regardless, which can result in using columns or tables that don't exist. This issue can be resolved by pasting in the relevant database schema, but it can become tedious to do this repeatedly.
Specific Instructions Needed: General or vague prompts can lead to generic or less-than-optimal queries. Detailed instructions are needed to get precise outputs.
Doesn’t Replace Human Expertise: For highly complex, nuanced, or performance-critical tasks, the experience and intuition of a seasoned database professional remain essential.
Security Concerns: Relying on external platforms for SQL generation can introduce security risks, especially when sharing sensitive schema information or data.
Risk of Misinterpretation: If the user’s prompt isn’t clear enough, ChatGPT might produce a syntactically correct query that doesn't fully meet the intended goal.
Lack of Query Optimization: ChatGPT doesn't have insight into the size or distribution of data within your tables, which means it might not produce optimal queries in terms of performance. For example, without understanding data volume or indexing, the generated SQL may be inefficient, leading to slower execution times. For those concerned about performance, it's important to consider ChatGPT SQL query optimization best practices to enhance the efficiency of the generated queries.
Best Practices for Writing SQL with ChatGPT
Using ChatGPT for writing SQL can be a game-changer, but getting the best results takes a little know-how, like any tool. A lot depends on how you phrase your requests and the context you give them. In this guide, we'll go over some tips to help you structure your prompts to get the most accurate queries and what to do if the responses start to miss the mark.
How to Structure ChatGPT Prompts for SQL Writing
A well-structured prompt is crucial in getting useful SQL output. Here are some tips on how to provide clear and concise instructions for ChatGPT:
Be Specific About Your Requirements: The more detail you provide in your prompt, the more accurate ChatGPT’s response will be. When describing the SQL query you need, include information about the tables, columns, conditions, and any specific operations (e.g., filtering, sorting, grouping) you want to perform.
Example Prompt:
I have a table called "orders" with columns: order_id, customer_id, product_id, order_date, and total_amount. I need a query to find the total sales for each customer in 2023, sorted by the highest total sales.
ChatGPT's Output:
SELECT customer_id, SUM(total_amount) AS total_salesFROM ordersWHERE YEAR(order_date) = 2023GROUP BY customer_idORDER BY total_sales DESC;
Include the Schema When Possible: Since ChatGPT doesn't have an inherent knowledge of your database schema, pasting in relevant parts of the schema can significantly improve response accuracy. For example, if your database has tables with similar names or multiple related columns, providing the schema helps ChatGPT generate a more precise query.
Example of Providing Schema Context:
Here is the schema for my database:
Table: customers Columns: customer_id, first_name, last_name, email
Table: orders Columns: order_id, customer_id, product_id, order_date, total_amount
Table: products Columns: product_id, product_name, category, price
I need a query to list all customers who made a purchase in 2023, along with the total amount they spent.
ChatGPT's Output:
SELECT customers.customer_id, customers.first_name, customers.last_name, SUM(orders.total_amount) AS total_spent
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
WHERE YEAR(orders.order_date) = 2023
GROUP BY customers.customer_id, customers.first_name, customers.last_name;
Specify the SQL Flavor: SQL syntax can vary slightly between databases (e.g., MySQL, PostgreSQL, SQL Server). When relevant, mention which SQL dialect you use to ensure the query is formatted correctly.
Example Prompt:
I’m using PostgreSQL. Write a query to find the top 5 products based on the total sales amount in 2023 from the "orders" and "products" tables.
ChatGPT's Output:
SELECT p.product_name, SUM(o.total_amount) AS total_sales
FROM products p
JOIN orders o ON p.product_id = o.product_id
WHERE DATE_PART('year', o.order_date) = 2023
GROUP BY p.product_name
ORDER BY total_sales DESC
LIMIT 5;
Fixing drops in output quality from ChatGPT for SQL Queries.
If you notice that ChatGPT’s responses start hallucinating or make incorrect assumptions, it might be because the session has become cluttered with too much prior information. Creating a new chat can reset the conversation, helping you obtain more accurate results.
Start by asking ChatGPT for a detailed breakdown of all tables and columns using our example prompt:
Based on the information provided, summarize all SQL tables and columns in detail to use in another session for query generation.
Once you receive this summary, open a new ChatGPT session and paste the summary into the initial prompt. By providing this fresh context, you ensure ChatGPT only uses the relevant data moving forward, making its responses more accurate. This practice is particularly useful if you've been working with multiple databases or have explored various aspects of a schema in previous conversations.
Additionally, if you’re dealing with a large database, you don’t need to paste the entire schema—just the tables and columns relevant to your query. This keeps the context clear and focused, reducing the chance of misinterpretation.
By following these practices, you can increase the quality of the SQL queries generated by ChatGPT.
Example ChatGPT prompts to generate SQL.
In this section, we provide a series of example prompts to demonstrate how to effectively use ChatGPT for generating SQL queries based on specific data retrieval tasks and given database schemas. Each example illustrates how to structure your prompt to include the necessary context and details, enabling ChatGPT to produce accurate and efficient SQL code for various scenarios. By studying these examples, you'll learn best practices for crafting prompts that help you leverage ChatGPT's capabilities in SQL query generation.
Example 1: Provide a list of all customer countries, ranked by the number of orders placed in the last 24 days, from highest to lowest.
Given the following table schema, provide a list of all customer countries ranked by the number of orders placed in the last 24 days, from highest to lowest.
```sql
-- Customers Table
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
country VARCHAR(100)
);
-- Orders Table
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
```
Now, write a query to rank the countries by the total number of orders in the last 24 days.
Example 2: List all sign-ups with their attribution channels, ranked by the number of sign-ups for each channel in the last calendar month.
Given the following table schema, list all sign-ups with their attribution channels, ranked by the number of sign-ups for each channel in the last calendar month.
```sql
-- Users Table
CREATE TABLE Users (
user_id INT PRIMARY KEY,
signup_date DATE,
attribution_channel VARCHAR(100)
);
```
Now, write a query to rank the attribution channels by the number of sign-ups in the last calendar month.
Example 3: Identify customers who have not made a purchase in the past year but have shown activity on the website.
Given the following table schema, identify customers who have not made a purchase in the past year but have shown activity on the website.
```sql
-- Customers Table
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);
-- Orders Table
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
-- Activity Table
CREATE TABLE Activity (
activity_id INT PRIMARY KEY,
customer_id INT,
activity_date DATE,
activity_type VARCHAR(100),
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
```
Now, write a query to find customers who haven't made a purchase in the past year but have at least one activity recorded on the website.
Example 4: Identify the first action taken in our app by customers who convert within 48 hours.
Given the following table schema, identify the first action taken in our app by customers who convert within 48 hours.
```sql
-- Customers Table
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
signup_date DATETIME
);
-- Conversions Table
CREATE TABLE Conversions (
conversion_id INT PRIMARY KEY,
customer_id INT,
conversion_date DATETIME,
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
-- Actions Table
CREATE TABLE Actions (
action_id INT PRIMARY KEY,
customer_id INT,
action_type VARCHAR(100),
action_date DATETIME,
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
```
Now, write a query to find the first action taken by customers who convert within 48 hours of their signup date.
Using ChatGPT SQL plugins
ChatGPT SQL plugins make life easier by letting the AI connect directly to your database. This real-time access means you don't have to input schema details manually—the plugin grabs and uses them automatically. As a result, you get more accurate and context-aware SQL queries without the extra hassle. These plugins can optimize queries on the fly based on table sizes, indexes, and performance metrics, helping you avoid inefficient queries and boosting overall performance. For more details, check out our guide on ChatGPT SQL plugins.