
Build an OpenAI Assistant to chat with your Postgres database

Recently I wanted to get my hands dirty using some of the new dev tools from OpenAI. I really wanted to try using their LLM “with my own data”. The typical approach for this is RAG – Retrieval Augmented Generation. This is where you chunk up a bunch of unstructured text, store the chunks in a vector database, then feed the most relevant chunks to the LLM to answer a question.

But lots of data IS structured – sitting in databases of various types. So I thought it would be interesting to see if I could get the LLM to work as a direct chat interface to my database.

I wanted to build essentially a version of the psql command line tool that understands natural language instead of SQL. The result is a little tool that I called gptsql.

The final result works quite well, with some limitations. If you want to try it out just run:

$ pip install gptsql

$ gptsql

You can get the code from the GitHub repo: https://github.com/tatari-tv/gptsql.

On first run it will prompt you for your database connection details and your OpenAI API key.

Now you can simply ask questions in natural language:

> show me all the users from California
thinking... Running SQL: select * from users where state = 'CA' limit 10
considering results...
[assistant] -> Here is a sample of users from California:

id name state
---- --------- -------------
11 scott CA
...

The library uses the new OpenAI Assistants API. The Assistants API builds on top of the traditional “completions” API by implementing an “agent loop” that attempts to solve a problem beyond basic text completion by:

  • Executing multiple steps to solve the question
  • Using “tools” to help solve the question. Tools supported today include function calling, info retrieval, and code execution.
  • Managing a persistent “conversation thread” that allows you to ask chained questions and refer back to prior answers.
  • Managing the LLM context for you (truncating or compressing the message history to fit into the “context window” length supported by the LLM model).

So the logic of gptsql is pretty simple – it tells the LLM that there is a function which will execute SQL commands against our database. When we ask a question in natural language the assistant will construct one or more SQL queries to try to answer the question. We execute those SQL queries and then return the results to the assistant.

Using the Assistants API

OpenAI has pretty good documentation on using the Assistants API. The basic structure looks like this:

  1. Create an Assistant object which includes an instruction prompt.
  2. Create a Thread object which will hold the message history of your chat.
  3. Prompt the user for their question.
  4. Add the user’s question to the Thread’s list of messages.
  5. Create a Run object against the Thread, in order to “run” the assistant to answer the question from the user.
  6. Poll the Run object, waiting for the assistant to finish.
  7. Retrieve the response message from the Thread and print it for the user.
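
In code, those seven steps map roughly onto the openai Python SDK like this. This is just a minimal sketch; the model name and instruction text are placeholders rather than what gptsql actually uses:

import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Create the Assistant with an instruction prompt
assistant = client.beta.assistants.create(
    name="pg-chat-demo",
    instructions="You help the user query a Postgres database.",
    model="gpt-4-1106-preview",
)

# 2. Create a Thread to hold the message history
thread = client.beta.threads.create()

# 3 & 4. Add the user's question to the thread
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="how many users do we have in California?",
)

# 5. Run the assistant against the thread
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# 6. Poll until the run finishes
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# 7. Print the assistant's latest reply
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)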

All these objects are saved so you can come back to your same Thread later and ask it more questions. I leveraged this in gptsql by just having it re-use the same thread every time you run it. The Assistants API takes care of truncating the message history so that your LLM context doesn’t grow too large.
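
One way to get that re-use behavior, continuing the sketch above (not gptsql’s exact code, and the config file name here is made up), is to save the thread id locally and fetch the same thread on the next run:

import json, os

THREAD_FILE = os.path.expanduser("~/.pg-chat-thread.json")  # hypothetical location

if os.path.exists(THREAD_FILE):
    saved = json.load(open(THREAD_FILE))
    thread = client.beta.threads.retrieve(saved["thread_id"])
else:
    thread = client.beta.threads.create()
    json.dump({"thread_id": thread.id}, open(THREAD_FILE, "w"))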

The key to our tool is in Step 6. While we poll for the Run to complete, it may enter a state where it uses one of its “tools”. This could include retrieving indexed data or executing some Python code in a sandbox. But it can also enter a state where it requests to execute one or more functions. Our code needs to recognize this state and execute the requested functions, then return the results to the assistant for further processing.

In the simplest case you just get this pattern:

  • add the user’s question to the thread
  • execute a Run
  • ..poll for status
    ..receive function call request
    ..execute a SQL query
    ..return query results to the assistant
    ..Run is complete
  • retrieve the final message and print it
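
Concretely, the poll loop has to watch for the “requires_action” status and submit tool outputs back to the run. A rough sketch, continuing from the code above (run_sql here stands in for whatever function actually executes the query against Postgres):

import json
import time

while True:
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    if run.status == "requires_action":
        # the assistant is asking us to execute one or more function calls
        outputs = []
        for call in run.required_action.submit_tool_outputs.tool_calls:
            args = json.loads(call.function.arguments)
            result = run_sql(args["query"])  # our code runs the SQL
            outputs.append({"tool_call_id": call.id, "output": result})
        client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id, run_id=run.id, tool_outputs=outputs
        )
    elif run.status in ("completed", "failed", "cancelled", "expired"):
        break
    else:
        time.sleep(1)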

But the assistant logic is a lot smarter than just that. It can execute multiple queries and collate results, or run code to process data in some specific way. I didn’t get too much into code execution but I want to explore prompts for that in the future. One obvious cool use case is to ask the assistant to generate a chart from the data returned from a query.

Using function calling

In my first attempt at using the Assistants API I provided four functions as “tools” to the LLM:

list_schemas – List the schemas in the database

list_tables – List the tables in a schema

list_columns – List the columns of a table

run_query – Execute a SQL query

Then I built a simple REPL loop on top of the assistant. The basic loop was:

  • prompt for the next question
  • send the question to the OpenAI Assistant
  • …listen for function calls
    invoke functions
    return results to the assistant
  • print the assistant’s final results

This worked, but the function calls weren’t super consistent. Eventually I realized that I was doing it wrong. The schema functions just executed SQL under the covers, and the LLM already knew all the syntax. So I realized that I really just needed a single function:

run_sql_query” – “Execute a SQL query on the postgres database”

That’s it! If I want to know any schema information I can just say “list all the tables” or “what are the columns of the orders table” and the LLM will construct the right query against the information_schema catalog in the db.
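
For reference, declaring that single tool when creating the assistant looks roughly like this. The description text is illustrative, not necessarily the exact wording gptsql uses:

tools = [{
    "type": "function",
    "function": {
        "name": "run_sql_query",
        "description": "Execute a SQL query on the postgres database and return the rows",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The SQL query to execute"},
            },
            "required": ["query"],
        },
    },
}]

assistant = client.beta.assistants.create(
    name="pg-chat-demo",
    instructions="Answer questions by constructing SQL queries against the Postgres database.",
    model="gpt-4-1106-preview",
    tools=tools,
)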

Because the list of available tables is so commonly needed, I wrote code to inject it into the user prompt with every question:

cmd = "This list of tables in the database:\n" + ",".join(self.table_list) + "\n----\n" + cmd
self.process_command(thread, cmd)

This helps the LLM avoid a function call when all it needs to know is which tables are available.

Returning results to the user

The expected way to use the Assistants API is that after your function call executes then you pass the results back to the LLM, which may extract some data or summarize the results before returning the “final” answer.

This works fine when the result data is small, but I wanted my script to work like psql and let me just query a bunch of rows like “select * from users limit 200”. The problem with a query like this is that it returns far too much data to pass back through the LLM.

I created a workaround by having the script remember the results of the last query that it executed. Then I gave the assistant another function tool “print_results” which just prints all the query results to the screen.
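
The pattern might look something like this. This is a sketch assuming pandas and SQLAlchemy; the names and connection string are placeholders, not the actual gptsql implementation:

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:password@localhost/mydb")  # placeholder DSN
last_results = None  # full results of the most recent query

def run_sql_query(query: str) -> str:
    # run the query, cache the full result set, and hand the LLM only a small sample
    global last_results
    with engine.connect() as conn:
        last_results = pd.read_sql(text(query), conn)
    return last_results.head(10).to_string()

def print_results() -> str:
    # tool the assistant can call to dump the cached results straight to the terminal
    if last_results is not None:
        print(last_results.to_string())
    return "results printed"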

Sometimes if you ask a question like “show me 100 rows from the users table” the assistant will figure out to call “print_results” all by itself. In case it doesn’t (and just summarizes the data) you can always just ask “print results” and it will call the function:

> query 200 movies from the title table
⠸ --> run_sql_command() Running select query: SELECT * FROM title LIMIT 200;
⠋ considering results...

> print results
⠇ thinking...LAST QUERY RESULTS:
⠏ thinking... id title ... series_years md5sum
0 2331802 Talleres y oficios de México: Mascaras IV ... None da99605389aa206fb7d56113b1c1a94d
1 2335115 Teacher of the Year ... None adb126de9fa1e1e9fb974c0b8b38d1
...
[200 rows x 12 columns]
[assistant] --> The query results have been printed. If you need any further information or assistance, please let me know!

Safety

The gptsql script runs the SQL queries requested by the LLM automatically, rather than prompting for permission. This makes the tool a lot smoother to use. But out of caution I decided to only support SELECT queries. So if you try something obvious like:

> please delete all the rows of the user table
% thinking... Invalid query:  DELETE FROM title;

There is a simple check in the “run_sql_query” function which makes sure that your SQL statement starts with “SELECT”. This check, however, is very naive. I am pretty sure that if you got the LLM to generate a compound query like “select 1; delete from user” you could get around this failsafe. So I strongly recommend only using gptsql against a test database, or else with a read-only database credential.
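
For illustration, a check of that kind, with the obvious tightening against compound statements, might look like the following sketch (not the actual gptsql code, and still no substitute for a read-only credential):

def is_safe_query(sql: str) -> bool:
    # allow only a single SELECT statement; reject compound statements outright
    stripped = sql.strip().rstrip(";").strip()
    return stripped.upper().startswith("SELECT") and ";" not in stripped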

In the future I would like to add support for mutations, and the simplest approach would be to prompt for permission before executing any mutation queries. However, even this would have its problems, since a complex query could end up being a lot more destructive than it appears at a glance. More likely you would need to analyze the query carefully and flag any UPDATE or DELETE statements as potentially destructive.

What did I learn?

I have to say that the results of this simple program are quite impressive. The OpenAI models (both GPT-3 and GPT-4) are very capable of constructing a wide variety of queries, including building joins between tables and crafting groupings and aggregates.

You don’t have to instruct it prescriptively either, like “please join to the address table to get the user’s address”. Usually you can just ask “please list all users with their address” and the LLM can figure out the JOIN that needs to happen.

The constructed SQL is not always correct, and it’s not hard to start asking more complex questions that the LLM will struggle with. I expect that “complex analytics” use cases are still a ways off.

The chat interface lends itself super well to exploring your data. If you ask “how many users come from California?” and get your result, then you can follow up with “and how many signed up this year?” and the assistant knows that you mean to modify your prior query. It’s clear that “copilot”-style products will be the most successful initial LLM-powered products, because human judgement, error correction and steering are still critical to getting the most value out of the models. True unattended automation feels likely to be farther off, although there are some good examples of “self-correction”, like the data analysis mode in GPT-4, which can correct some coding errors automatically.

The LLM also has a ton of real-world contextual knowledge, so you can ask it things like “what is table X used for?” and it can usually offer a pretty good guess.

I expect that most businesses will have this kind of natural language interface to their data very soon, and these capabilities will allow lots of non-SQL experts to do a lot more useful business intelligence work without a business analyst.

The limitations

Despite that optimism, there are still some notable issues that I ran into. The biggest one is simply speed: it can often take the assistant 30+ seconds to respond to a question. For interactive use I find myself often asking my question and then switching over to some other task and letting the LLM run in the background. Any “real world” solution will have to run much faster.

The second issue is cost: I ran up an OpenAI bill of over $100 just doing basic testing during development. I’m not sure how many people will want to use a tool that might cost $5-$20 per day!

Finally, the Assistants API is in beta, and it shows. Response quality can vary a lot, as can performance, and sometimes the assistant simply refuses to answer. I found that I needed to prompt “please try again” fairly often.
