Math GPT Prompt Design

LLMs have remarkable ability in understanding and generating natural language. However, there are certain tasks they struggle with.

LLMs don't do math

Language and mathematics are fundamentally different. Language has a finite set of words and grammatical rules that can be learned and applied to new situations.

Numbers are infinite however and can't be learned in the same way as language. No matter how many times-tables you learn by wrote, there are always more which don't follow the same patterns. This makes it challenging for LLMs to perform even simple mathematical calculations.

Code is language

Luckily, LLMs are very good at writing code for evaluating mathematical expressions. And we have plenty of tools for performing calculations. A simple calculator will do the job, or basically any programming language.

To test this idea, let's try to build a chatbot which acts as a helpful assistant, demonstrating the things that LLMs are good at, but also capabale of answering questions involving arithmetic.

Seeding the prompt

We'll start the prompt by describing roughly what role we want the chatbot to play. This has been documented a bunch online. We'll start with something simple, and see how text-davinci-003 does in response to an arithmetic question.

Explicitly naming each speaker in the prompt and prefilling the response with the chatbot's name helps the model understand its role in the conversation.


MathBot is a knowledgable assistant, trying to be as helpful as possible in response to any question from Human.

Human: what is the cube root of 523415?



The cube root of 523415 is 125.

As expected, this is a way off the actual answer of 80.59. Let's add to the prompt, telling the model to resist any urge to try its hand at arithmetic, and instead write code to express the calculation. We'll also give it an example, so it gets the idea.

Asking the chatbot to enclose expressions in XML tags helps the response stay on track and also makes it easier for us to parse the text afterwards


MathBot is a knowledgable assistant, trying to be as helpful as possible in response to any question from Human.

If the question requires an arithmetic calculation, MathBot answers with "<math>EXPRESSION</math>" where EXPRESSION is a mathematical expression suitable for sending to the math.js API, described on this page: . Before any <math> expressions, MathBot thinks step-by-step and states any assumptions it makes.

Human: what is 1 + 1?

MathBot: 1 + 1 is <math>1+1</math>

Human: what is the cube root of 523415?



The cube root of 523415 is <math>cbrt(523415)</math>

Nice, that's what we want. Now lets think about turning that expression into a numerical answer.

Calculating expressions

Choosing a language

You might notice we asked the chatbot to write mathematical expressions in a specific language defined by a third-party API. This is a harder task than writing in better known languages such as Python or JavaScript, and using an HTTP API is much slower too. However this way we don't have to deal with the security concerns of executing unsanitised code on our servers, so it makes sense for the demo.

Server vs client

We opt to perform calculations from the web browser, after the LLM's response, rather than doing it on the server. This way you can see the chatbot's raw response, and the role the calculator is playing.

Avoid unlearning

It's important we hold onto the chatbot's raw response and feed this dialogue back into subsequent prompts. If we instead fed back the parsed response with numerical answers, gradually the model would "unlearn" the original instruction to write <math> expressions rather than numerical answers.

Adding examples

Creating a chatbot that can do a combination of general questions, arithmetic questions and questions that incorporate reasoning as well as math is a challenge. We can help by providing lots of examples in the prompt. Let's explore some of those examples.

Answering non-math

We still want the chatbot to answer general questions which don't require performing arithmetic. This is especially challenging for questions involving quantities. In these scenarious its often better to rely on the LLMs knowledge rather than try to derive an answer mathemtically.

Human: How far away is the moon?

MathBot: The moon is an average of 238,855 miles away from earth, which is about 30 earths.

Human: What is the 5th prime number?

MathBot: The 5th prime number is 11.

Stating assumptions

Human: How many ping pong balls would it take to fill an empty olympic swimming pool?

MathBot: An Olympic-sized swimming pool is 50 meters long, 25 meters wide and 2 meters deep. A standard ping pong ball is 40mm in diameter, or 0.04 meters, and its radius is half of that. The packing density of randomly ordered spheres is around 64%. Therefore it would take roughly <math>0.64*50*25*2/(4*pi*pow(0.5*0.04,3)/3)</math> ping pong balls to fill an empty pool.

Including units

Human: A goat is tied to a stick with a 46.5 meter rope. What area of grass can it graze on?

MathBot: Assuming that the pole is thin, the goat can graze on a circle of grass with radius equal to the length of the rope.

The area of a circle is πr^2, where r is the radius. Therefore the goat can graze on <math>pi*pow(46.5,2)</math> square meters of grass.

Unit conversions

Human: How many cm2 are in one km2?

MathBot: 1 km2 is equal to 10,000,000,000 cm2.


  • [] Use </math> as a stop sequence -> hit the api from server -> start another prompt with the chatbot's response prefilled with the math answer. This would allow the answer to be context-aware of the math answers, and likely to be more insightful.