Thursday, August 7, 2025
Implementing AI function calling with Express and Azure OpenAI
Context
The context of this entry is a natural technical evolution: from simply requesting a response from Azure OpenAI, to understanding the need for streaming responses, and now to enabling the model to call functions we make available.
As mentioned, when we allow the model to decide whether it needs to use functions as tools, the model, much like a human, becomes significantly more useful, especially when equipped with the right ones.
The implementation is quite straightforward, though there are some considerations to be aware of.
TL;DR
You can find all the code in the following GitHub repository.
Scope
Maintaining simplicity as a core principle, the goal of this entry is to equip our solution with the ability to recognize the need to use a tool — a function that exists solely within our workspace.
Additionally, this integration takes into account the technical challenges covered in earlier entries:
- Reducing response latency through data streaming to optimize the user experience.
Implementation
To begin the implementation, let’s take a look at the official OpenAI documentation on function calling.
[…] The model may decide to call these functions — instead of (or in addition to) generating text or audio. You’ll then execute the function code, send back the results, and the model will incorporate them into its final response.
We can highlight a simplified workflow:
1. I will send a message which includes the tools available
2. The model might decide to use them or not. If so, I have to execute the function and send back the result.
3. The model will take that answer and reply.
We can take a look at the workflow in the following diagram.

Here is a key question:
- If I have to send the model two messages, how should I structure them so that the model understands the sequence?
To answer that, we first need to look at how a conversation with the model is handled. Structured conversations have been a crucial evolution for the stability and security of interactions with language models. I (highly) recommend reading OpenAI’s Model Spec.
It presents, in a simple and coherent way, the rules and the chain of command — like the levels of authority — to ensure the AI’s behavior is consistent and predictable.
Implementing a conversation format
A simple test would be to restructure the request as follows:
- Our API will take the role of user, and the input from the request will be appended as content to the message.
- The model will take the role of assistant, and its responses will be appended as content to the message — creating a base payload format.
const userMessages = [
  { role: "user", content: req.body.input }
];

const basePayload = {
  input: JSON.stringify(userMessages),
  model: req.body.model || "o4-mini",
  tools: tools,
  tool_choice: "auto",
  stream: true
};

Please note that we are in a scenario where the output of the assistant is consumed by an application — in this case, our API — and is typically required to follow a precise format.
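To make the conversation format concrete, here is an illustrative example (not code from the repository) of how the array grows as turns are appended; the contents are placeholders:

// Illustrative only: each turn is appended to the same array,
// so the model always receives the full conversation so far
userMessages.push({ role: "assistant", content: "Here are a few must-do activities in and around Coyhaique..." });
userMessages.push({ role: "user", content: "Great, and what should I pack for the trip?" });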
Next, let’s define the tool that will be available to the model.
export const tools = [
  {
    type: "function",
    name: "getWeather",
    description: "Gets the current weather in a given city",
    parameters: {
      type: "object",
      properties: {
        location: {
          type: "string",
          description: "City and country e.g. Arica, Chile"
        }
      },
      required: [
        "location"
      ]
    }
  }
];

And here is the getWeather function itself, which fetches the raw data and validates it before returning:

type Props = string;

export async function getWeather(location: Props): Promise<Weather> {
  // Call the weather provider for the given location
  // (the provider helper name is illustrative; see the repository for the real call)
  const rawData = await fetchCurrentWeather(location);
  // Validate the raw payload before handing it back to the model
  return weatherSchema.parse(rawData);
}
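For reference, weatherSchema validates the raw provider payload before it reaches the model. The real fields live in the repository; a minimal sketch, assuming zod and hypothetical fields, could look like this:

import { z } from "zod";

// Hypothetical fields; the actual schema is defined in the repository
export const weatherSchema = z.object({
  location: z.string(),
  temperature: z.number(),
  condition: z.string()
});

export type Weather = z.infer<typeof weatherSchema>;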
We are going to implement the first request and format the streaming result.
const initialRequest = await fetch(env.AZURE_ENDPOINT, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${env.AZURE_API_KEY}`
  },
  body: JSON.stringify(basePayload)
});

if (!initialRequest.ok || !initialRequest.body) {
  const err = await initialRequest.text();
  res.status(502);
  res.json({ error: err });
  return;
}

const reader = initialRequest.body.getReader();
const decoder = new TextDecoder();
As a basic implementation, I’m going to define a flag to detect whether the response includes a tool call. As the diagram showed, if a tool is required, we need to identify the function name and perform that call.
Next, with the result from that call, we create an updated messages array that is ready for the second request.
let toolTriggered = false;
let updatedMessages = [...userMessages];

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value, { stream: true });
  const lines = chunk.split("\n");

  for (const line of lines) {
    if (!line.startsWith("data:")) continue;
    const json = line.slice("data:".length).trim();
    if (!json || json === "[DONE]") continue;

    const parsed = JSON.parse(json);
    const outputs = parsed.response?.output;
    const type = parsed.type;

    // When the final event arrives, check whether the model asked for a tool
    if (type === "response.completed" && outputs?.length) {
      for (const output of outputs) {
        const status = output.status;
        const outputType = output.type;
        if (status === "completed" && outputType === "function_call") {
          const name = output.name;
          const params = JSON.parse(output.arguments);
          const callId = output.call_id;
          if (name === "getWeather") {
            toolTriggered = true;
            const tool_response = await getWeather(params.location);
            // Append the model's function call and our result to the conversation
            updatedMessages = [
              ...userMessages,
              output,
              {
                type: "function_call_output",
                call_id: callId,
                output: JSON.stringify(tool_response)
              }
            ];
          }
        }
      }
    }

    // No tool requested: forward the event to the client as-is
    if (!toolTriggered) {
      res.write(`${line}\n\n`);
    }
  }

  // Once the tool has been handled, stop reading the first stream
  if (toolTriggered) {
    await reader.cancel();
    break;
  }
}

if (!toolTriggered) {
  res.end();
}
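To make the sequencing explicit (this answers the key question from earlier), here is an illustrative example of what updatedMessages contains before the second request; the call_id and weather values are placeholders:

// Placeholder values, shown only to illustrate the sequence the model expects
const updatedMessagesExample = [
  { role: "user", content: "what is the weather like in Coyhaique, Chile today?" },
  {
    type: "function_call",
    name: "getWeather",
    call_id: "call_abc123",
    arguments: "{\"location\":\"Coyhaique, Chile\"}"
  },
  {
    type: "function_call_output",
    call_id: "call_abc123",
    output: "{\"location\":\"Coyhaique, Chile\",\"temperature\":4,\"condition\":\"cloudy\"}"
  }
];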
Finally, we perform the second request in a fresh iteration, while continuing to stream the results.
if (toolTriggered) {
  const secondPayload = {
    input: JSON.stringify(updatedMessages),
    model: req.body.model || "o4-mini",
    tools: tools,
    stream: true
  };

  const secondRequest = await fetch(env.AZURE_ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${env.AZURE_API_KEY}`
    },
    body: JSON.stringify(secondPayload)
  });

  if (!secondRequest.ok || !secondRequest.body) {
    const err = await secondRequest.text();
    res.write(`data: ${JSON.stringify({ error: err })}\n\n`);
    res.end();
    return;
  }

  const reader = secondRequest.body.getReader();
  const decoder = new TextDecoder();
  let isDone = false;

  while (!isDone) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true });
    for (const line of chunk.split("\n")) {
      if (!line.startsWith("data:")) continue;
      const json = line.slice("data:".length).trim();
      if (!json) continue;

      // Forward each event from the second response to the client
      res.write(`${line}\n\n`);

      if (json === "[DONE]") {
        isDone = true;
      }
    }
  }

  res.end();
}
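For context, here is a rough sketch of how a handler like this could be wired into Express. The SSE headers and the handleChat wrapper are assumptions for illustration, not code from the repository; only the /v1/chat route and port 4000 come from the tests below.

import express from "express";

const app = express();
app.use(express.json());

app.post("/v1/chat", async (req, res) => {
  // Assumed headers so that curl -N receives the events as they are written
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");

  // handleChat is a hypothetical wrapper around the two-request flow above
  await handleChat(req, res);
});

app.listen(4000, () => console.log("API listening on :4000"));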
This was a straightforward, point-by-point implementation of the workflow. We can consider refactoring the streaming process later; as a basic implementation, let’s first see if it works.
Testing
Let’s start by testing a scenario where no tool is needed.
curl -N -H "Content-Type: application/json" \
-X POST http://localhost:4000/v1/chat \
-d '{
"model": "o4-mini",
"input": "what are things I should do in Coyhaique, Chile. explain briefly? short and simple"
}'
...
"output": [
...
{
"status": "completed",
"content": [{
...
"text": "Here are a few must-do activities in and around Coyhaique, ..."
}],
"role": "assistant"
}],
And let’s test a scenario where a tool is actually needed.
curl -N -H "Content-Type: application/json" \
-X POST http://localhost:4000/v1/chat \
-d '{
"input": "what is the weather like in Coyhaique, Chile today?"
}'
...
{
"item": {
"type": "function_call",
"status": "completed",
"arguments": "{\"location\":\"Coyhaique, Chile\"}",
"call_id": "call_sz...",
"name": "getWeather"
}
}
...
"output": [
...
{
"status": "completed",
"content": [{
...
All seems correct.
Conclusions
One thing I’ve noticed throughout these implementations is the number of capabilities required to build a solution that might include a user-facing interface.
So far, I have only considered backend logic, but after implementing the most basic API interaction, I already found the need to stream responses to compensate for latency; manage dynamic flows to handle responses without compromising token cost; and keep the context window healthy.
Of course, I could opt for a brute-force approach, as I have on most paths so far. However, this opens the door to several more interesting questions:
- How can I evaluate the results of my solution according to my implementation?
- How can I implement a minimum level of cost control?
Operational excellence
The importance of implementing some form of observability is clearly critical. Exposing a solution with a cost model based on tokens sent and received, without any level of monitoring, is not advisable.
The risks of blindly running an AI-based solution in production are simply too high to postpone addressing.
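As a first small step in that direction (a sketch, not something implemented in this entry), the usage block that the Responses API attaches to the final event could be logged per request, reusing the parsed event from the streaming loop above:

// Sketch only: capture token usage from the final streamed event
if (parsed.type === "response.completed" && parsed.response?.usage) {
  const { input_tokens, output_tokens, total_tokens } = parsed.response.usage;
  console.log(
    `model=${req.body.model || "o4-mini"} in=${input_tokens} out=${output_tokens} total=${total_tokens}`
  );
}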
Next steps
I consider this implementation series complete, paving the way for the next entries: testing an approach more focused on adding intelligence to business workflows.
