
Accessing LLMs in Code – Automating LLM Calls

30.5.2024 | 6 minutes of reading time

Hardly any technology has had such an impact in recent years as LLMs, with ChatGPT from OpenAI leading the way. Many media outlets cover intensively how this tool can be used for personal and business purposes. Another aspect, which receives somewhat less attention, is the automation and integration of LLMs into software or a product. Generative AI is well suited to analysing customer wishes and intentions and making suggestions accordingly. Possible application areas are not limited to a chat window in which the user is in direct contact with the LLM, such as a shopping advisor; LLMs can also improve business applications under the hood. They can automatically filter or summarise emails by urgency, so that the reader is brought up to date in the morning by an AI-generated report. Websites can expand their search functionality by adding LLM-powered semantic search alongside keyword search. These and other features can be achieved by automating LLM calls in code. In this blog post I will show how this can be done with a few lines of code and explain a few key details you need to know to automate LLM calls.

Sending requests to LLMs

OpenAI offers a very easy-to-use interface, which is used here to illustrate the possibilities of today's AI interfaces. Since OpenAI represents the current state of the art, many other LLM providers try to emulate this functionality, which means it is possible to switch to other common models in an application with little effort. As a result, much of the information and detail below is relevant to most LLM use cases.

OpenAI offers interaction with its models via a REST interface, which can be accessed after creating an account and generating an API key. New users are provided with a $5 starting budget to play around with the API. For Python and JavaScript, OpenAI itself provides a library to further simplify usage; for most other major programming languages there are community-maintained libraries (https://platform.openai.com/docs/libraries/). The following code examples are based on the JavaScript (TypeScript) library. A completion request, which completes a chat conversation, looks like this:

import OpenAI from 'openai';
import type { ChatCompletionMessageParam } from 'openai/resources/chat/completions';

// The client reads the API key from the OPENAI_API_KEY environment variable by default.
const openai = new OpenAI();

const systemWithContext: Array<ChatCompletionMessageParam> = [
  { role: 'system', content: 'Always be nice and helpful to the user.' },
  { role: 'user', content: 'Can you advise me on computer keyboards?' },
  { role: 'assistant', content: 'Of course! I am happy to help with questions about computer keyboards. What exactly would you like to know?' },
  { role: 'user', content: 'What is the advantage of mechanical keyboards?' }
];

const completion = await openai.chat.completions.create({
  messages: systemWithContext,
  model: 'gpt-3.5-turbo',
});

A completion request consists of a messages array and a declaration of which model should be used. Each message consists of a role declaration and the message content itself. The role tells the LLM how to interpret the individual messages. The user role is self-explanatory: it marks messages entered by the user. The assistant role marks messages generated as a response by the GPT model. The most important role for product or feature design is the system role. System messages can only be set in the code by the developers; users cannot access or manipulate them. They are instructions that describe the desired behaviour of the model, in order to control its reactions to user messages as well as its output format.

It is also important to note that every completion request must include the entire chat history. The REST interface simplifies usage and implementation, but it is stateless: no previous interaction is stored or recognised on the server side. For the LLM to refer to the previous chat history and understand the full context, that history must be provided with every request. All submitted messages, regardless of role, together form the so-called context and are crucial for the model's response.
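To make this concrete, here is a minimal sketch of how a follow-up question could be sent. The variable assistantAnswer is a hypothetical placeholder for the answer text read from the previous response (how to read it is shown below); it and the new user message are appended to the existing array, and the complete history is transmitted again:

// Hypothetical placeholder for the answer text from the previous response.
const assistantAnswer = 'A major advantage of mechanical keyboards is ...';

// Append the previous answer and the new question to the existing history.
systemWithContext.push({ role: 'assistant', content: assistantAnswer });
systemWithContext.push({ role: 'user', content: 'And how loud are they compared to membrane keyboards?' });

// The follow-up request again contains the entire conversation so far.
const followUp = await openai.chat.completions.create({
  messages: systemWithContext,
  model: 'gpt-3.5-turbo',
});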

After a successful request, the model returns a response object with a choices array, which by default contains exactly one answer:

1"choices": [
2   {
3     "index": 0,
4     "message": {
5       "role": "assistant",
6       "content": "A major advantage of mechanical keyboards is the improved typing accuracy and user-friendliness. They are often more durable than traditional membrane keyboards and offer tactile feedback and audible clicking, which gives many users a pleasant typing feel. The individual mechanical switches also allow for customization to personal preferences, as there are different types of switches with different characteristics, such as linear, tactile, or haptic switches. Mechanical keyboards are also generally more robust and better suited for heavy typists and gamers."
7     },
8     "logprobs": null,
9     "finish_reason": "stop"
10   }
11 ]

The message object contains the role and the content of the answer. Another important field is finish_reason, here stop. Stop means that the GPT model finished generating the answer on its own, marking the end of a successful request.
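In code, the answer can be read from the first element of the choices array. A minimal sketch, assuming the completion object from the request above:

const choice = completion.choices[0];

// 'stop' means the model finished on its own; 'length', for example,
// would mean the answer was cut off because the token limit was reached.
if (choice.finish_reason === 'stop') {
  console.log(choice.message.content);
}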

The choice of the model

OpenAI currently offers mainly two GPT models, GPT 3.5 (Turbo) and GPT 4o. Both are optimised for the generation of natural language and code. The choice of model depends on the required performance and cost. In general, GPT 4o is distinguished from GPT 3.5 by a better understanding of the input prompt and higher answer quality. This also shows in the maximum input and output lengths, both of which are higher with GPT 4o: it accepts more tokens as input and delivers better results with long inputs than GPT 3.5. OpenAI itself states that for most simple tasks the difference between the two models is not very significant. Only for more complex tasks that require logical understanding do the strengths of GPT 4o, currently regarded as one of the best models on the market, become apparent.

The higher quality is also noticeable in the costs. The price table below is a snapshot taken directly from OpenAI (https://openai.com/pricing). Since the models change rapidly in performance and price, it is worth double-checking before use.

Model            Price per 1,000 tokens (input)    Price per 1,000 tokens (output)
GPT 4o           $0.005                            $0.015
GPT 3.5 Turbo    $0.0005                           $0.0015

As the table shows, prices are calculated per token. According to OpenAI, 1,000 tokens correspond to approximately 750 words, which is roughly 2.5 to 3 pages of a novel. Tokens are not rounded to the nearest 1,000; each request is billed per exact token. In the example conversation above, the submitted messages form the input, from which the assistant's answer, the output, was generated.

The input of the example corresponds to roughly 107 tokens, the output to 171 tokens. This results in the following cost calculation:

GPT 3.5 Turbo: 0.0005/1000 × 107 + 0.0015/1000 × 171 ≈ $0.00031
GPT 4o: 0.005/1000 × 107 + 0.015/1000 × 171 ≈ $0.0031

The costs for the GPT 3.5 Turbo model are surprisingly low. However, keep in mind that costs can rise sharply with a longer chat history, because the history always has to be sent in its entirety, which increases the input length. In the example above, the input for the next message would already be 278 tokens, plus the next question asked.
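This calculation does not have to be done by hand: the response object also contains a usage field with the exact token counts of the request. A minimal sketch of an automated cost estimate; the prices are hardcoded here from the snapshot above and should be checked against the current price list:

// Prices in $ per 1,000 tokens for GPT 3.5 Turbo (snapshot, verify before use).
const INPUT_PRICE_PER_1K = 0.0005;
const OUTPUT_PRICE_PER_1K = 0.0015;

// The usage object reports the exact token counts billed for this request.
const inputTokens = completion.usage?.prompt_tokens ?? 0;
const outputTokens = completion.usage?.completion_tokens ?? 0;

const costInDollars =
  (inputTokens / 1000) * INPUT_PRICE_PER_1K +
  (outputTokens / 1000) * OUTPUT_PRICE_PER_1K;

console.log(`This request cost roughly $${costInDollars.toFixed(5)}`);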

All in all, the most important thing to keep in mind when using LLMs in code is that the context, consisting of the chat history as well as the system message provided by the developer, determines the output of the model. Since these models are mostly accessed via a REST API, the entire context has to be sent with every request.
