Accessing LLMs in Code – Automating LLM Calls

30.5.2024 | 6 minutes of reading time

Hardly any technology has had as much impact in recent years as LLMs, with ChatGPT from OpenAI leading the way. Many media outlets focus on how this tool can be used for personal and business purposes. An aspect that receives less attention is the automation and integration of LLMs into software or a product. Generative AI is well suited to analysing customer wishes and intentions and making suggestions accordingly. Applications are not limited to a chat window where the user talks directly to the LLM, such as a shopping advisor; LLMs can also improve business applications under the hood. They can automatically filter emails by urgency or summarize them, so that the reader is brought up to date in the morning by an AI-generated report. Websites can extend their keyword search with LLM-powered semantic search.

These and other features can be achieved by automating LLM calls in code. In this blog post I will show how to do this with a few lines of code and explain a few key details you need to know to automate LLM calls.

Sending requests to LLMs

OpenAI offers a very easy-to-use interface, which is used here to illustrate the possibilities of today's AI interfaces. Since OpenAI represents the current state of the art, many other LLM providers try to emulate its functionality. This means that an application can be switched to other common models with little effort. As a result, much of the information and many of the details here are relevant to most LLM use cases.

OpenAI exposes its models via a REST interface, which can be accessed after creating an account and generating an API key. New users also receive a $5 starting budget to play around with the API. For Python and JavaScript, OpenAI itself provides a library to further simplify usage. For most other major programming languages, there are already community-maintained libraries (https://platform.openai.com/docs/libraries/). The following code examples are based on the JavaScript (TypeScript) library. A completion request, which continues a chat conversation, looks like this:

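import OpenAI from 'openai';
import type { ChatCompletionMessageParam } from 'openai/resources/chat/completions';

// The client reads the API key from the OPENAI_API_KEY environment variable by default.
const openai = new OpenAI();

// System instruction plus the chat history so far, ending with the newest user question.
const systemWithContext: Array<ChatCompletionMessageParam> = [
  { role: 'system', content: 'Always be nice and helpful to the user.' },
  { role: 'user', content: 'Can you advise me on computer keyboards?' },
  { role: 'assistant', content: 'Of course! I am happy to help with questions about computer keyboards. What exactly would you like to know?' },
  { role: 'user', content: 'What is the advantage of mechanical keyboards?' },
];

const completion = await openai.chat.completions.create({
  messages: systemWithContext,
  model: 'gpt-3.5-turbo',
});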
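For readers who are not using the library, the same request can be expressed directly against the REST interface. The following is a minimal sketch, assuming the chat completions endpoint at `https://api.openai.com/v1/chat/completions` and bearer-token authentication; `buildChatRequest` and the placeholder key are hypothetical names used for illustration. It only assembles the request and does not send anything.

```typescript
// Minimal local message type so the sketch does not depend on the openai package.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Assumed endpoint of the chat completions REST API.
const CHAT_COMPLETIONS_URL = 'https://api.openai.com/v1/chat/completions';

// Hypothetical helper: builds the URL and fetch options without sending the request.
function buildChatRequest(apiKey: string, model: string, messages: ChatMessage[]) {
  return {
    url: CHAT_COMPLETIONS_URL,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      // The body carries the same two fields as the library call: model and messages.
      body: JSON.stringify({ model, messages }),
    },
  };
}

const keyboardRequest = buildChatRequest('sk-placeholder', 'gpt-3.5-turbo', [
  { role: 'user', content: 'What is the advantage of mechanical keyboards?' },
]);
```

With a real API key, the request could then be sent with `fetch(keyboardRequest.url, keyboardRequest.init)`.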

A request (completion) consists of a messages array and the name of the model to use. Each message consists of a role and the message content. The role tells the LLM how to interpret the individual message. The user role is self-explanatory: it marks messages entered by the user. The assistant role marks messages that the GPT model generated as responses. The most important role for product or feature design is the system role. System messages are set by the developers in code; users cannot access or manipulate them. They are instructions that describe how the model should behave, which lets developers control both its reactions to user messages and the output format.

It is also important to note that every completion request must include the entire chat history. The models are accessed through a stateless REST interface, which simplifies usage and implementation but also means that no previous interaction is stored or recognised. For most LLMs to refer to earlier parts of the conversation and understand the full context, the history must therefore be sent with every request. All submitted messages, regardless of role, together form the so-called context and are crucial for the model's response.
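Because the history has to be resent each time, the application itself must keep track of it. The following is a minimal sketch of that bookkeeping; `withMessage` and `chatHistory` are hypothetical names, the model's answer is hard-coded instead of being fetched, and a real application would pass `chatHistory` as the `messages` field of each request.

```typescript
type Role = 'system' | 'user' | 'assistant';
type ChatMessage = { role: Role; content: string };

// Hypothetical helper: returns a new history with one more message appended.
function withMessage(history: readonly ChatMessage[], role: Role, content: string): ChatMessage[] {
  return [...history, { role, content }];
}

// The system message is set once by the developer and stays at the front.
let chatHistory: ChatMessage[] = [
  { role: 'system', content: 'Always be nice and helpful to the user.' },
];

// First turn: the user's question is appended before the request is sent.
chatHistory = withMessage(chatHistory, 'user', 'Can you advise me on computer keyboards?');

// The model's answer (hard-coded here) is appended as well, so it becomes part of the context.
chatHistory = withMessage(chatHistory, 'assistant', 'Of course! What exactly would you like to know?');

// Second turn: the next request carries all four messages, not just the newest one.
chatHistory = withMessage(chatHistory, 'user', 'What is the advantage of mechanical keyboards?');
```

Returning a new array on every turn keeps earlier snapshots of the history unchanged, which can make it easier to log or replay individual requests.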

After a successful request to the model, a response object with a choices array, containing by default exactly one answer, is returned:

"choices": [
  {
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "A major advantage of mechanical keyboards is the improved typing accuracy and user-friendliness. They are often more durable than traditional membrane keyboards and offer tactile feedback and audible clicking, which gives many users a pleasant typing feel. The individual mechanical switches also allow for customization to personal preferences, as there are different types of switches with different characteristics, such as linear, tactile, or haptic switches. Mechanical keyboards are also generally more robust and better suited for heavy typists and gamers."
    },
    "logprobs": null,
    "finish_reason": "stop"
  }
]

The message object contains the role and the message content. Another important field is finish_reason, which has the value stop here. Stop means that the GPT model finished generating the answer on its own, marking the end of a successful request.
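In automated code it is worth checking finish_reason before using the answer, since other values indicate an incomplete result (for example `length`, when the token limit cut the answer off). The following is a minimal sketch with locally defined types that mirror the response excerpt; `readAnswer` is a hypothetical helper, not part of the OpenAI library.

```typescript
// Local type mirroring one entry of the choices array in the response excerpt.
type Choice = {
  index: number;
  message: { role: 'assistant'; content: string | null };
  finish_reason: string;
};

// Hypothetical helper: returns the answer text only if generation completed normally.
function readAnswer(choices: Choice[]): string {
  const first = choices[0];
  if (!first) {
    throw new Error('Response contained no choices');
  }
  if (first.finish_reason !== 'stop') {
    throw new Error(`Generation did not finish normally: ${first.finish_reason}`);
  }
  return first.message.content ?? '';
}

const sampleChoices: Choice[] = [
  {
    index: 0,
    message: { role: 'assistant', content: 'Mechanical keyboards offer tactile feedback.' },
    finish_reason: 'stop',
  },
];

const answerText = readAnswer(sampleChoices);
```

Whether an unfinished answer should raise an error, be retried, or be shown with a warning depends on the feature; throwing is only one possible choice.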

The choice of the model

OpenAI currently offers mainly two GPT models, GPT 3.5 (Turbo) and GPT 4o. Both models are optimized for generating natural language and code. Which model to choose depends on the required performance and the acceptable costs. In general, GPT 4o differs from GPT 3.5 in how well it understands the input prompt and in the quality of its answers. This is especially evident in the maximum input and output lengths, both of which are generally higher with GPT 4o. In other words, GPT 4o accepts more input tokens and delivers better results on long inputs than GPT 3.5. OpenAI itself states that for most simple tasks the difference between the two models is not very significant. Only for more complex tasks that require logical understanding do the strengths of GPT 4o become apparent; it is currently regarded as one of the best models on the market.

The higher quality also becomes noticeable in the costs. The price table given below is a snapshot that was taken directly from OpenAI (https://openai.com/pricing). However, since the models are rapidly changing in their performance and prices, it is worthwhile to double-check before use.

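| Model         | Price per 1000 input tokens | Price per 1000 output tokens |
|---------------|-----------------------------|------------------------------|
| GPT 4o        | $0.005                      | $0.015                       |
| GPT 3.5 Turbo | $0.0005                     | $0.0015                      |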

As can be seen from the table, the prices are calculated per token. According to OpenAI, 1000 tokens correspond to approximately 750 words, which is equivalent to about 2.5-3 text pages in a novel. Usage is not rounded to the nearest 1000 tokens; instead, each request is billed by its exact token count. Looking at the image below, the first three messages form the input, from which the fourth message, the output, was generated.

The input of the example corresponds to roughly 107 tokens, the output to 171 tokens. This results in the following cost calculation:

GPT 3.5 Turbo costs: 0.0005/1000 * 107 + 0.0015/1000 * 171 = $0.00031
GPT 4o costs: 0.005/1000 * 107 + 0.015/1000 * 171 = $0.0031

The costs for the GPT 3.5 Turbo model are surprisingly low. However, keep in mind that costs can rise sharply with a longer chat history, because the history always has to be sent in its entirety, which increases the input length. In the example above, the input for the next message would already be 278 tokens, plus the next question asked.
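The same arithmetic can be wrapped in a small function for budgeting. The following is a minimal sketch that hard-codes the snapshot prices quoted in this post, which may be outdated by the time you read it; `PRICES` and `requestCost` are hypothetical names, and the model keys are used only as lookup labels.

```typescript
type Pricing = { inputPer1k: number; outputPer1k: number };

// Snapshot prices in USD per 1000 tokens, as quoted in this post; check current pricing before use.
const PRICES: Record<string, Pricing> = {
  'gpt-4o': { inputPer1k: 0.005, outputPer1k: 0.015 },
  'gpt-3.5-turbo': { inputPer1k: 0.0005, outputPer1k: 0.0015 },
};

// Cost of a single request: tokens are billed exactly, not rounded to whole thousands.
function requestCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) {
    throw new Error(`No price known for model ${model}`);
  }
  return (inputTokens / 1000) * price.inputPer1k + (outputTokens / 1000) * price.outputPer1k;
}

// The example conversation: 107 input tokens, 171 output tokens.
const costGpt35 = requestCost('gpt-3.5-turbo', 107, 171); // about 0.00031 USD
const costGpt4o = requestCost('gpt-4o', 107, 171); // about 0.0031 USD

// The follow-up request resends the previous input and output as new input,
// before the next user question is even added.
const nextInputTokens = 107 + 171; // 278
```

Summing `requestCost` over every turn of a conversation shows how quickly the resent history comes to dominate the bill.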

All in all, the most important thing to remember when using LLMs in code is that the context, consisting of the chat history and the system message provided by the developer, determines the output of the models. Since these models are mostly accessed through a REST API, the entire context is needed for every request.
