{"id":176425,"date":"2026-06-13T21:52:59","date_gmt":"2026-06-13T19:52:59","guid":{"rendered":"https:\/\/www.startupbusiness.it\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/"},"modified":"2026-06-14T12:46:19","modified_gmt":"2026-06-14T10:46:19","slug":"inside-the-machine-3-how-the-model-conveys-the-meaning","status":"publish","type":"post","link":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/","title":{"rendered":"Inside the machine (3): how the model conveys the meaning"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">The &#8220;Inside the machine&#8221; series by Giuseppe Ciuni continues. In the <a href=\"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-what-llms-really-are\/171120\/\" target=\"_blank\" rel=\"noreferrer noopener\">first article<\/a>, we looked at what an LLM is \u2014 a statistical text simulator that predicts the next token \u2014 along with tokenisers, RAG and the first business use cases. In the <a href=\"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-2-how-a-neural-network-learns\/174058\/\" target=\"_blank\" rel=\"noreferrer noopener\">second article<\/a>, we looked at how a neural network learns: loss, gradient, learning rate, Adam and autograd. This third article answers the question that remained open: what exactly does the model operate on? Not on words. On vectors.<\/p>\n\n<p class=\"wp-block-paragraph\">The concepts that will be introduced in this article are as follows:<\/p>\n\n<ul class=\"wp-block-list\">\n<li>Embedding (the vector representation of tokens)<\/li>\n\n\n\n<li>Positional encoding (how the model knows the order of the words)<\/li>\n\n\n\n<li>Self-attention (how the model connects tokens to one another)<\/li>\n\n\n\n<li>Multi-head attention (multiple parallel readings of the same text)<\/li>\n<\/ul>\n\n<p class=\"wp-block-paragraph\">Each of these concepts explains a specific behaviour of the model in production. 
A CTO or founder integrating AI into their company will sooner or later have to answer questions such as these: <\/p>\n\n<ul class=\"wp-block-list\">\n<li>Why does a model that performs excellently in English perform less well in Italian, or when it comes to technical industry terminology?<\/li>\n\n\n\n<li>Why does extending the context cause costs to skyrocket?<\/li>\n\n\n\n<li>How does the model recognise that the Italian word &#8216;calcio&#8217; means football when it is played in a match and calcium when it is good for your bones?<\/li>\n<\/ul>\n\n<p class=\"wp-block-paragraph\">Let\u2019s take it one step at a time, as always.<\/p>\n\n<h2 class=\"wp-block-heading\"><strong>Picking up from the previous article<\/strong><\/h2>\n\n<p class=\"wp-block-paragraph\">In Article 2, we left the model in training: the loss measures the error, the gradient indicates the direction of correction, and Adam adjusts the parameters step by step. But one question remains unanswered: what is the neural network actually processing? <\/p>\n\n<p class=\"wp-block-paragraph\">Is it the tokens, as integers? No, and understanding why is the starting point. <\/p>\n\n<p class=\"wp-block-paragraph\">Let\u2019s recap what a tokeniser does (we introduced it in Article 1): it breaks the text down into pieces \u2013 tokens \u2013 and assigns a number to each piece, drawing it from a fixed vocabulary. <\/p>\n\n<p class=\"wp-block-paragraph\">It\u2019s like a huge dictionary in which every entry has a number: \u2018cat\u2019 is entry number 4821, \u2018pizza\u2019 is 4822, \u2018feline\u2019 is 7980, and so on. 
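<\/p>\n\n<p class=\"wp-block-paragraph\">A minimal Python sketch of the dictionary idea described above. The vocabulary is a toy: the ids for \u2018cat\u2019, \u2018pizza\u2019 and \u2018feline\u2019 come from the text, while the other two entries and the helper names <code>encode<\/code> and <code>decode<\/code> are assumptions made for illustration. Real tokenisers split text into sub-word pieces rather than whole words.<\/p>\n\n<pre class=\"wp-block-code\"><code>
```python
# Toy word-level vocabulary. The ids for cat, pizza and feline mirror the text;
# the ids for 'the' and 'eats' are hypothetical.
vocab = {'the': 12, 'cat': 4821, 'pizza': 4822, 'feline': 7980, 'eats': 305}


def encode(text):
    # Split on whitespace and look each piece up in the fixed vocabulary.
    return [vocab[word] for word in text.lower().split()]


def decode(ids):
    # Invert the vocabulary to turn ids back into words.
    id_to_word = {idx: word for word, idx in vocab.items()}
    return ' '.join(id_to_word[i] for i in ids)


ids = encode('The cat eats pizza')
print(ids)          # [12, 4821, 305, 4822]
print(decode(ids))  # the cat eats pizza
```
<\/code><\/pre>\n\n<p class=\"wp-block-paragraph\">The lookup works in both directions, but nothing in the ids says what the words mean. 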
<\/p>\n\n<p class=\"wp-block-paragraph\">After passing through the tokeniser, the text no longer consists of words but of a sequence of these numbers, as shown in the figure below:<\/p>\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1272\" height=\"321\" src=\"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/image-2.png\" alt=\"\" class=\"wp-image-176325\"\/><\/figure>\n\n<p class=\"wp-block-paragraph\">Figure 1. The tokeniser in three steps: it takes the text, breaks it into pieces (the tokens) and assigns each one a number from a fixed vocabulary. The result is a sequence of numbers in place of the words.  <\/p>\n\n<p class=\"wp-block-paragraph\">The problem is that that number is just a label, not a measure. It\u2019s like a footballer\u2019s shirt number: it serves to identify him, but it says nothing about what sort of player he is. The number 10 and number 11 shirts are close together, but that doesn\u2019t mean the two players are alike.   <\/p>\n\n<p class=\"wp-block-paragraph\">The same applies to tokens. <\/p>\n\n<p class=\"wp-block-paragraph\">&#8220;Cat&#8221; (4821) and &#8220;pizza&#8221; (4822) have consecutive numbers but have nothing to do with one another: they happen to be next to each other purely by chance, like two words that are adjacent in alphabetical order. 
<\/p>\n\n<p class=\"wp-block-paragraph\">Conversely, \u2018cat\u2019 (4821) and \u2018feline\u2019 (7980) are very closely related concepts, but their numbers are very far apart; see the example in Figure 2.<\/p>\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1277\" height=\"315\" src=\"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/image-3.png\" alt=\"\" class=\"wp-image-176326\"\/><\/figure>\n\n<p class=\"wp-block-paragraph\">Figure 2: The numerical distance between tokens says nothing about their meaning: \u2018cat\u2019 (4821) and \u2018feline\u2019 (7980) are semantically close but numerically far apart, whilst \u2018cat\u2019 (4821) and \u2018pizza\u2019 (4822) have almost identical numerical values but are different concepts.<\/p>\n\n<p class=\"wp-block-paragraph\">In other words: the distance between the numbers bears no relation to the distance between the meanings. The index (the number) identifies the token but carries no information about what that token means. <\/p>\n\n<p class=\"wp-block-paragraph\">A neural network works by calculating distances, sums and relationships between numbers, so arbitrary labels would mislead it. We need a more sophisticated conversion that transforms each token into something that truly encodes its meaning.  <\/p>\n\n<p class=\"wp-block-paragraph\">That conversion is called embedding.<\/p>\n\n<h2 class=\"wp-block-heading\"><strong>Embedding<\/strong><\/h2>\n\n<p class=\"wp-block-paragraph\">Once a token has been converted into an index by the tokeniser, it is transformed into a vector: a list of numbers in a high-dimensional space. <\/p>\n\n<p class=\"wp-block-paragraph\">In modern models, the number of dimensions is typically 768, 1,024 or even 4,096 (OpenAI\u2019s text-embedding-ada-002 model, for example, uses 1,536). <\/p>\n\n<p class=\"wp-block-paragraph\">The fundamental property of these vectors is not their dimension: it is their geometry. 
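<\/p>\n\n<p class=\"wp-block-paragraph\">A minimal sketch of the lookup step, assuming a hypothetical table of four-dimensional vectors (real models learn their values and use hundreds or thousands of dimensions). The token ids are the ones used in the text; the vector values and the <code>cosine<\/code> helper are invented for illustration. Cosine similarity is one common way to measure how closely two vectors point in the same direction.<\/p>\n\n<pre class=\"wp-block-code\"><code>
```python
import math

# Hypothetical embedding table: one row of numbers per token id.
# The values are made up; a real model learns them during training.
embedding = {
    4821: [0.9, 0.8, 0.1, -0.7],   # cat
    4822: [-0.6, 0.1, 0.9, 0.5],   # pizza
    7980: [0.8, 0.7, 0.0, -0.8],   # feline
}


def cosine(a, b):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat, pizza, feline = embedding[4821], embedding[4822], embedding[7980]
print('cat vs feline:', round(cosine(cat, feline), 2))
print('cat vs pizza: ', round(cosine(cat, pizza), 2))
```
<\/code><\/pre>\n\n<p class=\"wp-block-paragraph\">With these made-up values, \u2018cat\u2019 comes out far closer to \u2018feline\u2019 than to \u2018pizza\u2019, even though the ids suggest the opposite. 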
<\/p>\n\n<p class=\"wp-block-paragraph\">During training, the model learns to place tokens in this space so that tokens with similar meanings are placed close together. This is not a manual decision made by a programmer: it emerges automatically from the gradient during training. The result is a sort of geographical map of the language.   <\/p>\n\n<p class=\"wp-block-paragraph\">In fact, words with similar semantic relationships form clusters: &#8220;King&#8221;, &#8220;Queen&#8221; and &#8220;Prince&#8221; end up grouped together; &#8220;Cat&#8221;, &#8220;Dog&#8221; and &#8220;Feline&#8221; form another group. <\/p>\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"964\" height=\"528\" src=\"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/image-4.png\" alt=\"\" class=\"wp-image-176327\"\/><\/figure>\n\n<p class=\"wp-block-paragraph\">Figure 3: The similarity clusters are shown: the cat, the dog and the feline belong to one cluster. The historical novel *The Betrothed*, published by Gutenberg, belongs to another cluster. <\/p>\n\n<p class=\"wp-block-paragraph\">One interesting thing about vectors is that proximity isn\u2019t limited to synonyms: in a well-constructed space, the vector for the phrase \u201cQuel ramo del lago di Como\u201d ends up close to those for \u201cI promessi sposi\u201d and \u201cAlessandro Manzoni\u201d because they appear in the same contexts within the texts.<\/p>\n\n<p class=\"wp-block-paragraph\">This structure has a direct practical implication. When you ask a model, \u201cWhich places would you recommend I visit in Sicily?\u201d, not only are the words of the question converted into vectors, but the model also \u201cknows\u201d which vectors are close to \u201cSicily\u201d and uses that context to construct its response.  <\/p>\n\n<p class=\"wp-block-paragraph\">Finally, there is a surprising property: semantic relations are preserved as vector operations. 
The relation &#8220;King \u2212 Man + Woman&#8221; in the embedding space produces a vector close to &#8220;Queen&#8221;. (This brings us back to the vector spaces and linear algebra we studied in Geometry!)   <\/p>\n\n<p class=\"wp-block-paragraph\">It\u2019s not a trick: it\u2019s an emergent property of training on large corpora.<\/p>\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"964\" height=\"612\" src=\"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/image-5.png\" alt=\"\" class=\"wp-image-176328\"\/><\/figure>\n\n<p class=\"wp-block-paragraph\">Figure 4: 2D projection of the embedding space \u2014 semantic clusters<\/p>\n\n<p class=\"wp-block-paragraph\"><strong>What exactly are these vectors?<\/strong> <\/p>\n\n<p class=\"wp-block-paragraph\">A vector is simply a list of numbers: think of it as a fact sheet for a word. Just as a person\u2019s profile contains lots of different details (age, height, town, job, hobbies), a token\u2019s profile contains lots of numbers, and each one describes a small aspect of its meaning. <\/p>\n\n<p class=\"wp-block-paragraph\">And here you can see what the 1,024 dimensions are for: they are simply 1,024 cells, 1,024 numbers for each word. <\/p>\n\n<p class=\"wp-block-paragraph\">Why so many? Because a single number wouldn\u2019t be enough to convey the meaning.  <\/p>\n\n<p class=\"wp-block-paragraph\">If I had only one piece of information to describe a person (let\u2019s say their height), two people of the same height would appear identical even though they are very different. By adding age, occupation, interests and city, however, I can determine much more precisely how similar or different two people are.  <\/p>\n\n<p class=\"wp-block-paragraph\">The same applies to words: with 1,024 values, the model can capture 1,024 different nuances, so two words may be similar in some respects and quite different in others. 
An important point: these cells do not have labels that we have decided on (there is no cell labelled \u2018it is an animal\u2019 or \u2018it is positive\u2019); it is the model itself that decides what to put in them. <\/p>\n\n<p class=\"wp-block-paragraph\">The question remains: we have the cells, but who fills in their values? <\/p>\n\n<p class=\"wp-block-paragraph\">No one does it by hand.<\/p>\n\n<p class=\"wp-block-paragraph\">At first, they are filled with random numbers, so each word starts at a random point on the map, which makes no sense. Then the mechanism of <a href=\"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-2-how-a-neural-network-learns\/174058\/\" target=\"_blank\" rel=\"noreferrer noopener\">Article 2<\/a> kicks in.  <\/p>\n\n<p class=\"wp-block-paragraph\">The model makes a guess, gets it wrong, and measures how far off it was (the loss). This is where backpropagation comes into play. Put simply, it is the process by which the model starts from the final error and works its way back through all the calculations it has made to work out how much each number contributed to the error. Once it has worked this out, it adjusts each number slightly, in the direction that reduces the error. By repeating this process thousands of times, words that appear in similar contexts end up moving closer together on the map.    <\/p>\n\n<p class=\"wp-block-paragraph\">The geometry is not designed but emerges from this continuous adjustment.<\/p>\n\n<p class=\"wp-block-paragraph\">That\u2019s why it\u2019s said that embeddings are model weights just like any others: the cells for each word are numbers that the training process adjusts in exactly the same way as it does for the rest of the network. They\u2019re just the first ones to be created, and there are loads of them!  <\/p>\n\n<p class=\"wp-block-paragraph\">Let\u2019s take an example: 50,000 words, each with 1,024 cells, gives 50,000 \u00d7 1,024 \u2248 51 million numbers. 
This table of numbers is, quite literally, the embedding matrix: one row per word, one column per dimension (see the simplified table below).  <\/p>\n\n<p class=\"wp-block-paragraph\">51 million parameters are used just to describe the words, before even building the rest of the model: that helps to explain the staggering figure of 3B parameters for one model, 11B for another, and so on (B stands for billion).  <\/p>\n\n<p class=\"wp-block-paragraph\">Let\u2019s take an example of embedding the words \u2018dog\u2019, \u2018cat\u2019 and \u2018hammer\u2019, simplifying the dimensions to 6 (rather than 1,024) and, purely for illustration, giving each dimension a readable label.<\/p>\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><br\/><\/td><td><strong>Is it alive?<\/strong><\/td><td><strong>Is it domestic?<\/strong><\/td><td><strong>Is it big?<\/strong><\/td><td><strong>Is it related to food?<\/strong><\/td><td><strong>Is it emotional?<\/strong><\/td><td><strong>Is it an object?<\/strong><\/td><\/tr><tr><td><strong>Cat<\/strong><\/td><td>0.9<\/td><td>0.8<\/td><td>\u22120.6<\/td><td>0.1<\/td><td>0.7<\/td><td>\u22120.9<\/td><\/tr><tr><td><strong>Dog<\/strong><\/td><td>0.9<\/td><td>0.8<\/td><td>0.1<\/td><td>0.1<\/td><td>0.8<\/td><td>\u22120.9<\/td><\/tr><tr><td><strong>Hammer<\/strong><\/td><td>\u22120.9<\/td><td>\u22120.8<\/td><td>\u22120.2<\/td><td>\u22120.3<\/td><td>0.0<\/td><td>0.9<\/td><\/tr><\/tbody><\/table><\/figure>\n\n<p class=\"wp-block-paragraph\">Read as it stands, the &#8220;cat&#8221; entry says: <\/p>\n\n<ul class=\"wp-block-list\">\n<li>very much alive (0.9)<\/li>\n\n\n\n<li>very domestic (0.8)<\/li>\n\n\n\n<li>rather small (\u22120.6)<\/li>\n\n\n\n<li>not particularly related to food (0.1) <\/li>\n\n\n\n<li>emotional (0.7)<\/li>\n\n\n\n<li>not an object (\u22120.9)<\/li>\n<\/ul>\n\n<p class=\"wp-block-paragraph\">If we compare dogs with cats, we have:<\/p>\n\n<p class=\"wp-block-paragraph\">&#8220;Dog&#8221; has almost the same values as &#8220;cat&#8221;: that&#8217;s why they end up close to each other 
on the map. <\/p>\n\n<p class=\"wp-block-paragraph\">&#8220;Hammer&#8221; has almost opposite values in the &#8220;alive&#8221; and &#8220;object&#8221; columns: that is why it ends up far away. In short, similar words have similar vectors, that is, similar lists of numbers. <\/p>\n\n<p class=\"wp-block-paragraph\">In microgpt, Karpathy\u2019s 200-line GPT, everything is stripped down to the bare essentials: the tokeniser processes characters one by one, and the vocabulary consists of the 26 letters of the alphabet plus a special token that marks the start and end of each word (27 tokens in total). Each token becomes a vector of just 16 dimensions. The entire model has around 4,000 parameters. It is precisely this miniaturisation that makes it readable in an afternoon: the embedding matrix is the first thing to be initialised, with one row per token and one column per dimension of the semantic space.    <\/p>\n\n<p class=\"wp-block-paragraph\">The difference compared to GPT-4 is in the order of magnitude of the numbers, but the structure is identical.<\/p>\n\n<h2 class=\"wp-block-heading\"><strong>Positional encoding<\/strong><\/h2>\n\n<p class=\"wp-block-paragraph\">There is one problem that embeddings alone cannot solve: the network does not know the order in which the words arrive. The way it is designed, it receives the word vectors all at once, as if they had been thrown into a bag. A bag has no order!  <\/p>\n\n<p class=\"wp-block-paragraph\">The point is that the order changes everything. \u201cThe cat eats the mouse\u201d and \u201cThe mouse eats the cat\u201d contain exactly the same words but mean the opposite. If the network only sees the set of words, it treats the two sentences as identical, and that\u2019s not right.  <\/p>\n\n<p class=\"wp-block-paragraph\">The solution is called positional encoding. 
<\/p>\n\n<p class=\"wp-block-paragraph\">The idea is simple: before feeding the words into the network, information about where each word appears in the sentence is attached to it. In effect, in addition to the vector that identifies the word (the embedding), a second vector is added that indicates its position. Thus, the same word \u2018cat\u2019 receives a different signal depending on whether it is the first word in the sentence or the seventh.  <\/p>\n\n<p class=\"wp-block-paragraph\">Conceptually, it is a sum: <\/p>\n\n<p class=\"wp-block-paragraph\">final_word = word_vector + position_vector.<\/p>\n\n<p class=\"wp-block-paragraph\">The two vectors are added together, cell by cell, to form a single vector, which is the one that actually enters the network.<\/p>\n\n<p class=\"wp-block-paragraph\">One final note on microgpt: in keeping with the GPT-2 style, positions are not calculated using a fixed formula. They are values that the model learns on its own during training, in exactly the same way as it learns embeddings.  <\/p>\n\n<p class=\"wp-block-paragraph\">In other words, it is the model itself that determines the most useful way to represent the \u2018first\u2019, \u2018second\u2019, \u2018third\u2019 word and so on.<\/p>\n\n<h2 class=\"wp-block-heading\"><strong>Self-attention<\/strong><\/h2>\n\n<p class=\"wp-block-paragraph\">Embedding and positional encoding address the representation of individual words. The most difficult problem remains: context. <\/p>\n\n<p class=\"wp-block-paragraph\">Let\u2019s consider the Italian word \u201cpesca\u201d, which means both \u201cpeach\u201d and \u201cfishing\u201d. In \u201cI ate a peach\u201d, it is a fruit; in \u201cI went fishing\u201d (in Italian, \u201csono andato a pesca\u201d), it is an activity. In both sentences, \u201cpesca\u201d is the same token, so it starts with exactly the same embedding. The correct meaning can only emerge by looking at the other words in the sentence. 
This is precisely the problem that self-attention solves: for each word, it decides which other words to pay attention to in order to clarify its meaning in that context.    <\/p>\n\n<p class=\"wp-block-paragraph\"><strong>The mechanism.<\/strong> Starting from its embedding, each word generates three vectors. How? By multiplying the embedding by three different weight matrices, which are learnt during training just like everything else (for those who want to get technical: <a href=\"https:\/\/karpathy.ai\/microgpt.html\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">in Karpathy\u2019s code, these are called wq, wk and wv<\/a>).   <\/p>\n\n<p class=\"wp-block-paragraph\">The three vectors can be interpreted as three roles:<\/p>\n\n<ul class=\"wp-block-list\">\n<li>Query (Q): &#8220;What am I looking for to understand myself?&#8221; For &#8220;pesca&#8221; in our sentence: &#8220;Is there a verb of movement nearby, or a word related to food?&#8221; <\/li>\n\n\n\n<li>Key (K): &#8220;What kind of word am I? What do I offer others?&#8221; The key of &#8220;andato&#8221; says: &#8220;I am a verb of motion&#8221;. <\/li>\n\n\n\n<li>Value (V): &#8220;If they pay attention to me, what message am I conveying?&#8221; The value of &#8220;andato&#8221; carries the idea of action, of movement. <\/li>\n<\/ul>\n\n<p class=\"wp-block-paragraph\">A metaphor might help: a fair! 
<\/p>\n\n<p class=\"wp-block-paragraph\">Each participant walks around with:<\/p>\n\n<ul class=\"wp-block-list\">\n<li>a question on their mind (the query) <\/li>\n\n\n\n<li>a badge on their chest that says who they are (the key) <\/li>\n\n\n\n<li>a folder containing the materials to be handed out (the value)<\/li>\n<\/ul>\n\n<p class=\"wp-block-paragraph\">Everyone stops at the desks whose badges match their query and takes the materials from those desks, and only from those.<\/p>\n\n<p class=\"wp-block-paragraph\">We organised one a while back, and watching how people behaved, that\u2019s exactly what it was: people asking questions, wearing a name badge on their chest and carrying a folder full of various materials.<\/p>\n\n<p class=\"wp-block-paragraph\">From here, the process takes place in three steps.<\/p>\n\n<p class=\"wp-block-paragraph\"><strong>1. How well two words go together.<\/strong> Take the &#8220;query&#8221; from the first word and the &#8220;key&#8221; from the second; as both are lists of numbers of the same length, multiply them cell by cell and add the results together (this operation is called the dot product). <\/p>\n\n<p class=\"wp-block-paragraph\">This results in a single number: the more the two lists resemble each other, the higher that number is. The cell-by-cell multiplication rewards pairs that have high values in the same places, i.e. that \u2018talk about the same things\u2019. This number is the attention score between the two words.  <\/p>\n\n<p class=\"wp-block-paragraph\"><strong>2. From scores to percentages.<\/strong> The scores obtained in this way are converted into percentages which, when added together across all the words, total 100% (this is done by a function called softmax). In this way, for each word, we know how much weight to assign to each of the others: for example, 70% to one, 20% to another, and 10% to a third. <\/p>\n\n<p class=\"wp-block-paragraph\"><strong>3. 
The text is updated.<\/strong> Each word incorporates some of the \u2018content\u2019 of the others (the Value), taking on as much as its percentage of attention dictates. A word that has received 70% carries significant weight, whilst one that has received 10% carries almost none. The result is a new vector for that word, enriched by its surroundings.  <\/p>\n\n<p class=\"wp-block-paragraph\"><strong>Let&#8217;s look at an example with numbers.<\/strong> Let&#8217;s stick with &#8220;sono andato a pesca&#8221; (I went fishing) and put ourselves in the shoes of &#8220;pesca&#8221;, which has to work out whether it refers to the fruit or the activity. <\/p>\n\n<p class=\"wp-block-paragraph\">For simplicity\u2019s sake, let\u2019s look at just two of the neighbouring words, \u201candato\u201d and \u201ca\u201d, and use six cells: \u201cpesca\u201d displays its query, the others display their labels (the keys).<\/p>\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><br\/><\/td><td><strong>c1<\/strong><\/td><td><strong>c2<\/strong><\/td><td><strong>c3<\/strong><\/td><td><strong>c4<\/strong><\/td><td><strong>c5<\/strong><\/td><td><strong>c6<\/strong><\/td><\/tr><tr><td><strong>&#8220;pesca&#8221; query<\/strong><\/td><td>0.9<\/td><td>0.1<\/td><td>0.0<\/td><td>0.8<\/td><td>0.2<\/td><td>0.1<\/td><\/tr><tr><td><strong>&#8220;andato&#8221; key<\/strong><\/td><td>0.8<\/td><td>0.0<\/td><td>0.1<\/td><td>0.9<\/td><td>0.1<\/td><td>0.0<\/td><\/tr><tr><td><strong>&#8220;a&#8221; key<\/strong><\/td><td>0.1<\/td><td>0.2<\/td><td>0.1<\/td><td>0.0<\/td><td>0.1<\/td><td>0.2<\/td><\/tr><\/tbody><\/table><\/figure>\n\n<p class=\"wp-block-paragraph\">Columns c1\u2013c6 are the elements of the vector: numbered positions that do not have a meaning that is immediately clear to a human being. To simplify matters and illustrate the concept, imagine that c1 = \u2018how much I am a verb\u2019, c2 = \u2018how much I talk about food\u2019, and so on. 
In effect, it is a vector that represents the geometry of the word. <\/p>\n\n<p class=\"wp-block-paragraph\">To see how well &#8220;pesca&#8221; and &#8220;andato&#8221; go together, let\u2019s compare their two rows cell by cell. <\/p>\n\n<p class=\"wp-block-paragraph\">The rule is simple: when both of them have a large number in the same cell, that cell is &#8216;worth a lot&#8217;. <\/p>\n\n<p class=\"wp-block-paragraph\">In our case, this occurs in column c1 (0.9 and 0.8) and column c4 (0.8 and 0.9): both cells contain high numbers in both rows, so the two words are very closely aligned. <\/p>\n\n<p class=\"wp-block-paragraph\">All in all, the score is high: 0.72 + 0.72 + 0.02 = 1.46.<\/p>\n\n<p class=\"wp-block-paragraph\">With &#8220;a&#8221;, however, this never happens: where &#8220;pesca&#8221; has high values, &#8220;a&#8221; has low ones. The two words have nothing in common and the score remains low: 0.09 + 0.02 + 0.02 + 0.02 = 0.15. <\/p>\n\n<p class=\"wp-block-paragraph\">These scores are now converted into attention percentages: since 1.46 is much higher than 0.15, \u201candato\u201d receives the lion\u2019s share of the attention of \u201cpesca\u201d (around 75%) and \u201ca\u201d only a fraction (around 10%); the rest is distributed amongst the other words in the sentence.<\/p>\n\n<p class=\"wp-block-paragraph\">At this point, \u2018pesca\u2019 takes on a new meaning, drawing primarily on the connotations of \u2018andato\u2019, which is a verb denoting movement. Thus, \u2018pesca\u2019 takes on a sense of action, and its meaning shifts towards the activity (going fishing) rather than the fruit.  <\/p>\n\n<p class=\"wp-block-paragraph\">In a different sentence, it would have listened to different neighbours and ended up on the side of the fruit. The context did its job. <\/p>\n\n<p class=\"wp-block-paragraph\"><strong>The metaphor.<\/strong> Let\u2019s think about how we read. When we come across an ambiguous word, our eyes briefly go back to the surrounding words to clarify its meaning. 
When reading \u201csono andato a pesca\u201d (I went fishing), it is \u201candato\u201d that tells us what kind of \u201cpesca\u201d it is, not \u201ca\u201d.  <\/p>\n\n<p class=\"wp-block-paragraph\">Self-attention does exactly that: each word \u2018re-reads\u2019 the others in the sentence, focusing on those that help it to understand itself and ignoring the rest.<\/p>\n\n<p class=\"wp-block-paragraph\">Let\u2019s look at a visual example to make it clearer. Below is the attention matrix for the sentence \u201csono andato a pesca\u201d (I went fishing):  <\/p>\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1072\" height=\"899\" src=\"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/image-6.png\" alt=\"\" class=\"wp-image-176329\"\/><\/figure>\n\n<p class=\"wp-block-paragraph\">Figure 5: Attention matrix for \u201csono andato a pesca\u201d (I went fishing): each row shows how much attention a word pays to the others (total 100%). The \u201cpesca\u201d row pays 75% of its attention to \u201candato\u201d, and this is why it understands that it is an activity, not a fruit. <\/p>\n\n<p class=\"wp-block-paragraph\">How to read this matrix in three steps:<\/p>\n\n<ul class=\"wp-block-list\">\n<li><strong>Rows are the words &#8216;that look&#8217;.<\/strong> Choose a row: it is a word in the sentence as it tries to make sense of itself. The grid has four rows because the sentence has four words: \u201csono\u201d, \u201candato\u201d, \u201ca\u201d, \u201cpesca\u201d. <\/li>\n\n\n\n<li><strong>Columns are the words &#8216;being looked at&#8217;<\/strong>, in the same order. When you scan a row from left to right, each cell indicates how much attention that word (the row) pays to each word (the column). 
The number in the cell is the percentage of attention, and the colour makes this clearer: the darker the colour, the more attention is paid.<\/li>\n\n\n\n<li><strong>Each row totals 100%.<\/strong> The attention of a word is a &#8216;cake&#8217; shared out among all the words in the sentence.<\/li>\n<\/ul>\n\n<p class=\"wp-block-paragraph\">The focus of the diagram is the bottom row: &#8220;pesca&#8221; (highlighted in dark brown). You can see where its attention goes: 75% on &#8220;andato&#8221; (the darkest cell), 10% on &#8220;a&#8221;, 10% on the word itself, and 5% on &#8220;sono&#8221;.  <\/p>\n\n<p class=\"wp-block-paragraph\">In practice, the word &#8220;pesca&#8221; primarily absorbs the idea of &#8220;going&#8221; (a word denoting movement), and it is precisely for this reason that it ends up associated with the activity of fishing rather than with the fruit. <\/p>\n\n<p class=\"wp-block-paragraph\">The famous saying \u2018a picture is worth a thousand words\u2019 is true. Looking at the matrix, the following observations emerge:  <\/p>\n\n<ul class=\"wp-block-list\">\n<li>the diagonal (sono\u2192sono, andato\u2192andato\u2026) is often highlighted because each word pays a little attention to itself;<\/li>\n\n\n\n<li>&#8216;empty&#8217; words such as &#8216;a&#8217; receive little attention from the others: their cells are lighter, and they play a minor role in conveying meaning;<\/li>\n\n\n\n<li>if the sentence were changed to \u201cI ate a peach\u201d, the \u201cpesca\u201d row would be darkest under the verb meaning \u201cate\u201d rather than under \u201candato\u201d, and the meaning would shift to refer to the fruit. The map always relates to that particular sentence. <\/li>\n<\/ul>\n\n<h2 class=\"wp-block-heading\"><strong>Multi-head attention<\/strong><\/h2>\n\n<p class=\"wp-block-paragraph\">A single self-attention operation captures one type of relationship at a time. 
But in a sentence, there are many relationships, all occurring simultaneously: grammatical relationships (who is the subject, who is the verb), relationships of meaning, relationships of reference (who is referred to by \u2018it\u2019 or \u2018this\u2019), and temporal relationships.  <\/p>\n\n<p class=\"wp-block-paragraph\">A single &#8220;reading&#8221; is not enough to grasp everything.<\/p>\n\n<p class=\"wp-block-paragraph\">Multi-head attention solves this problem by performing multiple parallel passes rather than a single one. Each pass is a complete self-attention mechanism with its own Query, Key and Value, and during training it can end up specialising in a different type of relationship. Each pass is called a \u2018head\u2019.  <\/p>\n\n<p class=\"wp-block-paragraph\">To give an idea of the scale: GPT-2 Small has 12 heads per layer, whilst GPT-4 is estimated to have hundreds. In microgpt, for simplicity\u2019s sake, there are four heads in a single attention layer. <\/p>\n\n<p class=\"wp-block-paragraph\">To return to the earlier reading metaphor: it is like reading the same sentence over and over again, but each time looking for something different: one reading follows the grammatical structure, another the cause-and-effect relationships, and yet another works out which words the pronouns refer to. <\/p>\n\n<p class=\"wp-block-paragraph\">In the end, all these readings are combined into a single result.<\/p>\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1469\" height=\"724\" src=\"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/image-7.png\" alt=\"\" class=\"wp-image-176330\"\/><\/figure>\n\n<p class=\"wp-block-paragraph\">Figure 6: Multi-head attention architecture with multiple heads operating in parallel to produce a recombined output<\/p>\n\n<p class=\"wp-block-paragraph\">The result of all this is a new vector for each word, enriched by the context of the entire sentence. 
No longer a fixed point on the map of language, but a point that has shifted according to its surroundings. <\/p>\n\n<h2 class=\"wp-block-heading\"><strong>What is a transformer?<\/strong><\/h2>\n\n<p class=\"wp-block-paragraph\">We have just looked at two mechanisms: self-attention, whereby each word \u2018looks\u2019 at the others in the sentence and absorbs their context (this is how \u2018pesca\u2019 in \u2018sono andato a pesca\u2019 understands that it refers to an activity and not a fruit), and multi-head attention, which repeats that same process several times in parallel to identify different connections. These operations are the heart of what is known as a Transformer. <\/p>\n\n<p class=\"wp-block-paragraph\">Transformer is the name of the architecture \u2013 that is, the framework that brings together all the elements we have seen so far in the correct order:<\/p>\n\n<ul class=\"wp-block-list\">\n<li>embedding<\/li>\n\n\n\n<li>positional encoding<\/li>\n\n\n\n<li>attention (self-attention and multi-head)<\/li>\n<\/ul>\n\n<p class=\"wp-block-paragraph\">In fact, this is the architecture on which the major modern language models are based, from GPT to Claude to Gemini. The \u2018T\u2019 in GPT stands for Transformer.  <\/p>\n\n<p class=\"wp-block-paragraph\">It originated in 2017 from a now-famous Google paper, &#8220;Attention is all you need&#8221;: the idea, which was revolutionary at the time, was that attention alone was enough to model language.<\/p>\n\n<p class=\"wp-block-paragraph\">The transformer consists of a block with the same structure, repeated several times. <\/p>\n\n<p class=\"wp-block-paragraph\">Within each block, two things happen one after the other: first, the words exchange information through attention, then each one processes what it has gathered on its own (this is the feed-forward step, described below).<\/p>\n\n<p class=\"wp-block-paragraph\">The power of these models lies in lining up many of these blocks, one on top of the other. 
\u2018Stacking\u2019 means connecting them in sequence: the output from the first block goes into the second, the output from the second goes into the third, and so on. Each block processes the result already refined by the previous one and improves it a little further. <\/p>\n\n<p class=\"wp-block-paragraph\">Let\u2019s look at our usual example, \u201csono andato a pesca\u201d (\u201cI went fishing\u201d), focusing on the word \u201cpesca\u201d, which is ambiguous: it can mean the activity or the fruit. <\/p>\n\n<p class=\"wp-block-paragraph\">As we saw in the attention matrix, in the first block attention makes \u2018pesca\u2019 look primarily at the past-tense verb, and by the time it leaves the block its vector has already shifted towards the meaning of the activity. <\/p>\n\n<p class=\"wp-block-paragraph\">This vector, now less ambiguous, moves on to the second block, which refines it further: for example, it recognises that \u2018andato a pesca\u2019 is a single expression, a way of saying \u2018to go fishing\u2019. <\/p>\n\n<p class=\"wp-block-paragraph\">Block by block, the meaning becomes clearer and clearer. That\u2019s why the number of blocks matters! <\/p>\n\n<p class=\"wp-block-paragraph\">The block structure is always the same; the only thing that changes is how many are stacked: just one in microgpt, 12 in GPT-2 small, and several dozen in state-of-the-art models. <\/p>\n\n<p class=\"wp-block-paragraph\">As a rule of thumb, the more blocks a word passes through, the better the model is able to capture nuances and, therefore, the better it \u2018understands the meaning of what we write\u2019. <\/p>\n\n<p class=\"wp-block-paragraph\">Microgpt has just one block, perhaps for the sake of readability (in my opinion). I\u2019m tempted to use an expression from my dialect to say: I wonder how many blocks Anthropic uses in Fable 5. 
<\/p>\n\n<h2 class=\"wp-block-heading\"><strong>The other components of the block<\/strong><\/h2>\n\n<p class=\"wp-block-paragraph\">To really understand Karpathy\u2019s code, you need three more ingredients, which, along with attention, complete the picture. They are less well-known, but without them the model would train poorly, if at all. <\/p>\n\n<p class=\"wp-block-paragraph\">A useful note before you start: a neural network works in stages, like an assembly line, and each stage is called a layer.<\/p>\n\n<p class=\"wp-block-paragraph\">Here are the ingredients:<\/p>\n\n<p class=\"wp-block-paragraph\"><strong>Feed-forward.<\/strong> After \u2018listening\u2019 to its neighbours, each word processes the information it has gathered on its own, as if returning to its desk. Its vector is expanded (in microgpt from 16 to 64 numbers) and then compressed again. In between, ReLU comes into play, a simple function that removes negative numbers: it keeps the positive ones as they are and sets the negative ones to zero. This small rule gives the calculation a &#8220;bend&#8221; (a non-linearity): without it, stacked layers would collapse into a single linear transformation, and it is what allows the network to grasp complex relationships. <\/p>\n\n<p class=\"wp-block-paragraph\"><strong>Residual connection.<\/strong> The result of each step does not replace the initial vector: it is added to it (x = x + correction), like a note in the margin rather than a rewrite of the whole page. This is what allows dozens of layers to be stacked without the information (and the gradient from <a href=\"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-2-how-a-neural-network-learns\/174058\/\" target=\"_blank\" rel=\"noreferrer noopener\">Article 2<\/a>) being lost along the way. <\/p>\n\n<p class=\"wp-block-paragraph\"><strong>Normalisation.<\/strong> Before each block, the numbers are rescaled to a standard range. Without this, they would increase or decrease layer by layer, making the training unstable. 
<\/p>\n\n<p class=\"wp-block-paragraph\">Embedding, Attention, Feed-forward, Residual connections and Normalisation: this is the complete transformer architecture, the building blocks that, when stacked one on top of the other, form the models.<\/p>\n\n<h2 class=\"wp-block-heading\"><strong>To sum up<\/strong><\/h2>\n\n<ol class=\"wp-block-list\">\n<li>Tokeniser: splits the text and assigns a label number to each word (it identifies the word, but does not indicate its meaning).<\/li>\n\n\n\n<li>Embedding: transforms that number into meaning by placing the word on the \u2018language map\u2019, where similar words are grouped together.<\/li>\n\n\n\n<li>Positional encoding: it takes into account word order, so \u201cthe cat eats the mouse\u201d is different from \u201cthe mouse eats the cat\u201d.<\/li>\n\n\n\n<li>Self-attention: each word takes account of the others and adapts to the context: this is how &#8220;pesca&#8221; knows whether it refers to a fruit or an activity.<\/li>\n\n\n\n<li>Multi-head attention: the same self-attention mechanism repeated multiple times in parallel to capture multiple types of connections simultaneously.<\/li>\n\n\n\n<li>Feed-forward, residual and normalisation: these complete the block, processing the information gathered by the attention mechanism and ensuring the calculations remain stable. They are the traffic controllers\u263a <\/li>\n<\/ol>\n\n<h2 class=\"wp-block-heading\"><strong>Conclusions<\/strong><\/h2>\n\n<p class=\"wp-block-paragraph\">In Article 2, we saw how the model learns; in this one, we have discovered what it actually works on: not on words, not on simple numbers used as labels, but on vectors \u2013 that is, lists of numbers that assign each word a position on a \u2018language map\u2019. <\/p>\n\n<p class=\"wp-block-paragraph\">This map isn\u2019t drawn by a programmer: the model builds it itself during training by grouping together words with similar meanings. 
<\/p>\n\n<p class=\"wp-block-paragraph\">The position of a word is not fixed; instead, it is adjusted each time, depending on the context. In this way, \u2018pesca\u2019 can refer to the fruit or the activity, depending on the surrounding words. <\/p>\n\n<p class=\"wp-block-paragraph\"><strong>Tip 1: Vertical fine-tuning starts with the embeddings<\/strong><\/p>\n\n<p class=\"wp-block-paragraph\">When a general-purpose model performs poorly on domain-specific terminology (e.g. legal, manufacturing, medical, financial), the problem often lies in the embeddings. Terms such as \u2018anti-decubitus bed\u2019, \u2018compound pelvic fracture\u2019 and \u2018renal failure\u2019 may have been rare or absent in the pre-training data, in which case their vectors end up poorly placed, far from the clusters where they belong. <\/p>\n\n<p class=\"wp-block-paragraph\">Fine-tuning on domain data can correct this geometry. <\/p>\n\n<p class=\"wp-block-paragraph\">Before purchasing a \u2018pre-fine-tuned\u2019 vertical model, it is advisable to ask the supplier what data it was trained on and using which method. If the answers are vague, the fine-tuning has probably not been carried out, or has not been done properly. <\/p>\n\n<p class=\"wp-block-paragraph\"><strong>Tip 2: A large context window does not necessarily mean a better model<\/strong><\/p>\n\n<p class=\"wp-block-paragraph\">Suppliers are promoting ever-wider context windows as if they were a straightforward advantage.<\/p>\n\n<p class=\"wp-block-paragraph\">Self-attention scales quadratically with the length of the sequence: doubling the context roughly quadruples the cost of the attention computation for each call. For enterprise applications such as on-premises helpdesks, document classification or contract extraction, 4,000\u20138,000 tokens are often sufficient. <\/p>\n\n<p class=\"wp-block-paragraph\">Choosing a model with a context window of 128,000 tokens for tasks of this kind can mean paying a high price without reaping any benefits. 
The optimal context window is a parameter that should be measured against real-world tasks, not maximised indiscriminately. <\/p>\n\n<p class=\"wp-block-paragraph\">News from a couple of days ago: Autonomous Recursive Improvement \u2013 AI rewrites its own code. <a href=\"https:\/\/www.anthropic.com\/institute\/recursive-self-improvement\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Source: Anthropic.<\/a> <\/p>\n\n<p class=\"wp-block-paragraph\">Anthropic states that over 80% of the code in its codebase was written by Claude to a standard comparable to that of a human. Individual engineers\u2019 productivity (measured in lines of code) has increased eightfold compared to 2024. <\/p>\n\n<p class=\"wp-block-paragraph\">In the next and final issue of Inside the Machine, we\u2019ll bring our exploration of microgpt full circle. We\u2019ll see how the enriched vector becomes a probability ranking and how the model generates text one piece at a time. We\u2019ll understand why output tokens cost more than input tokens, what the \u2018temperature\u2019 really controls, and what the difference is between the base model and the instruct model, with the two steps that separate them: SFT and RLHF (photo by <a href=\"https:\/\/unsplash.com\/it\/@omilaev?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\">Igor Omilaev<\/a> on <a href=\"https:\/\/unsplash.com\/it\/foto\/uninsegna-al-neon-al-neon-che-si-trova-sul-lato-di-un-muro-9XtKSci9crg?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\">Unsplash<\/a>).<br\/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The third instalment in the series explaining how artificial intelligence works; in this article, we look at how the models 
operate<\/p>\n","protected":false},"author":125,"featured_media":176424,"comment_status":"open","ping_status":"open","sticky":true,"template":"","format":"standard","meta":{"_acf_changed":false,"_substack_draft_id":"","_substack_draft_url":"","_substack_last_pushed":"","_substack_premium":"","footnotes":""},"categories":[1134],"tags":[1324,1480,1331],"companies":[],"journalist":[3458],"class_list":["post-176425","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-learn","tag-artificial-intelligence-en","tag-education-en","tag-innovation-en","journalist-giuseppe-ciuni"],"featured_sizes_urls":{"thumbnail":{"src":"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg","width":150,"height":84,"crop":false,"srcset":false,"alt":"how artificial intelligence works"},"large":{"src":"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg","width":1024,"height":576,"crop":false,"srcset":false,"alt":"how artificial intelligence works"},"2048x2048":{"src":"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg","width":1280,"height":720,"crop":false,"srcset":false,"alt":"how artificial intelligence works"}},"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Inside the machine (3): how the model conveys the meaning<\/title>\n<meta name=\"description\" content=\"The third instalment in the series explaining how artificial intelligence works; in this article, we look at how the models operate\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/\" \/>\n<meta property=\"og:locale\" 
content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Inside the machine (3): how the model conveys the meaning\" \/>\n<meta property=\"og:description\" content=\"The third instalment in the series explaining how artificial intelligence works; in this article, we look at how the models operate\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/\" \/>\n<meta property=\"og:site_name\" content=\"Startupbusiness.it\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-13T19:52:59+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-06-14T10:46:19+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1280\" \/>\n\t<meta property=\"og:image:height\" content=\"720\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Emil Abirascid\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@emilabirascid\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Emil Abirascid\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"24 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/inside-the-machine-3-how-the-model-conveys-the-meaning\\\/176425\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/inside-the-machine-3-how-the-model-conveys-the-meaning\\\/176425\\\/\"},\"author\":{\"name\":\"Emil Abirascid\",\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/#\\\/schema\\\/person\\\/4e2760db2cb7ca47d635ea3d8d8486dd\"},\"headline\":\"Inside the machine (3): how the model conveys the meaning\",\"datePublished\":\"2026-06-13T19:52:59+00:00\",\"dateModified\":\"2026-06-14T10:46:19+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/inside-the-machine-3-how-the-model-conveys-the-meaning\\\/176425\\\/\"},\"wordCount\":4878,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/inside-the-machine-3-how-the-model-conveys-the-meaning\\\/176425\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.startupbusiness.it\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg\",\"keywords\":[\"artificial intelligence\",\"education\",\"innovation\"],\"articleSection\":[\"Learn\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/inside-the-machine-3-how-the-model-conveys-the-meaning\\\/176425\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/inside-the-machine-3-how-the-model-conveys-the-meaning\\\/176425\\\/\",\"url\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/inside-the-machine-3-how-the-model-conveys-the-meaning\\\/176425\\\/\",\"name\":\"Inside the machine (3): how the model conveys the 
meaning\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/inside-the-machine-3-how-the-model-conveys-the-meaning\\\/176425\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/inside-the-machine-3-how-the-model-conveys-the-meaning\\\/176425\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.startupbusiness.it\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg\",\"datePublished\":\"2026-06-13T19:52:59+00:00\",\"dateModified\":\"2026-06-14T10:46:19+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/#\\\/schema\\\/person\\\/4e2760db2cb7ca47d635ea3d8d8486dd\"},\"description\":\"The third instalment in the series explaining how artificial intelligence works; in this article, we look at how the models operate\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/inside-the-machine-3-how-the-model-conveys-the-meaning\\\/176425\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/inside-the-machine-3-how-the-model-conveys-the-meaning\\\/176425\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.startupbusiness.it\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg\",\"contentUrl\":\"https:\\\/\\\/www.startupbusiness.it\\\/wp-content\\\/uploads\\\/2026\\\/06\\\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg\",\"width\":1280,\"height\":720,\"caption\":\"how artificial intelligence works\"},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/\",\"name\":\"Startupbusiness.it\",\"description\":\"May the Force be with 
you!\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/#\\\/schema\\\/person\\\/4e2760db2cb7ca47d635ea3d8d8486dd\",\"name\":\"Emil Abirascid\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ac925fb049ca749d7e16a36a75fbc9bf342bb19b7c22dae55be274ab4557890a?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ac925fb049ca749d7e16a36a75fbc9bf342bb19b7c22dae55be274ab4557890a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/ac925fb049ca749d7e16a36a75fbc9bf342bb19b7c22dae55be274ab4557890a?s=96&d=mm&r=g\",\"caption\":\"Emil Abirascid\"},\"description\":\"Emil Abirascid (\u30a8\u30df\u30fc\u30eb\u30fb\u30a2\u30d3\u30e9\u30b7\u30c3\u30c9), giornalista, fondatore e direttore di Startupbusiness, il primo magazine sull\u2019ecosistema startup e innovazione italiano. Co-fondatore di Designtech, l\u2019hub di innovazione che avvicina il mondo del design con quello della tecnologia. Advisor di Austrian Business Agency, Emil Banca, Fondazione Symbola, Fondazione Quadrans. Partecipa regolarmente a incontri, convegni, conferenze dedicate all\u2019ecosistema dell\u2019innovazione. E\u2019 stato co-organizzatore degli Italian Innovation Day and Series che si sono svolti dal dal 2016 al 2020 nelle citt\u00e0 di Tokyo in Giappone, Melbourne, Adelaide, Perth, Canberra in Australia e Singapore e co-organizzatore dell\u2019Italy India Innovation Day di AIICP 2021 e 2022. Ha curato il volume \u2018L\u2019innovazione che non ti aspetti. 
Contesti e visioni per l\u2019impresa\u2019 , l\u2019edizione italiana di \u2018La startup digitale, guida pratica step by step\u2019 e ha scritto la prefazione all\u2019edizione italiana de \u2018La quarta era\u2019 di Byron Reese, ed \u00e8 co-autore di \u2018Cosa e Dove: strategie digitali di ricerca del lavoro\u2019, tutti editi da FrancoAngeli. In passato \u00e8 stato curatore di StartupDigest Italy, coordinatore scientifico del Forum per la Ricerca della Provincia Autonoma di Trento, board member di TechChill Milano advisor di di ScaleIT , ha collaborato con Il Sole 24 Ore \u00e8 stato direttore di Innov\u2019azione, bimestrale edito da Apsti, ha collaborato con Corriere Innovazione ed \u00e8 stato presidente del comitato di selezione del Premio Marzotto e advisor di Cetif-Universit\u00e0 Cattolica Milano.\",\"sameAs\":[\"https:\\\/\\\/www.abirascid.com\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/in\\\/emilabirascid\",\"https:\\\/\\\/x.com\\\/emilabirascid\"],\"url\":\"https:\\\/\\\/www.startupbusiness.it\\\/en\\\/author\\\/emil-abirascid\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Inside the machine (3): how the model conveys the meaning","description":"The third instalment in the series explaining how artificial intelligence works; in this article, we look at how the models operate","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/","og_locale":"en_US","og_type":"article","og_title":"Inside the machine (3): how the model conveys the meaning","og_description":"The third instalment in the series explaining how artificial intelligence works; in this article, we look at how the models operate","og_url":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/","og_site_name":"Startupbusiness.it","article_published_time":"2026-06-13T19:52:59+00:00","article_modified_time":"2026-06-14T10:46:19+00:00","og_image":[{"width":1280,"height":720,"url":"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg","type":"image\/jpeg"}],"author":"Emil Abirascid","twitter_card":"summary_large_image","twitter_creator":"@emilabirascid","twitter_misc":{"Written by":"Emil Abirascid","Est. 
reading time":"24 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/#article","isPartOf":{"@id":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/"},"author":{"name":"Emil Abirascid","@id":"https:\/\/www.startupbusiness.it\/en\/#\/schema\/person\/4e2760db2cb7ca47d635ea3d8d8486dd"},"headline":"Inside the machine (3): how the model conveys the meaning","datePublished":"2026-06-13T19:52:59+00:00","dateModified":"2026-06-14T10:46:19+00:00","mainEntityOfPage":{"@id":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/"},"wordCount":4878,"commentCount":0,"image":{"@id":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/#primaryimage"},"thumbnailUrl":"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg","keywords":["artificial intelligence","education","innovation"],"articleSection":["Learn"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/","url":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/","name":"Inside the machine (3): how the model conveys the 
meaning","isPartOf":{"@id":"https:\/\/www.startupbusiness.it\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/#primaryimage"},"image":{"@id":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/#primaryimage"},"thumbnailUrl":"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg","datePublished":"2026-06-13T19:52:59+00:00","dateModified":"2026-06-14T10:46:19+00:00","author":{"@id":"https:\/\/www.startupbusiness.it\/en\/#\/schema\/person\/4e2760db2cb7ca47d635ea3d8d8486dd"},"description":"The third instalment in the series explaining how artificial intelligence works; in this article, we look at how the models operate","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.startupbusiness.it\/en\/inside-the-machine-3-how-the-model-conveys-the-meaning\/176425\/#primaryimage","url":"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg","contentUrl":"https:\/\/www.startupbusiness.it\/wp-content\/uploads\/2026\/06\/igor-omilaev-9XtKSci9crg-unsplash-2.jpg","width":1280,"height":720,"caption":"how artificial intelligence works"},{"@type":"WebSite","@id":"https:\/\/www.startupbusiness.it\/en\/#website","url":"https:\/\/www.startupbusiness.it\/en\/","name":"Startupbusiness.it","description":"May the Force be with 
you!","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.startupbusiness.it\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.startupbusiness.it\/en\/#\/schema\/person\/4e2760db2cb7ca47d635ea3d8d8486dd","name":"Emil Abirascid","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/ac925fb049ca749d7e16a36a75fbc9bf342bb19b7c22dae55be274ab4557890a?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/ac925fb049ca749d7e16a36a75fbc9bf342bb19b7c22dae55be274ab4557890a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ac925fb049ca749d7e16a36a75fbc9bf342bb19b7c22dae55be274ab4557890a?s=96&d=mm&r=g","caption":"Emil Abirascid"},"description":"Emil Abirascid (\u30a8\u30df\u30fc\u30eb\u30fb\u30a2\u30d3\u30e9\u30b7\u30c3\u30c9), giornalista, fondatore e direttore di Startupbusiness, il primo magazine sull\u2019ecosistema startup e innovazione italiano. Co-fondatore di Designtech, l\u2019hub di innovazione che avvicina il mondo del design con quello della tecnologia. Advisor di Austrian Business Agency, Emil Banca, Fondazione Symbola, Fondazione Quadrans. Partecipa regolarmente a incontri, convegni, conferenze dedicate all\u2019ecosistema dell\u2019innovazione. E\u2019 stato co-organizzatore degli Italian Innovation Day and Series che si sono svolti dal dal 2016 al 2020 nelle citt\u00e0 di Tokyo in Giappone, Melbourne, Adelaide, Perth, Canberra in Australia e Singapore e co-organizzatore dell\u2019Italy India Innovation Day di AIICP 2021 e 2022. Ha curato il volume \u2018L\u2019innovazione che non ti aspetti. 
Contesti e visioni per l\u2019impresa\u2019 , l\u2019edizione italiana di \u2018La startup digitale, guida pratica step by step\u2019 e ha scritto la prefazione all\u2019edizione italiana de \u2018La quarta era\u2019 di Byron Reese, ed \u00e8 co-autore di \u2018Cosa e Dove: strategie digitali di ricerca del lavoro\u2019, tutti editi da FrancoAngeli. In passato \u00e8 stato curatore di StartupDigest Italy, coordinatore scientifico del Forum per la Ricerca della Provincia Autonoma di Trento, board member di TechChill Milano advisor di di ScaleIT , ha collaborato con Il Sole 24 Ore \u00e8 stato direttore di Innov\u2019azione, bimestrale edito da Apsti, ha collaborato con Corriere Innovazione ed \u00e8 stato presidente del comitato di selezione del Premio Marzotto e advisor di Cetif-Universit\u00e0 Cattolica Milano.","sameAs":["https:\/\/www.abirascid.com\/","https:\/\/www.linkedin.com\/in\/emilabirascid","https:\/\/x.com\/emilabirascid"],"url":"https:\/\/www.startupbusiness.it\/en\/author\/emil-abirascid\/"}]}},"author_name":"Emil 
Abirascid","categories_names":["Learn"],"_links":{"self":[{"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/posts\/176425","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/users\/125"}],"replies":[{"embeddable":true,"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/comments?post=176425"}],"version-history":[{"count":1,"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/posts\/176425\/revisions"}],"predecessor-version":[{"id":176429,"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/posts\/176425\/revisions\/176429"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/media\/176424"}],"wp:attachment":[{"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/media?parent=176425"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/categories?post=176425"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/tags?post=176425"},{"taxonomy":"companies","embeddable":true,"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/companies?post=176425"},{"taxonomy":"journalist","embeddable":true,"href":"https:\/\/www.startupbusiness.it\/en\/wp-json\/wp\/v2\/journalist?post=176425"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}