Large Language Models for Product Managers

Hey there! Want to become a certified AI product manager? We’ve got you covered.

Check out this comprehensive AI product management course with hands-on live workshops, a private community, and coaching.

Artificial intelligence is driving today’s platform shift, thanks to the availability of Large Language Models (LLMs).

People have rapidly embraced AI tools built on natural language processing (NLP), like OpenAI’s GPT, in their daily routines. This is an exciting time for product managers: it’s an opportunity to define the next generation of product experiences.

What are Large Language Models (LLMs)?

Large Language Models, or LLMs, are a branch of AI that comprehends, interprets, and generates text based on human language. They predict how likely each word is to come next after an input sequence, mimicking human language.
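
To make this concrete, here is a toy sketch in Python. The words and probabilities are invented for illustration; a real LLM scores every word in a vocabulary of tens of thousands.

    # Toy next-word prediction for the prompt "The cat sat on the".
    next_word_probs = {
        "mat": 0.62,
        "sofa": 0.21,
        "roof": 0.09,
        "moon": 0.01,
    }

    # The model emits a likely word (or samples among the options).
    prediction = max(next_word_probs, key=next_word_probs.get)
    print(prediction)  # -> mat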

Using vast and diverse datasets, LLMs are trained to process and generate content. This data can come from books, online text, articles, video transcripts, and other forms of written communication.

Large language models are designed to handle the full complexity of human language, including semantics, syntax, context, and grammar.

Let’s take OpenAI’s GPT as an example. In late 2022, the company launched a conversational chat application built on top of its GPT models and called it ChatGPT, letting users generate, summarize, and translate content, write code, and get answers to their questions.

Just two months later, ChatGPT reached 100 million users, making it the fastest-growing consumer internet service in history at the time.

Difference Between an LLM and Generative AI

Large language models are a subset of generative AI. 

Generative AI is a generic term for artificial intelligence capabilities that generate content such as text, images, code, videos, and music. Some of the most common generative AI tools include ChatGPT, DALL-E, and Adobe Firefly.

LLMs are focused on processing and generating text. That means all LLMs are generative AI, but not all generative AI models are LLMs.


What runs these technologies? 

Artificial intelligence once relied heavily on rule-based programming, which falls short when problems become too complex. 

Today, various AI products use machine learning to better handle complexity and scalability. Through ML, a model learns from datasets via training, improving performance over time without being explicitly programmed. 

Then, there’s deep learning. Deep learning is a subset of machine learning that involves training artificial neural networks with many layers to recognize intricate patterns in vast datasets.

Through deep learning, large language models have demonstrated remarkable success in various NLP tasks such as text generation, summarization, question-answering, and sentiment analysis. 

Why Do Large Language Models Matter to Product Managers? 

LLMs are continuously improving. As they get better, these models can generate higher-quality content, making them valuable across different products and applications.

Since product managers continually build new products that solve emerging user problems, understanding LLMs will help you drive innovative product development.

What Product Managers Need to Know About LLMs

Aside from the general idea of how large language models work, a PM in the tech industry needs to understand specific intricacies. 

1. Neural Networks

How effective is your large language model? 

One indicator of an LLM’s capacity is the size of its architecture, which is built on artificial neural networks.

Neural networks are the computational structures underlying LLMs, inspired by the human brain. They are designed to process and generate natural language text through layers of interconnected “neurons,” or nodes, that learn from training data.

The connections between neurons carry numerical weights, which serve as the network’s parameters and indicate the strength of each connection. Each neuron receives input signals, processes them using these weighted connections and an activation function, and produces an output signal.

Neural networks consist of an input layer (which receives the data), hidden layers (which perform the computations), and an output layer (which creates the prediction).
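
As a sketch of this flow, here is a minimal forward pass in Python with NumPy. The layer sizes and random weights are arbitrary stand-ins for what a trained network would learn.

    import numpy as np

    # Minimal forward pass: input layer -> one hidden layer -> output layer.
    rng = np.random.default_rng(0)

    x = rng.normal(size=4)         # input layer: 4 features
    W1 = rng.normal(size=(8, 4))   # input -> hidden connection weights
    b1 = np.zeros(8)
    W2 = rng.normal(size=(3, 8))   # hidden -> output connection weights
    b2 = np.zeros(3)

    hidden = np.tanh(W1 @ x + b1)  # weighted sums passed through an activation
    output = W2 @ hidden + b2      # the output layer's prediction

    # Every entry of W1, b1, W2, and b2 is a parameter:
    # 8*4 + 8 + 3*8 + 3 = 67 in this tiny network.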

(Figure: the layers of a neural network. Source: deepai.org)

As the hidden layers perform computations, they learn patterns and relationships between the input and output data. This process is called training.

Within these hidden layers, each computation represents how a specific neuron relates to the neurons in the previous layer. As the network trains, it improves its understanding of the data, which enhances its ability to make accurate predictions and decisions.

In the context of LLMs, the higher the number of parameters, the higher the language model’s capacity. Parameters are the model’s building blocks, the knobs and dials that are adjusted during training to optimize its performance.

Despite their size, large language models don’t store information the way hard drives do. Instead, they learn complex patterns and relationships in the data through training, enabling them to generate human-like text and understand language at a sophisticated level.

2. Network Architectures

LLMs can be built in different architectures, depending on the tasks they are intended to complete.

A model’s architecture refers to its overall design and structure. It defines how a model processes input data, captures dependencies, and generates output. 

Different architectures have distinct strengths and weaknesses, making them more suitable for certain tasks than others. Here are a few examples: 

A. Recurrent Neural Network (RNN)

RNNs feed the output of one step back in as input to the next. They are most useful for sequential tasks such as machine translation, where the model learns from previous inputs and outputs.

There is also a variant of the RNN called Long Short-Term Memory (LSTM). LSTMs maintain memory over longer spans, letting them process longer sequences than traditional RNNs can.
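
As an illustrative sketch (not a trained model), a single RNN step in Python looks roughly like this; the previous hidden state feeds back in alongside each new input:

    import numpy as np

    # One recurrent step; sizes and weights are arbitrary stand-ins.
    rng = np.random.default_rng(1)
    hidden_size, input_size = 5, 3

    W_x = rng.normal(size=(hidden_size, input_size))   # input -> hidden
    W_h = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden (the recurrence)
    b = np.zeros(hidden_size)

    def rnn_step(x_t, h_prev):
        return np.tanh(W_x @ x_t + W_h @ h_prev + b)

    # Process a 4-element sequence one step at a time.
    h = np.zeros(hidden_size)
    for x_t in rng.normal(size=(4, input_size)):
        h = rnn_step(x_t, h)  # h now summarizes everything seen so far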

B. Convolutional Neural Network (CNN)

A CNN is a network with one or more convolutional layers. It is a deep learning model that’s particularly useful in image-related tasks, such as recognition or classification, but can also be used in NLP functions. 

CNNs are specifically designed to exploit the spatial structure present in images by using convolutional layers, which extract features from local regions of the input data. 

By stacking multiple convolutional layers, CNNs can capture hierarchical representations of visual features, allowing them to learn complex patterns and relationships within images.
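
To make “extracting features from local regions” concrete, here is a hand-rolled sketch of a single convolutional filter sliding over a toy image. The kernel is a classic hand-crafted example; a real CNN learns many such kernels from data.

    import numpy as np

    image = np.random.default_rng(2).random((6, 6))  # toy 6x6 grayscale image
    kernel = np.array([[-1, 0, 1],
                       [-1, 0, 1],
                       [-1, 0, 1]])                  # responds to vertical edges

    feature_map = np.zeros((4, 4))                   # (6 - 3 + 1) positions per axis
    for i in range(4):
        for j in range(4):
            region = image[i:i+3, j:j+3]                 # a local 3x3 patch
            feature_map[i, j] = (region * kernel).sum()  # how strongly the patch matches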

(Figure: a convolutional neural network. Source: GitHub)

C. Generative Adversarial Network (GAN)

This is a type of neural network architecture employed for generative tasks, such as image synthesis, image translation, and data generation. 

GANs consist of two interconnected models: a generator and a discriminator. The generator generates synthetic data, such as images or text, while the discriminator evaluates whether the generated data is real or fake. 

Through an adversarial process, the generator aims to produce data that is indistinguishable from real data, while the discriminator strives to become better at distinguishing between real and fake data. This process encourages both models to improve over time, resulting in the generation of increasingly realistic synthetic data.
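
The loop below is a structural sketch of that process in Python. The generator and discriminator here are trivial stand-ins, shown only to make the adversarial data flow concrete; in practice both are trained neural networks.

    import numpy as np

    rng = np.random.default_rng(3)

    def generator(noise):
        return noise * 2.0  # stand-in: maps random noise to "synthetic data"

    def discriminator(sample):
        return 1 / (1 + np.exp(-sample.sum()))  # stand-in: P(sample is real)

    for step in range(3):
        real = rng.normal(loc=5.0, size=4)    # a batch of real data
        fake = generator(rng.normal(size=4))  # the generator's synthetic batch

        # The discriminator wants real scored high and fake scored low...
        d_loss = -np.log(discriminator(real)) - np.log(1 - discriminator(fake))
        # ...while the generator wants the discriminator fooled.
        g_loss = -np.log(discriminator(fake))
        # In a real GAN, each loss drives gradient updates on its own model.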

D. Transformer

This is a type of deep learning architecture that is particularly prominent in NLP tasks, such as text generation, question-answering, sentiment analysis, summarization, and translation. 

Unlike traditional models, which process data sequentially (e.g., recurrent neural networks), transformers process entire data sequences in parallel. This architectural choice improves efficiency and scalability for tasks involving long sequences of data.

OpenAI’s GPT is built upon the Transformer architecture. It uses self-attention mechanisms that allow the model to weigh the importance of different words in a sequence, capturing context. 

The Transformer architecture has shown exceptional performance in many language tasks because it efficiently handles long-distance connections between words and can be trained quickly on large datasets using powerful hardware like GPUs and TPUs. 

3. Prompt Engineering

Training data is one factor in whether a large language model generates good results. But it isn’t the only one: a good prompt is equally crucial.

LLMs map between two domains: the user domain, which is the user’s input, and the document domain, which is the data the model was trained on. The practice of crafting prompts is called “prompt engineering,” and it’s increasingly in demand in the tech sector.

As a product manager, you can dive deep into prompt engineering to create a library of prompts that you can use to boost the efficiency of an AI model. 

Prompt engineering has been used to create chatbots, optimize translation services, and create content. By engineering effective prompts, PMs can steer LLMs toward targeted, relevant responses for their product.

Building Blocks of a Prompt

It’s essential to understand the fundamental components that make up a prompt, allowing you to structure one effectively. The building blocks of a prompt, illustrated in the example after this list, include:

  1. Instruction: A task or command that a user provides for the model to perform. 
  2. Context: Additional information provided to the model to help it generate a more accurate response. 
  3. Input Data: The question or input for which users are interested in finding a response.
  4. Output Indicator: Instruction provided to the model that indicates the type or format of output that the user wants to get. 
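
Here is one hypothetical prompt, broken out by building block (the product and details are invented):

    Instruction:      Summarize the customer feedback below.
    Context:          The feedback concerns our mobile banking app and was
                      collected after last month's redesign.
    Input data:       [pasted feedback text]
    Output indicator: Return 3-5 bullet points, each under 15 words.
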
Common Types of Prompts

Prompts can take different forms to elicit responses suited to each need. Some common types, with sample prompts after this list, include:

  1. Direct Prompting: providing context + instructions
  2. Prompting with Examples: providing a clear, illustrative example of the result the user wants to get
  3. Chain of Thought (CoT) Prompting: letting the model explain its reasoning in a series of prompts
  4. Role Prompting: assigning a role to the model and then asking questions accordingly. 
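
For illustration, here is a hypothetical sample of each type (all names and numbers are invented):

    Direct:           Given our Q3 goals below, list three launch risks.
    With examples:    Rewrite these titles in our style. Example:
                      "Fix login bug" -> "Resolve intermittent login failure on iOS".
    Chain of Thought: A feature costs $40k to build and saves 500 support hours
                      a year at $50/hour. Think step by step: is it worth
                      building? (500 x $50 = $25k/year, so it pays back in
                      under two years.)
    Role:             Act as a skeptical enterprise customer and critique this pitch.
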
Prompting Best Practices

As developers and users explore and optimize models, they have developed a list of best practices in LLM prompting. Some of these include: 

  1. Be conversational. (E.g., instead of asking, “Give me a product plan…”, say, “Write a detailed product plan for the launch of a brand new product called…”)
  2. Provide clear, concise instructions with no ambiguity. Mention what information is most important.
  3. Structure the prompt as follows: First, define the purpose, then provide input/context, and lastly, state the command. 
  4. Provide specific, varied examples to help the model narrow its focus and produce a more accurate response. 
  5. Avoid providing complex instructions. Break them down into more straightforward prompts.
  6. Before getting the final results, instruct the model to verify its answers. You can say, “Rate your work from 1-10,” or “Do you think this is correct?”

Check out more prompting best practices in this blog from OpenAI. 

Prompt engineering is more than just providing the model with a set of instructions. In fact, you can create a customized model through prompt engineering using platforms such as the OpenAI Playground.

If you want a deeper understanding of prompt engineering, check out this video below from one of our community sessions, where a Gen AI Product Advisor discusses prompt engineering for PMs. 

At PM Exercises, our community of AI PMs meets every week to discuss the latest developments in the world of artificial intelligence from a product management perspective. We also share personal experiences and look into new advancements in the field. 

Join our FREE AI product management community sessions today.

4. Transformer Model

Earlier language models used recurrent neural networks (RNNs) or convolutional neural networks (CNNs) to process input data sequentially or with fixed-size windows.

Today, modern LLMs use the Transformer architecture for natural language processing. How is the Transformer model different?

Essentially, Transformers can process large amounts of data all at once (parallelization) rather than one element at a time (sequentially). This architecture has enabled training on much larger datasets than earlier architectures could handle.

Transformer architecture excels at understanding context, making it the go-to choice for most language models at present. 

It stands on two key components:

Word Embeddings

In a large amount of text data, words in sentences are represented by a set of numbers. These numbers capture their meanings and context based on how they were used within the dataset. This representation is called word embedding. 

Using numerical representations, the system can understand the relationship between words, allowing it to make better language predictions.

(Figure: word embeddings. Source: arize.com)

For example, each word in the sentence “Hey, it’s nice to see you!” is converted into a numerical form (a vector) that the model can process. These vectors carry information about the words and their positions in the sentence.

Through word embeddings, Transformer models can look for patterns in how each word relates to the others in various sentences.
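
Here is a toy sketch of this in Python. The 2-dimensional vectors are invented (real embeddings have hundreds or thousands of dimensions), but they show how similarity falls out of the numbers:

    import numpy as np

    # Invented 2-D embeddings; real models learn far higher-dimensional ones.
    embeddings = {
        "king":  np.array([0.9, 0.7]),
        "queen": np.array([0.8, 0.8]),
        "apple": np.array([0.1, 0.9]),
    }

    def cosine_similarity(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Words used in similar contexts end up with similar vectors.
    print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # ~0.99
    print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # ~0.70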

Attention

Attention is the mechanism that allows a model to focus on various parts of the input text with different degrees of importance. 

To do this, Transformers use self-attention layers within the neural networks. These layers help it focus on the vital parts of a sentence when understanding and generating text.

(Figure: visualization of self-attention. Source: Google Research)

Let’s take the sentence, “A lady went to the store.” If you read this in a book, your brain automatically focuses on the keywords “lady” and “store.”

Similarly, the self-attention layer allows the model to figure out which words in a sentence are the most important. It does this by assigning scores to words that indicate their relative importance. 

It then combines the information from each word in a phrase using these scores, giving the most significant words more weight. This improves the machine’s ability to comprehend context in sentences, enabling better language processing.
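
Here is a minimal NumPy sketch of the scaled dot-product attention at the heart of this mechanism. The query, key, and value matrices are random stand-ins; a trained model derives them from the word embeddings via learned weight matrices.

    import numpy as np

    rng = np.random.default_rng(4)
    seq_len, d = 6, 8                  # a 6-word "sentence", 8-dim vectors

    Q = rng.normal(size=(seq_len, d))  # queries: what each word is looking for
    K = rng.normal(size=(seq_len, d))  # keys: what each word offers
    V = rng.normal(size=(seq_len, d))  # values: each word's content

    scores = Q @ K.T / np.sqrt(d)                   # word-to-word relevance
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> importance weights

    output = weights @ V               # each word becomes a weighted mix of all words
    print(weights[0].round(2))         # word 0's attention over the whole sentence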

The introduction of the Transformer architecture in the 2017 Google research paper “Attention Is All You Need” fundamentally changed the landscape of NLP.

Make sure to check out this research paper—if you haven’t yet—to understand how it has changed AI and opened up new possibilities in large language models.

Hey there! Considering an AI product manager course certificate?

Sign up for our 4-week AI Product Management learning program and get a certificate to kickstart your AI PM career.

5. LLM Capabilities

Unlike more predictable, linear computer programs, the way LLMs are developed gives rise to unexpected, emergent behaviors.

LLMs are capable of:

A. Text Generation, Editing, and Completion 

LLMs can generate text that is contextually relevant to the given prompt. Many people take advantage of this to create blogs, social media posts, video scripts, and other forms of content. 

These models also allow text editing and completion. For example, Jasper AI is a popular copywriting assistant that was created to end writer’s block. Jasper AI lets a user come up with an initial phrase for it to pick up on and continue writing. 

B. Text Summarization

Summarization is a practical use case for research and academia. A user can provide the model with lengthy documents and get an informative summary with a single prompt, saving valuable time and effort. 

C. Sentiment Analysis

LLMs are capable of interpreting, understanding, and categorizing the emotional tone behind words. For example, sentiment analysis can be useful in refining a large volume of customer feedback for a certain product. A company can then use the results to improve its marketing strategies and customer services. 

D. Language Translation

One of the original use cases of LLMs is language translation, and the most recent models have made this even easier. A lot of publicly available LLMs can produce high-accuracy translations with a simple prompt. 

E. Question Answering

LLMs leverage their deep contextual understanding to provide informative and accurate answers to questions across a wide range of domains.

F. Chatbots and Virtual Assistants

Thanks to the improvements in LLMs, we now have smarter chatbots and virtual assistants that can talk to customers more naturally. Using the generative AI and NLP capabilities of models like GPT-4, chatbots can assist users with a diverse range of tasks, such as customer support and appointment scheduling. They can also understand the context of a specific company, such as its internal documents, FAQs, or website content, through techniques like RAG (Retrieval-Augmented Generation).
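
Here is a rough sketch of the RAG idea in Python. The embed function is a random stand-in for a real embedding model, and the final LLM call is left as a comment:

    import numpy as np

    # Sketch of RAG: retrieve the most relevant documents, then prepend
    # them to the prompt before calling the LLM.
    documents = [
        "Refunds are processed within 5 business days.",
        "Support hours are 9am-6pm EST, Monday to Friday.",
        "Premium plans include priority phone support.",
    ]

    def embed(text):
        # Stand-in: real systems use a trained embedding model here.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.normal(size=16)

    def retrieve(query, k=2):
        q = embed(query)
        return sorted(documents, key=lambda d: -(embed(d) @ q))[:k]

    question = "How long do refunds take?"
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # response = some_llm_api(prompt)  # the answer is now grounded in company docs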

Explore RAG in more depth in this community session video led by one of our expert AI PMs.

G. Role Playing

An LLM can simulate dialogues and interactions, which can be very helpful for many users. It involves instructing the LLM to “adopt” a particular position, occupation, or function, which the AI then uses to carry out the given work more effectively.

H. Code Generation

Various LLMs are trained on datasets that enable them to create code in different languages and across scenarios, serving as a helpful tool for programmers. 

NOTE: There are many other emerging use cases for LLMs beyond this list. As companies continue to innovate, new applications for LLMs are constantly being explored and developed.

6. LLM Risks

While the capabilities of LLMs to provide fluent and authoritative text are extensive, there are also various risks involved, including: 

Hallucinations

A large amount of false information is uploaded to the Internet. LLMs can be trained on inaccurate, incomplete, or contradictory data and, as a result, generate biased or fabricated statements known as “hallucinations.”

Companies have addressed hallucinations in their products in various ways, including providing users with predefined prompt templates to get the right results, supervising processes and outcomes, and fine-tuning. Fine-tuning is the process of optimizing a pre-trained model to better suit a specific task or dataset.

OpenAI also leveraged a method known as Reinforcement Learning from Human Feedback (RLHF). This involves training an AI agent called a “reward model” on direct human feedback and using it to optimize the main model’s performance.

While traditional reinforcement learning (RL) relies only on trial and error to improve results, RLHF leads to faster, more targeted learning by incorporating human expertise.

To do this, the model generates output based on given prompts or tasks, which is then evaluated by humans according to predefined criteria. 

Human evaluators assign rewards or penalties based on the model’s response. These rewards guide the reinforcement learning process, where the parameters of the LLM are adjusted to maximize expected rewards over time. 
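
The loop below sketches that cycle in Python. The “model,” the human ranking, and the reward scores are toy stand-ins, included only to make the flow concrete; a real pipeline uses an LLM, many human comparisons, and an RL algorithm such as PPO.

    import random

    random.seed(0)

    def model_generate(prompt):
        # Stand-in LLM: returns one of a few canned drafts.
        return f"{prompt}: " + random.choice(["draft A", "longer draft B", "draft C"])

    def human_preference(candidates):
        return max(candidates, key=len)  # stand-in for human evaluators' pick

    reward_scores = {}  # stand-in "reward model" learned from preferences

    for prompt in ["Explain our refund policy", "Draft a release note"]:
        candidates = [model_generate(prompt) for _ in range(4)]  # model proposes
        preferred = human_preference(candidates)                 # humans rank
        for c in candidates:
            delta = 1 if c == preferred else -1                  # reward vs. penalty
            reward_scores[c] = reward_scores.get(c, 0) + delta

    # A real RLHF step would now adjust the LLM's parameters to
    # maximize the learned reward.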

Bias

LLM-generated results can show bias when the model is trained with data that isn’t diverse enough or lacks weight in representing certain groups. This can lead to ethical concerns. 

Language models absorb biases present in their training text, perpetuating stereotypes and discrimination around age, race, gender, socioeconomic status, and more.

To address this, product teams must proactively evaluate models against rigorous fairness metrics, mitigating bias and re-optimizing the model whenever bias is detected.

Inappropriate or Offensive Content

Another risk involved in building language models is the possibility of generating text that promotes hate, discrimination, or harm. 

Left unchecked, such a model could be used for cyberbullying or cause psychological harm to users seeking emotional support from a chatbot.

Issues of offensive content are usually addressed through human feedback and fine-tuning. 

Plagiarism and Copyright Infringement

LLMs can unknowingly produce plagiarized text or content that violates copyright laws.

When AI-generated articles or documents that resemble copyrighted materials are distributed without the right citations, ethical and legal concerns may arise. 

In this case, developers may use novel unlearning techniques such as: 

  • Data filtering, which selectively removes specific data points or patterns from the training dataset;
  • Gradient methods, which use optimization algorithms to adjust the model parameters based on the gradient of the loss function; and
  • In-context unlearning, which updates the model by removing or adjusting previously learned information based on new context or data inputs.

However, you can only imagine how challenging it is to retrain a model to unlearn potentially copyrighted material. Most of these strategies are resource-intensive and time-consuming. 

In a statement following a copyright suit, OpenAI argued that using copyrighted material is “non-expressive intermediate copying” that serves a groundbreaking purpose. The company believes the intention is to use the data to make LLMs learn “patterns inherent in human-generated media,” which differs from original works’ “entertainment” purpose.

There is a lot of debate on this topic in the industry. Time will tell which viewpoint will dominate the ecosystem in the years to come. 

Continuous mitigation is still necessary. 

Other LLM Misuse or Risks

  • Privacy risks (LLMs unintentionally reproducing personally identifiable data)
  • Easier malware development (using LLM-generated code)
  • Social media chatbot risks
  • Generated text used for malicious acts such as phishing or spamming
  • Psychological harm (e.g., users seeking emotional support from a chatbot but receiving damaging responses instead)
  • And more

Conclusion

The risks of LLMs and AI, in general, extend beyond incorrect responses or false information. 

It is nearly impossible to predict every possible scenario, especially if there is malicious intent to misuse an AI product. 

While dual-use technology isn’t new, product managers need to keep looking for ways to ensure that large language models aren’t exploited for harmful purposes.

What PMs Can Do To Enhance their AI Product Management Skills

LLMs and their surrounding ecosystem are continuously evolving. Who knows what exciting progress will take place in a few years?

If you are a product manager looking to advance your career as an AI PM, what steps can you take to understand and master product management in LLMs, AI, and machine learning? 

1. Study the Math Behind ANNs 

Learning the math behind artificial neural networks (ANNs) is an advantage for understanding how LLMs and deep learning work. Thankfully, the math behind ANNs is pretty elementary. 

This math rests on three components: differential calculus, linear algebra, and probability theory. Don’t worry; you likely learned these in high school or your first year of university.

Check out this article that explains how neural networks work using a basic example of a data-fitting equation. With this foundation, you will have a deeper understanding of how LLMs generate predictions and how to fine-tune a model.

2. Take an AI/ML Product Management Course

Learning AI and machine learning concepts without structure can be challenging for an aspiring AI product manager. While data science and programming skills aren’t required for an AI PM job, a solid technical grounding in machine learning is essential.

PM Exercises offers an AI product management learning program. It’s a 4-week course that covers the fundamentals of AI and machine learning, specifically in product management. 

This cohort is designed for product managers with or without a technical background. Developed by AI PM experts who have worked at Google and Uber, it’s a live-workshop-based learning program that teaches the key concepts of building an AI-first product.

Taking product management courses in AI/ML lets you acquire essential knowledge in a condensed timeframe, accelerating your career growth in this industry. You also gain a network and mentorship that will help you in your applications and interviews.

3. Explore AI Tools & Build Your Own Product

In our AI/ML product management cohorts, PMs not only learn the fundamental and technical concepts of AI product management but also have the opportunity to build their own AI products, some of which have turned into actual companies.

Building a product with no-code AI platforms gives you hands-on experience that will deepen your understanding of AI product management and prepare you for real-world AI PM responsibilities.

Final Thoughts

LLMs are the future of productivity and innovation. Whether you are looking into generative AI to enhance your work as a product manager or are vying for an AI product manager role, learning the inner workings of LLMs can go a long way. 

At Product Management Exercises, we host a community of PMs preparing for their interviews to land the perfect PM job. Alongside this community, PM Exercises offers a self-paced interview prep course, a collection of interview questions, and the opportunity to conduct mock interviews with fellow PMs, all aimed at supporting your success in product management.

We also offer an AI/ML Product Management Learning Program to help you advance your PM career and enter the world of AI-based product development.

Sign up for free and access an all-in-one platform to help you prepare for your PM interviews.

Bijan Shahrokhi

Creator of PM Exercises - the largest community of experienced and aspiring product managers who are helping each other prepare for their PM job interviews.

Ready to land your dream PM job? Join our community to learn how to ace your interviews and more!
