Empowering Website Conversations: Part 4

Introduction

In our chatbot series, we’ve developed a custom chatbot model tailored to answer questions about Embyr’s website content using the OpenAI fine-tuning API. However, we’ve also identified certain limitations in the model’s performance. In this blog post, we’ll emphasize the significance of thoroughly vetting both input and output from the model, and we’ll guide you through a practical solution utilizing discriminators.

The previous section can be found here

Why Discriminators?

We have two compelling reasons to ensure that questions and answers are relevant to our chatbot’s subject matter:

  1. Token Costs: Tokens come at a cost, and we want to allocate our resources wisely. Our tokens should be utilized exclusively by users who are seeking to engage with the ChatBot for its intended purpose, namely, learning more about Embyr.
  2. Accuracy and Relevance: It’s crucial that we provide accurate and relevant information to our users. We don’t want to offer misleading or irrelevant responses.

While testing the QA model we created in the last post, we saw that the model is perfectly happy to answer questions that have nothing to do with Embyr or AI.

$ openai api chat_completions.create -m ft:gpt-3.5-turbo-0613:embyr::ID --message system "You are a factual chatbot to answer questions about AI and Embyr." --message user "What is Star Wars?" -M 500 --stop '**STOP**'
Star Wars is a popular science fiction franchise that was created by George Lucas. It has become an astronomical success and consists of movies, TV series, books, comics, and more.

One way to handle this would be to add training data to the QA model and train it to return an answer like “I’m sorry. I can only answer questions about Embyr.” But using the QA model is expensive.

To address these concerns, we are implementing a solution involving two additional models that act as discriminators. One discriminator will assess whether the question is related to Embyr, and the other will evaluate the generated answer for relevance. If either discriminator fails, we do not return the response to the user.
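The overall flow can be sketched as follows. This is a minimal sketch, not the final implementation: `classify_input`, `answer`, and `classify_output` are hypothetical stand-ins for the fine-tuned model calls we develop in this post, and each discriminator is assumed to return " yes" or " no" as in the training data below.

```python
# Sketch of the gating flow; the three callables are hypothetical stand-ins
# for the fine-tuned models.

REFUSAL = "I'm sorry. I can only answer questions about Embyr."

def answer_with_discriminators(question, classify_input, answer, classify_output):
    """Return an answer only if both discriminators approve."""
    if classify_input(question).strip() != "yes":
        return REFUSAL  # off-topic question: skip the expensive QA model entirely
    response = answer(question)
    if classify_output(response).strip() != "yes":
        return REFUSAL  # off-topic answer: do not show it to the user
    return response
```

Note that the input discriminator runs first, so an off-topic question never spends any QA-model tokens at all.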

Create a list of Questions and Answers unrelated to your subject

We require a set of questions to train the ‘unrelated’ aspect of the discriminator models. I recommend compiling a diverse range of questions that encompass topics entirely unrelated to our subject. Additionally, consider including some questions that are somewhat related but fall outside the scope of subjects we want our chatbot to engage with.

My list was created by hand, with some assistance from ChatGPT. However, you have the option to use OpenAI completions to generate these lists automatically, similar to how we initially generated questions and answers. As a precautionary measure, be sure to thoroughly review the generated content.

Train the Input Discriminator

Load datasets

You will need both the QA database we created for fine-tuning the QA model in the previous section, as well as the database containing the unrelated questions and answers.

import pandas as pd

qa_df = pd.read_csv('data/embyr_website_qa.csv')
unrelated_qa_df = pd.read_csv('data/embyr_unrelated_qa.csv')

Split data into training and testing sets

Using sklearn, split both the QA database and the unrelated QA database into training and testing sets. 80% will be used to train the discriminator models and the remaining 20% will be held out for testing. The test set can be passed to the fine-tuning command’s validation option, which adds performance metrics to the result file.

Note: Because of the size of our QA dataset and our desire to have the QA model trained on all parts of the website, we did not split out any testing data for the previous model.

# Split into discriminator training and testing sets
from sklearn.model_selection import train_test_split
train_qa_df, test_qa_df = train_test_split(qa_df, test_size=0.2, random_state=42)
train_unrelated_qa_df, test_unrelated_qa_df = train_test_split(unrelated_qa_df, test_size=0.2, random_state=42)
len(train_unrelated_qa_df), len(test_unrelated_qa_df)

Create the JSONL file for training the input discriminator

This process is similar to creating the training file for the QA model, but this time we take the questions from both the QA dataset and the unrelated QA dataset and append whether each question is related to Embyr. We are also going to use a completions model as the base instead of a chat model, so the resulting JSONL file needs to look like the following:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

Note: This format differs from the training file we created for the QA model.

## Create the files for training the input discriminator model
def create_fine_tuning_dataset_for_input_discriminator(good_qa_df, bad_q_df):
    """
    Create a dataset for fine-tuning the OpenAI input discriminator model

    Parameters
    ----------
    good_qa_df: pd.DataFrame
        The dataframe containing the related question, answer and context pairs
    bad_q_df: pd.DataFrame
        The dataframe containing the unrelated questions

    Returns
    -------
    pd.DataFrame
        The dataframe containing the prompts and completions, ready for fine-tuning
    """
    rows = []
    for i, row in good_qa_df.iterrows():
        for q in row.questions.split('\n'):
            if len(q) > 10:  # skip blank or fragment lines
                # q[2:] strips the leading list marker from each question
                rows.append({"prompt": f"Is the following question related to Embyr and AI?\nQuestion: {q[2:].strip()}\nRelated:", "completion": " yes"})

    for i, row in bad_q_df.iterrows():
        rows.append({"prompt": f"Is the following question related to Embyr and AI?\nQuestion: {row.Questions.strip()}\nRelated:", "completion": " no"})

    return pd.DataFrame(rows)


for train_test, good_dt, bad_dt in [('train', train_qa_df, train_unrelated_qa_df), ('test', test_qa_df, test_unrelated_qa_df)]:
    ft = create_fine_tuning_dataset_for_input_discriminator(good_dt, bad_dt)
    ft.to_json(f'input_discriminator_{train_test}.jsonl', orient='records', lines=True)

The finished file will end up looking like:

{"prompt":"Is the following question related to Embyr and AI?\nQuestion: How does Embyr strive to deliver the highest quality of service?\nRelated:","completion":" yes"}
{"prompt":"Is the following question related to Embyr and AI?\nQuestion: What is the name of the famous Golden Retriever who starred in the movie Air Bud?\nRelated:","completion":" no"}

Train the Input Discriminator Model

We are ready to train our first discriminator model. We will use babbage-002 as the base model for the discriminator, since this is a simpler task than generating a response and it adds minimal expense. This is the same process as training the QA model.

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

input_test_file = openai.File.create(
  file=open("input_discriminator_test.jsonl", "rb"),
  purpose='fine-tune'
)

input_train_file = openai.File.create(
  file=open("input_discriminator_train.jsonl", "rb"),
  purpose='fine-tune'
)

job = openai.FineTuningJob.create(
    training_file=input_train_file.id,
    validation_file=input_test_file.id,
    model="babbage-002")

As a reminder, you can check on the status of your job with the following command, which lists recent events.

openai.FineTuningJob.list_events(id=job.id, limit=10)

Once the job has completed, the account associated with the API key will receive an email with the complete model name.
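If you would rather not wait for the email, you can poll the job yourself. This is a sketch: the `retrieve` callable is injected so it can be tested in isolation; in practice it would be `openai.FineTuningJob.retrieve` from the 0.x SDK used above.

```python
import time

def wait_for_model(job_id, retrieve, poll_seconds=30):
    """Poll a fine-tuning job until it reaches a terminal state and
    return the fine-tuned model name (None if the job did not succeed)."""
    while True:
        job = retrieve(job_id)
        if job["status"] in ("succeeded", "failed", "cancelled"):
            return job.get("fine_tuned_model")
        time.sleep(poll_seconds)
```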

Train the Output Discriminator

Now we have a way to make sure that incoming questions are on topic, but what about the output? Because we fine-tuned from a general-purpose base model, we can still receive answers that have nothing to do with our website.

What is Embyr?

Embyr is a blockchain-based platform that aims to provide a secure and
transparent way for businesses to manage their data.

Well, that’s not right. Better training data can help, but fully removing the risk is difficult. To help mitigate it, we will also train an output discriminator to verify that the generated output is related to Embyr.

Create the JSONL file for training the output discriminator

Follow the same process to create a JSONL file for training the output discriminator.

## Create the files for training the output discriminator model
def create_fine_tuning_dataset_for_output_discriminator(good_qa_df, bad_q_df):
    """
    Create a dataset for fine-tuning the OpenAI output discriminator model

    Parameters
    ----------
    good_qa_df: pd.DataFrame
        The dataframe containing the related question, answer and context pairs
    bad_q_df: pd.DataFrame
        The dataframe containing the unrelated answers

    Returns
    -------
    pd.DataFrame
        The dataframe containing the prompts and completions, ready for fine-tuning
    """
    rows = []
    for i, row in good_qa_df.iterrows():
        for a in row.answers.split('\n'):
            if len(a) > 10:  # skip blank or fragment lines
                # a[2:] strips the leading list marker from each answer
                rows.append({"prompt": f"Is the following answer related to Embyr and AI?\nAnswer: {a[2:].strip()}\nRelated:", "completion": " yes"})

    for i, row in bad_q_df.iterrows():
        rows.append({"prompt": f"Is the following answer related to Embyr and AI?\nAnswer: {row.Answers}\nRelated:", "completion": " no"})

    return pd.DataFrame(rows)


for train_test, good_dt, bad_dt in [('train', train_qa_df, train_unrelated_qa_df), ('test', test_qa_df, test_unrelated_qa_df)]:
    ft = create_fine_tuning_dataset_for_output_discriminator(good_dt, bad_dt)
    ft.to_json(f'output_discriminator_{train_test}.jsonl', orient='records', lines=True)

The finished file will end up looking like:

{"prompt":"Is the following answer related to Embyr and AI?\nAnswer: Embyr offers a range of AI services that can help businesses improve efficiency, streamline processes, and improve customer interactions.\nRelated:","completion":" yes"}
{"prompt":"Is the following answer related to Embyr and AI?\nAnswer: Blockchain is a decentralized digital ledger that records transactions across multiple computers. It works by creating a chain of blocks that contain transactional data, which is secured through cryptographic techniques.\nRelated:","completion":" no"}

Train the Output Discriminator Model

Finally, submit the fine-tuning job to OpenAI:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

output_test_file = openai.File.create(
  file=open("output_discriminator_test.jsonl", "rb"),
  purpose='fine-tune'
)

output_train_file = openai.File.create(
  file=open("output_discriminator_train.jsonl", "rb"),
  purpose='fine-tune'
)

job = openai.FineTuningJob.create(
    training_file=output_train_file.id,
    validation_file=output_test_file.id,
    model="babbage-002")
openai.FineTuningJob.list_events(id=job.id, limit=10)

Test the Discriminators

Just like you would evaluate the QA Model, it’s essential to conduct testing for your discriminators. Using the same set of questions, we obtain the following results:

Test input

$ openai api completions.create -m ft:babbage-002:embyr::MODEL_ID -p "Is the following question related to Embyr and AI?\nQuestion:How can Embyr help me integrate AI into my business?\nRelated:" -M 1
Is the following question related to Embyr and AI?\nQuestion:How can Embyr help me integrate AI into my business?\nRelated: yes
$ openai api completions.create -m ft:babbage-002:embyr::MODEL_ID -p "Is the following question related to Embyr and AI?\nQuestion:How can AI help my business?\nRelated:" -M 1
Is the following question related to Embyr and AI?\nQuestion:How can AI help my business?\nRelated: yes
$ openai api completions.create -m ft:babbage-002:embyr::MODEL_ID -p "Is the following question related to Embyr and AI?\nQuestion:What is Star Wars?\nRelated:" -M 1
Is the following question related to Embyr and AI?\nQuestion:What is Star Wars?\nRelated: no

Test output

$ openai api completions.create -m ft:babbage-002:embyr::MODEL_ID -p "Is the following answer related to Embyr and AI?\nAnswer:We help businesses by providing the integration expertise and customized AI solutions they need to unlock the full potential of AI.\nRelated:" -M 1
Is the following answer related to Embyr and AI?\nAnswer:We help businesses by providing the integration expertise and customized AI solutions they need to unlock the full potential of AI.\nRelated: yes
$ openai api completions.create -m ft:babbage-002:embyr::MODEL_ID -p "Is the following answer related to Embyr and AI?\nAnswer:AI can help businesses by analyzing data, identifying patterns, and making recommendations. It can help businesses optimize operations, gain a competitive edge, and drive innovation.\nRelated:" -M 1
Is the following answer related to Embyr and AI?\nAnswer:AI can help businesses by analyzing data, identifying patterns, and making recommendations. It can help businesses optimize operations, gain a competitive edge, and drive innovation.\nRelated: yes
$ openai api completions.create -m ft:babbage-002:embyr::MODEL_ID -p "Is the following answer related to Embyr and AI?\nAnswer: Star Wars is a science fiction film franchise created by George Lucas. It consists of the six feature films released between 1977 and 2005, as well as a number of shorter works and the Star Wars expanded universe. The franchise centers on the Galactic Civil War between the Galactic Empire and the Rebel Alliance.\nRelated:" -M 1
Is the following answer related to Embyr and AI?\nAnswer: Star Wars is a science fiction film franchise created by George Lucas. It consists of the six feature films released between 1977 and 2005, as well as a number of shorter works and the Star Wars expanded universe. The franchise centers on the Galactic Civil War between the Galactic Empire and the Rebel Alliance.\nRelated: no

Now we have a way to vet both user input and generated output.
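At inference time, the prompts we send to the discriminators must match the training format exactly, down to the `\nRelated:` suffix. A pair of small helpers (a sketch; these names are our own, not from the OpenAI SDK) keeps the prompts in sync with the training data and interprets the single-token reply:

```python
def build_input_prompt(question):
    """Build the input-discriminator prompt exactly as it appeared in training."""
    return f"Is the following question related to Embyr and AI?\nQuestion: {question.strip()}\nRelated:"

def build_output_prompt(answer):
    """Build the output-discriminator prompt exactly as it appeared in training."""
    return f"Is the following answer related to Embyr and AI?\nAnswer: {answer.strip()}\nRelated:"

def is_related(completion_text):
    """Interpret the single-token completion (' yes' / ' no') from a discriminator."""
    return completion_text.strip().lower() == "yes"
```

Centralizing the prompt templates this way avoids the subtle drift between training and inference formats that can quietly degrade discriminator accuracy.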

Conclusion

And just like that, we’ve successfully assembled all the models required to develop an informative chatbot capable of answering Embyr-related questions. Up until now, our interaction with these models has been through the OpenAI CLI. However, our goal is to make these models readily accessible and user-friendly. In the upcoming blog posts, we will guide you through the process of creating a chatbot REST API and a UI that can be seamlessly integrated into the website.

Part 1: What are Chatbots, and why would I want one?

Part 2: From Markdown to Training Data?

Part 3: Fine-tune a Chatbot QA model

Part 5: Develop your Chatbot REST API

References

OpenAI Fine-tuning

Olympics example from OpenAI Cookbook

Completions API reference

pandas

sklearn