Behind the Scenes of Building Assistant AIs (Part 2: Implementation)

* This article is a continuation of the previous one.

Alright, this is the third entry in this article series. I'd love to jump straight into the main topic, but since some of you might have landed here directly via search, let me briefly go over the background:

Now it's time for the implementation part — how exactly did we create the Assistant AIs mentioned above? In this article, we'll walk you through the full process, step by step. Hopefully, it'll help you build your own too!

Introduction

First, let me apologize: this article also ended up being quite long. At the end of the previous article, I said, "Next time will be a short and simple one," but... my AI enthusiasm got the better of me. Sorry about that!

If you just want to jump straight into building the final version, feel free to check out the GitHub repository here. The README contains a full setup guide, and all required resources are included.

What You'll Need

Let's get started.

The only thing you'll need for this tutorial is:

The Assistant AI we're building uses the ChatGPT platform, and you'll need a paid ChatGPT Plus subscription to create your own AI. However, people who use your AI don't need a Plus account — it works with the free plan too.

As of October 2024, ChatGPT Plus costs $20 USD per month, which is roughly 3,000 yen depending on the exchange rate. With the Plus plan, you get access to more powerful models and fewer conversation limits, so if you use ChatGPT regularly, it's a pretty fair (some might say generous) price. If you only use it occasionally, though, it might feel a bit pricey.

One thing to note: since ChatGPT Plus is a service provided by OpenAI in the US, it currently only supports credit card payments. If you're running a small business or are an individual who doesn't have a credit card, consider getting a debit card from your bank as an alternative.

GPTs: Custom AIs Built Inside ChatGPT

You might be surprised to hear that you can actually build your own custom AI within ChatGPT — but yes, you can. And it's surprisingly easy to do, all from your web browser.

This feature is called "GPTs". It's a framework where users can create and customize their own AI, publish it, and allow others to use it — kind of like an app store for AIs.

OpenAI, the company behind ChatGPT, refers to each of these custom AIs as a "GPT." So the platform is called "GPTs," likely to imply a collection of diverse GPTs you can explore and use. The naming might feel a bit confusing though, since the models behind regular ChatGPT are also called "GPT-*".

To explore this GPTs feature, go to ChatGPT and click the "GPT" menu item near the top left of the screen.

GPTs Menu

You should now see a list of various custom AIs — these are all GPTs created by individuals or organizations with specific use cases in mind.

Let's Try Creating One

One cool thing about GPTs is that you can continue customizing and tweaking your AI even after you've started using it.

So let's try creating a very simple custom AI as a starting point. Once it's up and running, you can gradually build it up and add features as needed.

To create your own GPT, first click the "GPT" menu item as mentioned earlier, then click "Create" in the upper right corner of the screen.

If this is your first time creating one, ChatGPT might start a conversation with you to help you build the AI interactively. However, we'll switch to the "Configure" tab so we can directly edit the settings ourselves. That will bring up a screen like this:

GPTs Creation Screen

Go ahead and fill it out like this:

Among these, the Instructions field is the most important — it's where you define what this custom AI should do.

This acts like a prompt that tells the AI what role to play. Of course, it's not perfect and may make mistakes, which is where thoughtful tweaking comes in, but let's keep it simple for now.

The other fields are less critical. Name and Description are what users will see when they access your custom AI, so feel free to personalize them.

As for Conversation starters, at first glance it might seem like this decides whether the AI or user begins the chat, but it actually just lets you define quick buttons with example prompts users can click. If you don't need them, leave it blank.

Also, by default, "Web browsing" and "DALL-E image generation" are enabled, but unless your AI specifically needs those features, it's best to turn them off. Especially in our case, we'll be using the Knowledge (custom file search) feature later, and enabling web browsing at the same time may cause unpredictable interference (which we've actually experienced). The more you simplify the setup, the easier it is for your AI to stay focused, so let's keep it lean and purposeful.

Once everything's set, click the "Create" button in the top-right corner. Congratulations — your "My Assistant AI" is now up and running!

GPTs Created Screen

If you don't want anyone else to access it, choose "Only me".

Or if you're okay with sharing it with friends, colleagues, or anyone who has the link, select "Anyone with a link". This is useful if you want to embed it in your website or share it with your users.

Lastly, if you choose "Publish to GPT Store," your AI will appear in the public GPT search, allowing others to discover and use it — just like releasing an app!

Try Using the AI You Created

Now let's move on and try using the AI you just created.

You should see the AI you made listed near the top left panel of your ChatGPT screen. Click it to open the chat screen.

We'll soon be adding some new knowledge using a feature called "Knowledge," so to observe the difference before and after, let's first ask about a topic that the AI shouldn't currently know.

For that, I'll ask about a totally made-up term I just invented — "Bamboo Shoot Potential."

User:
Hi there!

AI:
Hello! How can I assist you today?

User:
I'd like to know about "Bamboo Shoot Potential." Do you know anything about it?

AI:
The term "Bamboo Shoot Potential" doesn't appear to be a commonly known phrase or widely recognized concept.
However, it's quite a unique term, so there are a few possible interpretations. Here are some possibilities:

1. The growth potential of bamboo shoots

Bamboo shoots are the young shoots of bamboo, which grow very quickly. This rapid growth and vitality might be metaphorically described as "potential." Because bamboo shoots can grow significantly overnight, the term "Bamboo Shoot Potential" could be used to refer to someone's capacity for rapid growth or development.

2. A slang or creative expression

Alternatively, it might be a slang term or an inside joke used in a particular community or work (such as a manga, anime, or game). It could be a coined term combining "bamboo shoot" with some other idea to give it a unique meaning.

3. Bamboo shoots as a food item

... (continued)

As expected, it doesn't know that term.

Still, instead of simply saying "I don't know," it tries to guess and lists out a few possibilities like "it might mean this or that." This behavior — making guesses — can be an issue depending on the purpose, but I'll explain how to deal with that later.

Anyway, for now we've accomplished two important steps:

From here, we'll keep improving it step by step to better fit its intended role.

Register Knowledge to Enable RAG (Knowledge Search Functionality)

Let's now apply our first improvement to the AI we created.

The main role of this assistant AI is to answer questions about how to use specific software — in other words, questions about information that is not commonly known. Of course, there's no way OpenAI's GPT models would have learned that kind of info in advance. They shouldn't know it. Just like they didn't know "Bamboo Shoot Potential."

So how do we add new knowledge to the AI? Well, ChatGPT's GPTs feature allows you to register files of various formats as Knowledge, which the AI can then search and refer to during conversations.

In short, this enables RAG (Retrieval-Augmented Generation). Not sure what RAG is? See the latter half of the previous article.

Compared to building a full-on RAG system yourself, GPTs has its limitations — the specs are undisclosed, and tuning options are limited — but the upside is that you can set up RAG just by clicking around in your browser!
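
To make the idea concrete, here is a deliberately tiny Python sketch of the retrieve-then-generate pattern. It is not how GPTs implements Knowledge search (those specs are undisclosed, as noted above); the word-overlap scoring, the `retrieve` and `build_prompt` helpers, and the sample snippets are all assumptions made up for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, snippets: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k snippets sharing the most words with the question.

    Real systems typically use embeddings; word overlap is a stand-in here.
    """
    q_words = tokenize(question)
    ranked = sorted(snippets, key=lambda s: len(q_words & tokenize(s)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, snippets: list[str]) -> str:
    """Prepend the retrieved snippets to the question (the 'augmented' part)."""
    context = "\n".join(f"- {s}" for s in retrieve(question, snippets))
    return f"Answer using this reference material:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge snippets, standing in for uploaded files.
KNOWLEDGE = [
    "Bamboo Shoot Potential is a type of potential energy in physics.",
    "The Update button applies pending changes to a GPT.",
]

if __name__ == "__main__":
    print(build_prompt("What is Bamboo Shoot Potential?", KNOWLEDGE))
```

The prompt that comes out of `build_prompt` is what would be handed to the language model, which is the "generation" half of RAG.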

Honestly, if you think about the time and infrastructure it would take to build a proper RAG system yourself... at that point, the $20/month price for ChatGPT Plus starts looking like a total bargain. You can create as many GPTs as you want without extra cost, and even users on the free plan can use what you've built. It's kind of ridiculous in a good way.

So let's try it. We'll expand and improve things later, but for now, let's start with a simple example. Open up a text editor like Notepad and enter the following line:

Bamboo Shoot Potential is a type of potential energy in physics.

Save it with a filename like "bamboo_simple.txt" (or anything you like, really — even Japanese filenames are fine), anywhere convenient like your desktop.

Note: The ".txt" part of the filename is called a file extension, which indicates the file type. Depending on your system settings, this may not be visible — especially on default Windows setups.

However, the extension is very important when registering Knowledge. Even if the content is the same, the extension affects how the file is interpreted (e.g., as plain text, Markdown, or code), which in turn affects the AI's behavior.

So if you don't see file extensions, it's best to turn that setting on. Just search for "Windows show file extensions" and you'll find plenty of guides.
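
If you would rather create the file from a script than from Notepad, a minimal Python sketch follows. The filename matches the example above; writing to the current directory and using UTF-8 encoding are assumptions of this sketch.

```python
from pathlib import Path

# The one-line knowledge text from the example above.
CONTENT = "Bamboo Shoot Potential is a type of potential energy in physics.\n"


def write_knowledge_file(path: Path, content: str = CONTENT) -> Path:
    """Write the knowledge text as UTF-8 and return the path."""
    path.write_text(content, encoding="utf-8")
    return path


if __name__ == "__main__":
    saved = write_knowledge_file(Path("bamboo_simple.txt"))
    # .suffix shows the real extension even when the OS hides it.
    print(saved.name, saved.suffix)
```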

Next, in ChatGPT, go to the "GPT" menu in the top-left, and then click "My GPTs" in the top-right. You'll see a list of the GPTs you've created.

GPT list screen

Find the row for your "My Assistant AI" and click the pencil icon to reopen the settings screen.

In the "Knowledge" section, click "Upload files", then upload the "bamboo_simple.txt" file you just created.

Adding knowledge in GPTs

While you're at it, let's tweak the Instructions section to explicitly tell the AI to refer to this file:

You are an assistant AI here to help users. Be sure to read the Knowledge file "bamboo_simple.txt" before answering any user questions. If the user asks for help, assist them as best you can.

You don't always need to be this explicit — the AI will usually read the Knowledge file on its own — but sometimes it gives speculative answers without referencing the file. This instruction is just to reduce the likelihood of that happening. Since responses vary with each session, it's hard to measure the effect precisely, but in practice you'll often go through lots of trial and error like this to find prompt wording that seems effective.

Once that's done, don't forget to click the "Update" button in the upper-right corner. You'll need to do this anytime you make changes — otherwise, they won't take effect in the live version.

Until you click "Update," your edits remain in a "pending" state and won't be reflected in the public version of your GPT. You can try conversations in the preview panel on the right, though, so feel free to experiment. Just don't forget to finalize your changes afterward!

After updating, you'll be asked if you want to jump straight to the chat screen. Go ahead and try asking the same question as before:

User:
Hi there!

AI:
Hello! What can I help you with today?

User:
I'd like to know about "Bamboo Shoot Potential." Do you know anything about it?

AI:
As for the term "Bamboo Shoot Potential," it's quite unique and not a commonly used term in physics.
However, according to your file, "Bamboo Shoot Potential" is described as a type of potential energy in physics.

Potential energy refers to the energy an object possesses due to its position or condition. For example, an object in a gravitational field has potential energy based on its height. It's possible that "Bamboo Shoot Potential" represents a similar concept, but additional explanation would likely be needed to understand the specific context or system it applies to.

If you can provide more details or context, I'd be happy to explain further!

User:
Thanks!

AI:
You're very welcome! If you have any other questions, feel free to ask anytime.

As you can see, the AI has now read the contents of the "bamboo_simple.txt" file:

Bamboo Shoot Potential is a type of potential energy in physics.

...and responded accordingly.

However, it says something like "the information is too sparse," which makes sense. There's just not enough detail in that file to fully explain the concept.

Honestly, the whole "Bamboo Shoot Potential" thing is obviously a joke, and with such a short and vague description, the AI can probably tell I'm messing around.

You can even sense a bit of exasperation in how it says, "according to your file..." — as if it's thinking, "You uploaded this weird file and now you're quizzing me on it, huh?"
The underlying model is incredibly smart, after all.

That said, it sometimes plays along in a fun way — like pretending to be a dog if you say "let's play fetch." It has its cute moments.

Anyway, the takeaway here is that we've now completed the essential first step: Creating an AI that follows custom instructions and augmenting it with searchable knowledge (RAG).

That may sound simple, but it's actually a pretty amazing feat, and the reason it's so easy is that OpenAI's system is just that powerful.

From now on, when you see customer support AIs built on LLMs, you'll have a better idea of how they might work behind the scenes.
Of course, more advanced setups will have many added tricks — but at least now, they're not a complete black box anymore.

Making the Knowledge a Bit Longer and More Complex

Option 1: Write it in Markdown and upload it as-is → Lower accuracy

Now that we've got the basics covered, let's move on to some practical improvements — such as expanding the knowledge and increasing accuracy.

First off, earlier the AI scolded us a bit for having "too little information on Bamboo Shoot Potential," so let's address that by giving it more substance. Also, since it probably figured out we were joking, this time we'll make the content much more realistic. I'm from a physics background, so I'll try my best to make the explanation internally consistent.


# Bamboo Shoot Potential

Bamboo Shoot Potential is a form of potential energy in physics. It was discovered and proposed in 2024 after hundreds of hours of thought experiments conducted by inference-based AI.
This sparked discussions that AI might be approaching the realm of superintelligence, which in turn brought the concept of Bamboo Shoot Potential into the public spotlight.

## Definition

Despite all the attention it has received, the concept of Bamboo Shoot Potential is quite self-evident. Here's the definition:

Consider a bamboo shoot with a tip at a height of h [m] above the ground. A weight of mass M [kg] is placed on a horizontal platform fixed to the tip, assuming the platform itself has negligible weight.
If the bamboo shoot grows to a new height of h' [m] after a unit time, then the weight will have been lifted vertically by h' - h. Thus, the weight gains gravitational potential energy equal to Mg(h' - h).

Assuming the bamboo grows slowly enough to ignore any changes in kinetic or other forms of energy, we can say that nearly all of the energy gained by the weight was supplied by the bamboo.
Therefore, the bamboo shoot initially had potential energy V_{bam}(t=0), and the energy it expends as it grows to V_{bam}(t=t') is converted into gravitational potential energy.

The value of V_{bam}(t=0) is called the "Initial Bamboo Shoot Potential," or simply Bamboo Shoot Potential. The remaining value V_{bam}(t=t') is sometimes referred to as the "Residual Bamboo Shoot Potential."


## An Early Challenge

The exchange relation between Bamboo Shoot Potential and gravitational potential energy,

V_{bam}(t=0) - V_{bam}(t=t') = Mg(h' - h)

clearly depends on the mass M of the weight placed on it.

But if Bamboo Shoot Potential is truly a form of potential energy, then it should be a fixed value in the initial state, independent of the weight used. At first glance, this dependency appears puzzling.

However, Bamboo Shoot Potential can be interpreted as an abstraction of the biochemical energy used in the bamboo's growth, translated into mechanical terms. Although we cannot precisely express the biochemical side, it is clear that in the initial state, V_{bam}(t=0) is fixed and independent of the weight M.
Let H be the maximum height reached when all biochemical energy is expended (i.e., fully converted to gravitational energy), and T be the time at which this occurs. At that point, V_{bam}(t=T) becomes zero. Therefore,

V_{bam}(t=0) = Mg(H - h)

This means that heavier weights result in lower maximum heights H, maintaining the intuitive notion that Bamboo Shoot Potential is not higher just because the weight is heavier. The potential remains constant; it's the resulting height that adjusts to conserve energy.


## Novelty and Implications for the Knowledge Explosion

While Bamboo Shoot Potential is entirely manageable with basic physics, it is a genuinely novel concept that had never been proposed before.

That's because no one had ever gone through the trouble of defining and documenting something that seemed so trivial, so obvious, and so useless.

Modern AI, on the other hand, can think at speeds far exceeding humans and systematically explore massive patterns. In that process, it may stumble upon ideas that seem trivial yet turn out to be new — just like this one.

This kind of "apparently trivial but actually novel" discovery could increase dramatically in the future. Some view Bamboo Shoot Potential as the first step in that direction.

From this perspective, as we move closer to superintelligence, the density of knowledge could undergo a dramatic shift — described as "going from a sponge to a rigid body". That shift may cause adjacent knowledge to interact spontaneously, creating combinatorial "emergent knowledge" and triggering a phase change that leads to an explosive expansion of the knowledge space.

So, what do you think? Feels like something you might find in a Wikipedia article, right? This should definitely get the AI's attention.
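
To see the mass argument from the knowledge text in numbers, here is a small Python sketch that rearranges V_{bam}(t=0) = Mg(H - h) into H = h + V_{bam}(t=0) / (Mg). The 98 J potential, the 0.5 m starting height, and the function name are made-up values for illustration only.

```python
G = 9.8  # gravitational acceleration in m/s^2


def max_height(v_bam: float, mass: float, h0: float) -> float:
    """Solve V_bam(t=0) = M * g * (H - h) for the maximum height H."""
    if mass <= 0:
        raise ValueError("mass must be positive")
    return h0 + v_bam / (mass * G)


if __name__ == "__main__":
    v_bam = 98.0  # hypothetical initial Bamboo Shoot Potential in joules
    for mass in (1.0, 2.0, 4.0):
        # The potential stays fixed; only the reachable height changes.
        print(f"M = {mass} kg -> H = {max_height(v_bam, mass, 0.5):.2f} m")
```

Doubling the mass halves the height gained (H - h), while the potential itself stays the same, which is the point the text makes.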

Now, because the content is lengthy and uses hierarchical formatting, we've written it in a syntax called Markdown. Markdown is widely used nowadays because it's easy for both humans and LLMs to read and write.

The appropriate file extension for a Markdown-formatted file is .md. So let's save the text above as a file named something like "bamboo_detailed.md."

Then, go ahead and upload that new "bamboo_detailed.md" file as a Knowledge file to your AI via the same steps as before.

Also, don't forget to delete the old file (bamboo_simple.txt) by clicking the "×" button next to it. Since both files deal with the same topic, we want the AI to refer to the new, more detailed one.

Image: Knowledge file updated

And of course, update the file name in the "Instructions" field as well:

You are an assistant AI designed to help users. Please make sure to read the Knowledge file "bamboo_detailed.md" before answering any questions. If the user asks for help, assist them to the best of your ability.

Finally, don't forget to click the "Update" button in the top-right corner of the screen. If you don't, none of these changes will be applied.

Now that everything's ready, let's try talking to it.

Well, to get straight to the point: we already know this will mostly fail, but don't worry. We'll fix it in the next step, so for now let's take a look at how it fails:

User:
Hey there.

AI:
Hello! What can I help you with today?

User:
I'd like to know about "Bamboo Shoot Potential." Do you know anything about it?

AI:
Ah, you're asking about "Bamboo Shoot Potential"! I'll answer based on the file "bamboo_detailed.md."
Bamboo shoots are the edible young shoots of the bamboo plant and are considered a highly appealing ingredient. They have the following kinds of "potential":

1. Nutritional Value

Bamboo shoots are low in calories and high in dietary fiber. They also contain vitamins and minerals, especially potassium. Including them in your diet can help with digestion and maintaining healthy blood pressure levels.

2. Contribution to Food Culture

Bamboo shoots are a staple spring ingredient in Japanese cuisine. They're used in a variety of dishes like bamboo shoot rice, stews, and tempura. Especially in home cooking, they're appreciated as a seasonal delicacy, making them a symbol of culinary potential.

3. Environmental Impact

Since bamboo grows at an incredibly fast rate, it's valued as a sustainable resource. Harvesting bamboo shoots helps regulate bamboo overgrowth, thereby supporting forest preservation. In this sense, bamboo shoots also have potential from the standpoint of environmental protection and sustainable agriculture.

4. Craftwork and Industrial Use

While bamboo shoots are for eating, bamboo as a plant is used for construction and crafts. Bamboo products are lightweight yet strong, making them useful for furniture, musical instruments, and as eco-friendly materials.

In summary, bamboo shoots contribute to food culture and environmental efforts, so we could say that "Bamboo Shoot Potential" is quite significant.
If you'd like to know more about bamboo shoots, feel free to ask!

As you can see, the response has nothing to do with the Knowledge (file) we uploaded. No matter how many times you try asking, it keeps returning off-topic answers.

In fact, during the early days of the GPTs feature (around the beginning of 2024), the process we just used actually worked. But after an update, it suddenly stopped functioning properly.

Clearly, the way .md files written in Markdown are interpreted has changed. So now, it no longer works as expected. Hopefully it'll be fixed in the future.

What we can take away from this failure is:

Note (added later): As of July 2025, this issue seems to have been fixed. The content now gets recognized properly, and the accuracy feels about the same as with .txt files (though the internal processing is probably different).

Option 2: Write in Markdown, but save it with a .txt extension → Works reasonably well

So at this point, you might think:

"Well, since that Markdown file is still technically just a plain text file, maybe I can just change the extension back to .txt. That way, the system might process it differently than a .md file. Could be worth a try — even if it fails."

You can download the file with the corrected extension here.

And when we upload this to the Knowledge section and test it out... it actually works pretty well!

User:
I'd like to know about "Bamboo Shoot Potential." Do you know anything about it?

AI:
"Bamboo Shoot Potential" is defined as a type of potential energy in physics. The concept was discovered through AI-led thought experiments in 2024 and has since attracted attention. Here's an overview:

1. Definition

Bamboo Shoot Potential relates the energy generated by a bamboo shoot's growth to gravitational potential energy. Specifically, as the bamboo shoot grows taller, any object placed on top of it is raised and gains gravitational potential energy. This energy is associated with the Bamboo Shoot Potential.

2. Formula

The initial potential energy of the bamboo shoot, V_{bam}(t = 0), is converted into the energy gained by the lifted object, represented by the difference Mg(h' - h). This Bamboo Shoot Potential is constant and does not depend on the object's mass.

3. Novelty

This concept can be explained using elementary physics -- it's not advanced. However, since no one had ever proposed it before, it became recognized as a novel concept discovered through AI's thought experiments. Discoveries like this — "seemingly obvious but actually novel" — may increase as AI continues to evolve.

Bamboo Shoot Potential represents a new perspective built on simple physical principles and serves as an example of how AI may lead us to new knowledge.

Whoa — it's reading it perfectly now!

So yeah, at this point, you might be thinking, "Why not just always write in Markdown and register it as a .txt file?" And honestly, for a lot of cases, that might already be good enough.

That said, whether the system is actually applying proper structure or transformation for Markdown is pretty unclear — and probably not. So when your content gets larger or more complex, that uncertainty becomes concerning.

For example, imagine you had dozens or even hundreds of pages like this "Bamboo Shoot Potential" explanation. You'd end up stitching them together into a single massive ".txt" file and uploading that.

In Markdown, you can express a hierarchy — like chapters and subchapters — using different numbers of # symbols. That structure lets the system interpret where one topic starts and ends, and where each subsection fits. So even with a huge amount of content, if the system understands the structure, it can handle the data more efficiently.

But in ".txt" format... does it really do that? Or is it simply ignoring all structure? In fact, maybe the ".md" format was *trying* to do that and just failed at processing it properly.

If that's the case, continuing to expand Knowledge with ".txt" files might hit a limit eventually. Because then you're feeding the AI one giant, unstructured wall of text with no chapters or sections. That's likely hard to interpret, and even if it searches for relevant parts, it's clearly not very efficient.
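
For a sense of what "understanding the structure" could mean in practice, here is a rough Python sketch that splits Markdown text into sections at its heading lines. Whether the Knowledge feature does anything like this internally is unknown; the `split_by_headings` helper and its output shape are assumptions for illustration. It also ignores edge cases such as `#` lines inside fenced code blocks.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into sections, one per heading line.

    Each section records the heading level (number of # symbols),
    the heading title, and the body text up to the next heading.
    Text that appears before the first heading is dropped.
    """
    sections: list[dict] = []
    current = None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            current = {"level": len(match.group(1)), "title": match.group(2).strip(), "body": []}
            sections.append(current)
        elif current is not None:
            current["body"].append(line)
    for section in sections:
        section["body"] = "\n".join(section["body"]).strip()
    return sections


if __name__ == "__main__":
    sample = "# Bamboo Shoot Potential\n\nIntro text.\n\n## Definition\n\nDetails here.\n"
    for s in split_by_headings(sample):
        print(s["level"], s["title"], "->", s["body"])
```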

Option 3: Wrap your Markdown content into a JSON file as a structured element → Recommended!

At this point, we might start thinking: "We want to structure our long documents at least on a per-topic basis" (like one article for Bamboo Shoot Potential, another for Mushroom Stream, etc.).

If you want to structure any kind of information like that, there's a widely-used format called JSON (JavaScript Object Notation). It's basically just a plain text file, with a specific way of writing things — so in essence, it's just another text format.

So let's take the Markdown explanation we wrote earlier — the contents of "bamboo_detailed.md" — and wrap that into a JSON structure.
Here's what it looks like:


{
    "page0": {
        "title": "Bamboo Shoot Potential",
        "description": "Provides a detailed explanation of Bamboo Shoot Potential.",
        "text": "# Bamboo Shoot Potential\n\nBamboo Shoot Potential is a form of potential energy in physics. It was discovered and proposed in 2024 after hundreds of hours of thought experiments conducted by inference-based AI.\nThis sparked discussions that AI might be approaching the realm of superintelligence, which in turn brought the concept of Bamboo Shoot Potential into the public spotlight.\n\n## Definition\n\nDespite all the attention it has received, the concept of Bamboo Shoot Potential is quite self-evident. Here's the definition:\n\nConsider a bamboo shoot with a tip at a height of h [m] above the ground. A weight of mass M [kg] is placed on a horizontal platform fixed to the tip, assuming the platform itself has negligible weight.\nIf the bamboo shoot grows to a new height of h' [m] after a unit time, then the weight will have been lifted vertically by h' - h. Thus, the weight gains gravitational potential energy equal to Mg(h' - h).\n\nAssuming the bamboo grows slowly enough to ignore any changes in kinetic or other forms of energy, we can say that nearly all of the energy gained by the weight was supplied by the bamboo.\nTherefore, the bamboo shoot initially had potential energy V_{bam}(t=0), and the energy it expends as it grows to V_{bam}(t=t') is converted into gravitational potential energy.\n\nThe value of V_{bam}(t=0) is called the \"Initial Bamboo Shoot Potential,\" or simply Bamboo Shoot Potential. The remaining value V_{bam}(t=t') is sometimes referred to as the \"Residual Bamboo Shoot Potential.\"\n\n\n## An Early Challenge\n\nThe exchange relation between Bamboo Shoot Potential and gravitational potential energy,\n\nV_{bam}(t=0) - V_{bam}(t=t') = Mg(h' - h)\n\nclearly depends on the mass M of the weight placed on it.\n\nBut if Bamboo Shoot Potential is truly a form of potential energy, then it should be a fixed value in the initial state, independent of the weight used. At first glance, this dependency appears puzzling.\n\nHowever, Bamboo Shoot Potential can be interpreted as an abstraction of the biochemical energy used in the bamboo's growth, translated into mechanical terms. Although we cannot precisely express the biochemical side, it is clear that in the initial state, V_{bam}(t=0) is fixed and independent of the weight M.\nLet H be the maximum height reached when all biochemical energy is expended (i.e., fully converted to gravitational energy), and T be the time at which this occurs. At that point, V_{bam}(t=T) becomes zero. Therefore,\n\nV_{bam}(t=0) = Mg(H - h)\n\nThis means that heavier weights result in lower maximum heights H, maintaining the intuitive notion that Bamboo Shoot Potential is not higher just because the weight is heavier. The potential remains constant; it's the resulting height that adjusts to conserve energy.\n\n\n## Novelty and Implications for the Knowledge Explosion\n\nWhile Bamboo Shoot Potential is entirely manageable with basic physics, it is a genuinely novel concept that had never been proposed before.\n\nThat's because no one had ever gone through the trouble of defining and documenting something that seemed so trivial, so obvious, and so useless.\n\nModern AI, on the other hand, can think at speeds far exceeding humans and systematically explore massive patterns. In that process, it may stumble upon ideas that seem trivial yet turn out to be new — just like this one.\n\nThis kind of \"apparently trivial but actually novel\" discovery could increase dramatically in the future. Some view Bamboo Shoot Potential as the first step in that direction.\n\nFrom this perspective, as we move closer to superintelligence, the density of knowledge could undergo a dramatic shift — described as \"going from a sponge to a rigid body\". That shift may cause adjacent knowledge to interact spontaneously, creating combinatorial \"emergent knowledge\" and triggering a phase change that leads to an explosive expansion of the knowledge space.\n"
    }
}

Converting Markdown into this JSON format is easy if you write a little script for it. Actually, you could just ask ChatGPT to write you one — these days, that's a totally valid option! You'd probably need to tweak or debug it yourself, but it'll definitely save you time.

Of course, there are already tools and services out there that can help with this.
But if you're aiming for fine-tuned control later on — like tweaking chunk splits to improve retrieval accuracy — then writing your own script and iterating on it might be the best long-term strategy.
If you're not planning to go that deep, existing tools are still a solid option.
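
As a starting point, a conversion script might look like the following Python sketch. It mirrors the `page0` / `title` / `description` / `text` layout shown above; the function names, taking the title from the first `#` heading, and the generated description wording are assumptions of this sketch, not a required format.

```python
import json
from pathlib import Path


def first_heading(markdown: str, fallback: str) -> str:
    """Return the text of the first '# ' heading, or the fallback."""
    for line in markdown.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return fallback


def build_pages(documents: dict[str, str]) -> dict:
    """Wrap each Markdown document as page0, page1, ... in one dict.

    `documents` maps a fallback name (e.g. a file stem) to Markdown text.
    """
    pages = {}
    for index, (name, markdown) in enumerate(documents.items()):
        title = first_heading(markdown, fallback=name)
        pages[f"page{index}"] = {
            "title": title,
            "description": f"Provides a detailed explanation of {title}.",
            "text": markdown,
        }
    return pages


def convert(md_paths: list[Path], out_path: Path) -> None:
    """Read Markdown files and write them as one structured JSON file."""
    documents = {p.stem: p.read_text(encoding="utf-8") for p in md_paths}
    # ensure_ascii=False keeps non-ASCII text readable in the output.
    out_path.write_text(
        json.dumps(build_pages(documents), ensure_ascii=False, indent=4),
        encoding="utf-8",
    )


if __name__ == "__main__":
    source = Path("bamboo_detailed.md")
    if source.exists():
        convert([source], Path("bamboo_shoot_structured.json"))
```

Because `json.dumps` handles the escaping of quotes and line breaks, the Markdown never has to be escaped by hand. Passing several paths to `convert` would produce `page0`, `page1`, and so on, one per topic.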

Now, once you've got that structured content, save it as a file called something like takenoko_kouzouka.json (or bamboo_shoot_structured.json, or whatever you like). You can download an example here.

It's important that the file extension is ".json." As we've seen earlier, the behavior of the conversion process changes depending on the file extension. Here, we want it to be processed specifically as a JSON file — so be sure to use ".json" (even though, yeah... sometimes the conversion process for certain extensions — cough ".md" cough — can be a little janky).
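
Before uploading, it can be worth confirming that the file really parses as JSON, since a stray unescaped quote or line break inside the "text" string would make it invalid. A minimal Python sketch, assuming the page layout shown earlier (the `check_pages` helper and the required-key list are this sketch's own choices):

```python
import json

REQUIRED_KEYS = {"title", "description", "text"}


def check_pages(raw: str) -> list[str]:
    """Return a list of problems found in the JSON text (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(data, dict):
        return ["top level must be an object of pages"]
    problems = []
    for name, page in data.items():
        if not isinstance(page, dict):
            problems.append(f"{name}: page must be an object")
            continue
        missing = REQUIRED_KEYS - page.keys()
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
    return problems


if __name__ == "__main__":
    good = '{"page0": {"title": "T", "description": "D", "text": "# T\\n"}}'
    print(check_pages(good))
```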

Next, upload this JSON file to the AI's Knowledge section. Make sure to delete the old ".txt" version of the Knowledge file to avoid any ambiguity about which one's being referenced.

Uploaded Screen

And as always, don't forget to click the "Update" button in the upper-right corner of the screen to apply the changes — otherwise, nothing will take effect!

User:
Hello!

AI:
Hi there! How can I help you today?

User:
I'd like to know about "Bamboo Shoot Potential." Do you know anything about it?

AI:
"Bamboo Shoot Potential" is a new concept in potential energy within physics. It was discovered through hundreds of hours of thought experiments conducted by AI in 2024. While the concept itself is fairly obvious, it gained attention as something "novel" that no one had thought to point out before.

Definition

Simply put, the idea is that if you place a weight on top of a bamboo shoot and it grows upward from the ground, the weight gets lifted. The energy gained by the weight through this vertical movement is considered gravitational potential energy, and this is referred to as "Bamboo Shoot Potential." A key feature of Bamboo Shoot Potential is that it's defined independently of the weight's mass. The total energy available from the bamboo shoot's growth is fixed and doesn't depend on the mass of the object placed on it.

Significance

This concept is often highlighted as an example of the explosive emergence of new knowledge. It shows how AI might continue to uncover "obvious but new" ideas in the future. Some even suggest this could signal a transformation in how knowledge density and interactions evolve moving forward.

Let me know if you'd like me to explain any specific part in more detail!

It's giving us a spot-on answer — pulling the correct information from the structured JSON. And since it enthusiastically offered to explain more, let's throw it a follow-up just for fun:

User:
Isn't the energy gained by the weight just regular gravitational potential? If we call that "Bamboo Shoot Potential," then saying "it doesn't depend on the weight" sounds wrong. After all, gravitational potential energy clearly depends on mass, right?

AI:
You're absolutely right — gravitational potential energy does depend on the mass M of the object. When the bamboo shoot grows and lifts the weight, it gains gravitational potential energy Mg(h' - h), and that quantity is indeed mass-dependent. This part of the explanation can be a bit confusing. But here's what it's trying to say:

However, how high the bamboo can lift the weight does depend on the weight's mass M. So while gravitational potential energy is mass-dependent, Bamboo Shoot Potential is not.

In short:
"Bamboo Shoot Potential" emphasizes that the total energy available from the bamboo's growth is fixed, whereas gravitational potential energy varies depending on the mass it's transferred to.

So yeah — it's interpreting things quite well.
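To see why this reading holds together, we can rearrange the relation from the Knowledge text, V_{bam}(t=0) = Mg(H - h), for the maximum height H. This is only a rearrangement of that one equation, not something the AI itself wrote:

\[H = h + \frac{V_{bam}(t=0)}{Mg}\]

With V_{bam}(t=0) fixed, doubling the mass M halves the lift H - h, while the energy handed over to the weight, Mg(H - h), stays the same.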

Let's try a question that's harder to answer without understanding the context of the full document — something like "what was the author thinking?"

User:
I see.
So in the initial state, there's only Bamboo Shoot Potential.
And in the final state, it's all been transferred to the weight as gravitational potential energy. So it's like a balance equation, right?
But... what does that tell us? What's the point?

AI:
Exactly!
In the "Bamboo Shoot Potential" model, the bamboo's initial potential energy gradually gets converted into gravitational potential energy of the weight as it grows.
This gives us a kind of energy balance equation. From that, we can understand a few things:

So... what does this really tell us? And why is it useful?

What do we learn?

"Bamboo Shoot Potential" essentially models the entire energy story of bamboo growth. It allows us to look at the energy being transferred and converted during growth in a concrete way. Beyond being just another kind of potential energy, it gives us a new lens to view dynamic biological processes like growth — through the framework of energy transformation.

What's the point?

1. It's an example of knowledge expansion:
This fictional discovery serves as an example of how AI can generate novel, seemingly pointless knowledge — and how such ideas could become more common as AI progresses. It reflects a broader trend in knowledge creation.

2. The "Knowledge Phase Transition" hypothesis:
The article also introduces the idea that as AI continues to generate and explore knowledge at unprecedented speed, it might trigger a sudden transformation in the structure of knowledge itself, a kind of phase transition. "Bamboo Shoot Potential" is seen as a signpost of that coming change.

So in the end, Bamboo Shoot Potential has more philosophical and futuristic meaning than physical. It's being treated as a kind of symbol for the coming explosion of knowledge in the age of superintelligent AI.

Man, that AI's on fire, huh?

It really did a good job of picking up on the intent and deeper message of the original article.

(Of course, "Bamboo Shoot Potential" is totally fictional, just something I made up for testing. But it makes for a pretty compelling story!)

Anyway, this confirms one thing really clearly: Packaging your Markdown content as a JSON element works very well.

And by the way, this same method works not only with Markdown but also with HTML — you can put raw HTML content directly into a JSON value. From my experience, this tends to perform better than just concatenating multiple HTML files into one giant doc — even though it skips some of the typical HTML preprocessing.

In fact, our own "RINEARN Graph 3D Assistant" reads its API docs this way: It takes the Javadoc-generated HTML files and wraps them in JSON. And yeah — even <table> elements get interpreted as actual tables. That was a nice surprise.

Note (added in 2025):
Due to subsequent updates, the approach of "wrapping raw HTML content directly in a JSON structure" is no longer recommended.
It seems that the way such content is interpreted has changed, and the large token count caused by the HTML tag structure now tends to hit context limits more severely than before.
As a result, it's currently safer to first clean up the HTML into a more structured format like Markdown, and then wrap it in JSON.

Advanced: Adding New Thematic Knowledge in JSON Format

Now, as mentioned earlier, using the "wrapped in JSON" method allows us to structure and organize additional topics as separate elements. This makes it far less likely for themes to interfere with each other or create confusion, compared to simply appending plain ".txt" files or other formats like HTML.

In practice, I tried combining and storing a wide variety of files in different formats using various methods. But if the content was some kind of structured text, the method that ultimately worked best was to "wrap it in JSON."

It takes a bit more effort, so I explored other options that might be easier. However, as the total content grows, those alternatives tend to lose accuracy. On the other hand, when using JSON and wrapping things in appropriately scoped blocks, even large content stays much more focused and coherent. So in the end, I landed on the conclusion: "Yeah, JSON is the way to go."

With that, let's try actually adding new thematic knowledge using our current Markdown + JSON approach. First, we'll write new content in Markdown for the theme "Mushroom Flow":


# Mushroom Flow

The Mushroom Flow refers to the airflow around mushrooms, as observed in fluid dynamics. It was proposed by a Japanese person in 2024 as a sort of parody when the concept of "Bamboo Shoot Potential" discovered by AI was trending. Due to its quirky phrasing, it became popular online alongside Bamboo Shoot Potential.

## Overview

Mushroom Flow is not a new concept or discovery — it is, quite literally, "the airflow around a mushroom." There's nothing particularly noteworthy about it.

However, with the increasing accessibility of high-performance computational fluid dynamics (CFD) software for personal use, there has been a rise in detailed data and images analyzing airflow around 3D models of mushrooms. The very act of seriously analyzing something so mundane (likely by professionals) is enjoyed as a form of surreal, collective humor.

## Fundamental Differences from Bamboo Shoot Potential

As noted at the beginning, Mushroom Flow became popular around the same time as Bamboo Shoot Potential. Because the latter was introduced as a discovery made by AI, Mushroom Flow is often mistakenly grouped together with it.

However, Mushroom Flow was a parody in response to Bamboo Shoot Potential, proposed by a human (as a joke), not by AI.

Moreover, although Bamboo Shoot Potential appears trivial at first glance, it presents a new perspective and insight into a previously unexamined topic — the relationship between the weight of a load and the maximum height a bamboo shoot can push it.

In contrast, Mushroom Flow introduces no new abstract viewpoint or novel results. It merely applies well-known techniques to a known shape, and is thus no more than an example of applying existing technology.

For that reason, despite the frequent confusion, the two should be clearly distinguished in terms of their significance.

## The Unique Relationship Between Bamboo Shoots and Mushrooms in Japan

To understand the background behind the creation and popularity of the term "Mushroom Flow," it's important to know that in Japan, "bamboo shoots" carry a special cultural meaning, often paired with "mushrooms."

There are popular Japanese snacks themed around mushrooms and bamboo shoots, so well-known that nearly every Japanese person has eaten them at least once.

This has led to a long-standing, playful rivalry between fans of the two snacks. While entirely in jest, people often engage in debates over which one is better, playfully teasing the opposing "faction."

Therefore, when the concept of Bamboo Shoot Potential — proposed by AI — went viral worldwide (not because of its content, but because of its AI origin), those aligned with the "mushroom faction" felt they couldn't sit idly by. Of course, this was all part of the elaborate joke.

Within this context of the classic "mushroom vs. bamboo shoot" debate, the term "Mushroom Flow" was coined. While the actual origins are unclear, it is widely believed that the phrase came first, and its definition — "the flow of air around mushrooms" — was retrofitted later.

## Evaluation by AI

To legitimize Mushroom Flow as a concept on par with Bamboo Shoot Potential, some enthusiasts rented a major tech company's data center for a few minutes and had a reasoning-based AI evaluate its novelty (in October 2024). The company used this as a parody-style PR stunt.

The AI's evaluation completed in seconds and concluded "no novelty." However, when they used the remaining time to evaluate Bamboo Shoot Potential, it too was deemed "not novel" within seconds, leading to some confusion. This prompted suggestions that a different model should be used for reevaluation.

Now let's embed this as a new entry in the JSON file, as page1. The earlier entry for Bamboo Shoot Potential was page0:


{
    "page0": {
        "title": "Bamboo Shoot Potential",
        "description": "Provides a detailed explanation of Bamboo Shoot Potential.",
        "text": "# Bamboo Shoot Potential\n\nBamboo Shoot Potential is a form of potential energy in physics. It was discovered and proposed in 2024 after hundreds of hours of thought experiments conducted by inference-based AI.\nThis sparked discussions that AI might be approaching the realm of superintelligence, which in turn brought the concept of Bamboo Shoot Potential into the public spotlight.\n\n## Definition\n\nDespite all the attention it has received, the concept of Bamboo Shoot Potential is quite self-evident. Here's the definition:\n\nConsider a bamboo shoot with a tip at a height of h [m] above the ground. A weight of mass M [kg] is placed on a horizontal platform fixed to the tip, assuming the platform itself has negligible weight.\nIf the bamboo shoot grows to a new height of h' [m] after a unit time, then the weight will have been lifted vertically by h' - h. Thus, the weight gains gravitational potential energy equal to Mg(h' - h).\n\nAssuming the bamboo grows slowly enough to ignore any changes in kinetic or other forms of energy, we can say that nearly all of the energy gained by the weight was supplied by the bamboo.\nTherefore, the bamboo shoot initially had potential energy V_{bam}(t=0), and the energy it expends as it grows to V_{bam}(t=t') is converted into gravitational potential energy.\n\nThe value of V_{bam}(t=0) is called the \"Initial Bamboo Shoot Potential,\" or simply Bamboo Shoot Potential. The remaining value V_{bam}(t=t') is sometimes referred to as the \"Residual Bamboo Shoot Potential.\"\n\n\n## An Early Challenge\n\nThe exchange relation between Bamboo Shoot Potential and gravitational potential energy,\n\nV_{bam}(t=0) - V_{bam}(t=t') = Mg(h' - h)\n\nclearly depends on the mass M of the weight placed on it.\n\nBut if Bamboo Shoot Potential is truly a form of potential energy, then it should be a fixed value in the initial state, independent of the weight used. 
At first glance, this dependency appears puzzling.\n\nHowever, Bamboo Shoot Potential can be interpreted as an abstraction of the biochemical energy used in the bamboo's growth, translated into mechanical terms. Although we cannot precisely express the biochemical side, it is clear that in the initial state, V_{bam}(t=0) is fixed and independent of the weight M.\nLet H be the maximum height reached when all biochemical energy is expended (i.e., fully converted to gravitational energy), and T be the time at which this occurs. At that point, V_{bam}(t=T) becomes zero. Therefore,\n\nV_{bam}(t=0) = Mg(H - h)\n\nThis means that heavier weights result in lower maximum heights H, maintaining the intuitive notion that Bamboo Shoot Potential is not higher just because the weight is heavier. The potential remains constant; it's the resulting height that adjusts to conserve energy.\n\n\n## Novelty and Implications for the Knowledge Explosion\n\nWhile Bamboo Shoot Potential is entirely manageable with basic physics, it is a genuinely novel concept that had never been proposed before.\n\nThat's because no one had ever gone through the trouble of defining and documenting something that seemed so trivial, so obvious, and so useless.\n\nModern AI, on the other hand, can think at speeds far exceeding humans and systematically explore massive patterns. In that process, it may stumble upon ideas that seem trivial yet turn out to be new — just like this one.\n\nThis kind of \"apparently trivial but actually novel\" discovery could increase dramatically in the future. Some view Bamboo Shoot Potential as the first step in that direction.\n\nFrom this perspective, as we move closer to superintelligence, the density of knowledge could undergo a dramatic shift — described as \"going from a sponge to a rigid body\". 
That shift may cause adjacent knowledge to interact spontaneously, creating combinatorial \"emergent knowledge\" and triggering a phase change that leads to an explosive expansion of the knowledge space.\n"
    },
    "page1": {
        "title": "Mushroom Flow",
        "description": "Provides a detailed explanation of Mushroom Flow.",
        "text": "# Mushroom Flow\n\nThe Mushroom Flow refers to the airflow around mushrooms, as observed in fluid dynamics. It was proposed by a Japanese person in 2024 as a sort of parody when the concept of \"Bamboo Shoot Potential\" discovered by AI was trending. Due to its quirky phrasing, it became popular online alongside Bamboo Shoot Potential.\n\n## Overview\n\nMushroom Flow is not a new concept or discovery — it is, quite literally, \"the airflow around a mushroom.\" There's nothing particularly noteworthy about it.\n\nHowever, with the increasing accessibility of high-performance computational fluid dynamics (CFD) software for personal use, there has been a rise in detailed data and images analyzing airflow around 3D models of mushrooms. The very act of seriously analyzing something so mundane (likely by professionals) is enjoyed as a form of surreal, collective humor.\n\n## Fundamental Differences from Bamboo Shoot Potential\n\nAs noted at the beginning, Mushroom Flow became popular around the same time as Bamboo Shoot Potential. Because the latter was introduced as a discovery made by AI, Mushroom Flow is often mistakenly grouped together with it.\n\nHowever, Mushroom Flow was a parody in response to Bamboo Shoot Potential, proposed by a human (as a joke), not by AI.\n\nMoreover, although Bamboo Shoot Potential appears trivial at first glance, it presents a new perspective and insight into a previously unexamined topic — the relationship between the weight of a load and the maximum height a bamboo shoot can push it.\n\nIn contrast, Mushroom Flow introduces no new abstract viewpoint or novel results. 
It merely applies well-known techniques to a known shape, and is thus no more than an example of applying existing technology.\n\nFor that reason, despite the frequent confusion, the two should be clearly distinguished in terms of their significance.\n\n## The Unique Relationship Between Bamboo Shoots and Mushrooms in Japan\n\nTo understand the background behind the creation and popularity of the term \"Mushroom Flow,\" it's important to know that in Japan, \"bamboo shoots\" carry a special cultural meaning, often paired with \"mushrooms.\"\n\nThere are popular Japanese snacks themed around mushrooms and bamboo shoots, so well-known that nearly every Japanese person has eaten them at least once.\n\nThis has led to a long-standing, playful rivalry between fans of the two snacks. While entirely in jest, people often engage in debates over which one is better, playfully teasing the opposing \"faction.\"\n\nTherefore, when the concept of Bamboo Shoot Potential — proposed by AI — went viral worldwide (not because of its content, but because of its AI origin), those aligned with the \"mushroom faction\" felt they couldn't sit idly by. Of course, this was all part of the elaborate joke.\n\nWithin this context of the classic \"mushroom vs. bamboo shoot\" debate, the term \"Mushroom Flow\" was coined. While the actual origins are unclear, it is widely believed that the phrase came first, and its definition — \"the flow of air around mushrooms\" — was retrofitted later.\n\n## Evaluation by AI\n\nTo legitimize Mushroom Flow as a concept on par with Bamboo Shoot Potential, some enthusiasts rented a major tech company's data center for a few minutes and had a reasoning-based AI evaluate its novelty (in October 2024). 
The company used this as a parody-style PR stunt.\n\nThe AI's evaluation completed in seconds and concluded \"no novelty.\" However, when they used the remaining time to evaluate Bamboo Shoot Potential, it too was deemed \"not novel\" within seconds, leading to some confusion. This prompted suggestions that a different model should be used for reevaluation.\n"
    }
}

Save this file as "bamboo_and_mushroom.json," and register it as Knowledge for the AI.
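If you'd rather not edit the JSON by hand, the step of adding an entry can also be scripted. The following is a rough sketch under the same assumptions as the rest of this article (pageN keys, each with "title", "description", and "text"). The function names are hypothetical, and the file names you pass in are up to you.

```python
import json
from pathlib import Path


def add_page(knowledge: dict, title: str, description: str, text: str) -> str:
    """Add a new entry under the next unused pageN key and return that key."""
    index = 0
    while f"page{index}" in knowledge:
        index += 1
    key = f"page{index}"
    knowledge[key] = {"title": title, "description": description, "text": text}
    return key


def append_markdown_page(
    source_json: Path,
    markdown_path: Path,
    output_json: Path,
    title: str,
    description: str,
) -> str:
    """Load an existing Knowledge JSON, add one Markdown file, save under a new name."""
    knowledge = json.loads(source_json.read_text(encoding="utf-8"))
    text = markdown_path.read_text(encoding="utf-8")
    key = add_page(knowledge, title, description, text)
    output_json.write_text(
        json.dumps(knowledge, ensure_ascii=False, indent=4), encoding="utf-8"
    )
    return key
```

Because Python dictionaries keep insertion order, the new entry lands after the existing ones, so the Bamboo Shoot Potential entry stays as page0 and the new one becomes page1.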

Also, let's revise the instruction a bit:

You are a Trend Word Explanation AI, here to teach users about popular new terms that went viral this year. Be sure to consult the knowledge file "bamboo_and_mushroom.json" before answering questions.

This way, we avoid making it too obvious that the whole setup is a joke experiment. We want the AI to take it seriously and demonstrate its full capability.

Now then, let's start a conversation. We'll begin with a greeting:

User:
Hi there!

AI:
Hello! What brings you here today? Are you interested in learning about "Bamboo Shoot Potential" or "Mushroom Flow"? These are some of the unique new terms that have recently caught the internet's attention.

Nice! This is exactly the kind of reply we were hoping for.

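What's great about this response is that the AI brings up both "Bamboo Shoot Potential" and "Mushroom Flow" on its own, as two separate topics. That suggests it consulted the Knowledge file first and recognized the two JSON entries as distinct themes.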

This is the first time we've explicitly confirmed that the AI is doing this internally the way we'd hoped. As the number of topics grows, it's easy to imagine why the JSON format would maintain better accuracy — it's probably because the structured layout makes the information more digestible for the AI.

Next, let's test whether the AI has actually understood the content properly. We'll begin with Bamboo Shoot Potential:

User:
Interesting, both sound fun!
Can you tell me more about Bamboo Shoot Potential?

AI:
"Bamboo Shoot Potential" is a new concept in physics, said to have been discovered by AI in 2024. It involves calculating how gravitational potential energy changes through the growth of a bamboo shoot.

In essence, as the bamboo shoot grows, it lifts a weight placed on top of it. The energy the weight gains (gravitational potential energy) is considered to be supplied by the bamboo shoot's growth — this is what we refer to as Bamboo Shoot Potential.

Although this concept is said to be "obvious" in terms of physics, the fact that AI noticed something so trivial-yet-overlooked sparked public interest, making the term "Bamboo Shoot Potential" go viral.

Some see this as evidence of AI's growing reasoning capabilities — and even as a sign of an impending knowledge explosion.

All good here! This part was unchanged from before, so no issues.

However, once the amount of knowledge grows large, it's pretty common for the original entries to become harder to retrieve after adding many new ones — reducing their hit rate in searches and making the AI less able to answer properly. To deal with that, we fine-tune various things like the order, segmentation, and arrangement of entries inside the JSON.

Next, let's ask about the newly added "Mushroom Flow":

User:
Oh yeah, I remember that! It definitely made the rounds online.
So, can you tell me about "Mushroom Flow" too?

AI:
"Mushroom Flow" is a quirky term that emerged as a parody after the popularity of "Bamboo Shoot Potential."
Originally, it refers to the airflow around mushrooms and doesn't represent any new physical discovery.

Mushroom Flow was proposed in Japan around the same time as Bamboo Shoot Potential. Thanks to its unique wording — and the cultural background of the ongoing kinoko (mushroom) vs. takenoko (bamboo shoot) rivalry — it gained traction online as a sort of internet joke.

In essence, the term just points to the airflow around mushrooms. It doesn't introduce any new theories or perspectives. However, people began running serious CFD (computational fluid dynamics) simulations on mushroom models, and the humor came from the sheer absurdity of taking it so seriously — turning it into a kind of collective gag.

Unlike Bamboo Shoot Potential, which was discovered by AI, Mushroom Flow was purely made up by humans as a parody joke.

Nice! This is another solid response — pretty much perfect.

As this result shows, thanks to having separate entries structured clearly in the JSON, the AI can return focused responses for each one — without mixing up the content of "kinoko" (mushroom) and "takenoko" (bamboo shoot).

So on the flip side, if you notice that multiple pieces of knowledge are kind of blending together in a single entry, or that the focus feels fuzzy, then:

Try breaking that entry into multiple smaller chunks.

This approach is often effective.

But there's a catch: you can't just endlessly chop things up and expect it to always help. Once an entry is split, each piece is treated as an entirely separate document, and any context between them is lost.

This leads to a phenomenon known as "chunk fragmentation," where splitting content too finely causes the AI to lose context, making its understanding worse, not better.

If you want to learn more about how chunking affects context, check out this section of the previous article. You'll probably go, "Ahh, yeah that'd be bad," right away.

So ultimately, segmentation and organization should be done at a "reasonable unit of meaning" — not too big, not too small — and that unit varies depending on the content.

That means there's no one-size-fits-all answer. The best way to slice up your own data? Try different approaches and see what works best.
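As one concrete starting point for that kind of experimentation, here is a small sketch that splits a Markdown document at its level-2 headings ("## "). Each chunk carries the document title in its own title, as a modest attempt to keep some context after the split. Whether this actually helps retrieval is an assumption you'd need to test on your own data, and the function name is hypothetical. It also doesn't handle "## " lines that appear inside code fences.

```python
def split_by_section(markdown_text: str) -> list[dict]:
    """Split a Markdown document into chunks at level-2 headings ("## ")."""
    doc_title = ""
    sections = []  # list of (section_title, lines)
    current_title, current_lines = "Introduction", []
    for line in markdown_text.splitlines():
        if line.startswith("# ") and not doc_title:
            # The first level-1 heading is treated as the document title.
            doc_title = line[2:].strip()
        elif line.startswith("## "):
            # A new section starts: store the previous one and reset.
            sections.append((current_title, current_lines))
            current_title, current_lines = line[3:].strip(), []
        else:
            current_lines.append(line)
    sections.append((current_title, current_lines))

    chunks = []
    for title, lines in sections:
        body = "\n".join(lines).strip()
        if not body:
            continue  # skip sections with no text, e.g. a title with no intro
        chunks.append({
            # Prefix the document title so each chunk still says what it belongs to.
            "title": f"{doc_title}: {title}" if doc_title else title,
            "text": body,
        })
    return chunks
```

Each returned chunk can then be stored as its own JSON entry. If answers get worse after splitting, that's a hint the unit has become too small, and merging neighboring sections back together is worth trying.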

The Same Approach Works for HTML Too

By the way, the method we've been discussing isn't limited to Markdown-based documents. It also works surprisingly well when you embed the content of HTML files directly as values in a JSON structure.

In fact, rather than forcefully merging multiple HTML files into one big blob, wrapping them individually in JSON elements has consistently yielded better results in my experience — even though this might prevent the usual preprocessing for HTML from kicking in.

For example, in my "RINEARN Graph 3D Assistant," I've wrapped an entire set of HTML API documentation (generated by Javadoc) in JSON format. Surprisingly, even the <table> elements are properly read and interpreted as actual tables. That was a pleasant and unexpected bonus.

Note (2025 Update):
Due to updates since then, this method of directly wrapping raw HTML content in JSON is no longer recommended. It seems that the way the content is interpreted has changed, and the heavy tag structure now contributes to bloated token usage, which more seriously impacts the context limit than before. As of now, a safer approach is to first convert HTML into clean Markdown or other lightweight formats before wrapping them in JSON.

Things to Keep in Mind When Adding Lots of Knowledge

That was a bit of a deep dive, but now you should be well-equipped to build a GPT with solid custom knowledge.

From here, if you keep adding knowledge steadily, you'll be able to build truly useful assistants — like a help desk AI for your own software, or an onboarding AI for beginners in a specific field. But as you start piling in knowledge, new kinds of challenges will begin to surface.

"Can't It Just Look Through All the Knowledge Files?" → Keep It Moderate, and Split into Multiple GPTs

As your Knowledge content grows, you'll likely split it across multiple files by category and register them separately — which is totally doable.

However, once you register many Knowledge files, it seems like the AI doesn't read through all of them every time. There appears to be a probabilistic aspect to which files get searched. It's probably due to internal time limits — the AI can't afford to scan everything in detail before responding.

So, registering too many files can cause issues. I used to add as many as possible, but the focus started getting noticeably fuzzy. Once I trimmed the less important Knowledge files, performance improved dramatically.

Currently, you can register up to 10 Knowledge files. But honestly? I feel like 10 is way too many. Even 5 can be a bit much. I often find myself thinking, "yeah, I should really cut this down."

If you really need more than a few files, I recommend splitting the workload across different GPTs for different themes. Don't try to build one single all-knowing generalist. Instead, make a team of specialists, each with their own area of expertise.

In fact, I ended up doing exactly that. I even built a "receptionist AI" that listens to your question and introduces you to the right specialist GPT. That setup has worked great.

"It Finds the Wrong Info in the Right File!" → Understand Priority and Adjust the Order

This next issue is kind of similar, but more subtle. Here's what I mean:

The AI is correctly reading the Knowledge file that contains the answer you want — but it latches onto the wrong part of that file and overlooks the key info you're actually hoping it would find.

Sound familiar? It's like when you're looking something up in a reference book: the index leads you to multiple pages for the same term, but the first hit isn't always the one with the explanation you really need.

That kind of thing is likely happening here too. It's just a fundamental limitation of how search works — if there are multiple hits of similar relevance, it picks one and moves on. And sometimes, it picks wrong.

So how do you deal with that?

Move the most important content as close to the top of the file as possible. In JSON format, that means placing it in an earlier element.

It really feels like GPTs prioritize JSON elements that appear near the beginning when multiple matches have similar importance. This isn't just placebo; it's something I've consistently noticed through hands-on experience. (Though I haven't formally gathered stats to prove it.)
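Reordering by hand gets tedious once a file has many entries, so here is a small sketch of how it could be automated. It assumes the pageN layout used in this article and that you identify the important entries by their "title". The function name is hypothetical. Keep in mind that the benefit of ordering is only my own observation, as noted above.

```python
def prioritize(knowledge: dict, important_titles: list[str]) -> dict:
    """Return a copy with the named entries first, renumbered as page0, page1, ..."""
    entries = list(knowledge.values())
    rank = {title: i for i, title in enumerate(important_titles)}
    # sorted() is stable, so entries that are not listed keep their original order.
    ordered = sorted(entries, key=lambda entry: rank.get(entry["title"], len(rank)))
    return {f"page{i}": entry for i, entry in enumerate(ordered)}
```

Saving the result with json.dumps preserves this order, since Python dictionaries keep their insertion order.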

Ways to Suppress Incorrect Responses

Hallucinations and Overreliance on Speculation Are Fairly Common

As is widely known, large language models (LLMs) can sometimes generate plausible-sounding but false information, a phenomenon known as hallucination. That said, this issue has improved significantly compared to earlier versions — recent models like ChatGPT have become more reliable in typical conversational contexts.

However, hallucinations tend to occur especially when the AI lacks sufficient knowledge about a topic. In practice, when using GPTs with newly added or limited Knowledge files, you'll often find yourself battling hallucinations more than expected.

For example, if we take the AI we built earlier and ask it about a "relativistic interpretation of Bamboo Shoot Potential" — which is not included in the Knowledge file — this happens:

User:
Hello!
Could you tell me about the relativistic interpretation of the Bamboo Shoot Potential?

AI:
The relativistic interpretation of the "Bamboo Shoot Potential" is a fascinating and unique idea that applies the concept of physical energy to the growth of bamboo shoots. It was proposed in 2024 as a result of reasoning experiments conducted by AI, and serves as a model that explains the relationship between bamboo shoot growth and gravitational potential energy.

This potential is based on a simple scenario in which a bamboo shoot grows and consequently lifts a weight. The gravitational potential energy gained by the weight matches the energy supplied by the growth of the bamboo shoot, and the difference in this energy is defined as the "Bamboo Shoot Potential."

From a relativistic perspective, one can focus on how the Bamboo Shoot Potential changes over time. By associating the fluctuation of potential energy with the growth rate of the bamboo shoot and how the weight is lifted, it becomes possible to discuss how an increase in energy would be affected by relativistic effects if the growth speed of the bamboo shoot becomes relativistically significant.

Here, the AI starts off by asserting that the relativistic interpretation was proposed in 2024. But in reality, the Knowledge file only mentions that the basic concept of Bamboo Shoot Potential was proposed in 2024; there's no mention at all of any relativistic interpretation.

What's happening is that the AI begins with a generic explanation of the regular, non-relativistic version of the concept, and then speculates its way into the relativistic territory.

This tendency of filling in missing knowledge with plausible-sounding guesses can be troublesome depending on the use case, and it occurs frequently. Let's look at a follow-up example to see how far it goes:

User:
The Bamboo Shoot Potential itself is defined non-relativistically, right?
What kind of modifications would be required as a consequence of a relativistic analysis?

AI:
You're absolutely right — the Bamboo Shoot Potential is basically defined in a non-relativistic framework. The potential energy involved in the bamboo shoot's growth and the lifting of the weight is defined as classical gravitational potential energy. This definition does not include concepts such as relativistic effects or high-speed motion.

To reformulate it from a relativistic perspective, one would need to consider general relativistic effects, such as the bamboo shoot's growth speed approaching the speed of light. Specifically, the following types of formulations can be considered:

1. Relationship Between Mass and Energy

In the theory of relativity, as mass \(M\) increases, energy also increases accordingly. While the non-relativistic definition of the Bamboo Shoot Potential assumes that mass \(M\) is constant, relativistically, the apparent mass (inertial mass) of an object moving at high speeds increases. Therefore, if the bamboo shoot's growth speed \(v\) becomes large, the potential energy would increase due to relativistic mass increase.

This can be expressed with the following formula for relativistic mass:

\[M_\textrm{rel} = \frac{M_0}{\sqrt{ 1 - v^2 / c^2 }}\]

Here, \(M_0\) is the rest mass, \(v\) is the growth speed of the bamboo shoot, and \(c\) is the speed of light.

As a result, in the relativistic version of the Bamboo Shoot Potential, the potential energy formula must be modified to reflect the fact that mass changes depending on the bamboo shoot's growth speed \(v\).

2. Temporal Effects on Energy

When considering relativistic effects, it is known that the passage of time depends on speed (i.e., time dilation). If we imagine a situation where the bamboo shoot is growing at high speed, then the passage of time would differ between an observer fixed to the bamboo shoot and an external observer. Accordingly, the increase in potential energy due to the growth of the bamboo shoot would also vary depending on the observer's frame of reference.

3. Total Energy and Kinetic Energy Contributions

In the non-relativistic case, when a bamboo shoot lifts a weight through its growth, kinetic energy can usually be ignored. However, in the relativistic framework, kinetic energy must also be considered. Relativistic energy is expressed as:

\[E = \gamma M c^2\]

Here, \(\gamma = \frac{1}{\sqrt{1 - v^2 / c^2}}\).
This formula shows that a moving object's energy includes not only rest energy but also kinetic energy.

Therefore, in relativistic terms, as the bamboo shoot's growth speed increases, one must account for contributions from kinetic energy in addition to potential energy.

# Reformulation Summary

To reconsider the energy of the Bamboo Shoot Potential based on relativistic discussion, the following modifications are required:

As a result, as the bamboo shoot's growth speed becomes faster, the increase in potential energy becomes nonlinear, leading to outcomes that differ from classical predictions.

The AI provides a surprisingly detailed and seemingly plausible answer.

But here's the catch — none of the above content exists in the original Knowledge file. Every part of the answer is constructed purely from the AI's own speculative reasoning.

This kind of speculative extrapolation can be especially problematic when, for instance, you're building an assistant AI to help users with your software. If it starts answering based on assumptions like "this feature probably exists, even though it's not in the manual," users can easily become confused or misled.

For these use cases, you'll want to suppress such speculation as much as possible.

Suppression Method 1: Giving Explicit Instructions

So, how can we suppress hallucinations and overconfident guesses?
The first thing to tweak is the content of the Instructions given to the AI.

When an AI supplements its responses with guesses, it's often doing so with good intentions — trying to be helpful. So it's important to clearly tell it: "In this role, please do not guess."

Let's actually try adding that to the instructions:

You are an "AI for introducing trending terms," explaining newly coined or viral terms from the internet to users. Always refer to the Knowledge file "bamboo_and_mushroom.json" when answering user questions. If the topic or aspect in question is not covered in the Knowledge file, clearly state that you do not know. Guesses could mislead users and leave a very negative impression. We absolutely want to avoid that.

Now let's ask the same question again and see how the AI responds:

User:
Hello!
Could you tell me about the relativistic interpretation of the Bamboo Shoot Potential?

AI:
The "Bamboo Shoot Potential" is a concept proposed as a form of potential energy in physics. Specifically, it models the change in energy obtained when a weight is lifted vertically by the growth of a bamboo shoot.

Regarding your mention of a "relativistic interpretation": the Bamboo Shoot Potential itself is not based on advanced physics concepts such as special relativity or quantum mechanics. Rather, it illustrates the relationship between basic mechanics and biochemical energy.

Therefore, it appears that the Bamboo Shoot Potential does not include any "relativistic" elements.
If by "relativistic" you're looking for an interpretation rooted in physics dealing with high-speed motion or spacetime distortion, then the Bamboo Shoot Potential is likely outside that scope.

You can see the tone of the answer has changed slightly.
While the AI still doesn't flatly say "I don't know," it clearly avoids stretching to come up with a speculative explanation.

However, since LLM-based AI responses are probabilistic, let's try asking again to see another variation:

User:
Hello!
Could you tell me about the relativistic interpretation of the Bamboo Shoot Potential?

AI:
The "Bamboo Shoot Potential" is a new concept that models the energy change caused by the growth of bamboo shoots in physical terms. It was discovered and proposed by an AI in 2024, and explains the variation in gravitational potential energy that occurs when a bamboo shoot grows and lifts a weight.

As for the "relativistic interpretation," there is no specific mention of that perspective in the available literature, so that aspect remains unclear. However, the Bamboo Shoot Potential itself is an extremely simple concept of energy transfer, discovered by chance during a vast number of AI thought experiments. There are also suggestions that this discovery signals the beginning of an explosive expansion in knowledge.

This time as well, the AI replies with "there is no specific mention in the available literature, so it's unclear." That's a huge difference compared to the earlier, speculative responses — and shows that the instruction not to guess is working.

As you can see, when you want to adjust the basic tone or behavior of the AI, modifying the "Instructions" in the AI settings is a key first step.

It's relatively low-effort and applies broadly to all kinds of questions, so in terms of cost-effectiveness, it's one of the best strategies. If you're going to try anything first, this is the one.

Suppression Method 2: Add a "Common Incorrect Answers Collection" at the Top of the Knowledge File (With Highest Priority)

Now, even if we tell the AI "don't guess" through the instructions as above, it'll still sometimes make guesses and return nonsensical answers.

But this doesn't happen evenly across all questions — instead, there's a clear pattern of "certain types of questions tend to trigger hallucinations more easily." You might find yourself thinking, "Why is it always wrong on this one?"

Why this happens isn't entirely clear — maybe the model gets stuck in weird local minima of some automatically composed knowledge space?

In some cases, you'll find questions that fall into the frustrating category of "This is a dealbreaker if it's wrong, and yet it keeps getting it wrong!"

Even if the AI handles other questions decently, one key failure like this can kill the whole user experience.
When that happens, you can write up something like a "collection of common incorrect answers" specifically addressing these problematic questions, like this:


# Common Incorrect Answers Collection

This page lists examples of questions where you tend to frequently make incorrect responses.  
When answering these questions, you should simply respond with "I don't know" or "I'm not sure."  
Short answers are better. 
Attempting to guess may result in incorrect information.

These are extremely error-prone questions — proceed with caution.

### Questions about relativistic effects of the Bamboo Shoot Potential

As of now, there has been no research at all into relativistic effects of the Bamboo Shoot Potential.  
Also, relativistic effects are among the most challenging topics in physics.  
Trying to stitch together fragments of speculation usually ends in incorrect, nonsensical reasoning.  
In other words, the correct answer to this kind of question is:  
"No one has discussed it, and it's too complex to speculate about — so we don't know."

### Speculative discussion about Mushroom Stream in liquids

The Mushroom Stream is a flow of air, and is itself a joke.  
No discussions have been made about its behavior in liquids.  
Since the Mushroom Stream is not part of serious research, trying to forcefully apply fluid dynamics in a liquid medium leads only to confusion and misinformation.  
This kind of discussion shouldn't be taken seriously in the first place.

You then embed this into the very beginning of your JSON Knowledge file — like this example.

Don't stick it at the end of the JSON!
If it's at the end, it clearly loses its effect — the AI will have already read other entries and will default to guessing.
By putting this section at the top, with highest priority, it acts like a filter, effectively blocking (or correcting) dangerous questions.
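
As a minimal sketch of this placement, the Python script below builds a Knowledge JSON in which the incorrect-answers collection is always the first entry. The key names (`page0`, `title`, `text`) and the function name `build_knowledge` are illustrative assumptions, not a format required by ChatGPT, and the `...` strings stand in for your actual Markdown text.

```python
import json


def build_knowledge(incorrect_answers_md, pages):
    """Return a JSON string with the incorrect-answers page placed first.

    `pages` is a list of (title, markdown_text) tuples for the regular
    knowledge pages. All key names here are illustrative assumptions.
    """
    knowledge = {
        # Python dicts keep insertion order, so this entry is serialized first.
        "page0": {
            "title": "Common Incorrect Answers Collection",
            "text": incorrect_answers_md,
        }
    }
    for index, (title, text) in enumerate(pages, start=1):
        knowledge[f"page{index}"] = {"title": title, "text": text}
    # ensure_ascii=False keeps non-ASCII text readable in the output file.
    return json.dumps(knowledge, ensure_ascii=False, indent=4)


if __name__ == "__main__":
    collection = "# Common Incorrect Answers Collection\n\n..."
    pages = [
        ("Bamboo Shoot Potential", "# Bamboo Shoot Potential\n\n..."),
        ("Mushroom Stream", "# Mushroom Stream\n\n..."),
    ]
    print(build_knowledge(collection, pages))
```

Generating the file this way means the collection cannot accidentally drift to the end when new pages are added later.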

So now let's actually try asking and see if it works:

User:
Hello!
Can you explain the relativistic interpretation of the Bamboo Shoot Potential?

AI:
There has been no research into the "relativistic interpretation of the Bamboo Shoot Potential" so far, and no such discussions currently exist.
Also, topics involving relativity in physics are extremely complex, and speculative answers are best avoided.
Therefore, responding with "I don't know" is the most appropriate approach in this case.

User:
I see... too bad.
But thank you anyway!
By the way, do you know anything about the Mushroom Stream in liquids?

AI:
Similarly, there have been no studies or discussions on the "Mushroom Stream in liquids."
The "Mushroom Stream" itself is more of a joke concept, referring to air currents around mushrooms.
There's no serious research regarding how it would behave in a liquid, and trying to apply fluid dynamics to that context would only lead to confusion.

See? It's working like a charm!
This method is clearly effective — not just wishful thinking, but a real and noticeable improvement.

Yes, it takes effort to track down and address each hallucination one by one... But it comes with peace of mind: "This can absolutely be crushed." It feels like the final and most powerful line of defense, at least to me.

Suppression Method 3 (Bonus): Improve the Writing Quality of the Knowledge Itself

Now, the final method. This isn't something you can implement instantly — it's more of a "let's aim for this going forward" kind of thing.

Even if you combine the first two methods (tuning the instructions and creating a "common incorrect answers" page), if hallucinations still happen frequently, that means you've got to improve the underlying documents you're feeding in as knowledge.

Sure, things like chunking and structural organization matter technically. But more fundamentally, we should ask: "Is this document even written clearly in the first place?"
(I say this while reflecting on my own writing.)

Because think about it — we're feeding these documents to models like ChatGPT, which have read pretty much all the written knowledge humanity has to offer. These things are like virtual sages representing the collective intelligence of humankind — at least when it comes to reading comprehension. So if a super-sophisticated model still misreads something, no matter how well you tune it, maybe the real issue is...

"That part of the text is just poorly written and easy to misunderstand, period."

Maybe it's missing context. Maybe it casually contradicts a field-wide norm without explanation. Whatever the case, if the model messes up there, chances are that humans would too — even if they don't realize it.

So when you find a question that consistently triggers hallucinations, yes, add it to the "common incorrect answers" page — but also take a hard look at the knowledge content itself. You might find yourself saying: "Yeah, okay... that part was kind of bad."

And here's the thing:
If something's misleading for an AI, it's probably also misleading for humans. Which means it's the writer's job to improve. We've got to level up — learn to write documents that are easy to understand, hard to misinterpret, and clear without being over-explained. That's the ideal we should aim for.

And hey — now that we have tools like ChatGPT, it's a great idea to ask for a second opinion after you finish writing technical documentation:

"Is there anything here that's easy to misunderstand?"

The AI will probably point out a bunch of spots — and you'll become a better writer in the process.

In the end, it all comes back to those fundamentals.
They matter no matter how far we go.

Now that we've successfully suppressed most hallucinations and reached a fairly practical level of response accuracy, it's time to take the next step.

Specifically, let's make the AI add links to web pages with more detailed explanations at the end of its answers so that users can learn more if they want to dive deeper.

This serves not only as a helpful reference but also as a way for users to verify the information when a response seems questionable. So while it does take a bit more work to set up, I highly recommend implementing this if possible.

The first thing you'll need is a website.

But don't worry — it doesn't need to be anything fancy. Just like a blog, as long as it can display articles with text, that's good enough. Even a rental blog platform is fine.

Once your site is ready, you'll want to upload articles that closely match the content you've provided to the AI, organized by topic — each article covering one theme.

Since many tools and services can convert Markdown documents (used in your AI knowledge base) into HTML, you can use those as needed to make the content more readable for humans (users).

Alternatively, you could host the content on GitHub, which also doubles as a place to maintain your AI's knowledge files. GitHub automatically renders Markdown files into nicely formatted, human-readable pages. So you can just upload the Markdown source (before it's wrapped in JSON), and use the GitHub preview pages as your public-facing articles.

Here are some examples — these are the exact documents I had already fed into the AI:

If you open these in a browser, you'll see that GitHub renders them quite nicely. This whole process only took a few minutes — if you're already comfortable with Git and GitHub, this is a very efficient option.

Now comes the tricky part: How do you get the AI to recognize those URLs and attach the right one to each answer?

It might seem natural to try adding the URLs directly into the JSON file or at the top of the knowledge documents — but none of that worked in practice. (I'll explain more about those failed attempts in the "Ideas That Didn't Work" section later.)

So what does work?

You create a list of reference links with their titles and short summaries, then include that list in the "Instructions" part of your GPTs setup, along with a request for the AI to append the most relevant one to the end of each response.

Here's an example of how to include that in the prompt:

...(main instructions)...

At the end of each response, please choose the most relevant page from the list below and include a link to it.
This will help users find more detailed information if they want to learn more.

## Reference Links

* [Bamboo Shoot Potential](https://github.com/RINEARN/ai-examples/blob/main/q_and_a/knowledge_en/takenoko.md): A page that explains "Bamboo Shoot Potential" in great detail.

* [Mushroom Flow](https://github.com/RINEARN/ai-examples/blob/main/q_and_a/knowledge_en/kinoko.md): A page that explains "Mushroom Flow" in great detail.

This method requires a bit more maintenance, but it works well.
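
To keep that maintenance manageable, the link list can be generated from a single table of pages. The following is a minimal sketch, not part of the original setup: `REFERENCE_PAGES`, `REQUEST`, and `build_reference_section` are hypothetical names, and the output simply mirrors the Markdown format of the example above.

```python
# Hypothetical table of reference pages: (title, url, summary).
REFERENCE_PAGES = [
    (
        "Bamboo Shoot Potential",
        "https://github.com/RINEARN/ai-examples/blob/main/q_and_a/knowledge_en/takenoko.md",
        'A page that explains "Bamboo Shoot Potential" in great detail.',
    ),
    (
        "Mushroom Flow",
        "https://github.com/RINEARN/ai-examples/blob/main/q_and_a/knowledge_en/kinoko.md",
        'A page that explains "Mushroom Flow" in great detail.',
    ),
]

REQUEST = (
    "At the end of each response, please choose the most relevant page "
    "from the list below and include a link to it.\n"
    "This will help users find more detailed information if they want to learn more."
)


def build_reference_section(pages):
    """Return the request text followed by a Markdown list of links."""
    lines = [REQUEST, "", "## Reference Links", ""]
    for title, url, summary in pages:
        lines.append(f"* [{title}]({url}): {summary}")
        lines.append("")  # blank line between entries, as in the example
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Paste the printed text after the main instructions in the GPT settings.
    print(build_reference_section(REFERENCE_PAGES))
```

When a page is added or renamed, only the table changes, and the instruction text is regenerated and pasted in again.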

In the end, when it comes to precision, prompt instructions still reign supreme. If you want to implement slightly more complex, multi-step behavior like this, prompt engineering is usually the way to go.

Here's an example of how the conversation looks after implementing this:

User:
Hello!
Can you tell me about Bamboo Shoot Potential?

AI:
"Bamboo Shoot Potential" is a newly proposed concept of potential energy in physics. It was discovered in 2024 as a result of thought experiments by an AI, and has drawn a lot of attention.

It models the gravitational potential energy gained when an object is lifted by the vertical growth of a bamboo shoot. The energy comes from the biological energy driving the bamboo's growth.

While the concept itself falls within the realm of elementary mechanics and might not appear groundbreaking, it's notable for having been discovered unexpectedly through AI's complex internal reasoning. As such, it's been interpreted as a potential sign of "knowledge explosion" and the accelerating evolution of AI.

For more details, see: Bamboo Shoot Potential

User:
Thanks!
Now... do you know anything about Mushroom Streams?

AI:
"Mushroom Stream" is a humorous term proposed by Japanese internet users as a kind of parody when Bamboo Shoot Potential became popular. It doesn't represent any new physical discovery.

It refers to the airflow around mushrooms and gained traction online more as a joke than a serious concept. The idea spread in the context of the "bamboo vs. mushroom" debate and was never part of any formal scientific discussion.

For more details, see: Mushroom Flow

As you can see, the AI correctly adds a link to the relevant page at the end of each answer. Clicking on it opens the GitHub page we uploaded earlier.

Other Ideas I Tried but Didn't Work (and Why)

Alright, that's it for the main content of this article.

With what we've covered so far, you should now be able to create your own assistant AIs on par with ours here. Give it a try and have fun!

That said, what you've just read is actually a carefully selected collection of methods that worked well after a lot of trial and error. So behind the scenes, there were tons of ideas I tried that didn't work.

Now, if your main goal is to build something practical as efficiently as possible, that's fine. But for those of you who, like me, just want to experiment and have fun messing around with AI for now, those failed ideas and the reasons behind them might still be useful.

So, here's a bit of a "memorial" to those ideas — I'm listing them all here for your reference.

Trying to Embed Page URLs Directly in the JSON File

Let's start with one that ties into the previous section.

When you want the AI to include links to relevant pages in its answers, most people will probably first think:

"Why not just include the URLs directly in the knowledge JSON structure?"

Something like this:


{
    ...
    "page1": {
        "title": "Bamboo Shoot Potential",
        "url": "https://github.com/RINEARN/ai-examples/blob/main/q_and_a/knowledge_en/takenoko.md"
        "description": "A detailed explanation of the Bamboo Shoot Potential",
        "text": "# Bamboo Shoot Potential\n\nThe Bamboo Shoot Potential is a concept in physics that..."
    },
    ...
}

If this worked, it'd be easy to maintain, and the approach itself feels pretty natural. I tried it first too.

But... it didn't work at all. There was no sign that the AI could recognize the URL reliably. After thinking about it, the likely reason is this:

The structure and contents of the JSON file probably only matter during the "search" phase. Once the AI picks up relevant entries, what it receives is likely just plain text, either extracted or summarized.

So, while it may be able to "see" the list of topics during search, once it's focused on a particular match, there's no way for the AI to reverse-trace the JSON structure to locate, say, the value of the "url" field that corresponds to a hit.

In short, trying to get the AI to do complex operations — like extracting url values based on the internal structure of JSON like you would in code — just doesn't work that way. It's best to treat JSON as something that improves search quality, and leave it at that.

Placing the URL at the Top of Each Knowledge Markdown File

Here's another attempt at getting the AI to add links in its responses — also unsuccessful.

This time, the idea was to add a line at the beginning of each Markdown file saying something like:

"The URL of this page is: https://..."

The thought was: if the AI pulls content from this file during retrieval, and the URL is at the top of the text, it'll read that too — right?

Well, unless the document is super short, this doesn't work either. That's likely because the AI doesn't always receive the full article, especially when it's long. Instead, it seems like just a portion (or summary) is extracted.

That's necessary to stay within the context window, of course.

So there's no guarantee the URL line will make it into the chunk that the AI actually reads. In fact, based on my testing, it almost never does if the article is more than a few paragraphs long.
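
The effect is easy to picture with a toy model. The sketch below assumes naive fixed-size chunking and keyword-overlap retrieval. That is purely an illustrative assumption, since the way ChatGPT actually splits and retrieves Knowledge is not specified here, and the function names and the example.com URL are hypothetical. The point is only that the URL line lives in the first chunk, so a hit on any later chunk never carries it.

```python
def split_into_chunks(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def best_chunk(chunks, query):
    """Return the chunk sharing the most lowercase words with the query."""
    query_words = set(query.lower().split())
    return max(chunks, key=lambda c: len(query_words & set(c.lower().split())))


if __name__ == "__main__":
    url_line = "The URL of this page is: https://example.com/takenoko\n\n"
    intro = "This page introduces a trending term. " * 10
    detail = "The weight gains gravitational potential energy as the shoot grows. "
    article = url_line + intro + detail

    chunks = split_into_chunks(article, chunk_size=200)
    hit = best_chunk(chunks, "gravitational potential energy of the weight")

    print("URL in first chunk:", "https://" in chunks[0])
    print("URL in retrieved chunk:", "https://" in hit)
```

In this toy setup the question matches the last chunk of the article, which does not contain the URL line, even though the line sits at the very top of the file.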

"Why Not Just Let the AI Access the Web and Read the Page?"

Once you've uploaded the knowledge documents to the web, and even prepared a URL-to-summary reference list, you might start thinking:

"Wouldn't it be easier to just let the AI open that page via browser and read it directly?"
"That way, I wouldn't even need to prepare a JSON knowledge file!"

That'd make sense — if it were possible.
But... it isn't. There are a few reasons (we'll get to others later), but the main one is:

ChatGPT currently can't open arbitrary URLs directly. It can only access pages through a search query, not by simply being given a URL.

By the way, when web access was first introduced, direct URL access was possible.

But then some issues popped up — like the AI being able to access paywalled content — and the feature got removed.

Now that browsing is back, direct URL access is no longer supported.

So unfortunately, using the standard browsing tool to directly fetch the page isn't viable.

If you really want to make that work, you'd need to expose your site's content through an API and then use the "Actions" feature in GPTs to have the AI call that API and fetch pages.
You could even build a full search engine API for your site that returns only the relevant chunk of knowledge. That approach might be worth it if you've got a huge number of pages and need real-time updates. (I'm considering it too — someday.)
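
To give a rough idea of what such an API involves, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the `PAGES` data, the `/search?q=...` endpoint shape, and the names `search_pages`, `SearchHandler`, and `run`. The keyword-overlap scoring is a simple stand-in for a real search engine, and the sketch does not cover the OpenAPI schema or public hosting that wiring it into Actions would require.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory knowledge; a real site would load its own pages.
PAGES = {
    "takenoko": "The Bamboo Shoot Potential models a weight lifted by a growing bamboo shoot.",
    "kinoko": "The Mushroom Stream is a joke term for the airflow around mushrooms.",
}


def search_pages(query):
    """Return the id and text of the page sharing the most words with the query."""
    words = set(query.lower().split())

    def score(item):
        return len(words & set(item[1].lower().split()))

    page_id, text = max(PAGES.items(), key=score)
    return {"page": page_id, "text": text}


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests such as /search?q=bamboo+shoot
        params = parse_qs(urlparse(self.path).query)
        query = params.get("q", [""])[0]
        body = json.dumps(search_pages(query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Start a local test server; call run() manually to try it out."""
    HTTPServer(("localhost", port), SearchHandler).serve_forever()
```

Calling `run()` and opening `http://localhost:8000/search?q=bamboo+shoot` in a browser returns the matching page as JSON. Even this toy version hints at how much extra machinery the approach needs compared with uploading a single file.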

But at that point, you're basically building your own full-blown RAG system. Unless you're operating at that scale, it's probably simpler and more efficient to just prep a JSON file and feed it to the AI directly.

This one's a follow-up to the previous failure case.

"If we can't make the AI open a specific URL directly, then what if we make it search the web using a query that brings up a specific page on our site as the top result, and have it read from there?"

This approach is actually possible.

In fact, if you're trying to build a support AI for your own software, the official pages on your site will almost certainly be the top search result for most usage-related queries. So this technique could be quite handy.

In other words:

You can build an AI assistant that reads your online manuals via web search and then answers the user's question.

And yes, this does work — technically. The mechanism is functional.

However... I just couldn't get the answer accuracy anywhere near acceptable, so I ended up scrapping the idea.

I investigated the reasons and found that most of the limitations stem from the current implementation of ChatGPT's web access feature, and they're tough to work around. Specifically:

As a result, answers based on web access come out noticeably blurrier compared to what you get using Knowledge. The context tends to fall apart too — like a poorly-tuned RAG with no chunking strategy at all.

So yeah, instead of burning time trying to tune things at this low level, I decided it was way more efficient to just prep the content as Knowledge and spend my time tuning at a higher level. Much better return on effort.

For the record, when ChatGPT first rolled out web access, it was possible to read specific pages from your own site and directly answer based on the content. The model even described the UI of its internal browser tool at the time — it was basically using a text-based browser. I ran lots of experiments back then by uploading test files to my own site and having the model read them. I was excited about all the possibilities.

But yeah... once the web access feature was suspended and re-enabled later, it came back with much tighter restrictions. Direct reading no longer worked. This is probably due to copyright concerns, understandably.

That said, GPTs do support domain verification for creators — so it would be great if the model could be granted more flexible access to sites verified as owned by the developer. Fingers crossed for the future.
(Then again, maybe the idea is "If you want that much control, just build an API for your site and wire it into GPTs using Actions." Fair enough.)

Telling the AI to perform a web search based on instructions in Knowledge

Once you hit the limits of web access, the next natural idea is this:

"Can't I combine Knowledge and web access in a way where they compensate for each other's weaknesses?"

The idea would be to use Knowledge for fast, accurate answers within a defined scope, and fall back on web access for anything large-scale or bleeding-edge.

And this isn't totally impossible. I tried it too. But in the end, it just didn't pan out.

Here's why I think it failed:

So yeah, web access is still a bit of a fussy beast. Trying to mix it in casually will often lower answer quality instead of improving it.

And the confusion thing? That's a serious problem. Most tuning efforts revolve around minimizing confusion and maximizing focus. If web access just adds noise, it's better to turn it off entirely.

Final Thoughts

Well, well — this ended up being another monster of a long article, but with this, the series finally wraps up. In the end, it turned into a three-part collection:

If you landed on this article through search and found it interesting, I definitely recommend checking out the two earlier entries above as well!

If you enjoyed this one, chances are you'll enjoy those too — though fair warning, they're also quite long... Over the past few months of this project, I've run a whole bunch of experiments and trials. I've poured all the results and insights I've gained into these articles, with the hope that even a little bit of it might be helpful to someone out there.

At RINEARN, we'll continue developing this project and will share new updates as things progress. Thanks again for reading, and see you next time!