Policy on allowing GPTBot web crawler to index the site.

Block the GPT web crawler?

  • Yes
    Votes: 4 (36.4%)
  • No
    Votes: 5 (45.5%)
  • As an AI, I don't have an opinion on this, but also, No.
    Votes: 2 (18.2%)

  Total voters: 11

pete

I have a thing installed that detects any new web crawler bots indexing the content here:


Potential New Bots Detected
  • mozilla/5.0 applewebkit/537.36 (khtml, like gecko; compatible; gptbot/1.0; +OpenAI Platform)

which is this lad:



Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety. Below, we also share how to disallow GPTBot from accessing your site.

Should I block it?
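
For anyone wondering what blocking would actually involve: it's a robots.txt rule naming the GPTBot user agent. Below is a rough sketch, not what's actually deployed here, that uses Python's standard-library robots.txt parser to check the effect of such a rule. The `is_allowed` helper and the `/some-thread` path are made up for illustration, and it assumes the crawler honours robots.txt at all.

```python
from urllib.robotparser import RobotFileParser

# Rules that disallow GPTBot site-wide and leave every other crawler alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url.

    Hypothetical helper. Pass the short product token (e.g. "GPTBot"),
    not the full browser-style user-agent string: the parser only looks
    at the text before the first "/" in whatever it is given.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://thumped.com/some-thread"  # illustrative path only
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

Worth bearing in mind that robots.txt is only a request; it stops crawlers that choose to respect it and nothing else.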
 
Is @GPThumped the external bot you refer to? I would have assumed that was internal.

I'd be against anything external indexing, but no issues with something internal.
@GPThumped is an internal implementation of chat.OpenAI.com - it hits the OpenAI API for gpt4.

what I’m talking about here is the web crawler that they’ll be using to train their future LLMs.
 
Yes, but also consider the possibilities

 
