Policy on allowing GPTBot web crawler to index the site.

pete · Aug 8, 2023

I have a thing installed that detects new web crawler bots indexing the content here:

Potential New Bots Detected

mozilla/5.0 applewebkit/537.36 (khtml, like gecko; compatible; gptbot/1.0; +https://openai.com/gptbot)

which is this lad:

OpenAI Platform

Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.

platform.openai.com

Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety. Below, we also share how to disallow GPTBot from accessing your site.

Should I block it?
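
For reference, the opt-out that the OpenAI page describes is a robots.txt rule for the `GPTBot` user agent. Below is a minimal sketch of what blocking would look like, checked with Python's standard `urllib.robotparser`. The `example.com` URLs and the `is_allowed` helper are made up for illustration, and this only shows what a crawler that honours robots.txt would conclude, not that any given crawler actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# robots.txt rules that disallow GPTBot site-wide and say nothing about other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the rules from a string
    instead of fetching a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder, not the real site.
    print(is_allowed("GPTBot", "https://example.com/threads/some-thread"))
    print(is_allowed("Googlebot", "https://example.com/threads/some-thread"))
```

With these rules the GPTBot check comes back `False` and the Googlebot check `True`, so search indexing would carry on as before while the training crawler is asked to stay out.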

pete · Aug 8, 2023

OpenAI launches bot that will crawl the internet to educate GPT

Website owners will have to explicitly opt out if they do not want their data harvesting

www.independent.co.uk

Denny Oubidoux · Aug 8, 2023

If we allow it will we be able to ask it for our worst moments?

pete · Aug 8, 2023

Denny Oubidoux said:
If we allow it will we be able to ask it for our worst moments?

i have no idea

hydromancer · Aug 9, 2023

Will we hurt its feelings if we do?

pete · Aug 9, 2023

hydromancer said:
Will we hurt its feelings if we do?

@GPThumped are you listening?

rettucs · Aug 9, 2023

Is @GPThumped the external bot you refer to? I would have assumed that was internal.

I'd be against anything external indexing, but no issues with something internal.

pete · Aug 9, 2023

rettucs said:
Is @GPThumped the external bot you refer to? I would have assumed that was internal.

I'd be against anything external indexing, but no issues with something internal.

@GPThumped is an internal implementation of chat.OpenAI.com - it hits the OpenAI API for GPT-4.

what I’m talking about here is the web crawler that they’ll be using to train their future LLMs.

rettucs · Aug 9, 2023

pete said:
@GPThumped is an internal implementation of chat.OpenAI.com - it hits the OpenAI API for GPT-4.

what I’m talking about here is the web crawler that they’ll be using to train their future LLMs.

ughhh I dunno. I see the value in it, but also, I dunno.

I presume it'll only have access to our posts?

pete · Aug 9, 2023

rettucs said:
ughhh I dunno. I see the value in it, but also, I dunno.

I presume it'll only have access to our posts?

Yeah just public posts.

rettucs · Aug 9, 2023

pete said:
Yeah just public posts.

anything public is fair game imo

I voted yes, but you can swap that to I don't give a shit, leaning towards no.

rettucs · Aug 9, 2023

OpenAI identifies its GPTBot web crawler so you can block it

Aww, c'mon, let us scrape your pages, we've got billions at stake

www.theregister.com

ok I have 2 problems with it.

1. you have to opt-out, rather than opt-in.
2. you should get paid for it.

they are cheeky as fuck.

I vote block the bot

pete · Aug 9, 2023

Yes, but also consider the possibilities

chris d · Aug 9, 2023

Don't care. Anything posted publicly is fair game even if the context is I was drunk.

Block the GPT web crawler?

Yes

No

As an AI, I don't have an opinion on this, but also, No.
