Poster: Nemo_bis | Date: Jul 7, 2014 10:37am
Forum: faqs | Subject: Retroactive robots.txt removal of past crawls AKA Oakland Archive Policy
The matter is well known:
https://archive.org/about/faqs.php#2
https://archive.org/about/exclude.php
http://www2.sims.berkeley.edu/research/conferences/aps/removal-policy.html
The Internet Archive doesn't run for free; it has huge costs. Surprisingly low for the level of service it provides, but still huge. When you ask for more access, have you first asked yourself whether *you* would pay for the additional legal costs it might cause?
Shouldn't we instead be happy that resources have been invested in removing the six-month embargo and in allowing on-demand archival of URLs, so that now we can enjoy crawls immediately *and* request our own?
Until the Oakland Archive Policy is superseded, the Internet Archive is not going to change its policies. Is there an alternative standard that one could adopt? If not, who's going to make one? Probably netpreserve.org and IFLA would need to be involved, at least.
If you don't like the current policy, work to create one that better serves the public while remaining a legal defense strong enough to safeguard the Internet Archive...
Some more links for further reading:
https://archive.org/post/407088/honoring-present-instead-of-past-robotstxt-is-illogical
https://archive.org/post/1009682/archived-pages-should-be-unaffected-by-robotstxt-changes
https://archive.org/post/1001794/retroactive-and-permanent
https://archive.org/post/433848/domain-resellers-blocking-waybackmachine
https://archive.org/post/225623/retroactive-robotstxt
https://archive.org/post/188806/retroactive-robotstxt-and-domain-squatters
https://archive.org/post/184024/robotstxt-policy-is-a-failure
https://archive.org/post/62230/retroactive-robotstxt-exclusion-different-domain-owner
https://archive.org/post/8920/cybersquatters-copyright-ownership
https://archive.org/post/602721/remove-archived-webpages-when-domain-was-in-hands-of-previous-owner
https://archive.org/post/557165/will-past-crawls-stay-removed-after-removing-robotstxt
https://archive.org/post/423432/domainsponsorcom-erasing-prior-archived-copies-of-135000-domains
https://archive.org/post/401162/parked-domains-robotstxt-disallows-viewing-of-past-content
https://archive.org/post/406315/archived-sites-being-made-no-longer-available-due-to-current-robotstxt
https://archive.org/post/280486/domain-name-re-sold-robots-problem
Reply
Poster: metaeducation | Date: Mar 24, 2016 11:23am
> yourself if *you* would pay for additional legal costs
> it may happen to cause?
There are various entities I'd hope would be willing to get in the fight if someone were to sue (the EFF, to name one).
Either way, it seems there should be a way to irrevocably greenlight the Internet Archive on content. A license on the content can already do this.
For instance a Creative Commons license: if my blog is entirely CC-BY-SA content, then shouldn't the archive be able to keep it up regardless of some hypothetical later state of robots.txt? There could also be something more selective, an "Internet Archive License", so that even otherwise copyrighted sites could greenlight the archive keeping a copy.
If it has to be an opt-in process, that's unfortunate. But I'd certainly prefer being able to "opt in to future domain squatters not being able to erase my existence" over having no choice at all...
Reply
Poster: Hjulle | Date: Mar 4, 2015 12:50am
This will also become a growing problem as more and more webmasters die (or otherwise become unable to pay for their domains). If a domain switches owners, the new owner should not have any power over the old owner's content.
Reply
Poster: Hjulle | Date: Mar 4, 2015 12:54am
A reasonable compromise would be to make "User-agent: *" affect only the current version, and make "User-agent: ia_archiver" retroactive. That way you wouldn't remove history by mistake, but you could still remove it just as easily, and you wouldn't have to change any of the policy documents.
Also note that "The Robot Exclusion Standard does not mention anything about the '*' character in the Disallow: statement" - https://en.wikipedia.org/wiki/Robots_exclusion_standard#Universal_.22.2A.22_match
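Under that convention, a site owner's robots.txt might look like the sketch below. This is hypothetical: the "future crawls only" vs. "also retroactive" distinction between the two groups is the proposal itself, not how any archive currently interprets the file.

```text
# Blocks future crawling by everyone, but (under the proposed
# convention) would leave already-archived snapshots visible:
User-agent: *
Disallow: /

# Only this group would additionally hide past snapshots:
User-agent: ia_archiver
Disallow: /
```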
Reply
Poster: Nemo_bis | Date: Mar 4, 2015 1:33am
Just think of all the emails or support requests that might come from webmasters confused by the (non-)interpretation of "*": increasing the workload like that would defeat the purpose. I can understand why the IA prefers a conservative (customary?) interpretation for now, and I trust them to switch to a less defensive interpretation whenever that's more sustainable than the opposite.
Reply
Poster: Hjulle | Date: Mar 4, 2015 1:46am
But according to https://archive.org/post/423432/domainsponsorcom-erasing-prior-archived-copies-of-135000-domains they already do that: only "User-agent: ia_archiver" should remove anything, so my point was irrelevant.
I drew my first conclusion from this site https://web.archive.org/web/*/http://www.testblogpleaseignore.com/2012/06/22/the-trouble-with-frp-and-laziness/ not having any archive, while its (new) robots.txt only says "User-agent: *".
Reply
Poster: dolalin | Date: Jan 8, 2020 1:05am
robots.txt should be respected, but only on a per-crawl basis. If people want things removed from the IA, they should be obliged to do at least the bare-minimum action of sending an email to request it.
Reply
Poster: Menelmacar | Date: Apr 2, 2015 4:02pm
That's the thing: there's nothing customary about it. The robots.txt standard was invented to affect the *current* behavior of crawlers. Stopping or limiting current crawling is all it was ever drafted to do. As far as I've seen, it was never proposed that compliant robots would be expected to perform actions elsewhere, such as modifying existing databases.
See:
http://www.robotstxt.org/orig.html
http://www.robotstxt.org/norobots-rfc.txt
http://en.wikipedia.org/wiki/Robots.txt
The "Oakland Archive Policy" that IA defers to ( http://www2.sims.berkeley.edu/research/conferences/aps/removal-policy.html ) tries to use robots.txt for a purpose it was never designed for. It's a Band-Aid for the fact that there never was (and likely never will be, given the legal tangles involved) a dedicated mechanism for sites to declare whether it's ok for archiving sites to retain permanent copies.
For its part, robots.txt was never even approved by a major standards body as a standard. It's only a de facto one, which one would think (note: IANAL) might make its use in a legal context even more problematic.
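As a minimal sketch of that original design, using Python's standard urllib.robotparser (the robots.txt rules and URLs here are invented for illustration): the only question the protocol answers is whether a crawler may fetch a given URL *now*; nothing in it addresses copies fetched under an earlier robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as a new domain owner might publish it.
robots_txt = """\
User-agent: ia_archiver
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant crawler asks this at fetch time, before each request:
print(rp.can_fetch("ia_archiver", "https://example.com/old-post.html"))  # False

# Other agents are unaffected by that group:
print(rp.can_fetch("SomeOtherBot", "https://example.com/old-post.html"))  # True
```

Nothing in the file instructs a robot to alter data it fetched earlier; treating it as a retroactive takedown signal is the archive's own policy layered on top.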
It's unfortunate that (to my knowledge) no legal protection similar to what exists for temporary caching ( http://en.wikipedia.org/wiki/Online_Copyright_Infringement_Liability_Limitation_Act#Other_safe_harbor_provisions ) has been enshrined into law for cases where Internet archiving is provided to the public in an essentially unmodified form for no profit. Given the immense value of a resource like the IA to society, ideally something would be worked out to put a site like it on safer footing.
I think the long and the short of the problem is that the IA doesn't have the legal staff, legislated liability protection, or access to standardized authorization protocols that would put it on safer legal ground, nor enough staff to handle enormous volumes of takedown requests, so it feels it has to go to enormous lengths to be cautious.
I do wish they could at least correlate removals against WHOIS records, though. My heart sinks every time this happens. It'll definitely become a worse and worse problem as time goes on.
*Sigh* One more reason to loathe %*&^$*ing domain squatting. (Sorry, "domain parking". Ugh.)
Reply
Poster: Nemo_bis | Date: Apr 2, 2015 11:32pm
Subject: Re: Customary syntax and liability
As for legal protection, you're very right. I wonder if https://www.manilaprinciples.org/ would help.
Reply
Poster: CogDogBlog | Date: Jun 28, 2016 11:55am
So if robots.txt is not found at all, the IA wipes it out? Hardly archival, to my simple mind. The full story: http://cogdogblog.com/2016/06/dont-archive/