# thanks http://www.subsume.com/robots.txt !

# Welcome to www.subsume.com. We like most robots, but some are just greedy bastards.
# Contact the webmaster if you're listed but have wisely addressed the issue of concern.

# FAST Enterprise Crawler/6 comes in from an IP with no reverse DNS.
# It does a crapload of crawling, but I'm not seeing any referred traffic.
# It does not have a proper bot page that I could find, so User-agent is a guess.
# It may not even honor this file at all, which will lead to an IP block.
User-agent: FAST
Disallow: /

# ZyBorg/1.0 Dead Link Checker is another intolerably bad bot.
# Everything that applies to the previous agent applies to this piece of crap, too.
# The connection (WiseNut/LookSmart) with the shitty grub-client (listed next) doesn't surprise me.
User-agent: ZyBorg
Disallow: /

# You fuckers aren't honoring the * disallows, so you don't get to see anything.
# And if you don't honor this, we'll go to blocking specific hosts.
# Update: We are now blocking host IPs. Die!
User-agent: grub-client
Disallow: /

# Another bot that ignores * disallows, even though they claim they follow the protocol.
# And what the hell is with Yahoo-VerticalCrawler-FormerWebCrawler in the agent? Pick a name!
# This may be the same bot that was listed as FAST above, but it gets a special list.
# Dirty, dirty bot. I kind of hope this is ignored so I get to block by IP.
# Update: It is! I do!
User-agent: fast
Disallow: /

# More * ignorance.
User-agent: NaverBot
Disallow: /

# Intentionally generates 404s by changing the case of a known good URL it just spidered.
# We don't know if it's testing case sensitivity or what, but we don't really care.
# Use the bloody URL you're given!
User-agent: baiduspider
Disallow: /

# Also 404s URLs by changing case.
User-agent: LNSpiderguy
Disallow: /

# QuepasaCreep is an unknown spider that screws up all links it tries.
User-agent: QuepasaCreep
Disallow: /

# VoilaBot is another spider that seems to 404 all the time.
# And despite a complete / ban, it still bugs us multiple times a day.
# Say hello to a 195.101.94.0/24 block you greedy French fucks!
User-agent: VoilaBot
Disallow: /

# This agent charges a fee for its "services" but provides sites with no compensation.
# You want to make money by leeching content from our site? Pay us.
# Plus, we think stupid people should be allowed to copy in lieu of learning.
# More idiots in the market makes us look like absolute geniuses by comparison.
User-agent: TurnitinBot
Disallow: /

# And now for some universal blocks.
#
# Site changes moved things around so even if old WebObjects links work, they shouldn't be indexed.
# Nobody should be searching directly in the pub for binaries. All the good stuff has pages.
# User discussions aren't considered site content at this time.
User-agent: *
Disallow: /secret/