# Manashid Economic Consulting — robots.txt
# Search engines: ALLOWED. AI training/scraping bots: BLOCKED.

# ---------- Search engines (allowed) ----------

User-agent: Googlebot
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Bingbot
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: YandexBot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: Applebot
Allow: /

# ---------- AI training & LLM scrapers (blocked) ----------

# OpenAI — model training crawler
User-agent: GPTBot
Disallow: /

# OpenAI — ChatGPT browse / live retrieval
User-agent: ChatGPT-User
Disallow: /

# OpenAI — search index
User-agent: OAI-SearchBot
Disallow: /

# Google — AI training (Bard / Vertex AI / Gemini)
User-agent: Google-Extended
Disallow: /

# Anthropic — Claude crawlers
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Common Crawl (used by many AI training datasets)
User-agent: CCBot
Disallow: /

# Perplexity AI
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

# ByteDance / TikTok AI
User-agent: Bytespider
Disallow: /

# Amazon AI
User-agent: Amazonbot
Disallow: /

# Meta / Facebook AI
User-agent: FacebookBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: meta-externalagent
Disallow: /

# Apple AI training
User-agent: Applebot-Extended
Disallow: /

# Cohere
User-agent: cohere-ai
Disallow: /

User-agent: cohere-training-data-crawler
Disallow: /

# You.com
User-agent: YouBot
Disallow: /

# Diffbot
User-agent: Diffbot
Disallow: /

# Omgili (data licensing for AI)
User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

# ImagesiftBot (image AI training)
User-agent: ImagesiftBot
Disallow: /

# AI2 (Allen Institute)
User-agent: AI2Bot
Disallow: /

User-agent: Ai2Bot-Dolma
Disallow: /

# Timpi
User-agent: TimpiBot
Disallow: /

# Webz.io (AI data brokers)
User-agent: Webzio-Extended
Disallow: /

# Scrapy / generic AI scrapers
User-agent: Scrapy
Disallow: /

# ---------- Default rule ----------

# Allow everything else (e.g. social previews, well-behaved crawlers)
User-agent: *
Allow: /

Sitemap: https://manashidkw.com/sitemap.xml