Sitemap Best Practices, Including Large Web Sites
The main problem with extra-large sitemaps is that search engines are often unable to discover all the links in them, because downloading every sitemap each day takes time. To avoid over-crawling web sites, search engines do not download thousands of sitemaps within a few seconds or minutes, and the total size of a site's sitemap XML files can exceed 100 gigabytes. Between the time the sitemap index file is downloaded to discover the sitemap URLs and the time those sitemap files are actually fetched, the sitemaps may have expired or been overwritten. In addition, search engines do not download sitemaps at a specific time of day, so they are often out of sync with the web site's sitemap generation process. Using fixed names for sitemap files rarely solves the issue either, since the files, and therefore the URLs they list, can still be overwritten during the download process.
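One way to avoid the overwrite problem described above is to give each sitemap generation unique, timestamped file names and swap in the index file atomically, so that a crawl started against an older index can still fetch the exact files it references. The sketch below illustrates this idea; the directory layout, base URL, and function name are assumptions for the example, not part of any particular site's setup.

```python
import datetime
import pathlib
from xml.sax.saxutils import escape

# Assumed output location and public URL prefix (hypothetical values).
OUT_DIR = pathlib.Path("sitemaps")
BASE_URL = "https://example.com/sitemaps"
URLS_PER_FILE = 50000  # per-file URL limit from the Sitemaps protocol

def write_generation(urls):
    """Write one sitemap generation under unique names, then republish the index."""
    OUT_DIR.mkdir(exist_ok=True)
    stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%S")
    chunk_names = []
    for i in range(0, len(urls), URLS_PER_FILE):
        # Timestamped name: a new generation never overwrites an older one,
        # so files listed in a previously fetched index remain downloadable.
        name = f"sitemap-{stamp}-{i // URLS_PER_FILE:04d}.xml"
        body = "\n".join(f"  <url><loc>{escape(u)}</loc></url>"
                         for u in urls[i:i + URLS_PER_FILE])
        (OUT_DIR / name).write_text(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{body}\n</urlset>\n")
        chunk_names.append(name)
    # Publish the index last, via an atomic rename, so crawlers never
    # observe a half-written index pointing at missing files.
    index_body = "\n".join(
        f"  <sitemap><loc>{BASE_URL}/{n}</loc></sitemap>" for n in chunk_names)
    tmp = OUT_DIR / "sitemap_index.xml.tmp"
    tmp.write_text(
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{index_body}\n</sitemapindex>\n")
    tmp.replace(OUT_DIR / "sitemap_index.xml")
    return chunk_names
```

Old generations can then be garbage-collected on a delay (for example, after a day or two) rather than at generation time, giving in-flight crawls a window to finish.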