How to force the crawler to crawl the entire content source again?

Comments

3 comments

  • ashishprasher
    Hi @..., you can recrawl the entire content source at any time using the manual crawling option in the admin dashboard. Frequency crawling only crawls the updated URLs, and only when the input is a sitemap with a lastmod tag.

    Here is a detailed explanation.
    In a website content source, there are different types of input:
    • File: URLs are crawled from a TXT or XML file.
    • URL: either a website URL or the URL of a hosted sitemap. A sitemap may also carry a lastmod (last modified timestamp) tag.

    Only in the last case, a sitemap with a lastmod tag, do we crawl just the updated data, based on the lastmod tag. So if you provide a hosted sitemap with lastmod tags, they must be kept up to date. For all other inputs, we always crawl the whole data in every crawl.
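To make the lastmod behaviour concrete, here is a minimal sketch of how a crawler could pick only the updated URLs from a sitemap. It is an illustration under stated assumptions, not SearchUnify's actual implementation: the names `parse_lastmod`, `urls_to_crawl`, and `SAMPLE_SITEMAP` are hypothetical, and the choice to always crawl URLs that have no lastmod is an assumption made for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(text):
    """Parse a <lastmod> value; values without a timezone are treated as UTC."""
    parsed = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def urls_to_crawl(sitemap_xml, last_crawl):
    """Return the URLs modified after last_crawl.

    Assumption for this sketch: a URL without <lastmod> is always included,
    because there is no way to tell whether it changed.
    """
    root = ET.fromstring(sitemap_xml)
    selected = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc is None:
            continue  # malformed entry, nothing to crawl
        if lastmod is None or parse_lastmod(lastmod) > last_crawl:
            selected.append(loc.strip())
    return selected


# Hypothetical sitemap used only for illustration.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/old</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/new</loc><lastmod>2024-01-15T10:00:00Z</lastmod></url>
  <url><loc>https://example.com/unknown</loc></url>
</urlset>
"""

if __name__ == "__main__":
    last_crawl = datetime(2024, 1, 10, tzinfo=timezone.utc)
    for url in urls_to_crawl(SAMPLE_SITEMAP, last_crawl):
        print(url)
```

With a filter like this, a stale lastmod means the page is silently skipped, which is why the tag has to be kept up to date.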
  • Permanently deleted user
    Hi @..., I think the message is still valid for a website content source. I uploaded the URL TXT file and started a manual crawl. While the crawler is running, this content source is unsearchable until the crawl finishes.
  • ashishprasher
    Hi @...

    As I mentioned earlier, in website crawling we don't wipe the existing data during a manual crawl. We crawl the new data into a temporary location and, once the crawl is complete, swap the temporary data with the original data. So the existing data remains available for search throughout the crawl; it's just that the new data becomes searchable only once the crawl is complete.
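The crawl-then-swap idea can be sketched roughly as follows. This is a simplified illustration, not SearchUnify's actual implementation: it assumes the index is a single JSON file, and `recrawl_with_swap`, `search`, and the `crawl` callback are hypothetical names introduced for the example.

```python
import json
import os
import tempfile


def recrawl_with_swap(index_path, crawl):
    """Build a fresh index in a temporary file, then swap it in.

    `crawl` is a callable returning a dict of {url: page_text}. The existing
    index file is not touched until the new one is completely written.
    """
    directory = os.path.dirname(os.path.abspath(index_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            # The old index stays readable while the crawl runs.
            json.dump(crawl(), tmp)
        # Replace the old index with the new one in a single step.
        os.replace(tmp_path, index_path)
    except BaseException:
        # A failed crawl leaves the old index in place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def search(index_path, term):
    """Return the URLs whose stored text contains `term`."""
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    return [url for url, text in index.items() if term in text]
```

Because the temporary file lives in the same directory as the index, `os.replace` stays on one filesystem, so searches see either the old index or the new one, never a half-written file.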
