How to force the crawler to crawl the entire content source again?
Currently, the crawler runs based on the sitemap, so only the updated URLs are crawled. Is there any way to manually start the crawler so it crawls the entire content source all over again whenever needed? Or, is there any way to upload a 'manipulated' sitemap file so that the crawler crawls all the URLs listed in the sitemap?
Hi @... you can crawl all the data manually from the manual crawling option in the admin dashboard whenever needed. Frequency-based crawling will only crawl the updated URLs in the case of a sitemap with a lastmod tag.
The following is a detailed explanation:
In a website content source, we can have different types of inputs:
- File: in this type, we crawl URLs from a txt or an XML file.
- URL: this can be a website URL or the URL of a hosted sitemap, and the sitemap may carry a lastmod (last-modified timestamp) tag.
Only in the last case, i.e. a sitemap with a lastmod tag, do we crawl only the updated data, based on the lastmod tag. Hence, if a hosted sitemap with a lastmod tag is provided, the lastmod tag should be kept up to date. For all other inputs, we always crawl the whole data in every crawl.
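To make the lastmod behavior concrete, here is a minimal sketch (not the product's actual code) of how a crawler can select only the URLs whose lastmod is newer than the previous crawl. The sitemap contents, function name, and dates are all illustrative:

```python
# Illustrative sketch: filter sitemap URLs by <lastmod>, as a crawler might.
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/docs/page-1</loc>
    <lastmod>2024-05-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/docs/page-2</loc>
    <lastmod>2024-06-15</lastmod>
  </url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_to_crawl(sitemap_xml: str, last_crawl: datetime) -> list[str]:
    """Return only the URLs whose <lastmod> is newer than the last crawl."""
    root = ET.fromstring(sitemap_xml)
    due = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None:
            due.append(loc)  # no timestamp: crawl it to be safe
            continue
        modified = datetime.fromisoformat(lastmod).replace(tzinfo=timezone.utc)
        if modified > last_crawl:
            due.append(loc)
    return due

print(urls_to_crawl(SITEMAP_XML, datetime(2024, 6, 1, tzinfo=timezone.utc)))
# -> ['https://example.com/docs/page-2']
```

This also explains the 'manipulated sitemap' idea from the question: bumping every lastmod value to the current time would make all URLs look updated. With the manual-crawl option available, though, that workaround shouldn't be necessary.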
Hi @... I think the issue is still valid for a website content source. I uploaded the URL txt file and manually started crawling, and during the crawler's runtime the content source became unsearchable until it was done.
Hi @...
As I have mentioned earlier, in website crawling we don't wipe the existing data during a manual crawl. We crawl the new data into a temporary location, and when the crawl is complete we swap the temporary data with the original data. So the existing data is available for search during the crawling period; it's just that the new data only becomes searchable once the crawl is complete.
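In case it helps, here is a minimal sketch of that crawl-then-swap pattern, assuming the crawled data lives in a directory on disk; the names and layout are hypothetical, not the product's actual implementation:

```python
# Illustrative crawl-then-swap: the live data keeps serving searches while
# the new crawl is staged, then two quick renames make the new data live.
import shutil
import tempfile
from pathlib import Path

def recrawl_and_swap(live_dir: Path, crawl_fn) -> None:
    """Crawl into a staging directory, then swap it in for the live data."""
    staging = Path(tempfile.mkdtemp(prefix="crawl-", dir=live_dir.parent))
    crawl_fn(staging)                # long-running; live_dir still serves search
    old_dir = live_dir.with_suffix(".old")
    live_dir.rename(old_dir)         # swap is just two near-instant renames
    staging.rename(live_dir)
    shutil.rmtree(old_dir)           # discard the previous crawl's data
```

The point of the pattern is that the expensive step (crawling) never touches the live data, so search only sees a brief switch-over at the end rather than an empty index during the crawl.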