Problem Statement
What are the possible causes of crawl failure for a new content source?
Environment | Production |
---|---|
Reported product version | C'20 |
Resolved in version | C'20 |
Module | Content Source |
Causes
- Invalid credentials
- Incorrect content source name
- Incorrect client URL
- Incorrect crawling start date
- Overlap of crawling operations with patch deployment
- Mismatch between the article fields and the fields configured in the admin panel
- Expiration of the password used to authenticate a content source
- The user used to authenticate the content source does not have admin access
- Incorrect sitemap in the case of website-type content sources
Solution
- Ensure that the login credentials of the content source are the same as the ones used to connect the content source in the SU admin panel.
- Check whether the content source name contains any non-ASCII characters. Use only ASCII characters in the content source name.
- Use only the base URL when configuring content sources in the SU admin panel.
- Set the crawling start date to a date in the past, the date from which you would like crawling to start. It must not be the current date or a future date.
- If a crawl fails midway, check with your CSM whether a patch deployment was taking place while crawling was in progress.
- If you have updated the article structure of one of your content sources by deleting a field, make sure to update the object fields for that content source in the SearchUnify admin panel.
- Check whether the password of the user used for content source authentication has expired. Content cannot be auto-crawled at the set frequency with an expired password.
- Check whether the user used to authenticate the content source has admin access. Only users with admin access can authenticate and crawl the content source.
- For website-type content sources, crawling fails if the sitemap is not in XML format or if URLs in the sitemap return a 404 error. Either correct those URLs or remove them from the sitemap.
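Some of the checks above can be run before a crawl is started. The following is a minimal sketch in Python using only the standard library; it is not part of SearchUnify, and all function names (is_ascii_name, is_past_start_date, sitemap_urls, broken_urls) are hypothetical helpers chosen for illustration. It covers the ASCII-only name, the past start date, and the sitemap checks.

```python
from datetime import date
from typing import List, Optional, Tuple
import urllib.request
import xml.etree.ElementTree as ET


def is_ascii_name(name: str) -> bool:
    """True if the content source name is non-empty and ASCII-only."""
    return bool(name) and name.isascii()


def is_past_start_date(start: date, today: Optional[date] = None) -> bool:
    """True if the crawl start date is strictly before today."""
    return start < (today or date.today())


def sitemap_urls(xml_text: str) -> List[str]:
    """Return the <loc> URLs of a sitemap; raise ValueError if it is not well-formed XML."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"sitemap is not well-formed XML: {exc}") from exc
    # Sitemap tags are usually namespaced, so compare the local tag name only.
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]


def broken_urls(urls: List[str], timeout: float = 10.0) -> List[Tuple[str, Optional[int]]]:
    """Return (url, status) pairs for URLs that fail a HEAD request.

    status is the HTTP error code (for example 404), or None when no HTTP
    response was received at all. Assumption: the site answers HEAD requests;
    some servers reject them, in which case a GET would be needed instead.
    """
    failures = []
    for url in urls:
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout):
                pass
        except OSError as exc:
            # HTTPError carries a .code attribute; other network errors
            # (DNS failure, refused connection, timeout) do not.
            failures.append((url, getattr(exc, "code", None)))
    return failures
```

For example, is_ascii_name("Dokümente") is False because of the "ü", and is_past_start_date(date.today()) is False because the current date is not allowed. Passing the result of sitemap_urls to broken_urls lists the sitemap entries that should be corrected or removed before crawling.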
For more information on how to configure a new content source please visit: https://docs.searchunify.com/Content/Content-Sources/Content-Source.htm