Problem Statement
How can PDFs in content sources be crawled?
Answer
The contents of PDF files can be extracted and indexed. The steps depend on the content source.
For Salesforce articles, crawl the attachment fields to index text from attached files, including PDFs and other file types. Attachment fields differ from object to object. For example, indexing Salesforce knowledge articles requires crawling several attachment fields, including attachment_knowledge_kav, file_attachment_knowledgekav, link_file_attachment_knowledge_kav, and custom-fieldname_Body_s.
For a few content sources, such as Google Drive, Box, and Dropbox, the PDF, PPT, XLS, CSV, and DOCX file types are listed under the Rules tab of the content source. Select the file types that should be crawled.
For all the remaining content sources, PDFs are crawled automatically.
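The three cases above can be summarized as a small lookup. The Python sketch below is only an illustration of that decision logic, not a product API: the function name pdf_crawl_steps and the constant names are hypothetical, and the lists contain only the field names, sources, and file types mentioned in this article.

```python
# Illustrative summary of the guidance in this article.
# All names here are hypothetical; nothing below calls a real product API.

# Attachment fields named in this article for Salesforce knowledge articles.
SALESFORCE_KNOWLEDGE_ATTACHMENT_FIELDS = [
    "attachment_knowledge_kav",
    "file_attachment_knowledgekav",
    "link_file_attachment_knowledge_kav",
    "custom-fieldname_Body_s",
]

# Content sources where file types are selected under the Rules tab.
RULES_TAB_SOURCES = {"google drive", "box", "dropbox"}
RULES_TAB_FILE_TYPES = ["PDF", "PPT", "XLS", "CSV", "DOCX"]


def pdf_crawl_steps(content_source: str) -> str:
    """Return the step this article recommends for the given content source."""
    name = content_source.strip().lower()
    if name == "salesforce":
        fields = ", ".join(SALESFORCE_KNOWLEDGE_ATTACHMENT_FIELDS)
        return f"Crawl the attachment fields (for knowledge articles: {fields})."
    if name in RULES_TAB_SOURCES:
        types = ", ".join(RULES_TAB_FILE_TYPES)
        return f"Select the file types to crawl ({types}) under the Rules tab."
    return "No action needed: PDFs are crawled automatically."


if __name__ == "__main__":
    for source in ["Salesforce", "Google Drive", "Some Other Source"]:
        print(f"{source}: {pdf_crawl_steps(source)}")
```

The lookup is case-insensitive and falls through to the automatic case for any source not listed, which mirrors the last rule in the answer.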