Strip HTML from search results
Depending on the content source, some of our search results render very differently when the original HTML is included. In certain cases, this significantly increases the height of each search result.
How do we prevent this in search results?
Note: This includes stripping images, which sometimes appear as broken images.
Issue #1 - Results appear as if you're viewing the full page.

Issue #2 - Results appear with a table that has a lot of extra white space

Issue #3 - Results blow out a table header (similar to extra white space in tables)

Issue #4 - Results appear with a broken image.

Issue #5 - Results appear with large font sizes

Issue #6 - Results display content that's in reverse of the order it appears in the document itself.

Issue #7 - Results appear with special characters described at https://unicodelookup.com/#%EF%BF%BD/1
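
For illustration only, here is a minimal sketch of the kind of cleanup being asked about, assuming the content can be preprocessed before it is indexed (for example, before a dump file is uploaded). It uses only Python's standard library; the names `strip_html` and `_TextExtractor` are hypothetical and are not part of SearchUnify. It drops all tags (including images, tables, and inline font styling), skips `head`, `script`, and `style` content, and removes the U+FFFD replacement character from Issue #7. It does not address the ordering problem in Issue #6.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores markup (hypothetical helper)."""

    # Elements whose text content should never reach a search snippet.
    SKIP = {"head", "title", "script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        else:
            # Every tag (img, table, td, h1, ...) becomes a single space,
            # so images and table layout disappear from the snippet.
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        else:
            self.parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(raw: str) -> str:
    """Return plain text suitable for a compact search-result snippet."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.parts)
    text = text.replace("\ufffd", "")  # drop the replacement character
    return re.sub(r"\s+", " ", text).strip()  # collapse extra white space


if __name__ == "__main__":
    sample = (
        "<html><head><title>T</title></head><body>"
        '<h1 style="font-size:60px">Reset password</h1>'
        '<img src="missing.png">'
        "<table><tr><td>Step 1</td><td>  Open settings </td></tr></table>"
        "</body></html>"
    )
    print(strip_html(sample))
```

One trade-off in this sketch: because every tag is replaced with a space, inline markup in the middle of a word (such as `<b>He</b>llo`) will split that word. Treating only block-level tags as separators would avoid that at the cost of a longer tag list.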

Jeremy Henricks SearchUnify's standard crawling does not store HTML in indexes. However, no standard crawling is performed for your content sources; a dump file has been uploaded instead. We can definitely clean up this clutter with the help of the implementation team. parveens - can you please look into it and help Jeremy Henricks with this?
Esha Tanwar Sure, will check the issue.
Jeremy Henricks I shall connect with you and discuss the issue.