Strip HTML from search results



  • Depending on the content source, some of our search results render very differently because they include the original HTML. In certain cases, this significantly increases the height of each search result.

    How do we prevent this in search results?

    Note: This includes stripping images, which sometimes appear as broken images.

    Issue #1 - Results appear as if you're viewing the full page.
    search-result-formatting-1.jpg

    Issue #2 - Results appear with a table that has a lot of extra white space
    search-result-formatting-2.jpg

    Issue #3 - Results blow out a table header (similar to extra white space in tables)
    search-result-formatting-3.jpg

    Issue #4 - Results appear with a broken image.
    search-result-formatting-4.jpg

    Issue #5 - Results appear with large font sizes
    search-result-formatting-6.jpg

    Issue #6 - Results display content that's in reverse of the order it appears in the document itself.
    search-result-formatting-8.jpg

    Issue #7 - Results appear with special characters described at https://unicodelookup.com/#�/1
    search-result-formatting-7.jpg



  • @Esha, Sure, will check the issue.

    @jshenricks, I shall connect with you and discuss the issue.



  • @jshenricks SearchUnify's standard crawling does not store HTML in indexes. In your case, however, no standard crawling is performed for your content sources; a dump file was uploaded instead, which is why the HTML is present. We can definitely clean up this clutter with the help of the implementation team. @parveens, can you please look into it and help @jshenricks with this?
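
For anyone preparing a similar dump file, the kind of cleanup discussed above could be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's standard-library `html.parser`, it assumes the dump can be preprocessed before upload, and the names `strip_html` and `max_chars` are hypothetical and not part of any SearchUnify API. It drops tags (including images, tables, and font styling), removes U+FFFD replacement characters, and caps the snippet length so a result cannot grow arbitrarily tall.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text only; a hypothetical helper, not a SearchUnify class."""

    # Tags whose contents are dropped entirely.
    _SKIP = {"script", "style"}
    # Tags that should act as word boundaries once removed.
    _BLOCK = {"p", "div", "br", "li", "table", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag in self._BLOCK:
            self._parts.append(" ")
        # Other tags, such as <img>, contribute no text and are ignored.

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self._BLOCK:
            self._parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        joined = "".join(self._parts).replace("\ufffd", "")
        # Collapse the whitespace left behind by tables and block tags.
        return re.sub(r"\s+", " ", joined).strip()


def strip_html(raw, max_chars=300):
    """Return plain text from an HTML fragment, truncated to max_chars."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    text = parser.text()
    if len(text) > max_chars:
        text = text[:max_chars].rstrip() + "..."
    return text


if __name__ == "__main__":
    sample = ('<h1 style="font-size:40px">Title</h1>'
              '<table><tr><td>A</td><td></td></tr></table>'
              '<img src="missing.png"><p>Body &amp; more</p>')
    print(strip_html(sample))
```

A sketch like this would not address content appearing in the wrong order (Issue #6); that would need to be checked against how the dump itself was generated.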

