Strip HTML from search results

  • Depending on the content source, some of our search results look wildly different because they include the original HTML. In some cases, this significantly increases the height of each search result.

    How do we prevent this in search results?

    Note: This includes stripping images, which sometimes appear as broken images.

    Issue #1 - Results appear as if you're viewing the full page.

    Issue #2 - Results appear with a table that has a lot of extra white space.

    Issue #3 - Results blow out a table header (similar to the extra white space in tables).

    Issue #4 - Results appear with a broken image.

    Issue #5 - Results appear with large font sizes.

    Issue #6 - Results display content in the reverse of the order in which it appears in the document itself.

    Issue #7 - Results appear with special characters.
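    A common, platform-independent way to keep snippets compact is to reduce each result to plain text before it is displayed or indexed. The sketch below uses only Python's standard-library `html.parser`; the helper `strip_html`, its block-tag set, and the `max_chars` truncation are hypothetical illustrations, not part of SearchUnify's API. It drops `<script>`/`<style>` contents, lets `<img>` tags vanish (the broken-image case), discards table and font-size markup, and collapses leftover white space.

```python
import re
from html.parser import HTMLParser

# Hypothetical choice: tags that usually start a new line or cell. They are
# replaced by a space so words from adjacent cells don't run together.
BLOCK_TAGS = {"br", "p", "div", "li", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents are never visible text.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text; <img> and other text-less tags simply vanish."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_html(fragment: str, max_chars: int = 300) -> str:
    """Return plain text for a result snippet, truncated to max_chars."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    # Collapse runs of spaces/newlines left behind by tables and layout markup.
    text = re.sub(r"\s+", " ", "".join(parser.parts)).strip()
    if len(text) > max_chars:
        text = text[:max_chars].rstrip() + "…"
    return text
```

    For example, `strip_html('<table><tr><th>Name</th><td>Value</td></tr></table><img src="x.png">')` returns `"Name Value"`: the table layout and the image are gone, and only the text survives.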

  • @Esha, sure, I will check the issue.

    @jshenricks, I will connect with you to discuss the issue.

  • @jshenricks SearchUnify's standard crawling does not store HTML in indexes, but no standard crawling was performed for your content sources; a dump file was uploaded instead. However, we can definitely clean up this clutter with the help of the implementation team. @parveens, can you please look into it and help @jshenricks with this?
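    If the special characters in Issue #7 turn out to be encoding debris from the uploaded dump file (an assumption; the thread does not state the dump's encoding), they have to be fixed when the file is decoded, because once a byte has been replaced by U+FFFD ("�") the original character cannot be recovered. A minimal sketch with a hypothetical helper `decode_dump`: try strict UTF-8 first, then Windows-1252, and report which one succeeded so a mis-encoded upload is visible instead of silent.

```python
def decode_dump(raw: bytes) -> tuple[str, str]:
    """Decode dump-file bytes, returning (text, encoding_used).

    Hypothetical helper. Strict decoding raises on bad bytes instead of
    silently inserting U+FFFD, so each fallback is an explicit choice.
    """
    for encoding in ("utf-8", "cp1252"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    # latin-1 maps every byte to a character, so this cannot fail.
    return raw.decode("latin-1"), "latin-1"
```

    For example, `decode_dump(b"caf\xe9")` returns `("café", "cp1252")`, which flags that the dump was not UTF-8. Note that the cp1252 fallback is only a guess: it decodes most byte strings without error, so the reported encoding should be checked against the source system.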
