Insuma GmbH - Smart Search Engines

Insuma FAQ - Indexing questions

  • Why is my site no longer indexed?
  • I changed my site, how do I re-index?
  • I update my site continuously. Can it be automatically re-indexed?
  • I deleted some pages from my website, how long will they still be found?
  • How does the spider (crawler) find my pages?
  • Can I prevent a page from being indexed?
  • Can I prevent part of a page from being indexed?
  • Can my forum be indexed?
  • Does the spider honor "robots.txt" files?
  • What parts of each page get indexed?
  • What is faster, strong or light update?

    Why is my site no longer indexed?
    This typically happens because your hosting company has moved your site. Amazingly, this happens a lot! Double-check the current location of your site and make the necessary corrections.

    I changed my site, how do I re-index?
    Log in to the Control Center, go to the Crawler > Crawler control page, and press the Re-index button.

    I update my site continuously. Can it be automatically re-indexed?
    Yes. Crawling and indexing run continuously, so the crawler revisits every page once per week. This happens fully automatically.

    I deleted some pages from my website, how long will they still be found?
    Index entries that cannot be renewed (for example, because the pages were deleted) are removed from the index after a set time period. The standard setting is 15 days (focus.conf: remove_older_than = 15), which allows two update attempts (when indexed weekly) before giving up. This setting should not be too short, so that pages are not removed just because the crawler could not reach them temporarily (for example, due to a network problem).
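
    The arithmetic behind the 15-day default can be checked with a short sketch. It assumes the crawler retries a page exactly every 7 days after its last successful visit; the function name and the fixed interval are illustrative and not part of Insuma.

```python
def update_attempts_before_removal(remove_older_than_days: int,
                                   crawl_interval_days: int = 7) -> int:
    """Count the re-crawl attempts that fit before an index entry expires.

    Assumption: attempts happen exactly every crawl_interval_days
    after the last successful visit.
    """
    return remove_older_than_days // crawl_interval_days


attempts_default = update_attempts_before_removal(15)  # attempts on day 7 and day 14
attempts_short = update_attempts_before_removal(10)    # only the attempt on day 7
```

    Under these assumptions a value of 10 would leave only a single attempt, so one temporary outage could be enough to drop a page.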

    How does the spider (crawler) find my pages?
    The search engine spider starts with the web address (URL) you entered when you signed up. It reads all the links on that page and then reads those linked pages whose address starts with the same URL. It repeats this on each of those pages, again following only links that start with the URL you signed up with, and so forth. Once all pages have been read, an index is built and the search can return accurate results.
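
    The link-following rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual crawler: the `links` dictionary stands in for fetching a page and extracting its links, the example URLs are made up, and the breadth-first visiting order is an assumption.

```python
from collections import deque


def crawl(start_url: str, links: dict[str, list[str]]) -> list[str]:
    """Return the pages reached from start_url, in the order they are read.

    `links` maps each URL to the URLs it links to (a stand-in for
    downloading the page and parsing its links).
    """
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for target in links.get(url, []):
            # Follow only links that start with the sign-up URL,
            # and never read the same page twice.
            if target.startswith(start_url) and target not in seen:
                seen.add(target)
                queue.append(target)
    return order


site = {
    "http://www.example.com/": [
        "http://www.example.com/a.html",
        "http://other.example.org/",
    ],
    "http://www.example.com/a.html": [
        "http://www.example.com/b.html",
        "http://www.example.com/",
    ],
}
pages = crawl("http://www.example.com/", site)
```

    In this example the link to other.example.org is skipped because it does not start with the sign-up URL, and b.html is found only through a.html.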

    Can I prevent a page from being indexed?
    Yes, there are several ways of doing this:

    • Forbid the directory in your robots.txt file
    • Forbid the page/directory in the Control Center
    • Add a no-index META tag into the page
    For more details see the Technical Library in the Support section of the website.
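
    To check what a robots.txt rule would forbid before relying on it, the standard Python robots.txt parser can be used. This is only a sketch: the /private/ directory and the example.com URLs are made up, and it assumes the crawler matches robots.txt rules under the name "InsumaScout", the name it uses to identify itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; /private/ is an example directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a crawler named "InsumaScout" may fetch each URL.
allowed_public = parser.can_fetch(
    "InsumaScout", "http://www.example.com/index.html")
allowed_private = parser.can_fetch(
    "InsumaScout", "http://www.example.com/private/notes.html")
```

    Here `allowed_public` is True and `allowed_private` is False, so pages under /private/ would be skipped by any crawler that honors the file.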

    Can I prevent part of a page from being indexed?
    Yes. Put the tag around the part that you do not want to be indexed. This is useful for menus, news tickers and other text that appears on every page of your website. Please note that this tag does not conform to HTML standards. You may instead use this pair of comments, which does conform to HTML standards: <!--insuma_ignore_begin--> <!--insuma_ignore_end-->.
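
    To preview which text remains indexable once the comment markers are in place, a rough sketch like the following can help. It only imitates the effect with a regular expression; how the crawler itself processes the markers is not documented here, and the sample page is made up.

```python
import re

# Matches everything between the two comment markers, markers included.
IGNORE_BLOCK = re.compile(
    r"<!--insuma_ignore_begin-->.*?<!--insuma_ignore_end-->",
    re.DOTALL,
)


def strip_ignored(html: str) -> str:
    """Remove every marked region from the HTML text."""
    return IGNORE_BLOCK.sub("", html)


page = (
    "<body>"
    "<!--insuma_ignore_begin--><ul><li>Home</li></ul><!--insuma_ignore_end-->"
    "<p>Article text</p>"
    "</body>"
)
indexable = strip_ignored(page)
```

    For the sample page, only the article paragraph is left and the menu is gone.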

    Can my forum be indexed?
    Yes. Please check that the crawler does not loop in your forum because the same pages are shown under different CGI parameters. For details, see the Technical Library in the Support section of the Insuma web site.
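
    One way to spot such duplicates in a list of forum URLs is to normalize the query string and see how many distinct pages remain. This is a diagnostic sketch for site owners, not a description of what the crawler does; the parameter names "sid" and "sort" and the URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str, ignored_params: frozenset[str]) -> str:
    """Drop the given query parameters and sort the remaining ones."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored_params
    )
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Hypothetical parameters that do not change the page content.
IGNORED = frozenset({"sid", "sort"})
urls = [
    "http://www.example.com/forum/view.cgi?topic=42&sid=abc",
    "http://www.example.com/forum/view.cgi?sid=xyz&topic=42&sort=asc",
    "http://www.example.com/forum/view.cgi?topic=43&sid=abc",
]
distinct_pages = {canonical_url(u, IGNORED) for u in urls}
```

    The three URLs collapse to two distinct pages, which suggests that the first two are the same topic reached with different parameters.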

    Does the spider honor "robots.txt" files?
    Always. The crawler of the Insuma search engine identifies itself as:

        InsumaScout/1.15
    
    where 1.15 is the current version number.

    What parts of each page get indexed?
    By default, words in the following parts of each web page are included in the index:

    • the title,
    • the keywords meta tag,
    • the description meta tag, and
    • the body of the page.
    The spider does not index words in JavaScript or in the "alt" attribute of image tags.
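
    To see roughly which text falls into these parts for a given page, the standard Python HTML parser can be used. This is a simplified approximation, not the crawler's actual extraction logic; the class name and the sample page are made up.

```python
from html.parser import HTMLParser


class IndexableText(HTMLParser):
    """Collect text from the title, two meta tags, and the body."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = {"title": "", "keywords": "", "description": "", "body": ""}
        self._in_title = False
        self._in_body = False
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            name = (attributes.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.parts[name] = attributes.get("content") or ""
        elif tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            return  # JavaScript is not indexed
        if self._in_title:
            self.parts["title"] += data
        elif self._in_body:
            self.parts["body"] += data


doc = (
    "<html><head><title>Garden tips</title>"
    '<meta name="keywords" content="garden, roses">'
    '<meta name="description" content="How to grow roses">'
    "</head><body><p>Water weekly.</p>"
    '<img src="rose.png" alt="red rose">'
    "<script>var hidden = 1;</script>"
    "</body></html>"
)
extractor = IndexableText()
extractor.feed(doc)
extractor.close()
parts = extractor.parts
```

    Attribute values such as "alt" never reach `handle_data`, so the image description is left out without any extra code, and the script content is skipped explicitly.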

    What is faster, strong or light update?
    Surprisingly, the strong update is faster and creates less load on the system. The reason is that a strong update erases the crawler history completely and starts over, whereas a light update checks whether previously crawled pages have changed and need to be updated, using smart comparison algorithms. If you do not need to keep the previously crawled documents and want to build the index faster, consider using the strong update.



Copyright © 2001-2012 Insuma GmbH. All rights reserved. Insuma™ and the Insuma logo are registered trademarks of Insuma GmbH. All other logos and trademarks contained in this site are property of their respective owners.