Indexing

Last Updated: 06 Jul 2022

Squiz Matrix keeps an index: a list of words that records where each word appears and how often it appears. The Search Page, Search List, Search Folder and Quick Search only work when indexing is turned on in the system.

Once indexing is turned on, assets are indexed automatically when they are created or when a change to an asset is committed. By default, Squiz Matrix indexes the attributes of an asset, including its content, as well as its metadata values. You can turn indexing off for an asset type, or for fields within an asset type, on the Asset Weights, Asset Tree Weights or Global Weights screen of the Search Manager. You can also change which words are indexed, and how many characters a word must have before it is indexed, on the Details screen of the Search Manager. For more information on the Search Manager, refer to the Search Manager chapter in this manual.
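
To make the idea of a word index concrete, the sketch below counts how often each word appears in each asset and skips words shorter than a minimum length. It is only an illustration of the concept, not how Squiz Matrix stores its index: the function name build_word_index, the tab-separated input format and the MIN_LENGTH value of 4 are all assumptions made for this example.

```shell
#!/bin/sh
# Illustrative only: this is NOT the Squiz Matrix implementation.
# Assumed input: one asset per line, as "<asset id><TAB><content>".
# Assumed setting: words shorter than MIN_LENGTH characters are not indexed.
MIN_LENGTH=4

build_word_index() {
    awk -F '\t' -v min="$MIN_LENGTH" '
    {
        # Lower-case the content and split it on anything that is not a letter or digit.
        n = split(tolower($2), words, /[^a-z0-9]+/)
        for (i = 1; i <= n; i++)
            if (length(words[i]) >= min)
                count[words[i] "\t" $1]++   # key: word + asset id
    }
    END {
        # One output line per (word, asset id) pair: word, asset id, occurrences.
        for (key in count)
            print key "\t" count[key]
    }' | LC_ALL=C sort
}

# Two made-up assets with the IDs 101 and 102.
printf '101\tSearch the search page\n102\tQuick Search\n' | build_word_index
```

With this sample input, "search" is recorded twice for asset 101 and once for asset 102, while "the" is left out because it is shorter than the assumed minimum length.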

Turning Indexing On

To turn indexing on, go to the Details screen of the Search Manager. Change Indexing Status to On and click Commit. Indexing will be turned on for the system, and any assets that are added or changed from now on will be indexed. Assets that were created previously, however, will not be indexed. To index these assets, you will need to re-index the system.

Re-indexing the System

To re-index the system, go to the Details screen of the Search Manager. Select Reindex all assets in the system and click Commit. The system will re-index all assets in the system.

Alternatively, if you only want to re-index a certain part of the system, select the parent asset in the Root Node field under the Re-index Assets section and click Commit.

Whenever you change any of the settings on the Search Manager, you will need to perform a re-index. If you do not perform a re-index, the changes will not affect your search results.

Re-indexing the System from the Server

To re-index the system from the server, run the reindexSearchIndex.php script, which is located in <system_root>/packages/search/scripts, where <system_root> is the location of the Squiz Matrix system you are using.

The script accepts up to three parameters, the first of which is required:

php packages/search/scripts/reindexSearchIndex.php PATH_TO_SYSTEM_ROOT ROOT_NODE_IDS BATCH_SIZE

  • PATH_TO_SYSTEM_ROOT is a required parameter that sets the path on the server where Matrix is installed.
    In the example below, the user has already moved into the system root directory and is executing the script from there, passing `pwd` as the first parameter.
  • ROOT_NODE_IDS is an optional parameter that lets you pass a comma-separated list of root node IDs.
    Valid values for this parameter include a list such as 11,55,66 or a single value such as 11. If you do not pass any IDs, the script will prompt you to enter them, or to press ENTER to reindex the whole system.
  • BATCH_SIZE is an optional parameter that will only work if ROOT_NODE_IDS is also passed. This parameter lets you define the batch size if you need to run the script in chunks.
    The default value is 100 and there is no upper limit.
    Any values less than 0 will reset the batch size to 100.
    This parameter is an advanced option and should be used with care: the default value is recommended.
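
As a minimal sketch of how the three parameters fit together, the shell function below checks the rules described above and prints the resulting command instead of running it. The function name build_reindex_cmd and the path /var/www/matrix are hypothetical, and the sketch assumes the printed command will be run from the system root directory.

```shell
#!/bin/sh
# Hypothetical helper: validates the parameters and prints the command (a dry run).
# It does not call reindexSearchIndex.php itself.
build_reindex_cmd() {
    system_root=${1:-}
    root_node_ids=${2:-}
    batch_size=${3:-}

    # PATH_TO_SYSTEM_ROOT is the only required parameter.
    if [ -z "$system_root" ]; then
        echo "error: PATH_TO_SYSTEM_ROOT is required" >&2
        return 1
    fi

    cmd="php packages/search/scripts/reindexSearchIndex.php $system_root"

    if [ -n "$root_node_ids" ]; then
        # Accept a single ID (11) or a comma-separated list (11,55,66).
        case $root_node_ids in
            *[!0-9,]*|,*|*,|*,,*)
                echo "error: ROOT_NODE_IDS must look like 11 or 11,55,66" >&2
                return 1 ;;
        esac
        cmd="$cmd $root_node_ids"
        # BATCH_SIZE is only appended when ROOT_NODE_IDS is present.
        if [ -n "$batch_size" ]; then
            cmd="$cmd $batch_size"
        fi
    elif [ -n "$batch_size" ]; then
        echo "error: BATCH_SIZE only works together with ROOT_NODE_IDS" >&2
        return 1
    fi

    printf '%s\n' "$cmd"
}

# /var/www/matrix is a made-up install path; 50 is an arbitrary batch size.
build_reindex_cmd /var/www/matrix 11,55,66 50
```

Printing the command first lets you review it before starting a potentially long re-index.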

An example of the usage of this script is given below:

$ php packages/search/scripts/reindexSearchIndex.php `pwd`
Enter the #IDs of the root nodes to reindex (comma separated) or press ENTER to reindex the whole system: 100,200
Do you want to reindex the root node #100 (yes/no) no
Skipping ..
Do you want to reindex the root node #200 (yes/no) yes
Start Reindexing
Finished

Indexing the Content of PDF Files and MS Word Documents

By default, Squiz Matrix does not index the content of PDF files and MS Word documents. In other words, when a user searches for a term, the content of these documents is not searched.

To index the content of these documents, you need to enable Apache Tika. For more information on how to enable this tool, refer to the External Tools Configuration chapter in the System Configuration manual.

Recommendations for Indexing

To help improve general backend performance, as well as search performance in both the frontend and the backend, it is recommended that you turn off indexing for:

  • All general assets, such as Designs, Design Areas, Bodycopies, Divisions, Metadata Schemas and Workflow Schemas.
  • Fields that are not being used on a Search Page, for example asset ID, created date, updated date and published date.
  • Certain parts of the Asset Map, for example the System Management Folder, the Designs Folder and the Users Folder.
  • The Metadata Schemas, sections within a Metadata Schema or the metadata fields that will not be used for searching.
