Monday 11 July 2011

Configuring Search Relevance for SharePoint


Search relevance refers to the process of tuning search results so that they closely match what the user is looking for.
A number of factors affect the presentation and order of search results when a user runs a search query:
  1. Precision (finding the right answers)
  2. Recall (finding all the answers)
  3. Visual Design
  4. Usability
  5. Speed
etc…
The relevance ranking engine is based on information retrieval algorithms adapted from Stephen Robertson’s BM25F algorithm, and it is specifically tuned for the requirements of searching enterprise content. This approach orders results by decreasing probability of relevance to the query. Both the document and the query are described by their terms. The ranking is computed from statistics about those terms and the result: the document length, the number of occurrences of the term in the document, and the number of documents in which each term occurs at all (this is repeated for each property). The model is further enhanced by tracking body text and properties, such as title or author, individually. Each additional feature or fact about the document or the query that is added to the model can contribute to better results.
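To make these statistics concrete, here is a minimal Python sketch of a textbook BM25F-style score. It is not SharePoint's internal implementation: the property names (title, body), the per-property weights and b values, and the K1 constant are illustrative assumptions. It does show the ingredients listed above: term frequency, property length, the number of documents containing the term, and properties tracked individually.

```python
import math
from collections import Counter

# Illustrative per-property settings (assumptions, not SharePoint defaults):
# "weight" plays the role of property weighting, "b" controls how strongly
# the property's length is normalized.
PROPERTY_SETTINGS = {
    "title": {"weight": 3.0, "b": 0.5},
    "body": {"weight": 1.0, "b": 0.75},
}
K1 = 1.2  # term-frequency saturation constant (assumed value)


def tokens(doc, prop):
    """Lowercased whitespace tokens of one property of a document."""
    return doc.get(prop, "").lower().split()


def bm25f_score(query_terms, doc, corpus):
    """Score one document (a dict of property -> text) against a query."""
    n_docs = len(corpus)
    avg_len = {
        prop: sum(len(tokens(d, prop)) for d in corpus) / n_docs
        for prop in PROPERTY_SETTINGS
    }
    score = 0.0
    for term in (t.lower() for t in query_terms):
        # Weighted, length-normalized term frequency, summed over properties.
        pseudo_tf = 0.0
        for prop, cfg in PROPERTY_SETTINGS.items():
            words = tokens(doc, prop)
            tf = Counter(words)[term]
            if tf == 0 or avg_len[prop] == 0:
                continue
            norm = (1 - cfg["b"]) + cfg["b"] * len(words) / avg_len[prop]
            pseudo_tf += cfg["weight"] * tf / norm
        # Number of documents in which the term occurs in any property
        # (a simplification of the per-property statistics).
        df = sum(
            1 for d in corpus
            if any(term in tokens(d, p) for p in PROPERTY_SETTINGS)
        )
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        score += idf * pseudo_tf / (K1 + pseudo_tf)
    return score


corpus = [
    {"title": "Budget report", "body": "quarterly budget figures for finance"},
    {"title": "Team lunch", "body": "the budget for the team lunch is small"},
    {"title": "Holiday schedule", "body": "office closed on public holidays"},
]
ranked = sorted(corpus, key=lambda d: bm25f_score(["budget"], d, corpus), reverse=True)
print([d["title"] for d in ranked])
```

In this toy corpus the first document ranks highest because the term appears in its heavily weighted title as well as its body; the second mentions it only once in a longer body.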

When executing a search, SharePoint applies two types of ranking:
1) Dynamic Ranking
2) Static Ranking
Dynamic Ranking of Search Results
This ranking is based on the query terms and on the information available in the various indexed metadata properties. It is calculated on the query servers and depends on how well the query text matches the indexed term information.
The following components are used to determine the dynamic rank of search results.
Anchor Text
Anchor text is the text included with a hyperlink to describe the target content of that hyperlink. It influences only the rank of a result; it does not determine whether the item is included in the result set.
Search indexes the anchor text from the following elements:
  • HTML anchor elements
  • Microsoft Windows SharePoint Services link lists
  • Microsoft Office SharePoint Portal Server 2003 listings
  • Microsoft Office Word 2007, Microsoft Office Excel 2007, and Microsoft Office PowerPoint 2007 hyperlinks (only for files using the new Office Open XML Formats)
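For HTML anchor elements, the collection step can be sketched with Python's standard html.parser module. This is only an illustration of the idea, not SharePoint's crawler: it gathers the text of each a element and groups it by link target, because anchor text describes the target item rather than the page that contains the link. The function name anchor_text_by_target is hypothetical.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collects (href, anchor text) pairs from HTML <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, text) tuples
        self._href = None  # href of the <a> element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse internal whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def anchor_text_by_target(html):
    """Map each link target to the anchor texts that point at it."""
    parser = AnchorTextCollector()
    parser.feed(html)
    parser.close()
    index = {}
    for href, text in parser.links:
        index.setdefault(href, []).append(text)
    return index


page = ('<p>See the <a href="/docs/budget.docx">2011 budget</a> and the '
        '<a href="/docs/budget.docx">finance plan</a>.</p>')
print(anchor_text_by_target(page))
```

Anchors without an href (for example named anchors) are skipped, since they do not describe any target content.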
Property Weighting

Property weighting is the process of assigning weights, or priorities, to the various properties available in the search index. These weights can be modified to improve the relevance of search results.

Property Length Normalization
A content item can have many different properties of varying length. If the values in these properties are treated equally during the relevance calculation, regardless of their length, the calculated rank can be skewed. Length normalization adjusts the rank of a content item based on the length of the property and the length normalization setting.
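The effect of a length normalization setting can be isolated in a small Python function. It uses the BM25-style normalization factor with a parameter b standing in for the setting; treating SharePoint's setting as exactly this b parameter is an assumption. With b = 0 the property's length is ignored; as b approaches 1, a term occurring in a long property value counts for less than the same term in a short one.

```python
def normalized_tf(tf, prop_length, avg_prop_length, b):
    """BM25-style length normalization of a raw term frequency.

    b = 0 disables normalization (length ignored); b = 1 normalizes fully.
    """
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must be between 0 and 1")
    return tf / ((1 - b) + b * prop_length / avg_prop_length)


# The same term, occurring twice, in a short (20-word) and a long (400-word)
# property value, assuming an average property length of 100 words.
for b in (0.0, 0.5, 1.0):
    short = normalized_tf(2, 20, 100, b)
    long_ = normalized_tf(2, 400, 100, b)
    print(f"b={b}: short={short:.2f} long={long_:.2f}")
```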
URL Matching
The name of a site is often one of the most important search terms, so SharePoint search matches the name of the site against the URL of the site.
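The idea of matching query terms against a URL can be sketched in Python by splitting the URL's host and path into words and checking which query terms appear among them. The helper names and the "fraction of terms matched" score are hypothetical; SharePoint's actual URL matching is internal to its ranking model.

```python
import re
from urllib.parse import urlparse


def url_tokens(url):
    """Split a URL's host and path into lowercase word tokens."""
    parts = urlparse(url)
    return set(re.findall(r"[a-z0-9]+", (parts.netloc + parts.path).lower()))


def url_match_fraction(query, url):
    """Fraction of query terms that also appear as tokens in the URL."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    tokens = url_tokens(url)
    return sum(term in tokens for term in terms) / len(terms)


url = "http://intranet/sites/finance/Shared Documents/plan.docx"
print(url_match_fraction("finance budget", url))
```

Here "finance" matches the site name in the path while "budget" does not, so half of the query terms match.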
Title Extraction
Using the title of a document stored in the index can yield very high precision in search results. However, in many scenarios the title of a document does not accurately reflect its content.
Static Ranking of Search Results
Static ranking is independent of the search query and is calculated during indexing on the index server.
The following parameters affect the static ranking of search results.
Click Distance
Click distance refers to the number of links between a content item and an "expert" page linking to the content item.
The more links the crawler must travel from an authoritative page to the content item, the lower the relevance score. If there are multiple paths to a content item, relevance is calculated based on the shortest path, the one with the fewest links from the authoritative page to the content item.
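Because only the shortest path counts, click distance can be computed with a breadth-first search that starts from all authoritative pages at once. The Python sketch below assumes the link graph is already available as a dictionary; the page URLs and the click_distance_boost formula are hypothetical illustrations, not SharePoint's actual weighting.

```python
from collections import deque


def click_distances(links, authoritative_pages):
    """Shortest number of links from any authoritative page to each page.

    links maps a page URL to the list of URLs it links to. Breadth-first
    search visits pages in order of distance, so the first time a page is
    reached is along a shortest path. Unreachable pages are omitted.
    """
    distance = {page: 0 for page in authoritative_pages}
    queue = deque(authoritative_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


def click_distance_boost(distance):
    """Hypothetical boost: 1.0 for an authoritative page, smaller per hop."""
    return 1.0 / (1 + distance)


links = {
    "/home": ["/hr", "/finance"],
    "/hr": ["/hr/policies"],
    "/finance": ["/finance/reports"],
    "/finance/reports": ["/hr/policies"],
}
d = click_distances(links, ["/home"])
print(d["/hr/policies"])
```

The page /hr/policies is reachable in two links (via /hr) and in three (via /finance/reports); the shorter path is the one that counts.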
URL Depth
Important or relevant content is often located near the top of a site’s hierarchy rather than several levels deep. Such content also tends to have a shorter URL, which is easier for users to remember and access. Enterprise Search makes use of this by reviewing URL depth: how many levels deep within a site a content item is found. The level is determined by counting the slash ("/") characters in the URL path; the more slash characters in the path, the deeper the content item. As a consequence, a large URL depth can lower the relevance of that content.
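Counting slashes in the URL path, as described above, is a one-liner in Python. This sketch excludes the scheme and host (so the "//" after "http:" is not counted); the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Number of '/' characters in the URL path (scheme and host excluded)."""
    return urlparse(url).path.count("/")


urls = [
    "http://intranet/sites/hr/policies/leave.aspx",
    "http://intranet/default.aspx",
    "http://intranet/sites/hr/default.aspx",
]
# Shallower URLs first, mirroring the preference for less deeply nested content.
print(sorted(urls, key=url_depth))
```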
Automatic Language Detection
Enterprise Search determines the user’s language from the Accept-Language header sent by the user’s browser; this is known as automatic language detection. When calculating relevance, content in the user’s language is considered more relevant than content in other languages, with the exception of English content, which is considered as relevant as content in the user’s language.
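The rule above can be sketched in Python: parse the Accept-Language header (honouring its q-values), take the highest-ranked entry as the user's language, and treat a document as matching if it is in that language or in English. Comparing only the primary language subtag (fr for fr-FR) and treating a missing header as "no preference" are assumptions of this sketch, not documented SharePoint behaviour.

```python
def preferred_languages(accept_language):
    """Parse an Accept-Language header into language tags, best first."""
    entries = []
    for position, item in enumerate(accept_language.split(",")):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        quality = 1.0  # default q-value when none is given
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by original order in the header.
        entries.append((-quality, position, lang.strip().lower()))
    return [lang for _, _, lang in sorted(entries)]


def language_matches(doc_language, accept_language):
    """True if a document is in the user's language or in English."""
    langs = preferred_languages(accept_language)
    if not langs:
        return True  # assumption: no header means no language preference
    user_primary = langs[0].split("-")[0]
    doc_primary = doc_language.lower().split("-")[0]
    return doc_primary in (user_primary, "en")


header = "fr-FR,fr;q=0.9,en;q=0.5"
print(preferred_languages(header))
print(language_matches("de", header))
```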
File Type Biasing
In most search scenarios, certain file types are more relevant than others. For example, HTML pages and Word documents are usually more relevant to a user’s search than an Excel spreadsheet or a plain text file.
Enterprise Search’s relevance calculation includes a ranking algorithm that ranks some file types higher than other file types. This applies to the following file types, listed in default ranking order in Enterprise Search, starting with the highest:
  • HTML Web pages
  • PowerPoint presentations
  • Word documents
  • XML files
  • Excel spreadsheets
  • Plain text files
  • List items
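The default order above can be expressed as a lookup table. In this Python sketch the extension-to-type mapping and the idea of sorting by position in the list are illustrative assumptions; in practice file type is one factor combined with the others, and sorting by it alone here simply isolates its effect.

```python
import os

# Default order from the list above, highest first.
FILE_TYPE_ORDER = ["html", "powerpoint", "word", "xml", "excel", "text", "list item"]

# Hypothetical extension-to-type mapping used for the example.
EXTENSION_TYPE = {
    ".htm": "html", ".html": "html",
    ".ppt": "powerpoint", ".pptx": "powerpoint",
    ".doc": "word", ".docx": "word",
    ".xml": "xml",
    ".xls": "excel", ".xlsx": "excel",
    ".txt": "text",
}


def type_of(filename):
    """File type for a file name, or 'unknown' for unlisted extensions."""
    return EXTENSION_TYPE.get(os.path.splitext(filename)[1].lower(), "unknown")


def file_type_bias(file_type):
    """Position of a file type in the default order (0 is most favoured).

    Unknown types sort after every listed type."""
    try:
        return FILE_TYPE_ORDER.index(file_type)
    except ValueError:
        return len(FILE_TYPE_ORDER)


results = ["notes.txt", "budget.xlsx", "intro.pptx", "index.html", "plan.docx"]
print(sorted(results, key=lambda name: file_type_bias(type_of(name))))
```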
Reference website: http://www.logimindz.com

Thursday 28 April 2011

DisableLoopbackCheck: SharePoint 2010


I recently encountered an issue while helping one of my clients deploy master pages and updated solution files to their standalone SharePoint 2010 installation.
Every attempt to log in to the local SharePoint 2010 site with a username and password failed, and the credentials prompt kept reappearing. As a result, I could not access the Site Settings section of the SharePoint server from the server itself.
On closer investigation, it turned out that this single problem was causing several issues in the SharePoint 2010 environment:

1)   Deployment issues: The user cannot access SharePoint site administration, and therefore cannot deploy master pages and page layouts or modify site settings.

2)   Search administration: If the search indexer is hosted on the same SharePoint server, the crawler is effectively crawling its own machine, cannot authenticate, and therefore cannot access the content; the crawl log fills up with 401 errors.


3)   Custom scripts: Any custom script that accesses the web application during execution cannot log in and therefore fails to run.

4)   Custom code: Any custom code that needs to log in to the web application fails when it runs on the SharePoint 2010 server itself.


Workarounds
This issue occurs because of the loopback check, a security feature that Microsoft added to Windows Server 2003 and Windows Server 2008.
You can either disable the loopback check entirely or add a list of host addresses to exclude from the check.
To disable the loopback check in the registry, perform the following steps:
  1. Click Start, click Run, type regedit, and then click OK.
  2. Locate the following registry key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa
  3. On Windows Server 2008, right-click Lsa, point to New, and then click DWORD (32-bit) Value.
  4. Type “DisableLoopbackCheck” and then press ENTER.
  5. Right-Click DisableLoopbackCheck, and then click Modify.
  6. In the Value box, type 1 and then click OK.
You can also add the registry entry using PowerShell:
New-ItemProperty -Path "HKLM:\System\CurrentControlSet\Control\Lsa" -Name "DisableLoopbackCheck" -Value 1 -PropertyType DWord

Additional details can be obtained from the following Microsoft article:
