In the past, search was viewed as a feature: a nice-to-have, an additional capability. An organisation would have some requirement, say, that a certain group needs to find documents on a particular subject more efficiently, with those documents stored across a set of digital storage locations. It would then be a case of providing software and services to that organisation in a way that reduced the difficulty of finding the required information. Search is now an application in its own right, having evolved from the role of ‘bolt-on feature’ to that of ‘core application’. The reasons for this evolution are many, but let’s look at user interaction as a starting point.
Historically, search has generally been a non-interactive process from the user’s perspective. It’s a simple process: enter your keyword, hit search. Whichever search technology the user happened to be working with would find matching text in the index and return a set of results, ordered by an analysis based on various criteria (to be discussed in a later post). The user would then inspect the results, notice some irrelevant content, and go back and refine the query. Refinement is usually not a simple task. The user would need 1) knowledge of the syntax used by the search technology in question, 2) some knowledge of Boolean logic, and 3) a command of both the language AND the field of study concerned that is advanced enough to generate synonyms efficiently. Given those three knowledge sets (uncommon to John Q Public), the user would be able to embark on a search refinement process, provided that the information the user was looking for actually existed in the first place. A query submitted to an academic search engine might look like this:
((“The Photo Electric Effect”) AND (“Physics”) AND (“Classical Mechanics” OR “Newtonian Mechanics”)) NOT (“Quanta” OR “Planck”)
While the balance between precision and recall is made patently obvious to the user in this situation (and the user can be as precise as he or she is capable of being) there are several inefficiencies to this approach:
- It is time consuming
- Syntax is always a concern
- It leaves all the work on the user’s plate, in an age of information!
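To make the Boolean query above concrete, here is a minimal sketch of how such a query maps onto set operations over an inverted index. The toy corpus, the `build_index` and `phrase` helpers, and the whitespace tokenisation are all assumptions made for illustration; a real search engine will handle phrases, ranking and syntax parsing quite differently.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text (already lowercase, no punctuation).
DOCS = {
    1: "the photoelectric effect in classical mechanics physics",
    2: "the photoelectric effect explained with planck quanta physics",
    3: "newtonian mechanics and the photoelectric effect physics",
    4: "classical mechanics of pendulums physics",
}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def phrase(index, docs, text):
    """Return ids of documents containing the exact phrase.

    The index narrows the candidates; a naive whole-word substring
    check on each candidate then confirms the phrase itself.
    """
    words = text.lower().split()
    if not words:
        return set()
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    needle = " " + " ".join(words) + " "
    return {d for d in candidates if needle in " " + docs[d].lower() + " "}


index = build_index(DOCS)


def P(text):
    """Shorthand for a phrase lookup against the toy index."""
    return phrase(index, DOCS, text)


# AND is set intersection (&), OR is union (|), NOT is difference (-).
result = (
    P("the photoelectric effect")
    & P("physics")
    & (P("classical mechanics") | P("newtonian mechanics"))
) - (P("quanta") | P("planck"))

print(sorted(result))
```

Every operator in the query becomes one set operation, which is exactly the knowledge the user had to carry around in their head.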
The contemporary approach to search lets all the dirty work be handled by a combination of advances in the field of linguistic processing and some clever user interface design. No longer is search a back-button-thrashing exercise. Information can be processed as it’s fed into the index so that its value is increased. Each document can be analysed so that key phrases are extracted and added to the document’s ‘meta information’. At the same time we can automatically identify the names of people and places, telephone numbers, email addresses, physical street addresses, the names of companies, subjects, sub-disciplines and field-specific jargon, and store all of it along with each and every document at the moment it is fed into the index. This means that the contemporary process analogous to the above would be as follows:
- Enter search terms: ‘the photoelectric effect’
- View search results, click ‘Physics’, click ‘Classical Mechanics’, click ‘Newtonian Mechanics’, click the X next to ‘Quanta’ and the one next to ‘Planck’
Through the whole process, the result set is dynamic, shifting as the user selects refinements and filters. All the while, the user is actually firing off a series of queries in the background, but the value added to the raw information removes the effort that would have been required in the past. This is all enabled by the fact that computers can group sets of information together much faster than we can. In the past, it would have taken the user a long while inspecting the result set to realise that it included the quantum explanation for the photoelectric effect, when the historical research really required an account of how the effect was explained before that.
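The index-time enrichment and click-through refinement described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `SUBJECT_TERMS` lookup table stands in for real linguistic processing, the email pattern is deliberately simple, and the `enrich`, `search` and `facet_counts` functions are hypothetical names rather than any particular engine’s API. Included facets are combined with AND here, which is also an assumption.

```python
import re
from collections import Counter

# Hypothetical gazetteer: term -> facet label. A real system would use
# proper linguistic processing; this lookup table is only a stand-in.
SUBJECT_TERMS = {
    "physics": "Physics",
    "classical mechanics": "Classical Mechanics",
    "newtonian mechanics": "Newtonian Mechanics",
    "quanta": "Quanta",
    "planck": "Planck",
}

# Deliberately simple email pattern, good enough for the illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def enrich(text):
    """Attach extracted metadata to a document as it enters the index."""
    lowered = text.lower()
    return {
        "text": text,
        "subjects": {label for term, label in SUBJECT_TERMS.items() if term in lowered},
        "emails": set(EMAIL.findall(text)),
    }


def search(index, query, include=(), exclude=()):
    """Keyword match, then keep documents that carry every included
    facet and none of the excluded ones."""
    q = query.lower()
    return [
        doc for doc in index
        if q in doc["text"].lower()
        and set(include) <= doc["subjects"]
        and not (set(exclude) & doc["subjects"])
    ]


def facet_counts(results):
    """Count how many results carry each subject, for display as clickable refiners."""
    return Counter(label for doc in results for label in doc["subjects"])


RAW = [
    "The photoelectric effect in physics: a classical mechanics account.",
    "Physics: Planck, quanta and the photoelectric effect. Contact a.einstein@example.org.",
    "Newtonian mechanics and the photoelectric effect, a physics primer.",
]
index = [enrich(text) for text in RAW]

# First query: just the search terms.
results = search(index, "the photoelectric effect")
print(facet_counts(results))

# Each click re-runs the query with one more filter.
refined = search(
    index,
    "the photoelectric effect",
    include=["Physics"],
    exclude=["Quanta", "Planck"],
)
for doc in refined:
    print(doc["text"])
```

Each click in the interface simply re-issues the query with a slightly longer `include` or `exclude` list, so the user never sees the Boolean logic underneath.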
Another side effect of so much additional meta information is that an entire portal can be developed in a write-once, use-many approach. A single template can aggregate disparate information from a variety of sources, say, a financial history source, a current events news source and an encyclopaedic source. This disparate information can then be presented (after some user interface sorcery) in an informative, easy-to-read, concise and clear manner. As soon as the user clicks on a name, the whole page shifts to the new context.
The advantages of the context-driven, interactive user experience that modern search technologies enable are:
- It allows all the precision and recall tuning of the past.
  - This is all driven by the metadata attached to each document.
- It allows anyone to refine a query without having to think about synonyms or field of study.
  - The engine can index the original content so that each word is stored along with its synonyms, or the query can be expanded with synonyms before it is submitted.
  - Entity extraction allows drilldown to narrow result sets and increase precision.
  - Processing content into a taxonomy can group similar items together, also increasing precision.
- The user does not need prior knowledge of the syntactical conventions of the search engine they may be using.
  - In fact, it’s even possible to swap the backend search engine without the users noticing!
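As a rough illustration of the query-time synonym handling mentioned in the list above, the sketch below expands a query into variants before matching. The `SYNONYMS` table and the `expand_query` and `search_with_synonyms` functions are hypothetical, and plain substring matching stands in for a real engine’s matching and ranking.

```python
# Hypothetical synonym table; a real deployment would load a curated thesaurus.
SYNONYMS = {
    "classical mechanics": ["newtonian mechanics"],
    "photoelectric effect": ["photo electric effect"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the original query plus one variant per known synonym substitution."""
    q = query.lower()
    variants = [q]
    for term, alternatives in synonyms.items():
        if term in q:
            variants.extend(q.replace(term, alt) for alt in alternatives)
    return variants


def search_with_synonyms(docs, query):
    """Return documents matching the query or any of its synonym variants."""
    variants = expand_query(query)
    return [doc for doc in docs if any(v in doc.lower() for v in variants)]


docs = [
    "Newtonian mechanics primer",
    "Quantum theory notes",
    "Classical mechanics lectures",
]

print(expand_query("classical mechanics"))
print(search_with_synonyms(docs, "classical mechanics"))
```

The user types one term and the expansion happens behind the scenes, which is why no command of the field’s vocabulary is needed up front.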
So you can see that an interactive contextual search experience really can lead to the relevant information, and deliver it on time too!
