
From keywords to conversations: The complete history & evolution of web search

Discover how web search evolved from keywords to semantics, bridging the gap between silicon and human minds. Get a glimpse into the future which promises even more intelligent and intuitive search experiences.

The year is 2005. I’ve just started fifth grade. 

We now head to the ‘Senior Computer Lab’ twice a week to access the internet. We type in a few keywords and a magical portal of information opens up!

Fast forward to 2023. Keywords and even keyboards are passé. 

I simply say: OK Google! 

The Assistant comes to life on my smartphone, awaiting instructions.

Google Assistant Voice Search

There’s a song I wish to find. But I don’t know the lyrics, just the tune. So, I hum it to the Assistant. Within 3 seconds, I know it is ‘Calm Down’ by Rema and Selena Gomez.

This is what good, no, great search looks like.

Like talking to a friend, forgetting something midway, and having them fill in the gaps seamlessly.

Effective search goes beyond words and uncovers intent. It gets us from question to answer instantly. It saves us time (up to 20% of our workday!).

And it empowers us to extract meaning from the ever-exploding pile of data—2.5 quintillion bytes—we generate every single day.

But searching for information wasn’t always as easy as talking into your phone. 

So, how did search progress from crawlers to conversations?

The journey is as interesting as it gets. Let’s get started!

{{cta-component-1}}

How search typically works

A search engine is an information retrieval system.

It lets you ask a question or enter a combination of keywords to discover answers from a vast database of indexed pages.

How Google Search works

But, given the heterogeneity of the indexed data—web pages contain information ranging from cooking recipes to code repositories—search engines not only retrieve information but also rank results for their relevance to the searcher.

There are three parts to the whole process.

  1. Crawling: Search engines employ automated bots, commonly known as "spiders," to systematically traverse web pages and look for content behind URLs (including text, images, PDFs, etc.). They create an interconnected repository of this content. Crawlers also periodically revisit web pages to identify updated content.

  2. Indexing: The gathered information needs to be organized and stored. Parsers extract relevant data from the links and send it for indexing into the search engine's index, which is like a massive digital library with meticulously cataloged information.

  3. Retrieval and ranking: Whenever a user searches for information, the search engine matches the query to the data contained in its index and returns pages that are a match. But not all pages will be equally relevant to the user. So, the pages are ranked on the SERPs (search engine results pages) in the order of their potential utility from a user's perspective. Search engines employ intricate algorithms that quickly evaluate content based on factors like location, backlinks, freshness, search history, and readability, among others. These algorithms ensure that users find the most relevant results for their queries within milliseconds.
Search processes of crawling, indexing, and ranking
Source: Search Engine Land
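
To make these three stages concrete, here's a minimal sketch in Python: a stand-in for crawled pages, an inverted index, and a naive retrieval function. The URLs, page text, and one-point-per-term scoring are invented for illustration; real engines operate at vastly larger scale with far richer ranking signals.

```python
from collections import defaultdict

# 1. "Crawling": a stand-in for pages a spider has already fetched
#    (URLs and text are invented for this example)
crawled_pages = {
    "example.com/pasta": "best pasta recipe with fresh tomato sauce",
    "example.com/code":  "python code repository for web crawlers",
    "example.com/rome":  "travel guide to rome best pasta and pizza",
}

# 2. Indexing: build an inverted index mapping each term to the pages containing it
index = defaultdict(set)
for url, text in crawled_pages.items():
    for term in text.split():
        index[term].add(url)

# 3. Retrieval and (very naive) ranking: pages matching the most query terms win
def search(query):
    scores = defaultdict(int)
    for term in query.lower().split():
        for url in index.get(term, set()):
            scores[url] += 1  # one point per matched term; real ranking uses many signals
    return sorted(scores, key=scores.get, reverse=True)

print(search("best pasta recipe"))  # -> ['example.com/pasta', 'example.com/rome']
```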

The earliest search engines used one or all of these processes.

The first crawler bots: Archie, Veronica and Jughead (1990-93)

The foundations of internet search began with the first crawler bot, Archie. 

Created at McGill University in 1990 by Alan Emtage, Archie downloaded file listings from public anonymous FTP (File Transfer Protocol) sites and created a searchable database for them. However, Archie did not index the full contents of these sites. Users simply searched for file names and it would return URLs where those files could be found.

Inspired by Archie, Steven Foster and Fred Barrie created Veronica in 1992 to help users locate files on Gopher servers. Jughead, released in 1993 by Rhett Jones, indexed Gopher directories and metadata.

The first crawler bots: Archie, Veronica and Jughead
Source: Wikipedia

These first crawlers, named after comic book characters, established critical foundations for search as we know it today—automating the discovery and indexing of content to make the rapidly growing online world navigable.

Yet all of them relied purely on keywords to locate files. They lacked relevance ranking and an understanding of the relationships between terms. Results were listed alphabetically or chronologically, making searching tedious and inefficient.

The first web search engines (early 1990s)

In 1992, a markedly different search engine called the W3Catalog appeared. Created by Oscar Nierstrasz at the University of Geneva, it tapped into existing lists of top-notch websites, saving itself the trouble of crawling the vast web.

But it had a little hiccup. Its bot would visit each of these websites multiple times a day, causing performance headaches. 

Enter Martijn Koster and his brainchild, ALIWEB. Instead of relying on a web robot, ALIWEB asked website administrators to notify the search engine about the presence of an index file in a specific format. It was a way to cut down on excessive crawling and conserve bandwidth.

Here’s a screenshot of ALIWEB from 1997 (Courtesy: The Wayback Machine)

ALIWEB: One of the first web search engines

While ALIWEB succeeded in reducing the strain on servers, many website administrators had no clue about the importance of submitting their data. It was as if a secret club existed, and not everyone was privy to the membership rules.

Then, in 1993, something extraordinary happened. Jonathon Fletcher swooped in and created JumpStation, a groundbreaking search engine that combined the powers of crawling, indexing, and retrieval. 

From here on, there was no looking back!

Learn more about the earliest search engines here.

The keyword-matching era (mid-1990s)

As the web boomed in the mid-1990s, many competing search engines started to emerge. 

Excite, which began in 1993 as a Stanford student project called Architext, was one of the earliest search engines to index the full text of pages rather than just keywords or metadata. Brian Pinkerton's WebCrawler also appeared in 1994, traversing links to index pages.

Yahoo, founded in 1994 by Jerry Yang and David Filo, took a different approach with a human-curated directory of websites. However, they soon recognized the superior scalability of automated crawling bots. In 1995, YahooSearch indexed the web using crawler technology, cementing algorithms as the future of search.

YahooSearch

Other major entrants included Lycos in 1994, LookSmart in 1995, and HotBot in 1996.

And how can I forget AltaVista, a first among web search engines in many ways, and our constant companion at the Senior Computer Lab? 

It remained, until Google took over, one of the few search engines to offer the complete package: unlimited bandwidth, full-text searches of web pages using Boolean operators, natural language queries, and search tips, among others. In its heyday it attracted up to 80 million hits per day!

AltaVista search engine
Source: archive.org

Despite differentiating features, all of these relied on keyword matching—they cataloged terms on web pages and matched user queries to these indexed keywords within the content.

Why’s that a problem? 

Imagine you search for “best pasta recipe”. The keyword-matched search results are likely to fetch you all pages containing the terms “best”, “pasta” and “recipe” without ensuring that the results are

  1. Relevant to your intent (finding the best pasta recipe and not the “best pasta” or the “best recipe”)
  2. Ranked according to your preference (say vegetarian or vegan recipes, easy-to-cook recipes, less time-consuming recipes etc.)

You could, of course, use Boolean search operators such as "AND", "OR", and "NOT" to create more precise and targeted search queries. Yet their range of applications would be limited.

What search needed next was a revolution that allowed for greater control and customization in the way results were presented. It didn’t take long for that to happen.

Statistical ranking, PageRank, and the Google Revolution (late 1990s)

In 1996, two Ph.D. students at Stanford, Sergey Brin and Larry Page, launched a prototype web search engine, Google, on the university's servers. Google crawled and indexed pages from the web at a speed and scale far beyond any previous system.

Unlike earlier search engines, Google's innovative PageRank algorithm analyzed the hyperlink structure of websites to determine which results were most authoritative for a given query. Pages that were linked to by many sites were deemed credible and high quality, thus moving to the top positions in SERPs.

Google's PageRank algorithm
Source: Search Engine Land

Combined with keyword indexing, PageRank powered Google's breakthrough search engine and helped catapult it into a dominant position upon launch in 1998.

Incidentally, the PageRank algorithm itself was inspired by Robin Li’s RankDex site-scoring algorithm which later laid the foundation for the Baidu search engine.

Gradually, more statistical approaches to ranking were developed. Search engines employed various statistical signals, such as keyword frequency, document length, and click-through rates, to assess the relevance and importance of web pages. These signals were incorporated into complex statistical models that calculated the likelihood of a page being relevant to a particular query.

Some of these approaches include: 

  1. Term Frequency (TF) which ranks pages according to the frequency of the keyword appearing on that page. The higher the TF, the more relevant the search engine considers the page to be for that particular query. For example, if you're trying to understand semantic search, a web page that mentions the word "semantic" multiple times will have a higher TF for that keyword.

  2. Term Frequency-Inverse Document Frequency (TF-IDF) which calculates the ranking score for a page not just using the frequency of a word within a page (TF) but also how unique it is across all the pages in the search engine's index (IDF). This approach recognizes that some words, like "the" or "and," appear frequently in almost all pages and may not necessarily indicate relevance. On the other hand, words that appear more often in a specific page but less frequently across other pages might be more significant in understanding the page's content.

  3. The BM25F algorithm which finds the best match between a search query and a web page by striking a balance between factors like term frequency, document length, and term rarity. BM25 recognizes that longer documents might naturally contain more instances of a search term, so it adjusts the ranking based on the document length.
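
As a rough illustration of how TF-IDF separates distinctive terms from common ones, here is a small Python sketch. The three documents are made up, and production systems use tuned variants like BM25, but the arithmetic is the core idea:

```python
import math

docs = [
    "semantic search understands meaning and intent",
    "semantic search uses vectors",
    "the cat sat on the mat",
]

def tf_idf(term, doc, corpus):
    """TF-IDF score of a term in one document (assumes the term occurs somewhere)."""
    words = doc.split()
    tf = words.count(term) / len(words)                        # how often the term appears here
    containing = sum(1 for d in corpus if term in d.split())   # documents that contain the term
    idf = math.log(len(corpus) / containing)                   # rarer terms get a higher weight
    return tf * idf

# "semantic" appears in 2 of 3 docs, "cat" in only 1,
# so "cat" is treated as the more distinctive term
print(round(tf_idf("semantic", docs[0], docs), 3))  # ~0.068
print(round(tf_idf("cat", docs[2], docs), 3))       # ~0.183
```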

Novel machine learning algorithms continue to optimize relevance ranking by learning from usage data. But, before these algorithms were perfected, the journey of search had one more milestone to cross.

The ‘searchphone’ shift (2000s)

This milestone propelled search from something that happened on a static, rooted screen to something we could make use of anywhere, at any time.

This milestone was the launch of the smartphone. 

It was a seismic shift that made search location-aware, context-aware, and extremely personalized.

Danny Sullivan, a prominent journalist who wrote about the search space from 1996 through 2017, went so far as to say:

“It’s not a smartphone in your hands. It’s a searchphone.”

Whether we’re accessing a contact to place a call, checking the latest news, navigating to display settings, or even trying to follow a favorite artist on Spotify, we’re essentially using our phones to search for information that takes us to our end goal.

The only difference? We’re no longer searching for keywords. We’re searching on the go, searching for precise answers, searching for personalized information.

Search on smartphones is multimodal

So we refuse to scan long listicles for "Best restaurants in New York City". 

We’d rather have an instant answer to: "What are some highly recommended restaurants with a great ambiance in New York City?"

The only way to do that? To have our phones understand natural language search queries like the one above.

Mobility has not only made search ubiquitous, it has also sparked a shift in its paradigm—from vague and dumb keyword combinations to free-flowing natural language. 

The rise of semantics and natural language search (late 2000s)

With semantic search, search evolved from plain keyword matching to understanding the intent and meaning behind search queries and web content. 

Using natural language processing (NLP), machine learning (ML), and knowledge graphs, among others, search began to provide more accurate and contextually relevant results, mimicking human cognition and closing the gap between silicon and carbon minds.

Various forms of semantic search include:

  1. Entity-based semantic search which focuses on recognizing and understanding entities (such as people, places, organizations) within search queries and web content. It aims to provide more precise results by considering the relationships and attributes associated with those entities.
  2. Contextual semantic search which takes into account the context of a search query, including the user's location, time, and preferences, to deliver more personalized results.
  3. Intent-based semantic search which goes beyond the literal interpretation of keywords and attempts to provide results that satisfy the user's intent instead. For example, if a user searches for "best restaurants in New York," the search engine would understand the intent as seeking recommendations for restaurants in New York City.
  4. Relationship-based semantic search which provides more comprehensive and interconnected search results by considering the semantic connections or relationships between different pieces of information.
  5. Semantic question answering which involves understanding and interpreting natural language questions to provide direct answers. It goes beyond providing a list of relevant web pages and aims to deliver concise and accurate answers extracted from trusted sources.

Semantic search has sparked an irreversible change in the way humans interact with computers and search engines.

Will Oremus, former Senior Technology Writer at Slate, summarizes it best in his famous quote:

“In the beginning, computers spoke only computer language, and a human seeking to interact with one was compelled to do the same. First came punch cards, then typed commands such as run, print, and dir.
The 1980s brought the mouse click and the graphical user interface … the 2000s, touch screens; the 2010s, gesture control and voice. It has all been leading, gradually and imperceptibly, to a world in which we no longer have to speak computer language, because computers will speak human language—not perfectly, but well enough to get by.”

{{cta-component-2}}

Natural language search

NLP plays a crucial role in enhancing the search experience by bridging the gap between the way humans communicate and how search engines understand queries. 

Here's how NLP is radically improving the world of search today:

  1. Query understanding: NLP techniques analyze search queries to understand the intent behind them. This involves parsing the query to identify important words, phrases, and the relationships between them.
  2. Contextual understanding: NLP helps search engines interpret the context of a search query. This includes understanding the context of time, location, and user preferences. For example, if a user searches for "weather," NLP can determine their current location and provide weather information specific to that area.
  3. Natural language queries: NLP enables users to use natural language when entering search queries, as opposed to rigidly structured keywords. Users can ask questions or provide more context for their queries, similar to how they would ask a question to another person. NLP algorithms analyze the query, extract the key elements, and provide relevant results based on the understood intent.
  4. Language understanding: NLP techniques like stemming and lemmatization help search engines handle variations of words. They reduce words to their base or common forms, allowing for better matching of queries with web pages. This ensures that users find relevant results even if there are slight differences in the wording.
  5. Entity recognition: NLP algorithms can recognize named entities like people, places, organizations, and dates within search queries. This helps search engines understand the real-world context and provide more accurate results. For example, if a user searches for "restaurants near Central Park," NLP can identify "Central Park" as a location and retrieve relevant information about nearby restaurants.
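
Several of these techniques can be tried hands-on with an open-source NLP library. Here is a minimal sketch using spaCy—an illustration of the general techniques, not what any particular search engine runs—which requires its small English model to be downloaded separately:

```python
import spacy  # pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
doc = nlp("What are the best restaurants near Central Park?")

# Entity recognition: the model tags real-world entities in the query
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. "Central Park" tagged as a place

# Lemmatization: each word reduced to its base form for better matching
print([token.lemma_ for token in doc])  # "restaurants" -> "restaurant", "are" -> "be"
```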

Overall, NLP empowers search engines to understand and interpret human language more effectively.

The first application of NLP technology for search emerged as early as 1993 at the MIT Artificial Intelligence Lab. The START Natural Language Question Answering Machine may not have been a web search engine. But it did (and still does) let users query an online encyclopedia in the fields of geography, science, history, and culture using conversational language.

I tried it out for fun and here are some snapshots of the results.

The START Natural language Question Answering Machine
The START Natural language Question Answering Machine

Any AI-powered search is only as good as the source data it contains. These limitations apply to START as well.

The first web search engine to allow users to search the internet using natural language was Ask Jeeves (launched in 1996). The only catch was that it was perhaps too good for its time and couldn't sustain itself against Google in the race to web dominance.

AskJeeves natural language search engine

It was rebranded as Ask.com in 2005 and is still around today, though I was not particularly impressed with the quality of its results.

ask.com search engine
ask.com search engine

The Google Revolution soon took over the internet. The company's experiments with NLP and deep learning gave birth to a number of advancements that shaped the current state of search as we know it.

Autocomplete

In 2004, Google launched autocomplete (originally called Google Suggest), a search feature that suggested full queries in a drop-down menu before we had even finished typing them.

Autocomplete, part of a Google Labs project, was incorporated into Google.com in 2008. 

Today’s autocomplete is powered by NLP and uses four factors to predict the user’s intended search terms:

  • Trending queries
  • Language of the searcher
  • Searcher’s location
  • Freshness
Google autocomplete

This benefits both Google and its users. Users get a quick, adaptive, hassle-free, and customized search experience capable of dynamically interpreting language. And the search engine gains insights into user intent, learning and improving further.
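
Google's production system weighs the four factors above, but the core mechanic—completing a typed prefix from a pool of known queries—can be sketched with a simple trie. The queries below are invented, and a real system would also rank suggestions by popularity, location, and freshness:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_query = False  # marks the end of a complete known query

class Autocomplete:
    def __init__(self, queries):
        self.root = TrieNode()
        for q in queries:                          # insert each known query
            node = self.root
            for ch in q:
                node = node.children.setdefault(ch, TrieNode())
            node.is_query = True

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:                          # walk down to the prefix node
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        def collect(n, path):                      # gather every completion below it
            if n.is_query:
                results.append(prefix + path)
            for ch, child in n.children.items():
                collect(child, path + ch)
        collect(node, "")
        return results

ac = Autocomplete(["weather today", "weather tomorrow", "web search history"])
print(ac.suggest("wea"))  # -> ['weather today', 'weather tomorrow']
```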

Vertical (or universal) search and blending of results

It wasn't just how users searched that changed. How search results were displayed also underwent a massive shift in the 2000s with the death of the 'ten blue links' (image below).

Google—the time of ten blue links
Source: Bluecompass

Google’s vertical search made its debut in 2007. No longer were search results limited to crawled web page URLs like in the image above. 

The new ‘universal search’ system blended listings from books, videos, images, news, e-commerce, local listings/maps and others to offer users the most relevant results without needing to dig deep into the contents of each page. 

Blended search results

Vertical search remains one of my favorite search updates ever because it makes research (and life!) so much easier and faster for me. A step-by-step tutorial to fix a leaking pipe? I’d rather click on a video result than a 2000-word blog. 

Blended search results videos

What about quoting experts in a blog post? Books and news articles would be my go-to. You get the drift!

Blended search results books

While this was game changing, I’m so thankful that the evolution of search didn’t stop here. 

If anything, search started to get smarter, leaving clicks and keystrokes behind to rely on other inputs.

Voice search and conversational search assistants

And what could be easier than speaking a search query into your handheld device?

Enter voice search.

In 2008, Google launched voice search for the iPhone, three years before Siri brought the first end-to-end conversational interface/assistant to iOS. Desktop voice search by Google arrived in 2011. 

The voice revolution pushed search toward a natural dialogue with users. Queries began to be posed conversationally using full sentences and context. Responses were delivered as complete, precise answers rather than just a list of links containing keywords. 

Microsoft’s Cortana and Amazon’s Alexa further drove adoption of natural language and voice-powered search which now rule all our devices—from laptops to smartphones, from tablets to TVs. 

In 2022, close to half of the US population used voice search features daily, and more than one-third (34%) used them at least once a week (UpCity, 2022). 

Predictive search

With the introduction of Google Now in 2012, voice-enabled search and information systems also got a predictive makeover. 

Predictive search aims to provide information before users seek it. Instead of reacting to queries, predictive systems proactively notify users of relevant content based on individual context.

Google Now was a precursor to Google Assistant and pushed reminders, recommendations, and alerts based on a user's location, behavior, and schedule. It also offered persistent "cards" that automatically populated with pre-computed information based on a user's searches.

Ontologies and knowledge graphs

In the same year, Google also introduced the Knowledge Graph. 

A knowledge graph is a large-scale, structured data repository that represents real-world entities and their relationships. It provides a way to model and represent knowledge using nodes (entities) and edges (relationships) that connect them. It is a way of organizing and connecting information so that it can be easily understood and used by machines. Ontologies define each of these entities and their properties.
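
A toy illustration of the idea in Python, storing facts as (entity, relationship, entity) triples and querying them. The facts are deliberately simplified; production knowledge graphs hold billions of such edges:

```python
# A tiny knowledge graph: nodes are entities, edges are (subject, relation, object) triples
triples = [
    ("Barack Obama", "held_office", "President of the United States"),
    ("Barack Obama", "born_in", "Honolulu"),
    ("Barack Obama", "author_of", "A Promised Land"),
    ("Honolulu", "located_in", "Hawaii"),
]

def query(subject=None, relation=None):
    """Return all objects matching a subject/relation pattern."""
    return [o for s, r, o in triples
            if (subject is None or s == subject)
            and (relation is None or r == relation)]

print(query("Barack Obama", "author_of"))  # -> ['A Promised Land']
print(query("Barack Obama"))               # everything the graph knows about the entity
```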

Together, knowledge graphs and ontologies bring to search the capacity to understand entities, concepts and their associations.

For example, if you were to search for Barack Obama (or another famous personality), the knowledge graph would fetch you their personal background, titles and designations, history, notable achievements, images, latest news, books authored by them, etc., directly in the search results.

Knowledge graph search results in Google

Google leverages its Knowledge Graph—a database of billions of facts about people, places, and things—to give you a quick, direct factual answer in the form of a Knowledge Panel on the right side of the SERP. I tried asking about the length of the Nile and the answer appeared directly without me needing to click on one of the 269 billion results!

Knowledge graph knowledge panel

As far as ontologies go, there's no better example of ontologies in search than schema.org, a collaborative initiative by major search engines like Google, Bing, Yahoo, and Yandex. Schema.org provides a shared vocabulary and a collection of ontologies that webmasters can use to mark up their website content, making it more structured and easy to understand for search engines.

By using schema.org's ontologies, webmasters can annotate their web pages with specific semantic tags, providing additional context and meaning to search engines. For example, a web page about a recipe can use the Recipe schema to mark up details such as the ingredients, cooking time, nutritional information, and reviews. This structured markup helps search engines understand the content of the page more accurately and present it in a richer format in search results.
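
For instance, the JSON-LD form of such Recipe markup might look like the following sketch, built here with Python simply to print it. The property names come from schema.org's Recipe type; the values are invented for illustration:

```python
import json

# Schema.org Recipe markup as JSON-LD; property names follow the schema.org
# Recipe vocabulary, values are made up for this example
recipe_markup = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Classic Tomato Pasta",
    "recipeIngredient": ["200g spaghetti", "400g tomatoes", "2 cloves garlic"],
    "cookTime": "PT25M",  # ISO 8601 duration: 25 minutes
    "nutrition": {"@type": "NutritionInformation", "calories": "450 calories"},
    "aggregateRating": {"@type": "AggregateRating",
                        "ratingValue": "4.7", "reviewCount": "212"},
}

# Embedded in a page inside a <script type="application/ld+json"> tag,
# this tells crawlers exactly what each field means
print(json.dumps(recipe_markup, indent=2))
```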

When a search engine encounters a web page with schema.org markup, it can extract and interpret the structured data, enabling enhanced search features like rich snippets, knowledge panels, and more detailed search results.

Knowledge graphs coupled with ontologies enabled basic semantic capabilities in search, but they had limitations in representing meaning. While they excel at representing factual knowledge and contextual interconnections between entities, they may struggle to capture the full depth of meaning, subjective interpretations, and abstract concepts.

Vector Search

So, how do we solve this problem?

By turning to vector search. 

Vector search encodes unstructured data (text, images, audio etc.) into dense numeric vectors capturing semantic relationships. Each vector is a list of numbers that represents content. The combination of these numbers defines how similar or how different two pieces of content are. 

For example, "king" and "queen" have similar vector representations, despite being completely different words, because of their semantic relationship.

In vector search, queries and documents map to vectors. Relevance is scored by vector proximity—results closer in the common vector space are more contextually related. Even synonymous terms can be matched via their similar vector embeddings.
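
Here is a minimal sketch of that scoring step with NumPy. The three-dimensional vectors are toy values chosen to make the point; real embeddings have hundreds or thousands of dimensions and are learned from data:

```python
import numpy as np

def cosine_similarity(a, b):
    """Proximity of two vectors: near 1.0 = same direction, near 0.0 = unrelated."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy embeddings: semantically related concepts get nearby vectors
embeddings = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.82, 0.15]),
    "pasta": np.array([0.10, 0.20, 0.95]),
}

query = embeddings["king"]
for word, vec in embeddings.items():
    print(word, round(cosine_similarity(query, vec), 3))
# "queen" scores far closer to "king" than "pasta" does
```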

Vector math also enables recommendations—products, media, or contacts can be suggested based on vector similarity. This is why popular use cases for vector search include applications for e-commerce websites (like Amazon), travel websites (like Booking.com), and media websites (like Spotify).

{{cta-component-3}}

AI Ranking: Neural networks and deep learning (2010s)

Building on vector search, machine learning techniques, including deep learning and neural networks, began powering AI ranking models in the early 2010s. They analyzed massive behavioral datasets to optimize search results based on query semantics, user history, location, and hundreds of other signals.

In 2018, Google introduced the revolutionary BERT machine learning system to search. BERT (Bidirectional Encoder Representations from Transformers) leveraged neural networks for deep language understanding, analyzing words in relation to surrounding context, grasping meaning and nuance.

Where earlier AI matched keywords, in BERT-powered ranking, the AI "reads" and comprehends the full search query. It then compares its vector representation against page vectors to identify matches. The AI goes beyond keywords to discern semantics, intent and contextual relevance. This unlocks unprecedented accuracy in natural language search.
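
Google's ranking stack is not public, but the query-to-page matching idea can be approximated with an open-source transformer model. The sketch below uses the sentence-transformers library and a small MiniLM model as stand-ins, with invented query and page text:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small open-source embedding model

query = "highly recommended restaurants with great ambiance in New York"
pages = [
    "The best places to eat in NYC, chosen for food and atmosphere",
    "How to file your taxes online in three steps",
]

# Encode the query and pages into vectors, then score by cosine similarity
q_vec = model.encode(query)
p_vecs = model.encode(pages)
print(util.cos_sim(q_vec, p_vecs))
# the restaurant page scores far higher despite sharing few exact keywords
```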

Ranking algorithms continue to improve further as ever more data trains the underlying AI models.

Visual search

Perhaps one of the most astonishing milestones in the journey of search has been the evolution of visual search. 

Visual search allows searching by images instead of keywords. In over a decade of rapid evolution, it has come a long way from early photo-identification apps like CamFind, which first used computer vision algorithms to categorize image contents into broad categories like "dog", "car", etc.

These limited capabilities soon paved the way for one of the first truly visual search engines—like.com. It empowered users to input not just text but also image queries. Akin to the principles of vector search, it compared a “visual signature” for the query image to possible results. The visual signature was a mathematical representation of the image consisting of 10,000 variables such as color, texture, shape etc. 

Google Goggles, launched in 2009, pioneered more advanced visual search for online images, and Pinterest later brought visual discovery to social media. Google Lens launched in 2017, incorporating real-time visual search through phone cameras. Lens uses deep neural networks to understand objects and context in the physical world.

Today, visual search is powered by state-of-the-art convolutional neural networks trained on billions of images. Deep learning can recognize a multitude of objects, textures, text and contexts within images. This enables use cases like searching for furniture by snapping a picture or finding similar fashion items based on a photo. Amazon, eBay, Snapchat and numerous startups now provide visual search, spanning e-commerce, social media and travel.

The potential is vast, as visual search aims to mimic the sophisticated object, pattern and meaning recognition abilities of the human visual cortex.

Hybrid search

While vector search is pretty much unmatched in accuracy and relevance because of its broader scope, it can often be slower than traditional keyword search. 

Modern search engines overcome the tradeoff by using hybrid search—an approach that combines various techniques for optimal search performance.

  • Keyword search provides matching results
  • AI reranks results based on user intent determined through NLP
  • Knowledge graphs boost contextual understanding
  • Autocomplete enables interactive experiences
  • Vector math expands matchability
  • Visual search makes room for image input in addition to textual keywords

By synthesizing statistical search, linguistic analysis, machine learning and other innovations, hybrid search delivers unprecedented accuracy, depth, and speed.
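
A simplified view of the blending step: each candidate page carries a keyword-match score (e.g. from BM25) and a vector-similarity score, which are normalized and combined with a tunable weight. The scores and weight below are invented for illustration:

```python
# Candidate pages with a keyword-match score and a vector-similarity score (toy values)
candidates = {
    "page_a": {"keyword": 12.4, "vector": 0.31},
    "page_b": {"keyword": 8.1,  "vector": 0.92},
    "page_c": {"keyword": 11.0, "vector": 0.88},
}

def hybrid_rank(candidates, alpha=0.5):
    """Blend normalized keyword and vector scores; alpha weights keyword vs. semantic."""
    max_kw = max(c["keyword"] for c in candidates.values())
    ranked = {
        page: alpha * (c["keyword"] / max_kw) + (1 - alpha) * c["vector"]
        for page, c in candidates.items()
    }
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)

print(hybrid_rank(candidates))
# page_c wins: strong on both signals, even though it tops neither list alone
```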

Generative search experiences: Moving from search engines to reasoning engines (2020s)

Taking hybrid search one step further, in the early 2020s, conversational search systems began melding NLP, knowledge graphs, contextual understanding, and generative AI.  

They analyzed language, inferred intent, and synthesized responses that shrank the time from question to answer as well as challenged the monopoly of traditional search engines.

In 2014, Microsoft made updates to Bing smart search that improved its parsing of natural language queries. A few months later, it built on this further by introducing the ability to "continue the conversation" after asking a question in search. In other words, you could ask a follow-up question that depended on the previous one for context, and Bing would understand what you meant.

In 2022, OpenAI introduced ChatGPT, an AI system able to have conversations and generate human-like text. This conversational ability was quickly incorporated into search.

Generative, conversational AI search

Microsoft integrated ChatGPT into Bing with chat-based queries and summarized answers. 

BingChat conversational AI search

Google followed with Bard, its own AI chatbot, and the launch of the Search Generative Experience (SGE) to augment text results.

Google Bard AI Chatbot

Google says: “Let’s take a question like ‘what’s better for a family with kids under 3 and a dog, bryce canyon or arches.’ Normally, you might break this one question down into smaller ones, sort through the vast information available, and start to piece things together yourself. With generative AI, Search can do some of that heavy lifting for you.

You’ll see an AI-powered snapshot of key information to consider, with links to dig deeper.

Google's Search Generative Experiences (SGE)
Source: blog.google

Context will be carried over from question to question, to help you more naturally continue your exploration. You’ll also find helpful jumping-off points to web content and a range of perspectives that you can dig into.”

Modern AI-assisted search platforms incorporate multi-turn dialogue understanding to power personal assistants that can comprehend long, complex questions. Open-domain question answering systems now leverage massive language models trained on enormous text corpora to understand semantics behind any query.

The future promises conversational engines that discuss, reason and advise, not just look up facts like search engines. Interacting with search may become like talking to a helpful human expert.

What drove such rapid evolution of web search?

Behind search's exponential evolution are two key driving forces—technological progress and shifts in human behavior.

On the technology front, cheap computing unleashed big data and artificial intelligence. NLP unlocked nuance in language analysis. Faster processors allowed real-time processing of billions of signals.

On the human front, widespread smartphone adoption made search mobile, location-aware and personalized. Users expected instant, accurate responses tailored to context.

As humans interacted with technology in new ways, search had to adapt both understanding of intent and speed of response. Computers stepped up as our Information Age cognitive partners.

What's next for web search?

From the first crawler bots to today's sophisticated AI engines, search has been transformed by technology innovation. 

Three overarching themes emerge from tracing the evolution of search technology over the past 30+ years:

  1. Relevance—Early keyword matching gave way to link analysis, user behavior data and knowledge graphs to better understand what information is most useful to the person searching.

  2. Interface—Search transitioned from simple text boxes to voice assistants and conversational interfaces as computers gained human-like understanding of language.

  3. Scale—Each new engine indexed and learned from vastly larger datasets due to Moore's law, enabling exponential leaps in comprehension from statistical learning techniques.

Cutting-edge research aims to close remaining gaps between human and machine understanding. 

Projects like Anthropic's Constitutional AI, OpenAI's GPT-4, and DeepMind's deep reinforcement learning could enable conversational search assistants far more human-like than any system today. 

Some advances on the horizon include:

  1. Multimodal search that combines text, images and speech, similar to human information processing. This is possible thanks to technology that consistently represents meaning across text, images, video, audio and other mediums. For example, "car" would have similar vector representations regardless of modality. This allows seamlessly matching user queries with relevant content across formats.

    Pioneering research includes Stanford's GloVe word embeddings in 2014 and Google's multimodal BERT, which can interpret text, images and conversations together. Other models including OpenAI’s CLIP (Contrastive Language–Image Pre-training) “build on a large body of work on zero-shot transfer, natural language supervision, and multimodal learning” to enable joint learning of vision and language. 

    Further, the integration of augmented reality (AR) and virtual reality (VR) technologies may enable immersive multi-modal search experiences, where users can interact with the environment and retrieve information using various modalities.
  2. Massive neural networks surpassing the 175 billion parameters of GPT-3, potentially reaching human-level language proficiency.

  3. Neural-symbolic models that integrate symbolic logic with neural learning to improve reasoning. This enables AI systems to make more informed decisions and provide meaningful search results even when faced with incomplete or ambiguous queries.

  4. Predictive search proactively providing relevant information without needing specific queries.

  5. Search evolving to be virtually indistinguishable from the creation of content across mediums. There is a growing trend of search systems generating brand-new content like images and text instead of just retrieving existing information.

Google Brain's Imagen can create original images from text prompts. Anthropic's Claude 2 generates powerful, helpful responses without any web search, showcasing the potential for search systems powered by generative AI rather than indexes. DALL-E 2 and GPT-4 demonstrate the increasing fluency of language models in creating coherent, customized content.

As generative AI advances, future search systems may synthesize personalized explanations, illustrations and multimedia that perfectly match user queries, taking the current generative search experience to a whole new level.

Advances in semantic parsing, knowledge base construction, and neural symbolic AI may even power virtual scholars that surpass human experts in narrow domains. The future of search promises conversational access to all information as naturally as speaking with another person.

The quest for the perfect search continues—where systems seamlessly understand people, not just keywords. The journey so far shows that search will keep improving as long as we keep searching.

Ready to integrate AI into your website, app or software?
Book a demo