Engineering at OSlash

From code to


Unveiling the technological expertise that powers Generative AI at OSlash
LLM Search


OSlash Copilot automatically & recursively extracts & syncs data from your data sources


Our rigorously trained & optimized few-shot models help improve embeddings & minimize latency


Generate robust vector indexes on any database of up to 5000 pages with near-zero latency


Boost search accuracy & relevance with optimized response streaming & multi-dimensional reranking


Generate AI-summaries for search queries or execute multi-step workflows using natural language

Intent detection

IntelliDetect is a proprietary few-shot intent detection model that offers data efficiency, quick adaptation, and reduced annotation costs.
It is based on Google’s state-of-the-art intent detection model and is trained to classify user intent as either a search intent or an ask intent with up to 99% accuracy.
IntelliDetect’s fast response time enables near real-time intent classification, which means lower latency, faster results, and a better overall customer experience.
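Few-shot classification can be pictured as comparing a new query against a handful of labeled examples per intent. The following is a minimal sketch of one common few-shot approach (nearest class prototype), not IntelliDetect itself: the bag-of-words `embed` function, the example queries, and the names `FEW_SHOT_EXAMPLES` and `classify_intent` are hypothetical stand-ins, and a real model would use learned embeddings.

```python
import math
import re
from collections import Counter

# Toy bag-of-words "embedding" (hypothetical); a real system would use a learned encoder.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# A handful of labeled examples per intent: the "few shots" (hypothetical data).
FEW_SHOT_EXAMPLES = {
    "search": ["find the onboarding doc", "search for pricing page", "where is the api reference"],
    "ask": ["how do I reset my password", "what is our refund policy", "why did the deploy fail"],
}

# Each intent's prototype is the sum of its example embeddings.
PROTOTYPES = {
    label: sum((embed(ex) for ex in examples), Counter())
    for label, examples in FEW_SHOT_EXAMPLES.items()
}

def classify_intent(query: str) -> str:
    """Return the intent whose prototype is most similar to the query."""
    q = embed(query)
    return max(PROTOTYPES, key=lambda label: cosine(q, PROTOTYPES[label]))
```

For example, `classify_intent("search for the api docs")` returns `"search"`, while `classify_intent("how do I change my password")` returns `"ask"`. Supporting a new intent only takes a few more labeled examples, which illustrates why few-shot approaches keep annotation costs low.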

Robust vector indexing on any database

Index 5000 pages in <1 min

OSlash Copilot follows URLs & traverses documentation iteratively, ensuring a comprehensive & up-to-date index of your content.
It also links content to its associated metadata, which improves its ability to understand the meaning & relevance of the data.
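Iterative traversal of linked documentation is essentially a breadth-first crawl with a visited set, so each page is indexed once even when pages link back to each other. A minimal sketch under stated assumptions: the in-memory `SITE` dictionary stands in for real HTTP fetching and HTML parsing, and `crawl` and its `max_pages` cap are hypothetical names chosen for illustration.

```python
from collections import deque

# Hypothetical in-memory "site"; a real crawler would fetch and parse HTML.
SITE = {
    "/docs": {"text": "Docs home", "links": ["/docs/install", "/docs/api"], "meta": {"title": "Home"}},
    "/docs/install": {"text": "Install guide", "links": ["/docs"], "meta": {"title": "Install"}},
    "/docs/api": {"text": "API reference", "links": ["/docs/api/auth", "/docs"], "meta": {"title": "API"}},
    "/docs/api/auth": {"text": "Auth tokens", "links": [], "meta": {"title": "Auth"}},
}

def crawl(start_url: str, max_pages: int = 5000) -> dict:
    """Breadth-first traversal that visits each URL once and stores text plus metadata."""
    index = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        page = SITE.get(url)
        if page is None:  # broken link: skip it
            continue
        # Keep metadata next to the content so later ranking can use it.
        index[url] = {"text": page["text"], **page["meta"]}
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index
```

`crawl("/docs")` indexes all four pages exactly once despite the links back to `/docs`, and each entry keeps its `title` metadata alongside the text.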

Higher performance, lower price 

OSlash Copilot uses the latest embedding model, which is based on the GPT-3 tokenizer and is compatible with a wide range of NLP tasks.

Its 8191-token window also makes it suitable for processing large documents. The model is highly performant, affordable, and scalable.
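Embeddings are useful because similar texts map to nearby vectors, so retrieval reduces to a nearest-neighbor lookup. A minimal sketch of that lookup with cosine similarity, assuming a tiny hypothetical `VECTOR_INDEX` of three-dimensional vectors; real embeddings have far more dimensions and come from the embedding model, and a production index would typically use an approximate nearest-neighbor structure instead of a linear scan.

```python
import heapq
import math

# Hypothetical precomputed embeddings keyed by document ID.
VECTOR_INDEX = {
    "install-guide": [0.9, 0.1, 0.0],
    "api-reference": [0.1, 0.9, 0.2],
    "auth-tokens": [0.0, 0.7, 0.7],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, k=2):
    """Return the k document IDs whose embeddings are most similar to the query vector."""
    return heapq.nlargest(k, VECTOR_INDEX, key=lambda doc_id: cosine(query_vec, VECTOR_INDEX[doc_id]))
```

A query vector pointing mostly along the second axis, such as `[0.0, 1.0, 0.0]`, ranks `api-reference` first and `auth-tokens` second.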

Overcome token limits

Token limits can hinder complex queries. To overcome them, our model uses a technique called chunking, which breaks long, complex documentation into smaller chunks.
By splitting lengthy content into manageable chunks and processing each one efficiently, the model can handle even extensive and intricate search requests on your data.
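Chunking can be as simple as a sliding window over the text. A minimal sketch, assuming whitespace-separated words approximate tokens (a real pipeline would count with the embedding model's own tokenizer); the function name and the default `overlap` of 50 are hypothetical choices, while the 8191 default mirrors the token window of the embedding model.

```python
def chunk_text(text: str, max_tokens: int = 8191, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens whitespace "tokens".

    Whitespace words stand in for model tokens in this sketch.
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    words = text.split()
    chunks = []
    step = max_tokens - overlap  # how far the window advances each time
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):  # last window reached the end
            break
    return chunks
```

With `max_tokens=4, overlap=1`, the ten words `0` to `9` become `["0 1 2 3", "3 4 5 6", "6 7 8 9"]`. The overlap keeps a passage that straddles a boundary visible in both neighboring chunks.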

Fastest LLM-powered search experience

Query optimization

OSlash Copilot uses query caching to avoid repeating computations for similar queries. It also applies search-specific algorithms, using ranking algorithms to sort results by relevance & user feedback to expand queries.
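One way to make caching work for similar queries, not just identical ones, is to normalize each query into a cache key. A minimal sketch, assuming a deliberately simple notion of similarity (case, punctuation, and word order are ignored); the `QueryCache` class and its LRU eviction policy are illustrative choices, not OSlash's implementation.

```python
import re
from collections import OrderedDict
from typing import Callable

class QueryCache:
    """LRU cache keyed on a normalized query, so trivially different phrasings share an entry."""

    def __init__(self, search_fn: Callable[[str], list], capacity: int = 1000):
        self.search_fn = search_fn  # the expensive search being cached
        self.capacity = capacity
        self._entries = OrderedDict()

    @staticmethod
    def normalize(query: str) -> str:
        # Lowercase, drop punctuation, and sort the words.
        words = re.findall(r"\w+", query.lower())
        return " ".join(sorted(words))

    def search(self, query: str) -> list:
        key = self.normalize(query)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        results = self.search_fn(query)
        self._entries[key] = results
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return results
```

Here `"Pricing page?"` and `"page pricing"` normalize to the same key, so the second call is served from the cache without running the search again.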

Preprocessing & filtering

OSlash Copilot eliminates unnecessary information & filters out irrelevant data before the search itself runs. This reduces the search space and speeds up search operations.
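Preprocessing and filtering can be sketched as two cheap steps that run before any scoring: strip low-signal words from the query, then discard documents from sources that cannot be relevant. The stopword list, the `source` metadata field, and the term-count scoring are hypothetical simplifications.

```python
# Hypothetical stopword list; real systems use larger, language-specific lists.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "is", "how", "do", "i"}

def preprocess(query: str) -> list[str]:
    """Lowercase the query and drop stopwords that carry little search signal."""
    return [w for w in query.lower().split() if w not in STOPWORDS]

def filtered_search(query: str, docs: list[dict], allowed_sources: set[str]) -> list[str]:
    terms = preprocess(query)
    # Filter first: documents from irrelevant sources never reach the scoring step.
    candidates = [d for d in docs if d["source"] in allowed_sources]
    scored = [(sum(d["text"].lower().split().count(t) for t in terms), d["id"]) for d in candidates]
    # Highest term count first; documents with no matching terms are dropped.
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]
```

Because the filter runs before scoring, a document from an excluded source is never scored, no matter how many query terms it contains.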

Partial search & incremental updates

OSlash Copilot implements partial search mechanisms to provide intermediate results while the search is ongoing.
For frequently updated data, we use incremental updates to keep search indexes up to date without re-indexing the entire dataset.
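Incremental updates rely on remembering which terms each document contributed, so a changed document can be re-indexed on its own. The sketch also shows partial search as a generator that yields cumulative results after each shard. Both `IncrementalIndex` and `partial_search` are hypothetical illustrations built on a plain inverted index.

```python
import re
from collections import defaultdict

class IncrementalIndex:
    """Inverted index that updates one document at a time instead of rebuilding."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> doc IDs containing it
        self.doc_terms = {}               # doc ID -> terms currently indexed for it

    @staticmethod
    def _terms(text):
        return set(re.findall(r"\w+", text.lower()))

    def upsert(self, doc_id, text):
        self.remove(doc_id)  # drop stale postings for this document only
        terms = self._terms(text)
        for term in terms:
            self.postings[term].add(doc_id)
        self.doc_terms[doc_id] = terms

    def remove(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)

    def search(self, term):
        return set(self.postings.get(term.lower(), set()))

def partial_search(shards, term):
    """Yield cumulative results after each shard so callers can show them early."""
    found = set()
    for shard in shards:
        found |= shard.search(term)
        yield sorted(found)
```

Editing one document touches only that document's postings, and a caller iterating over `partial_search` can render the first shard's hits before the remaining shards finish.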

Parallel processing

We utilize multi-threading or distributed processing to perform search tasks in parallel.
This can significantly boost search speeds, especially for large-scale applications and datasets.
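Parallel search usually means fanning a query out to several shards and merging the per-shard results. A minimal sketch with Python's standard `concurrent.futures.ThreadPoolExecutor`, assuming each shard is a hypothetical in-memory dict of document texts and relevance is a simple term count.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard, term):
    """Score each document in one shard by how often it contains the term."""
    return [(text.lower().split().count(term), doc_id) for doc_id, text in shard.items()]

def parallel_search(shards, term, k=3, workers=4):
    # Each shard is searched in its own worker thread; results are merged afterwards.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_shard = pool.map(lambda shard: search_shard(shard, term.lower()), shards)
        hits = [hit for shard_hits in per_shard for hit in shard_hits if hit[0] > 0]
    # Keep the k highest-scoring documents across all shards.
    return [doc_id for _, doc_id in heapq.nlargest(k, hits)]
```

In CPython, threads mainly pay off when each shard query waits on I/O (for example, a remote index); CPU-bound scoring would typically move to `ProcessPoolExecutor` or to separate machines in a distributed setup.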

Compression & serialization

We compress data and use efficient serialization techniques.
This reduces data transfer times between components, especially in distributed search systems.
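Compact serialization and compression combine naturally: serialize once, compress the bytes, and reverse both steps on the receiving side. A minimal sketch using the standard `json` and `zlib` modules; the `pack`/`unpack` names are hypothetical, and binary formats such as Protocol Buffers are a common alternative to JSON.

```python
import json
import zlib

def pack(results: list[dict]) -> bytes:
    """Serialize results to compact JSON, then compress before sending between services."""
    payload = json.dumps(results, separators=(",", ":")).encode("utf-8")
    return zlib.compress(payload, level=6)

def unpack(blob: bytes) -> list[dict]:
    """Reverse pack(): decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Search results tend to repeat field names and values, so they usually compress well; the trade-off is a little CPU time on each end in exchange for fewer bytes on the wire.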

Benchmarking & profiling

We regularly benchmark and profile our search system to identify performance bottlenecks.
The insights help us fine-tune and optimize the architecture.
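A benchmark worth tracking over time reports latency percentiles rather than a single average. A minimal sketch using `time.perf_counter` and the `statistics` module, where `fn` is whatever search function is under test and `benchmark` is a hypothetical helper name.

```python
import statistics
import time

def benchmark(fn, queries, repeats=5):
    """Time fn over a query set and report median and 95th-percentile latency in ms."""
    latencies = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            fn(q)
            latencies.append((time.perf_counter() - start) * 1000)
    # 20 quantile groups give 19 cut points at 5% steps; the last one is the 95th percentile.
    cuts = statistics.quantiles(latencies, n=20, method="inclusive")
    return {"p50_ms": statistics.median(latencies), "p95_ms": cuts[-1], "runs": len(latencies)}
```

Once a slow path shows up in the p95 figure, Python's built-in `cProfile` module can break the time down by function.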

Natural-language workflows 

Our proprietary TaskSynth classifier model has three components that work in tandem to generate workflows from natural-language commands.

Task Builder

The Task Builder classifies an incoming user query as a task and prepares the list of actions to execute. It then passes this list to the Task Validator.

Task Validator

The Task Validator checks the generated task list for missing details or inconsistencies. If a task has every detail needed for execution (an executable task), the validator passes the instructions on to the Task Executor. Otherwise, it asks the user for more details.

Task Executor

The Task Executor carries out the executable task and either displays the corresponding response or launches the corresponding workflow in the application's frontend.
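The hand-off from builder to validator to executor can be sketched as three small functions. This is a toy illustration, not TaskSynth: keyword matching and `key: value` parsing stand in for the classifier, and the `REQUIRED_FIELDS` schema and action names are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical action schema: each action lists the details it needs before it can run.
REQUIRED_FIELDS = {"send_email": ["to", "subject"], "create_ticket": ["title"]}

@dataclass
class Task:
    action: str
    details: dict = field(default_factory=dict)

def build_tasks(query: str) -> list[Task]:
    """Task Builder (toy): recognize actions by keyword and pull out 'key: value' details."""
    details = dict(re.findall(r"(\w+):\s*([^,]+)", query))
    actions = [a for a in REQUIRED_FIELDS if a.replace("_", " ") in query.lower()]
    return [Task(a, {k: v.strip() for k, v in details.items() if k in REQUIRED_FIELDS[a]}) for a in actions]

def validate(task: Task) -> list[str]:
    """Task Validator: return the required details that are still missing."""
    return [f for f in REQUIRED_FIELDS[task.action] if not task.details.get(f)]

def execute(task: Task) -> str:
    """Task Executor: stand-in for showing a response or launching a workflow."""
    return f"ran {task.action} with {task.details}"

def run(query: str) -> list[str]:
    outputs = []
    for task in build_tasks(query):
        missing = validate(task)
        if missing:
            outputs.append(f"Please provide: {', '.join(missing)}")  # ask the user for details
        else:
            outputs.append(execute(task))
    return outputs
```

`run("Send email to: ana@example.com, subject: Q3 plan")` reaches the executor, while `run("Create ticket")` returns `Please provide: title`, mirroring how the validator asks the user for missing details.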

LLM usage

Caching & compression

We cache input text and prepopulate common queries so that when a user enters a similar sentence, we can return the cached response instead of making a new API call.
Compressing the cached data further improves resource utilization and reduces our costs.
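Caching and compression can live in the same structure: look up a normalized prompt and store the response compressed. A minimal sketch, assuming `call_llm` is a hypothetical function wrapping the paid API call and that collapsing case and whitespace is an acceptable notion of a similar input; `prepopulate` shows how common queries can be loaded before users ask them.

```python
import zlib

class CompressedResponseCache:
    """Cache LLM responses under a normalized prompt key, storing them zlib-compressed."""

    def __init__(self, call_llm):
        self.call_llm = call_llm  # hypothetical function that makes the paid API call
        self._store = {}          # normalized prompt -> compressed response bytes

    @staticmethod
    def _key(prompt):
        return " ".join(prompt.lower().split())  # ignore case and extra whitespace

    def prepopulate(self, prompt, response):
        self._store[self._key(prompt)] = zlib.compress(response.encode("utf-8"))

    def get(self, prompt):
        key = self._key(prompt)
        if key not in self._store:  # cache miss: pay for one API call
            self._store[key] = zlib.compress(self.call_llm(prompt).encode("utf-8"))
        return zlib.decompress(self._store[key]).decode("utf-8")
```

Only a cache miss triggers `call_llm`; every later request with the same normalized prompt is served from the compressed store.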

Time-based expiration

We implement a time-based expiration policy for cached data.
This helps us serve reasonably fresh data and avoid outdated or incorrect information.
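A time-based expiration policy only needs each entry to remember when it goes stale. A minimal sketch with a hypothetical `TTLCache` class; the clock is injectable so expiry can be tested without waiting, and stale entries are removed lazily when they are read.

```python
import time

class TTLCache:
    """Cache whose entries expire ttl_seconds after they were stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so tests can control time
        self._entries = {}    # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl, value)

    def get(self, key, default=None):
        item = self._entries.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._entries[key]  # stale: drop it so the caller refetches
            return default
        return value
```

Lazy removal keeps reads cheap; a production cache would typically also sweep expired entries periodically so keys that nobody reads again do not accumulate.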

Content-based hashing

We employ content-based hashing techniques to store unique LLM responses in the cache.
This way, we avoid redundant caching of responses for similar queries, saving storage space and memory.
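Content-based hashing separates which query was asked from which response was stored, so two queries that produce the same answer share one stored copy. A minimal sketch using SHA-256 from the standard `hashlib` module; the class and attribute names are hypothetical.

```python
import hashlib

class DedupResponseCache:
    """Store each distinct response once, keyed by the SHA-256 digest of its content."""

    def __init__(self):
        self.query_to_hash = {}  # query -> content digest
        self.blobs = {}          # content digest -> response text

    def put(self, query, response):
        digest = hashlib.sha256(response.encode("utf-8")).hexdigest()
        self.blobs.setdefault(digest, response)  # identical content is stored only once
        self.query_to_hash[query] = digest

    def get(self, query):
        digest = self.query_to_hash.get(query)
        return None if digest is None else self.blobs[digest]
```

Each additional query that maps to an existing answer costs only one small digest entry rather than another copy of the response text.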

Warming-up the cache

We proactively load frequently used data into the cache during system startup or when the cache is empty.
This helps minimize cache miss rates and ensures that popular data is readily available.
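Cache warm-up can be driven by past traffic: count queries in a log and precompute responses for the most frequent ones at startup. A minimal sketch, assuming a plain dict as the cache, a hypothetical `compute` function that produces responses, and an illustrative `top_n` default of 100.

```python
from collections import Counter

def warm_cache(cache: dict, query_log: list[str], compute, top_n: int = 100) -> dict:
    """Preload responses for the most frequent past queries into a cold cache."""
    for query, _count in Counter(query_log).most_common(top_n):
        if query not in cache:
            cache[query] = compute(query)  # pay the cost at startup, not on a user's first request
    return cache
```

For a log in which `pricing` appears three times and `docs` twice, `warm_cache({}, log, str.upper, top_n=2)` preloads exactly those two queries.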

approach to AI

Whether you opt for an on-site or a cloud-based installation of OSlash, we follow the same guidelines and best practices to keep your data secure at all times.
We understand that our users entrust us with their information, and we are committed to keeping it safe. We do not store any personally identifiable information (PII) of our end users.

We only require organizations to provide a unique identifier for each user when making requests, so that analytics can be mapped to the correct users. We collect only this organization-provided ID and have no access to any other user data. Read the OSlash trust guidelines in detail here.
