Twitter has rolled out a major upgrade to its search engine that lets users search any tweet ever posted, from the first tweet by Twitter co-founder Jack Dorsey on March 21, 2006, to one sent just moments ago.
Twitter has always let users search tweets, but its search index covered only tweets posted within the previous seven days. Once a tweet was more than seven days old, Twitter moved it out of its real-time index, which is held in RAM and contains around 2 billion tweets, and onto its network of machines, because storing every tweet in history in RAM was too expensive.
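To make the age-based handoff concrete, here is a minimal Python sketch. It assumes a simple seven-day cutoff taken from the description above; the `TieredStore` class and its two dictionaries are hypothetical stand-ins for the RAM index and the separate machines, not Twitter's actual code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumption: a fixed seven-day real-time window, per the article's description.
REALTIME_WINDOW = timedelta(days=7)


@dataclass
class TieredStore:
    """Hypothetical two-tier store: a small in-memory tier for recent tweets
    and a larger archive tier standing in for the separate machines."""
    realtime: dict = field(default_factory=dict)  # tweet_id -> (created_at, text)
    archive: dict = field(default_factory=dict)   # tweet_id -> (created_at, text)

    def add(self, tweet_id, created_at, text):
        # New tweets always land in the real-time tier first.
        self.realtime[tweet_id] = (created_at, text)

    def evict_old(self, now):
        """Move every tweet older than the window into the archive tier.
        Returns how many tweets were moved."""
        expired = [tid for tid, (ts, _) in self.realtime.items()
                   if now - ts > REALTIME_WINDOW]
        for tid in expired:
            self.archive[tid] = self.realtime.pop(tid)
        return len(expired)


store = TieredStore()
now = datetime(2014, 11, 18)
store.add(1, now - timedelta(days=3), "three days old")
store.add(2, now - timedelta(days=8), "eight days old")
print(store.evict_old(now))                           # 1
print(sorted(store.realtime), sorted(store.archive))  # [1] [2]
```

Collecting the expired IDs into a list before popping them avoids modifying the dictionary while iterating over it.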
In a blog post, Twitter search infrastructure engineer Yi Zhuang describes in detail how he and a small team of engineers got around the problem. They built an index of older tweets stored on machines with solid-state drives, which are significantly less expensive than RAM and, after extensive tuning, deliver acceptable latency. They also partitioned the data into smaller chunks.
"This setup had two main benefits," explains Zhuang. "It allowed us to incrementally update the index with new data without having to fully rebuild too frequently. And because processing for each day is set up to be fully independent, the pipeline could be massively parallelizable on [the data-crunching tool] Hadoop. This allowed us to efficiently rebuild the full index periodically."
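The property Zhuang describes, where each day's data is processed independently, can be sketched as a set of per-day inverted indexes. In this minimal Python example, `build_day_index`, `build_full_index`, and `add_day` are hypothetical names, whitespace tokenization is a simplification, and a thread pool stands in for Hadoop; it illustrates the partitioning idea, not Twitter's pipeline.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date


def build_day_index(day_tweets):
    """Build an inverted index (term -> sorted tweet IDs) for one day.
    It reads only that day's tweets, so days can be built in any order."""
    index = defaultdict(set)
    for tweet_id, text in day_tweets:
        for term in text.lower().split():  # simplistic whitespace tokenizer
            index[term].add(tweet_id)
    return {term: sorted(ids) for term, ids in index.items()}


def build_full_index(tweets_by_day, workers=4):
    """Periodic full rebuild: index every day in parallel.
    The thread pool is a stand-in for Hadoop workers."""
    days = sorted(tweets_by_day)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(build_day_index, (tweets_by_day[d] for d in days))
        return dict(zip(days, parts))


def add_day(partitions, day, day_tweets):
    """Incremental update: index one new day without touching older partitions."""
    partitions[day] = build_day_index(day_tweets)


# Sample data for illustration.
index = build_full_index({
    date(2006, 3, 21): [(20, "just setting up my twttr")],
    date(2006, 3, 22): [(21, "still setting up")],
})
add_day(index, date(2006, 3, 23), [(22, "hello twttr")])
print(len(index))                  # 3
print(index[date(2006, 3, 23)])    # {'hello': [22], 'twttr': [22]}
```

Because no partition reads another day's data, adding a day costs one small build, and a full rebuild is just the same function mapped over every day.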
In other words, Twitter's engineers run custom-built software on one group of machines to index older tweets and the same software on another group of machines to handle real-time tweets. This lets Twitter combine results from older tweets with the thousands of new tweets that arrive every second. Twitter says the system now indexes roughly half a trillion tweets and returns search results in less than 100 milliseconds on average.
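One way to picture the combining step is a scatter-gather merge: query both groups of machines, then interleave their results. The minimal Python sketch below assumes each tier returns hits as `(timestamp, tweet_id)` pairs already sorted newest first, and ranks purely by recency; `merge_results` is a hypothetical helper, and the article does not describe how Twitter actually ranks combined results.

```python
import heapq


def merge_results(realtime_hits, archive_hits, limit=10):
    """Merge two hit lists, each sorted newest-first as (timestamp, tweet_id)
    pairs, into a single newest-first list of at most `limit` hits.
    Duplicate tweet IDs (assumed possible if a tweet appears in both tiers) are dropped."""
    seen = set()
    merged = []
    # heapq.merge lazily interleaves already-sorted inputs; reverse=True
    # because both inputs are in descending order.
    for ts, tweet_id in heapq.merge(realtime_hits, archive_hits, reverse=True):
        if len(merged) >= limit:
            break
        if tweet_id in seen:
            continue
        seen.add(tweet_id)
        merged.append((ts, tweet_id))
    return merged


realtime = [(1416300000, 905), (1416200000, 901)]
archive = [(1300000000, 44), (1143000000, 20)]
print(merge_results(realtime, archive, limit=3))
# [(1416300000, 905), (1416200000, 901), (1300000000, 44)]
```

Because `heapq.merge` is lazy, the loop stops as soon as it has `limit` hits instead of sorting every result from both tiers.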
For users, finding that old link a friend sent two years ago works the same way as before: search for a keyword and click the All tab to bring up every tweet that has used that keyword since 2006. For now, the new search engine handles only basic searches, but Twitter plans to expand it to return accurate results for long-tail keywords. Gilad Mishne, a member of the engineering team that works on Twitter search, also says the new search infrastructure lays the foundation for future Twitter tools.
"It lets us power a lot more things down the road - not just search," Mishne says.
Twitter says its roughly 284 million users now tweet more than its previous in-memory systems could handle, yet the social network still faces slow user growth and declining engagement while rival Facebook continues to build out its advertising network. The new search could help the microblogging service improve user engagement and attract more advertisers to its platform.