
Lucene powered by Akamai EdgeComputing is based on Jakarta Lucene, an open-source search application created by the Apache Software Foundation. It leverages EdgeComputing's on-demand distributed computing model to run the software entirely on Akamai-managed infrastructure, so the enterprise incurs no further hardware expenditures.
The Lucene Search Application
Jakarta Lucene is a high-performance, full-featured text search engine library written entirely in Java (http://jakarta.apache.org/lucene). The engine uses powerful, accurate, and efficient search algorithms. Lucene includes the following functionality:
- Provides the ability to search in many languages
- Indexes any text-based file, such as HTML, or any file that can be converted to text
- Supports ranked searching so the best results are returned first
- Performs boolean and phrase queries
- Enables fielded searching (e.g., searches can be restricted to title, author, contents, etc.)
- Allows for date-range searching so users can access time-sensitive information
Creating an index file is a necessary step in implementing a search application with Lucene. An index is a special database that contains a compiled version of the Web site content. While the Lucene indexing API automates the creation of the index, the content that will be included in the index must be in text format. For every document type to be included in an index, the customer will need to utilize a parser or extractor. Lucene includes a sample HTML parser that receives a URL or the location of a file on a hard drive, parses the file, extracts the text from the HTML tags, and creates a Java string object that is passed to the Lucene indexing API.
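To illustrate the parsing step described above, the following is a minimal sketch in plain Java of extracting text from an HTML page. It is not the sample HTML parser included with Lucene: the class name HtmlTextExtractor and its regular expressions are assumptions made for this example, and a production parser would also need to handle character entities, comments, and malformed markup.

```java
import java.util.regex.Pattern;

/** Naive HTML-to-text extractor; an illustration only, not Lucene's sample parser. */
class HtmlTextExtractor {
    // Whole <script>...</script> and <style>...</style> blocks, including their contents.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Any remaining tag, such as <p> or </title>.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Returns the text found between the tags, with runs of whitespace collapsed. */
    static String extractText(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

The resulting string is the kind of plain text that would then be handed to an indexing API.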
To better understand how indexing works, consider how Akamai uses Lucene for site search on www.akamai.com. The first step in the indexing process is to identify the content that should be indexed. As shown in Figure 1 (below), for each piece of content being indexed, Lucene creates a "Document object", which is a collection of name-value pairs called "fields". For example, one field might be Title, so the name-value pair would be "Title" - "Akamai Home Page". Each field is then assigned a "field object", which determines whether the text associated with the field should be indexed, stored, and/or tokenized.

Figure 1: Creating an Index with Lucene
The "field object" determines how Lucene should use the information in the field:
- Indexed: If a field is indexed, it means it is searchable. For the Lucene implementation on Akamai.com, the "Title" field was indexed so that a query can check for the user's search entry in the "Title" field. Lucene enables you to define which indexed fields are searched by default and which indexed fields are reserved for more constrained searches.
- Stored: If a field is stored in the index, it means the content can be displayed in the search results. The summary associated with a page is often stored in the index, as it is for the Akamai.com implementation. This enables a short description to be included with the title on the search results page.
- Tokenized: If a field is tokenized, it means it is run through an Analyzer that converts the content into a sequence of tokens. A token is the basic unit of indexing and represents a single word to be indexed. During the tokenization process, the Analyzer extracts the text that should be indexed while applying any transformation logic (for example, removing stop words such as "a" or "the", performing stemming, or converting all text to lowercase for case-insensitive searching). This reduces the size of the index, because the text associated with a field is reduced to its core elements. It only makes sense to tokenize a field if it is also going to be indexed. For Akamai.com, the "Title" is tokenized so that Lucene is not searching for words such as "a" or "the".
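The three settings above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Lucene's own Field or Analyzer classes: the class ToyField, its methods, and the stop-word list are hypothetical, and the analysis shown only lowercases text and removes stop words (it performs no stemming).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative stand-in for a field and its three flags; not Lucene's Field class. */
class ToyField {
    // Hypothetical stop-word list, chosen only for this example.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

    final String name;
    final String value;
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;

    ToyField(String name, String value, boolean indexed, boolean stored, boolean tokenized) {
        this.name = name;
        this.value = value;
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }

    /** Lowercases the text, splits it on non-alphanumeric characters, and drops stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    /** Terms that would become searchable: none, the whole value, or the analyzed tokens. */
    List<String> indexTerms() {
        if (!indexed) {
            return Collections.emptyList();
        }
        return tokenized ? analyze(value) : Collections.singletonList(value);
    }

    /** Text available for display on a results page, or null if the field is not stored. */
    String displayValue() {
        return stored ? value : null;
    }
}
```

In this sketch, a "Title" field that is indexed, stored, and tokenized contributes its individual words as search terms, while a field that is stored but not indexed can be displayed in results without being searchable.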
The final step in the indexing process is to use the Lucene IndexWriter object to create the index. The IndexWriter object uses an Analyzer to preprocess the input text. Please note that the Analyzer is used both to create the index and to search it. Because the search text has to be processed the same way that the indexed text was processed, it is critical to use the same Analyzer for both indexing and searching. Akamai.com uses the Lucene Standard Analyzer.
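The reason the same Analyzer must be used for indexing and searching can be shown with a toy inverted index in plain Java. This sketch does not use Lucene's IndexWriter or IndexSearcher: the class ToyIndex and its two analysis functions are hypothetical, and the search simply returns documents that contain every query token, without ranking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

/** Toy inverted index over page titles; an illustration, not Lucene's API. */
class ToyIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> storedTitles = new ArrayList<>();
    private final Function<String, List<String>> indexAnalyzer;

    ToyIndex(Function<String, List<String>> indexAnalyzer) {
        this.indexAnalyzer = indexAnalyzer;
    }

    /** Analyzes the title and records which document each token came from. */
    void add(String title) {
        int docId = storedTitles.size();
        storedTitles.add(title);
        for (String token : indexAnalyzer.apply(title)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the titles of documents containing every token the query analyzer produces. */
    List<String> search(String query, Function<String, List<String>> queryAnalyzer) {
        TreeSet<Integer> matches = null;
        for (String token : queryAnalyzer.apply(query)) {
            TreeSet<Integer> docs = postings.getOrDefault(token, new TreeSet<>());
            if (matches == null) {
                matches = new TreeSet<>(docs);
            } else {
                matches.retainAll(docs);
            }
        }
        List<String> titles = new ArrayList<>();
        if (matches != null) {
            for (int docId : matches) {
                titles.add(storedTitles.get(docId));
            }
        }
        return titles;
    }

    /** Analysis A: splits on whitespace and lowercases each word. */
    static List<String> lowercaseWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Analysis B: splits on whitespace but keeps the original case. */
    static List<String> caseSensitiveWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }
}
```

If the index is built with lowercaseWords, a query for "HOME page" analyzed with lowercaseWords finds "Akamai Home Page", whereas the same query analyzed with caseSensitiveWords finds nothing, because the token "HOME" was never written to the index.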
How the Managed Service Works
Because Lucene is written entirely in Java, it can be deployed and run entirely on Akamai's EdgePlatform through Akamai's EdgeComputing service. In this model, the customer packages the search index and the rest of the search application (typically including a search page and a search results page) in a WAR file, which is deployed to and run from Akamai's distributed set of J2EE application servers around the world.
Handling User Requests
Deployed on Akamai's EdgePlatform, Lucene can be run wholly as an on-demand service without any run-time requests to the origin. Hence, users can continue to perform searches using Lucene even if the origin goes down. Figure 2 (below) shows how end-user requests to the search running on Akamai.com are handled. As with any search application deployed using Lucene powered by Akamai EdgeComputing, the application runs wholly on the EdgePlatform, without forwarding requests to the origin.

Figure 2: Handling User Requests
Note that in the case of Akamai.com, two different search results pages are used, depending on whether a user wishes to search the whole site (Search_Results.jsp) or limit the query to press releases only (PR_search_results.jsp).
Simple Deployment And Management
As with any other EdgeComputing Application, customers can enjoy in-depth insight and control over Lucene powered by Akamai EdgeComputing. Click-to-deploy provisioning ensures that customers can modify and re-launch the application when necessary. See Figure 3 (below) for a diagram of the deployment model through which a customer can launch a search application onto the EdgePlatform.

Figure 3: Deploying the Search Application on Akamai
Modifications to Lucene 1.3
In order for Lucene to run on EdgeComputing, Akamai needed to make the following minor code modifications to Lucene 1.3. These modifications may be unnecessary in future versions of EdgeComputing if certain EdgeComputing sandbox permissions are allowed. In the meantime, Akamai will provide customers with its modified version of Lucene.
- Manually turned on the disable locks boolean flag in the file org.apache.lucene.store.FSDirectory.java. Currently, the EdgeComputing sandbox does not allow setting system properties.
private static final boolean DISABLE_LOCKS = true;
- Commented out lock file creation in the file org.apache.lucene.store.FSDirectory.java in the method public final Lock makeLock(String name). Currently, the EdgeComputing sandbox does not allow writing to the file system.
//final File lockFile = new File(System.getProperty("java.io.tmpdir"),
//                               buf.toString());
final File lockFile = null;
The index lock feature is not required on the Edge because only one process (the edge application server) needs to access the index. This is unlike an origin environment, where one process may be reading from the index while another process is writing to the index to update it.
- When opening the index in the WAR file, the ServletContext object must be used to get the path to the index directory in the WAR file. Here is a code example:
// The searcher used to open and search the index
IndexSearcher searcher = null;
try {
    // Resolve the path of the index directory inside the deployed WAR file
    ServletContext sc = getServletContext();
    searcher = new IndexSearcher(IndexReader.open(sc.getRealPath("index")));
} catch (Exception e) {
    ...
}