 Last updated: Fri, 09 May 2003

I. ASPseek Functions

Introduction

These functions allow you to access the ASPseek free search engine. ASPseek is developed by SWsoft and licensed as free software under the GNU license.

ASPseek consists of an indexing robot, a search daemon and a CGI search frontend. It can index up to a few million URLs, search for words and phrases, and handle wildcards and boolean queries. Search results can be limited to a given time period, site or Web space (a set of sites) and sorted by relevance (PageRank is used) or by date.

ASPseek is optimized for multiple sites (threaded index, async DNS lookups, grouping results by site, Web spaces), but can be used for searching one site as well. Thanks to its Unicode storage mode, ASPseek can work with multiple languages and encodings at once (including multibyte encodings such as Chinese). Other features include stopwords and ispell support, a charset and language guesser, HTML templates for search results, excerpts and query word highlighting.

More information about ASPseek can be found at http://www.aspseek.org/.

Documentation for ASPseek can be found at http://www.aspseek.org/man/.

Note

This extension is not available on Windows platforms.

Requirements

Download ASPseek from http://www.aspseek.org/ and install it on your system. You need at least version 1.3.0 of ASPseek installed to use these functions. In order to have these functions available, you must also compile PHP with ASPseek support.

Installation

By using the --with-aspseek[=DIR] configuration option you enable PHP to access ASPseek search daemons.

Note

If you are running a version of ASPseek prior to 1.3.0, it is assumed that you have patched your installation of ASPseek according to the instructions at http://aspseek.unixatwork.com/.
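
Because the functions exist only when PHP was built with ASPseek support, a script may want to check for them before use. The following is a minimal sketch, not part of the extension: have_aspseek_support() is a hypothetical helper name, and it tests only for two of the function names listed in this manual.

```php
<?php
/* Hypothetical helper (not provided by the extension): returns true
   when the ASPseek functions named in this manual are callable in
   the running PHP build. */
function have_aspseek_support()
{
    return function_exists('aspseek_connect')
        && function_exists('aspseek_query');
}

if (have_aspseek_support()) {
    print "ASPseek support is available\n";
} else {
    print "ASPseek support is missing; PHP must be built with --with-aspseek\n";
}
```

A check like this lets a page fall back to another search method instead of stopping with an "undefined function" error.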

Runtime Configuration

The behaviour of these functions is affected by settings in php.ini.

Table 1. ASPseek Configuration Options

Name                      Default  Changeable
aspseek.allow_persistent  "Off"    PHP_INI_SYSTEM
aspseek.max_persistent    "-1"     PHP_INI_SYSTEM
aspseek.max_links         "-1"     PHP_INI_SYSTEM
aspseek.default_port      NULL     PHP_INI_ALL
aspseek.default_host      NULL     PHP_INI_ALL

Here is a short explanation of the configuration directives.

aspseek.allow_persistent boolean

Whether to allow persistent connections to searchd.

aspseek.max_persistent integer

The maximum number of persistent searchd connections per process.

aspseek.max_links integer

The maximum number of searchd connections per process, including persistent connections.

aspseek.default_port string

The default TCP port number to use when connecting to the searchd server if no other port is specified. If no default is specified, the port will be obtained from the compile-time SEARCHD_PORT constant.

aspseek.default_host string

The default server host to use when connecting to the searchd server if no other host is specified.
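
The fallback order described for aspseek.default_host and aspseek.default_port can be sketched in plain PHP. This is an illustration only: resolve_searchd_target() is a hypothetical helper, the "host:port" form is assumed from the overview example, and the extension's own parsing may differ (IPv6 addresses, for instance, are not handled here).

```php
<?php
/* Hypothetical helper, not part of the extension: work out which host
   and port a connection attempt would use, following the fallback
   order described for the two default_* directives. */
function resolve_searchd_target($spec = '')
{
    $host = null;
    $port = null;

    /* Assumed "host:port" form; either part may be left out. */
    if ($spec !== '') {
        $parts = explode(':', $spec, 2);
        if ($parts[0] !== '') {
            $host = $parts[0];
        }
        if (isset($parts[1]) && preg_match('/^[0-9]+$/', $parts[1]) === 1) {
            $port = (int) $parts[1];
        }
    }

    /* ini_get() returns false when a directive is not registered,
       for example when the extension is not loaded. */
    if ($host === null) {
        $ini_host = ini_get('aspseek.default_host');
        if ($ini_host !== false && (string) $ini_host !== '') {
            $host = (string) $ini_host;
        }
    }
    if ($port === null) {
        $ini_port = ini_get('aspseek.default_port');
        if ($ini_port !== false && preg_match('/^[0-9]+$/', (string) $ini_port) === 1) {
            $port = (int) $ini_port;
        }
    }

    /* A null port stands for "leave it to the client library", which
       uses its compile-time SEARCHD_PORT constant. */
    return array('host' => $host, 'port' => $port);
}

$target = resolve_searchd_target('search.example.com:12345');
print $target['host'] . ' ' . $target['port'] . "\n";
```

Both directives are PHP_INI_ALL, so a script could also set them with ini_set() before connecting, while the three connection-limit directives can only be changed in php.ini.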

Resource Types

There are five resource types used in the ASPseek module. The first one is the link identifier for a searchd connection, the second a resource which holds the result of a search query, the third a resource which holds the urlset parameters of a result, the fourth a resource which holds the url data of an urlset and the fifth and final resource holds parameters of a cached page.

Predefined Constants

The constants below are defined by this extension, and will only be available when the extension has either been compiled into PHP or dynamically loaded at runtime.

ASPseek client error codes

Table 2. ASPseek client error codes

constant                      description
ASEEK_UNKNOWN_ERROR           Unknown error
ASEEK_UNKNOWN_HOST            Unknown server host
ASEEK_CONNECT_FAILED          Could not connect to search daemon
ASEEK_NO_OPTION               Option not available
ASEEK_OPTION_FAULT            Option argument is invalid
ASEEK_COMMANDS_OUT_OF_SYNC    Commands out of sync; You can't run this command now
ASEEK_OUT_OF_MEMORY           Client ran out of memory
ASEEK_CACHE_URL_NOT_FOUND     Cached URL does not exist
ASEEK_CACHE_BAD_CONTENT_TYPE  Cached URL has an unsupported content type
ASEEK_PROTOCOL_ERROR          Protocol error
ASEEK_STOPWORDS_ERROR         Only stopword(s) are used in query. You must specify at least one non-stop word
ASEEK_EXTRA_SYMBOLS_ERROR     Extra symbols at the end
ASEEK_EMPTY_QUERY             Empty query
ASEEK_PATTERN_TOO_SHORT       Too few letters or digits are used at the beginning of pattern
ASEEK_UNMATCHED_QUOTE         Unmatched string quote
ASEEK_UNMATCHED_PARENTHESIS   Unmatched parenthesis
ASEEK_SERVER_LOST             Lost connection to server during query
ASEEK_INVALID_HOSTSPEC        Invalid server host specification
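
Some of these codes describe problems with the user's query, while others point to the connection or the server. A script may want to treat the two differently, for instance by showing query errors to the visitor and logging the rest. The sketch below groups the constants by name; error_group_for() is a hypothetical helper, and the grouping is this example's own reading of the descriptions above, not something the extension defines.

```php
<?php
/* Hypothetical helper: map a constant name from Table 2 to a broad
   group. Names are used instead of values so that the sketch runs
   even when the extension is not loaded. */
function error_group_for($name)
{
    $groups = array(
        'connection' => array(
            'ASEEK_UNKNOWN_HOST', 'ASEEK_CONNECT_FAILED',
            'ASEEK_SERVER_LOST', 'ASEEK_INVALID_HOSTSPEC',
            'ASEEK_PROTOCOL_ERROR',
        ),
        'query' => array(
            'ASEEK_STOPWORDS_ERROR', 'ASEEK_EXTRA_SYMBOLS_ERROR',
            'ASEEK_EMPTY_QUERY', 'ASEEK_PATTERN_TOO_SHORT',
            'ASEEK_UNMATCHED_QUOTE', 'ASEEK_UNMATCHED_PARENTHESIS',
        ),
        'option' => array('ASEEK_NO_OPTION', 'ASEEK_OPTION_FAULT'),
        'cache' => array(
            'ASEEK_CACHE_URL_NOT_FOUND', 'ASEEK_CACHE_BAD_CONTENT_TYPE',
        ),
    );

    foreach ($groups as $group => $names) {
        if (in_array($name, $names, true)) {
            return $group;
        }
    }

    /* Unknown error, out of memory, out of sync, or an unlisted name. */
    return 'other';
}

print error_group_for('ASEEK_EMPTY_QUERY') . "\n";     /* prints "query" */
print error_group_for('ASEEK_CONNECT_FAILED') . "\n";  /* prints "connection" */
```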

ASPseek options

Table 3. ASPseek options

constant                 description
ASEEKOPT_PAGESPERSCREEN  Defines the maximum number of links to other search result pages to be shown if there are many results found (s.htm PagesPerScreen num)
ASEEKOPT_USECLONES       If this line is present, detecting and showing clones is disabled (s.htm Clones no)
ASEEKOPT_MAXEXCERPTS     Defines the maximum number of excerpts that are shown in results (s.htm MaxExcerpts num)
ASEEKOPT_MAXEXCERPTLEN   Defines the maximum length (in characters) of each excerpt string (s.htm MaxExcerptLen num)
ASEEKOPT_SCRIPTNAME      Value is set by the web server and is used to determine the self name of the script and the name of the template file to load (see description of tmpl parameter) (s.cgi env SCRIPT_NAME; not relevant to PHP module, should probably be removed)
ASEEKOPT_EXOPEN          Contents of this section are displayed just before each excerpt found (s.htm exopen string)
ASEEKOPT_EXCLOSE         Contents of this section are displayed just after each excerpt found (s.htm exclose string)
ASEEKOPT_HIEXOPEN        Used in displaying excerpts, works the same way as hiopen and hiclose (s.htm hiexopen string)
ASEEKOPT_HIEXCLOSE       Used in displaying excerpts, works the same way as hiopen and hiclose (s.htm hiexclose string)
ASEEKOPT_HICOLORS        Each line of this section should contain the color value for one search term. The color is taken from the line whose number equals N mod C, where N is the search term sequential number and C is the total number of lines in this section (s.htm hicolors string)
ASEEKOPT_WANTRES2        Contents of the moreurls template section, used if grouping by sites is enabled and more than n results are found from the site, where n is the number following $M, usually 2. If n is not specified, it is set to 1 (s.htm $Mn)
ASEEKOPT_SEARCHMODE      Word forms. Can be a comma-separated list of languages or just on or off. If it is not set to off, s.cgi will search for all forms of the specified words, and results with exact word forms will be displayed first. Example: if the word 'create' is specified, then documents containing 'create', 'creates' or 'created' will be found (s.cgi m=on | off | lang[,lang,..])
ASEEKOPT_RESULTSPERPAGE  Number of results per page. Overrides the value set by the "preferences" cookie (s.cgi ps=number)
ASEEKOPT_PAGENUMBER      Result page number. Default value is 0 (s.cgi np=number)
ASEEKOPT_OUTPUTFORMAT    You can have several sections with the same name in a template. Normally, the first encountered section is used. This behavior can be overridden by supplying the o=n parameter to s.cgi(1). If the value of n is more than zero, then the "n+1"th sections are used. If the number of occurrences of a particular section is less than "n+1", then the last section with the needed name is used (s.htm $o number)
ASEEKOPT_SITEID          Site ID. The value of this parameter is used to restrict the search to the specified site. Generated in search results as the result of the $SH meta symbol, which is used in the moreurls template (s.cgi st=number)
ASEEKOPT_GROUPBYSITE     Grouping results by site. If the value of this parameter is off, then results are not grouped by site. Any other value is ignored (s.cgi gr=off)
ASEEKOPT_CHARSET         Source charset. Tells s.cgi which charset is used in the input query. This is a required parameter if non-ASCII characters are used in the query. Results of the query will also be presented in that charset (s.cgi cs=charset)
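
Two of the rules in Table 3 are simple arithmetic: the "N mod C" color lookup for ASEEKOPT_HICOLORS and the "n+1"th section selection for ASEEKOPT_OUTPUTFORMAT. The following sketch shows both in plain PHP. The helper names and the sample colors are made up for illustration, and counting terms, lines and sections from zero is an assumption, since the table does not say where numbering starts.

```php
<?php
/* Color for the N-th search term: the line numbered N mod C, where C
   is the number of color lines. Assumes a non-empty list and
   zero-based numbering. */
function highlight_color($colors, $term_number)
{
    return $colors[$term_number % count($colors)];
}

/* Zero-based index of the template section used for output format n:
   the "n+1"th occurrence, or the last occurrence if the template has
   fewer than n+1 sections with that name. */
function section_index($occurrences, $n)
{
    if ($n <= 0) {
        return 0;  /* default: first encountered section */
    }
    return min($n, $occurrences - 1);
}

$colors = array('#ffff66', '#a0ffff', '#99ff99');  /* sample values */

print highlight_color($colors, 4) . "\n";  /* 4 mod 3 = 1, prints "#a0ffff" */
print section_index(3, 1) . "\n";          /* second of three, prints "1" */
print section_index(2, 5) . "\n";          /* only two exist, prints "1" */
```

With three color lines, the fourth and later terms reuse colors from the start of the list, so any number of query words can be highlighted.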

ASPseek search modes

Table 4. ASPseek search modes

constant                description
ASEEK_SEARCHMODE_NONE   Use default mode
ASEEK_SEARCHMODE_EXACT  Search for exact query words
ASEEK_SEARCHMODE_FORMS  Search for all forms of specified query words

Examples

This simple example shows how to connect, execute a search query, print results and disconnect from an ASPseek search daemon.

Example 1. ASPseek extension overview example

<?php
    /* Connecting */
    $link = aspseek_connect("searchd_host:searchd_port")
        or die("Could not connect");
    print "Connected successfully";

    /* Performing search query */
    $query = "ASPseek PHP Home";
    $result = aspseek_query($query) or die("Query failed");

    /* Printing results in HTML */
    print "<table>\n";
    while (($urlset = aspseek_fetch_urlset($result)) &&
            ($url = aspseek_fetch_url($urlset))) {
        $url_data = aspseek_url_data($url);
        print "\t<tr>\n";
        foreach ($url_data as $col_value) {
          print "\t\t<td>$col_value</td>\n";
        }
        print "\t</tr>\n";
    }
    print "</table>\n";

    /* Closing connection */
    aspseek_close($link);
?>

Table of Contents
aspseek_cached_page_data -- Get cached page data as an associative array
aspseek_close -- Close searchd connection
aspseek_connect -- Open a connection to searchd
aspseek_errno -- Returns the numerical value of the error message from previous ASPseek operation
aspseek_error -- Returns the text of the error message from previous ASPseek operation
aspseek_fetch_cached_page -- Fetch cached page data
aspseek_fetch_cloneurls -- Fetch clone urls as an enumerated array
aspseek_fetch_url -- Fetch an url
aspseek_fetch_urlset -- Fetch an urlset
aspseek_getoption -- Get current option value for an ASPseek link
aspseek_passthru_cached_page -- Get cached page content and display raw output
aspseek_pconnect -- Open a persistent connection to searchd
aspseek_query -- Send an ASPseek query
aspseek_result_data -- Get search query result data as an associative array
aspseek_setoption -- Set an option for an ASPseek link
aspseek_unbuffered_query -- Send an ASPseek query without fetching and buffering the result
aspseek_url_data -- Fetch url data as an associative array
aspseek_version -- Gets the current ASPseek client library version
