I. ASPseek Functions
Introduction
These functions allow you to access the ASPseek free search engine.
ASPseek is developed by SWsoft and
licensed as free software under the GNU license.
ASPseek consists of an indexing robot, a search daemon and a CGI
search frontend. It can index as many as a few million URLs and
search for words and phrases, use wildcards and perform boolean
searches. Search results can be limited to a given time period, site or
Web space (set of sites) and sorted by relevance (PageRank is used)
or date.
ASPseek is optimized for multiple sites (threaded index, async DNS
lookups, grouping results by site, Web spaces), but can be used for
searching one site as well. ASPseek can work with multiple
languages/encodings at once (including multibyte encodings such as
Chinese) thanks to its Unicode storage mode. Other features include stopwords
and ispell support, a charset and language guesser, HTML templates for
search results, excerpts and query word highlighting.
More information about ASPseek can be found at http://www.aspseek.org/.
Documentation for ASPseek can be found at http://www.aspseek.org/man/.
Note: This extension is not available on Windows platforms.
Requirements
Download ASPseek from http://www.aspseek.org/
and install it on your system. You need at least version 1.3.0 of ASPseek
installed to use these functions. In order to have these functions
available, you must also compile PHP with ASPseek support.
Installation
By using the --with-aspseek[=DIR]
configuration option you enable PHP to access
ASPseek search daemons.
Note: If you are running a version of ASPseek prior to version 1.3.0, it is
assumed that you have patched your installation of ASPseek according
to the instructions at http://aspseek.unixatwork.com/.
Runtime Configuration
The behaviour of these functions is affected by settings in php.ini.
Table 1. ASPseek Configuration Options
Name | Default | Changeable
---|---|---
aspseek.allow_persistent | "Off" | PHP_INI_SYSTEM
aspseek.max_persistent | "-1" | PHP_INI_SYSTEM
aspseek.max_links | "-1" | PHP_INI_SYSTEM
aspseek.default_port | NULL | PHP_INI_ALL
aspseek.default_host | NULL | PHP_INI_ALL
Here is a short explanation of the configuration directives.
- aspseek.allow_persistent
boolean
Whether to allow persistent connections to searchd.
- aspseek.max_persistent
integer
The maximum number of persistent searchd connections per
process.
- aspseek.max_links
integer
The maximum number of searchd connections per process, including
persistent connections.
- aspseek.default_port
string
The default TCP port number to use when connecting to
the searchd server if no other port is specified. If
no default is specified, the port will be obtained
from the compile-time SEARCHD_PORT
constant.
- aspseek.default_host
string
The default server host to use when connecting to the searchd
server if no other host is specified.
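As an illustration of the directives above, a php.ini fragment might look like the following. This is only a sketch: the host name, port number and connection limits are hypothetical placeholders, not values taken from the ASPseek documentation.

```ini
; Hypothetical php.ini settings for the ASPseek extension.
; All values below are illustrative placeholders.

; PHP_INI_SYSTEM directives: can only be set in php.ini or the server config.
aspseek.allow_persistent = On
aspseek.max_persistent = 5
aspseek.max_links = 10

; PHP_INI_ALL directives: can also be changed at runtime.
aspseek.default_host = "search.example.com"
aspseek.default_port = "12345"
```

Because aspseek.default_host and aspseek.default_port are PHP_INI_ALL, a script can override them at runtime, whereas the three connection-limit directives take effect only from the system-level configuration.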
Resource Types
There are five resource types used in the ASPseek module: the link
identifier for a searchd connection; a resource that holds the result of
a search query; a resource that holds the urlset parameters of a result;
a resource that holds the URL data of a urlset; and a resource that holds
the parameters of a cached page.
Predefined Constants
The constants below are defined by this extension, and
will only be available when the extension has either
been compiled into PHP or dynamically loaded at runtime.
ASPseek client error codes
Table 2. ASPseek client error codes
Constant | Description
---|---
ASEEK_UNKNOWN_ERROR | Unknown error
ASEEK_UNKNOWN_HOST | Unknown server host
ASEEK_CONNECT_FAILED | Could not connect to search daemon
ASEEK_NO_OPTION | Option not available
ASEEK_OPTION_FAULT | Option argument is invalid
ASEEK_COMMANDS_OUT_OF_SYNC | Commands out of sync; You can't run this command now
ASEEK_OUT_OF_MEMORY | Client ran out of memory
ASEEK_CACHE_URL_NOT_FOUND | Cached URL does not exist
ASEEK_CACHE_BAD_CONTENT_TYPE | Cached URL has an unsupported content type
ASEEK_PROTOCOL_ERROR | Protocol error
ASEEK_STOPWORDS_ERROR | Only stopword(s) are used in query. You must specify at least one non-stop word
ASEEK_EXTRA_SYMBOLS_ERROR | Extra symbols at the end
ASEEK_EMPTY_QUERY | Empty query
ASEEK_PATTERN_TOO_SHORT | Too few letters or digits are used at the beginning of pattern
ASEEK_UNMATCHED_QUOTE | Unmatched string quote
ASEEK_UNMATCHED_PARENTHESIS | Unmatched parenthesis
ASEEK_SERVER_LOST | Lost connection to server during query
ASEEK_INVALID_HOSTSPEC | Invalid server host specification
ASPseek options
Table 3. ASPseek options
Constant | Description
---|---
ASEEKOPT_PAGESPERSCREEN | Defines the maximum number of links to other search result pages to be shown if there are many results found (s.htm PagesPerScreen num)
ASEEKOPT_USECLONES | If this line is present, clone detection and display are disabled (s.htm Clones no)
ASEEKOPT_MAXEXCERPTS | Defines the maximum number of excerpts that are shown in results (s.htm MaxExcerpts num)
ASEEKOPT_MAXEXCERPTLEN | Defines the maximum length (in characters) of each excerpt string (s.htm MaxExcerptLen num)
ASEEKOPT_SCRIPTNAME | Value is set by the web server and is used to determine the name of the script itself and the name of the template file to load (see description of tmpl parameter) (s.cgi env SCRIPT_NAME; not relevant to PHP module, should probably be removed)
ASEEKOPT_EXOPEN | Contents of this section are displayed just before each excerpt found (s.htm exopen string)
ASEEKOPT_EXCLOSE | Contents of this section are displayed just after each excerpt found (s.htm exclose string)
ASEEKOPT_HIEXOPEN | Used in displaying excerpts, works the same way as hiopen and hiclose (s.htm hiexopen string)
ASEEKOPT_HIEXCLOSE | Used in displaying excerpts, works the same way as hiopen and hiclose (s.htm hiexclose string)
ASEEKOPT_HICOLORS | Each line of this section should contain the color value for one search term. The color is taken from the line whose number equals N mod C, where N is the sequential number of the search term and C is the total number of lines in this section (s.htm hicolors string)
ASEEKOPT_WANTRES2 | Contents of the moreurls template section, used if grouping by sites is enabled and more than n results are found from the site, where n is the number following $M, usually 2. If n is not specified, it is set to 1 (s.htm $Mn)
ASEEKOPT_SEARCHMODE | Word forms. Can be a comma-separated list of languages or just on or off. If it is not set to off, s.cgi will search for all forms of the specified words, and results with exact word forms will be displayed first. Example: if the word 'create' is specified, then documents containing 'create', 'creates' or 'created' will be found (s.cgi m=on, m=off or m=lang[,lang,..])
ASEEKOPT_RESULTSPERPAGE | Number of results per page. Overrides the value set by the "preferences" cookie (s.cgi ps=number)
ASEEKOPT_PAGENUMBER | Result page number. Default value is 0 (s.cgi np=number)
ASEEKOPT_OUTPUTFORMAT | You can have several sections with the same name in a template. Normally, the first encountered section is used. This behavior can be overridden by supplying the o=n parameter to s.cgi(1). If the value of n is more than zero, then the "n+1"th sections are used. If the number of occurrences of a particular section is less than "n+1", then the last section with the needed name is used (s.htm $o number)
ASEEKOPT_SITEID | Site ID. The value of this parameter is used to restrict the search to the specified site. It is generated in search results as the result of the $SH meta symbol, which is used in the moreurls template (s.cgi st=number)
ASEEKOPT_GROUPBYSITE | Grouping results by site. If the value of this parameter is off, then results are not grouped by site. Any other value is ignored (s.cgi gr=off)
ASEEKOPT_CHARSET | Source charset. Tells s.cgi which charset is used in the input query. This parameter is required if non-ASCII characters are used in the query. Results of the query will also be presented in that charset (s.cgi cs=charset)
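The index rules quoted for ASEEKOPT_HICOLORS (line N mod C) and ASEEKOPT_OUTPUTFORMAT (the "n+1"th section, falling back to the last one) can be sketched in plain PHP. The two helper functions below are hypothetical illustrations and are not part of the ASPseek extension; the sketch assumes that search terms and color lines are both counted from 0, which the description above does not state, and it uses PHP 7 type declarations.

```php
<?php
// Hypothetical helper, not part of the ASPseek extension.
// Picks the color line for a search term using the N mod C rule.
// Assumption: terms and color lines are both numbered from 0.
function pick_highlight_color(array $colors, int $termNumber): string
{
    $count = count($colors); // C: total number of color lines
    if ($count === 0 || $termNumber < 0) {
        throw new InvalidArgumentException('Need a color list and a term number >= 0');
    }
    return $colors[$termNumber % $count];
}

// Hypothetical helper, not part of the ASPseek extension.
// Returns the 0-based index of the same-named template section to use
// for a given o=n value.
function pick_section_index(int $occurrences, int $n): int
{
    if ($occurrences < 1) {
        throw new InvalidArgumentException('The section must occur at least once');
    }
    if ($n <= 0) {
        return 0; // normally the first encountered section is used
    }
    // The (n+1)-th section has 0-based index n; if there are fewer
    // occurrences than n+1, fall back to the last one.
    return min($n, $occurrences - 1);
}

// Usage: three color lines shared among four search terms.
$colors = ['#ffff66', '#a0ffff', '#99ff99'];
foreach (['aspseek', 'php', 'search', 'engine'] as $n => $term) {
    echo $term, ' => ', pick_highlight_color($colors, $n), "\n";
}
```

With three color lines, the fourth term (N = 3) wraps around to the first color, since 3 mod 3 is 0.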
ASPseek search modes
Table 4. ASPseek search modes
Constant | Description
---|---
ASEEK_SEARCHMODE_NONE | Use default mode
ASEEK_SEARCHMODE_EXACT | Search for exact query words
ASEEK_SEARCHMODE_FORMS | Search for all forms of specified query words
Examples
This simple example shows how to connect, execute a search query, print
results and disconnect from an ASPseek search daemon.
Example 1. ASPseek extension overview example

<?php
/* Connecting */
$link = aspseek_connect("searchd_host:searchd_port")
    or die("Could not connect");
print "Connected successfully";

/* Performing search query */
$query = "ASPseek PHP Home";
$result = aspseek_query($query) or die("Query failed");

/* Printing results in HTML */
print "<table>\n";
while (($urlset = aspseek_fetch_urlset($result)) &&
       ($url = aspseek_fetch_url($urlset))) {
    $url_data = aspseek_url_data($url);
    print "\t<tr>\n";
    foreach ($url_data as $col_value) {
        print "\t\t<td>$col_value</td>\n";
    }
    print "\t</tr>\n";
}
print "</table>\n";

/* Closing connection */
aspseek_close($link);
?>