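Most web hosting companies give customers access to their website statistics, but you will often find that the status information produced by the server is not comprehensive enough. For example, an incorrectly configured web server may fail to recognize certain file types, so files of those types never appear in the statistics. Fortunately, you can use PHP to write a custom statistics collector that captures exactly the information you need.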
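Structure of the Common Logfile Format (CLF)
CLF was originally designed by NCSA for HTTPd, its World Wide Web server software. CERN HTTPd is a public-domain web server maintained by the World Wide Web Consortium (W3C), and the W3C website lists the log file specification. Both Microsoft-based and UNIX-based web servers can generate log files in CLF. The format is as follows: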
Host Ident Authuser Time_Stamp "request" Status_code File_size
For example:
21.53.48.83 - - [22/Apr/2002:22:19:12 -0500] "GET /cnet.gif HTTP/1.0" 200 8237
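Here is a breakdown of the log entry:
Host is the IP address or DNS name of the website visitor; in the example above, it is 21.53.48.83.
Ident is the visitor's remote identity (RFC 931). A dash means "not specified".
Authuser is the user ID, if the web server has authenticated the visitor.
Time_Stamp is the time reported by the server, in square brackets, in the form day/month/year:hour:minute:second followed by the time zone offset.
Request is the visitor's HTTP request, which begins with the request method, for example GET or POST.
Status_Code is the status code returned by the server; for example, 200 means "OK: the browser's request succeeded".
File_Size is the size of the file the user requested. In this example, it is 8237 bytes.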
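To make the field layout concrete, here is a minimal sketch that splits one CLF line into named fields with a regular expression. The function name parseClfLine and the array keys are assumptions for illustration and do not come from the CLF specification; the pattern only covers the simple layout described above.

```php
<?php
// Hypothetical helper (not part of the article's application):
// split one CLF line into named fields, or return null if it does not match.
function parseClfLine(string $line): ?array
{
    $pattern = '/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null; // the line does not follow the CLF layout
    }
    return [
        'host'      => $m[1],
        'ident'     => $m[2],
        'authuser'  => $m[3],
        'timestamp' => $m[4],
        'request'   => $m[5],
        'status'    => (int) $m[6],
        'size'      => $m[7] === '-' ? 0 : (int) $m[7], // a dash means no size was recorded
    ];
}

$entry = parseClfLine('21.53.48.83 - - [22/Apr/2002:22:19:12 -0500] "GET /cnet.gif HTTP/1.0" 200 8237');
```

For the sample entry, the 'request' field holds the whole quoted request, GET /cnet.gif HTTP/1.0, and 'size' holds the integer 8237.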
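Server status codes
You can find the specification for server status codes, developed by the W3C, in the HTTP standard. These codes are produced by the server and indicate whether a data transfer between the browser and the server succeeded. They are usually passed to the browser (for example, the well-known 404 "Page not found" error) or written to the server log.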
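As a small illustration of how status codes are grouped, the sketch below maps a code to its general class using the first digit. The function name statusClass and the returned labels are assumptions chosen for this example.

```php
<?php
// Hypothetical helper: map an HTTP status code to its general class
// based on the hundreds digit (2xx success, 4xx client error, and so on).
function statusClass(int $code): string
{
    switch (intdiv($code, 100)) {
        case 1:  return 'informational';
        case 2:  return 'success';
        case 3:  return 'redirection';
        case 4:  return 'client error';
        case 5:  return 'server error';
        default: return 'unknown';
    }
}
```

With this grouping, 200 falls under "success" and 404 under "client error".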
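Collect data
The first step in building our custom application is to obtain the user data. Whenever a user selects a resource on the website, we want to create a corresponding log entry. Fortunately, server variables let us query the user's browser and obtain that data.
Server variables carry the information that the browser passes to the server in the request headers. REMOTE_ADDR is one example of a server variable. It returns the user's IP address:
Example output: 27.234.125.222
The following PHP code displays the current user's IP address: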
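A minimal sketch of such a snippet, assuming the script runs under a web server that fills in $_SERVER. The helper name visitorIp is hypothetical, and the fallback to a dash is an addition for the case where REMOTE_ADDR is not set:

```php
<?php
// Hypothetical helper: return the visitor's IP address, or "-" when
// REMOTE_ADDR is not set (for example, when run from the command line).
function visitorIp(array $server): string
{
    return $server['REMOTE_ADDR'] ?? '-';
}

echo visitorIp($_SERVER), "\n";
```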
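Let's look at the code for our PHP application. First, we define the website resource we want to track and specify its file size:
// Name of the file we want to log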
$fileName="cnet-banner.gif";
$fileSize="92292";
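You do not have to keep these values in static variables. If you are tracking many items, you can store them in an array or a database. In that case, you will probably want to locate each item through an external link, as shown below: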
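One hypothetical shape for this is a link such as download.php?id=123 combined with an array lookup. The script name, the id query parameter, the $resources array, and the findResource helper are all assumptions made for illustration and are not taken from the article:

```php
<?php
// Hypothetical catalogue of tracked resources, keyed by record ID.
$resources = [
    123 => ['fileName' => 'cnet-banner.gif', 'fileSize' => '92292'],
];

// Hypothetical helper: look up a record by ID; returns null for unknown or invalid IDs.
function findResource(array $resources, $id): ?array
{
    $id = filter_var($id, FILTER_VALIDATE_INT);
    if ($id === false || !isset($resources[$id])) {
        return null;
    }
    return $resources[$id];
}

// In a web request the ID would come from the query string, e.g. ?id=123;
// the literal 123 is only a fallback so the sketch runs on its own.
$record = findResource($resources, $_GET['id'] ?? 123);
```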
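Here "123" identifies the record that corresponds to "cnet-banner.gif". Next, we use server variables to query the user's browser. This gives us the data we need to add a new entry to our log file: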
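// Gather the visitor's CLF information
$host=$_SERVER['REMOTE_ADDR'] ?? "";
// REMOTE_IDENT and REMOTE_USER are often not set; default them to an empty string
$ident=$_SERVER['REMOTE_IDENT'] ?? "";
$auth=$_SERVER['REMOTE_USER'] ?? "";
$timeStamp=date("d/M/Y:H:i:s O");
$reqType=$_SERVER['REQUEST_METHOD'] ?? "";
$servProtocol=$_SERVER['SERVER_PROTOCOL'] ?? "";
$statusCode="200";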
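Next, we check whether the server returned any empty (null) values. According to the CLF specification, an empty value must be replaced with a dash, so the next block of code looks for empty values and substitutes dashes:
// Replace empty values with a dash (per the specification)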
if ($host==""){ $host="-"; }
if ($ident==""){ $ident="-"; }
if ($auth==""){ $auth="-"; }
if ($reqType==""){ $reqType="-"; }
if ($servProtocol==""){ $servProtocol="-"; }
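The read-then-replace steps above can also be folded into one small helper. This is only a sketch; the name clfField is hypothetical, and it assumes that both a missing key and an empty string should become a dash:

```php
<?php
// Hypothetical helper: read one server variable and return "-" when it is
// missing or empty, as the CLF convention requires.
function clfField(array $server, string $key): string
{
    $value = $server[$key] ?? '';
    return $value === '' ? '-' : (string) $value;
}

$host  = clfField($_SERVER, 'REMOTE_ADDR');
$ident = clfField($_SERVER, 'REMOTE_IDENT');
$auth  = clfField($_SERVER, 'REMOTE_USER');
```

Keeping the rule in one place avoids repeating the same check for every field.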
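Once we have the necessary information, we assemble the values into a string that conforms to the CLF specification:
// Build the CLF-formatted string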
$clfString=$host." ".$ident." ".$auth." [".$timeStamp."] \"".$reqType." /".$fileName." ".$servProtocol."\" ".$statusCode." ".$fileSize."\r\n";
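The same string can be assembled with sprintf, which some readers find easier to check against the CLF layout than a long concatenation. The function name buildClfLine is an assumption; the field order and the trailing "\r\n" follow the article's code:

```php
<?php
// Hypothetical helper: assemble one CLF entry from already-normalized fields.
function buildClfLine(
    string $host, string $ident, string $auth, string $timeStamp,
    string $reqType, string $fileName, string $servProtocol,
    string $statusCode, string $fileSize
): string {
    return sprintf(
        "%s %s %s [%s] \"%s /%s %s\" %s %s\r\n",
        $host, $ident, $auth, $timeStamp,
        $reqType, $fileName, $servProtocol, $statusCode, $fileSize
    );
}

$line = buildClfLine(
    '21.53.48.83', '-', '-', '22/Apr/2002:22:19:12 -0500',
    'GET', 'cnet-banner.gif', 'HTTP/1.0', '200', '92292'
);
```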
Create a custom log file
Now the formatted data can be stored in our custom log file. First, we settle on a file naming convention and write code that generates a new log file every day. In this article's example, each file name starts with "weblog-", followed by the date as month, day, and two-digit year (mmddyy), and ends with the .log extension. The .log extension generally indicates a server log file. (In fact, most log analyzers search for .log files.)
// Name the log file with the current date
$logPath="./log/";
$logFile=$logPath ."weblog-".date("mdy").".log";
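To see what the naming convention produces for a specific day, the sketch below passes a fixed timestamp instead of the current time. The helper name logFileFor is hypothetical:

```php
<?php
// Hypothetical helper: build the log file path for a given Unix timestamp.
function logFileFor(string $logPath, int $timestamp): string
{
    return $logPath . 'weblog-' . date('mdy', $timestamp) . '.log';
}

// mktime(hour, minute, second, month, day, year): noon on 22 April 2002
$example = logFileFor('./log/', mktime(12, 0, 0, 4, 22, 2002));
// $example is "./log/weblog-042202.log"
```

One consequence of the mmddyy order is that file names do not sort chronologically across years; a year-first pattern such as date('Ymd') would, if that matters for your setup.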
Now, we need to determine whether the current log file exists. If it exists, we add an entry to it; otherwise, the application creates a new log file. (The creation of new log files generally occurs when the date changes, because the file name changes at this time.)
// Check whether the log file already exists
if (file_exists($logFile)) {
    // If it exists, open the existing log file for appending
    $fileWrite = fopen($logFile, "a");
} else {
    // Otherwise, create a new log file
    $fileWrite = fopen($logFile, "w");
}
If you receive a "Permission denied" error when writing or appending to the file, change the permissions of the target log folder to allow write operations. On most web servers, the default folder permissions are read and execute only. You can change folder permissions with the chmod command or with an FTP client.
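The script can also check the folder before it tries to write. The following is a sketch only: the helper name ensureLogDir and the 0755 mode are assumptions, and whether the web server's user may create or write to the folder still depends on your host's configuration.

```php
<?php
// Hypothetical helper: make sure the log directory exists and is writable
// by the user the script runs as.
function ensureLogDir(string $logPath): bool
{
    // Create the directory (and any missing parents) if it is not there yet.
    if (!is_dir($logPath) && !mkdir($logPath, 0755, true) && !is_dir($logPath)) {
        return false; // could not create the directory
    }
    return is_writable($logPath);
}
```

If ensureLogDir("./log/") returns false, the application can report the problem instead of failing inside fopen.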
Then, we create a file locking mechanism so that when two or more users access the log file at the same time, only one of them can write to the file:
// Acquire an exclusive lock so that only one process writes at a time
flock($fileWrite, LOCK_EX);
Finally, we write the content of the entry:
//Write the CLF entry
fwrite($fileWrite,$clfString);
// Release the file lock
flock($fileWrite, LOCK_UN);
//Close the log file
fclose($fileWrite);
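The open, lock, write, unlock, and close steps can be gathered into one function. This is a sketch under a few assumptions: the name appendLogEntry is hypothetical, it relies on fopen mode "a" creating the file when it does not exist, and it uses an exclusive lock (LOCK_EX), which is the lock type intended for writers.

```php
<?php
// Hypothetical helper: append one entry to a log file under an exclusive lock.
function appendLogEntry(string $logFile, string $entry): bool
{
    // Mode "a" creates the file when it is missing and appends otherwise.
    $handle = fopen($logFile, 'a');
    if ($handle === false) {
        return false;
    }
    $ok = false;
    if (flock($handle, LOCK_EX)) {      // wait until no other process holds the lock
        $ok = fwrite($handle, $entry) !== false;
        fflush($handle);                // push buffered data out before unlocking
        flock($handle, LOCK_UN);
    }
    fclose($handle);
    return $ok;
}
```

PHP's file_put_contents($logFile, $entry, FILE_APPEND | LOCK_EX) covers the same sequence in a single call, if you do not need the file handle for anything else.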
Process log data
Once the system is in production, customers will want a detailed statistical analysis of the collected visitor data. Because all of the custom log files follow a standard format, any log analyzer can process them. A log analyzer is a tool that analyzes large log files and produces pie charts, histograms, and other statistical graphics. Log analyzers are also used to collect data and summarize which users visit your website, how many hits it receives, and so on.
Listed below are several popular log analyzers:
WebTrends is a very good log analyzer, which is suitable for large-scale websites and enterprise-level networks.
Analog is a popular free log analyzer.
Webalizer is a free analysis program. It can generate HTML reports so that its reports can be viewed by most web browsers.
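For a sense of the kind of tally these tools compute, here is a minimal sketch that counts hits per requested path. The function name countHits is hypothetical, and the second and third sample lines are made up for illustration, reusing addresses and file names that appear in this article:

```php
<?php
// Hypothetical helper: count hits per requested path in CLF-formatted lines.
function countHits(array $lines): array
{
    $hits = [];
    foreach ($lines as $line) {
        // Capture the path from the quoted request: "METHOD /path PROTOCOL"
        if (preg_match('/"\S+ (\S+) \S+" \d{3} /', $line, $m)) {
            $hits[$m[1]] = ($hits[$m[1]] ?? 0) + 1;
        }
    }
    arsort($hits); // most requested first
    return $hits;
}

// Made-up sample data for illustration only.
$sample = [
    '21.53.48.83 - - [22/Apr/2002:22:19:12 -0500] "GET /cnet.gif HTTP/1.0" 200 8237',
    '27.234.125.222 - - [22/Apr/2002:22:20:01 -0500] "GET /cnet-banner.gif HTTP/1.0" 200 92292',
    '21.53.48.83 - - [22/Apr/2002:22:21:47 -0500] "GET /cnet-banner.gif HTTP/1.0" 200 92292',
];
$hits = countHits($sample);
```

To run it against a real log file, the lines could be loaded with file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES).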
Standards compliance
We can easily extend the application to support other types of logging. This way you can capture more data, such as browser type and referrer (referrer refers to the previous web page that linked to the current web page). The lesson here is: following standards or conventions when you program will ultimately make your job easier.
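As one possible extension, the sketch below appends the referrer and browser type to an existing CLF entry as two quoted fields, which is the layout used by the NCSA Combined Log Format. The function name toCombinedLine is hypothetical. HTTP_REFERER and HTTP_USER_AGENT are standard $_SERVER keys, but both are supplied by the client, so they may be missing or unreliable.

```php
<?php
// Hypothetical helper: extend a CLF entry with the referrer and browser type.
function toCombinedLine(string $clfLine, array $server): string
{
    $referer   = $server['HTTP_REFERER'] ?? '';
    $userAgent = $server['HTTP_USER_AGENT'] ?? '';
    // Use a dash for missing values, and drop double quotes so that
    // client-supplied text cannot break the quoted fields.
    $referer   = $referer === '' ? '-' : str_replace('"', '', $referer);
    $userAgent = $userAgent === '' ? '-' : str_replace('"', '', $userAgent);
    return rtrim($clfLine, "\r\n") . ' "' . $referer . '" "' . $userAgent . '"' . "\r\n";
}
```

Because the first seven fields are unchanged, the extended entries keep the CLF layout as a prefix.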