ThinkPHP framework is one of the more popular PHP frameworks in China. Although it is not comparable to those foreign frameworks, the advantage is , Well, the Chinese manual is very comprehensive. I have been studying SQL injection recently. When I used the TP framework before, because the underlying layer provided security functions, I didn’t consider security issues much during the development process.
1. What I have to say about the I function
TP system provides I function for filtering input variables. The meaning of the entire function body is to obtain data in various formats, such as I('get.'), I('post.id'), and then use the htmlspecialchars function (by default) to process it.
If you need to use other methods for security filtering, you can set it from /ThinkPHP/Conf/convention.php:
'DEFAULT_FILTER' => 'strip_tags', //也可以设置多种过滤方法 'DEFAULT_FILTER' => 'strip_tags,stripslashes',
You can find the I function from /ThinkPHP/Common/functions.php, the source code is as follows:
/** * 获取输入参数 支持过滤和默认值 * 使用方法: * <code> * I('id',0); 获取id参数 自动判断get或者post * I('post.name','','htmlspecialchars'); 获取$_POST['name'] * I('get.'); 获取$_GET * </code> * @param string $name 变量的名称 支持指定类型 * @param mixed $default 不存在的时候默认值 * @param mixed $filter 参数过滤方法 * @param mixed $datas 要获取的额外数据源 * @return mixed */ function I($name,$default='',$filter=null,$datas=null) { static $_PUT = null; if(strpos($name,'/')){ // 指定修饰符 list($name,$type) = explode('/',$name,2); }elseif(C('VAR_AUTO_STRING')){ // 默认强制转换为字符串 $type = 's'; } /*根据$name的格式获取数据:先判断参数的来源,然后再根据各种格式获取数据*/ if(strpos($name,'.')) {list($method,$name) = explode('.',$name,2);} // 指定参数来源 else{$method = 'param';}//设定为自动获取 switch(strtolower($method)) { case 'get' : $input =& $_GET;break; case 'post' : $input =& $_POST;break; case 'put' : /*此处省略*/ case 'param' : /*此处省略*/ case 'path' : /*此处省略*/ } /*对获取的数据进行过滤*/ if('' // 获取全部变量 $data = $input; $filters = isset($filter)?$filter:C('DEFAULT_FILTER'); if($filters) { if(is_string($filters)){$filters = explode(',',$filters);} //为多种过滤方法提供支持 foreach($filters as $filter){ $data = array_map_recursive($filter,$data); //循环过滤 } } }elseif(isset($input[$name])) { // 取值操作 $data = $input[$name]; $filters = isset($filter)?$filter:C('DEFAULT_FILTER'); if($filters) { /*对参数进行过滤,支持正则表达式验证*/ /*此处省略*/ } if(!empty($type)){ //如果设定了强制转换类型 switch(strtolower($type)){ case 'a': $data = (array)$data;break; // 数组 case 'd': $data = (int)$data;break; // 数字 case 'f': $data = (float)$data;break; // 浮点 case 'b': $data = (boolean)$data;break; // 布尔 case 's': // 字符串 default:$data = (string)$data; } } }else{ // 变量默认值 $data = isset($default)?$default:null; } is_array($data) && array_walk_recursive($data,'think_filter'); //如果$data是数组,那么用think_filter对数组过滤 return $data; }
Well, the function is basically divided into three parts:
The first block is to obtain data in various formats.
The second block performs loop encoding on the acquired data, whether it is a two-dimensional array or a three-dimensional array.
The third block, which is the penultimate line, calls think_filter to perform the last step of mysterious processing on the data.
Let’s trace the think_filter function first:
//1536行 版本3.2.3最新添加 function think_filter(&$value){// 过滤查询特殊字符 if(preg_match('/^(EXP|NEQ|GT|EGT|LT|ELT|OR|XOR|LIKE|NOTLIKE|NOT BETWEEN|NOTBETWEEN|BETWEEN|NOTIN|NOT IN|IN)$/i',$value)){ $value .= ' '; } }
This function is very simple and can be seen at a glance. It adds a space after some specific keywords.
But this function called think_filter only adds a space. What filtering effect does it play?
We all know that important logical verification, such as verifying whether the user is logged in, whether the user can purchase a certain product, etc., must be verified from the server side. If verified from the front end, it can be easily bypassed. For the same reason, in a program, logical structures such as in/exp are best controlled by the server.
When the data passed to the server is like this: id[0]=in&id[1]=1,2,3, if there is no think_filter function, it will be parsed into 1 in the following table, and it will be regarded as Server-side logic parsing. But if it becomes like Table 2 below, because there is an extra space, it cannot be matched and parsed, thus avoiding the loophole.
$data['id']=array('in'=>'1,2,3') //经过think_filter过滤之后,会变成介个样子: $data['id']=array('in '=>'1,2,3')
2. SQL injection
The relevant files are:/ThinkPHP/Library/Think/Db.class.php (changed to /ThinkPHP/Library/Think/Db/Driver.class.php in 3.2.3) and /ThinkPHP/Library/Think /Model.class.php. The Model.class.php file provides functions directly called by curd and directly provides external interfaces. The functions in Driver.class.php are indirectly called by curd operations.
//此次主要分析如下语句: M('user')->where($map)->find(); //在user表根据$map的条件检索出一条数据
Briefly talk about TP’s processing ideas:
First instantiate the Model class into a user object, and then call the where function in the user object to process $map, that is, perform some formatting on $map and assign it to the member variable $options of the user object (if there are other For coherent operations, values are first assigned to the corresponding member variables of the user object instead of directly splicing SQL statements. Therefore, when writing coherent operations, there is no need to consider the order of keywords like splicing SQL statements), and then calling the find function.
The find function will call the bottom layer, that is, the function in the driver class - select to obtain data. When it comes to the select function, it's another story.
In addition to curd operations, select also handles pdo binding. We only care about curd operations here, so we call buildSelectSql in select to process paging information, and call parseSQL to assemble SQL statements in the established order.
Although all parameters required for splicing SQL statements have been placed in member variables, the formats are not uniform. They may be in string format, may be in array format, or may be the special query format provided by TP , for example: $data['id']=array('gt','100');, so before splicing, the respective processing functions must be called for unified formatting. I chose parseWhere, a complex example, for analysis.
Regarding security, if you use the I function to obtain data, htmlspecialchars processing will be performed by default, which can effectively resist XSS attacks, but has little impact on SQL injection.
When filtering symbols related to SQL injection, TP’s approach is very clever: first process the user’s input according to normal logic, and then perform safe processing in functions such as parseWhere and parseHaving that are closest to the final SQL statement. This order avoids injection during processing.
Of course, the most common processing method is addslashes. According to the former waves who died on the beach, it is recommended to use mysql_real_escape_string for filtering, but this function can only be used if the database has been connected.
I feel like TP can do some optimization in this area. After all, everyone who has reached this point is connected to the database.
Well, next, the analysis begins:
Let’s talk about a few member variables in the Model object:
// 主键名称 protected $pk = 'id'; // 字段信息 protected $fields = array(); // 数据信息 protected $data = array(); // 查询表达式参数 protected $options = array(); // 链操作方法列表 protected $methods = array('strict','order','alias','having','group','lock','distinct','auto','filter','validate','result','token','index','force') 接下来分析where函数: public function where($where,$parse=null){ //如果非数组格式,即where('id=%d&name=%s',array($id,$name)),对传递到字符串中的数组调用mysql里的escapeString进行处理 if(!is_null($parse) && is_string($where)) { if(!is_array($parse)){ $parse = func_get_args();array_shift($parse);} $parse = array_map(array($this->db,'escapeString'),$parse); $where = vsprintf($where,$parse); //vsprintf() 函数把格式化字符串写入变量中 }elseif(is_object($where)){ $where = get_object_vars($where); } if(is_string($where) && '' != $where){ $map = array(); $map['_string'] = $where; $where = $map; } //将$where赋值给$this->where if(isset($this->options['where'])){ $this->options['where'] = array_merge($this->options['where'],$where); }else{ $this->options['where'] = $where; } return $this; }
The logic of the where function is very simple. If it is in the format of where('id=%d&name=%s',array($id,$name)), then call mysql on the $id, $name variables. escapeString is processed. The essence of escapeString is to call mysql_real_escape_string, addslashes and other functions for processing.
最后将分析之后的数组赋值到Model对象的成员函数——$where中供下一步处理。
再分析find函数:
//model.class.php 行721 版本3.2.3 public function find($options=array()) { if(is_numeric($options) || is_string($options)){ /*如果传递过来的数据是字符串,不是数组*/ $where[$this->getPk()] = $options; $options = array(); $options['where'] = $where; /*提取出查询条件,并赋值*/ } // 根据主键查找记录 $pk = $this->getPk(); if (is_array($options) && (count($options) > 0) && is_array($pk)) { /*构造复合主键查询条件,此处省略*/ } $options['limit'] = 1; // 总是查找一条记录 $options = $this->_parseOptions($options); // 分析表达式 if(isset($options['cache'])){ /*缓存查询,此处省略*/ } $resultSet = $this->db->select($options); if(false === $resultSet){ return false;} if(empty($resultSet)) { return null; } // 查询结果为空 if(is_string($resultSet)){ return $resultSet;} //查询结果为字符串 // 读取数据后的处理,此处省略简写 $this->data = $this->_read_data($resultSet[0]); return $this->data; }
$Pk为主键,$options为表达式参数,本函数的作用就是完善成员变量——options数组,然后调用db层的select函数查询数据,处理后返回数据。
跟进_parseOptions函数:
protected function _parseOptions($options=array()) { //分析表达式 if(is_array($options)){ $options = array_merge($this->options,$options); } /*获取表名,此处省略*/ /*添加数据表别名,此处省略*/ $options['model'] = $this->name;// 记录操作的模型名称 /*对数组查询条件进行字段类型检查,如果在合理范围内,就进行过滤处理;否则抛出异常或者删除掉对应字段*/ if(isset($options['where']) && is_array($options['where']) && !empty($fields) && !isset($options['join'])){ foreach ($options['where'] as $key=>$val){ $key = trim($key); if(in_array($key,$fields,true)){ //如果$key在数据库字段内,过滤以及强制类型转换之 if(is_scalar($val)) { /*is_scalar 检测是否为标量。标量是指integer、float、string、boolean的变量,array则不是标量。*/ $this->_parseType($options['where'],$key); } }elseif(!is_numeric($key) && '_' != substr($key,0,1) && false === strpos($key,'.') && false === strpos($key,'(') && false === strpos($key,'|') && false === strpos($key,'&')){ // 如果$key不是数字且第一个字符不是_,不存在.(|&等特殊字符 if(!empty($this->options['strict'])){ //如果是strict模式,抛出异常 E(L('_ERROR_QUERY_EXPRESS_').':['.$key.'=>'.$val.']'); } unset($options['where'][$key]); //unset掉对应的值 } } } $this->options = array(); // 查询过后清空sql表达式组装 避免影响下次查询 $this->_options_filter($options); // 表达式过滤 return $options; }
本函数的结构大概是,先获取了表名,模型名,再对数据进行处理:如果该条数据不在数据库字段内,则做出异常处理或者删除掉该条数据。否则,进行_parseType处理。parseType此处不再跟进,功能为:数据类型检测,强制类型转换包括int,float,bool型的三种数据。
函数运行到此处,就该把处理好的数据传到db层的select函数里了。此时的查询条件$options中的int,float,bool类型的数据都已经进行了强制类型转换,where()函数中的字符串(非数组格式的查询)也进行了addslashes等处理。
继续追踪到select函数,就到了driver对象中了,还是先列举几个有用的成员变量:
// 数据库表达式 protected $exp = array('eq'=>'=','neq'=>'<>','gt'=>'>','egt'=>'>=','lt'=>'<','elt'=>'<=','notlike'=>'NOT LIKE','like'=>'LIKE','in'=>'IN','notin'=>'NOT IN','not in'=>'NOT IN','between'=>'BETWEEN','not between'=>'NOT BETWEEN','notbetween'=>'NOT BETWEEN'); // 查询表达式 protected $selectSql = 'SELECT%DISTINCT% %FIELD% FROM %TABLE%%FORCE%%JOIN%%WHERE%%GROUP%%HAVING%%ORDER%%LIMIT% %UNION%%LOCK%%COMMENT%'; // 当前SQL指令 protected $queryStr = ''; // 参数绑定 protected $bind = array(); select函数: public function select($options=array()) { $this->model = $options['model']; $this->parseBind(!empty($options['bind'])?$options['bind']:array()); $sql = $this->buildSelectSql($options); $result = $this->query($sql,!empty($options['fetch_sql']) ? true : false); return $result; }
版本3.2.3经过改进之后,select精简了不少。parseBind函数是绑定参数,用于pdo查询,此处不表。
buildSelectSql()函数及其后续调用如下:
public function buildSelectSql($options=array()) { if(isset($options['page'])) { /*页码计算及处理,此处省略*/ } $sql = $this->parseSql($this->selectSql,$options); return $sql; } /* 替换SQL语句中表达式*/ public function parseSql($sql,$options=array()){ $sql = str_replace( array('%TABLE%','%DISTINCT%','%FIELD%','%JOIN%','%WHERE%','%GROUP%','%HAVING%','%ORDER%','%LIMIT%','%UNION%','%LOCK%','%COMMENT%','%FORCE%'), array( $this->parseTable($options['table']), $this->parseDistinct(isset($options['distinct'])?$options['distinct']:false), $this->parseField(!empty($options['field'])?$options['field']:'*'), $this->parseJoin(!empty($options['join'])?$options['join']:''), $this->parseWhere(!empty($options['where'])?$options['where']:''), $this->parseGroup(!empty($options['group'])?$options['group']:''), $this->parseHaving(!empty($options['having'])?$options['having']:''), $this->parseOrder(!empty($options['order'])?$options['order']:''), $this->parseLimit(!empty($options['limit'])?$options['limit']:''), $this->parseUnion(!empty($options['union'])?$options['union']:''), $this->parseLock(isset($options['lock'])?$options['lock']:false), $this->parseComment(!empty($options['comment'])?$options['comment']:''), $this->parseForce(!empty($options['force'])?$options['force']:'') ),$sql); return $sql; }
可以看到,在parseSql中用正则表达式拼接了sql语句,但并没有直接的去处理各种插叙你的数据格式,而是在解析变量的过程中调用了多个函数,此处拿parseWhere举例子。
protected function parseWhere($where) { $whereStr = ''; if(is_string($where)) { // 直接使用字符串条件 $whereStr = $where; } else{ // 使用数组表达式 /*设定逻辑规则,如or and xor等,默认为and,此处省略*/ $operate=' AND '; /*解析特殊格式的表达式并且格式化输出*/ foreach ($where as $key=>$val){ if(0===strpos($key,'_')) { // 解析特殊条件表达式 $whereStr .= $this->parseThinkWhere($key,$val); } else{ // 查询字段的安全过滤 $multi = is_array($val) && isset($val['_multi']); //判断是否有复合查询 $key = trim($key); /*处理字段中包含的| &逻辑*/ if(strpos($key,'|')) { // 支持 name|title|nickname 方式定义查询字段 /*将|换成or,并格式化输出,此处省略*/ } elseif(strpos($key,'&')){ /*将&换成and,并格式化输出,此处省略*/ } else{ $whereStr .= $this->parseWhereItem($this->parseKey($key),$val); } } $whereStr .= $operate; } $whereStr = substr($whereStr,0,-strlen($operate)); } return empty($whereStr)?'':' WHERE '.$whereStr; } // where子单元分析 protected function parseWhereItem($key,$val) { $whereStr = ''; if(is_array($val)){ if(is_string($val[0])){ $exp = strtolower($val[0]); //如果是$map['id']=array('eq',100)一类的结构,那么解析成数据库可执行格式 if(preg_match('/^(eq|neq|gt|egt|lt|elt)$/',$exp)){ $whereStr .= $key.' '.$this->exp[$exp].' '.$this->parseValue($val[1]); } //如果是模糊查找格式 elseif(preg_match('/^(notlike|like)$/',$exp)){// 模糊查找,$map['name']=array('like','thinkphp%'); if(is_array($val[1])) { //解析格式如下:$map['b'] =array('notlike',array('%thinkphp%','%tp'),'AND'); $likeLogic = isset($val[2])?strtoupper($val[2]):'OR'; //如果没有设定逻辑结构,则默认为OR if(in_array($likeLogic,array('AND','OR','XOR'))){ /* 根据逻辑结构,组合语句,此处省略*/ $whereStr .= '('.implode(' '.$likeLogic.' ',$like).')'; } } else{ $whereStr .= $key.' '.$this->exp[$exp].' '.$this->parseValue($val[1]); } }elseif('bind' == $exp ){ // 使用表达式,pdo数据绑定 $whereStr .= $key.' = :'.$val[1]; }elseif('exp' == $exp ){ // 使用表达式 $map['id'] = array('exp',' IN (1,3,8) '); $whereStr .= $key.' '.$val[1]; }elseif(preg_match('/^(notin|not in|in)$/',$exp)){ //IN运算 $map['id'] = array('not in','1,5,8'); if(isset($val[2]) && 'exp'==$val[2]){ $whereStr .= $key.' '.$this->exp[$exp].' '.$val[1]; }else{ if(is_string($val[1])) { $val[1] = explode(',',$val[1]); } $zone = implode(',',$this->parseValue($val[1])); $whereStr .= $key.' '.$this->exp[$exp].' ('.$zone.')'; } }elseif(preg_match('/^(notbetween|not between|between)$/',$exp)){ //BETWEEN运算 $data = is_string($val[1])? explode(',',$val[1]):$val[1]; $whereStr .= $key.' '.$this->exp[$exp].' '.$this->parseValue($data[0]).' AND '.$this->parseValue($data[1]); }else{ //否则抛出异常 E(L('_EXPRESS_ERROR_').':'.$val[0]); } } else{ //解析如:$map['status&score&title'] =array('1',array('gt','0'),'thinkphp','_multi'=>true); $count = count($val); $rule = isset($val[$count-1]) ? (is_array($val[$count-1]) ? strtoupper($val[$count-1][0]) : strtoupper($val[$count-1]) ) : '' ; if(in_array($rule,array('AND','OR','XOR'))){ $count = $count -1; }else{ $rule = 'AND'; } for($i=0;$i<$count;$i++){ $data = is_array($val[$i])?$val[$i][1]:$val[$i]; if('exp'==strtolower($val[$i][0])) { $whereStr .= $key.' '.$data.' '.$rule.' '; }else{ $whereStr .= $this->parseWhereItem($key,$val[$i]).' '.$rule.' '; } } $whereStr = '( '.substr($whereStr,0,-4).' )'; } } else { //对字符串类型字段采用模糊匹配 $likeFields = $this->config['db_like_fields']; if($likeFields && preg_match('/^('.$likeFields.')$/i',$key)) { $whereStr .= $key.' LIKE '.$this->parseValue('%'.$val.'%'); }else { $whereStr .= $key.' = '.$this->parseValue($val); } } return $whereStr; } protected function parseThinkWhere($key,$val) { //解析特殊格式的条件 $whereStr = ''; switch($key) { case '_string':$whereStr = $val;break; // 字符串模式查询条件 case '_complex':$whereStr = substr($this->parseWhere($val),6);break; // 复合查询条件 case '_query':// 字符串模式查询条件 /*处理逻辑结构,并且格式化输出字符串,此处省略*/ } return '( '.$whereStr.' )'; }
上面的两个函数很长,我们再精简一些来看:parseWhere首先判断查询数据是不是字符串,如果是字符串,直接返回字符串,否则,遍历查询条件的数组,挨个解析。
由于TP支持_string,_complex之类的特殊查询,调用了parseThinkWhere来处理,对于普通查询,就调用了parseWhereItem。
在各自的处理过程中,都调用了parseValue,追踪一下,其实是用了addslashes来过滤,虽然addslashes在非utf-8编码的页面中会造成宽字节注入,但是如果页面和数据库均正确编码的话,还是没什么问题的。