Home>Article>Backend Development> PHP implements functions similar to the Construct library in Python (1) Basic design ideas

PHP implements functions similar to the Construct library in Python (1) Basic design ideas

王林
王林 forward
2019-08-19 17:12:42 2022browse

Introduction

There is a library Construct in python, which can be used to parse binary data. It is very convenient to use this tool to analyze network packets, formatted data files, etc.

If I had used this tool when analyzing the sqlite database file format a while ago, it would have saved a lot of trouble.

However, Construct2.9 has changed a lot from previous versions. The original python code written in Construct basically needs to be rewritten. In the process of viewing the source code of Construct, I found that the basic implementation idea of Construct is the recursive descent analysis method. Use the object's construction method to dynamically define the data structure, and implement the parsing of binary data in the parse method.

I plan to use php to parse binary data, but the implementation idea is completely different from python's Construct.

Recommended PHP video tutorial://m.sbmmt.com/course/list/29/type/2.html

Basic ideas

Since python's Construct uses python's object syntax to implement the definition and parsing of dynamic hierarchical structures, some structure definitions look obscure due to the limitation of python's object syntax.

My idea is to define a small language specifically used to describe dynamic hierarchical structures, so that it can take care of people's common expression habits as much as possible.

Take this small project as an example, based on the structure definition syntax in C language, and on this basis, conditionalization and loop structure definition are added.

There is a common structure in binary data. The first few bytes store the length of the subsequent data block, followed by the data block. It is not convenient to express this kind of structure with the structure definition of C language. The length of the entire data block varies and cannot be determined at compile time, but can only be determined during parsing. Therefore, it is necessary to extend the structure definition syntax of C language.

The first step is to implement the analysis of the C language structure

In this step, we do not consider the dynamic hierarchical structure definition, but implement this small The core part of the language, at least it must be able to understand the structure definition of C language, and be able to parse binary data based on this structure definition. After this step is completed, the definition and analysis of the dynamic structure will be implemented.

This project is based on the ADOS scripting language engine briefly introduced in the previous file.

Let’s take a look at our task first

Here is a structure definition file that is completely C language specification

blockStruct.h

struct student { char name[2]; int num; int age; char addr[3]; }; struct teacher { char name[2]; int num; char addr[3]; };

Binary data block to be parsed

"\x41\x42\x01\x00\x00\x00\x02\x00\x00\x00\x41\x42\x43\x41\x42\x01\x00\x00\x00\x41\x42\x43"

Hope to get the parsing result

[value] => Array ( [name] => AB [num] => 1 [age] => 2 [addr] => ABC ) [value] => Array ( [name] => AB [num] => 1 [addr] => ABC )

Interested friends can pay attention to the relationship between the three

The following It is a lexical rule file that only implements the compilation of C language structures

=/','_greq' ,'>'], ['/^\<=/','_leeq' ,'<'], ['/^=/','_equa' ,'='], ['/^\>/','_grea' ,'>'], ['/^\
      

The following is a grammatical rule file that only implements the compilation of C language structures

0){ return $extraArray[0]; }else{ return ''; } } //求出放在附加信息中的数组长度 function elementSize($extraArray){ if(count($extraArray)>0){ return intval($extraArray[0]); }else{ return 0; } } //处理源代码中的多条声明语句 function handleStatementList($stack,$coder,$listIndex,$statementIndex){ $t1= $this->topItem($stack,$statementIndex); if($listIndex>0){ $t2= $this->topItem($stack,$listIndex); //将元素名所在项的附加信息数组找出来 $extraArray=$t2[TokenExtraIndex]; }else{ //$listIndex=0表示,statementList是上级节点,附加信息数组应该是空数组 $extraArray=[]; } //将statement中的附加信息添加到statementList的附加信息中去 $extraArray[]=$t1[TokenExtraIndex]; return ['#',$extraArray]; } //处理源代码中的一条声明语句 function handleStatement($stack,$coder,$typeItemIndex,$idenItemIndex){ $t1= $this->topItem($stack,$typeItemIndex); $elementType = $t1[TokenValueIndex]; $t2= $this->topItem($stack,$idenItemIndex); $elementName = $t2[TokenValueIndex]; //将元素名所在项的附加信息数组找出来 $extraArray=$t2[TokenExtraIndex]; $valLen =$extraArray[0]; //附加信息中包含 元素名称,元素类型,数据长度 return [$elementName,[$elementName,$elementType,$valLen]]; } //语法规则处理函数名由规则右边部分与规则左边部分拼接而成 //语法规则定义的先后决定了归约时匹配的顺序,要根据实际的语法安排 //如果不熟悉语法,随意调整语法规则的先后次序将有可能导致语法错误 // struct list {{{ function _structList_0_structList_struct($stack,$coder){ $coder->pushBlockTail(); return ['#',[]]; } function _structList_0_struct($stack,$coder){ $coder->pushBlockTail(); return ['#',[]]; } // struct list }}} // struct {{{ function _struct_0_structName_blockStatement_semi($stack,$coder){ $t1= $this->topItem($stack,3); $structName = $t1[TokenValueIndex]; $t2= $this->topItem($stack,2); $extraArray=$t2[TokenExtraIndex]; return [$structName,$extraArray]; } // struct }}} // struct name {{{ function _structName_0_strukey_iden($stack,$coder){ $t1= $this->topItem($stack,1); $structName = $t1[TokenValueIndex]; $coder->pushBlockHeader($structName); return $this->pass($stack,1); } // struct name }}} // blockStatement {{{ function _blockStatement_0_lcb_statementList_rcb($stack,$coder){ return $this->pass($stack,2); } // blockStatement }}} // statement list {{{ function _statementList_0_statementList_statement($stack,$coder){ return $this->handleStatementList($stack,$coder,2,1); } function _statementList_0_statement($stack,$coder){ //此处0表示statementList是上一级节点,要做特殊处理 return $this->handleStatementList($stack,$coder,0,1); } // statement list }}} // statement {{{ function _statement_0_double_term_semi($stack,$coder){ $t1= $this->topItem($stack,2); $elementName = $t1[TokenValueIndex]; $parseFuncName = 'parseDouble'; $coder->pushBlockBody($parseFuncName,$elementName); return $this->handleStatement($stack,$coder,3,2); } function _statement_0_float_term_semi($stack,$coder){ $t1= $this->topItem($stack,2); $elementName = $t1[TokenValueIndex]; $parseFuncName = 'parseFloat'; $coder->pushBlockBody($parseFuncName,$elementName); return $this->handleStatement($stack,$coder,3,2); } function _statement_0_char_term_semi($stack,$coder){ $t1= $this->topItem($stack,2); $elementName = $t1[TokenValueIndex]; $size = $this->elementSize($t1[TokenExtraIndex]); $parseFuncName = 'parseFixStr'; $coder->pushBlockBody($parseFuncName,$elementName,$size); return $this->handleStatement($stack,$coder,3,2); } function _statement_0_int_term_semi($stack,$coder){ $t1= $this->topItem($stack,2); $elementName = $t1[TokenValueIndex]; $parseFuncName = 'parseInt'; $coder->pushBlockBody($parseFuncName,$elementName,4); return $this->handleStatement($stack,$coder,3,2); } // statement }}} // term {{{ function _term_0_term_array($stack,$coder){ $t1= $this->topItem($stack,1); $valLen = $t1[TokenValueIndex]; //将数据长度放入附加信息 $t2= $this->topItem($stack,2); return [$t2[TokenValueIndex],[$valLen]]; } function _term_0_iden($stack,$coder){ $t1= $this->topItem($stack,1); //未指定数据长度时将长度值设为0 $valLen = '0'; $t2= $this->topItem($stack,2); return [$t1[TokenValueIndex],[$valLen]]; } // term }}} // array {{{ function _array_0_lb_num_rb($stack,$coder){ return $this->pass($stack,2); } // array }}} } // end of class

The following is the basic data Parsing functions for types, integers, and strings (note, used for experiments, not yet covering all basic data types)

0){ $raw = substr($data, 0,1); $value = unpack("C1",$raw)[1]; return ['value'=>$value,'size'=>1,'error'=>0,'msg'=>'ok']; }else{ return ['value'=>False,'size'=>0,'error'=>1,'msg'=>'not enough for parseByte']; } } //解析一个有符号整数 function parseInt($context,$size = 4){ if(!($size == 2 or $size == 4 or $size == 8 )){ return ['value'=>False,'size'=>0,'error'=>2,'msg'=>'not a valid size']; } if($size == 2) $format = "s1"; if($size == 4) $format = "l1"; if($size == 8) $format = "q1"; $pos=$context['pos']; $data = substr($context['data'], $pos); if(strlen($data)>=$size){ $raw = substr($data, 0,$size); $value = unpack($format,$raw)[1]; return ['value'=>$value,'size'=>$size,'error'=>0,'msg'=>'ok']; }else{ return ['value'=>False,'size'=>0,'error'=>1,'msg'=>'no data for parseInt']; } } //解析一个大端无符号整数 function parseBUInt($context,$size = 4){ if(!($size == 2 or $size == 4 or $size == 8 )){ return ['value'=>False,'size'=>0,'error'=>2,'msg'=>'not a valid size']; } if($size == 2) $format = "n1"; if($size == 4) $format = "N1"; if($size == 8) $format = "J1"; $pos=$context['pos']; $data = substr($context['data'], $pos); if(strlen($data)>=$size){ $raw = substr($data, 0,$size); $value = unpack($format,$raw)[1]; return ['value'=>$value,'size'=>$size,'error'=>0,'msg'=>'ok']; }else{ return ['value'=>False,'size'=>0,'error'=>1,'msg'=>'no data for parseBUInt']; } } //解析一个小端无符号整数 function parseLUInt($context,$size = 4){ if(!($size == 2 or $size == 4 or $size == 8 )){ return ['value'=>False,'size'=>0,'error'=>2,'msg'=>'not a valid size']; } if($size == 2) $format = "v1"; if($size == 4) $format = "NL"; if($size == 8) $format = "P1"; $pos=$context['pos']; $data = substr($context['data'], $pos); if(strlen($data)>=$size){ $raw = substr($data, 0,$size); $value = unpack($format,$raw)[1]; return ['value'=>$value,'size'=>$size,'error'=>0,'msg'=>'ok']; }else{ return ['value'=>False,'size'=>0,'error'=>1,'msg'=>'no data for parseLUInt']; } } //解析一个null结束的字符串 function parseString($context,$size=0){ $pos=$context['pos']; $data = substr($context['data'], $pos); $p=0; $raw = substr($data, $p,1); $byte = unpack("C1",$raw)[1]; $result =''; while($byte){ $result.=$raw; $p++; if($p>=strlen($data)){ return ['value'=>False,'size'=>0,'error'=>1,'msg'=>'not find null end for parseString']; } $raw = substr($data, $p,1); $byte = unpack("C1",$raw)[1]; } return ['value'=>$result,'size'=>$p,'error'=>0,'msg'=>'ok']; } //解析一个定长字符串 function parseFixStr($context,$size=0){ $pos=$context['pos']; $data = substr($context['data'], $pos); //var_dump($data); if(strlen($data)>=$size){ $result =''; for($i=0;$i<$size;$i++){ $raw = substr($data, $i,1); $value = unpack("a1",$raw)[1]; $result.=$value; } return ['value'=>$result,'size'=>$size,'error'=>0,'msg'=>'ok']; } return ['value'=>False,'size'=>0,'error'=>1,'msg'=>'not enough for parseFixedString']; }

The following is a working function for template replacement


      

With that After these preparations, a grammar-guided encoder can be implemented to generate the final parsing function.
The following is a simple encoder that generates script code that can be used to parse binary data when syntactically analyzing the structure definition.

engine = $engine; }else{ exit('the engine is not valid in StructwkrCoder construct.'); } } //编译得到的最终结果 public function codeLines(){ if(count($this->codeLines)<1){ return ''; } $script=''; for ($i=0;$i< count($this->codeLines);$i+=1) { $script.=$this->codeLines[$i]; } return $script; } //输出编译后的结果 public function printCodeLines(){ echo $this->codeLines(); } //添加一个块解析函数头 public function pushBlockHeader($structName){ $structName=ucfirst($structName); $content = makeBlockHeader($structName); array_push($this->codeLines, $content); $lineIndex=$this->lineIndex; $this->lineIndex+=1; return $lineIndex; } //添加一个块解析函数体 public function pushBlockBody($parseFuncName,$filedName='',$filedSize=0){ $content = makeblockBody($parseFuncName,$filedName,$filedSize); array_push($this->codeLines, $content); $lineIndex=$this->lineIndex; $this->lineIndex+=1; return $lineIndex; } //添加一个块解析函数尾 public function pushBlockTail(){ $content = makeblockTail(); array_push($this->codeLines, $content); $lineIndex=$this->lineIndex; $this->lineIndex+=1; return $lineIndex; } }

Automatically generated script file for parsing

Compile the C language structure, and finally get a series of functions that can be used to parse binary data. For example, two structures are defined in the above structure definition file

struct student { char name[2]; int num; int age; char addr[3]; }; struct teacher { char name[2]; int num; char addr[3]; };

Then the parsing functionsparseStudentandparseTeachercorresponding to these two structures will be generated.

The following is the automatically generated test script

       False,'size'=>0,'error'=>$expRes['error'],'msg'=>$expRes['msg']]; } $expRes = parseInt($context,4); if($expRes['error']==0){ $filed = 'num'; if($filed){ $valueArray[$filed]=$expRes['value']; }else{ $valueArray[]=$expRes['value']; } $context['pos']+=$expRes['size']; $totalSize+= $expRes['size']; }else{ return ['value'=>False,'size'=>0,'error'=>$expRes['error'],'msg'=>$expRes['msg']]; } $expRes = parseInt($context,4); if($expRes['error']==0){ $filed = 'age'; if($filed){ $valueArray[$filed]=$expRes['value']; }else{ $valueArray[]=$expRes['value']; } $context['pos']+=$expRes['size']; $totalSize+= $expRes['size']; }else{ return ['value'=>False,'size'=>0,'error'=>$expRes['error'],'msg'=>$expRes['msg']]; } $expRes = parseFixStr($context,3); if($expRes['error']==0){ $filed = 'addr'; if($filed){ $valueArray[$filed]=$expRes['value']; }else{ $valueArray[]=$expRes['value']; } $context['pos']+=$expRes['size']; $totalSize+= $expRes['size']; }else{ return ['value'=>False,'size'=>0,'error'=>$expRes['error'],'msg'=>$expRes['msg']]; } return ['value'=>$valueArray,'size'=>$totalSize,'error'=>0,'msg'=>'ok']; } function parseTeacher($context,$size=0){ $valueArray=[]; $totalSize = 0; $expRes = parseFixStr($context,2); if($expRes['error']==0){ $filed = 'name'; if($filed){ $valueArray[$filed]=$expRes['value']; }else{ $valueArray[]=$expRes['value']; } $context['pos']+=$expRes['size']; $totalSize+= $expRes['size']; }else{ return ['value'=>False,'size'=>0,'error'=>$expRes['error'],'msg'=>$expRes['msg']]; } $expRes = parseInt($context,4); if($expRes['error']==0){ $filed = 'num'; if($filed){ $valueArray[$filed]=$expRes['value']; }else{ $valueArray[]=$expRes['value']; } $context['pos']+=$expRes['size']; $totalSize+= $expRes['size']; }else{ return ['value'=>False,'size'=>0,'error'=>$expRes['error'],'msg'=>$expRes['msg']]; } $expRes = parseFixStr($context,3); if($expRes['error']==0){ $filed = 'addr'; if($filed){ $valueArray[$filed]=$expRes['value']; }else{ $valueArray[]=$expRes['value']; } $context['pos']+=$expRes['size']; $totalSize+= $expRes['size']; }else{ return ['value'=>False,'size'=>0,'error'=>$expRes['error'],'msg'=>$expRes['msg']]; } return ['value'=>$valueArray,'size'=>$totalSize,'error'=>0,'msg'=>'ok']; }

Run this test script and the results are as follows:

Array ( [value] => Array ( [name] => AB [num] => 1 [age] => 2 [addr] => ABC ) [size] => 13 [error] => 0 [msg] => ok ) Array ( [value] => Array ( [name] => AB [num] => 1 [addr] => ABC ) [size] => 9 [error] => 0 [msg] => ok )

OK! The first step of the task has been completed, and then consider implementing it Parsing of a structure with conditional judgment.

For more related questions, please visit the PHP Chinese website://m.sbmmt.com/

The above is the detailed content of PHP implements functions similar to the Construct library in Python (1) Basic design ideas. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:csdn.net. If there is any infringement, please contact admin@php.cn delete