Home>Article>Web Front-end> Use javascript to write lexical analysis of four arithmetic compilers

Use javascript to write lexical analysis of four arithmetic compilers

php是最好的语言
php是最好的语言 Original
2018-08-06 13:40:42 1773browse

There is a course called Compilation Principles in junior year, which asks us to write a simple compiler ourselves. Well, any language can be used. Of course I use js. It is so elegant, although I don’t use it very much. grace. This has nothing to do with language, it’s just that I like to use js, and there aren’t many js features used in it.

Also, the code is a bit bad, so don’t complain.

Let me talk about my whole process first

  1. The first step is lexical analysis: you need to write a regular expression and then put the words and numbers in it Cut them all out.

  2. Build grammar rules. Here I chose LL(1) grammar. Design your own grammar here.

  3. Build intermediate code. Here I used a syntax tree.

  4. Write a conversion program, and what kind of grammatical sentence corresponds to what kind of program.

## Lexical analysis:

  1. Enter a regular expression and convert the regular expression into nfa->dfa-> ;dfa minimized. For this part, go to NetEase Cloud Classroom's compiler that uses Java. Just follow the above. I followed his video for the previous nfa part.

  2. Get the table minimized by dfa. All that is needed for every regular expression is this table. Therefore, after determining the regular expression, you can directly store the table without generating it every time.

  3. Then you need a bunch of regular expressions


    1. number one

    2. Symbol one

    3. Keyword one

    4. Variable name one

    5. "anything"One (this can be used to handle errors. If it is not the above 1-4, then you can use 5 to receive and exclude)

    var id_ = new build_rule(); id_.build_from_str(id__str, 3); //这个变量id__str就是那个已经生成字符串保存起来的dfa最小化的表 //数字3就是id对应的名字,到时候用来判断来生成类型码的 var key_word = new build_rule(); key_word.build_from_str(key_word_str, 1); //和上面一样 var ops = new build_rule("{op}{op}*", 1); //这个使用正则生成的规则的,需要经过nfa---dfa---最小化这几步的转化 //1符号和关键字统称的类型 var num = new build_rule("{float}", 4); //同上 var anything = new build_rule(); anything.build_from_str(anything_str); anything.rule_name = 5; //这个就是用来处理错误的,识别5这个类型时候就会出错,也可以记录这个出错让程序一直扫描到后面再输出错误 //按照自己定义的规定的顺序进行添加规则,到时候就会按照这个顺序进行查找 var qing = qingai(code); qing.add_rules(key_word); qing.add_rules(id_); qing.add_rules(ops); qing.add_rules(num); qing.add_rules(anything); qing.action();
  1. All regular expressions are in order. It depends on your own arrangement. For example, the sequence is:

    Variable name——–>Keyword————>Others
    In this case, if
    "var"is recognized, var will be regarded as a variable name, because when var is not defined as a keyword, this can be used as a legal variable name.So the order needs to be arranged by yourself

  2. After constructing multiple regular expressions and their order, the work begins.

  3. The source code is scanned from the beginning, and the main pointer points to the beginning


    1. The pointer points to the beginning

    2. Start from the first rule

    3. According to the dfa automaton rules, that is, the table, if you encounter any character, jump to that row

    4. If you can still jump after reading the next one, jump to another line. If it cannot jump, check whether this line has an end mark. If so, exit smoothly without looking for the next rule. Then add the length of the found string to the main pointer and record the information of this string. : Length, that row, that column, what type. If not, find the next rule.

    5. Because there is a rule that

      is, ensuring that everyone can find the attribute.

    6. There will still be some that are not found because there are spaces and line breaks. You need to check them at the end and filter them out.

  4. Then you can get such a table based on the source code you wrote

  5. a=7464;b=7465;a=b+7464*2;

8. The following type code is the type code that needs to be used to express this type of string in the grammar. Later, the grammar side will use the type code to determine whether the entered sentence symbol does not conform to the given grammar. Because different keywords and symbols have different meanings, the type codes of keywords and symbols are different. My variable name is represented by dUse javascript to write lexical analysis of four arithmetic compilers

Summary steps

  1. Enter the regular expression

  2. Replace the macro definition of the regular expression with normal characters

  3. Cut regular expression

  4. Pass the entire cut expression into the nfa automaton to construct the nfa graph

  5. Put the head node of the nfa graph Pass in the dfa automatic machine to construct the dfa table

  6. Perform dfa minimization on the nfa table to obtain the dfa minimized table

  7. This completes a string The construction of the corresponding search automaton

  8. Repeat 1-7 until all regular expressions are converted. After that, you only need to save all the minimized tables, and just load that table every time. There is no need to generate nfa, dfa or the like every time.

  9. Put the rules into the scanner in order (actually it is just a while loop that writes all the characters of the source program at once) and start scanning.

  10. Get token table

  11. End

Related articles:


Algorithms and examples of parsing four arithmetic expressions in javascript

Analysis of JavaScript pre-compilation principle

The above is the detailed content of Use javascript to write lexical analysis of four arithmetic compilers. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn