Static analysis is a powerful tool that helps developers keep code quality under control. Let's try to develop a simple analyzer for Lua in Java and see what's under the hood of a static analyzer.
We're going to write a static analyzer for Lua in Java.
Why Lua? Its syntax is simple, and there's no need to get bogged down in details. It's also a decent language, especially compared to JavaScript. Why Java? It gives us a full-fledged technology stack, and it's also a developer-friendly language.
No :) But this article describes how we developed a real analyzer pilot during an internal hackathon. So the title isn't just clickbait—we really did have 48 hours. Our team had three developers, so if you want to try this on your own, be prepared to spend more time.
Since this is just a pilot, we'll refer to our analyzer as "Mun" throughout the article.
Before starting, it's worth understanding what an analyzer is and mapping out the scope of work. In a nutshell, it seems obvious: grab the code and complain if something looks wrong. What exactly do we need? We're interested in the following aspects of static analysis:
Sounds tedious, but this is only the analyzer's core. By the way, a compiler has the same things in its front end, while the trickiest part (code generation) lurks in the back end.
What's the plan for the core? The scope is as follows:
Indeed, that's a lot for 48 hours :) Some things have to be dropped, some simplified, some reused. We'll look at such cases as we go through the article.
To understand it better, let's put it into a diagram:
That's enough to get started. You can learn more about static analysis on our website.
So, the lexer and the parser are the analyzer's foundation; without them we can't get any further—at least if we want anything beyond regular expressions.
The idea behind the lexer is simple: handling source code as plain text is cumbersome, so we need to convert it into an intermediate representation, i.e., split it into tokens. In other words, the lexer doesn't have to be smart. That's a must-have, because all the hardest and messiest work goes into the parser.
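To make the idea concrete, here's a purely illustrative sketch (not the pilot's actual code; the token kinds are invented) of what a lexer's output for the Lua statement local a = 5 might look like:

import java.util.List;

public class TokenExample {
  // A token as a lexer might represent it: a kind plus the matched text.
  record Token(String kind, String text) {}

  public static void main(String[] args) {
    // The Lua statement `local a = 5` split into tokens.
    List<Token> tokens = List.of(
        new Token("LOCAL", "local"),
        new Token("NAME", "a"),
        new Token("ASSIGN", "="),
        new Token("NUMBER", "5"));
    tokens.forEach(System.out::println);
  }
}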
Then let's move on to the parser. It takes the incoming tokens, makes sense of them, and builds the AST. Here's a quick bit of background.
Languages have grammars, and there are different kinds of parsers to work with them. If we were writing one ourselves, the best choice would be a simple LL(1) parser for a context-free grammar. As we said before, Lua has an intuitive grammar, so that would be enough.
LL means the input string is read from left to right and a leftmost derivation is built for it. In general, looking only at the current token (LL(0)) isn't enough, so we may need to look k tokens ahead. Such a parser is called LL(k).
However, we don't need any of that, because we're not going to write one. Why? We only have 48 hours, and there's no time to create a parser—especially if you've never developed one before.
What's our alternative to writing the lexer and parser ourselves? Generators! This is a whole class of utilities that only need a specially described grammar file as input to generate a parser.
We picked ANTLR v4. The tool is also written in Java, which makes it very easy to use, and over many years of development it has come to perform really well.
A problem lurks here too—we need to know how to write the grammar file for the parser. Fortunately, it's not hard to find ready-made options on GitHub, so we'll just take one from there.
After setting up the project with ANTLR, let's move on to building the abstract syntax tree.
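For reference, generating the lexer and parser from a grammar file is a single tool invocation, roughly like the one below. The jar version, output directory, and package name here are assumptions, not the pilot's actual layout; Lua.g4 is the grammar file taken from GitHub:

java -jar antlr-4.13.1-complete.jar -o gen -package com.example.lua Lua.g4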
Speaking of trees: ANTLR is nicely set up for visualizing parsed code. For example, here's the factorial calculation:
function fact (n)
  if n == 0 then
    return 1
  else
    return n * fact(n-1)
  end
end

print("enter a number:")
a = io.read("*number")
print(fact(a))
We can get the following tree:
Classification-wise, we'd say it's closer to a parse tree. We could stop here and start working with it.
But we won't do that; we'll convert it into an AST. Why? As a Java team, it's easier for us to work with a tree similar to the one in our analyzer. You can read more about development for Java here. An example class hierarchy from Spoon is shown in the picture below:
Okay, enough putting off the manual work—it's time to write code. We won't show the whole code here—there's no point, as it's large and unsightly. To give you a glimpse of our thinking, we'll leave a few code snippets under a spoiler for those interested.
We start handling the tree from the top:
public void enterStart_(LuaParser.Start_Context ctx) {
  _file = new CtFileImpl();
  _file.setBlocks(new ArrayList<>());
  for (var chunk : ctx.children) {
    if (chunk instanceof LuaParser.ChunkContext) {
      CtBlock block = getChunk((LuaParser.ChunkContext) chunk);
      if (block == null)
        continue;
      block.setParent(_file);
      _file.getBlocks().add(block);
    }
  }
}

private CtBlock getChunk(LuaParser.ChunkContext ctx) {
  for (var block : ctx.children) {
    if (block instanceof LuaParser.BlockContext) {
      return getBlock((LuaParser.BlockContext) block);
    }
  }
  return null;
}

private CtBlock getBlock(LuaParser.BlockContext ctx) {
  CtBlock block = new CtBlockImpl();
  block.setLine(ctx.start.getLine());
  block.setColumn(ctx.start.getCharPositionInLine());
  block.setStatements(new ArrayList<>());

  for (var statement : ctx.children) {
    if (statement instanceof LuaParser.StatContext) {
      var statements = getStatement((LuaParser.StatContext) statement);
      for (var ctStatement : statements) {
        ctStatement.setParent(block);
        block.getStatements().add(ctStatement);
      }
    }
  }
  return block;
}
We simply walk this tree from top to bottom, building our own tree along the way. Sooner or later we'll reach terminal nodes. Here are the function parameters:
private List<CtParameter> parseParameters(
  LuaParser.NamelistContext ctx,
  CtElement parent
) {
  var parameters = new ArrayList<CtParameter>();
  for (var child : ctx.children) {
    if (Objects.equals(child.toString(), ","))
      continue;

    var parameter = new CtParameterImpl();
    parameter.setParameterName(child.toString());
    parameter.setParent(parent);
    parameters.add(parameter);
  }
  return parameters;
}
This doesn't look very challenging either—we just turn one object into another. Let's wrap up the code listings here as well. We hope the way it works is clear.
Frankly, this approach doesn't bring much benefit in the long run. Instead of converting text into a tree, we convert one tree into another. Both tasks are fairly tedious. What options do we have?
This section would obviously be incomplete without a code analysis example for the AST. You can examine the pretty print of the tree for the same factorial calculation under the spoiler.
CtGlobal:
  CtFile:
    CtFunction: func
      Parameters:
        CtParameter: n
      CtBlock:
        CtIf:
          Condition:
            CtBinaryOperator: Equals
              Left:
                CtVariableRead: n
              Right:
                CtLiteral: 0
          Then block
            CtBlock:
              CtReturn:
                CtLiteral: 1
          Else block
            CtBlock:
              CtReturn:
                CtBinaryOperator: Multiplication
                  Left:
                    CtVariableRead: n
                  Right:
                    CtInvocation: fact
                      Arguments:
                        CtBinaryOperator: Minus
                          Left:
                            CtVariableRead: n
                          Right:
                            CtLiteral: 1
    CtInvocation: print
      Arguments:
        CtLiteral: "enter a number:"
    CtAssignment:
      Left:
        CtVariableWrite: a
      Right:
        CtInvocation:
          CtFieldRead:
            Target: io
            Field: read
          Arguments:
            CtParameter: "*number"
    CtInvocation: print
      Arguments:
        CtInvocation: fact
          Arguments:
            CtVariableRead: a
Let's wrap up this part with a quick note on terminology. Although it would be more accurate to call our entity a tree translator, we'll keep calling it a parser for now.
Next, we'll develop the functionality that performs the analysis and is used to build diagnostic rules. Essentially, that's tree traversal. Overall, it's easy to grasp how to implement tree iteration. However, we need to do something useful with the tree nodes. The visitor pattern enters the stage.
Do you remember that fascinating pattern from the classic GoF book with a neat implementation but rather vague usage scenarios? Well, here's a benchmark case of how it's used in a real situation. To keep the article concise, we won't show how it's implemented in the book, but we will show how we implemented it in the analyzer.
Let's start with simple tree traversal. We define the CtScanner class and add two scan methods: one for a single item and one for a collection of them.
public void scan(CtElement element) {
  if (element != null) {
    element.accept(this);
  }
}

public void scan(List<? extends CtElement> elements) {
  if (elements != null) {
    for (var element : elements) {
      scan(element);
    }
  }
}
Did you notice that accept from CtElement? In our case, any class that implements the CtVisitable interface provides the void accept(CtAbstractVisitor visitor) method. We'll talk about CtAbstractVisitor later; for now, it's enough to know that CtScanner inherits from it.
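Judging by how it's used, CtVisitable is probably no more than this (a minimal sketch based on the description above, not the exact project code):

public interface CtVisitable {
  // Every tree node dispatches to the visitor method that matches its own type.
  void accept(CtAbstractVisitor visitor);
}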
Here's what accept looks like in CtAssignment:
@Override
public void accept(CtAbstractVisitor visitor) {
  visitor.visitCtAssignment(this);
}
Yes, it's a piece of cake: nodes call their own method on the visitor. Our CtScanner should have a method for every class of tree node:
@Override
public void visitCtIf(CtIf ctIf) {
  scan(ctIf.getCondition());
  scan((CtStatement) ctIf.getThenStatement());
  scan((CtStatement) ctIf.getElseStatement());
}

@Override
public void visitCtLiteral(CtLiteral<?> literal) {
}

@Override
public void visitCtStatement(CtStatement statement) {
}

// ....
Now let's get back to CtAbstractVisitor, the interface we extracted from CtScanner. It contains the methods for visiting tree nodes—but only those, with no scan methods. In the implementations of the visitor methods, we either leave a placeholder for future overrides (if it's a terminal node) or keep unrolling the tree node by recursively descending through its children. That's all we need to know to continue.
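Putting the pieces together, the shape of CtAbstractVisitor is roughly the following; this is a sketch inferred from the methods shown above, and the real set of visit methods is of course wider:

public interface CtAbstractVisitor {
  void visitCtAssignment(CtAssignment assignment);
  void visitCtIf(CtIf ctIf);
  void visitCtLiteral(CtLiteral<?> literal);
  void visitCtStatement(CtStatement statement);
  // .... one visit method per tree node type
}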
It'd seem that's all for the core. For example, our analyzer can now catch simple errors like variable self-assignment. We call it the signature analysis. To provide more advanced analysis, we'd like to obtain more data about what's going on in the code. The parser has done its job; it's time to create new entities.
So far, we've fully followed the way of compilers regarding the analyzer structure, but now our paths diverge. If Lua had a compiler, it would continue with analysis to ensure that the code is correct. Our tools will still overlap in features, though.
For our needs, let's create SemanticResolver for two purposes: resolving variable scopes and determining variable types.
However, we'll call it from the parser along with its traversal. No decorator this time.
To keep the app structure seamless, we'll keep all the semantic data in the same AST nodes. First, let's define the necessary properties in the CtVariableAccess interface:

CtBlock getScope();
void setScope(CtBlock block);

TypeKind getTypeKind();
void setTypeKind(TypeKind type);

Let's start with scope; this will be our key tool for identifying a variable together with its name. First, we define the variable entity inside SemanticResolver. To keep it short, we'll show you only the interface, but the gist should be clear:

public static class Variable {
  public Variable(String identifier);

  public String getIdentifier();

  public CtBlock getScope();
  public void setScope(CtBlock block);

  public void setType(TypeKind type);
  public TypeKind getType();

  // Methods use only the identifier
  @Override
  public boolean equals(Object o);
  @Override
  public int hashCode();
}
Let's also further define the stack for variable scopes:
private final Stack<Pair<CtBlock, Set<Variable>>> stack = new Stack<>();
private final Pair<CtBlock, Set<Variable>> global;

public SemanticResolver(CtGlobal global) {
  pushStack(global);
  this.global = stack.peek();
}

public void pushStack(CtBlock block) {
  stack.push(Pair.of(block, new HashSet<>()));
}

public void popStack() {
  stack.pop();
}

A stack entry consists of a pair: the scope and the set of variables registered in it. The stack operation itself is mundane. Here's how it works in the parser:
private CtBlock getBlock(LuaParser.BlockContext ctx) {
  CtBlock block = new CtBlockImpl();
  resolver.pushStack(block);

  // ....

  resolver.popStack();
  return block;
}
We have to register the variables somehow. If a variable is local, it's simple—let's take the current scope and pass the variable there:
public CtBlock registerLocal(Variable variable) {
  var scope = stack.pop();
  variable.setScope(scope.getLeft());
  scope.getRight().add(variable);
  stack.push(scope);
  return scope.getLeft();
}
If the local keyword isn't used, the variable is either global or declared somewhere above. So, we first go through the stack and double-check whether it already exists:

public CtBlock registerUndefined(Variable variable) {
  var pair = lookupPair(variable);
  pair.getRight().add(variable);
  return pair.getLeft();
}

public Pair<CtBlock, Set<Variable>> lookupPair(Variable variable) {
  var buf = new Stack<Pair<CtBlock, Set<Variable>>>();
  Pair<CtBlock, Set<Variable>> res = null;
  while (!stack.isEmpty()) {
    var scope = stack.pop();
    buf.push(scope);
    if (scope.getRight().contains(variable)) {
      res = scope;
      break;
    }
  }

  while (!buf.isEmpty()) {
    stack.push(buf.pop());
  }

  if (res == null) {
    return global;
  }
  return res;
}

Once we have the registration entry, we can set the scope on variables:
private CtVariableWrite getVariableWriteInternal(
  ParseTree ctx,
  boolean isLocal
) {
  var node = new CtVariableWriteImpl();
  node.setVariableName(ctx.getChild(0).toString());

  CtBlock scope;
  if (isLocal) {
    scope = resolver.registerLocal(
      new SemanticResolver.Variable(node.getVariableName()));
  } else {
    scope = resolver.registerUndefined(
      new SemanticResolver.Variable(node.getVariableName()));
  }
  node.setScope(scope);
  return node;
}
And here's how we define it for reads:
private CtExpression getExpression(LuaParser.ExpContext ctx) {
  // ....
  if (child instanceof LuaParser.PrefixexpContext) {
    // ....
    var scope = resolver.lookupScope(
      new SemanticResolver.Variable(variableRead.getVariableName())
    );
    variableRead.setScope(scope);
    return variableRead;
  }
  // ....
}
We won't show the lookupScope code—it's a one-line wrapper over lookupPair, which you can see above. That wraps up variable scopes; we'll see the mechanism at work in a diagnostic rule in a separate section. For now, we continue with the semantic parsing and move on to determining variable types.
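Since it's described as a one-line wrapper over lookupPair, lookupScope presumably looks about like this:

public CtBlock lookupScope(Variable variable) {
  // The scope is the left part of the (scope, variables) pair.
  return lookupPair(variable).getLeft();
}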
How do we determine the variable type? Right, we obtain it from literals. Let's define the type and the enumeration of kinds for it:

public interface CtLiteral<T> extends CtExpression, CtVisitable {
  // ....
  void setTypeKind(TypeKind kind);
  TypeKind getTypeKind();
}
public enum TypeKind {
  Undefined,
  Number,
  String,
  Boolean,
  Nil
}

Thus, a data type can be a number, a string, a boolean, or nil; however, it'll be Undefined by default. The split between Undefined and Nil may seem far-fetched, but it's fine for the pilot.
We store the literal type only in the tree node, doing it from the parser:
private CtLiteralImpl createLiteral(
  // ....
  TypeKind type
) {
  // ....
  literal.setTypeKind(type);
  return literal;
}
However, the variable type will live both in the tree and in SemanticResolver, so we can request it during further traversal and AST building:
private ArrayList<CtAssignment> parseAssignments(LuaParser.StatContext ctx) {
  // ....
  for (int i = 0; i < variables.size(); i++) {
    var assignment = new CtAssignmentImpl();
    var variable = variables.get(i);
    // ....
    variable.setTypeKind(resolver.lookupType(variable.getVariableName()));
    resolver.setType(
      variable.getVariableName(),
      variable.getScope(),
      SemanticResolver.evaluateExpressionType(expression)
    );
  }
  return assignments;
}

There's no mistake in the order of these calls: we deliberately let the stored variable keep the type from its previous assignment—it'll come in handy later. As for the methods used here, there's nothing remarkable about the lookupType implementation—it's basically the same as lookupPair. And there's nothing complex about setType:
public void setType(String variable, CtBlock scope, TypeKind type) {
  var opt = stack.stream()
    .filter(x -> Objects.equals(x.getLeft(), scope))
    .findFirst();

  if (opt.isPresent()) {
    var pair = opt.get();
    var newVar = new Variable(variable);
    var meta = pair.getRight()
      .stream()
      .filter(x -> x.equals(newVar))
      .findFirst();
    meta.ifPresent(value -> value.setType(type));
  }
}
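The lookupType method isn't shown either; assuming it mirrors lookupPair as described, a minimal sketch could look like this (not the project's exact code):

public TypeKind lookupType(String variableName) {
  var variable = new Variable(variableName);
  // Find the scope that holds the variable and return its stored type,
  // falling back to Undefined if nothing is registered.
  return lookupPair(variable).getRight().stream()
      .filter(x -> x.equals(variable))
      .findFirst()
      .map(Variable::getType)
      .orElse(TypeKind.Undefined);
}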
However, evaluateExpressionType is trickier. Computing variable types in dynamic languages can be a bit of a hassle—just look at the jokes about JavaScript and string concatenation. However, firstly, Lua has a separate concatenation operator '..', and secondly, we're trying to keep the process simple. So, we'll only determine whether all the operands have the same type. We'll use the familiar CtScanner.
public static TypeKind evaluateExpressionType(CtExpression expression) {
  Mutable<TypeKind> type = new MutableObject<>(null);
  var typeEvaluator = new CtScanner() {
    private boolean stop = false;

    @Override
    public void scan(CtElement el) {
      if (stop) {
        return;
      }

      if (el instanceof CtVariableRead || el instanceof CtLiteral<?>) {
        var newType = el instanceof CtVariableRead
          ? ((CtVariableRead) el).getTypeKind()
          : ((CtLiteral<?>) el).getTypeKind();

        if (newType.equals(TypeKind.Undefined)) {
          type.setValue(TypeKind.Undefined);
          stop = true;
          return;
        } else if (type.getValue() == null) {
          type.setValue(newType);
        } else if (!type.getValue().equals(newType)) {
          type.setValue(TypeKind.Undefined);
          stop = true;
          return;
        }
      }
      super.scan(el);
    }
  };
  typeEvaluator.scan(expression);
  return type.getValue();
}

In parseAssignments, we've set the type on the assigned (CtVariableWrite) variable but forgot about reads (CtVariableRead). Let's fix that:
private CtExpression getExpression(LuaParser.ExpContext ctx) {
  // ....
  if (child instanceof LuaParser.PrefixexpContext) {
    // ....
    variableRead.setTypeKind(
      resolver.lookupType(variableRead.getVariableName())
    );
    var scope = resolver.lookupScope(
      new SemanticResolver.Variable(variableRead.getVariableName()));
    variableRead.setScope(scope);
    return variableRead;
  }
  // ....
}
We've completed the semantic analysis and are almost ready to start searching for bugs.
First, we make two quick stops. While the topic of the data-flow analysis deserves a series of articles, it'd be wrong to skip over it. Here, we won't dive deep into theory but just try to memorize the values set by literals.
First, let's fall into the sin of self-copying and define the variable entity for DataFlow again, but in a simpler way. Again, we'll only show you the interface:

private static class Variable {
  private Variable(String identifier, CtBlock scope);

  // Methods use the identifier and the scope
  @Override
  public boolean equals(Object o);
  @Override
  public int hashCode();
}
Here's the rest of the class content:
public class DataFlow {
  private static class Variable {
    // ....
  }

  Map<Variable, Object> variableCache = new HashMap<>();

  public void scanDataFlow(CtElement element) {
    if (element instanceof CtAssignment) {
      CtAssignment variableWrite = (CtAssignment) element;
      if (variableWrite.getAssignment() instanceof CtLiteral<?>) {
        var assigned = variableWrite.getAssigned();
        var variable = new Variable(
          assigned.getVariableName(),
          assigned.getScope()
        );
        variableCache.put(
          variable,
          getValue(variableWrite.getAssignment())
        );
      }
    }
  }

  public Object getValue(CtExpression expression) {
    if (expression instanceof CtVariableRead) {
      CtVariableRead variableRead = (CtVariableRead) expression;
      var variable = new Variable(
        variableRead.getVariableName(),
        variableRead.getScope()
      );
      return variableCache.getOrDefault(variable, null);
    } else if (expression instanceof CtLiteral<?>) {
      return ((CtLiteral<?>) expression).getValue();
    }
    return null;
  }
}
It's quite simple: in scanDataFlow, we associate a value with the variable, and in getValue, we extract that value for the given node. Everything is simple because we don't factor in branching, loops, or even expressions. Why not? Branching is exactly the kind of topic that deserves its own series of articles. What about expressions? Well, we didn't make it in two days. Given what we've achieved so far, we think the task is feasible—we just leave it as homework.
That's all. It's clear that such a solution is far from a real product, but we've laid some foundation. From here, we could either try to enhance this code and introduce data-flow analysis over the AST, or redo everything properly and build a control-flow graph.
We've implemented the class, but we haven't discussed how to use it. Let's talk about that in the following section. For now, we'll just say that DataFlow runs right before the diagnostic rules are called, and those are called while traversing the finished AST. Thus, the rules will have access to the current variable values—this is the Environment of sorts; you can see something similar in your debugger.
Welcome to the last section about the core. We already have the AST filled with semantic data, as well as the data-flow analysis that's just waiting to run. It's a good time to put it all together and set the stage for our diagnostic rules.
How is the analysis performed? It's simple: we start at the topmost tree node and launch a recursive traversal. That is, we need something that walks the tree. We have CtScanner for that. Based on it, we define MunInvoker:
public class MunInvoker extends CtScanner {
  private final List<MunRule> rules = new ArrayList<>();
  private final Analyzer analyzer;

  public MunInvoker(Analyzer analyzer) {
    this.analyzer = analyzer;
    rules.add(new M6005(analyzer));
    rules.add(new M6020(analyzer));
    rules.add(new M7000(analyzer));
    rules.add(new M7001(analyzer));
  }

  @Override
  public void scan(CtElement element) {
    if (element != null) {
      analyzer.getDataFlow().scanDataFlow(element);
      rules.forEach(element::accept);
      super.scan(element);
    }
  }
}
You may notice a few unfamiliar things in the code: the M**** classes, which are the diagnostic rules we'll write below, and the Analyzer class, which we'll get to in a moment.
Otherwise, the class operation shouldn't raise any questions. Each time we enter any tree node, the analyzer rules are called, and the variable values are evaluated right before that. Then the traversal continues following the CtScanner algorithm.
So, we have the analyzer core prototype—it's a good time to start analyzing something.
The base for our rules is ready—it's the CtAbstractVisitor class. The analysis goes as follows: a rule overloads a few visitors and analyzes the data contained in the AST nodes. Let's extend CtAbstractVisitor with the abstract MunRule class, which we use to create rules. In this class, we also define the addRule method that generates warnings.
Speaking of warnings: what data do they need? First, a warning message to show users what they might have gotten wrong. Second, the user needs to know where and about what the analyzer warns. So, let's add data about the file in which the analyzer detected the troublesome code block and the location of that code fragment.
Here's what the MunRule class looks like:
public abstract class MunRule extends CtAbstractVisitor {
  private Analyzer analyzer;

  public MunRule(Analyzer analyzer) {
    this.analyzer = analyzer;
  }

  protected Analyzer getAnalyzer() {
    return analyzer;
  }

  protected void addRule(String message, CtElement element) {
    var warning = new Warning();
    warning.message = message;

    WarningPosition pos = new WarningPosition(
      Analyzer.getFile(), element.getLine(), element.getColumn() + 1
    );
    warning.positions.add(pos);

    analyzer.addWarning(warning);
  }

  public DataFlow getDataFlow() {
    return analyzer.getDataFlow();
  }
}
The WarningPosition and Warning classes are just data stores, so we won't list them. We'll talk about addWarning now.
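Still, to keep the picture complete, here's roughly what these data stores might look like, judging by how they're used in addRule and addWarning; anything beyond those usages is an assumption:

public class WarningPosition {
  public final String file;
  public final int line;
  public final int column;

  public WarningPosition(String file, int line, int column) {
    this.file = file;
    this.line = line;
    this.column = column;
  }
}

public class Warning {
  public String code;
  public String message;
  public final List<WarningPosition> positions = new ArrayList<>();
}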
The last thing to prepare is how we're going to launch the diagnostic rules. To do this, we bring everything together in the already mentioned Analyzer class. Here it is:
public class Analyzer {
  private DataFlow dataFlow = new DataFlow();

  public DataFlow getDataFlow() {
    return dataFlow;
  }

  public CtElement getAst(String pathToFile) throws IOException {
    InputStream inputStream = new FileInputStream(pathToFile);
    Lexer lexer = new LuaLexer(CharStreams.fromStream(inputStream));

    ParseTreeWalker walker = new ParseTreeWalker();
    var listener = new LuaAstParser();
    walker.walk(listener, new LuaParser(
      new CommonTokenStream(lexer)
    ).start_());
    return listener.getFile();
  }

  protected void addWarning(Warning warning) {
    Main.logger.info(
      "WARNING: " + warning.code + " " + warning.message
      + " (" + warning.positions.get(0).line + ", "
      + warning.positions.get(0).column + ")");
  }

  public MunInvoker getMunInvoker() {
    return new MunInvoker(this);
  }

  public void analyze(String pathToFile) {
    try {
      var top = getAst(pathToFile);
      var invoker = getMunInvoker();
      invoker.scan(top);
    } catch (IOException ex) {
      Main.logger.error("IO error: " + ex.getMessage());
    }
  }
}
To explain how it works, here's the whole analysis process in a nutshell:
1. Analyzer.analyze receives the path to a Lua file and calls getAst.
2. getAst runs the ANTLR-generated LuaLexer and LuaParser and walks the resulting parse tree with our LuaAstParser listener, which builds the AST.
3. The AST root is handed to MunInvoker, which traverses the tree; on each node, it first runs DataFlow.scanDataFlow and then calls every diagnostic rule.
4. When a rule finds something suspicious, it calls addRule, which packs a Warning and passes it to Analyzer.addWarning—in the pilot, the warning is simply written to the log.
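A minimal entry point tying this together might look like the sketch below. The logging setup is an assumption (SLF4J is used here only for illustration), and the real pilot parses -input/-output arguments, as shown later in the article:

public class Main {
  // Assumed logger; the actual logging framework isn't shown in the article.
  public static final org.slf4j.Logger logger =
      org.slf4j.LoggerFactory.getLogger(Main.class);

  public static void main(String[] args) {
    // args[0] is expected to be the path to the Lua file under analysis.
    new Analyzer().analyze(args[0]);
  }
}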
We've got the prep work—it's time to start writing diagnostic rules.
We decided to start with a simple rule: PVS-Studio has the Java diagnostic rule V6005, where the analyzer checks whether a variable is assigned to itself. It can simply be copied and slightly adapted to our tree. Since our analyzer is called Mun, we start the numbers of our diagnostic rules with M. Let's create the M6005 class extending MunRule and override the visitCtAssignment method in it. The following check will live in that method:
public class M6005 extends MunRule {
  private void addRule(CtVariableAccess variable) {
    addRule("The variable is assigned to itself.", variable);
  }

  @Override
  public void visitCtAssignment(CtAssignment assignment) {
    if (RulesUtils.equals(assignment.getAssigned(),
                          assignment.getAssignment())) {
      addRule(assignment.getAssigned());
    }
  }
}
The RulesUtils.equals method used here is a wrapper and overload of another equals method that checks the name and the scope:
public static boolean equals(CtVariableAccess left, CtVariableAccess right) {
  return left.getVariableName().equals(right.getVariableName())
      && left.getScope().equals(right.getScope());
}
We need to check the scope, because the next code fragment isn't an assignment of the variable to itself:
local a = 5;
do
  local a = a;
end
Now, we can test the diagnostic rule on some simple code and see if it works. The following example will issue the warning on the marked line: "M6005 The variable is assigned to itself":
local a = 5;
local b = 3;

if (b > a) then
  a = a;      <=
end
Well, we've warmed up, so let's move on. The analyzer already has the primitive data-flow analysis (DataFlow) that we can and should use. Again, let's look at one of our existing diagnostic rules, V6020, where the analyzer checks for division by zero, and try to adapt it for our analyzer. The divisor can be either a variable holding zero or a zero literal. We need to access the cache of variables to check their values.
Here's the simple implementation of such a diagnostic rule:
public class M6020 extends MunRule {
  private void addWarning(CtElement expression, String opText) {
    addRule(String.format(
        "%s by zero. Denominator '%s' == 0.",
        opText,
        expression instanceof CtLiteral
          ? ((CtLiteral) expression).getValue()
          : ((CtVariableRead) expression).getVariableName()
      ),
      expression
    );
  }

  @Override
  public void visitCtBinaryOperator(CtBinaryOperator operator) {
    BinaryOperatorKind opKind = operator.getKind();
    if (opKind != BinaryOperatorKind.DIV && opKind != BinaryOperatorKind.MOD) {
      return;
    }

    apply(operator.getRightHandOperand(), opKind == BinaryOperatorKind.MOD);
  }

  private void apply(CtExpression expr, boolean isMod) {
    Object variable = getDataFlow().getValue(expr);
    if (variable instanceof Integer) {
      if ((Integer) variable == 0) {
        String opText = isMod ? "Mod" : "Divide";
        addWarning(expr, opText);
      }
    }
  }
}
We can see that the diagnostic rule works on simple examples and issues the following warning: "M6020 Divide by zero. Denominator 'b' == 0" on these lines:
local a = 5;
local b = 0;

local c = a / b;    <=
local d = a / 0;    <=
If you've done the expression evaluator as homework, you can also try the diagnostic rule on this code:
local b = 7;
local b = b - 7;
local c = a / b;
Since we're writing the Lua analyzer, we need to write diagnostic rules for this language. Let's start simple.
Lua is a dynamic scripting language. Let's use this feature and write the diagnostic rule that allows us to catch overwritten types.
We'll also need to pick a new number for the diagnostic rule. Earlier we just copied them from the Java analyzer; now it seems like a good time to start a new thousand—the seventh. Who knows which diagnostic group it'll become in PVS-Studio, but at the time of writing this article, it belongs to Lua.
Since we know the types, our job is easier: we need to check that the left and right sides of an assignment have different types. Note that we ignore Undefined for the left part and Nil for the right part. The code looks like this:
public class M7000 extends MunRule {
  @Override
  public void visitCtAssignment(CtAssignment assignment) {
    var assigned = assignment.getAssigned();
    var exprType = SemanticResolver.evaluateExpressionType(
      assignment.getAssignment());

    if (assigned.getTypeKind().equals(TypeKind.Undefined)
        || exprType.equals(TypeKind.Nil)) {
      return;
    }

    if (!assigned.getTypeKind().equals(exprType)) {
      addRule(
        String.format(
          "Type of the variable %s is overridden from %s to %s.",
          assigned.getVariableName(),
          assigned.getTypeKind().toString(),
          exprType.toString()
        ),
        assigned
      );
    }
  }
}
It's time to check the diagnostic rule in real cases. In the following example, our analyzer issues the warning only on the last line:
local a = "string"; if (true) then local a = 5; end a = 5; <=
The analyzer warning: "M7000 Type of the variable a is overridden from Integer to String".
Let's finish smoothly. The Lua plugin for VS Code has a diagnostic rule that detects lowercase global variables. This check can help catch forgotten local declarations. Let's implement the same diagnostic rule in our analyzer.
Here, just as before, we'll need the variable scope data obtained via the semantic analysis. We simply find where the variable is declared—which is also where it's assigned a value—and then check its scope and name. If the variable is global and starts with a lowercase letter, the analyzer warns. Easy-peasy.
Let's create a new class and override the visitCtAssignment method in it again. That way, we can look for the problematic global variables:
public class M7001 extends MunRule {
  @Override
  public void visitCtAssignment(CtAssignment assignment) {
    var variable = assignment.getAssigned();
    var firstLetter = variable.getVariableName().substring(0, 1);

    if (variable.getScope() instanceof CtGlobal
        && !firstLetter.equals(firstLetter.toUpperCase())) {
      addRule("Global variable in lowercase initial.", variable);
    }
  }
}
Let's check how the diagnostic rule works. On the code snippet below, it issues the warning "M7001 Global variable in lowercase initial." on the second line:
function sum_numbers(b, c)
  a = b + c;    <=
  return a;
end

local a = sum_numbers(10, 5);
Alright, we've written the diagnostic rules and we're done with the analyzer (and it even works). Breathe out. Now we can enjoy the fruits of our labor.
Above, we've already shown the code that lets us output warnings to a console or a file. Here's what working with the analyzer looks like in the console. We run it using the command:
java -jar mun-analyzer.jar -input "C:\munproject\test.lua"
And we get:
However, an analyzer is first and foremost a tool, and it should be user-friendly. Instead of working with the console, it'd be much more convenient to work with a plugin.
It's time to borrow again. PVS-Studio already has plugins for many IDEs, such as Visual Studio Code and IntelliJ IDEA, where you can view warnings and navigate through them. Since analyzer reports have a standardized format, we can simply borrow the JSON report generation algorithm from our Java analyzer. The algorithm is long and dull, so we won't show it. We still have to run the analyzer from the command line, but now with the parameter -output "D:/git/MunProject/report.json".
Then we can open the report in IntelliJ IDEA or VS Code and view the Lua analyzer warnings:
Sweet! Now we can use the analyzer for its full intended purpose without sacrificing usability.
So, have we written a full-fledged analyzer? Er, not exactly. At the end of this long journey, we have a very real pilot that has gone through all the stages. Still, the room for growth is enormous:
And much, much more. In other words, the path from a pilot to a mature tool is quite thorny. That's why PVS-Studio focuses on its existing directions—C#, C, C++, and Java—rather than new ones. If you write in any of these languages, you can give our analyzer a try.
This article turned out much longer than we expected, so please leave a comment if you've made it to the end :) We'd love to hear your feedback.
If you're interested in this topic, you can read more about developing analyzers for our languages: