Parsing 'pip install' commands to get the ranges of installed packages in text - PHP中文网 Q&A



P粉431220279 2023-09-07 18:54:45


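I'm working on a project where I need to extract the names and positions of Python packages installed with the pip install command.

A web page contains a code element holding multiple lines of text and bash commands. I want to write JS code that parses this text and finds the packages and their positions in the text.

For example, if the text is:

    $ pip install numpy
    pip install --global-option build_ext -t ../ pandas>=1.0.0,<2
    sudo apt update
    pip uninstall numpy
    pip install "requests==12.2.2"

I'd like to get something like this: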

    [
      { "name": "numpy", "position": 14 },
      { "name": "pandas", "position": 65 },
      { "name": "requests", "position": 131 }
    ]
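The positions in the expected output look like 0-based character offsets into the whole text, newlines included. A quick check of that reading, assuming the sample is five commands separated by `\n` (the variable names `sampleText` and `positions` are only illustrative):

```javascript
// Sample text from the question, assuming one command per line.
const sampleText = [
  '$ pip install numpy',
  'pip install --global-option build_ext -t ../ pandas>=1.0.0,<2',
  'sudo apt update',
  'pip uninstall numpy',
  'pip install "requests==12.2.2"',
].join('\n');

// 0-based offset of the first occurrence of each expected package name.
const positions = {
  numpy: sampleText.indexOf('numpy'),
  pandas: sampleText.indexOf('pandas'),
  requests: sampleText.indexOf('requests'),
};

console.log(positions); // { numpy: 14, pandas: 65, requests: 131 }
```

A plain `indexOf` only works here because each name's first occurrence happens to be the installed one; it says nothing about which command the name belongs to, which is the actual parsing problem.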

How can I implement this in JavaScript?


All replies (2)

P粉773659687 2023-09-08 15:53:55 #2

You can see my code, with an explanation, in this answer.

Here is another, similar solution that relies more heavily on regex:

    // `pip install` options that take a value.
    const pipOptionsWithArg = [
      '-c', '--constraint', '-e', '--editable', '-t', '--target',
      '--platform', '--python-version', '--implementation', '--abi',
      '--root', '--prefix', '-b', '--build', '--src', '--upgrade-strategy',
      '--install-option', '--global-option', '--no-binary', '--only-binary',
      '--progress-bar', '-i', '--index-url', '--extra-index-url',
      '-f', '--find-links', '--log', '--proxy', '--retries', '--timeout',
      '--exists-action', '--trusted-host', '--cert', '--client-cert',
      '--cache-dir',
    ];

    // "--option value" or "--option=value". The backslash is doubled because
    // this is a template string, not a regex literal.
    const optionWithArgRegex = `( (${pipOptionsWithArg.join('|')})(=| )\\S+)*`;
    // Flags that take no value, e.g. "--upgrade".
    const options = /( -[-\w=]+)*/;
    // A package, optionally quoted and optionally followed by version specifiers.
    const packageArea = /["']?(?<package_part>(?<package_name>\w[\w.-]*)([=<>~!]=?[\w.,<>]+)?)["']?(?=\s|$)/g;
    const repeatedPackages = `(?<packages>( ${packageArea.source})+)`;
    const whiteSpace = / +/;

    // Every literal space in the pattern is widened to "one or more spaces".
    const PIP_COMMAND_REGEX = new RegExp(
      `(?<command>pip install${optionWithArgRegex}${options.source})${repeatedPackages}`
        .replaceAll(' ', whiteSpace.source),
      'g'
    );

    export const parseCommand = (command) => {
      const matches = Array.from(command.matchAll(PIP_COMMAND_REGEX));

      const results = matches.flatMap((match) => {
        const packagesStr = match?.groups.packages;
        if (!packagesStr) return [];

        // Offset of the packages part within the whole text.
        const packagesIndex = command.indexOf(
          packagesStr,
          match.index + match.groups.command.length
        );

        return Array.from(packagesStr.matchAll(packageArea))
          .map((packageMatch) => {
            const packagePart = packageMatch.groups.package_part;
            const name = packageMatch.groups.package_name;

            const startIndex =
              packagesIndex + packagesStr.indexOf(packagePart, packageMatch.index);
            const endIndex = startIndex + packagePart.length;

            return { type: 'pypi', name, version: undefined, startIndex, endIndex };
          })
          // "-r requirements.txt" looks like a package to the regex; drop it.
          .filter((result) => result.name !== 'requirements.txt');
      });

      return results;
    };
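The core mechanism in the regex approach is `matchAll` combined with named capture groups and `match.index`. A much simplified sketch of just that mechanism, assuming exactly one package per command and no options (the pattern `simplePipRegex` and the group names `spec`, `name`, and `version` are hypothetical, not taken from the answer):

```javascript
// Simplified, hypothetical pattern: a single package spec right after "pip install ".
const simplePipRegex =
  /pip install (?<spec>(?<name>\w[\w.-]*)(?<version>[=<>~!]=?[\w.,<>]+)?)/g;

const text = 'pip install numpy\npip install pandas>=1.0.0,<2';

const found = Array.from(text.matchAll(simplePipRegex), (match) => {
  // match.index is where "pip install" starts; the spec begins right after
  // the 12 characters of "pip install ".
  const start = match.index + 'pip install '.length;
  return {
    name: match.groups.name,
    start,
    end: start + match.groups.spec.length,
  };
});

console.log(found);
// [ { name: 'numpy', start: 12, end: 17 }, { name: 'pandas', start: 30, end: 46 } ]
```

The global `g` flag is required for `matchAll`; without it the call throws a `TypeError`.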


P粉194541072 2023-09-08 10:50:01 #1

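Here is an alternative solution that tries to use loops instead of regex:

The idea is to find the lines that contain the text pip install; those are the lines we care about. Then we split the command into words and loop over them until we reach the package part of the command.

First, we define a regex for a package. Keep in mind that a package can look like pip install 'stevedore>=1.3.0,<1.4.0' "MySQL_python==1.2.2":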

    const packageArea = /(?<=\s|^)["']?(?<package_part>(?<package_name>\w[\w.-]*)([=<>~!]=?[\w.,<>]+)?)["']?(?=\s|$)/;

Note the named groups: package_part identifies the "package with version" string, while package_name extracts the package name.
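To see what the two named groups capture, here is a small sketch that runs the pattern against a few single words. The names `packagePattern`, `words`, and `parsed` are illustrative only:

```javascript
// Package pattern with both named groups written out.
const packagePattern =
  /(?<=\s|^)["']?(?<package_part>(?<package_name>\w[\w.-]*)([=<>~!]=?[\w.,<>]+)?)["']?(?=\s|$)/;

const words = ["'stevedore>=1.3.0,<1.4.0'", '"MySQL_python==1.2.2"', 'numpy', '../'];

const parsed = words.map((word) => {
  const m = word.match(packagePattern);
  // A word that is not a package (such as the path "../") yields null.
  return m ? { part: m.groups.package_part, name: m.groups.package_name } : null;
});

console.log(parsed);
// [
//   { part: 'stevedore>=1.3.0,<1.4.0', name: 'stevedore' },
//   { part: 'MySQL_python==1.2.2', name: 'MySQL_python' },
//   { part: 'numpy', name: 'numpy' },
//   null
// ]
```

The surrounding quotes are matched but stay outside both groups, so offsets computed from package_part point at the package text itself rather than at the quote.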

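About the arguments

There are two kinds of command-line arguments: options, which take a value, and flags, which do not.

The difficulty with options is that we have to recognize that the next word is not a package name but the option's value.

So I started by listing all the pip install options that take a value: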

    const pipOptionsWithArg = [
      '-c', '--constraint', '-e', '--editable', '-t', '--target',
      '--platform', '--python-version', '--implementation', '--abi',
      '--root', '--prefix', '-b', '--build', '--src', '--upgrade-strategy',
      '--install-option', '--global-option', '--no-binary', '--only-binary',
      '--progress-bar', '-i', '--index-url', '--extra-index-url',
      '-f', '--find-links', '--log', '--proxy', '--retries', '--timeout',
      '--exists-action', '--trusted-host', '--cert', '--client-cert',
      '--cache-dir',
    ];

Then I wrote a function, used later, that decides what to do when it sees an argument:

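    // Returns how many characters of the line this argument consumes,
    // including its value (if any) and the separating spaces.
    const handleArgument = (argument, restCommandWords) => {
      let index = 0;
      index += argument.length + 1; // +1 for the space removed by split

      // "-r file" installs from a requirements file: skip the rest of the line.
      if (argument === '-r' || argument === '--requirement') {
        while (restCommandWords.length > 0) {
          index += restCommandWords.shift().length + 1;
        }
        return index;
      }

      // A flag, or "--option=value" in a single word: nothing more to consume.
      if (!pipOptionsWithArg.includes(argument)) {
        return index;
      }

      if (argument.includes('=')) return index;

      // "--option value": the next word is the value, not a package.
      const value = restCommandWords.shift();
      if (value !== undefined) {
        index += value.length + 1;
      }
      return index;
    };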

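This function receives the argument that was recognized and the rest of the command, split into words.

(This is where the "index counter" first shows up. Since we also need the position of every finding, we have to keep track of the current position in the original text.)

The last lines of the function handle both forms: --option=something is a single word, so nothing extra is consumed, while --option something consumes the next word as well.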
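The difference between the two forms is easiest to see after splitting on spaces. A minimal sketch, assuming a shortened and purely illustrative list of value-taking options (`optionsWithValue`, `wordsConsumedByOption`, and `extractCandidates` are hypothetical names, and character counting is left out):

```javascript
// Hypothetical, shortened list of options that take a value.
const optionsWithValue = ['-t', '--target', '-i', '--index-url'];

// How many of the following words belong to this option (0 or 1).
const wordsConsumedByOption = (word) => {
  if (word.includes('=')) return 0; // "--target=../" carries its own value
  return optionsWithValue.includes(word) ? 1 : 0; // "--target ../" uses the next word
};

// Returns the words that are neither options nor option values.
const extractCandidates = (args) => {
  const words = args.split(' ');
  const candidates = [];
  for (let i = 0; i < words.length; i++) {
    if (words[i].startsWith('-')) {
      i += wordsConsumedByOption(words[i]); // skip the option's value, if any
      continue;
    }
    candidates.push(words[i]);
  }
  return candidates;
};

console.log(extractCandidates('--target ../ pandas')); // [ 'pandas' ]
console.log(extractCandidates('--target=../ pandas')); // [ 'pandas' ]
console.log(extractCandidates('--upgrade pandas')); // [ 'pandas' ]
```

Without the skip in the first case, the path `../` would be handed to the package check as if it were a package candidate.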

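The parser

Now the main parser splits the original text into lines, and then into words.

Every operation has to update the global index so that we always know where we are in the text. That index lets us search with indexOf(str, counterIndex) without landing on the wrong substring: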
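A minimal illustration of why the start offset matters, using a made-up two-line input in which the same package name appears in both an uninstall and an install command:

```javascript
const text = 'pip uninstall numpy\npip install numpy';

// Without a start offset, indexOf returns the first "numpy", which belongs
// to the uninstall command.
const naive = text.indexOf('numpy');

// Searching from the start of the second line finds the installed one.
const secondLineStart = text.indexOf('\n') + 1;
const tracked = text.indexOf('numpy', secondLineStart);

console.log(naive, tracked); // 14 32
```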

    export const parseCommand = (multilineCommand) => {
      const packages = [];
      let counterIndex = 0;

      const lines = multilineCommand.split('\n');
      while (lines.length > 0) {
        const line = lines.shift();

        const pipInstallMatch = line.match(/pip +install/);
        if (!pipInstallMatch) {
          counterIndex += line.length + 1; // +1 for the newline
          continue;
        }

        const pipInstallLength = pipInstallMatch.index + pipInstallMatch[0].length;
        const argsAndPackagesWords = line.slice(pipInstallLength).split(' ');
        counterIndex += pipInstallLength;

        while (argsAndPackagesWords.length > 0) {
          const word = argsAndPackagesWords.shift();

          // An empty word means a space was consumed by split.
          if (!word) {
            counterIndex++;
            continue;
          }

          if (word.startsWith('-')) {
            counterIndex += handleArgument(word, argsAndPackagesWords);
            continue;
          }

          const packageMatch = word.match(packageArea);
          if (!packageMatch) {
            counterIndex += word.length + 1;
            continue;
          }

          const startIndex = multilineCommand.indexOf(
            packageMatch.groups.package_part,
            counterIndex
          );
          packages.push({
            type: 'pypi',
            name: packageMatch.groups.package_name,
            version: undefined,
            startIndex,
            endIndex: startIndex + packageMatch.groups.package_part.length,
          });

          counterIndex += word.length + 1;
        }
      }

      return packages;
    };
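The parser returns `startIndex` and `endIndex` rather than the `position` field the question asks for. A small sketch of the conversion, assuming a result array shaped like the parser's output for the question's sample text (the `results` literal below is written out by hand for illustration, not captured from a run):

```javascript
// Hand-written sample in the shape returned by parseCommand (assumed values).
const results = [
  { type: 'pypi', name: 'numpy', version: undefined, startIndex: 14, endIndex: 19 },
  { type: 'pypi', name: 'pandas', version: undefined, startIndex: 65, endIndex: 81 },
  { type: 'pypi', name: 'requests', version: undefined, startIndex: 131, endIndex: 147 },
];

// Keep only the fields the question asked for.
const asQuestionFormat = results.map(({ name, startIndex }) => ({
  name,
  position: startIndex,
}));

console.log(JSON.stringify(asQuestionFormat));
// [{"name":"numpy","position":14},{"name":"pandas","position":65},{"name":"requests","position":131}]
```

Keeping `endIndex` around can still be useful, for example to highlight the whole "package with version" range in the page.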
