How to detect code language in browser

Patricia Arquette
Release: 2024-11-27 00:13:10

Repository: https://github.com/ray-d-song/guesslang-js

Demo: https://ray-d-song.github.io/guesslang-js/

Recently I've been working on a project called EchoRSS. One feature I really want is to intercept external links in subscriptions (read full text, quotes, etc.) and display their content directly on the current page.

The problem is that code blocks in the returned HTML often lose their language annotation (or the pre and code tags in the original never had one), so they cannot be highlighted with tools like shiki or prism.js.
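To make the "language annotation" concrete, here is a minimal sketch of reading it when it is present. It assumes the common `language-xxx` / `lang-xxx` class naming convention on pre or code tags; the helper name `languageFromClass` is hypothetical and is not part of EchoRSS or guesslang-js.

```javascript
// Hypothetical helper: pull a language hint out of a class attribute value.
// Assumes the widespread "language-xxx" / "lang-xxx" naming convention.
function languageFromClass(className) {
  if (!className) return null;
  for (const token of className.split(/\s+/)) {
    const match = /^(?:language|lang)-([\w+#-]+)$/i.exec(token);
    if (match) return match[1].toLowerCase();
  }
  // No annotation found: this is the case where detection is needed.
  return null;
}
```

In a browser this could be called with `codeElement.className`. When it returns `null`, the block has no usable annotation and the language has to be guessed from the code itself.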

I found three ways to detect the language of a code snippet:

1. linguist

linguist is a Ruby project that runs on the server; GitHub uses it to detect the language composition of repositories. If you need very high accuracy and can run the detection on the server, this is the best solution.

2. hljs

highlight.js is a very well-known code highlighting library for the web and, among the highlighters mentioned here, the only one that provides automatic language detection.

The principle is simple: for each language, enumerate its keywords, match them against the text one by one, and pick the language with the highest match score.
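The idea can be sketched in a few lines. This is only an illustration of keyword scoring, not highlight.js's actual implementation; the `KEYWORDS` table and the `guessByKeywords` function are made up for this example, and real grammars are far larger.

```javascript
// Illustrative keyword tables (assumption: tiny hand-picked lists).
const KEYWORDS = {
  javascript: new Set(["function", "const", "let", "var", "typeof"]),
  python: new Set(["def", "elif", "import", "self", "None"]),
  go: new Set(["func", "package", "chan", "defer", "import"]),
};

// Score each language by how many identifier-like tokens are keywords,
// then return the language with the highest score.
function guessByKeywords(code) {
  const tokens = code.match(/[A-Za-z_]\w*/g) ?? [];
  let best = { language: null, score: 0 };
  for (const [language, keywords] of Object.entries(KEYWORDS)) {
    const score = tokens.filter((t) => keywords.has(t)).length;
    if (score > best.score) best = { language, score };
  }
  return best;
}
```

With these tables, a snippet containing `def` and `import` scores highest for Python, while a very short snippet such as `x = 1` matches no keywords at all and yields `language: null`. That hints at why this kind of approach needs a fair amount of input to be reliable.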

hljs has four problems.

  • It needs long input: most languages require at least 300 characters to reach reasonably good accuracy.
  • The language detection is not a separate module but is tightly coupled with the parser and renderer, and the code is very imperative, which makes it hard to extract the useful parts.
  • If you don't extract the detection module, the original formatting (line breaks and indentation) of the code is lost when hljs does the highlighting.
  • It relies on a lot of regular-expression matching, so performance is poor, and because of the second problem it cannot be run in a web worker.

3. guesslang

guesslang is a machine learning project; the original is written in Python and built on TensorFlow.

Microsoft ported this project to Node.js (using tensorflow.js) in 2021 and used it to add automatic language detection to VS Code.

Three years ago, a Vietnamese developer, hieplpvip, also ported this project to the browser, but that port has three problems of its own:
