Repository: https://github.com/ray-d-song/guesslang-js
Demo: https://ray-d-song.github.io/guesslang-js/
Recently I have been working on a project called EchoRSS, and there is one feature I really want: intercepting external links in subscriptions (read full text, quotes, etc.) and displaying their content directly on the current page.
The problem is that code blocks in the returned HTML often lose their language annotation (or the original page never annotated the language on the pre and code tags), so they cannot be highlighted with tools like shiki or prism.js.
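To make the missing-annotation case concrete, here is a minimal sketch of the check involved. It assumes the common convention of a `language-xxx` or `lang-xxx` class on the pre or code tag; the helper name `languageFromClasses` and the accepted prefixes are illustrative assumptions, not EchoRSS code.

```typescript
// Hypothetical helper: given the class attribute values of a <pre> and its
// <code> child, return the annotated language id, or null when there is no
// annotation (the case where automatic detection is needed).
function languageFromClasses(
  ...classAttrs: (string | null | undefined)[]
): string | null {
  for (const attr of classAttrs) {
    if (!attr) continue;
    for (const token of attr.split(/\s+/)) {
      // Accept "language-ts", "lang-python", "language-c++", ...
      const match = /^(?:language|lang)-([\w+#-]+)$/i.exec(token);
      const lang = match?.[1];
      if (lang) return lang.toLowerCase();
    }
  }
  return null;
}
```

In a browser this could be called as `languageFromClasses(pre.className, code.className)`; a `null` result means the snippet has to be handed to a language detector before highlighting.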
I found three solutions for detecting the language of a code snippet:
GitHub Linguist is a Ruby project that runs on the server, and GitHub uses it to detect the language composition of a repository. If you need extremely high accuracy and can run the detection on the server, this is the best solution.
highlight.js is a very well-known code highlighting library for the web, and among the common highlighting libraries it is the only one that provides automatic language detection.
The principle is very simple: for each supported language, enumerate its keywords, match them one by one against the text, and pick the language with the highest match score.
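The keyword-matching idea can be sketched in a few lines. This is a deliberately simplified illustration, not the actual highlight.js implementation; the keyword tables and the `guessByKeywords` name are made up for the example.

```typescript
// Toy keyword tables (assumed for illustration; real grammars are far larger).
const KEYWORDS: Record<string, string[]> = {
  javascript: ["function", "const", "let", "var", "typeof"],
  python: ["def", "elif", "self", "lambda", "None"],
  go: ["func", "package", "defer", "chan", "struct"],
};

interface Guess {
  language: string;
  score: number;
}

// Score each language by how many identifier tokens in the snippet are one
// of its keywords, then return the best-scoring language (null if no hits).
function guessByKeywords(code: string): Guess | null {
  const tokens = code.match(/[A-Za-z_]\w*/g) ?? [];
  let best: Guess | null = null;
  for (const [language, keywords] of Object.entries(KEYWORDS)) {
    const known = new Set(keywords);
    const score = tokens.filter((t) => known.has(t)).length;
    if (score > 0 && (best === null || score > best.score)) {
      best = { language, score };
    }
  }
  return best;
}
```

Even this toy version shows where the approach struggles: short snippets contain few keywords, and languages that share keywords produce close or tied scores.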
hljs has four problems.
guesslang is a machine learning project written in Python and based on TensorFlow.
Microsoft ported it to Node.js with TensorFlow.js in 2021 and used it to add automatic language detection to VS Code.
Three years ago, hieplpvip, a developer from Vietnam, also ported this project to the browser, but that port has three problems: