Time to Leave? Time to Rebuild! Making Twitter

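For users who are tired of Musk and Twitter, the most critical features of a new social network are:

Importing Twitter's archive.zip file
Making sign-up as simple as possible
Similar (if not identical) user features

Less critical, but definitely useful, features for a platform:

Ethical monetization and moderation
Using AI to help identify problematic content
Blue ticks using Onfido or SMART identity services

In this post, we'll focus on the first feature: importing Twitter's archive.zip file.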

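The file

Twitter doesn't make your data that easy to get at. It's nice that they give you access to it (legally, they have to). The format is garbage.

It's actually a mini web archive, with all of your data held in JavaScript files. It's more of a web app than a convenient data store.

When you open your archive.html file, you'll see something like this: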

[Screenshot: the archive page opened in a browser]

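Note: I decided early on to build the site with Next.js and the backend with Go and GraphQL.

So what do you do when your data isn't structured data?

Well, you parse it.

Creating a basic Go script

Head over to the official docs to learn how to get started with Go, and set up your project directory.

We're going to hack through this process together. This seems like one of the most important features for winning over people who are too dependent on TwitterX.

The first step is to create a main.go file. In this file, we'll Go (haha) ahead and do a few things:

os.Args: a slice that holds the command-line arguments.
os.Args[0] is the name of the program, and os.Args[1] is the first argument passed to it.
Argument check: main checks that at least one argument was provided. If not, it prints a message asking for a path.
run function: for now, it just prints the path passed to it.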

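package main

import (
	"fmt"
	"os"
)

// run just prints the path it was given, for now.
func run(path string) {
	fmt.Println("Path:", path)
}

func main() {
	// os.Args[0] is the program name, so a path means at least two entries.
	if len(os.Args) < 2 {
		fmt.Println("Please provide a path as an argument.")
		return
	}

	path := os.Args[1]
	run(path)
}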

At each step, we'll run the file like this:

go run main.go twitter.zip

If you don't have a Twitter archive export, create a simple manifest.js file and give it the following JavaScript.

window.__THAR_CONFIG = {
  "userInfo" : {
    "accountId" : "1234567890",
    "userName" : "lukeocodes",
    "displayName" : "Luke ✨"
  },
};

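Zip it up into a twitter.zip file, which we'll use throughout. Put manifest.js inside a data directory in the zip (data/manifest.js), because the later steps only read .js files whose names start with data/.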
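
If you would rather build that test archive from Go than zip it by hand, something like the following should work. It is only a sketch: buildFixture and entryNames are names invented here, and it assumes you run it from a scratch directory separate from the main.go used in the rest of the post. It writes the sample manifest to data/manifest.js inside twitter.zip, then lists the entries straight from memory.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"log"
	"os"
)

// manifestJS mirrors the sample manifest.js shown above.
const manifestJS = `window.__THAR_CONFIG = {
  "userInfo" : {
    "accountId" : "1234567890",
    "userName" : "lukeocodes",
    "displayName" : "Luke ✨"
  },
};
`

// buildFixture returns the bytes of a zip archive with a single entry,
// data/manifest.js. The function name is made up for this sketch.
func buildFixture() ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)

	// Zip entry names use forward slashes on every platform.
	w, err := zw.Create("data/manifest.js")
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte(manifestJS)); err != nil {
		return nil, err
	}
	// Close writes the central directory; without it the archive is unreadable.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// entryNames lists the entries of a zip archive held in memory.
func entryNames(archive []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(zr.File))
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

func main() {
	archive, err := buildFixture()
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("twitter.zip", archive, 0o644); err != nil {
		log.Fatal(err)
	}
	names, err := entryNames(archive)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("wrote twitter.zip with entries:", names)
}
```

Copy the resulting twitter.zip next to your main.go and the go run command above will pick it up.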

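Reading the zip file

The next step is to read the contents of the zip file. We want to do this as efficiently as possible, and cut down the time spent extracting data onto disk.

There are also a lot of files in the zip that don't need to be extracted at all.

We'll edit the main.go file:

Open the ZIP file: zip.OpenReader() opens the ZIP file at the given path.
Iterate over the files: the function loops over every file in the archive using r.File (a slice of *zip.File) and prints each file's Name field.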

package main

import (
	"archive/zip"
	"fmt"
	"log"
	"os"
)

func run(path string) {
	// Open the zip file
	r, err := zip.OpenReader(path)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	// Iterate through the files in the zip archive
	fmt.Println("Files in the zip archive:")
	for _, f := range r.File {
		fmt.Println(f.Name)
	}
}

func main() {
	// Example usage
	if len(os.Args) < 2 {
		log.Fatal("Please provide the path to the zip file as an argument.")
	}

	path := os.Args[1]
	run(path)
}

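JS only! We're looking for structured data

This archive is really unhelpful. We only want to look at the .js files in the /data directory.

Open the ZIP file: zip.OpenReader() opens the ZIP file.
Check the /data directory: the program loops over the files in the archive and uses strings.HasPrefix(f.Name, "data/") to check whether a file lives in the /data directory.
Find .js files: the program also uses filepath.Ext(f.Name) to check whether the file has a .js extension.
Read and print the contents: when it finds a .js file in the /data directory, the program reads it and prints its contents.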

package main

import (
	"archive/zip"
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
	"strings"
)

func readFile(file *zip.File) {
	// Open the file inside the zip
	rc, err := file.Open()
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	// Read the contents of the file
	// (io.ReadAll replaces ioutil.ReadAll, deprecated since Go 1.16)
	contents, err := io.ReadAll(rc)
	if err != nil {
		log.Fatal(err)
	}

	// Print the contents
	fmt.Printf("Contents of %s:\n", file.Name)
	fmt.Println(string(contents))
}

func run(path string) {
	// Open the zip file
	r, err := zip.OpenReader(path)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	// Iterate through the files in the zip archive
	fmt.Println("JavaScript files in the zip archive:")
	for _, f := range r.File {
		// Check the directory prefix, then use filepath.Ext to check the extension
		if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
			readFile(f)
			return // Exit after the first .js file so we don't print a gazillion lines when testing
		}
	}
}

func main() {
	// Example usage
	if len(os.Args) < 2 {
		log.Fatal("Please provide the path to the zip file as an argument.")
	}

	path := os.Args[1]
	run(path)
}
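
The two checks in that loop can be pulled out into a small helper, which makes them easy to try against a handful of names. This is a sketch rather than part of the original script: isDataJS is a made-up name, the entry names in main are invented for illustration, and it swaps filepath.Ext for path.Ext on the assumption that zip entry names always use forward slashes, whatever the operating system.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isDataJS reports whether a zip entry name points at a .js file under
// data/. The name is hypothetical; the checks mirror the loop above.
func isDataJS(name string) bool {
	return strings.HasPrefix(name, "data/") &&
		strings.EqualFold(path.Ext(name), ".js") // case-insensitive, like ToLower
}

func main() {
	// Invented entry names, just to exercise the helper.
	for _, name := range []string{
		"data/manifest.js",
		"data/README.txt",
		"assets/js/app.js",
		"data/TWEETS.JS",
	} {
		fmt.Printf("%-20s %v\n", name, isDataJS(name))
	}
}
```

Only the first and last names pass: the second has the wrong extension and the third sits outside data/.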

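Parse the JS! We want that data

We've found the structured data. Now we need to parse it. The good news is that there are existing packages for working with JavaScript in Go. We'll use goja.

If you're reading this section, are familiar with goja, and have already seen the output of the file, you may have spotted that we're about to hit an error.

Install goja: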

go get github.com/dop251/goja

Now we'll edit the main.go file to do the following:

Parse with goja: goja.New() creates a new JavaScript runtime, and vm.RunString(string(contents)) runs the file's JavaScript in that runtime.
Handle parse errors.

package main

import (
	"archive/zip"
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
	"strings"

	"github.com/dop251/goja"
)

func readFile(file *zip.File) {
	// Open the file inside the zip
	rc, err := file.Open()
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	// Read the contents of the file
	contents, err := io.ReadAll(rc)
	if err != nil {
		log.Fatal(err)
	}

	// Parse the JavaScript file using goja (RunString takes a string, not []byte)
	vm := goja.New()
	_, err = vm.RunString(string(contents))
	if err != nil {
		log.Fatalf("Error parsing JS file: %v", err)
	}

	fmt.Printf("Parsed JavaScript file: %s\n", file.Name)
}

func run(path string) {
	// Open the zip file
	r, err := zip.OpenReader(path)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	// Iterate through the files in the zip archive
	fmt.Println("JavaScript files in the zip archive:")
	for _, f := range r.File {
		// Check the directory prefix, then use filepath.Ext to check the extension
		if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
			readFile(f)
			return // Exit after the first .js file so we don't print a gazillion lines when testing
		}
	}
}

func main() {
	// Example usage
	if len(os.Args) < 2 {
		log.Fatal("Please provide the path to the zip file as an argument.")
	}

	path := os.Args[1]
	run(path)
}

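Surprise. "window is not defined" may be a familiar error. Basically, goja runs a plain ECMAScript runtime. window belongs to the browser context and, sadly, isn't available.

Actually parsing the JS

At this point I ran into a few problems, including not being able to return data, because it's a top-level JS file.

Long story short, we need to modify the contents of the files before loading them into the runtime.

Let's modify the main.go file: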

reConfig: a regex that matches any assignment of the form window.someVariable = { and replaces it with var data = {.
reArray: a regex that matches any assignment of the form window.someObject.someKey.someArray = [ (three dotted names after window, as the pattern requires) and replaces it with var data = [.
Extracting data: after running the script, we use vm.Get("data") to retrieve the value of the data variable from the JavaScript context.

package main

import (
	"archive/zip"
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
	"regexp"
	"strings"

	"github.com/dop251/goja"
)

func readFile(file *zip.File) {
	// Open the file inside the zip
	rc, err := file.Open()
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	// Read the contents of the file
	contents, err := io.ReadAll(rc)
	if err != nil {
		log.Fatal(err)
	}

	// Regular expressions to replace specific patterns
	reConfig := regexp.MustCompile(`window\.\w+\s*=\s*{`)
	reArray := regexp.MustCompile(`window\.\w+\.\w+\.\w+\s*=\s*\[`)

	// Replace patterns in the content
	processedContents := reConfig.ReplaceAllString(string(contents), "var data = {")
	processedContents = reArray.ReplaceAllString(processedContents, "var data = [")

	// Parse the JavaScript file using goja
	vm := goja.New()
	_, err = vm.RunString(processedContents)
	if err != nil {
		log.Fatalf("Error parsing JS file: %v", err)
	}

	// Retrieve the value of the 'data' variable from the JavaScript context
	value := vm.Get("data")
	if value == nil {
		log.Fatalf("No data variable found in the JS file")
	}

	// Output the parsed data
	fmt.Printf("Processed JavaScript file: %s\n", file.Name)
	fmt.Printf("Data extracted: %v\n", value.Export())
}

func run(path string) {
	// Open the zip file
	r, err := zip.OpenReader(path)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	// Iterate through the files in the zip archive
	for _, f := range r.File {
		// Check if the file is in the /data directory and has a .js extension
		if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
			readFile(f)
			return // Exit after the first .js file so we don't print a gazillion lines when testing
		}
	}
}

func main() {
	// Example usage
	if len(os.Args) < 2 {
		log.Fatal("Please provide the path to the zip file as an argument.")
	}

	path := os.Args[1]
	run(path)
}
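
To see what the two regular expressions do on their own, without goja or a zip file in the way, here is a small standalone sketch. The patterns are copied from the script; toDataVar is a name invented for this example, and the second input uses placeholder names (a, b, c) rather than anything taken from a real archive.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Same patterns as in the script above.
	reConfig = regexp.MustCompile(`window\.\w+\s*=\s*{`)
	reArray  = regexp.MustCompile(`window\.\w+\.\w+\.\w+\s*=\s*\[`)
)

// toDataVar rewrites the browser-style assignment into "var data = ...".
// The function name is made up for this sketch.
func toDataVar(src string) string {
	src = reConfig.ReplaceAllString(src, "var data = {")
	return reArray.ReplaceAllString(src, "var data = [")
}

func main() {
	fmt.Println(toDataVar(`window.__THAR_CONFIG = { "userInfo": {} }`))
	// prints: var data = { "userInfo": {} }

	fmt.Println(toDataVar(`window.a.b.c = [ 1, 2, 3 ]`))
	// prints: var data = [ 1, 2, 3 ]
}
```

Note that neither pattern is anchored to the start of the file, so any matching text further down would be rewritten too. That is worth keeping in mind before pointing this at a large archive.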

Hurrah. Assuming I didn't muck up the copy-paste into this post, you should now see a rather ugly print of the data as native Go values (a map, for the sample manifest).

JSON would be nice

Edit the main.go file to marshal the output to JSON.

Use value.Export() to convert the JavaScript value into native Go types.
Use json.MarshalIndent() for pretty-printed JSON (use json.Marshal if you want to minify the output).

package main

import (
	"archive/zip"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
	"regexp"
	"strings"

	"github.com/dop251/goja"
)

func readFile(file *zip.File) {
	// Open the file inside the zip
	rc, err := file.Open()
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	// Read the contents of the file
	contents, err := io.ReadAll(rc)
	if err != nil {
		log.Fatal(err)
	}

	// Regular expressions to replace specific patterns
	reConfig := regexp.MustCompile(`window\.\w+\s*=\s*{`)
	reArray := regexp.MustCompile(`window\.\w+\.\w+\.\w+\s*=\s*\[`)

	// Replace patterns in the content
	processedContents := reConfig.ReplaceAllString(string(contents), "var data = {")
	processedContents = reArray.ReplaceAllString(processedContents, "var data = [")

	// Parse the JavaScript file using goja
	vm := goja.New()
	_, err = vm.RunString(processedContents)
	if err != nil {
		log.Fatalf("Error parsing JS file: %v", err)
	}

	// Retrieve the value of the 'data' variable from the JavaScript context
	value := vm.Get("data")
	if value == nil {
		log.Fatalf("No data variable found in the JS file")
	}

	// Convert the data to a Go-native type
	data := value.Export()

	// Marshal the Go-native type to JSON
	jsonData, err := json.MarshalIndent(data, "", " ")
	if err != nil {
		log.Fatalf("Error marshalling data to JSON: %v", err)
	}

	// Output the JSON data
	fmt.Println(string(jsonData))
}

func run(zipFilePath string) {
	// Open the zip file
	r, err := zip.OpenReader(zipFilePath)
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	// Iterate through the files in the zip archive
	for _, f := range r.File {
		// Check if the file is in the /data directory and has a .js extension
		if strings.HasPrefix(f.Name, "data/") && strings.ToLower(filepath.Ext(f.Name)) == ".js" {
			readFile(f)
			return // Exit after processing the first .js file
		}
	}
}

func main() {
	// Example usage
	if len(os.Args) < 2 {
		log.Fatal("Please provide the path to the zip file as an argument.")
	}

	zipFilePath := os.Args[1]
	run(zipFilePath)
}

That's it!

go run main.go twitter.zip

{
 "userInfo": {
  "accountId": "1234567890",
  "displayName": "Luke ✨",
  "userName": "lukeocodes"
 }
}
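
Once the data is JSON, one possible next step is to decode it into typed Go structs instead of passing maps around. The sketch below is an assumption about how you might do that, not something from the original script: the userInfo and archiveConfig types and the decodeConfig function are invented here, and only the three JSON keys shown in the output above are modelled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// userInfo and archiveConfig are hypothetical types for this sketch; the
// JSON keys are the ones in the sample output above.
type userInfo struct {
	AccountID   string `json:"accountId"`
	UserName    string `json:"userName"`
	DisplayName string `json:"displayName"`
}

type archiveConfig struct {
	UserInfo userInfo `json:"userInfo"`
}

// decodeConfig parses the JSON text printed by the script into a struct.
func decodeConfig(jsonText string) (archiveConfig, error) {
	var cfg archiveConfig
	err := json.Unmarshal([]byte(jsonText), &cfg)
	return cfg, err
}

func main() {
	const sample = `{"userInfo":{"accountId":"1234567890","displayName":"Luke ✨","userName":"lukeocodes"}}`

	cfg, err := decodeConfig(sample)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(cfg.UserInfo.UserName, cfg.UserInfo.AccountID)
	// prints: lukeocodes 1234567890
}
```

Keys that the structs don't mention are silently ignored by json.Unmarshal, so the types can start small and grow as you decide what to keep.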

Open source

I'll be open sourcing a lot of this work so that others who want to parse the data from the archive can store it however they like.
