トリッキーな Golang 面接の質問 - パートデータレース-Golang-php.cn

Tricky Golang interview questions - Part Data Race

ここで、コードレビューの面接の質問をもう 1 つ紹介します。この質問は前の質問よりも高度で、より上級の聴衆を対象としています。この問題には、スライスと並列プロセス間でのデータ共有に関する知識が必要です。

スライスとその構築方法に詳しくない場合は、スライスヘッダーに関する私の以前の記事を参照してください。

データレースとは何ですか?

データ競合は、2 つ以上のスレッド (Go の場合はゴルーチン) が共有メモリに同時にアクセスし、それらのアクセスのうち少なくとも 1 つが書き込み操作である場合に発生します。アクセスを管理するための適切な同期メカニズム (ロックやチャネルなど) が設置されていない場合、データの破損、一貫性のない状態、クラッシュなどの予測できない動作が発生する可能性があります。

本質的に、データ競合は次の場合に発生します。

2 つ以上のスレッド (またはゴルーチン) が同じメモリ位置に同時にアクセスします。
少なくとも 1 つのスレッド (またはゴルーチン) がそのメモリに書き込み中です。
そのメモリへのアクセスを制御するための 同期はありません。

このため、スレッドまたはゴルーチンが共有メモリにアクセスまたは変更する順序は予測不可能であり、実行ごとに異なる可能性のある非決定的な動作につながります。

     +----------------------+      +---------------------+
     | Thread A: Write      |      | Thread B: Read      |
     +----------------------+      +---------------------+
     | 1. Reads x           |      | 1. Reads x          |
     | 2. Adds 1 to x       |      |                     |
     | 3. Writes new value  |      |                     |
     +----------------------+      +---------------------+

                    Shared variable x
                    (Concurrent access without synchronization)

ログイン後にコピー

ここでは、スレッド A が x を変更 (書き込み) しており、同時にスレッド B が x を読み取っています。両方のスレッドが同時に実行されており、同期がない場合、スレッド A が更新を完了する前にスレッド B が x を読み取る可能性があります。その結果、データが不正確または矛盾する可能性があります。

質問: チームメイトの 1 人がコードレビューのために次のコードを提出しました。コードを注意深く確認し、潜在的な問題を特定してください。
確認する必要があるコードは次のとおりです:

package main  

import (  
    "bufio"  
    "bytes"
    "io"
    "math/rand"
    "time"
)  

func genData() []byte {  
    r := rand.New(rand.NewSource(time.Now().Unix()))  
    buffer := make([]byte, 512)  
    if _, err := r.Read(buffer); err != nil {  
       return nil  
    }  
    return buffer  
}  

func publish(input []byte, output chan<- []byte) {  
    reader := bytes.NewReader(input)  

    bufferSize := 8  
    buffer := make([]byte, bufferSize)  

    for {  
       n, err := reader.Read(buffer)  
       if err != nil || err == io.EOF {  
          return  
       }  
       if n == 0 {  
          break  
       }  
       output <- buffer[:n]  
    }  
}  

func consume(input []byte) {  
    scanner := bufio.NewScanner(bytes.NewReader(input))  
    for scanner.Scan() {  
       b := scanner.Bytes()  
       _ = b  
       // does the magic  
    }  
}  

func main() {  
    data := genData()  
    workersCount := 4  
    chunkChannel := make(chan []byte, workersCount)  

    for i := 0; i < workersCount; i++ {  
       go func() {  
          for chunk := range chunkChannel {  
             consume(chunk)  
          }  
       }()  
    }  

    publish(data, chunkChannel)  
    close(chunkChannel)  
}

ログイン後にコピー

ここには何があるでしょうか？

publish() 関数は、入力データをチャンクごとに読み取り、各チャンクを出力チャネルに送信する役割を果たします。まず bytes.NewReader(input) を使用して入力データからリーダーを作成し、データを順番に読み取ることができます。サイズ 8 のバッファーは、入力から読み取られるデータの各チャンクを保持するために作成されます。各反復中に、reader.Read(buffer) は入力から最大 8 バイトを読み取り、関数は最大 8 バイトを含むこのバッファーのスライス (buffer[:n]) を出力チャネルに送信します。ループは、reader.Read(buffer) でエラーが発生するか、入力データの最後に到達するまで続きます。

consumer() 関数は、チャネルから受信したデータチャンクを処理します。これらのチャンクは bufio.Scanner を使用して処理されます。bufio.Scanner はデータの各チャンクをスキャンし、構成に応じてデータを行またはトークンに分割する可能性があります。変数 b := scanner.Bytes() は、スキャンされている現在のトークンを取得します。この関数は基本的な入力処理を表します。

main() は、workersCount に等しい容量を持つバッファリングされたチャネル chunkChannel を作成します。この場合は 4 に設定されています。次に、関数は 4 つのワーカーゴルーチンを起動し、それぞれが chunkChannel からデータを同時に読み取ります。ワーカーはデータのチャンクを受信するたびに、consum() 関数を呼び出してそのチャンクを処理します。 publish() 関数は、生成されたデータを読み取り、最大 8 バイトのチャンクに分割して、チャネルに送信します。

プログラムはゴルーチンを使用して複数のコンシューマーを作成し、同時データ処理を可能にします。各コンシューマは個別の goroutine で実行され、データのチャンクを個別に処理します。

このコードを実行すると、不審な通知が発生します:

[Running] go run "main.go"

[Done] exited with code=0 in 0.94 seconds

ログイン後にコピー

しかし、問題があります。 データ競合リスクが存在します。このコードでは、publish() 関数が各チャンクに同じバッファースライスを再利用するため、データ競合が発生する可能性があります。コンシューマはこのバッファから同時に読み取りを行っており、スライスは基礎となるメモリを共有しているため、複数のコンシューマが同じメモリを読み取り、データ競合を引き起こす可能性があります。競合検出を使用してみましょう。 Go には、データ競合を検出するための組み込みツール、レースディテクタ が用意されています。 -race フラグを指定してプログラムを実行することで有効にできます:

go run -race main.go

ログイン後にコピー

実行コマンドに -race フラグを追加すると、次の出力が表示されます:

[Running] go run -race "main.go"

==================
WARNING: DATA RACE
Read at 0x00c00011e018 by goroutine 6:
  runtime.slicecopy()
      /GOROOT/go1.22.0/src/runtime/slice.go:325 +0x0
  bytes.(*Reader).Read()
      /GOROOT/go1.22.0/src/bytes/reader.go:44 +0xcc
  bufio.(*Scanner).Scan()
      /GOROOT/go1.22.0/src/bufio/scan.go:219 +0xef4
  main.consume()
      /GOPATH/example/main.go:40 +0x140
  main.main.func1()
      /GOPATH/example/main.go:55 +0x48

Previous write at 0x00c00011e018 by main goroutine:
  runtime.slicecopy()
      /GOROOT/go1.22.0/src/runtime/slice.go:325 +0x0
  bytes.(*Reader).Read()
      /GOROOT/go1.22.0/src/bytes/reader.go:44 +0x168
  main.publish()
      /GOPATH/example/main.go:27 +0xe4
  main.main()
      /GOPATH/example/main.go:60 +0xdc

Goroutine 6 (running) created at:
  main.main()
      /GOPATH/example/main.go:53 +0x50
==================
Found 1 data race(s)
exit status 66

[Done] exited with code=0 in 0.94 seconds

ログイン後にコピー

The warning you’re seeing is a classic data race detected by Go’s race detector. The warning message indicates that two goroutines are accessing the same memory location (0x00c00011e018) concurrently. One goroutine is reading from this memory, while another goroutine is writing to it at the same time, without proper synchronization.

The first part of the warning tells us that Goroutine 6 (which is one of the worker goroutines in your program) is reading from the memory address 0x00c00011e018 during a call to bufio.Scanner.Scan() inside the consume() function.

Read at 0x00c00011e018 by goroutine 6:
  runtime.slicecopy()
  /GOROOT/go1.22.0/src/runtime/slice.go:325 +0x0
  bytes.(*Reader).Read()
  /GOROOT/go1.22.0/src/bytes/reader.go:44 +0xcc
  bufio.(*Scanner).Scan()
  /GOROOT/go1.22.0/src/bufio/scan.go:219 +0xef4
  main.consume()
  /GOPATH/example/main.go:40 +0x140
  main.main.func1()
  /GOPATH/example/main.go:55 +0x48

ログイン後にコピー

The second part of the warning shows that the main goroutine previously wrote to the same memory location (0x00c00011e018) during a call to bytes.Reader.Read() inside the publish() function.

Previous write at 0x00c00011e018 by main goroutine:
  runtime.slicecopy()
  /GOROOT/go1.22.0/src/runtime/slice.go:325 +0x0
  bytes.(*Reader).Read()
  /GOROOT/go1.22.0/src/bytes/reader.go:44 +0x168
  main.publish()
  /GOPATH/example/main.go:27 +0xe4
  main.main()
  /GOPATH/example/main.go:60 +0xdc

ログイン後にコピー

The final part of the warning explains that Goroutine 6 was created in the main function.

Goroutine 6 (running) created at:
  main.main()
  /GOPATH/example/main.go:53 +0x50

ログイン後にコピー

In this case, while one goroutine (Goroutine 6) is reading from the buffer in consume(), the publish() function in the main goroutine is simultaneously writing to the same buffer, leading to the data race.

+-------------------+               +--------------------+
|     Publisher     |               |      Consumer      |
+-------------------+               +--------------------+
        |                                   |
        v                                   |
1. Read data into buffer                    |
        |                                   |
        v                                   |
2. Send slice of buffer to chunkChannel     |
        |                                   |
        v                                   |
 +----------------+                         |
 |  chunkChannel  |                         |
 +----------------+                         |
        |                                   |
        v                                   |
3. Consume reads from slice                 |
                                            v
                                    4. Concurrent access
                                    (Data Race occurs)

ログイン後にコピー

Why the Data Race Occurs

The data race in this code arises because of how Go slices work and how memory is shared between goroutines when a slice is reused. To fully understand this, let’s break it down into two parts: the behavior of the buffer slice and the mechanics of how the race occurs. When you pass a slice like buffer[:n] to a function or channel, what you are really passing is the slice header which contains a reference to the slice’s underlying array. Any modifications to the slice or the underlying array will affect all other references to that slice.

buffer = [ a, b, c, d, e, f, g, h ]  <- Underlying array
           ↑
          Slice: buffer[:n]

ログイン後にコピー

func publish(input []byte, output chan<- []byte) {  
    reader := bytes.NewReader(input)  

    bufferSize := 8  
    buffer := make([]byte, bufferSize)  

    for {  
       // ....
       output <- buffer[:n] // <-- passing is a reference to the underlying array
    }  
}

ログイン後にコピー

If you send buffer[:n] to a channel, both the publish() function and any consumer goroutines will be accessing the same memory. During each iteration, the reader.Read(buffer) function reads up to 8 bytes from the input data into this buffer slice. After reading, the publisher sends buffer[:n] to the output channel, where n is the number of bytes read in the current iteration.

The problem here is that buffer is reused across iterations. Every time reader.Read() is called, it overwrites the data stored in buffer.

Iteration 1: publish() function reads the first 8 bytes into buffer and sends buffer[:n] (say, [a, b, c, d, e, f, g, h]) to the channel.
Iteration 2: The publish() function overwrites the buffer with the next 8 bytes, let’s say [i, j, k, l, m, n, o, p], and sends buffer[:n] again.

At this point, if one of the worker goroutines is still processing the first chunk, it is now reading stale or corrupted data because the buffer has been overwritten by the second chunk. Reusing a slice neans sharing the same memory.

How to fix the Data Race?

To avoid the race condition, we must ensure that each chunk of data sent to the channel has its own independent memory. This can be achieved by creating a new slice for each chunk and copying the data from the buffer to this new slice. The key fix is to copy the contents of the buffer into a new slice before sending it to the chunkChannel:

chunk := make([]byte, n)    // Step 1: Create a new slice with its own memory
copy(chunk, buffer[:n])     // Step 2: Copy data from buffer to the new slice
output <- chunk             // Step 3: Send the new chunk to the channel

ログイン後にコピー

Why this fix works? By creating a new slice (chunk) for each iteration, you ensure that each chunk has its own memory. This prevents the consumers from reading from the buffer that the publisher is still modifying. copy() function copies the contents of the buffer into the newly allocated slice (chunk). This decouples the memory used by each chunk from the buffer. Now, when the publisher reads new data into the buffer, it doesn’t affect the chunks that have already been sent to the channel.

+-------------------------+           +------------------------+
|  Publisher (New Memory) |           | Consumers (Read Copy)  |
|  [ a, b, c ] --> chunk1 |           |  Reading: chunk1       |
|  [ d, e, f ] --> chunk2 |           |  Reading: chunk2       |
+-------------------------+           +------------------------+
         ↑                                    ↑
        (1)                                  (2)
   Publisher Creates New Chunk          Consumers Read Safely

ログイン後にコピー

This solution works is that it breaks the connection between the publisher and the consumers by eliminating shared memory. Each consumer now works on its own copy of the data, which the publisher does not modify. Here’s how the modified publish() function looks:

func publish(input []byte, output chan<- []byte) {
    reader := bytes.NewReader(input)
    bufferSize := 8
    buffer := make([]byte, bufferSize)

    for {
        n, err := reader.Read(buffer)
        if err != nil || err == io.EOF {
            return
        }
        if n == 0 {
            break
        }

        // Create a new slice for each chunk and copy the data from the buffer
        chunk := make([]byte, n)
        copy(chunk, buffer[:n])

        // Send the newly created chunk to the channel
        output <- chunk
    }
}

ログイン後にコピー

Summary

Slices Are Reference Types:
As mentioned earlier, Go slices are reference types, meaning they point to an underlying array. When you pass a slice to a channel or a function, you’re passing a reference to that array, not the data itself. This is why reusing a slice leads to a data race: multiple goroutines end up referencing and modifying the same memory.

メモリ割り当て:
make([]byte, n) で新しいスライスを作成すると、Go はそのスライスに別のメモリブロックを割り当てます。これは、新しいスライス (チャンク) がバッファーから独立した独自のバッキング配列を持つことを意味します。データをbuffer[:n]からチャンクにコピーすることで、各チャンクが独自のプライベートメモリ空間を持つようになります。

メモリのデカップリング:
各チャンクのメモリをバッファから切り離すことにより、パブリッシャは、すでにチャネルに送信されたチャンクに影響を与えることなく、新しいデータをバッファに読み続けることができます。各チャンクには独自のデータの独立したコピーが存在するため、コンシューマーはパブリッシャーからの干渉を受けることなくチャンクを処理できます。

データ競合の防止:
データ競合の主な原因は、共有バッファへの同時アクセスでした。新しいスライスを作成してデータをコピーすることにより、共有メモリが排除され、各ゴルーチンは独自のデータを操作します。これにより、同じメモリに対する競合がなくなるため、競合状態が発生する可能性がなくなります。

結論

修正の中核はシンプルですが強力です。データの各チャンクが独自のメモリを持つようにすることで、データ競合の原因となっていた共有リソース (バッファ) を排除します。これは、データをチャネルに送信する前に、バッファから新しいスライスにデータをコピーすることによって実現されます。このアプローチでは、各コンシューマはパブリッシャーのアクションとは独立してデータの独自のコピーを操作し、競合状態のない安全な同時処理を保証します。共有メモリを切り離すこの方法は、同時プログラミングにおける基本的な戦略です。これにより、競合状態によって引き起こされる予測不可能な動作が防止され、複数の goroutine が同時にデータにアクセスしている場合でも、Go プログラムが安全で、予測可能で、正確な状態が保たれます。

とても簡単です!

以上がトリッキーな Golang 面接の質問 - パートデータレースの詳細内容です。詳細については、PHP 中国語 Web サイトの他の関連記事を参照してください。