To double quote or not, that is the question! - PHP Tutorial - php.cn

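Recently I once again heard PHP folks talking about single quotes vs. double quotes, and that using single quotes is just a micro-optimisation, but if you get used to always using single quotes you will save a bunch of CPU cycles!

"Everything has already been said, but not yet by everyone." – Karl Valentin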

In that spirit, I am writing an article on the same topic Nikita Popov already wrote about 12 years ago (if you have read his article, you can stop reading here).

What's the fuss about?

PHP does string interpolation: it looks for variables used inside a string and replaces them with the values of those variables.
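A minimal sketch of interpolation; the variable $juice and the sentence are illustrative values chosen for this example:

```php
<?php
// $juice and the sentence are illustrative values.
$juice = "apple";

// Inside double quotes, PHP replaces $juice with the variable's value.
$sentence = "I like $juice juice.";

echo $sentence, PHP_EOL; // I like apple juice.
```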

This feature is limited to double-quoted strings and heredoc. Using single quotes (or nowdoc) yields a different result.
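The difference is easiest to see by putting the same text into all four string syntaxes. This sketch uses an illustrative $juice variable and an arbitrary EOT delimiter:

```php
<?php
// Illustrative only: the same text in the four string syntaxes.
$juice = "apple";

// Double quotes: $juice is interpolated.
$double = "Some $juice juice.";

// Single quotes: $juice is taken literally.
$single = 'Some $juice juice.';

// Heredoc behaves like double quotes.
$heredoc = <<<EOT
Some $juice juice.
EOT;

// Nowdoc behaves like single quotes.
$nowdoc = <<<'EOT'
Some $juice juice.
EOT;

var_dump($double, $single, $heredoc, $nowdoc);
```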

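Look at that: PHP does not search for variables in single-quoted strings. So we could just start using single quotes everywhere! And so people started suggesting changes that replace double quotes with single quotes, arguing that since PHP does not look for variables in single-quoted strings (and the example contained none anyway), the code gets faster every time it runs and saves lots of CPU cycles. Everyone was happy. Case closed.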

Case closed?

Of course there is a difference between using single and double quotes, but to understand what is going on we need to dig a little deeper.

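Even though PHP is an interpreted language, it uses a compile step in which several parts work together to produce something the virtual machine can actually execute: opcodes. So how do we get from PHP source code to opcodes?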

The lexer

The lexer scans the source code file and breaks it down into tokens. A simple example of what this means can be found in the documentation of the token_get_all() function.
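As a rough sketch of what the lexer produces, the following script feeds a hypothetical one-line snippet to token_get_all() and prints each token. Named tokens come back as [id, text, line] arrays (turned into readable names with token_name()), while single characters such as ; come back as plain strings:

```php
<?php
// Tokenize a small, hypothetical snippet and print one token per line.
$code = '<?php echo "";';

foreach (token_get_all($code) as $token) {
    if (is_array($token)) {
        // Named tokens are arrays: [id, text, line].
        echo token_name($token[0]), ' (', $token[1], ')', PHP_EOL;
    } else {
        // Single-character tokens such as ';' are plain strings.
        echo $token, PHP_EOL;
    }
}
```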

You can see this in action and play with it in this 3v4l.org snippet.

The parser

The parser takes these tokens and generates an abstract syntax tree (AST) from them. For the example above, this AST can also be represented as JSON.

If you want to play with this as well and see what the AST of other code looks like, check out https://phpast.com/ by Ryan Chandler and https://php-ast-viewer.com/. Both show the AST of any given PHP code.

The compiler

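The compiler takes the AST and generates opcodes. Opcodes are what the virtual machine executes, and they are also what gets stored in OPcache if you have it installed and enabled (which I highly recommend).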

To look at the opcodes you have several options (there may be more, but these are the three I know):

- use the Vulcan Logic Dumper (VLD) extension, which is also baked into 3v4l.org
- use phpdbg -p script.php to dump the opcodes
- or use the opcache.opt_debug_level INI setting to make OPcache print the opcodes:
  - a value of 0x10000 prints the opcodes before optimisation
  - a value of 0x20000 prints the opcodes after optimisation
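The OPcache option only works if OPcache is actually loaded. A small sketch that reports the relevant settings from inside a script; ini_get() returns false for a directive that does not exist, for example when the extension is not loaded. To get a dump you would set opcache.opt_debug_level in php.ini or on the command line (for example with php -d), and on the CLI OPcache typically also needs opcache.enable_cli=1:

```php
<?php
// Report whether OPcache is loaded and which relevant settings are active.
// ini_get() returns false for directives that do not exist, e.g. when the
// OPcache extension is not loaded at all.
$loaded = extension_loaded('Zend OPcache');

$settings = [
    'opcache.enable'          => ini_get('opcache.enable'),
    'opcache.enable_cli'      => ini_get('opcache.enable_cli'),
    'opcache.opt_debug_level' => ini_get('opcache.opt_debug_level'),
];

echo 'OPcache loaded: ', $loaded ? 'yes' : 'no', PHP_EOL;
foreach ($settings as $name => $value) {
    // var_export() shows false explicitly instead of printing nothing.
    echo $name, ' = ', var_export($value, true), PHP_EOL;
}
```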


Hypothesis

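Coming back to the initial idea of saving CPU cycles by using single quotes instead of double quotes: I think we can all agree that this would only be true if PHP evaluated these strings at runtime, for every single request.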

What happens at runtime?

So let's look at which opcodes PHP generates for two versions of the same echo statement, one using a double-quoted string and one using a single-quoted string.

Wait a minute, something weird happened: both versions produce exactly the same ECHO opcode! Where did my micro-optimisation go?

Maybe the implementation of the ECHO opcode handler parses the given string, even though there is no marker or anything else telling it to do so ... hmm?

Let's try a different approach and look at what the lexer does in these two cases.

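For both the double-quoted and the single-quoted version, the lexer emits a T_CONSTANT_ENCAPSED_STRING token whose text still includes the original quotes.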
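The lexer side of this can be reproduced with token_get_all(). This sketch tokenizes two hypothetical one-liners that echo the same literal in both quote styles, keeps only the string-literal tokens, and finally checks that both literals evaluate to the same value:

```php
<?php
// Tokenize two hypothetical one-liners that differ only in quote style
// and collect their string-literal tokens.
function stringTokens(string $code): array
{
    $found = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token) && $token[0] === T_CONSTANT_ENCAPSED_STRING) {
            // The token text keeps the original quotes.
            $found[] = token_name($token[0]) . ' (' . $token[1] . ')';
        }
    }
    return $found;
}

$double = stringTokens('<?php echo "apple";');
$single = stringTokens("<?php echo 'apple';");

echo implode(PHP_EOL, array_merge($double, $single)), PHP_EOL;

// Both literals still evaluate to the very same string.
var_dump("apple" === 'apple'); // bool(true)
```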

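The tokens still distinguish between double and single quotes, but checking the AST gives an identical result for both cases. The only difference is the rawValue attribute of the Scalar_String node, which still contains the single or double quotes, while the value is identical (shown in double quotes) in both cases.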

New hypothesis

Could it be that string interpolation actually happens at compile time?

Let's check with a slightly more "sophisticated" example:

<?php
$juice = "apple";
echo "juice: $juice";

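These are the tokens for this file, leaving out whitespace and single-character tokens such as =, " and ;:

T_OPEN_TAG (<?php)
T_VARIABLE ($juice)
T_CONSTANT_ENCAPSED_STRING ("apple")
T_ECHO (echo)
T_ENCAPSED_AND_WHITESPACE (juice: )
T_VARIABLE ($juice)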

Look at the last two tokens! String interpolation is handled in the lexer and as such is a compile time thing and has nothing to do with runtime.
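The token list for the interpolated example can be reproduced with token_get_all(). This sketch embeds the snippet in a nowdoc, so the outer script does not interpolate it, and skips whitespace as well as single-character tokens such as the double quotes and the semicolon:

```php
<?php
// The snippet is wrapped in a nowdoc so this script does not interpolate it.
$code = <<<'CODE'
<?php
$juice = "apple";
echo "juice: $juice";
CODE;

$names = [];
foreach (token_get_all($code) as $token) {
    // Skip single-character tokens (plain strings) and whitespace.
    if (!is_array($token) || $token[0] === T_WHITESPACE) {
        continue;
    }
    $names[] = token_name($token[0]);
    // addcslashes() makes the newline in the open tag visible as \n.
    echo token_name($token[0]), ' (', addcslashes($token[1], "\n"), ')', PHP_EOL;
}
```

The two tokens at the end, T_ENCAPSED_AND_WHITESPACE and T_VARIABLE, show that the lexer has already split the double-quoted string into its constant part and the variable.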

To double quote or not, that is the question!

For completeness, let's have a look at the opcodes generated by this (after optimisation, using 0x20000):

0000 ASSIGN CV0($juice) string("apple")
0001 T2 = FAST_CONCAT string("juice: ") CV0($juice)
0002 ECHO T2
0003 RETURN int(1)


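These are different opcodes than the ones we got for our simple example: instead of echoing a constant string, PHP now joins "juice: " and the value of $juice with FAST_CONCAT and echoes the result.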
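At the PHP level, the FAST_CONCAT in the dump simply joins the constant "juice: " with the value of $juice, so the interpolated string equals an explicit concatenation. A quick sanity check, using the same illustrative values:

```php
<?php
// Illustrative values matching the example above.
$juice = "apple";

$interpolated = "juice: $juice";
$concatenated = "juice: " . $juice;

// Both expressions produce the same string value.
var_dump($interpolated, $interpolated === $concatenated);
```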

Get to the point: should I concat or interpolate?

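Let's have a look at these three different versions:

<?php
$juice = "apple";
echo "juice: $juice $juice";
echo "juice: ", $juice, " ", $juice;
echo "juice: " . $juice . " " . $juice;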

- the first version uses string interpolation
- the second uses comma separation (which AFAIK only works with echo, not with assigning variables or anything else)
- the third uses string concatenation
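All three versions print exactly the same text; only the generated opcodes differ. A sketch that checks this by capturing each version's output with output buffering (ob_start() and ob_get_clean()):

```php
<?php
// Capture the output of each of the three versions and compare them.
$juice = "apple";

ob_start();
echo "juice: $juice $juice";              // 1. string interpolation
$interpolation = ob_get_clean();

ob_start();
echo "juice: ", $juice, " ", $juice;      // 2. comma-separated echo arguments
$commas = ob_get_clean();

ob_start();
echo "juice: " . $juice . " " . $juice;   // 3. string concatenation
$concatenation = ob_get_clean();

var_dump($interpolation === $commas, $commas === $concatenation);
```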

The first opcode assigns the string "apple" to the variable $juice:

0000 ASSIGN CV0($juice) string("apple")


The first version (string interpolation) uses a rope as the underlying data structure, which is optimised to make as few string copies as possible.

0001 T2 = ROPE_INIT 4 string("juice: ")
0002 T2 = ROPE_ADD 1 T2 CV0($juice)
0003 T2 = ROPE_ADD 2 T2 string(" ")
0004 T1 = ROPE_END 3 T2 CV0($juice)
0005 ECHO T1
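As a userland analogy only (this is not how the engine implements the ROPE_* opcodes), the rope idea resembles collecting all pieces first and joining them in a single step at the end, instead of building a new intermediate string at every concatenation:

```php
<?php
// Userland analogy for the rope idea; not the engine's actual implementation.
$juice = "apple";

// Rope-like: collect the pieces, then join them in a single call.
$pieces = ["juice: ", $juice, " ", $juice];
$rope = implode('', $pieces);

// Concatenation chain: each '.' produces an intermediate string
// ("juice: apple", then "juice: apple ", then the final result).
$chain = "juice: " . $juice . " " . $juice;

var_dump($rope === $chain); // bool(true)
```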


The second version is the most memory efficient, as it does not create an intermediate string representation. Instead, it makes multiple calls to ECHO, which is a blocking call from an I/O perspective, so depending on your use case this might be a downside.

0006 ECHO string("juice: ")
0007 ECHO CV0($juice)
0008 ECHO string(" ")
0009 ECHO CV0($juice)


The third version uses CONCAT/FAST_CONCAT to create an intermediate string representation and as such might use more memory than the rope version.

0010 T1 = CONCAT string("juice: ") CV0($juice)
0011 T2 = FAST_CONCAT T1 string(" ")
0012 T1 = CONCAT T2 CV0($juice)
0013 ECHO T1


So ... what is the right thing to do here and why is it string interpolation?

String interpolation uses either a FAST_CONCAT in the case of echo "juice: $juice"; or highly optimised ROPE_* opcodes in the case of echo "juice: $juice $juice";. Most importantly, though, it communicates the intent clearly, and none of this has been a bottleneck in any of the PHP applications I have worked with so far, so none of it actually matters.

TLDR

String interpolation is a compile-time thing. Granted, without OPcache the lexer has to check for variables in double-quoted strings on every request, even if there aren't any, wasting CPU cycles. But honestly: the problem is not the double-quoted strings, it is not using OPcache!

However, there is one caveat: PHP up to 4 (and I believe even including 5.0 and maybe even 5.1, I don't know) did string interpolation at runtime, so using these versions ... hmm, I guess if anyone really still uses PHP 5, the same as above applies: The problem is not the double quoted strings, but the use of an outdated PHP version.