Just recently I heard, once again, that PHP folks still argue about single quotes vs. double quotes: using single quotes is supposedly just a micro-optimisation, but if you get used to using them all the time you'd save a bunch of CPU cycles!
"Everything has already been said, but not yet by everyone" – Karl Valentin
It is in this spirit that I am writing an article about the same topic Nikita Popov already covered 12 years ago (if you have read his article, you can stop reading here).
PHP performs string interpolation, in which it searches for the use of variables in a string and replaces them with the value of the variable used:
$juice = "apple"; echo "They drank some $juice juice."; // will output: They drank some apple juice.
This feature is limited to strings in double quotes and heredoc. Using single quotes (or nowdoc) will yield a different result:
$juice = "apple"; echo 'They drank some $juice juice.'; // will output: They drank some $juice juice.
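As a quick side-by-side, here is a minimal sketch of the four string syntaxes mentioned above. The variable names and the TXT delimiter are only chosen for illustration.

```php
<?php
$juice = "apple";

// Double quotes: variables are interpolated.
$double = "They drank some $juice juice.";

// Single quotes: the text is taken literally.
$single = 'They drank some $juice juice.';

// Heredoc behaves like a double-quoted string.
$heredoc = <<<TXT
They drank some $juice juice.
TXT;

// Nowdoc (note the quoted delimiter) behaves like a single-quoted string.
$nowdoc = <<<'TXT'
They drank some $juice juice.
TXT;

var_dump($double === $heredoc); // bool(true)
var_dump($single === $nowdoc);  // bool(true)
var_dump($double === $single);  // bool(false)
```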
Look at that: PHP does not search for variables in that single-quoted string. So we could just start using single quotes everywhere, and people started suggesting changes like this:
- $juice = "apple";
+ $juice = 'apple';
The reasoning is that it'll be faster and save a bunch of CPU cycles on every execution of that code, because PHP does not look for variables in single-quoted strings (there are no variables in the example anyway). Everyone is happy, case closed.
Obviously there is a difference in using single quotes vs. double quotes, but in order to understand what is going on we need to dig a bit deeper.
Even though PHP is an interpreted language, it uses a compile step in which several parts work together to produce something the virtual machine can actually execute: opcodes. So how do we get from PHP source code to opcodes?
The lexer scans the source code file and breaks it down into tokens. A simple example of what this means can be found in the token_get_all() function documentation. A PHP source file containing just <?php echo ""; is broken down into these tokens:
T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ("")
We can see this in action and play with it in this 3v4l.org snippet.
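To reproduce the token list locally, a small sketch along these lines should do. It assumes the tokenizer extension is available (it is enabled by default); dump_tokens() is a hypothetical helper name, not something PHP ships.

```php
<?php
// Hypothetical helper: turn PHP source into "T_NAME (text)" lines.
function dump_tokens(string $source): array
{
    $lines = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Named tokens come back as [id, text, line number].
            $lines[] = token_name($token[0]) . ' (' . $token[1] . ')';
        } else {
            // Single-character tokens such as ";" come back as plain strings.
            $lines[] = $token;
        }
    }
    return $lines;
}

// The source is single-quoted here so PHP passes it to the lexer untouched.
echo implode(PHP_EOL, dump_tokens('<?php echo "";')), PHP_EOL;
```

Note that token_get_all() also returns the trailing ";" as a plain one-character string, which listings like the one above tend to leave out.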
The parser takes these tokens and generates an abstract syntax tree from them. An AST representation of the above example looks like this when represented as JSON:
{
  "data": [
    {
      "nodeType": "Stmt_Echo",
      "attributes": {
        "startLine": 1, "startTokenPos": 1, "startFilePos": 6,
        "endLine": 1, "endTokenPos": 4, "endFilePos": 13
      },
      "exprs": [
        {
          "nodeType": "Scalar_String",
          "attributes": {
            "startLine": 1, "startTokenPos": 3, "startFilePos": 11,
            "endLine": 1, "endTokenPos": 3, "endFilePos": 12,
            "kind": 2,
            "rawValue": "\"\""
          },
          "value": ""
        }
      ]
    }
  ]
}
In case you want to play with this as well and see what the AST for other code looks like, I found https://phpast.com/ by Ryan Chandler and https://php-ast-viewer.com/, both of which show you the AST of a given piece of PHP code.
The compiler takes the AST and creates opcodes. The opcodes are what the virtual machine executes; they are also what gets stored in the OPcache if you have it set up and enabled (which I highly recommend).
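If you are not sure whether OPcache is active in a given environment, a rough check could look like the sketch below. opcache_looks_enabled() is a hypothetical helper, and it only reads the configuration; it does not prove that a particular script was actually served from the cache.

```php
<?php
// Hypothetical helper (not part of PHP): does OPcache look enabled here?
function opcache_looks_enabled(): bool
{
    // The extension registers itself under the name "Zend OPcache".
    if (!extension_loaded('Zend OPcache')) {
        return false;
    }

    // The CLI has its own switch, opcache.enable_cli, on top of opcache.enable.
    $settings = PHP_SAPI === 'cli'
        ? ['opcache.enable', 'opcache.enable_cli']
        : ['opcache.enable'];

    foreach ($settings as $setting) {
        // ini_get() returns false for unknown settings, which also counts as "off".
        if (!filter_var(ini_get($setting), FILTER_VALIDATE_BOOLEAN)) {
            return false;
        }
    }
    return true;
}

echo opcache_looks_enabled() ? "OPcache is enabled\n" : "OPcache is not enabled\n";
```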
To view the opcodes we have multiple options (maybe more, but I do know these three):
$ echo '<?php echo "";' > foo.php
$ php -dopcache.opt_debug_level=0x10000 foo.php
$_main:
...
0000 ECHO string("")
0001 RETURN int(1)

Hypothesis

Coming back to the initial idea of saving CPU cycles when using single quotes vs. double quotes, I think we all agree that this would only be true if PHP would evaluate these strings at runtime for every single request.

What happens at runtime?

So let's see which opcodes PHP creates for the two different versions.

Double quotes:

<?php echo "apple";
0000 ECHO string("apple")
0001 RETURN int(1)
vs. single quotes:
<?php echo 'apple';
0000 ECHO string("apple")
0001 RETURN int(1)
Hey wait, something weird happened. This looks identical! Where did my micro optimisation go?
Well maybe, just maybe the ECHO opcode handler's implementation parses the given string, although there is no marker or something else which tells it to do so ... hmm ?
Let's try a different approach and see what the lexer does for those two cases:
Double quotes:
T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ("")
vs. single quotes:
T_OPEN_TAG (<?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING ('')
The tokens are still distinguishing between double and single quotes, but checking the AST will give us an identical result for both cases - the only difference is the rawValue in the Scalar_String node attributes, that still has the single/double quotes, but the value uses double quotes in both cases.
Could it be that string interpolation is actually done at compile time?
Let's check with a slightly more "sophisticated" example:
<?php
$juice="apple";
echo "juice: $juice";
Tokens for this file are:
T_OPEN_TAG (<?php)
T_VARIABLE ($juice)
T_CONSTANT_ENCAPSED_STRING ("apple")
T_WHITESPACE ()
T_ECHO (echo)
T_WHITESPACE ( )
T_ENCAPSED_AND_WHITESPACE (juice: )
T_VARIABLE ($juice)
Look at the last two tokens! String interpolation is handled in the lexer and as such is a compile time thing and has nothing to do with runtime.
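To see the lexer treat the same text differently depending on the quotes, here is a small sketch that only compares token names. It assumes the tokenizer extension is available; token_names() is a hypothetical helper introduced just for this comparison.

```php
<?php
// Hypothetical helper: names of all named tokens in a piece of PHP source.
function token_names(string $source): array
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        // Skip single-character tokens like '"' and ';', which have no name.
        if (is_array($token)) {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

// Both sources are single-quoted here, so they reach the lexer unmodified.
$double = token_names('<?php echo "juice: $juice";');
$single = token_names('<?php echo \'juice: $juice\';');

var_dump(in_array('T_VARIABLE', $double, true)); // bool(true)
var_dump(in_array('T_VARIABLE', $single, true)); // bool(false)
```

The double-quoted version is split into T_ENCAPSED_AND_WHITESPACE and T_VARIABLE, while the single-quoted version stays a single T_CONSTANT_ENCAPSED_STRING.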
For completeness, let's have a look at the opcodes generated by this (after optimisation, using 0x20000):
0000 ASSIGN CV0($juice) string("apple")
0001 T2 = FAST_CONCAT string("juice: ") CV0($juice)
0002 ECHO T2
0003 RETURN int(1)
This is a different opcode than we had in our simple echo example.
Let's have a look at these three different versions:
<?php
$juice = "apple";
echo "juice: $juice $juice";
echo "juice: ", $juice, " ", $juice;
echo "juice: ".$juice." ".$juice;
The first opcode assigns the string "apple" to the variable $juice:
0000 ASSIGN CV0($juice) string("apple")
The first version (string interpolation) uses a rope as the underlying data structure, which is optimised to make as few string copies as possible.
0001 T2 = ROPE_INIT 4 string("juice: ")
0002 T2 = ROPE_ADD 1 T2 CV0($juice)
0003 T2 = ROPE_ADD 2 T2 string(" ")
0004 T1 = ROPE_END 3 T2 CV0($juice)
0005 ECHO T1
The second version is the most memory-efficient, as it does not create an intermediate string representation. Instead it makes multiple calls to ECHO, which is a blocking call from an I/O perspective, so depending on your use case this might be a downside.
0006 ECHO string("juice: ")
0007 ECHO CV0($juice)
0008 ECHO string(" ")
0009 ECHO CV0($juice)
The third version uses CONCAT/FAST_CONCAT to create an intermediate string representation and as such might use more memory than the rope version.
0010 T1 = CONCAT string("juice: ") CV0($juice)
0011 T2 = FAST_CONCAT T1 string(" ")
0012 T1 = CONCAT T2 CV0($juice)
0013 ECHO T1
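Whatever opcodes are generated, the three versions are interchangeable as far as the output goes. A minimal sketch to confirm that, using output buffering; capture() is a hypothetical helper and not part of the original example:

```php
<?php
$juice = "apple";

// Hypothetical helper: collect everything a callback echoes into a string.
function capture(callable $fn): string
{
    ob_start();
    $fn();
    return (string) ob_get_clean();
}

// Version 1: string interpolation.
$interpolated = capture(function () use ($juice) { echo "juice: $juice $juice"; });
// Version 2: echo with multiple arguments.
$multiEcho = capture(function () use ($juice) { echo "juice: ", $juice, " ", $juice; });
// Version 3: explicit concatenation.
$concatenated = capture(function () use ($juice) { echo "juice: " . $juice . " " . $juice; });

var_dump($interpolated === $multiEcho && $multiEcho === $concatenated); // bool(true)
```

So the choice between them is about readability and, at most, the memory and I/O trade-offs described above, not about the result.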
So ... what is the right thing to do here and why is it string interpolation?
String interpolation uses either a FAST_CONCAT in the case of echo "juice: $juice"; or highly optimised ROPE_* opcodes in the case of echo "juice: $juice $juice";. Most importantly, it communicates the intent clearly. None of this has been a bottleneck in any of the PHP applications I have worked with so far, so none of this actually matters.
String interpolation is a compile time thing. Granted, without OPcache the lexer will have to check for variables used in double quoted strings on every request, even if there aren't any, wasting CPU cycles, but honestly: The problem is not the double quoted strings, but not using OPcache!
However, there is one caveat: PHP up to 4 (and I believe even including 5.0 and maybe even 5.1, I don't know) did string interpolation at runtime, so using these versions ... hmm, I guess if anyone really still uses PHP 5, the same as above applies: The problem is not the double quoted strings, but the use of an outdated PHP version.
Update to the latest PHP version, enable OPcache and live happily ever after!