Secure your PHP application
Before you begin
In this tutorial, you will learn how to add security to your own PHP web applications. This tutorial assumes you have at least one year of experience writing PHP web applications, so basic knowledge of the PHP language (conventions or syntax) is not covered here. The goal is to give you an understanding of how to secure the web applications you build.
Objectives
This tutorial explains how to defend against the most common security threats: SQL injection, manipulation of GET and POST variables, buffer overflow attacks, cross-site scripting attacks, in-browser data manipulation, and remote form submission.
Prerequisites
This tutorial is written for PHP developers with at least one year of programming experience. You should understand PHP's syntax and conventions; these are not explained here. Developers with experience using other languages such as Ruby, Python, and Perl will also benefit from this tutorial, as many of the rules discussed here apply to other languages and environments as well.
A quick introduction to security
What is the most important part of a web application? The answer depends on who you ask. Business people need reliability and scalability. IT support teams need robust, maintainable code. End users want attractive user interfaces and high performance when carrying out tasks. However, if the answer is "security," everyone will agree that it is important for web applications.
Most discussions stop there, however. Although security is on the project checklist, it is often not addressed until just before the project is delivered. The number of web application projects that take this approach is staggering. Developers work for months and add security features only at the end, just before the web application is made available to the public.
The result is often a mess, or even a need for rework, because the code has already been inspected, unit tested, and integrated into a larger framework before security features are added to it. After security is added, major components may stop working. Integrating security late adds an extra burden or step to an otherwise smooth (but unsafe) process.
This tutorial provides a way to integrate security into your PHP web application. It discusses several general security topics and then dives into major security vulnerabilities and how to plug them. After completing this tutorial, you will have a better understanding of security.
Topics include:
SQL injection attacks
Manipulating GET strings
Buffer overflow attacks
Cross-site scripting (XSS) attacks
In-browser data manipulation
Remote form submission
Web Security 101
Before discussing the details of implementing security, it is best to look at web application security from a high-level perspective. This section introduces some basic tenets of security philosophy that you should keep in mind no matter what kind of web application you are creating. Some of these ideas come from Chris Shiflett (whose book on PHP security is an invaluable treasure trove), some from Simson Garfinkel (see Related topics), and some from years of accumulated knowledge.
Rule 1: Never trust external data or input
The first thing you must realize about web application security is that external data should not be trusted. External data includes any data that is not entered directly by the programmer into the PHP code. Any data from any other source (such as GET variables, form POST, databases, configuration files, session variables, or cookies) cannot be trusted until steps are taken to make it safe.
For example, the following data elements can be considered safe because they are set in PHP.
Listing 1. Safe and flawless code
[php]
<?php
$myUsername = 'tmyer';
$arrayUsers = array('tmyer', 'tom', 'tommy');
define("GREETING", 'hello there' . $myUsername);
?>
[/php]
However, the following data elements are flawed.
Listing 2. Unsafe, flawed code
[php]
<?php
$myUsername = $_POST['username']; //tainted!
$arrayUsers = array($myUsername, 'tom', 'tommy'); //tainted!
define("GREETING", 'hello there' . $myUsername); //tainted!
?>
[/php]
Why is the first variable, $myUsername, flawed? Because it comes directly from the form POST. Users can enter any string into this input field, including malicious commands to delete files or run previously uploaded files. You might ask, "Can't you avoid this danger by using a client-side (JavaScript) form validation script that only accepts the letters A-Z?" Yes, that is always a beneficial step, but as you'll see later, anyone can download any form to their machine, modify it, and resubmit whatever they want.
The solution is simple: sanitization code must be run on $_POST['username'].
If you don't do this, you risk tainting every other place where $myUsername is used (such as in an array or constant). A simple way to sanitize user input is to process it with a regular expression. In this example, only letters are accepted. It might also be a good idea to limit the string to a specific number of characters, or to require all letters to be lowercase.
Listing 3. Making user input safe
[php]
<?php
$myUsername = cleanInput($_POST['username']); //clean!
$arrayUsers = array($myUsername, 'tom', 'tommy'); //clean!
define("GREETING", 'hello there' . $myUsername); //clean!

function cleanInput($input){
    $clean = strtolower($input);                   //force lowercase
    $clean = preg_replace("/[^a-z]/", "", $clean); //keep letters only
    $clean = substr($clean, 0, 12);                //limit to 12 characters
    return $clean;
}
?>
[/php]
Rule 2: Disable PHP settings that make security difficult to implement
Now that you know you can't trust user input, you should also know that you shouldn't trust the way PHP is configured on the machine. For example, make sure register_globals is disabled (the setting was removed entirely in PHP 5.4). If register_globals is enabled, it's possible to do careless things like use a $variable to stand in for a GET or POST string with the same name. With this setting disabled, PHP forces you to reference the correct variables in the correct namespace. To use a variable from a form POST, you reference $_POST['variable']. This way you won't mistake this particular variable for a cookie, session, or GET variable.
The second setting to check is the error reporting level. During development, you want to get as many error reports as possible, but when you deliver the project, you want errors to be written to a log file rather than displayed on the screen. Why? Because malicious hackers can use error reporting information (such as SQL errors) to guess what the application is doing. This kind of reconnaissance can help hackers breach the application. To close this vulnerability, edit the php.ini file to provide a suitable destination for error_log entries and set display_errors to Off.
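The error-handling settings described above can also be applied at runtime, which may help when php.ini cannot be edited. The following is a minimal sketch, not a recommended production configuration; the log file name and location are placeholders chosen for this example.

```php
<?php
// Runtime equivalent of the php.ini error settings described above.
// The log path is a placeholder for this sketch; in practice, choose a
// location outside the web root that the web server user can write to.
$logFile = sys_get_temp_dir() . '/php_app_errors.log';

error_reporting(E_ALL);            // keep recording every error...
ini_set('display_errors', '0');    // ...but never show them to visitors
ini_set('log_errors', '1');        // write them to the log instead
ini_set('error_log', $logFile);    // destination for logged errors
```

Settings applied this way only take effect once the script is running, so errors raised before these lines execute (such as parse errors) still follow php.ini.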
Rule 3: If you can't understand it, you can't protect it
Some developers use strange syntax, or pack statements very tightly, resulting in short but ambiguous code. This approach may be efficient, but if you don't understand what the code is doing, you can't decide how to protect it. For example, which of the following two pieces of code do you prefer?
Listing 4. Make the code easy to protect
[php]
<?php
//obfuscated code
$input = (isset($_POST['username']) ? $_POST['username'] : '');

//unobfuscated code
$input = '';
if (isset($_POST['username'])){
    $input = $_POST['username'];
}else{
    $input = '';
}
?>
[/php]
In the second, clearer code snippet, it is easy to see that $input is tainted and needs to be cleaned before it can be processed safely.
Rule 4: "Defense in depth" is the new magic
This tutorial uses examples to show how to protect online forms while also taking the necessary measures in the PHP code that handles the form. Likewise, even if you use a PHP regular expression to ensure that a GET variable is entirely numeric, you can still take steps to ensure that SQL queries use escaped user input. Defense in depth is not just a good idea; it ensures that you don't get into serious trouble.
Now that the basic rules have been discussed, let's look at the first threat: SQL injection attacks.
Preventing SQL injection attacks
In a SQL injection attack, the user manipulates a form or GET query string to add information to a database query. For example, suppose you have a simple login database. Each record in this database has a username field and a password field. You build a login form so that users can log in.
Listing 5. Simple login form
[php]
<html>
<head>
<title>Login</title>
</head>
<body>
<form action="verify.php" method="post">
<p><label for="user">Username</label>
<input type="text" name="user" id="user"/></p>
<p><label for="pw">Password</label>
<input type="password" name="pw" id="pw"/></p>
<p><input type="submit" value="login"/></p>
</form>
</body>
</html>
[/php]
This form accepts user input for a username and password and submits it to a file called verify.php. In this file, PHP handles the data from the login form as follows:
Listing 6. Unsafe PHP form handling code
[php]
<?php
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];
$sql = "select count(*) as ctr from users where username='".$username."' and password='".$pw."' limit 1";
$result = mysql_query($sql);
while ($data = mysql_fetch_object($result)){
    if ($data->ctr == 1){
        //they're okay to enter the application!
        $okay = 1;
    }
}
if ($okay){
    $_SESSION['loginokay'] = true;
    header("Location: index.php");
}else{
    header("Location: login.php");
}
?>
[/php]
This code looks fine, right? Code like this is used by hundreds (if not thousands) of PHP/MySQL sites around the world. What's wrong with it? Well, remember that user input cannot be trusted. No information from the user is escaped here, which leaves the application vulnerable. Specifically, any type of SQL injection attack is possible.
For example, if the user enters foo as the username and ' or '1'='1 as the password, PHP builds the following string and passes it to MySQL as the query:
$sql = "select count(*) as ctr from users where username='foo' and password='' or '1'='1' limit 1";
Because '1'='1' is always true, the password test no longer matters: the query counts rows without checking the password, and whenever the count comes back as 1, PHP allows access. By injecting some malicious SQL at the end of the password string, the hacker can impersonate a legitimate user.
The solution to this problem is to use PHP's built-in mysql_real_escape_string() function as a wrapper around any user input. This function escapes characters in a string, so that special characters such as apostrophes are passed as literal data rather than being interpreted by MySQL as part of the query. Listing 7 shows the code with escaping.
Listing 7. Secure PHP form processing code
[php]
<?php
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];
$sql = "select count(*) as ctr from users where username='".mysql_real_escape_string($username)."' and password='".mysql_real_escape_string($pw)."' limit 1";
$result = mysql_query($sql);
while ($data = mysql_fetch_object($result)){
    if ($data->ctr == 1){
        //they're okay to enter the application!
        $okay = 1;
    }
}
if ($okay){
    $_SESSION['loginokay'] = true;
    header("Location: index.php");
}else{
    header("Location: login.php");
}
?>
[/php]
Using mysql_real_escape_string() as a wrapper around user input prevents malicious SQL injection in that input. If a user attempts to pass a malformed password via SQL injection, the following query is passed to the database:
select count(*) as ctr from users where username='foo' and password='\' or \'1\'=\'1' limit 1
Nothing in the database matches such a password. One simple step has closed a big hole in the web application. The lesson here is that user input for SQL queries should always be escaped. (The mysql_* functions were removed in PHP 7.0; on current versions the equivalents are mysqli_real_escape_string() or prepared statements.)
However, there are still a few security holes that need to be closed. The next item is manipulating GET variables.
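On PHP versions where the mysql_* functions are unavailable, the same login check can be written with a prepared statement, which is a different technique from escaping: the user input is sent separately from the SQL text. The sketch below is illustrative only. It assumes the pdo_sqlite extension is installed and uses an in-memory SQLite database in place of MySQL so that it is self-contained; the loginOkay() helper and the sample row are invented for the example.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the MySQL
// users table. Table and column names mirror the login example; the
// sample row is invented for the demonstration.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('create table users (username text, password text)');
$db->exec("insert into users values ('foo', 'secret')");

// Hypothetical helper: true when exactly one row matches.
function loginOkay(PDO $db, string $username, string $pw): bool
{
    // The ? placeholders keep user input out of the SQL text entirely,
    // so quotes in the input cannot change the structure of the query.
    $stmt = $db->prepare(
        'select count(*) as ctr from users where username = ? and password = ?'
    );
    $stmt->execute([$username, $pw]);
    return (int) $stmt->fetchColumn() === 1;
}

var_dump(loginOkay($db, 'foo', 'secret'));       // valid credentials
var_dump(loginOkay($db, 'foo', "' or '1'='1"));  // injection attempt
```

The first call prints bool(true) and the second prints bool(false), because the injected string is compared as a literal password. The plain-text password column is kept only to match the tutorial's example.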
In the previous section, you prevented the user from logging in with a malformed password. If you were smart, you applied the technique you learned to ensure that all user input to SQL statements is escaped.
However, you are not safe yet. Just because a user has a valid password doesn't mean he will play by the rules; he has many opportunities to do damage. For example, the application may let the user view content through links that all point to locations such as template.php?pid=33 or template.php?pid=321. The part of the URL after the question mark is called the query string. Because the query string is placed directly in the URL, it is also called the GET query string.
In PHP, if register_globals is disabled, this string can be accessed with $_GET['pid']. The template.php page might do something similar to Listing 8.
Listing 8. Example template.php
[php]
<?php
$pid = $_GET['pid'];
//we create an object of a fictional class Page
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
//...
//...
?>
[/php]
Is there anything wrong here? First, the GET variable pid from the browser is implicitly trusted to be safe. What can happen? Most users are not clever enough to construct semantic attacks. However, if they notice pid=33 in the browser's URL location field, they might start causing trouble. If they enter another number, that's probably fine; but what happens if they enter something else, such as a SQL command, the name of a file (like /etc/passwd), or some other shenanigans like a value 3,000 characters long?
In this case, remember the basic rule: don't trust user input. The application developer knows that the personal identifiers (PIDs) accepted by template.php should be numeric, so PHP's is_numeric() function can be used to ensure that non-numeric PIDs are not accepted, as shown below:
Listing 9. Using is_numeric() to limit the GET variable
[php]
<?php
$pid = $_GET['pid'];
if (is_numeric($pid)){
    //we create an object of a fictional class Page
    $obj = new Page;
    $content = $obj->fetchPage($pid);
    //and now we have a bunch of PHP that displays the page
    //...
    //...
}else{
    //didn't pass the is_numeric() test, do something else!
}
?>
[/php]
This method seems to be valid, but the following inputs all pass the is_numeric() check:
100 (valid)
100.1 (should not have decimal places)
+0123.45e6 (scientific notation: bad)
0xff33669f (hexadecimal: danger! Accepted by is_numeric() before PHP 7.0)
So, what should security-conscious PHP developers do? Years of experience have shown that the best practice is to use a regular expression to ensure that the entire GET variable consists of digits, as shown below:
Listing 10. Using regular expressions to restrict GET variables
[php]
<?php
$pid = $_GET['pid'];
if (strlen($pid)){
    //ereg() was removed in PHP 7.0; preg_match() is its replacement
    if (!preg_match('/^[0-9]+$/D', $pid)){
        //do something appropriate, like maybe logging them out or sending them back to home page
    }
}else{
    //empty $pid, so send them back to the home page
}
//we create an object of a fictional class Page, which is now
//moderately protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
//...
//...
?>
[/php]
All you need to do is use strlen() to check whether the length of the variable is non-zero; if so, use an all-digit regular expression to make sure the data element is valid. If the PID contains letters, slashes, periods, or anything resembling hexadecimal, this routine catches it and blocks the page from user activity. If you look behind the scenes of the Page class, you'll see that a security-conscious PHP developer has escaped the user input $pid, thereby protecting the fetchPage() method, as shown below:
Listing 11. The fetchPage() method is escaped
[php]
<?php
class Page{
    function fetchPage($pid){
        $sql = "select kw, content, status from page where pid='".mysql_real_escape_string($pid)."'";
        //etc, etc....
    }
}
?>
[/php]
You may ask, "Since we have ensured that the PID is a number, why do we need to escape it?" Because there is no telling in how many different contexts and situations the fetchPage() method will be used. Protection must be provided everywhere this method is called, and escaping inside the method embodies the meaning of defense in depth.
What happens if the user attempts to enter a very long value, such as up to 1000 characters, in an attempt to launch a buffer overflow attack? The next section discusses this in more detail, but for now you can add another check to ensure that the input PID has the correct length. You know that the maximum length of the database's pid field is 5 digits, so you can add the following check.
Listing 12. Limiting GET variables using regular expressions and length checks
[php]
<?php
$pid = $_GET['pid'];
if (strlen($pid)){
    //reject the value if it is not all digits OR if it is too long
    if (!preg_match('/^[0-9]+$/D', $pid) || strlen($pid) > 5){
        //do something appropriate, like maybe logging them out or sending them back to home page
    }
}else{
    //empty $pid, so send them back to the home page
}
//we create an object of a fictional class Page, which is now
//even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
//...
//...
?>
[/php]
Now, no one can stuff a 5,000-digit value into the application, at least not where GET strings are involved. Just imagine the hackers gnashing their teeth when they are frustrated in their attempts to break into your application! And because error reporting is turned off, it's harder for hackers to conduct reconnaissance.
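The digit and length checks can also be collected into one small function so that every page validates pid in the same way. This is a sketch under assumptions: the cleanPid() name and its null-on-failure convention are invented for illustration, and ctype_digit() is used here in place of a regular expression.

```php
<?php
// Hypothetical helper: returns the pid as a string of 1 to $maxLen
// digits, or null when the input is missing or malformed.
function cleanPid($raw, int $maxLen = 5): ?string
{
    // Arrays (for example ?pid[]=1) and empty values are rejected outright.
    if (!is_string($raw) || $raw === '') {
        return null;
    }
    // ctype_digit() accepts only the characters 0-9: no sign, decimal
    // point, exponent, or hex prefix.
    if (!ctype_digit($raw) || strlen($raw) > $maxLen) {
        return null;
    }
    return $raw;
}

$pid = cleanPid($_GET['pid'] ?? null);
if ($pid === null) {
    // Reject the request here, for example by redirecting to the home page.
}
```

Returning null rather than a cleaned-up value makes the caller decide explicitly what to do with bad input, instead of silently continuing with it.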
Buffer Overflow Attack
A buffer overflow attack attempts to overflow a memory allocation buffer in a PHP application (or, more precisely, in Apache or the underlying operating system). Keep in mind that you may be writing your web application in a high-level language like PHP, but ultimately you're calling C (in the case of Apache). Like most low-level languages, C has strict rules for memory allocation.
Buffer overflow attacks send a large amount of data to a buffer, causing part of the data to overflow into adjacent memory buffers, thereby corrupting the buffer or rewriting logic. This can cause a denial of service, corrupt data, or execute malicious code on the remote server.
The only way to prevent buffer overflow attacks is to check the length of all user input. For example, if you have a form element that asks for the user's name, add a maxlength attribute with a value of 40 to this field and check it using substr() on the backend. Listing 13 gives a brief example of the form and PHP code.
Listing 13. Checking the length of user input
[php]
<?php
if (isset($_POST['submit']) && $_POST['submit'] == "go"){
    $name = substr($_POST['name'],0,40);
    //continue processing....
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<p><input type="submit" name="submit" value="go"/></p>
</form>
[/php]
Why provide the maxlength attribute and also perform a substr() check on the backend? Because defense in depth is always good. The browser prevents users from entering very long strings that PHP or MySQL cannot safely handle (imagine someone trying to enter a name that is up to 1,000 characters long), while the backend PHP check ensures that no one is manipulating form data remotely or in the browser.
As you can see, this approach is similar to using strlen() in the previous section to check the length of the GET variable pid. In that example, any input value longer than 5 digits is ignored, but the value could just as easily be truncated to an appropriate length, as shown below:
Listing 14. Changing the length of the input GET variable
[php]
<?php
$pid = $_GET['pid'];
if (strlen($pid)){
    //ereg() was removed in PHP 7.0; preg_match() is its replacement
    if (!preg_match('/^[0-9]+$/D', $pid)){
        //if non numeric $pid, send them back to home page
    }
}else{
    //empty $pid, so send them back to the home page
}
//we have a numeric pid, but it may be too long, so let's check
if (strlen($pid) > 5){
    $pid = substr($pid,0,5);
}
//we create an object of a fictional class Page, which is now
//even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
//...
//...
?>
[/php]
Note that buffer overflow attacks are not limited to long strings of numbers or letters. You may also see long hexadecimal strings (often looking like \xA3 or \xFF). Remember, the goal of any buffer overflow attack is to flood a specific buffer and place malicious code or instructions into the next buffer, thereby corrupting data or executing malicious code. The simplest way to deal with hex buffer overflows is to not allow input to exceed a certain length.
If you are dealing with a form text area that allows longer entries in the database, there is no easy way to limit the length of the data on the client side. After the data reaches PHP, you can use a regular expression to clear out any hex-like strings.
Listing 15. Preventing hex strings
[php]
<?php
if (isset($_POST['submit']) && $_POST['submit'] == "go"){
    $name = substr($_POST['name'],0,40);
    //clean out any potential hexadecimal characters
    $name = cleanHex($name);
    //continue processing....
}
function cleanHex($input){
    //remove a backslash, then x or X, then one to three hex digits
    $clean = preg_replace('![\\\\][xX]([A-Fa-f0-9]{1,3})!', '', $input);
    return $clean;
}
?>
[/php]
You may find that this series of operations is a bit too strict. After all, hexadecimal strings have legitimate uses, such as printing characters in a foreign language. How you deploy the hex regex is up to you. A better strategy is to only remove hex strings if there are too many of them on a line, or if the string exceeds a certain number of characters (such as 128 or 255).
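The gentler strategy described above, removing hex sequences only when there are too many of them, might look like the following sketch. The cleanHexIfExcessive() name and the default threshold of 3 are assumptions made for illustration, not values taken from the tutorial.

```php
<?php
// Hypothetical helper: strips \xNN-style sequences only when the input
// contains more than $maxAllowed of them. The threshold is an arbitrary
// value for this sketch, not a recommendation.
function cleanHexIfExcessive(string $input, int $maxAllowed = 3): string
{
    // A literal backslash, then x or X, then one to three hex digits.
    $pattern = '/\\\\[xX][A-Fa-f0-9]{1,3}/';
    $count = preg_match_all($pattern, $input);
    if ($count !== false && $count > $maxAllowed) {
        return preg_replace($pattern, '', $input);
    }
    // Few enough sequences: leave the input untouched.
    return $input;
}
```

A single escaped character in otherwise ordinary text passes through unchanged, while a long run of hex sequences is stripped. A length limit on the input is still needed alongside this check.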
Cross-site scripting attack
In a cross-site scripting (XSS) attack, a malicious user enters information in a form (or through some other user input method) that inserts malicious client-side markup into the process or database. For example, suppose you have a simple guest book program on your site that allows visitors to leave their name, email address, and a brief message. A malicious user could take advantage of this opportunity to insert something other than a brief message, such as an image that is inappropriate for other users, or JavaScript that redirects the user to another site or steals cookie information.
Fortunately, PHP provides the strip_tags() function, which removes HTML tags from a string. The strip_tags() function also allows you to provide a list of allowed tags, such as <b> or <i>. Listing 16 shows an example that builds on the previous example.
Listing 16. Clearing HTML tags from user input
[php]
<?php
if (isset($_POST['submit']) && $_POST['submit'] == "go"){
    //strip_tags
    $name = strip_tags($_POST['name']);
    $name = substr($name,0,40);
    //clean out any potential hexadecimal characters
    $name = cleanHex($name);
    //continue processing....
}
function cleanHex($input){
    //remove a backslash, then x or X, then one to three hex digits
    $clean = preg_replace('![\\\\][xX]([A-Fa-f0-9]{1,3})!', '', $input);
    return $clean;
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']);?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<p><input type="submit" name="submit" value="go"/></p>
</form>
[/php]
From a security perspective, using strip_tags() on public user input is necessary. If the form is in a protected area (such as a content management system) and you trust users to perform their tasks correctly (such as creating HTML content for a website), then using strip_tags() may be unnecessary and may hurt productivity.
One more point: if you accept user input, such as comments on a post or guest book entries, and need to display this input to other users, be sure to pass the output through PHP's htmlspecialchars() function. This function converts the ampersand, <, and > symbols into HTML entities. For example, the ampersand (&) becomes &amp;. This way, even if malicious content gets past strip_tags() on the front end, it is neutralized by htmlspecialchars() on the back end.
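As an illustration of escaping at output time, the sketch below wraps htmlspecialchars() in a small helper. The helper name e() and the sample comment are invented for this example; ENT_QUOTES and the UTF-8 charset are choices made here, not requirements stated in the tutorial.

```php
<?php
// Hypothetical helper for escaping untrusted text just before output.
function e(string $text): string
{
    // ENT_QUOTES also converts single and double quotes, which matters
    // when the value is echoed inside an HTML attribute.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Invented guest book entry containing markup and an ampersand.
$comment = '<script>alert("hi")</script> Tom & Jerry';
echo e($comment);
```

The browser then displays the script tag as text instead of running it. Escaping at the point of output, rather than only when the data is stored, keeps the protection in place even if unfiltered data reaches the database by another route.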
In-browser data manipulation
There is a type of browser plug-in that allows users to tamper with header elements and form elements on a page. With Tamper Data, a Mozilla plug-in, it's easy to manipulate simple forms that contain many hidden text fields in order to send instructions to PHP and MySQL.
Before clicking Submit on the form, the user can start Tamper Data. When he submits the form, he sees a list of the form data fields. Tamper Data allows the user to tamper with this data before the browser completes the form submission.
Let's go back to the example built earlier. String length has been checked, HTML tags have been cleaned out, and hexadecimal characters have been removed. However, some hidden text fields have been added, as shown below:
Listing 17. Hidden variables