<h1>Law of php</h1><p>Laws, rules, tips and tricks for PHP in web development. By Minh Son.</p><h1>PHP: Parsing HTML to find Links</h1><p>6 October 2008</p><p>From blogging to log analysis and search engine optimisation (SEO), people are looking for scripts that can <em>parse</em> web pages and RSS feeds from other websites - to see where their traffic is coming from, among other things.</p> <p><a href="http://www.google.com/search?q=define:parsing" target="_blank">Parsing</a> your own <em>HTML</em> is no problem - assuming you use consistent formatting - but once you set your sights on parsing other people's <em>HTML</em> the frustration really sets in. This page presents some regular expressions and a commentary that will hopefully point you in the right direction.</p> <div class="divider"><!-- --></div> <h2 id="section_0">1. Simplest Case</h2> <p>Let's start with the simplest case - a well-formatted link with no extra attributes:</p> <code>/&lt;a href=\"([^\"]*)\"&gt;(.*)&lt;\/a&gt;/iU</code> <p>This, believe it or not, is a very simple regular expression (or "regexp" for short).</p>
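<p>To see this first pattern at work, here is a minimal sketch that runs it through <tt>preg_match_all</tt> against a short inline string instead of a downloaded page. The sample markup and the <tt>$html</tt> variable are made up for illustration.</p>

```php
<?php
// Sketch: apply the simplest link pattern to an inline sample string.
// The markup below is invented for illustration, not fetched from a site.
$html = '<p>See <a href="/about">About us</a> and <a href="http://www.example.net/">Example</a>.</p>';

// Same regexp as above: double-quoted href, no other attributes.
// i = caseless, U = ungreedy. In a single-quoted PHP string \" stays as
// backslash + quote, which PCRE reads as a literal double-quote.
$regexp = '/<a href=\"([^\"]*)\">(.*)<\/a>/iU';

$count = preg_match_all($regexp, $html, $matches);

// $matches[1] holds the link addresses, $matches[2] the link text.
foreach ($matches[1] as $i => $href) {
    echo $href, ' => ', $matches[2][$i], "\n";
}
```

<p>Run as a script, this should print <tt>/about =&gt; About us</tt> and then <tt>http://www.example.net/ =&gt; Example</tt>, one per line.</p>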
<p>The expression can be broken down as follows:</p><ul><li>starts with: <tt>&lt;a href="</tt></li><li>a series of characters up to, but not including, the next double-quote (") - 1st capture</li><li>the string: <tt>"&gt;</tt></li><li>a series of any characters - 2nd capture</li><li>ends with: <tt>&lt;/a&gt;</tt></li></ul> <p>We're also using two 'pattern modifiers':</p> <ul><li>i - matches are 'caseless' (upper or lower case doesn't matter)</li><li>U - matches are 'ungreedy'</li></ul> <p>The first modifier means that we're matching <tt>&lt;A&gt;</tt> as well as <tt>&lt;a&gt;</tt>. The 'ungreedy' modifier is necessary because otherwise the second captured string could (by being 'greedy') extend from the contents of one link all the way to the end of another link.</p> <p>One shortcoming of this regexp is that it won't match link tags that include a line break - fortunately there's a modifier for this as well:</p> <code>/&lt;a<span>\s</span>href=\"([^\"]*)\"&gt;(.*)&lt;\/a&gt;/<span>s</span>iU</code> <p>Now the '.' character will match any character <strong>including</strong> line breaks. We've also changed the first space to a 'whitespace' character type so that it can match a space, tab or line break. It's necessary to have some kind of whitespace in that position so we don't match other tags whose names begin with "a", such as <tt>&lt;abbr&gt;</tt>.</p> <p>For more information on pattern modifiers see the link at the bottom of this page.</p> <div class="divider"><!-- --></div> <h2 id="section_1">2. Room for Extra Attributes</h2> <p>Most link tags contain a lot more than just an <tt>href</tt> attribute. Other common attributes include: rel, target and title. They can appear before or after the href attribute:</p> <code>/&lt;a\s<span>[^&gt;]*</span>href=\"([^\"]*)\"<span>[^&gt;]*</span>&gt;(.*)&lt;\/a&gt;/siU</code> <p>We've added extra patterns before and after the href attribute. They will match any series of characters NOT containing the <tt>&gt;</tt> symbol. When writing regular expressions it's always better to specify <strong>exactly</strong> which characters are allowed and not allowed, rather than using the '.' character.</p> <div class="divider"><!-- --></div> <h2 id="section_2">3. Allow for Missing Quotes</h2> <p>Up to now we've assumed that the link address is going to be enclosed in double-quotes. Unfortunately there's nothing enforcing this, so a lot of people simply leave them out. The problem is that we were relying on the quotes to indicate where the address starts <b>and</b> ends. Without the quotes we have a problem.</p> <p>It would be simple enough (even trivial) to write a second regexp, but where's the fun in that when we can do it all with one:</p> <code class="final">/&lt;a\s[^&gt;]*href=<span>(\"??)</span>([^\"<span> &gt;</span>]*<span>?</span>)<span>\\1</span>[^&gt;]*&gt;(.*)&lt;\/a&gt;/siU</code> <p><small><b>Note:</b> There are many different ways of implementing this regular expression. Some may be better than the example presented here, but "If it ain't broke..."</small></p> <p>What can I say? Regular expressions are a lot of fun to work with, but when it takes half an hour to work out where to put an extra <tt>?</tt> you really know you're in deep.</p> <p>Firstly, what's with those extra <tt>?</tt>'s?</p> <p>Because we used the <tt>U</tt> modifier, all patterns in the regexp default to 'ungreedy'. Adding an extra <tt>?</tt> after a <tt>?</tt> or <tt>*</tt> reverses that behaviour back to 'greedy', but just for the preceding pattern. Without this the expression fails: left ungreedy, the quote and address captures are satisfied by matching nothing at all, so everything following <tt>href=</tt> is lumped into the <tt>[^&gt;]*</tt> expression.</p> <p>We've added an extra capture to the regexp that matches a double-quote if it's there: <tt>(\"??)</tt>. There is then a backreference <tt>\\1</tt> that matches the closing double-quote - if there was an opening one.</p> <p>To cater for links without quotes, the pattern to match the link address itself has been changed from <tt>[^\"]*</tt> to <tt style="white-space: nowrap;">[^\" &gt;]*?</tt>. That means that the link can be terminated not just by a double-quote (the previous behaviour) but also by a space or a <tt>&gt;</tt> symbol.</p> <div class="divider"><!-- --></div> <h2 id="section_3">4. Refining the Regexp</h2> <p>Given the nature of the WWW there are always going to be cases where the regular expression breaks down. Small changes to the patterns can fix these.</p> <h4>spaces around the <tt>=</tt> after href:</h4> <code>/&lt;a\s[^&gt;]*href<span>\s*</span>=<span>\s*</span>(\"??)([^\" &gt;]*?)\\1[^&gt;]*&gt;(.*)&lt;\/a&gt;/siU</code> <h4>matching only links starting with http:</h4> <code>/&lt;a\s[^&gt;]*href=(\"??)(<span>http</span>[^\" &gt;]*?)\\1[^&gt;]*&gt;(.*)&lt;\/a&gt;/siU</code> <h4>single quotes around the link address:</h4> <code>/&lt;a\s[^&gt;]*href=(<span>[</span>\"<span>\']</span>??)([^\" &gt;]*?)\\1[^&gt;]*&gt;(.*)&lt;\/a&gt;/siU</code> <p>And yes, all of these modifications can be added to the version above to make one super-regexp, but the result is just too painful to look at, so I'll leave that as an exercise.</p> <p><small><b>Note:</b> All of the expressions on this page have been tested to some extent, but mistakes can occur in transcribing, so please report any errors you find when implementing these examples.</small></p> <div class="divider"><!-- --></div> <h2 id="section_4">5. Using the Regular Expression to parse HTML</h2> <p>Using the default flags for <a href="http://www.php.net/preg_match_all" target="_blank">preg_match_all</a>, the array returned contains an array of the first 'capture', then an array of the second capture, and so forth. By capture we mean patterns contained in <tt>()</tt>:</p> <code class="final"># Original PHP code by Chirp Internet: www.chirp.com.au<br /># Please acknowledge use of this code by including this header.<br />$url = "http://www.example.net/somepage.html";<br />$input = @file_get_contents($url) or die("Could not access file: $url");<br />$regexp = "&lt;a\s[^&gt;]*href=(\"??)([^\" &gt;]*?)\\1[^&gt;]*&gt;(.*)&lt;\/a&gt;";<br />if(preg_match_all("/$regexp/siU", $input, $matches)) {<br />&nbsp;&nbsp;# $matches[2] = array of link addresses<br />&nbsp;&nbsp;# $matches[3] = array of link text - including HTML code<br />}</code> <p>Using <tt>PREG_SET_ORDER</tt>, each link matched has its own array in the return value:</p> <code class="final"># Original PHP code by Chirp Internet: www.chirp.com.au<br /># Please acknowledge use of this code by including this header.<br />$url = "http://www.example.net/somepage.html";<br />$input = @file_get_contents($url) or die("Could not access file: $url");<br />$regexp = "&lt;a\s[^&gt;]*href=(\"??)([^\" &gt;]*?)\\1[^&gt;]*&gt;(.*)&lt;\/a&gt;";<br />if(preg_match_all("/$regexp/siU", $input, $matches<span>, PREG_SET_ORDER</span>)) {<br />&nbsp;&nbsp;foreach($matches as $match) {<br />&nbsp;&nbsp;&nbsp;&nbsp;# $match[2] = link address<br />&nbsp;&nbsp;&nbsp;&nbsp;# $match[3] = link text<br />&nbsp;&nbsp;}<br />}</code> <p>If you find any cases where this code falls down, let us know using the Feedback link below.</p> <p>Before using this or similar scripts to fetch pages from other websites, we suggest you read through the related article on setting a user agent and parsing robots.txt.</p> <div class="divider"><!-- --></div> <h2 id="section_5">6. First checking robots.txt</h2> <p>As mentioned above, before using a script to download files you should <b>always</b> check the relevant <tt>robots.txt</tt> file. Here we're making use of the <tt>robots_allowed</tt> function from the article linked above to determine whether we're allowed to access the file:</p> <b><code class="final"># Original PHP code by Chirp Internet: www.chirp.com.au<br /># Please acknowledge use of this code by including this header.<br />
<span>ini_set('user_agent', '<i>NameOfAgent (http://www.example.net)</i>');</span><br />$url = "http://www.example.net/somepage.html";<br /><span>if(robots_allowed($url, "<i>NameOfAgent</i>")) {</span><br />&nbsp;&nbsp;$input = @file_get_contents($url) or die("Could not access file: $url");<br />&nbsp;&nbsp;$regexp = "&lt;a\s[^&gt;]*href=(\"??)([^\" &gt;]*?)\\1[^&gt;]*&gt;(.*)&lt;\/a&gt;";<br />&nbsp;&nbsp;if(preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {<br />&nbsp;&nbsp;&nbsp;&nbsp;foreach($matches as $match) {<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# $match[2] = link address<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# $match[3] = link text<br />&nbsp;&nbsp;&nbsp;&nbsp;}<br />&nbsp;&nbsp;}<br /><span>} else {<br />&nbsp;&nbsp;die('Access denied by robots.txt');<br />}</span></code> </b><p>Now you're well on the way to building a professional web spider. If you're going to use this in practice you might want to look at: caching the robots.txt file so that it's not downloaded every time (a la Slurp); checking the server headers and server response codes; and adding a pause between multiple requests - for starters.</p><p>Posted by Minh Son</p><h1>Using cURL with PHP</h1><p>2 October 2008</p><span style="font-weight: bold;">Basic cURL:<br /></span><code style="white-space: nowrap;"><code>&lt;?php<br />$ch = curl_init('http://www.target.com'); // the target<br />curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it<br />$result = curl_exec($ch); // execute the request<br />curl_close($ch); // close the connection<br /><br />echo $result;<br />?&gt;<br /></code></code>This code fetches the source of target.com and echoes it.<br /><br /><span style="font-weight: bold;">Post via cURL: </span><br /><code style="white-space: nowrap;"><code>&lt;?php<br />$data = "field_name=field_value&amp;submit_value=submit";<br /><br />$ch = curl_init('http://www.target.com'); // the target<br />curl_setopt($ch, CURLOPT_POST, 1); // tell cURL to POST<br />curl_setopt($ch, CURLOPT_POSTFIELDS, $data); // the URL-encoded form fields<br />curl_exec($ch); // execute the request (the response is printed directly)<br />curl_close($ch); // close the connection<br />?&gt;<br /></code></code>Simple posting via cURL.<br /><br />- Using cookies with cURL:<br /><br />(You will find this useful when you are trying to do something that needs a login and a stored cookie.)<br /><br /><code style="white-space: nowrap;"><code>&lt;?php<br />$ch = curl_init();<br />curl_setopt($ch, CURLOPT_URL, 'http://www.target.com');<br />curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects<br />curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);<br />curl_setopt($ch, CURLOPT_COOKIEJAR, '/path/to/cookie.txt'); // save received cookies to this file<br />curl_setopt($ch, CURLOPT_COOKIEFILE, '/path/to/cookie.txt'); // send cookies read from this file<br />$result = curl_exec($ch);<br />curl_close($ch);<br /><br />echo $result;<br />?&gt;<br /></code></code>Of course, you might need to POST to the login form first to get the cookies stored; after that, you can do other things while logged in.<br /><br />- Extra info:<br /><br />You can also set the user agent, referrer and headers using cURL:<br /><br /><code style="white-space: nowrap;"><code>&lt;?php<br />// set user-agent to DarkMindZ<br />curl_setopt($ch, CURLOPT_USERAGENT, 'DarkMindZ');<br />?&gt;<br /></code></code><code style="white-space: nowrap;"><code>&lt;?php<br />// set referrer to darkmindz.com<br />curl_setopt($ch, CURLOPT_REFERER, "http://www.darkmindz.com");<br />?&gt;<br /></code></code>- Making life easier:<br /><br />This function will help you a lot in making things easier:<br /><br /><code style="white-space: nowrap;"><code>&lt;?php<br />function curl_it($method, $target, $post_var = '')<br />{<br />&nbsp;&nbsp;$ch = curl_init();<br />&nbsp;&nbsp;curl_setopt($ch, CURLOPT_URL, $target);<br />&nbsp;&nbsp;// reuse the visitor's browser user agent (only set when running under a web server)<br />&nbsp;&nbsp;curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);<br />&nbsp;&nbsp;curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);<br />&nbsp;&nbsp;curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);<br />&nbsp;&nbsp;curl_setopt($ch, CURLOPT_COOKIEJAR, '/path/to/cookie.txt');<br />&nbsp;&nbsp;curl_setopt($ch, CURLOPT_COOKIEFILE, '/path/to/cookie.txt');<br /><br />&nbsp;&nbsp;if ($method == 'POST') {<br />&nbsp;&nbsp;&nbsp;&nbsp;curl_setopt($ch, CURLOPT_POST, 1);<br />&nbsp;&nbsp;&nbsp;&nbsp;curl_setopt($ch, CURLOPT_POSTFIELDS, $post_var);<br />&nbsp;&nbsp;}<br /><br />&nbsp;&nbsp;$result = curl_exec($ch);<br />&nbsp;&nbsp;curl_close($ch);<br />&nbsp;&nbsp;return $result; // the page source, or false on failure<br />}<br /><br />// usage:<br /><br />$page = curl_it('GET', 'http://www.darkmindz.com'); // get darkmindz.com homepage<br /><br />curl_it('POST', 'http://www.darkmindz.com', 'user=dude&amp;pass=dude2'); // login using dude:dude2<br /><br />?&gt;</code></code><p>Posted by Minh Son</p><h1>6 tips to write less code php</h1><p>1 October 2008</p>PHP is a good language, but there are always surprises. And today I've seen an interesting approach in Arnold Daniels's blog. He talks about temporary variables in PHP. This tip is useful for "lazy" developers who do not even think about variable names. 
They may prefer magic names like ${0} and 0 is good enough variable name, why not...<br /><br />But I'm even more lazy then Arnold and sure that when there is no variable, then there is no problem. So here are a few tips that can make your code shorter and harder to read :-)<br /><br />1. Use || (or) and && (and) operations instead of if.<br /><blockquote></blockquote><code style="white-space: nowrap;"><code><span style="color: rgb(0, 0, 0);"><span style="color: rgb(255, 128, 0);">// A lot of code<br /></span><span style="color: rgb(0, 0, 187);">$status </span><span style="color: rgb(0, 119, 0);">= </span><span style="color: rgb(0, 0, 187);">fwrite</span><span style="color: rgb(0, 119, 0);">(</span><span style="color: rgb(0, 0, 187);">$h</span><span style="color: rgb(0, 119, 0);">, </span><span style="color: rgb(221, 0, 0);">'some text'</span><span style="color: rgb(0, 119, 0);">);<br />if (!</span><span style="color: rgb(0, 0, 187);">$status</span><span style="color: rgb(0, 119, 0);">) {<br /> </span><span style="color: rgb(0, 0, 187);">log</span><span style="color: rgb(0, 119, 0);">(</span><span style="color: rgb(221, 0, 0);">'Writing failed'</span><span style="color: rgb(0, 119, 0);">);<br />}<br /><br /></span><span style="color: rgb(255, 128, 0);">// Less code<br /></span><span style="color: rgb(0, 119, 0);">${</span><span style="color: rgb(0, 0, 187);">0</span><span style="color: rgb(0, 119, 0);">} = </span><span style="color: rgb(0, 0, 187);">fwrite</span><span style="color: rgb(0, 119, 0);">(</span><span style="color: rgb(0, 0, 187);">$h</span><span style="color: rgb(0, 119, 0);">, </span><span style="color: rgb(221, 0, 0);">'some text'</span><span style="color: rgb(0, 119, 0);">);<br />if (!${</span><span style="color: rgb(0, 0, 187);">0</span><span style="color: rgb(0, 119, 0);">}) </span><span style="color: rgb(0, 0, 187);">log</span><span style="color: rgb(0, 119, 0);">(</span><span style="color: rgb(221, 0, 0);">'Writing failed'</span><span 
style="color: rgb(0, 119, 0);">);<br /><br /></span><span style="color: rgb(255, 128, 0);">// Even less code<br /></span><span style="color: rgb(0, 0, 187);">fwrite</span><span style="color: rgb(0, 119, 0);">(</span><span style="color: rgb(0, 0, 187);">$h</span><span style="color: rgb(0, 119, 0);">, </span><span style="color: rgb(221, 0, 0);">'some text'</span><span style="color: rgb(0, 119, 0);">) or </span><span style="color: rgb(0, 0, 187);">log</span><span style="color: rgb(0, 119, 0);">(</span><span style="color: rgb(221, 0, 0);">'Writing failed'</span><span style="color: rgb(0, 119, 0);">);<br /></span></span></code></code> 2. Use ternary operator.<br /><code style="white-space: nowrap;"><code><span style="color: rgb(0, 0, 0);"><span style="color: rgb(255, 128, 0);">// A lot of code<br /></span><span style="color: rgb(0, 119, 0);">if (</span><span style="color: rgb(0, 0, 187);">$age </span><span style="color: rgb(0, 119, 0);">< </span><span style="color: rgb(0, 0, 187);">16</span><span style="color: rgb(0, 119, 0);">) {<br /> </span><span style="color: rgb(0, 0, 187);">$message </span><span style="color: rgb(0, 119, 0);">= </span><span style="color: rgb(221, 0, 0);">'Welcome!'</span><span style="color: rgb(0, 119, 0);">;<br />} else {<br /> </span><span style="color: rgb(0, 0, 187);">$message </span><span style="color: rgb(0, 119, 0);">= </span><span style="color: rgb(221, 0, 0);">'You are too old!'</span><span style="color: rgb(0, 119, 0);">;<br />}<br /><br /></span><span style="color: rgb(255, 128, 0);">// Less code<br /></span><span style="color: rgb(0, 0, 187);">$message </span><span style="color: rgb(0, 119, 0);">= </span><span style="color: rgb(221, 0, 0);">'You are too old!'</span><span style="color: rgb(0, 119, 0);">;<br />if (</span><span style="color: rgb(0, 0, 187);">$age </span><span style="color: rgb(0, 119, 0);">< </span><span style="color: rgb(0, 0, 187);">16</span><span style="color: rgb(0, 119, 0);">) {<br /> </span><span style="color: 
rgb(0, 0, 187);">$message </span><span style="color: rgb(0, 119, 0);">= </span><span style="color: rgb(221, 0, 0);">'Welcome!'</span><span style="color: rgb(0, 119, 0);">;<br />}<br /><br /></span><span style="color: rgb(255, 128, 0);">// Even less code<br /></span><span style="color: rgb(0, 0, 187);">$message </span><span style="color: rgb(0, 119, 0);">= (</span><span style="color: rgb(0, 0, 187);">$age </span><span style="color: rgb(0, 119, 0);">< </span><span style="color: rgb(0, 0, 187);">16</span><span style="color: rgb(0, 119, 0);">) ? </span><span style="color: rgb(221, 0, 0);">'Welcome!' </span><span style="color: rgb(0, 119, 0);">: </span><span style="color: rgb(221, 0, 0);">'You are too old!'</span><span style="color: rgb(0, 119, 0);">;<br /><br /></span></span></code></code>3. Use for instead of while.<br /><code style="white-space: nowrap;"><code><span style="color: rgb(0, 0, 0);"><span style="color: rgb(255, 128, 0);">// A lot of code<br /></span><span style="color: rgb(0, 0, 187);">$i </span><span style="color: rgb(0, 119, 0);">= </span><span style="color: rgb(0, 0, 187);">0</span><span style="color: rgb(0, 119, 0);">;<br />while (</span><span style="color: rgb(0, 0, 187);">$i </span><span style="color: rgb(0, 119, 0);">< </span><span style="color: rgb(0, 0, 187);">100</span><span style="color: rgb(0, 119, 0);">) {<br /> </span><span style="color: rgb(0, 0, 187);">$source</span><span style="color: rgb(0, 119, 0);">[] = </span><span style="color: rgb(0, 0, 187);">$target</span><span style="color: rgb(0, 119, 0);">[</span><span style="color: rgb(0, 0, 187);">$i</span><span style="color: rgb(0, 119, 0);">];<br /> </span><span style="color: rgb(0, 0, 187);">$i </span><span style="color: rgb(0, 119, 0);">+= </span><span style="color: rgb(0, 0, 187);">2</span><span style="color: rgb(0, 119, 0);">;<br />}<br /><br /></span><span style="color: rgb(255, 128, 0);">// less code<br /></span><span style="color: rgb(0, 119, 0);">for (</span><span style="color: 
rgb(0, 0, 187);">$i </span><span style="color: rgb(0, 119, 0);">= </span><span style="color: rgb(0, 0, 187);">0</span><span style="color: rgb(0, 119, 0);">; </span><span style="color: rgb(0, 0, 187);">$i </span><span style="color: rgb(0, 119, 0);">< </span><span style="color: rgb(0, 0, 187);">100</span><span style="color: rgb(0, 119, 0);">; </span><span style="color: rgb(0, 0, 187);">$source</span><span style="color: rgb(0, 119, 0);">[] = </span><span style="color: rgb(0, 0, 187);">$target</span><span style="color: rgb(0, 119, 0);">[</span><span style="color: rgb(0, 0, 187);">$i</span><span style="color: rgb(0, 119, 0);">+=</span><span style="color: rgb(0, 0, 187);">2</span><span style="color: rgb(0, 119, 0);">]);<br /><br /></span></span></code></code>4. In some cases PHP requires you to create a variable. Some examples you can find in my PHP fluent API tips article. Another example is getting array element when array is returned by the function.<br /><code style="white-space: nowrap;"><code><span style="color: rgb(0, 0, 0);"><span style="color: rgb(0, 0, 187);">$ext </span><span style="color: rgb(0, 119, 0);">= </span><span style="color: rgb(0, 0, 187);">pathinfo</span><span style="color: rgb(0, 119, 0);">(</span><span style="color: rgb(221, 0, 0);">'file.png'</span><span style="color: rgb(0, 119, 0);">)[</span><span style="color: rgb(221, 0, 0);">'extension'</span><span style="color: rgb(0, 119, 0);">];<br /><br /></span></span></code></code>// result: Parse error: syntax error, unexpected '[' in ... 
on line ...<br />To handle all these situations, you can create a set of small functions that shortcut frequently used operations:<br /><br /><code style="white-space: nowrap;"><code><span style="color: rgb(0, 0, 0);"><span style="color: rgb(255, 128, 0);">// returns a reference to the value passed in (e.g. a newly created object)<br /></span><span style="color: rgb(0, 119, 0);">function &amp;</span><span style="color: rgb(0, 0, 187);">r</span><span style="color: rgb(0, 119, 0);">(</span><span style="color: rgb(0, 0, 187);">$v</span><span style="color: rgb(0, 119, 0);">) { return </span><span style="color: rgb(0, 0, 187);">$v</span><span style="color: rgb(0, 119, 0);">; }<br /><br /></span><span style="color: rgb(255, 128, 0);">// returns the element at offset $i; $a is taken by value so a function's return value can be passed in directly<br /></span><span style="color: rgb(0, 119, 0);">function </span><span style="color: rgb(0, 0, 187);">a</span><span style="color: rgb(0, 119, 0);">(</span><span style="color: rgb(0, 0, 187);">$a</span><span style="color: rgb(0, 119, 0);">, </span><span style="color: rgb(0, 0, 187);">$i</span><span style="color: rgb(0, 119, 0);">) { return </span><span style="color: rgb(0, 0, 187);">$a</span><span style="color: rgb(0, 119, 0);">[</span><span style="color: rgb(0, 0, 187);">$i</span><span style="color: rgb(0, 119, 0);">]; }<br /><br /></span><span style="color: rgb(255, 128, 0);">// usage: $ext = a(pathinfo('file.png'), 'extension');<br /></span></span></code></code>5. Explore the language you use. PHP is powerful, with many functions and language features that can make your code shorter and more efficient.<br /><br />6. Don't be lazy when writing a little more makes the code easier to read.<br />Spend a few seconds writing a comment or a more readable construction. 
This is the only tip in this list that really can save hours, not minutes.Minh Sonhttp://www.blogger.com/profile/13709010922780861505noreply@blogger.com0tag:blogger.com,1999:blog-8629809476029995822.post-45285697447398759432008-10-01T01:08:00.001+07:002008-10-01T01:54:48.592+07:0050+ PHP optimisation tips revisited<p>After reading an article some time ago entitled “40 Tips for optimizing your php Code” (and some others that are suspiciously similar), I decided to redo it, but properly this time: with more accurate tips, and with references and citations for each and every one.</p> <p>The result is this list of over 50 PHP optimisation tips…</p> <p>Enjoy!</p> <p><span id="more-178"></span></p> <ol><li><em>echo</em> is faster than <em>print</em>. [Citation]</li><li>Wrapping your strings in single quotes (') instead of double quotes (") is faster, because PHP searches for variables inside "…" but not inside '…'. Use single quotes when the string contains no variables that need evaluating. [Citation]</li><li>Use sprintf instead of variables contained in double quotes; it’s about 10x faster. [Citation]</li><li>Use echo’s multiple parameters (or stacked echo statements) instead of string concatenation. [Citation]</li><li>Use pre-calculations: set the maximum value for your for-loops before the loop, not in it. For example, for ($x=0; $x &lt; count($array); $x++) calls count() on every iteration; instead, assign $max=count($array) before the loop and compare $x against $max.</li><li>Unset or null your variables to free memory, especially large arrays. [Citation]</li><li>Avoid magic like __get, __set and __autoload. [Citation]</li><li>Use require() instead of require_once() where possible. [Citation]</li><li>Use full paths in includes and requires; less time is spent resolving OS paths. [Citation]</li><li>require() and include() are identical in every way except that require halts if the file is missing. Performance-wise there is very little difference. 
[Citation]</li><li>Since PHP 5.1, the time at which the script started executing is available in $_SERVER['REQUEST_TIME']; use this instead of time() or microtime(). [Citation]</li><li>PCRE regex is quicker than EREG, but always check whether you can use quicker native functions such as strncasecmp, strpbrk and stripos instead. [Citation]</li><li>When parsing XML in PHP, try xml2array, which makes use of the PHP XML functions; for HTML, you can try PHP’s DOMDocument, or DOM XML in PHP 4. [Citation]</li><li>str_replace is faster than preg_replace and is the best choice overall, although strtr is sometimes quicker with larger strings. Passing arrays to a single str_replace call is usually quicker than multiple str_replace calls. [Citation]</li><li>“else if” statements are faster than switch/case statements. [Citation]</li><li>Error suppression with @ is very slow. [Citation]</li><li>To reduce bandwidth usage, turn on mod_deflate in Apache v2 [Citation], or for Apache v1 try mod_gzip. [Citation]</li><li>Close your database connections when you’re done with them. [Citation]</li><li>$row['id'] is 7 times faster than $row[id], because if you don’t supply quotes PHP has to guess which index you meant, assuming you didn’t mean a constant. [Citation]</li><li>Use &lt;?php ?&gt; tags when declaring PHP, as all other styles, including short tags, are deprecated. [Citation]</li><li>Use strict code and avoid suppressing errors, notices and warnings; this results in cleaner code and less overhead. Consider having error_reporting(E_ALL) always on. [Citation]</li><li>PHP scripts are served 2-10 times slower by Apache httpd than static pages. Use static pages instead of server-side scripts where you can. [Citation]</li><li>PHP scripts (unless cached) are compiled on the fly every time you call them. Install a PHP opcode cache (such as eAccelerator or Turck MMCache) to remove compile time, which typically increases performance by 25-100%; note that memcached caches data rather than compiled scripts. You can even set up eAccelerator on cPanel using EasyApache3. 
[Citation]</li><li>An alternative caching technique for pages that don’t change too frequently is to cache the HTML output of your PHP pages. Try Smarty or Cache Lite. [Citation]</li><li>Use isset() in place of strlen() where possible, e.g. when checking a string’s length as in if (strlen($foo) &lt; …).</li><li>++$i is faster than $i++, so use pre-increment where possible. [Citation]</li><li>Make use of PHP’s countless predefined functions rather than building your own, as the native ones will be far quicker; if you have very time- and resource-consuming functions, consider writing them as C extensions or modules. [Citation]</li><li>Profile your code. A profiler shows you how much time each part of your code consumes, giving you an overview of the bottlenecks. The Xdebug debugger already contains a profiler. [Citation]</li><li>Document your code. [Citation]</li><li>Learn the difference between good and bad code. [Citation]</li><li>Stick to coding standards; it will make it easier for you to understand other people’s code, and other people will be able to understand yours. [Citation]</li><li>Separate code, content and presentation: keep your PHP code separate from your HTML. [Citation]</li><li>Don’t bother using complex template systems such as Smarty; use the one already built into PHP (see ob_get_contents and extract) and simply pull the data from your database. [Citation]</li><li>Never trust variables coming from user land (such as from $_POST): use mysql_real_escape_string when using MySQL, and htmlspecialchars when outputting HTML. [Citation]</li><li>For security reasons, never expose information about paths, extensions and configuration, for example by enabling display_errors or leaving phpinfo() in your webroot. [Citation]</li><li>Turn off register_globals (it’s disabled by default for a reason!). No script at production level should need it enabled, as it is a security risk. 
Fix any scripts that require it to be on, and protect any scripts that require it to be off by using an unregister_globals() function. Do this now, as it’s set to be removed in PHP6. [Citation]</li><li>Avoid storing and comparing passwords in plain text; use a hash, such as an md5 hash, instead. [Citation]</li><li>Use ip2long() and long2ip() to store IP addresses as integers instead of strings. [Citation]</li><li>You can avoid reinventing the wheel by using the PEAR project, which gives you existing code of a high standard. [Citation]</li><li>When using header('Location: '.$url);, remember to follow it with die();, as the script continues to run even though the location has changed; or avoid using it altogether where possible. [Citation]</li><li>In OOP, if a method can be a static method, declare it static. The speed improvement is a factor of 4. [Citation]</li><li>Incrementing a local variable in an OOP method is the fastest, nearly the same as incrementing a local variable in a function; incrementing a global variable is twice as slow as incrementing a local variable. [Citation]</li><li>Incrementing an object property (e.g. $this->prop++) is 3 times slower than incrementing a local variable. [Citation]</li><li>Incrementing an undefined local variable is 9-10 times slower than incrementing a pre-initialized one. [Citation]</li><li>Just declaring a global variable without using it in a function slows things down (by about the same amount as incrementing a local var). PHP probably does a check to see if the global exists. [Citation]</li><li>Method invocation time appears to be independent of the number of methods defined in the class: adding 10 more methods to the test class (before and after the test method) made no difference to performance. [Citation]</li><li>Methods in derived classes run faster than ones defined in the base class. [Citation]</li><li>A function call with one parameter and an empty function body takes about the same time as doing 7-8 $localvar++ operations. 
A similar method call is, of course, about 15 $localvar++ operations. [Citation]</li><li>Not everything has to be OOP; often it is just overhead, as each method and object call consumes a lot of memory. [Citation]</li><li>Never trust user data: escape the strings you use in SQL queries with mysql_real_escape_string, not mysql_escape_string or addslashes. Also note that if magic_quotes_gpc is enabled, you should use stripslashes first. [Citation]</li><li>Unset your database variables (the password at a minimum); you shouldn’t need them after you make the database connection.</li><li>RTFM! PHP offers a fantastic manual, possibly one of the best out there, which makes it a very hands-on language, providing working examples and talking in plain English. Please USE IT! [Citation]</li></ol> <p>If you still need help, try #PHP on the EFnet IRC Network. (Read the !rules first.)</p> <p>Also see:</p> <ul><li>an excellent article about optimizing PHP by John Lim</li><li>PEAR coding standards</li><li>PHP best practices by ez.no (use the left and right keys to scroll through the pages)</li><li>Tuning Apache and PHP for Speed on Unix</li><li>Premature Optimisation</li><li>PHP and Performance</li><li>Performance Tuning PHP</li><li>Develop rock-solid code in PHP</li><li>12 PHP optimization tips</li><li>10 things you (probably) didn’t know about PHP</li></ul>Minh Sonhttp://www.blogger.com/profile/13709010922780861505noreply@blogger.com1tag:blogger.com,1999:blog-8629809476029995822.post-52626431409298623832008-10-01T00:42:00.002+07:002008-10-01T01:03:36.508+07:00Plan to build a travel web 2.0 communityIn real life, to get by you must work hard day after day, so that you have some time to relax and keep your heart and mind in balance.<br />But after work, where do you want to go? Anywhere you can find on the Internet.
But you can't discuss the places you want to go, such as hotels, buses or food, or, most importantly, the price you must pay a travel agent to arrange the tour.<br />With Communication Travel you can chat and get more useful information about the places you want to go to rest and relax. Anyone who has already been there can tell you what to prepare for the trip.<br />If you don't have the money to make a trip, you can still view and enjoy beautiful places through the pictures posted by other members.<br /><br />Local travel agents can post their services and recommend places to help tourists learn more about the trips they want, such as price, time, food, transport, etc.<br />Once I finish this project, I think you will want to come to Vietnam for a good trip with your girlfriend, with your family, or to meet friends.<br /><br />I hope you will travel a lot and share your pictures and anything wonderful about your trips with our community.Minh Sonhttp://www.blogger.com/profile/13709010922780861505noreply@blogger.com0tag:blogger.com,1999:blog-8629809476029995822.post-27614619554463548802008-09-28T01:23:00.001+07:002008-09-28T02:07:06.882+07:00My PHP CMS based on Zend Framework.I have been thinking about a CMS built with Zend Framework; that's the reason I learned this PHP framework.<br />I have spent a lot of time researching and learning about Zend Framework.
It is a great PHP framework for building web applications quickly.<br />But it takes a lot of time to learn and use, and it is not suited to people who are unfamiliar with the MVC structure.<br />To build a CMS on it, you must improve your ZF skills by reading its code and combining components.<br />The CMS will be finished soon and released as open source.<br /><span style="font-weight: bold;">Live demo:</span><br /><a href="http://zfvn.co.cc">http://zfvn.co.cc</a> [ uses cache ]<br /><a href="http://thanhblog.net">http://thanhblog.net</a> [ does not use cache ]<br /><br />The components used in this CMS:<br />Zend_View<br />Zend_Layout<br />Zend_Db<br />Zend_Cache<br />Zend_Auth<br />Zend_Session [ required by Zend_Auth ]<br />Zend_ControllerMinh Sonhttp://www.blogger.com/profile/13709010922780861505noreply@blogger.com0tag:blogger.com,1999:blog-8629809476029995822.post-52857592028624673972008-09-28T01:06:00.002+07:002008-10-01T01:53:59.648+07:00Zend Framework is not the best PHP framework.It made me spend a lot of time learning and searching. And it is slow.
You should not use it, or should disable it, once a project based on ZF has been built.<br />I think the best part of Zend Framework is training together.Minh Sonhttp://www.blogger.com/profile/13709010922780861505noreply@blogger.com0tag:blogger.com,1999:blog-8629809476029995822.post-48542692263234936992008-09-28T00:41:00.003+07:002008-10-01T01:52:25.907+07:00Law of phpWelcome to Law of php! Here you can read the articles on my blog.<br />My English is not good enough to explain everything in my head. This blog is only me sharing my knowledge about the Internet and web development, so please don't laugh if I make mistakes in English grammar; it is just what I have made and spent time on.<br />Here you can find information and tutorials about the rules of web development.<br />I hope this page is good for both me and you, a bridge that brings us closer.<br />Regards.Minh Sonhttp://www.blogger.com/profile/13709010922780861505noreply@blogger.com1