Monday, May 18, 2009

UPDATE 20090516 : mingpao.php

Mingpao News has changed their layout, so there is a need to rewrite the parser.

Here is the code :

Code:

<?php

function InitCurl()
{
global $ch;

$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
}

function CurlGetContent($url)
{
global $ch;

curl_setopt($ch, CURLOPT_URL, $url);

return curl_exec($ch);
}

function DisplayNewsPage($news)
{
global $base_url;
global $today;

$b = $base_url.$today.'/';
print '<table><tr><td width=200 VALIGN="top">';

if (is_array($news['PicSet']))
{
foreach ($news['PicSet'] as $k => $v)
{
print "<img src=\"".$b.$v['Pic']."\" onmouseover=\"popImg(true, this);\" onmouseout=\"popImg(false);\" width=180><BR>\r\n";
print $v['PicText']."<BR><BR>\r\n";
}
}
print '</td><td VALIGN="top">';
print '<H1>'.$news['Title'].'</H1><BR><BR>';
print $news['Body'];
print '</td></tr>';
print '</table>'."\r\n";
ob_flush();
flush();
}

function GetPic($url)
{
global $base_url;
global $today;


$page = CurlGetContent($base_url.$today.'/'.$url.'?Mode=1');

$pic = $pictext = "";

if (!preg_match_all('|mainphotolink"><img src="([^"]+)"|U', $page, $layer1, PREG_SET_ORDER)) return "";

$pic = $layer1[0][1];

if (preg_match_all("|<td align=\"center\" class=\"caption\">(.*?)</td>|U", $page, $layer2, PREG_SET_ORDER))
{
$pictext = $layer2[0][1];
}

return Array(
'Pic' => $pic,
'PicText' => $pictext,
);

}

function GetNewsContent($url)
{
global $base_url;
global $today;

$page = CurlGetContent($base_url.$today.'/'.$url);

$p = $body = $title = $pictext = $pic = "";

if (preg_match_all("|<h1>(.*?)</h1>|U", $page, $layer1, PREG_SET_ORDER))
{
$title = str_replace('<br>', ' ', $layer1[0][1]);
}

if (preg_match_all('|<div class="[^"]*" id="newscontent[\d]*">(.*?)</div>|is',$page, $layer2, PREG_SET_ORDER))
{
foreach ($layer2 as $v)
{
$body .= $v[1];
}
}

if (preg_match_all("|<a href=\"([^\"]+)\?Mode=1\">|U",$page, $layer3, PREG_SET_ORDER))
{
foreach ($layer3 as $v)
{
$ar = GetPic($v[1]);
if (is_array($ar)) $p[$v[1]] = $ar;
}

}

return Array(
'Title' => $title,
'Body' => $body,
'PicSet' => $p,
);

}

function ProcessAllTitles()
{
global $title_pages_set;
if (!is_array($title_pages_set))
{
echo 'ERR';
return;
}
foreach ($title_pages_set as $key => $val)
{
if (!strstr($val,"main"))
{
ProcessPage($key, $val);
}
}
}

function GetTitle($ming)
{
if (!preg_match_all("|sublink\[0\] = \'(.*?)\'|U",$ming, $layer1, PREG_SET_ORDER)) return "1";

if (!preg_match_all("|<a href=\"([^\"]+)\"[^>]*?>([^<]+)</a>|U",$layer1[0][1], $layer2, PREG_SET_ORDER)) return "2";

$titles = "";
foreach ($layer2 as $v)
{
if (!preg_match_all("|index.htm|U", $v[1], $layer3, PREG_SET_ORDER)) continue;
$title = $v[2];
$link = $v[1];
$titles[] = Array(
'Title' => $title,
'Link' => $link,
);
}
return $titles;

}

function GetHeadlines($url)
{
global $base_url;
global $today;

$page = CurlGetContent($base_url.$today.'/'.$url);
$heads = "";

if (!preg_match_all('#<(h1|li)>(.*?)</\1>#U', $page, $layer1, PREG_SET_ORDER)) return;
foreach ($layer1 as $v)
{
if (!preg_match_all("|<a href=\"([^\"]+)\">(.*?)</a>|U", $v[2], $layer2, PREG_SET_ORDER)) continue;
$title = str_replace('<br>','&nbsp;&nbsp;&nbsp;&nbsp;',trim($layer2[0][2]));
$link = $layer2[0][1];
$heads[$link] = Array(
'Title' => $title,
'Link' => $link,
);
}
return $heads;
}

function DisplayHeader($heads, $name)
{
print '<FORM action='.$_SERVER['PHP_SELF'].' method=POST target=_blank>';
print '<table><tr>';

print '<td width=100 valign=top>'.$name ."<BR>\n";

print '<INPUT type=SUBMIT value=Submit><INPUT type=RESET>';
print '<input type=BUTTON onclick="selectAll(this.form);" value=SelectAll>';

print '<INPUT type=HIDDEN name=ShowNews value=1>';
print "<BR>\r\n</td><td valign=top>";
foreach ($heads as $v)
{
print "<INPUT type=CHECKBOX name=".$v['Link']." value=".$v['Link'].">";
print "<a href=".$_SERVER['PHP_SELF']."?ShowNews=1&".$v['Link']."=".$v['Link']." target=_blank>".$v['Title']."</a><BR>\n";
}
print '</td></tr></table>';
print '</FORM>'."\r\n";
}

function PrintJavaScript()
{
?>

<script type='text/javascript'>
function get(eid)
{
var d = document;
var r = d.getElementById(eid);
return r;
}
function popImg(open, iref)
{
if (open)
{
var top = (iref.offsetParent.offsetParent.offsetTop + iref.offsetTop) + 'px';
var curleft = 0;
var obj = iref;
do {
curleft += obj.offsetLeft;
} while (obj = obj.offsetParent);
var left = (curleft + iref.offsetWidth )+ 'px';
var img = '<img src="' + iref.src + '" />';
var d = document;
if (null == get('popImg'))
{
var pop = d.createElement('DIV');
pop.id = 'popImg';
pop.style.position = 'absolute';
d.body.appendChild(pop);
}
var pop = get('popImg');
pop.innerHTML = img;
pop.style.top = top;
pop.style.left = left;
pop.style.display = 'block';
}
else
{
var pop = get('popImg');
pop.style.display = 'none';
}
}
function selectAll(formObj)
{
for (var i=0;i < formObj.length;i++)
{
fldObj = formObj.elements[i];
if (fldObj.type == 'checkbox')
{
fldObj.checked = true;
}
}
}

</script>

<?PHP

}

function GetToday($ming)
{
global $today;
global $base_url;
global $main_url;
if (!preg_match_all('|<base href="http://news.mingpao.com/(\d+)/">|U',$ming, $layer1, PREG_SET_ORDER))
{
return "";
}
return $layer1[0][1];

}

function PrepareGlobals()
{
global $HTTP_HEADER;
global $HTTP_FOOTER;

global $today;
global $base_url;
global $main_url;

InitCurl();

$HTTP_HEADER = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=Big5-HKSCS"><title>My MingPaoNews</title>';
$HTTP_HEADER.= '<style type="text/css"> ';
$HTTP_HEADER.= 'a:link { color: #0000FF; text-decoration: none; } ';
$HTTP_HEADER.= 'a:active { color: #000088; text-decoration: underline; } ';
$HTTP_HEADER.= 'a:visited { color: #000088; text-decoration: none; } ';
$HTTP_HEADER.= 'a:hover { color: #0055FF; text-decoration: underline; } ';
$HTTP_HEADER.= 'table {width: 100% ; border-width:1px; border-collapse: collapse; border-color:#003333; border-style:dashed} ';
$HTTP_HEADER.= 'td {padding: 3px;} ';
$HTTP_HEADER.= '</style>';
$HTTP_HEADER.= '</head><body>'."\r\n";

$HTTP_FOOTER = "\r\n".'</body></html>'."\r\n" ;

$base_url = "http://news.mingpao.com/";
$main_url = $base_url.'index.htm';
$today = GetToday(CurlGetContent($main_url));
}

function DoShowNews()
{
global $HTTP_HEADER;
global $HTTP_FOOTER;

print $HTTP_HEADER;
PrintJavaScript();

foreach($_REQUEST as $k => $v)
{
if ($k == "ShowNews") continue;
$news = GetNewsContent($v);
if (is_array($news)) DisplayNewsPage($news);
}

print $HTTP_FOOTER;
}

function DoIndex()
{
global $HTTP_HEADER;
global $HTTP_FOOTER;
global $main_url;

print $HTTP_HEADER;

PrintJavaScript();

$ming = CurlGetContent($main_url);
$titles = GetTitle($ming);
if (!is_array($titles))
{
print "ERR Titles";
print $HTTP_FOOTER;
return;
}
foreach($titles as $v)
{
$head = GetHeadlines($v['Link']);
if (is_array($head)) DisplayHeader($head, $v['Title']);
}

print $HTTP_FOOTER;

}

// MAIN HERE



PrepareGlobals();

if (isset($_REQUEST['ShowNews']))
{
DoShowNews();
}
else
{
DoIndex();
}

?>

Thursday, April 23, 2009

課外/公餘活動與責任

從小就喜歡參加課外活動,尤其是參與不同的公職,中學時有班會、學生會、Prefect等,大學時搞O Camp,畢業後做Forum管理員。這類活動除了能直接學會很多不同的知識外,最重要是能夠和不同的同學、朋友合作,就算是同一個職份,和不同人合作也有不同的過程,不同的經歷。縱使並不是每次的合作和工作都很順利,但當問題出現時,都能使我們學到更多,在解決問題的過程中,使我們的團隊更緊密、更有效率。

哈哈!世事怎麼會這樣完美?以下就是每一次都出現在我身上的問題:

我參加了一個組織。
我熱心參與組織的活動。
搞手們欣賞我的熱心,把我當作是搞手的一部份。
一班朋友搞活動,一樂也!
滿足感使我更加投入,付出更多精神時間。

這時,往往會出現兩個問題:
問題一:過份的付出使正常生活失調,包括工作及健康等。
問題二:挑起了過多的責任,使得組織的責任分配失衡,影響了組織的穩定性。

兩個問題中,問題一是個人的事,解決不了終歸都是己身的責任,但問題二是團體的事,問題一出現,小則口角動武,大則影響組織的存亡。

問題出現的先決條件,是先要成為組織的要員,甚至重心。這並不是什麼困難的事。沒有利益關係的工作,要做多少有多少,自問辦事能力不低,有心有力總會得人認同。

當你不斷的付出,你心裡總會想,如果大家也付出同等的努力,各人在其位置盡心盡力,成績、效果都會很好,而且團隊的合作經歷能給大家很大的滿足感,隊友間的友誼亦能更上一層樓。

可惜的是,這樣的想法是太天真。每個人的時間空間畢竟有個極限,能付出多少是因人而異。當你越做越多,你希望大家亦能和你一樣,可惜大家的極限已到,在你付出的同時,平衡就會慢慢失去。久而久之,重心就會慢慢集中於一點,付出最多的人,責任就會越來越多。

結果,以下的情況都出現了:

「我今天沒空,你可以幫幫忙嗎?」…事後…「你做得比我更好,以後這個就交給你了!」此後,搞手降格成為會員。

「你做的比我好,我留下來也沒有意思」在失去滿足感下,有搞手慢慢失去了蹤影。

「今天你不在,我們成不了事!」偶爾將所有失敗的責任都放在你的頭上。

說實話,得到大家的認同,滿足感是很大的。但成為領袖並不是我的目的,我只想大家可以一起為相同的目標進發。

遇到以上的各種情況,要是換著是你,你會如何去處理呢?

為改善失衡的情況,減低自己的工作量。結果是,放下的工作沒人願意接受,最少並不自願。強硬別人做不喜歡的事,最終會把人趕走的,尤其是這種自願性興趣工作。而且,用硬的不是我的首選。
為改善失衡的情況,提高對大家的要求。結果是,大家為提高了的工作量感到不滿,更有人因壓力而退出。為自願性興趣工作出一分力,有一分已經很好,何必要求別人出兩分、三分呢?

我的最後選擇是:放棄改善失衡情況,搞高對自己的要求。問題一...又嚴重了...

Sunday, March 29, 2009

PostgreSQL, Slony-I, pgadmin, win32, First time config/installation

In one occasion, I have to use Slony-I for Database replication. As a starter, I look for my friend (Gooooogle) for help. She showed me some tutorial: some rather outdated with version or some having broken pic link. After quite a lot of struggle, I have managed to get it working, and would like to provide a better tutorial/notes for the installation.

Here is my software config:
PostgreSQL : 8.3.5-2
Slony-I : 1.2.15
OS: Windows XP, Windows Server 2003

Notice that my Postgres version of 8.3.5-2 is not available to be downloaded in official site, which only contain 8.3.5-1 (and higher version) in the mirror I have tried. (Minor version is not showing anywhere, AND 8.3.5-1 is NOT working for me somehow) I am not sure if the procedure will work on future versions but the general wisdom should able to apply to most situation.

My tutorial will be based on Dave's PostgreSQL Blog with extra information that help any first time comer to setup everything gracefully.

The target of the procedure is to setup a localhost replication between 2 database of your choice.

Initial Installation:

Install PostgreSQL server and run Application Stack Builder to add Replication : Slony-I support.
Start pgAdmin III, goto File -> Options, setup the Slony-I Path. On default case it should be C:\Program Files\PostgreSQL\8.3\share
Use pgAdmin, connect to the localhost server, and remember to save password.

Database Preparation:
If it is your first time playing around Slony-I, do not apply anything on the working database yet, which would probably screw things up.
With pgAdmin, backup with default setting a working database into an *.backup file.
Create 2 new database, which will be acting as Master and Slave, with names of your choice. I will use db_1 and db_2 which represent the master and slave respectively.
Restore db_1 with the backup file obtained above, using default option. (Full restore)
Restore db_2 with the backup file, using "Schema only" option.

Setting up Replication at pgAdmin

All the setting is located in Databases : db_1 / db_2 : Replication

Right click on db_1 : Replication and choose "New Slony-I Cluster"
Join existing cluster : UN-checked
Cluster name : sample
Local Node Left : 1
Local Node Right : Master Node
Admin Node Left : 99
Admin Node Right : Admin Node
Press OK.

Right click on db_2 : Replication and choose "New Slony-I Cluster"
Join existing cluster : Checked
Server : localhost
Database : db_1
Cluster name : sample
Local Node Left : 10
Local Node Right : Slave Node
Admin Node : 99 - Admin Node
Press OK.

Upcoming is path setup, which is not well documented in any reference.

Right click on db_1 : Replication : sample : Nodes : Master Node : Path and choose New path
Server : 10 - Slave Node
Connect Info : host=localhost port=5432 user=postgres dbname=db_2
Conn retry : 10
Press OK.

Right click on db_2 : Replication : sample : Nodes : Slave Node : Path and choose New path
Server : 1 - Master Node
Connect Info : host=localhost port=5432 user=postgres dbname=db_1
Conn retry : 10
Press OK.

Now we define Replication set.

Right click on db_1 : Replication : sample : Replication Sets and choose new Replication Set.
ID : 1
Comment : sample set

Now we choose the table to be replicated.
Right click on Replication Sets : sample set : Tables and choose New tables
Table : Any table of your choice
ID : 1
Index : Auto filled when choosing table.
Press OK.

Repeat the above for all the tables that you want to replicate with increasing IDs. If your Database contain hundreds of tables that need to be replicated, this manual action can takes hours. The following SQL can help this progress.

SELECT COL.table_name as table_name , CON.constraint_name as constraint_name FROM information_schema.columns as COL INNER JOIN information_schema.constraint_table_usage as CON ON CON.table_name = COL.table_name WHERE COL.table_schema='public' GROUP BY COL.table_name , CON.constraint_name ORDER BY COL.table_name;

and use the result to fill the following SQL.

SELECT _<ClusterName>.setaddtable(<MasterID>, <ID>, 'public.<table_name>', '<constraint_name>', '');

In our example,
<ClusterName> : sample
<MasterID> : 1
<ID> : increasing counter from 1
<table_name> : from above query
<constraint_name> : from above query

Notice that the query assume that there is one on one mapping between table_name and constraint_name. Otherwise, multiple matches will exist in result set. In my occasion, Hibernate is used and extra constraint is added to some of the tables, and I have added "AND CON.constraint_name not LIKE 'fk%'" in my WHERE clause to filter those constraints.

Now we create a new Subscription
Right click on db_1 : Replication : sample : Replication Sets : sample set : Subscription and choose New subscription.
Origin : 1
Provider : 1 - Master Node
Receiver : 10 - Slave Node
Can forward : UN-checked
Press OK.

Here ends the Database preparation.

Slon daemon setup

Slon daemon is needed to be running in all server involving the replication. The setting is quite tricky and differ from each other for different version of Windows being used.

To ease the installation, add slon.exe to the PATH environment variable.
Control Panel : System : Advanced : Environment Variables
Search for PATH in system environment and append ";C:\Program Files\PostgreSQL\8.3\bin" or other path if your installation is not on default.
Start command prompt and try "slon" to see if the path is correct. If successful, an help screen will be shown.

Register new service:
slon -regservice Slony-I

Create Engine Configure files:
Create folder C:\Slony
Create C:\Slony\db_1.conf with the following contents (2 lines)
cluster_name='sample'
conn_info='host=localhost port=5432 user=postgres dbname=db_1'

Create C:\Slony\db_2.conf with the following contents (2 lines)
cluster_name='sample'
conn_info='host=localhost port=5432 user=postgres dbname=db_2'

Add engine to the daemon:
Do the following in command prompt :
slon -addengine C:\Slony\db_1.conf
slon -addengine C:\Slony\db_2.conf

Double check the above with :
slon -listengines
You should see the path of the two conf files being listed.

Before starting the daemon, there are extra setting to be done for running the service successfully.
In folder C:\Documents and Settings\<Current User>\Application Data\postgresql,
there should be a postgres password file pgpass.conf. Slon need to access this file to connect to the database for replication. However as Slon is a daemon that do not run as current user, so the owner of Slon service should contain such a file.

Copy the file to C:\Documents and Settings\postgres\Application Data\postgresql
On default, this folder do not exist and have to be created manually (unless you login as postgres and saved password before). Make sure user "postgres" have read/write access to the password file.

Go to Control Panel : Administrative Tools: Services and look for "Slony-I"
Enter Slony-I's settings and access "Log On" tab.
Log on as "This account" and enter "postgres".

Now we are ready to start up the service. Either right click on Slony-I in services and choose Enable, or do "net start Slony-I" in command prompt.

Replication should takes place immediately if every setting above is correct.

Diagnose

In most cases if you are working on it for the first time, there would be some errors in the setting and the replication is not working. You can do one of the following to check up errors.

Using pgAdmin, access db_1 : Replication : sample : Nodes : Master Node, on the right panel, access Statistics tab. If replication is working fine, the numbers keep changing. Otherwise multiple "0"s will be shown. Statistics is also available for Slave nodes.

Using Event viewer (Control Panel : Administrative Tools : Event viewer, Slon's activities will be shown in Application session.

If Slon running as Service didn't show its log nicely to read, we can run Slon in command prompt in Debug mode, as follows:
slon -d 4 sample "host=localhost port=5432 user=postgres dbname=db_1"
slon -d 4 sample "host=localhost port=5432 user=postgres dbname=db_2"

Common Mistake

FATAL main: Cannot connect to local database - fe_sendauth: no password

This indicated that pgpass.conf is not found or not accessible.
Make sure pgpass.conf file exist for the user running the Slony-I service but not the current user. And make sure the user running the service should have full access the the password file. Also make sure that Service owner is correctly set up.

Conclusion

Setting up Slony-I for the first time really takes time and effort. I hope this guild can help whoever that want to start using Slony-I for replication on Windows.

Clearly, more in-depth knowledge in replication is needed for creating a real working Replication solution. One can find more information in the reference session below.

Good luck for your Slony-I setup.

References
http://www.slony.info/documentation/
http://www.pgadmin.org/docs/1.8/slony.html
http://pgsnake.blogspot.com/2007/09/setting-up-slony-i-with-pgadmin.html
http://www.nabble.com/No-replication-on-Windows-Vista-td19488860.html

Friday, February 20, 2009

How to build a Web Host on your computer that is accessible from the outside world?

I assumed that your followed my previous post for your webhost (http://ccstone.blogspot.com/2008/09/how-to-build-web-host-on-your-computer.html) and it is now the time make it accessible from outside world.

1) Change Default Port of your WebHost

1a) Open \mowes\apache2\conf\httpd.conf
1b) Goto line 127 (or somewhere around there
1c) Search for "Listen 80"
1d) Change 80 into any number between 2000 to 8000 (as you like), e.g. 2337
1e) Save and exit
1f) Restart mowes
1g) Give your homepage a test: originally you access a page like http://localhost/mingpao.php , now you have to access the same page with http://localhost:2337/mingpao.php

Notes: the reason behind changing default port is that most ISP blocked port 80, and it is not an easy (noob) task to redirect traffic

2) Setup Port forward in your router (if you have one)

There is no detailed instruction for this setting, as everyone use different brand and model of router. But in general you can follow procedure

2a) Open Command Prompt ("Run..." then enter "cmd")
2b) Type ipconfig and press enter. This is an example of result:

Windows IP Configuration

Ethernet adapter Local Area Connection:

Connection-specific DNS Suffix . :
IP Address. . . . . . . . . . . . : 192.168.1.222
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.1.1

2c) Start browser and browse the page indicated by "Default Gateway". E.g. http://192.168.1.1/ . You are now in router setting.
2d) Enter password as needed. Those login information can be retrieved from your router manual.
2e) Look for Firewall setting / Port Forward / DMZ / etc.. Every brand and model differ from each other, so you have to navigate yourself to the destination.
2f) Setup a Port Forward to your computer. E.g. map port "2337" to "192.168.1.222" as IP Address in your ipconfig result.
2g) Save setting and retrieve your EXTERNAL IP address from your router setting. It can usually be found in "Connection Status" page or "Internet Setting" page. E.g. "202.169.200.234"
2h) Test if your setting is correct. Instead of using http://localhost:2337/mingpao.php, now try http://202.169.200.234:2337/mingpao.php

3) Setup a free domain name for your webserver

Some new router include this function internally. If so, follow the instruction in router manual. For those routers without such setting, or if you do not use a router at all, follow this instruction for easy domain setup.

3a) Visit http://www.no-ip.com/services/managed_dns/free_dynamic_dns.html
3b) Sign up the FREE DNS redirect service
3c) Register a Domain of your choice. E.g. helloworld.myvnc.com
3d) Download and install the No-IP DUC client.
3e) Add in the Hosts your newly registered domain
3f) Test your new host. Instead of http://202.169.200.234:2337/mingpao.php, try http://helloworld.myvnc.com:2337/mingpao.php
3g) (Optional) Setup the "Port 80 redirect" function at NO-IP DUC, so that you don't have to enter the port 2337 (I didn't do it personally)

-------------

After all those setup, you can now access your webhost from anywhere in the world, including your Laptop, iPhone, PDA, etc.

Enjoy