Change your site’s domain or path while keeping your Google PageRank and position

When a website moves from one URL to another, the webmaster’s main concern is keeping a good position in Google’s search results (or those of other search engines).
Some time ago I applied the SEO technique described in this article and obtained good results.

The goal, in this example, is to move the site located at http://www.youroldsite.com/section/ to the new location http://www.yournewsite.com/section/.

The trick for mapping everything to the new location (without listing every single URL), and above all for telling Google correctly that the site has moved (as described in the Google FAQ), is to use the HTTP response code 301, which signals a permanent redirect.

Everything inside the section directory must keep exactly its previous relative path. For example, http://www.youroldsite.com/section/page.html has to be placed at http://www.yournewsite.com/section/page.html.
Using regular expressions, it’s even possible to map relative paths that differ somewhat, as long as a consistent rule relates the old paths to the new ones.

A .htaccess file should be placed in the document root, or in the section subdirectory, of the old website (the one the site is moving away from).
The content of .htaccess should look like this:

RedirectMatch 301 ^/section/(.*)$ http://www.yournewsite.com/section/$1

If on the new site section has been moved to othersection, the .htaccess should look like this:

RedirectMatch 301 ^/section/(.*)$ http://www.yournewsite.com/othersection/$1

And if the relocation happens within the same domain:

RedirectMatch 301 ^/section/(.*)$ /othersection/$1
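
To see what these three rules do to a requested path, the pattern matching can be mimicked outside Apache. The following is only a minimal Python sketch, not how Apache implements RedirectMatch: the rule names (same_path, renamed, same_domain) and the redirect_target helper are made up for illustration, and Apache’s $1 is written as \1 because that is Python’s back-reference syntax.

```python
import re

# Each entry mirrors one RedirectMatch line from the article:
# (regular expression, target with Apache's $1 written as Python's \1).
RULES = {
    "same_path": (r"^/section/(.*)$", r"http://www.yournewsite.com/section/\1"),
    "renamed": (r"^/section/(.*)$", r"http://www.yournewsite.com/othersection/\1"),
    "same_domain": (r"^/section/(.*)$", r"/othersection/\1"),
}


def redirect_target(rule_name, request_path):
    """Return the Location the rule would produce, or None if the path does not match."""
    pattern, target = RULES[rule_name]
    match = re.search(pattern, request_path)
    if match is None:
        return None
    # expand() substitutes the captured group into the target template.
    return match.expand(target)


if __name__ == "__main__":
    for name in RULES:
        print(name, "->", redirect_target(name, "/section/page.html"))
```

A path outside /section/ returns None here, which corresponds to Apache serving the request normally instead of redirecting it.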

A test run with wget (a GNU tool) shows the redirect being followed:

Connecting to www.youroldsite.com|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.yournewsite.com/section/ [following]
--2009-03-24 20:04:59--  http://www.yournewsite.com/section/
Resolving www.yournewsite.com... 127.0.0.1
Connecting to www.yournewsite.com|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
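
The same check can be scripted. The sketch below, assuming Python 3 and only its standard http.client module, sends a single HEAD request without following redirects and returns the status code together with the Location header. The function name fetch_status_and_location is hypothetical, and the host names are the placeholders used in this article.

```python
import http.client


def fetch_status_and_location(host, path, port=80, timeout=10):
    """Send one HEAD request and return (status code, Location header or None).

    http.client never follows redirects on its own, so a 301 is reported
    as-is instead of being replaced by the final 200.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder host from the article; replace it with the real old domain.
    print(fetch_status_and_location("www.youroldsite.com", "/section/"))
```

If the .htaccess rule is active, the call against the old site should report 301 together with the corresponding URL on the new site.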

I applied the first configuration to one of my websites, which ranked first on Google for some queries, because unfortunately I had to move it to a new domain name.

The result was very good. Seven days after I put the 301 redirect in place, my old URL (which was in the first position on Google) had been replaced by the new one. It was still in the first position, and it’s still there!

Java EE Load Balancing with Tomcat and Apache

This tutorial explains how to configure an Apache HTTPD server to map a specific path to a set of load-balanced Apache Tomcat servers.
The first step is to define the virtual host in the Apache configuration files.
In this case the root directory of the site (on the file system) is /path/to/your/site/, the name of the site is www.yoursite.com, and the path under which the Tomcat servers can be reached is /javaee.

In short, a URL like http://www.yoursite.com/home.html is mapped to the file /path/to/your/site/home.html.

A URL like http://www.yoursite.com/javaee/hello.jsp is mapped to the hello.jsp file contained in the javaee.war application deployed on all the Tomcat servers in the load-balanced cluster.
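
The split between static files and proxied paths can be pictured with a small Python sketch. It is only an illustration of the mapping described above, not of Apache’s actual request processing: the resolve function and the constant names are hypothetical, and details such as path normalisation or access control are ignored.

```python
from pathlib import PurePosixPath

# Values taken from the article's example setup.
DOCUMENT_ROOT = PurePosixPath("/path/to/your/site")
PROXIED_PREFIX = "/javaee"
BALANCER_URL = "balancer://tomcatservers/javaee"


def resolve(url_path):
    """Return where a request path ends up: a balancer URL or a file path."""
    if url_path == PROXIED_PREFIX or url_path.startswith(PROXIED_PREFIX + "/"):
        # Everything under /javaee is handed to the Tomcat balancer.
        return BALANCER_URL + url_path[len(PROXIED_PREFIX):]
    # Everything else is served from the document root on disk.
    return str(DOCUMENT_ROOT / url_path.lstrip("/"))


if __name__ == "__main__":
    print(resolve("/home.html"))
    print(resolve("/javaee/hello.jsp"))
```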

The configuration of the Apache virtual host:

<VirtualHost *>
    ServerAdmin webmaster@localhost
    ServerName www.yoursite.com
    DocumentRoot /path/to/your/site/
    <Directory /path/to/your/site/>
        Options MultiViews
        AllowOverride All
        Order allow,deny
        Allow from all
    </Directory>

    ErrorLog /var/log/yoursite-error.log

    LogLevel warn

    CustomLog /var/log/yoursite-access.log combined

    <Proxy balancer://tomcatservers>
        BalancerMember ajp://tomcatserver.yoursite.com:8009 route=tomcatA retry=60
        BalancerMember ajp://tomcatserver.yoursite.com:8010 route=tomcatB retry=60
        BalancerMember ajp://tomcatserver.yoursite.com:8011 route=tomcatC retry=60
    </Proxy>

    <Location /javaee>
        Allow from all
        ProxyPass balancer://tomcatservers/javaee stickysession=JSESSIONID nofailover=off
    </Location>

</VirtualHost>

The most important sections are Proxy and Location.
The Proxy section defines a load balancer made up of three Tomcat servers and assigns a URL to it, in this case balancer://tomcatservers.

The balancer has three members, each with its own URL based on the AJP protocol. In this case Apache connects to the Tomcat servers through their AJP connectors (an alternative would be to use their HTTP connectors).

The Tomcat servers run on the tomcatserver.yoursite.com host and each of them opens its own AJP connector on a different port: the first on 8009 (the default), the second on 8010, the third on 8011 (since they share the same hostname/IP, they must bind to different ports).

Each Tomcat is identified by a route name: tomcatA, tomcatB and tomcatC. Their importance is explained later.

In the Location section, the path /javaee of the virtual host is mapped to the previously defined balancer, as balancer://tomcatservers/javaee. So when someone requests http://www.yoursite.com/javaee/hello.jsp, the virtual host forwards the request for that JSP to one of the balancer members. Which one is decided by the balancer’s scheduling method (by default mod_proxy_balancer spreads requests by request count rather than purely at random).

What is the stickysession attribute? It’s a very useful configuration parameter used together with the route attributes defined earlier.

As every Java EE (or web) developer probably knows, while a user browses a site the server keeps track of some data about the browsing session in a server-side HttpSession object. For example, an e-commerce web application needs to store the shopping cart of unregistered users somewhere.

How can the server associate that server-side session data with a specific browsing session? This is done through a cookie (or a parameter encoded in the URL) that gives the server the session ID.

In Java EE applications, the name of the session cookie is JSESSIONID.

This is closely related to how the load is balanced across the Tomcat servers.

If Apache picked a Tomcat at random for each request, and the next request from the same user/browser were forwarded by the balancer to a different Tomcat in the pool, things wouldn’t work correctly.

A Tomcat in this configuration knows nothing about the other Tomcat servers in the balancer and, in particular, a single Tomcat server cannot access the HTTP sessions handled by another Tomcat.

In short, once a Tomcat is chosen to handle the first request from a user/browser, the same Tomcat must handle all the following requests from that user/browser, or the session data will not remain valid.

Otherwise the session data would be lost on each request, and simple tasks such as building a shopping cart would become impossible.

So Apache must be told the name of the session cookie, JSESSIONID, and the identifiers of the routes to each Tomcat server: tomcatA, tomcatB and tomcatC.

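This way the route to the specific Tomcat is carried at the end of the cookie value: each Tomcat appends its own route name after a dot when it creates the session ID (for example, an ID ending in .tomcatA), and Apache reads that suffix to send the request to the matching balancer member.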
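
The effect of sticky sessions can be reproduced with a toy model. The sketch below is a simplification under stated assumptions, not the real mod_proxy_balancer or Tomcat logic: the Tomcat and Balancer classes and the shop function are hypothetical, session IDs are simple counters, and a plain rotation stands in for Apache’s real scheduling.

```python
import itertools


class Tomcat:
    """Toy stand-in for one Tomcat instance with its own private session store."""

    def __init__(self, jvm_route):
        self.jvm_route = jvm_route
        self.sessions = {}
        self._ids = itertools.count(1)

    def handle(self, session_id, item):
        """Add an item to the cart, creating a new session if the ID is unknown here."""
        if session_id not in self.sessions:
            # The route name is appended to the new session ID after a dot.
            session_id = f"S{next(self._ids)}.{self.jvm_route}"
            self.sessions[session_id] = []
        self.sessions[session_id].append(item)
        return session_id, list(self.sessions[session_id])


class Balancer:
    """Toy balancer: follows the route suffix when sticky, otherwise rotates."""

    def __init__(self, members, sticky):
        self.members = {member.jvm_route: member for member in members}
        self.sticky = sticky
        self._rotation = itertools.cycle(members)

    def pick(self, session_id):
        if self.sticky and session_id and "." in session_id:
            route = session_id.rsplit(".", 1)[1]
            if route in self.members:
                return self.members[route]
        return next(self._rotation)

    def request(self, session_id, item):
        return self.pick(session_id).handle(session_id, item)


def shop(sticky):
    """Add three items to a cart through the balancer and return the final cart."""
    members = [Tomcat("tomcatA"), Tomcat("tomcatB"), Tomcat("tomcatC")]
    balancer = Balancer(members, sticky)
    session_id, cart = None, []
    for item in ("book", "pen", "lamp"):
        session_id, cart = balancer.request(session_id, item)
    return cart


if __name__ == "__main__":
    print(shop(sticky=True))   # ['book', 'pen', 'lamp']
    print(shop(sticky=False))  # ['lamp']
```

With stickiness every request returns to tomcatA and the cart grows. Without it each request reaches a different instance, which does not know the session and starts an empty one.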

[Figure: Java EE Tomcat Load Balancing]

Finally, the last step in setting up Apache is to enable the modules required by the previous configuration:

  • proxy
  • proxy_balancer
  • proxy_ajp

As for the Tomcat configuration, only a few changes to the default configuration are needed. In the Engine element, the jvmRoute attribute must be added:

<Engine name="Catalina" defaultHost="localhost" jvmRoute="tomcatA">

This corresponds to the route parameter defined in the Apache load balancer. So tomcatA should be set in the configuration of the first Tomcat server, tomcatB in the second and tomcatC in the third.

Then the port on which the AJP connector listens has to match the Apache configuration: 8009, 8010 and 8011 on the first, second and third Tomcat respectively.

The following is the configuration of the AJP connector on the first Tomcat:

<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />

Although it’s not directly related to the load-balancer setup, each Tomcat should have its own set of ports, since they all run on the same host.

To prevent conflicts, change the following settings to unused ports on the second and third servers: Server port="8005" and Connector port="8080".
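
One way to keep the ports apart is to derive them from the defaults with a per-instance offset. The Python sketch below only illustrates that idea: the AJP ports match the ones used in this article, while the shutdown and HTTP ports for the second and third servers (8006/8007 and 8081/8082) are assumptions, since the article does not prescribe specific values. The function names port_plan and conflicts are hypothetical.

```python
# Tomcat default ports for the first instance, as quoted in the article.
BASE_PORTS = {"shutdown": 8005, "ajp": 8009, "http": 8080}
ROUTES = ["tomcatA", "tomcatB", "tomcatC"]


def port_plan(routes=ROUTES, base=BASE_PORTS):
    """Assign each instance the default ports shifted by its position (assumed scheme)."""
    plan = {}
    for offset, route in enumerate(routes):
        plan[route] = {name: port + offset for name, port in base.items()}
    return plan


def conflicts(plan):
    """Return a list of (port, first owner, second owner) for ports used twice."""
    seen = {}
    clashes = []
    for route, ports in plan.items():
        for name, port in ports.items():
            if port in seen:
                clashes.append((port, seen[port], (route, name)))
            else:
                seen[port] = (route, name)
    return clashes


if __name__ == "__main__":
    plan = port_plan()
    for route, ports in plan.items():
        print(route, ports)
    print("conflicts:", conflicts(plan))
```

Any other scheme works equally well as long as no port is used twice on the host and the AJP ports agree with the BalancerMember lines in the Apache configuration.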

I hope this tutorial has given a complete overview of every step required to set up Apache and Tomcat as a simple load-balanced cluster.