Importing an Excel spreadsheet into MySQL to quickly manipulate its data

I’ve been asked to compute some data and statistics from an Excel spreadsheet containing a huge phonebook.

The operations I needed weren’t very complicated (finding and removing duplicate rows, and so on), but I didn’t find any quick way to perform those simple tasks using just Excel (or OpenOffice and LibreOffice).

Since I’m good at SQL, I decided to move the data into a MySQL database, and I wondered what the simplest way to do that would be.

Excel Data Structure

My Excel data is in three simple columns:

  • First Name
  • Last Name
  • Phone Number

I need to export my 20,000 rows to a .csv file.

Export XLS Data to a CSV

Using Microsoft Excel

From Excel, go to “Save As”, pick the option “Other Formats” and, from the combo box, choose CSV (Comma delimited).

By default Microsoft Excel separates values with the list separator of the system’s regional settings: a comma in many locales, a semicolon in others (Italian, for example). Values are normally not enclosed by any special character.

Using OpenOffice or LibreOffice

In OpenOffice, choose Save As and then CSV. With the default options, the .csv file will have values separated by commas and enclosed by double quotes.
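
Since the separator depends on the program and on its settings, it can be worth checking the exported file before going on. Here is a minimal shell sketch, assuming the first line of the file is representative of the rest; detect_delimiter is just a helper name made up for this example:

```shell
# Guess the field separator of a CSV file from its first line.
# Prints ";" when semicolons outnumber commas, otherwise ",".
# (Hypothetical helper: it only counts characters, it does not parse quotes.)
detect_delimiter() {
    first_line=$(head -n 1 "$1")
    # $(( ... )) normalises the output of wc, which may be padded with spaces
    commas=$(( $(printf '%s' "$first_line" | tr -cd ',' | wc -c) ))
    semicolons=$(( $(printf '%s' "$first_line" | tr -cd ';' | wc -c) ))
    if [ "$semicolons" -gt "$commas" ]; then
        printf ';\n'
    else
        printf ',\n'
    fi
}
```

Running detect_delimiter phonebook.csv prints the character to use later as the field terminator. A quick look with head -n 3 phonebook.csv also shows whether the values are enclosed by double quotes.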

Create the MySQL table to import the CSV

Now it’s time to create the basic data structure in MySQL that will host the data exported from Excel. The task is simply to create a table with the same number of columns as the spreadsheet, each with a suitable type.

create table phonebook (first_name varchar(100), last_name varchar(100), phone_number varchar(100));

And now the last step: importing the CSV into MySQL.

Import the CSV (generated from an XLS) into a MySQL table

MySQL offers a useful command for importing a CSV file into a table: LOAD DATA LOCAL INFILE.

If you exported the CSV from OpenOffice, the rows have the following structure:

"Mario","Rossi","+390123456789"

The code to load the data is:

load data local infile 'phonebook.csv' into table phonebook fields terminated by ',' enclosed by '"' lines terminated by '\n' (first_name, last_name, phone_number);

If you exported using Microsoft Excel with a semicolon as the list separator, the rows have the following structure:

Mario;Rossi;+390123456789

The code to load the data is:

load data local infile 'phonebook.csv' into table phonebook fields terminated by ';' enclosed by '' lines terminated by '\n' (first_name, last_name, phone_number);
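
The two statements differ only in the separator and in the enclosing character, so they can be generated from the shell. What follows is a sketch under my own assumptions, not the only way to do it: build_load_sql is a made-up helper name and database_name is a placeholder for your schema.

```shell
# Print a LOAD DATA statement for the phonebook table.
# $1 = CSV file, $2 = field separator, $3 = enclosing character (may be empty)
# (Hypothetical helper; the table and column names come from the example above.)
build_load_sql() {
    csv_file=$1
    delimiter=$2
    quote=$3
    # In an unquoted here-document the variables are expanded,
    # while the \n sequence is passed to MySQL literally.
    cat <<EOF
LOAD DATA LOCAL INFILE '$csv_file'
INTO TABLE phonebook
FIELDS TERMINATED BY '$delimiter' ENCLOSED BY '$quote'
LINES TERMINATED BY '\n'
(first_name, last_name, phone_number);
EOF
}
```

For example, build_load_sql phonebook.csv ';' '' | mysql --local-infile=1 -u username -p database_name loads the semicolon-separated file. The --local-infile=1 option enables LOCAL loading on the client side; the server has to allow it as well.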

SSH Tunnelling to Remote Servers, and with Local Address Binding

It’s often necessary to open different kinds of connections to a server where only an SSH account is available (or where only port 22 is open).
Using SSH tunnelling it’s easy to access any port on the server, or even to connect to any other server reachable from the server where the SSH account is available.

To access directly (e.g. with MySQL Query Browser) a MySQL service on the remote server, where access to port 3306 is denied, the trick is to open an SSH tunnel to the remote server, mapping an arbitrary local port to the remote port 3306. In the following example the local port 5306 is used:

ssh -L 5306:remoteserver.com:3306 remoteuser@remoteserver.com

In this case, the local port 5306 is forwarded (through the SSH tunnel) to remoteserver.com, which connects the tunnel to its own port 3306.
When the tunnel is open, all that’s required is to set up MySQL Query Browser to connect to localhost:5306, and the connection will be forwarded to the remote server on its port 3306.
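
The same tunnel also works with the mysql command-line client. A small sketch, assuming the host and account names of the example above; the two function names are made up for illustration:

```shell
# Open the tunnel on the given local port and leave it in the background.
# -f: go to the background after authentication
# -N: do not run a remote command, just forward the port
open_mysql_tunnel() {
    ssh -f -N -L "$1:remoteserver.com:3306" remoteuser@remoteserver.com
}

# Connect through the tunnel. 127.0.0.1 is used instead of localhost because,
# on Unix, the mysql client treats "localhost" as a request to use the local
# socket file rather than TCP.
connect_through_tunnel() {
    mysql --protocol=TCP -h 127.0.0.1 -P "$1" -u "$2" -p
}
```

Calling open_mysql_tunnel 5306 and then connect_through_tunnel 5306 username should give a MySQL prompt on the remote server, where username stands for a MySQL account on that server.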

Simple ssh tunnelling of a MySQL Connection

It’s even possible to map the remote side of the tunnel not to the remote server itself, but to a different host.
For example, if the local computer is not allowed to access IRC servers, one idea is to tunnel the IRC connections through a remote server where an SSH account is available.

Here is an example:

ssh -L 8666:ircserver.org:6666 remoteuser@remoteserver.com

In this case the local port 8666 is mapped to port 6666 of the IRC server ircserver.org, so the local IRC client (e.g. mIRC) simply has to be set up to connect to localhost on port 8666.

SSH Tunnelling to a Different Remote Host

Finally, other people on the local network might want to use the tunnel to the remote server (in this example, an IRC server). If the client that opened the SSH tunnel has the IP address 192.168.1.1, the other clients on the local network should connect to 192.168.1.1:8666 to reach the remote ircserver.org on port 6666.

In this last case, it’s important to make sure that the tunnel binds to the correct local IP address.
If the local client has two addresses, 127.0.0.1 and 192.168.1.1, it’s useful to open the tunnel binding it to 192.168.1.1. In this way other clients on the LAN can use the tunnel. This is the syntax:

ssh -L 192.168.1.1:8666:ircserver.org:6666 remoteuser@remoteserver.com
SSH Tunnelling with Local Address Binding
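
Because the bind address decides who can reach the tunnel, it can help to spell it out every time. The following is only a sketch: forward_spec is a made-up helper that builds the argument of -L and falls back to 127.0.0.1 when no bind address is given, so the tunnel is exposed to the LAN only on purpose.

```shell
# Build the argument for "ssh -L".
# With 4 arguments: bind_address local_port target_host target_port
# With 3 arguments: local_port target_host target_port (bound to 127.0.0.1)
# (Hypothetical helper, it does not validate its input.)
forward_spec() {
    if [ "$#" -eq 4 ]; then
        printf '%s:%s:%s:%s\n' "$1" "$2" "$3" "$4"
    else
        printf '127.0.0.1:%s:%s:%s\n' "$1" "$2" "$3"
    fi
}
```

For example, ssh -L "$(forward_spec 192.168.1.1 8666 ircserver.org 6666)" remoteuser@remoteserver.com opens the shared tunnel described above, while forward_spec 8666 ircserver.org 6666 produces a specification that only the local machine can use.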

Migrate MySQL database from latin1 to utf8

Unfortunately it’s very common to leave the default character set of a MySQL server unchanged and, since the default is latin1, many problems arise when someone wants to store Cyrillic or Chinese characters.

The first step is to fix the MySQL installation so that it can store internationalized information: locate your my.cnf configuration file on Linux, or my.ini on Windows boxes.

Find the [mysqld] section of the configuration file, which holds the configuration of the MySQL server.

Insert the following lines, removing any existing configuration options with the same names.

[mysqld]
character-set-server=utf8
collation-server=utf8_unicode_ci

The option character-set-server=utf8 tells the server that, unless otherwise specified, the character set of newly created databases, tables and columns will be utf8.

utf8 columns can store Cyrillic or Simplified Chinese characters, to give just two examples.

The collation defines how alphabetical ordering works: in short, it is the order of the letters that we expect from ORDER BY columnName clauses.

The suffix _ci means that ordering and comparison are case insensitive, which is the common behavior in databases.

Be very careful, because programming languages usually compare strings case-sensitively (e.g. the .equals() method of Java’s String class), so mistakes caused by this inconsistency are quite common.

Then look for the [client] section of your configuration file, and write this line below it.

[client]
default-character-set=utf8

This is very important because it defines the character set used by the MySQL command-line client, and that’s what will be used to migrate the data from latin1 to utf8.

Now everything is set up: restart MySQL to make sure it’s using the updated configuration, and shut down any application that is using the database that’s going to be migrated.
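
Before dumping anything, it’s worth checking that the server and the client really picked up the new settings. A minimal sketch, where show_charset_settings is a made-up name and the only argument is your MySQL user:

```shell
# Print the character set and collation variables seen by this client session.
# (Hypothetical helper; it prompts for the password of the given user.)
show_charset_settings() {
    mysql -u "$1" -p -e "SHOW VARIABLES LIKE 'character_set%'; SHOW VARIABLES LIKE 'collation%';"
}
```

After show_charset_settings username, the rows character_set_server and character_set_client should report utf8 rather than latin1. If they don’t, the configuration file that was edited is probably not the one MySQL is reading.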

First, mysqldump will create a .sql file containing all the data:

mysqldump --skip-set-charset --no-create-db --no-create-info -h hostname --protocol=TCP -P 3306 -u username -p old_database > dump.sql

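The option --skip-set-charset keeps the dump file free of any reference to the old (and wrong) character set. The option --no-create-db leaves out the CREATE DATABASE statement, because the new database will be created later under a different name, and --no-create-info leaves out the CREATE TABLE statements. Note that a dump made with --no-create-info contains only data, so the tables must already exist in the new database before the data is imported.

Now create the new database: run mysql -u username -p and execute the following SQL at its prompt: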
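
A quick way to double-check the dump is to search it for a SET NAMES statement, which is what mysqldump writes when --skip-set-charset is not given. This is only a sketch; dump_sets_charset is a made-up helper name:

```shell
# Succeeds (exit status 0) when the dump file contains a SET NAMES statement,
# which means the dump still declares a character set.
# (Hypothetical helper; it is a plain text search, not an SQL parser.)
dump_sets_charset() {
    grep -q 'SET NAMES' "$1"
}
```

For example: if dump_sets_charset dump.sql; then echo "dump.sql still sets a character set"; fi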

create schema new_database;
quit

Finally, the last step is to populate the brand new database with the dumped data:

mysql -u username -p new_database < dump.sql

In this way all the previous data from old_database is now stored in utf8 format in new_database.
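
To confirm the result, the collation of every table in the new database can be read from information_schema. A small sketch, with list_table_collations as a made-up helper name that takes the MySQL user and the database name:

```shell
# List each table of a database together with its collation.
# $1 = MySQL user, $2 = database name
# (Hypothetical helper; it prompts for the password of the given user.)
list_table_collations() {
    mysql -u "$1" -p -e "SELECT table_name, table_collation FROM information_schema.tables WHERE table_schema = '$2';"
}
```

With list_table_collations username new_database, every table should show a utf8 collation such as utf8_unicode_ci. A table still listed with a latin1 collation was created with the old settings.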

I hope this tutorial is useful; please ask any questions or give your feedback.
Thank you.