Help with .gz files

If you have problems with .gz (gzip-compressed) files, do not download them with a web browser (such as Internet Explorer). Use a download tool such as FileZilla or wget instead, then uncompress the files with the gunzip program.

Example commands:

% wget http://barcelona.research.yahoo.net/webspam/datasets/uk2006/links/uk-2006-05.hostgraph.txt.gz
--2011-01-12 05:31:32--  http://barcelona.research.yahoo.net/webspam/datasets/uk...
Resolving barcelona.research.yahoo.net... 213.27.241.151
Connecting to barcelona.research.yahoo.net|213.27.241.151|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1989390 (1.9M) [text/plain]
Saving to: `uk-2006-05.hostgraph.txt.gz'

100%[======================================>] 1,989,390   11.1M/s   in 0.2s

2011-01-12 05:31:32 (11.1 MB/s) - `uk-2006-05.hostgraph.txt.gz' saved [1989390/1989390]


% gunzip uk-2006-05.hostgraph.txt.gz

% head uk-2006-05.hostgraph.txt
0 ->
1 ->
2 -> 3794:2506 4704:3
3 -> 4765:1
4 -> 24:1 52:1 530:6 4520:1 4765:5 5302:1 8303:1 9326:11 10037:1
5 -> 2079:1 4765:1
6 -> 362:1 2460:1 2794:1 2958:1 3805:14426 4300:1 4358:2 4520:4 4772:1 5948:1 ...
7 -> 10189:1 10462:1
8 -> 2798:410 9686:128 9687:128
9 -> 3808:23
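
If gunzip still rejects a downloaded file, it can help to test the archive before uncompressing it. The following is a minimal sketch, not part of the original instructions: safe_gunzip is a hypothetical helper name, and it assumes the standard gzip and gunzip programs are installed. It runs gzip's integrity test (gzip -t) and calls gunzip only if that test passes.

```shell
# Hypothetical helper (not from the dataset page): uncompress a .gz file
# only if it passes gzip's built-in integrity test.
safe_gunzip() {
    if gzip -t "$1" 2>/dev/null; then
        # The archive is intact; gunzip replaces FILE.gz with FILE.
        gunzip "$1"
    else
        # The file is damaged or is not gzip data at all; leave it untouched.
        echo "not a valid gzip file: $1" >&2
        return 1
    fi
}
```

After defining the function, a call such as safe_gunzip uk-2006-05.hostgraph.txt.gz either uncompresses the file or reports that the download should be repeated.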


For inquiries, contact Carlos Castillo.