CentOS network install behind a proxy

Posted by waldner on 3 October 2010, 2:12 pm

Warning: this is a gross hack, which may or may not work for you. It did work for me, but that doesn't really mean much. It's also very inefficient and resource-intensive. However, it's quick and dirty, and if you have no alternatives, it may be worth a try.

As most people sadly discover (the writer among them), the CentOS net install does not support installation behind an HTTP proxy.

But still, let's see if it's possible to work around that limitation.

Plan

This started out as a crazy idea, but it turned out to actually work (at least with my proxies), so hopefully it will be useful to other people.
I had been playing with socat lately, and I figured it could help for this task.

The idea is: trick the CentOS installer into thinking it's talking directly with the mirror, but instead send its requests to the local office proxy. Of course they can't just be forwarded unchanged; we need to intercept and mangle them into a format that is palatable to the proxy.

The basic technique is very simple: take requests like this, coming from the installer

GET /something HTTP/1.0      # or 1.1
Host: somehost
...rest of headers here...

turn them into

GET http://some.centos.mirror/something HTTP/1.0     # or 1.1
Host: some.centos.mirror
...rest of headers here...

and forward this to the proxy. With a little luck, requests will always have that form, and the proxy will like the modified version.
Also, obviously, forward the responses coming from the proxy back to the installer.
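As a rough illustration of that rewrite (not the script used later in this article), here is a minimal plain-shell sketch. The function name rewrite_request and the sample request are made up for the example; it only handles GET request lines and the Host header, the two cases described above.

```shell
# Sketch only: read an HTTP request on stdin, print the proxy-style version.
# $1 is the mirror name; rewrite_request is a hypothetical helper name.
rewrite_request() {
  mirror=$1
  cr=$(printf '\r')
  while IFS= read -r line; do
    case $line in
      "GET /"*)  line="GET http://$mirror/${line#GET /}" ;;  # absolute URL for the proxy
      "Host: "*) line="Host: $mirror$cr" ;;                  # new Host header, CR kept
    esac
    printf '%s\n' "$line"                                    # other lines pass through
  done
}

# Made-up request, shaped like the one shown above
printf 'GET /something HTTP/1.0\r\nHost: somehost\r\n\r\n' |
  rewrite_request some.centos.mirror
```

The request line keeps whatever HTTP version and trailing carriage return it arrived with, since only the path part is touched.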

All that is needed is socat, bash, perl or awk, and the name of a CentOS mirror (and a pair of crossed fingers).

Implementation

We're using a host called fake.example.com where socat is installed. To make it work, we need to point the CentOS installer at this host, either by name or IP address.

The basic socat command to run is this:

$ socat TCP-L:4444 EXEC:some_mangling_code

Socat spawns the mangling code and connects itself to the code's standard input and output. So what the code needs to do is to edit the data it receives on standard input, send it to the local proxy, read the replies and print them on standard output, where socat will read them and forward them back to the installer.

The mangling code to achieve the transformation described above is quite straightforward. The text editing part can be easily implemented in sed, awk or perl, and another instance of socat can be used to connect to the proxy (netcat could be used as well). Here is an example bash script to implement it:

#!/bin/bash
# mangle.sh: invoke as "mangle.sh <mirror name> <proxy_address> <proxy_port>"

mirror=$1
proxy_addr=$2
proxy_port=$3

{ socat - TCP:"$proxy_addr":"$proxy_port" && exec 1>&- ; } < <(
  gawk -v mirror="$mirror" '/^Host: /{$0="Host: " mirror "\r"}
  /^GET /{$2="http://" mirror "/" $2}
  {print; fflush("")}' )

If you have netcat installed, you may use that rather than socat, as it's probably a somewhat lighter process to spawn:

#!/bin/bash
# mangle.sh: invoke as "mangle.sh <mirror name> <proxy_address> <proxy_port>"

mirror=$1
proxy_addr=$2
proxy_port=$3

{ nc "$proxy_addr" "$proxy_port" && exec 1>&- ; } < <(
  gawk -v mirror="$mirror" '/^Host: /{$0="Host: " mirror "\r"}
  /^GET /{$2="http://" mirror "/" $2}
  {print; fflush("")}' )

In any case, the code MUST be written that way, and not in some other simpler or more obvious way, due to some subtleties in how the shell handles pipelines and file descriptors. This is interesting enough that it deserves an article of its own; I won't go into the details here. But the above code (with process substitution and explicitly closing descriptor 1) did work for me, while other variations did not. The buffer flushing code is also vital for sending data to the proxy promptly; otherwise awk would buffer its output, since its stdout is not connected to a terminal.

Lines should be terminated by CR+LF as dictated by the standard. What the awk code does is: if the line starts with "Host: ", replace the whole line with a brand new Host: header with the name of the chosen mirror; if the line starts with "GET ", prepend "http://<mirror name>/" to the path that is specified. All other lines are printed verbatim.
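To see what the awk program does to the bytes, it can be run by hand on a sample request. This is only a sketch: the request path below is invented, the mangle wrapper function is not part of mangle.sh, and falling back to plain awk when gawk is missing is an assumption made just so the example runs on more systems (other awk implementations may treat the trailing CR differently when splitting fields).

```shell
# Use gawk as in the article; fall back to plain awk only to try things out.
AWK=$(command -v gawk || command -v awk)

# Hypothetical wrapper around the same awk program used in mangle.sh.
mangle() {
  "$AWK" -v mirror="$1" '/^Host: /{$0="Host: " mirror "\r"}
  /^GET /{$2="http://" mirror "/" $2}
  {print; fflush("")}'
}

# Invented request; od -c makes the \r and \n at the end of each line visible.
printf 'GET /some/dir/file HTTP/1.0\r\nHost: fake.example.com:4444\r\n\r\n' |
  mangle centos.cict.fr | od -c
```

One thing this makes visible: because the original path already starts with a slash, the rewritten request line contains a double slash after the mirror name ("http://centos.cict.fr//some/dir/file").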

I'm using the CentOS mirror "centos.cict.fr" here, but it's only an example and any mirror can be used, as long as the appropriate CentOS directory for the mirror is used in the installer text box. Here is the list of mirrors from the official page: North America, Europe, other regions. Pick one that is close to you, and note the CentOS directory to use for it.

So in the end here's the complete command to start the socat redirector/mangler:

socat TCP-L:4444,reuseaddr,fork EXEC:"./mangle.sh centos.cict.fr localproxy.example.com 8080"

Since multiple TCP requests are likely to come from the installer, the fork option to socat spawns a child instance to manage each request, while keeping the main instance listening for future requests. Note that the above mechanism will spawn a ridiculous number of processes, so keep this in mind.

Test it

I was surprised that this worked at all. Here are some screenshots:

As said, the "Web site name" should point to the redirector, but the "CentOS directory" should match the real CentOS directory on the real mirror that was specified as argument to mangle.sh.

The installer seems to like it:

In fact, from here the installation can be completed without problems, at least in my tests.

And finally:

We were able to fool the installer.

Socat 2

Socat 2 is still in beta, but according to the documentation, its new address chains feature should make the task described in this article easier.

There is an example on this page demonstrating unidirectional EXEC addresses:

socat - "exec1:gzip % exec1:gunzip | tcp:remotehost:port"

This can be used for our task, but it needs some modification:

Of course we want to edit the stream, not compress it. In the "left to right" direction, we'll apply some mangling code (similar to the one we used with socat 1, but without the socat/netcat part); in the "right to left" direction (replies from the proxy), we can use the special NOP address to pass everything unchanged, as we're not mangling the replies;
We want a listening "server" rather than stdin/stdout;
Multiple distinct TCP streams are likely to come from the installer, so we need to fork a child to service each of them.

So here's a socat 2 version of the CentOS installer proxifier:

socat TCP-L:4444,reuseaddr,fork "EXEC1:./mangle.awk centos.cict.fr % NOP | TCP:localproxy.example.com:8080"

And here's a sample mangle.awk:

#!/usr/bin/gawk -f
# mangle.awk: invoke as "mangle.awk <mirror name>"

BEGIN{mirror = ARGV[1]; ARGC--}
/^Host: /{$0 = "Host: " mirror "\r"}
/^GET /{$2 = "http://" mirror "/" $2}
{print; fflush("")}

or if you prefer Perl:

#!/usr/bin/perl -p
# mangle.pl: invoke as "mangle.pl <mirror name>"

BEGIN{use IO::Handle; autoflush STDOUT; $mirror = $ARGV[0]; shift}
s%^Host: .*%Host: ${mirror}\r%; s%^GET (.*)%GET http://${mirror}/$1%;

From a few tests, the socat 2 version seems to run fine most of the time, but there are occasional hiccups where the installer reports an error and has to be told to retry the operation, after which it usually succeeds (I have not investigated why).

Conclusions

Please note that a huge, HUGE number of processes (thousands) are spawned on the redirector, especially in the last stage of the install, where individual packages are downloaded. This is very resource-intensive for the machine where the redirector runs, so be considerate when choosing where to implement the redirector. Perhaps using netcat instead of another instance of the (heavier) socat in the shell script could be marginally better, but it doesn't change the number of processes that are spawned.
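If you want a rough idea of how many helper processes are alive at a given moment, something like the following can be run on the redirector while the install is in progress. It is only a sketch: top_commands is a made-up helper name, and it assumes a ps that understands the POSIX -e and -o comm= options.

```shell
# Sketch: count running processes per command name, busiest first.
# Reads one command name per line on stdin; $1 = how many rows to show (default 5).
top_commands() {
  sort | uniq -c | sort -rn | head -n "${1:-5}"
}

# During package download, socat and the mangling processes would be
# expected to show up near the top of this list.
ps -e -o comm= | top_commands 5
```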

Also, the hack makes a number of assumptions about the requests and replies exchanged by client and server, and it will most likely break if something deviates from those assumptions (HTTP redirects, to name just one example).
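One way to check the redirect assumption before starting an install is to look at the status line of a reply fetched through the proxy. The following is a sketch, not something the original setup used: report_redirect is a hypothetical helper that reads an HTTP response on stdin and prints the status code and Location header if the reply is a 3xx redirect.

```shell
# Sketch: report a 3xx status and its Location header from an HTTP response on stdin.
# report_redirect is a hypothetical helper name; it prints nothing for other replies.
report_redirect() {
  awk '
    { sub(/\r$/, "") }                                          # drop the trailing CR, if any
    NR == 1 { if ($2 !~ /^3[0-9][0-9]$/) exit; code = $2; next } # status line: only 3xx is interesting
    $0 == "" { exit }                                           # blank line: end of headers
    tolower($1) == "location:" { print code " -> " $2; exit }
  '
}

# Canned reply standing in for what a proxy might send back
printf 'HTTP/1.1 302 Found\r\nLocation: http://other.example.org/path\r\n\r\n' |
  report_redirect
```

It could be fed with the output of a hand-made request sent to the proxy with nc or socat; if it prints nothing, the reply was not a redirect.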

Nevertheless, despite being such a crude hack, it seems to work surprisingly well, at least in the environment where it needed to run. There was not even any need to mangle the replies from the proxy. As usual, YMMV.

Anyway, let's hope that future releases will have native proxy support!


One Comment

komradebob says:

June 28, 2012 at 22:23

Very handy. See here if you have trouble compiling socat2:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=571724
