Whether that's the problem or not, I would definitely appreciate some help in making my connections more "polite." I'm sure that the Python Requests module supports all the basic protocols and header options, but honestly I haven't a clue what I'm doing with this stuff. For example, I have it doing 40 requests at once because that was the only way I could think of to deal with the long delays (often 30 seconds or more) when I request individual log files.
Could you point me towards a useful reference for understanding how to do polite and efficient http requests in high(-ish) volume? It doesn't have to be Python-specific.
Keepalive and pipelining are designed to cut down on the number of TCP round-trips between client and server.
Keepalive is supported by all HTTP/1.1 servers (including goko's), and lets you request multiple files over the same connection, so you don't have to repeat the TCP handshake for each one.
Pipelining is built on top of this, and works by pre-sending a bunch of HTTP requests over the same connection before receiving any response, letting the server queue them all up and process them rapid-fire rather than waiting for your client to make the next request.
These should help if the time it takes to download a log is dominated by round-trip time, but if transferring a single file takes a long time they won't help much. Some poking around with curl suggests to me that with one TCP connection per file, it takes about 0.2s to download a log; with keepalive over 100 log files I can get that down to about 0.1s per log. curl doesn't support pipelining, but if it did I think we'd make it down to 0.05s by halving the number of round trips again. That might be fast enough that being single-threaded is fine, I would think, but you could do it on a small handful of threads if necessary.
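As a rough illustration of the keepalive idea, here is a minimal Python sketch using only the standard library's http.client. The function name fetch_all and the delay parameter are made up for this example, and the host and paths you pass in are up to you. With Requests, a requests.Session reuses connections in a similar way.

```python
import http.client
import time


def fetch_all(host, paths, port=80, delay=0.0):
    """Fetch every path from one host over a single reused TCP connection.

    `delay` is a hypothetical politeness knob: seconds to sleep between
    requests so the server is not hit back-to-back.
    """
    # http.client speaks HTTP/1.1, so the connection stays open between
    # requests unless the server asks to close it.
    conn = http.client.HTTPConnection(host, port, timeout=30)
    results = {}
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # The body has to be read in full before the same connection
            # can carry the next request.
            results[path] = (resp.status, resp.read())
            if delay:
                time.sleep(delay)
    finally:
        conn.close()
    return results
```

Called as something like fetch_all("logs.prod.dominion.makingfun.com", paths, delay=0.1), where paths is your own list of log paths, this pays for the TCP handshake once rather than once per file. It does not do pipelining; each request still waits for the previous response.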
Ok. I haven't used traceroute before though. Does this output mean anything to you?
li566-22> traceroute logs.prod.dominion.makingfun.com
traceroute to logs.prod.dominion.makingfun.com (54.213.198.64), 30 hops max, 60 byte packets
1 23.92.24.3 (23.92.24.3) 0.377 ms 0.522 ms 0.638 ms
2 10ge8-3.core3.fmt2.he.net (64.71.132.137) 0.153 ms 0.143 ms 0.146 ms
3 10ge10-1.core1.sjc2.he.net (184.105.222.14) 12.244 ms 12.202 ms 12.149 ms
4 216.218.193.42 (216.218.193.42) 0.678 ms 0.691 ms 0.656 ms
5 205.251.229.155 (205.251.229.155) 0.678 ms 205.251.229.157 (205.251.229.157) 0.682 ms 205.251.229.155 (205.251.229.155) 0.703 ms
6 205.251.232.68 (205.251.232.68) 30.317 ms 205.251.232.112 (205.251.232.112) 22.153 ms 205.251.232.68 (205.251.232.68) 29.517 ms
7 205.251.232.147 (205.251.232.147) 22.359 ms 205.251.232.153 (205.251.232.153) 22.272 ms 205.251.232.141 (205.251.232.141) 22.262 ms
8 205.251.232.165 (205.251.232.165) 22.997 ms 22.948 ms 205.251.232.63 (205.251.232.63) 22.938 ms
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
Your traceroute is getting as far as 205.251.232.165, which is an Amazon server in Seattle (according to http://www.tcpiputils.com/browse/ip-address/205.251.232.165), and after that the probes are being dropped. Since the server you're requesting is apparently an EC2 server and you made it to Amazon before being ignored, I'd say it's the goko server itself rejecting you.
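If you want to pull the "last hop that answered" out of traceroute output without reading it by eye, something like the following Python sketch would do it. The function name last_responding_hop is made up for this example, and it assumes the Linux traceroute text format shown above, where each responding address appears in parentheses.

```python
import re


def last_responding_hop(traceroute_output):
    """Return (hop_number, [addresses]) for the last hop that replied.

    Returns None if no hop replied. Hops shown as "* * *" carry no
    address in parentheses and are skipped.
    """
    last = None
    for line in traceroute_output.splitlines():
        # Hop lines start with the hop number; header lines do not.
        m = re.match(r"\s*(\d+)\s+(.*)", line)
        if not m:
            continue
        addresses = re.findall(r"\((\d+\.\d+\.\d+\.\d+)\)", m.group(2))
        if addresses:
            last = (int(m.group(1)), addresses)
    return last
```

The last address that replied only tells you where the probes stopped being answered, so it is a starting point for a lookup like the one above rather than proof of who is dropping them.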