For the first go-round of the Wide Finder project, the idea was to mimic the behavior of the following Ruby program:
counts = {}
counts.default = 0
ARGF.each_line do |line|
if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
counts[$1] += 1
end
end
keys_by_count = counts.keys.sort_by{ |key| -counts[key] }
keys_by_count[0 .. 9].each do |key|
puts "#{counts[key]}: #{key}"
end
It has been pointed out that this could be expressed as a Unix one-liner with awk and sort. Suggestions are welcomed to increase the degree of difficulty and increase the element of computation.
WF2 Benchmark
Based on discussion on this page, I have proposed, and have heard no real objections, that we adopt a suggestion originally entitled "normal HTML report", as follows:
- Report the top 10 URIs, by hits, which are articles (much the same regex as before)
- Report the top 10 URIs of any kind by aggregate byte count downloaded
- Report the top 10 URIs of any kind that returned 404 Not Found
- Report the top 10 client addresses that fetched hit-counted articles
- Report the top 10 referrers for hit-counted articles
Here is a naive implementation in 78 lines of Ruby code, followed by discussion:
class Hash
def top(howmany)
count = 0
keys = []
min = 0
self.each do |key, val|
if val > min
keys << key
keys.sort! do |k1, k2|
diff = self[k2] - self[k1]
(diff != 0) ? diff : k1 <=> k2
end
keys.pop if keys.length > howmany
min = self[keys[-1]]
end
end
keys
end
end
@u_hits = {}
@u_bytes = {}
@s404s = {}
@clients = {}
@refs = {}
@count = 0
@u_hits.default = @u_bytes.default = @s404s.default =
@clients.default = @refs.default = 0
def record(client, u, bytes, ref)
@u_bytes[u] += bytes
@count += 1
if u =~ %r{^/ongoing/When/\d\d\dx/\d\d\d\d/\d\d/\d\d/[^ .]+$}
@u_hits[u] += 1
@clients[client] += 1
unless (ref == '"-"' || ref =~ %r{^\"http://www.tbray.org/ongoing/})
@refs[ref[1 .. -2]] += 1 # lose the quotes
end
end
end
def report(label, hash, shrink = false)
puts "Top #{label}:"
keys_by_count = hash.top(10)
fmt = (shrink) ? " %9.1fM: %s\n" : " %10d: %s\n"
keys_by_count.each do |key|
pkey = (key.length > 60) ? key[0 .. 59] + "..." : key
hash[key] = hash[key] / (1024.0 * 1024.0) if shrink
printf fmt, hash[key], pkey
end
puts
end
ARGF.each_line do |line|
f = line.split(/\s+/)
next unless f[5] == '"GET'
client, u, status, bytes, ref = f[0], f[6], f[8], f[9], f[10]
if status == '200'
record(client, u, bytes.to_i, ref)
elsif status == '304'
record(client, u, 0, ref)
elsif status == '404'
@s404s[u] += 1
end
end
print "#{@u_hits.size} resources, #{@s404s.size} 404s, #{@clients.size} clients\n\n"
report('URIs by hit', @u_hits)
report('URIs by bytes', @u_bytes, true)
report('404s', @s404s)
report('client addresses', @clients)
report('referrers', @refs)
Here's my thinking as to why this is appropriate. The Wide Finder challenge, as I see it, is to find a way to run crushingly ordinary, boring code on many-core machines with good performance and with minimal tax on the programmer. This sample strikes me as crushingly ordinary, but not so simple that you can do it with a one-liner; it should remain substantially but not exclusively I/O dominated. There is a wrinkle; I'm pretty sure some of the numbers will get out of 32-bit integer range.
(Saved) Proposed-test Discussion:
- compute the average articles and bytes downloaded per hour-of-day or day-of-week
- classify fetches and bandwidth devoted to (a) RSS/Atom syndication fees, (b) articles, (c) CSS/Javascript, (d) images
- fit curves modeling days since post versus number of hits for an article; generate curves for each category; report top categories
- generate an interpage navigation graph where nodes are pages and the edges are labelled with the number of times a link was followed from one page to another
- calculate session statistics, e.g. duration, pages per session, time between return visits, ...
- A normal HTML report: Top 10 popular URLs, top 10 404s, top 10 agents, top 10 client IPs (by
hits or bytes), and the top 10 referrers. - (your suggestion here)
Generally it would be good to have one problem that is easily partitioned across processes/machines vs one that has interdependencies between lines e.g. navigation graph above
Performance Measures:
- Elapsed time to complete test
- CPU time consumed during test
- Peak memory consumption
Complexity Measures:
- LOCS
- Number of required runtime parameters (e.g. specifying the number of threads/processes to spawn)
- Usage of non-idiomatic constructs (e.g. using a byte array instead of a built-in string class)
- Subjective assessment of complexity (poll?)
Robustness Measures:
- Maturity of underlying language and runtime
- Maturity of underlying libraries (e.g. production library, experimental library, library developed just for Wide Finder)
- Handling of non-ASCII character sets
- Whether the solution is maintainable code e.g. parsing and iteration code can be refactored into a common module for all benchmarks.
Comments (15)
May 03, 2008
znmeb says:
It's not at all clear to me from either of the two Wide Finder blog announcement...It's not at all clear to me from either of the two Wide Finder blog announcements or this wiki exactly what the goal is here. Is it
1. Compare language performance in a multi-thread/core environment?
2. Develop (massively) parallel algorithms for the log file analysis problem?
3. Create a benchmark that measures something that other benchmarks don't already measure?
4. Something else?
May 03, 2008
MenTaLguY says:
Obviously Tim has the best idea what he intends, but my impression was that it's...Obviously Tim has the best idea what he intends, but my impression was that it's something else: is it possible, with the current state of the art, to write massively parallel algorithms for a given sample problem in a way which is not substantially more complex than the sequential equivalent?
Performance enters into it (although I think performance may have become a distraction to some extent last time), but given reasonable performance I think the real test is the cognitive load on the programmer. (Of course then we get into issues of unfamiliarity versus actual complexity and so on...)
The choice of the sample problem itself is going to be interesting -- we don't really want something that is either embarassingly parallel or altogether unparallelizable, but instead something in the middle. (Nearly all of the ideas suggested so far lend themselves quite well to map/reduce style approaches).
May 03, 2008
znmeb says:
Maybe it would help if I explained my background and perspective a bit. From 198...Maybe it would help if I explained my background and perspective a bit. From 1980 through 1990, I worked for a company called Floating Point Systems. Our products were attached processors and later mini-supercomputers. During that decade, there was an explosion of interest in high-performance computing, vector and parallel architectures, languages, algorithms, SIMD, MIMD, RISC, SuperScalar, Dataflow, single-assignment languages, long instruction word and very long instruction word, software pipelining, etc.
In short, many thousands of person-decades of effort went into all of this. Much of it was published, and a few general principles emerged. Some of the people who were leaders back then are still in high-performance computing, but I suspect the majority of them, like myself, moved on to other things because the market disappeared. And it disappeared precisely because general-purpose computers, especially the Intel i386 and its descendants, delivered the best performance per dollar for everything but the largest scientific challenge problems.
"is it possible, with the current state of the art, to write massively parallel algorithms for a given sample problem in a way which is not substantially more complex than the sequential equivalent?"
Hell, yes! Been there, done that, got laid off!
"but given reasonable performance I think the real test is the cognitive load on the programmer."
Well, I think the real challenge was and still is to take the cognitive load off of the programmer and put it on the compiler, interpreter, virtual machine and operating system. But how do you benchmark that?
May 03, 2008
pbannister says:
Glad "znmeb" hit the relevant points. My experience is limited to reading the p...Glad "znmeb" hit the relevant points. My experience is limited to reading the published "high performance computing" papers in the 1980's. My impression at the time was - despite all the man-years spent - that "easy to use" parallelism was just not possible as a general thing. There were niche solutions, but no magic general purpose solutions.
Of course - 1990 would predate the use of massive arrays of x86 processors and the cheap/effective multi-CPU chips we are seeing now. The problem so popular then has reason to be popular again. Not that there is any reason to expect that the problem is in any way simpler this time around.
Well ... actually that last statement is partly wrong. Web server workloads parallelize well - mostly single-threaded scripts/code with the hard bits (synchronization) off-loaded to the database and the web server.
Tim Bray is offering a valid question. Outside the mostly-solved problem of web serving, are there solutions for other somewhat general purpose problems that make good use of the current and upcoming crop of multi-CPU chips? Picking a good example problem is going to be tricky.
As to "could be expressed as a Unix one-liner with awk and sort" - do not so quickly dismiss this sort of solution. The simple/ancient Bourne shell command line with piped together processes gets a benefit off multiple CPUs. The more steps and the more evenly processing is distributed, the greater the benefit. Fits the definition of "general purpose" fairly well.
The amusing bit - Perl might lose a bit of it's performance standing. The original purpose of Perl was to mash together the capabilities of grep/awk/sed/etc, and gain a bit of performance by doing all in a single process (a pretty brilliant and useful notion). But from this point forward - with two or four CPUs on common desktops - we might get better performance with Grep and Awk in the pipeline.
May 04, 2008
russellc says:
A test suite for correctness is required IMO. Maybe their was one (test suite) f...A test suite for correctness is required IMO. Maybe their was one (test suite) for v1 and I missed it as all I saw was an example input file that did not specify a corresponding "correct" output. For widefinder v1 I think it would have been enough to specify inputs and outputs with a few edge cases (i.e. no test frameworks necessary) don't know about v2 depends on the benchmark selected I guess.
I didn't see any mention above about correctness except for "Handling of non-ASCII character sets". handling of non-ASCII character sets would increase the verification load considerably?
May 04, 2008
ErikEngbrecht says:
re: Test Suite I agree that there should be something akin to a test suite alon...re: Test Suite
I agree that there should be something akin to a test suite along with a standard method of invoking a submission. AFAIK the outputs were never explicitly specified for WFv1...more like the Ruby version served as a reference implementation. Something like dumping stdout to a file and then diffing it with the correct results should be sufficient for checking correctness.
re: non-ASCII character sets
I stuck that in there for a couple reasons. One is that non-ASCII is important. The other is that languages that natively support Unicode, like Java, are subject to a significant performance penalty because they have to decode the file instead of just reading it into memory.
I don't think the Unicode tests have to be very extensive. Just enough to answer a yes/no question about whether it works. Some people don't care about Unicode. So do. I just think it should always be explicit what you give up in order to gain something - which in this case is giving up Unicode support in order to gain speed.
May 04, 2008
arankine says:
Seconded about unicode. I think UTF-8 processing would be a useful test, and Wik...Seconded about unicode. I think UTF-8 processing would be a useful test, and Wikipedia would be a good source of UTF-8 test data.
May 05, 2008
russellc says:
re: Unicode test Are there any assumptions about environment. would ruby 1.9 be...re: Unicode test
Are there any assumptions about environment. would ruby 1.9 be available for example?
What about the use of third party libraries/frameworks etc. e.g. message queue? I don't think anything like that was employed in v1 but I could be wrong. v2 might be different because the benchmark might be designed inherently stateful so it's not (trivially at least) map/reduce?
May 05, 2008
ErikEngbrecht says:
re: environmental assumptions I think that is forming. I think people will be ...re: environmental assumptions
I think that is forming. I think people will be able to use whatever they want, so long as it works on Solaris/Sparc and doesn't require root priviledges.
See the infrastructure page: http://wikis.sun.com/display/WideFinder/Infrastructure
Jun 14, 2008
msylvan says:
Seems to me that languages with support for both Unicode and more low-level byte...Seems to me that languages with support for both Unicode and more low-level byte arrays ought to be ahead in performance – in which case, it's surprising that the Java-based solutions (Java, Scala, Fan) are doing rather well (slower than ML and Awk, faster than Python and .. C++/Perl).
Would be interesting to see a Python/Cython solution. Ease of writing Python without the speed penalty. Of course, Python's threading support is weak at best..
Jun 10, 2008
_su.root says:
So, I think that requirements are in order. Must the application/script utilize...So, I think that requirements are in order. Must the application/script utilize regex's or can we just do character compares? Also along that line, can it be a compiled application or script?
Also, will having more than one person running their applications/scripts at once cause skewed results for each person. For example, the difference between one person running their test versus 2 or more at the same time.
Jun 14, 2008
msylvan says:
Seems like the best course of action is for the judges (Tim?) to set certain tim...Seems like the best course of action is for the judges (Tim?) to set certain times during which all contestants are blocked from using the machine, during which each program can be tested sequentially.
Jun 15, 2008
pbannister says:
Tim setup a Google Group to coordinate use of the test machine: http://groups....Tim setup a Google Group to coordinate use of the test machine:
http://groups.google.com/group/wide-finder/
It works.
Aug 08, 2008
connectivelogic says:
are the published benchmarks for processing the 100K test file or for something ...are the published benchmarks for processing the 100K test file or for something larger? There is some mention of 45GB of data but I can't believe the results are for processing that!!
Dec 11
finn.h says:
In addition to the Ruby benchmark proposed, I think two very useful pieces of da...In addition to the Ruby benchmark proposed, I think two very useful pieces of data would be:
1. time cat all45gb/* > /dev/null2. time awk '{ print $1 }' all45gb/* > /dev/nullThese would give lower-bound benchmarks for (1) reading the data off disk, and (2) reading-and-splitting-by-newlines.
Yes, these are sequential reads, but given that the data is on a big RAID array I doubt that parallel reads would give any benefit.