I wouldn't worry too much about big executables. Anyone who frets over 1.3MB vs 500KB mustn't have any real problems in their life, I say.

But the underuse of multicore capabilities in most apps is unfortunate, and it's an inevitable consequence of the fact that current multithreading APIs are crap.
I predict that whichever OS wins the next round of the war will be the one that first gets a really good multithreading API, i.e. one usable by mere mortals without having to worry about arcane terminology, deadlocks, race conditions, etc.
Of course Intel or AMD could surprise us nicely by coming up with a CPU architecture that distributes workload among multiple cores automatically for you, but I wouldn't hold my breath for it.