Why does a Music Server require high processing CPU power?
I noticed that some music servers use, for example, dual multicore CPUs running under a custom-assembled operating system. In addition, the server is powered by a linear power supply with choke regulation and a large capacitor bank using the highest audiophile-grade capacitors. Various other music servers have similarly high CPU processing capability.
I know that music is played in real time, so there is not much time to do any large amount of processing. I also know that the data stream needs to be free of jitter and all other forms of extra noise and distortion. I believe that inputs and outputs are happening at the same time (I think).
I also know that Music Servers need to support File Formats of FLAC, ALAC, WAV, AIFF, MP3, AAC, OGG, WMA, WMA-L, DSF, DFF, Native Sampling Rates of 44.1kHz, 48kHz, 88.2kHz, 96kHz, 176.4kHz, 192kHz, 352.8kHz, 384kHz, 705.6kHz, and 768kHz, and DSD formats of DSD64, DSD128, DSD256 and DSD512, including Bit Depths of 16 and 24.
Why does a music server require high processing power? Does the list above of supported formats etc. require high processing power? Assuming the Music Server is not a DAC, or a pre-amp, what is going on that requires this much processing power?
What processing is going on in a music server? How much processing power does a music server require?
FLAC, ALAC, WAV, AIFF, MP3, AAC, OGG, WMA, WMA-L, DSF, DFF, Native
Sampling Rates of 44.1kHz, 48kHz, 88.2kHz, 96kHz, 176.4kHz, 192kHz,
352.8kHz, 384kHz, 705.6kHz, and 768kHz and DSD formats of DSD64, DSD128,
DSD256 and DSD512 including Bit Depths of 16 and 24.
It takes a tremendous amount of processing power to compute all the different ways digital has tried to keep up with analog.
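For perspective on the raw numbers behind that list, the uncompressed data rates are easy to work out. A quick sketch (stereo assumed; Python is used purely for the arithmetic):

```python
# Raw (uncompressed) stereo PCM data rates for the sample rates quoted
# above. This is just arithmetic: sample rate * bit depth * channels.

CHANNELS = 2  # stereo assumed

def pcm_rate_mbps(sample_rate_hz, bit_depth):
    """Raw PCM bit rate in megabits per second."""
    return sample_rate_hz * bit_depth * CHANNELS / 1_000_000

for rate in (44_100, 192_000, 768_000):
    for depth in (16, 24):
        print(f"{rate / 1000:g} kHz / {depth}-bit: "
              f"{pcm_rate_mbps(rate, depth):.3f} Mbit/s")
```

Even the most extreme case listed (768kHz / 24-bit stereo) is under 40 Mbit/s of raw data, which is modest by modern hardware standards.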
@cleeds, My Aurender N10 uses an FPGA-based All-Digital Phase-Locked Loop system. This is "An All Digital Phase-Locked Loop system (ADPLL) incorporating Field-Programmable Gate Arrays (FPGA) with OCXO clocks precisely times digital audio data transmissions and minimizes jitter to below negligible levels".
I've never had a problem with using lower power computers as music servers. The only reason I know that you might want a more powerful computer is if you are doing room correction or equalization, or if you are actually editing digital music files -- say doing a mix-down of multiple tracks with equalization or other digital effects. Not too many people do that for playback on their home system.
One other reason might be if you are asking the computer to do multiple tasks during playback, with those other tasks having nothing to do with music.
Curious where you got the idea that an unusually powerful machine was needed.
In thinking about it, the choice of the underlying PC for a branded music server probably has more to do with marketing than anything else. If you have a maker of audiophile equipment that is going to get a couple of grand for the product, you might as well put in a more powerful PC. That'll impress some buyers whether the power is needed or not.
Having lots of CPU processing power is not necessarily the same as having multiple cores.
Scheduling work on a CPU is complicated, and desktop OSes are not designed to be real-time (RTOS) or to guarantee the latency between the time something is requested and the time it's processed.
Having multiple cores helps if you can guarantee one core for streaming and leave the other tasks, like waiting for a user event or indexing music, to the rest.
In terms of things that actually consume processing power, upsampling and equalization are the main two, and the cost varies with the type of upsampling and the complexity of the EQ.
Simple parametric EQs are usually benign, while room correction and convolution can really soak up processing time. Upsampling to DSD also seems to be (based on Roon) a big CPU consumer.
I use an AMD A10 processor with 4 or 5 PEQs and filters, upsampling PCM by 2x, and the CPU load is really light. It's overkill, even for an 8-year-old CPU: it uses about 5% of one thread, though upsampling to DSD will use nearly 75% or more of it.
Of course, MQA decoding will also add to this a little.
So, it depends, but CPU power is cheap these days, and it's easier to design a system with guaranteed latency if you have multiple cores than if you don't.
Lastly, it's important to note that the CPU must do many things at once: reading and decompressing data from a file or external streaming source, while also feeding that data to the metronome of the output signal. More cores help facilitate this too, but the total processing power, that is, the amount of computation that must occur on the chip, may not actually be all that much.
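That "many things at once" arrangement is essentially a producer/consumer pair: one thread decodes, another drains a bounded buffer at the output's pace. A minimal Python sketch (the buffer size and the integer "samples" are made up for illustration):

```python
import queue
import threading

# Toy producer/consumer: a "decoder" thread fills a bounded buffer and
# an "output" thread drains it. The bounded queue keeps the two sides
# in step, much like an audio ring buffer between codec and output.

buffer = queue.Queue(maxsize=64)   # bounded, like a small audio buffer
N_SAMPLES = 1000
consumed = []

def decoder():
    for i in range(N_SAMPLES):
        buffer.put(i)              # stand-in for a decoded PCM sample;
                                   # blocks if the buffer is full
    buffer.put(None)               # end-of-stream marker

def output():
    while True:
        sample = buffer.get()      # blocks if the buffer is empty
        if sample is None:
            break
        consumed.append(sample)

t1 = threading.Thread(target=decoder)
t2 = threading.Thread(target=output)
t1.start(); t2.start()
t1.join(); t2.join()
print(len(consumed))  # 1000
```

The key design point is the bounded queue: a fast decoder simply blocks when the buffer is full, so neither side needs to know the other's speed.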
Thanks for your thoughtful response, Eric. You are correct to point out that numerous cores do not necessarily correlate to total processing power. Nearly all the CPU-intensive activities you cite (except for equalization) are the responsibility of the DAC. I’d also like to point out that the latency issue you raised does not affect sound quality but the time it takes after pressing Play until the music starts. So the question still stands: why so much processing power for a streamer?
I think @misstl may have identified the real reason: “That'll impress some buyers whether the power is needed or not.“ it’s not needed except to justify a high price, and to differentiate your product from the competition.
@hgeifman Anecdotally, I can tell you the best sound quality I have yet experienced from Roon was when I built a Roon ROCK off the spec for the Nucleus Plus (8th-generation Intel i7, 1TB SSD, 16GB RAM, and a 256GB internal SSD that hosts the OS). I came from an Innuos Zenith MKIII after extensive experimentation. Roon Core running on a stripped-down Linux kernel, well, rocks! Incredible fidelity. Roon boots in less than 3 seconds, and when browsing and using the UI, the perceived response is blazing fast. Love it!!
If you're using Roon and have a lot of CDs ripped to a hard drive and use Roon's upsampling and digital filters, they recommend at least an i5 or i7 or compatible. The notion of dual Xeons and 48 gigs of RAM escapes me unless you're doing a lot more than hosting a home-based music server. Most dedicated streamers like a Node 2i do fine with ARM processors.
@djones51 I read a lot in the Roon forums, and what you are saying is consistent with that. I do no DSP and no upsampling. I've experienced better sound quality from Qobuz and Tidal through Roon running on the ROCK now too, not just from DSD64 albums on the internal SSD. Overall my library is on the small side, because I can't hear any difference between pulling from the cloud and pulling the same FLAC off the internal SSD.
I assembled my own NUC ROCK with an i5 and stream Qobuz and my ripped CDs through it. I agree it's the best sound quality I've had as well. I've messed with some of the DSP settings and upsampling, but I always end up back at the original settings.
“It takes a tremendous amount of processing power to compute all the different ways digital has tried to keep up with analog“
It’s certainly a lot less of a hassle compared to putting together a system with vinyl as the only source of music. Check out this system; this guy went through an awful lot of trouble to spin his vinyl without pops, dust, and clicks: https://systems.audiogon.com/systems/8367
Nearly all the CPU-intensive activities you cite (except for equalization) are the responsibility of the DAC.
Um, what?? It depends. In my world, a DAC takes S/PDIF in, or USB in, and produces an analog signal out.
While upsampling may be done by the DAC, file filtering and streaming are not. Many manufacturers choose to implement their own upsampling algorithms before the DAC.
I’d also like to point out that the latency issue you raised does not affect sound quality but the time it takes after pressing Play until the music starts.
You misunderstand how I meant latency. I did not mean latency to user input; I meant the time between a CPU starting to process a sample and the time it is done with it, also known as CPU time. Any processing takes CPU time, and we can estimate the worst-case boundaries. For a 44.1kHz signal, the CPU must complete all of its work in 1/44,100 of a second and have that sample of data ready for the DAC. If the signal has a higher sample rate, or is being upsampled, the window shrinks: the CPU must complete its work in 1/88,200 of a second for an 88.2kHz sample/upsample.
That’s about 11 microseconds, the absolute maximum amount of time the CPU is allowed to take, or else the output will not keep up with the input. In that window it must do at least the format conversion, EQ, and upsampling. This is in addition to any network housekeeping and UI interactions, and in addition to responding to the DAC clock saying "Gimme the next one!"
So, if a CPU core takes EXACTLY 11 microseconds to process this sample, it has ZERO time to process anything else. It has no time for network housekeeping, library management, or UI responses. Any additional work would get queued and the audio output would stall. If the CPU instead takes 5.5 microseconds, the core is only 50% utilized and has processing power left over to schedule other work.
It’s important to stress that you must have input/output flow balance: the rate of processing must be equal to or faster than the rate of output. You can’t make up for a shortfall with a longer delay before a track starts or with larger buffers, unless they are infinitely large. Imagine a CPU that takes 22 microseconds to process each sample of an 88.2kHz stream, and a track that is 5 minutes long. The CPU won’t finish the last sample until 10 minutes in. You’d need a five-minute buffer: you press play, wait five minutes until the buffer fills, and then the music starts. Ten minutes after you hit play, the 5-minute track finally finishes playing.
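The flow-balance point above can be checked with simple arithmetic: if each sample costs more CPU time than one sample period, the total processing time exceeds the track length. A quick sketch using the numbers from that example:

```python
# Flow balance: if per-sample CPU time exceeds the sample period,
# playback cannot keep up, no matter how large the buffer is.

def total_cpu_minutes(track_seconds, sample_rate_hz, cpu_time_per_sample_s):
    """CPU time (in minutes) needed to process every sample of a track."""
    n_samples = track_seconds * sample_rate_hz
    return n_samples * cpu_time_per_sample_s / 60

sample_rate = 88_200                    # Hz
period_us = 1 / sample_rate * 1e6      # ~11.3 microseconds per sample

# 22 microseconds per sample is roughly twice the sample period, so a
# 5-minute track needs roughly 10 minutes of CPU time.
print(total_cpu_minutes(5 * 60, sample_rate, 22e-6))  # ~9.7 minutes
```

(22 µs is slightly less than exactly twice the 11.34 µs period, which is why the result comes out just under 10 minutes.)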
Lastly, we should of course note that we are talking about a stereo sample. That is, at 88.2 kHz we must complete the decompression, EQ and upsampling of 2 samples every 11 microseconds.
So, if you ask me, how much CPU power do you need?
Well, you need enough CPU power to completely process every sample in real time, plus handle all the additional work.
We can calculate the maximum allowed compute time this way:
seconds = 1 / sample rate
So, you must use a CPU capable of meeting this, AND still have enough power to handle the other events that are happening in semi-real time. This calculation provides the minimum boundary.
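That minimum boundary is trivial to compute for the sample rates discussed in this thread; a sketch:

```python
# Per-sample time budget: the CPU must finish all real-time work on a
# sample before the next one is due. The budget is just the reciprocal
# of the sample rate.

def budget_microseconds(sample_rate_hz):
    """Maximum CPU time per sample, in microseconds."""
    return 1 / sample_rate_hz * 1_000_000

for rate in (44_100, 88_200, 192_000, 768_000):
    print(f"{rate / 1000:g} kHz -> {budget_microseconds(rate):.2f} us per sample")
```

Doubling the sample rate halves the budget, which is why upsampling to very high rates is where CPU demand actually appears.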
Of course, for many reasons, having the CPU at 100% is not a good thing in an interactive or real-time OS, so a target of 50% CPU utilization in an embedded CPU is reasonable. The amount of CPU power, if we think of this in terms of MIPS or SPECints or FLOPS, is therefore dependent on the work that must be done in real time. Do less work and you can get a cheaper, slower CPU.
I don’t really think it’s "a lot" either. I mean, we live at a time when an 8GByte, 4-core, 64-bit ARM-based Raspberry Pi sells for $120 and is fully capable of serving as a desktop PC. It is important, however, to note that unless you know the architecture, clock speed, cores, and threads, you don’t really know much about "how much" compute power you have. You can’t say "well, I have 4 cores, so I can run Grand Theft Auto." They’re all small, rectangular, and have a lot of pins on them. That doesn’t make them interchangeable in terms of the work they can do per unit time.
For instance, your average router has a pretty beefy multicore Broadcom system-on-a-chip (including CPU) in it too, and all it’s doing is moving packets around. Is it "too much"? I doubt it.
@lalitk, Sorry, I do not understand and need more clarification.
Let us assume a normal "single-purpose" music streamer (no Roon, DAC, or preamp) that needs to support an input stream of albums with File Formats of FLAC, ALAC, WAV, AIFF, MP3, AAC, OGG, WMA, WMA-L, DSF, DFF, Native Sampling Rates of 44.1kHz, 48kHz, 88.2kHz, 96kHz, 176.4kHz, 192kHz, 352.8kHz, 384kHz, 705.6kHz, and 768kHz, and DSD formats of DSD64, DSD128, DSD256 and DSD512, including Bit Depths of 16 and 24. I agree this is a lot of information.
What processing is the streamer doing on these files, for example, an AIFF coded album stream?
FIRST, I know the streamer needs to prepare the input for output over USB, AES/EBU, etc. (I think). Does this process require a high amount of processing power (probably yes)?
SECOND, in addition, what does the streamer do to the streaming input to prepare it to be accepted by the DAC? Let us assume an AIFF-coded album. What exactly is involved in preparing the AIFF album file (or any of the other formats) for input to the DAC, and does this require a large amount of processing power (probably yes)?
Your post above (and someone else's) said "It takes a tremendous amount of processing power to compute all the different ways digital has tried to keep up with analog". I think you are saying that in order to support the large number of input formats, a high amount of processing power is required to prepare the AIFF-coded album for output to the DAC. Is this correct?
Assuming the above is true, it means the server's processing power is used to prepare the input data stream with the correct formatting for USB, AES/EBU, etc., AND then to format it, as needed, for input to the DAC. Based on my understanding, I think this sentence answers my original question. Am I correct?
Or, are there other processes going on in the streamer that requires high processing power? If yes, what are they? Thank you very much.
To get back to the OP's question: there is nothing going on in a dedicated home-based music server to necessitate dual 20-core Xeon processors and 48GB of RAM unless you're starting your own streaming service. A NAS can run a music server; perhaps not Roon, but Plex or something similar.
Roon may overestimate its CPU requirements, mostly because they want to specify a CPU that is capable of doing all the upsampling you might ask of it.
I’m running an 8-year-old AMD A10 with no issues, but if I attempt high-rate DSD upsampling it won’t keep up. That also sounds bad IMHO, so I lose nothing.
Also, Roon does all the EQ and upsampling in the server, so you need to account for that for every DAC endpoint. But overall, I think it’s pretty light. I’ve also seen Logitech Media Server run on routers; that’s extremely lightweight.
Let us assume an AIFF Coded album. What exactly is involved in preparing the AIFF Album file (or any of the other formats) for inputting to the DAC and does this require large amounts of processing power (probably yes)
Think of audio compression like a zip file: the file has already been compressed, the codec uncompresses it, and then software and other firmware in the device send it on to the DAC. Unless some sort of manipulation of the file is being done, like upsampling or EQ, it's not a very CPU-intensive operation.
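The zip-file analogy can be made concrete with Python's built-in zlib (this is not the FLAC codec itself, just an illustration that a lossless round-trip is cheap and bit-exact):

```python
import zlib

# Lossless compression round-trip: the decoded bytes are bit-identical
# to the original, and decompression is a lightweight operation.

original = bytes(range(256)) * 1000      # stand-in for raw PCM data
packed = zlib.compress(original, level=9)
unpacked = zlib.decompress(packed)

print(unpacked == original)              # True: nothing is lost
print(len(packed) < len(original))       # True: it really did shrink
```

Lossless audio codecs like FLAC work on the same principle: the decoder reproduces the original samples exactly, and decoding is far cheaper than encoding.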
Here, take a look at the GNU gzip code: https://savannah.gnu.org/projects/gzip/ That may help you understand. Every instruction takes CPU time; the faster the CPU, the less time each instruction takes.
I believe compression was an answer in the 1980s, when CPU power and storage space were at a premium. I think the bottleneck now is bandwidth. I could just as easily store and stream uncompressed WAV on a home network, but compression is still advantageous over the internet. CPUs have been fast enough to deal with audio compression for a long time now.
Or, are there other processes going on in the streamer that requires high processing power? If yes, what are they? Thank you very much
There’s nothing going on that requires what we would consider, in today’s terms, high processing power. I use a Raspberry Pi 4 as a Roon Bridge from a NUC Roon server; none of this uses high processing power. Nothing in a basic home server-client relationship requires the kind of processing you’ve been talking about: dual Xeons and megagigs of memory.
With respect to the OP’s citation of one such music server which employs dual Xeon 10-core processors (the Taiko Audio SGM Extreme), it turns out that device is also running Roon and the JPlay software suite. That likely explains why they require the additional CPU power.
The question was about servers not streamers (clients to the servers).
This discussion has gone all over the place. The bottom line is that servers do a lot of things so clients can be “thinner” and less expensive.
The whole internet works this way, and music servers are just one kind of server. If you check out the servers at Spotify or Tidal serving many clients concurrently, guess what? They use a lot of computer resources to provide their services, which include all those nice features that make the core streaming function more powerful and user-friendly, even on a relatively inexpensive tablet or smartphone.
Maybe I was mistaken, but I thought we were talking about a basic home music server, not enterprise solutions. You can use a few-years-old computer, a NAS, or a Raspberry Pi 4 with a hard drive attached to set up a music server for home use. 40 cores of Xeon processors and 48 gigs of RAM is a good start on an enterprise SQL database for a mid-sized company.
You ask a very valid question. All digital music servers are computers; do they need super-high computing power for music?
I currently use a fanless PC with a fanless linear power supply.
I’m running a low-powered, commercial-grade 4-core Intel motherboard with the CPU integrated (8 watts, from memory a 2GHz CPU) in a fanless Streacom case. The case has copper heat pipes and a copper heat sink. The case also has fins on the sides of the aluminum body for higher surface area to radiate heat out.
My HDPlex 200W linear power supply has Linear Technology LT3045 voltage regulators on all four power rails. This is a highly effective way to get ample clean, regulated DC power for the 3 runs to my computer server: 19VDC to the mainboard/CPU, 5VDC to my JCat FEMTO USB card, and another 5VDC just powering the two SSD drives (again, no moving parts). I use a Samsung 850 Pro 256GB for my operating system and all programs, and a Samsung 860 QVO 1TB with only media files: uncompressed music (mostly .wav and very few FLAC), with my CDs all stored in folders with images.
With the hardware sorted out, the next thing I needed was the program(s) to make the hardware work at its optimum for the tasks it was built for.
Windows Server 2012 R2 can be run in core mode and can be switched between the GUI and core. The GUI, or graphical user interface, uses computing power. My goal is to pass bits of data to my external DAC without losing any data, while doing so with the least amount of electrical noise. Jitter is the enemy!
I currently use 4 different programs in conjunction with WinServer 2012 R2 to achieve this.
Fidelizer Pro - this program assigns CPU cores to specific programs; for example, the operating system gets its own core in the CPU.
Audio Optimizer 3.0 - this program is used to put the operating system in core mode and shuts off over 300 running background processes in Windows including the GUI.
JPlay FEMTO server - this program is a complete network music software package that converts a PC into a Music Server and Renderer.
MinorityClean.exe - a new piece of software for Windows released by Hiroyuki Yokota of Bughead fame. Its purpose is to "Match the electronic bit standard of the CPU register with the internal circuit standard of the memory chip."
These programs all add up in cost, but together they transform my acceptable-sounding system into one of detail and clarity; the vanishing of the speakers and the "air" that are often mentioned become apparent in my system.
So no, you do not need high power. In fact, the lower the voltage and the fewer the running services, the closer one can get to adequately serving the data stream with as little noise and as correct timing as possible.
I mentioned FEMTO earlier; it refers to the clock being accurate to 15 decimal places. Timing in music reproduction is everything to sound quality, just as a low noise floor is.
The same principles apply to ANY music streamer: as linear a power supply as possible, as little electronic noise (jitter) as possible, and the most accurately timed data stream possible.
Music servers do not pass literal ones and zeros through a wire. Whether the signal is carried by electricity, light, or some other means, it is transferred by switching between on and off.
The less electronic noise there is, the more effective the measuring, reading, and transferring of the encoded data. It's an analog means of transferring data as a series of zeros and ones (off and on) which can be deciphered into logical and meaningful information.
To analogize: jitter or electrical noise mixed with the actual data stream can cause incorrect bits in the data. Without getting into electrical noise and its mitigation, it's not unlike two people giving instructions when only one of them is correct, and you can't tell them apart.
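That on/off signalling can be illustrated with a toy model: a receiver recovers bits by thresholding a noisy analog level, so moderate noise changes nothing, while excessive noise starts flipping bits. A sketch (voltage levels and noise amplitudes are made up for illustration):

```python
import random

# Toy model of "bits over an analog medium": send 0 as 0.0 V and 1 as
# 1.0 V, add random noise, and recover each bit by thresholding at
# 0.5 V. Noise below the threshold margin changes nothing; noise
# beyond it starts flipping bits.

random.seed(42)
bits = [random.randint(0, 1) for _ in range(10_000)]

def transmit(bit, noise_amplitude):
    """Ideal level plus uniform noise, a stand-in for a real wire."""
    level = 1.0 if bit else 0.0
    return level + random.uniform(-noise_amplitude, noise_amplitude)

def bit_errors(noise_amplitude):
    """Count bits misread after thresholding the noisy levels."""
    received = [1 if transmit(b, noise_amplitude) > 0.5 else 0 for b in bits]
    return sum(r != b for r, b in zip(received, bits))

print(bit_errors(0.4))  # 0: noise stays inside the 0.5 V margin
print(bit_errors(0.8))  # nonzero: noise crosses the threshold
```

This is also why digital links are robust up to a point and then fail: below the margin, the data is recovered perfectly; beyond it, errors appear.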
A Field-Programmable Gate Array is a bit like a configurable integrated circuit. It's not set in stone like an IC, where the function is fixed. FPGAs are often used to prototype and test a design before an IC is created from the FPGA prototype.
@djones51: I just got a HiFiBerry Digi+ for my RPi3; hoping to configure this soon. Did you have any setup issues you can share? I'm setting up another Roon endpoint system in the house and had an RPi lying around. TIA! -Dave
+1 to those who say the best they have heard digital sound is through a dedicated Roon core--in particular the Intel NUC running Roon ROCK.
This blew me away. I've always streamed Tidal to a Bluesound Node and shot the digital algorithm out to a DAC. Well, going Roon ROCK (Intel NUC) right into the DAC is like buying a real upgrade! No foolin'. This is the best $500 I've spent.
@jbhiller absolutely! Roon ROCK has been incredible: 100% stable, no variation in SQ, no performance issues, and a lagless UI experience. Best of all, the SQ is crystalline pure. I now never use the MConnect/UPnP option on my ethernet DAC. Roon is fantastic! I can only hope that future software updates retain the excellent SQ.
For another system, I finally set up my RPi3 with a HiFiBerry Digi+ to use as a WiFi (or hardwired) Roon endpoint. Majorly impressed with this hardware and configuration compared to the internet browser.