I have been writing this article since March 17th; I found it in my “not published” tray, so I decided to finish it today. There are datacenters and DATACENTERS. What is a datacenter? Data center infrastructure layers include power, cooling, telecom, data rooms, and the network operations center. In May 2008, Jeff Dean spoke at the Google I/O conference, highlighting some of the inner workings of Google’s datacenters and its ambitious plans. With 36 datacenters around the world in 2008 and over 200,000 servers, that is a lot.
Here is some information from Wikipedia on Google’s datacenters.
The original hardware (circa 1998) that was used by Google when it was located at Stanford University included:
- Sun Ultra II with dual 200 MHz processors, and 256 MB of RAM. This was the main machine for the original Backrub system.
- 2 × 300 MHz dual Pentium II servers donated by Intel; together they included 512 MB of RAM and 9 × 9 GB hard drives. It was on these that the main search ran.
- F50 IBM RS/6000 donated by IBM, included 4 processors, 512 MB of memory and 8 × 9 GB hard drives.
- Two additional boxes included 3 × 9 GB hard drives and 6 × 4 GB hard drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
- IBM disk expansion box with another 8 × 9 GB hard drives donated by IBM.
- Homemade disk box which contained 10 × 9 GB SCSI hard drives.
Servers are commodity-class x86 PCs running customized versions of Linux. The goal is to purchase CPU generations that offer the best performance per dollar, not absolute performance. Estimates of the power required for over 450,000 servers range upwards of 20 megawatts, which cost on the order of US$2 million per month in electricity charges. The combined processing power of these servers might reach from 20 to 100 petaflops.
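As a rough sanity check on the figures above, here is my own back-of-the-envelope arithmetic (not Google’s numbers; the real per-server draw would also be inflated by cooling and power-distribution overhead):

```python
# Back-of-the-envelope check of the power figures quoted above.
servers = 450_000
total_mw = 20                      # megawatts, the estimate cited
monthly_bill = 2_000_000           # US$, the estimate cited

watts_per_server = total_mw * 1e6 / servers
kwh_per_month = total_mw * 1000 * 24 * 30        # MW -> kW, times hours in ~30 days
implied_rate = monthly_bill / kwh_per_month      # US$ per kWh

print(f"{watts_per_server:.0f} W per server")    # ~44 W
print(f"{implied_rate:.3f} $/kWh implied")       # ~$0.14/kWh
```

The numbers hang together: 20 MW across 450,000 machines is only about 44 W each, and $2 million a month implies roughly 14 cents per kilowatt-hour, which is plausible for a mixed fleet of sites.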
- Upwards of 15,000 servers ranging from 533 MHz Intel Celeron to dual 1.4 GHz Intel Pentium III (as of 2003). A 2005 estimate by Paul Strassmann has 200,000 servers, while unspecified sources claimed this number to be upwards of 450,000 in 2006.
- One or more 80 GB hard disks per server (2003)
- 2–4 GB of memory per machine (2004)
The exact size and whereabouts of the data centers Google uses are unknown, and official figures remain intentionally vague. In a 2000 estimate, Google’s server farm consisted of 6,000 processors and 12,000 common IDE disks (two disks and one processor per machine) at four sites: two in Silicon Valley, California and two in Virginia. Each site had an OC-48 (2,488 Mbit/s) internet connection and an OC-12 (622 Mbit/s) connection to other Google sites. The connections were eventually routed down to 4 × 1 Gbit/s lines connecting up to 64 racks, each rack holding 80 machines and two Ethernet switches. The servers run custom server software called Google Web Server.
Hardware details considered sensitive
In a 2008 book, reporter Randall Stross wrote: “Google’s executives have gone to extraordinary lengths to keep the company’s hardware hidden from view. The facilities are not open to tours, not even to members of the press.” He wrote this based on interviews with staff members and his experience of visiting the company.
Google has numerous data centers scattered around the world. At least 12 significant Google data center installations are located in the United States. The largest known centers are located in The Dalles, Oregon; Atlanta, Georgia; Reston, Virginia; Lenoir, North Carolina; and Goose Creek, South Carolina. In Europe, the largest known centers are in Eemshaven and Groningen in the Netherlands and Mons, Belgium. Google’s Oceania data center is claimed to be located in Sydney, Australia.
One of the largest Google data centers is located in the town of The Dalles, Oregon, on the Columbia River, approximately 80 miles from Portland. Codenamed “Project 02”, the new complex is approximately the size of two football fields, with cooling towers four stories high. The site was chosen to take advantage of inexpensive hydroelectric power and to tap into the region’s large surplus of fiber optic cable, a remnant of the dot-com boom. A blueprint of the site has appeared in print.
In February 2009, Stora Enso announced that they had sold the Summa paper mill in Hamina, Finland to Google for 40 million euros. Google plans to invest 200 million euros in the site to build a data center.
Most of the software stack that Google uses on their servers was developed in-house. It is believed that C++, Java, and Python are favored over other programming languages. Google has acknowledged that Python has played an important role from the beginning, and that it continues to do so as the system grows and evolves.
The software that runs the Google infrastructure includes:
- Google Web Server
- Google File System
- Chubby lock service
- MapReduce and Sawzall programming language
- Protocol buffers
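To give a feel for the MapReduce item in that list, here is a minimal single-machine word-count sketch in Python (my own illustration of the programming model; the real system distributes both phases across thousands of machines and adds scheduling and fault tolerance):

```python
from collections import defaultdict

def map_phase(doc):
    # Emit (word, 1) pairs, as a word-count mapper would.
    for word in doc.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    # Sum the counts for each key.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

docs = ["the web is big", "the web grows"]
pairs = [p for d in docs for p in map_phase(d)]
print(reduce_phase(pairs))  # {'the': 2, 'web': 2, 'is': 1, 'big': 1, 'grows': 1}
```

The appeal of the model is that the programmer writes only the two small functions; the framework handles splitting the input, shipping intermediate pairs to reducers, and retrying failed machines.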
Most operations are read-only. When an update is required, queries are redirected to other servers to simplify consistency issues. Queries are divided into sub-queries, which may be sent to different servers in parallel, thus reducing latency.
To lessen the effects of unavoidable hardware failure, software is designed to be fault tolerant. Thus, when a system goes down, data is still available on other servers, which increases reliability.
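A toy sketch of both ideas — fanning one query out to index shards in parallel, and falling back to a replica when a server fails. All names here are hypothetical illustrations of the pattern, not Google’s actual serving code:

```python
import concurrent.futures

# Hypothetical index shards; each shard has two replicas holding the same data.
SHARDS = {
    "shard-0": [{"apple": 3}, {"apple": 3}],
    "shard-1": [{"apple": 1}, {"apple": 1}],
}

def query_replica(replica, term, fail=False):
    if fail:
        raise ConnectionError("replica down")
    return replica.get(term, 0)

def query_shard(name, term):
    # Try each replica in turn: if one server is down, its twin answers.
    primary, backup = SHARDS[name]
    try:
        # Simulate the primary of shard-0 being down.
        return query_replica(primary, term, fail=(name == "shard-0"))
    except ConnectionError:
        return query_replica(backup, term)

def search(term):
    # Fan the query out to all shards in parallel, then merge the partial hits.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        partials = pool.map(lambda n: query_shard(n, term), SHARDS)
    return sum(partials)

print(search("apple"))  # 4, even though shard-0's primary "failed"
```

The query still returns a complete answer because the data lost with the failed server exists on its replica, which is exactly the reliability argument the paragraph above makes.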
Apple has been building an MDC (Massive Data Center) in Maiden, North Carolina, which is said to have gone into operation already, though Apple has not made clear how it is using it at this time. The 500,000 square foot facility is roughly five times larger than the 109,500 square foot Newark, California facility it owns (bought from WorldCom for $45 million, a bargain considering it cost $110 million to build) and is said to have cost Apple more than $1 billion to build.
Named the iDatacenter, the site is widely speculated to house the cloud-based iTunes and other services that Apple plans to deliver in the future. With its own Newark and Cupertino data centers, supported by additional services from Akamai and Limelight, why does Apple have to spend $1 billion in North Carolina? (The $1 billion price tag is about twice what Microsoft and Google spend on their data centers.) The answer is fairly simple: a company gets a larger tax incentive from the state if it invests more than $1 billion over 9 years.
It is said that Eric Schmidt stole all the secrets from Apple during his days as a board member, but knowing Steve Jobs, I have a feeling that Steve would have picked Eric’s brains on how to construct, run, and use a data center as well. I find it unlikely that Apple would launch such an aggressive investment (yes, Apple’s data center is the biggest and most expensive in the corporate world) in data centers if it did not have confidence in the construction, operation, usage, and return on investment.
On June 7th, 2010, Apple may unveil a new service of sorts that uses the data center. Apple’s purchases of Lala and Quattro may have a direct relation to the MDCs. I am sure Apple needs the iDatacenter just to fulfill App Store and iTunes Store sales, so many more may be in the pipeline.
Many specialists feel that the data center is so large, and so close to Pentagon grade, that its capacity cannot be filled by iTunes, SaaS, Lala, Quattro, the iPhone, the iBookstore, or the App Store alone.
CoM: First, any idea why Apple is building this new data center?
Miller: Apple has said very little about the North Carolina facility, beyond the fact that it will serve as the company’s East coast data hub. Apple also has a West coast data center facility in Newark, Calif. Local officials I’ve spoken with say they believe the space is primarily to support MobileMe and digital content for the iTunes store. The most interesting question is whether Apple needs a much larger facility to support growth in its existing services, or is scaling up capacity for future offerings.
CoM: Could Apple be building it for cloud computing apps — cloud versions of its iLife apps for example?
Miller: One of the leading theories about the size of the NC project is that Apple is planning future cloud computing services that will require lots of data center storage. Cloud computing is a hot trend, and I’d be surprised if Apple isn’t thinking hard – and thinking differently – about cloud computing. Many cloud enthusiasts say that cloud computing will eliminate the need for data centers. In reality, the only thing that will change is the owner of the building. All the applications and data that are moving into the cloud will live on servers in brick-and-mortar data centers. The companies that are building the biggest data centers tend to also have the biggest cloud ambitions.
CoM: How big is Apple’s new North Carolina data center — big, small, medium?
Miller: The early site plans indicate Apple is planning about 500,000 square feet of data center space in a single building. That would place it among the largest data centers in the world. For comparison purposes, Apple’s existing data center in Newark, Calif. is a little more than 100,000 square feet. Most new stand-alone enterprise data centers are in the range of 100,000 to 200,000 square feet. So this would qualify as a big-ass data center.
CoM: What’s it comparable to? Do you know of any specific examples?
Miller: In the past several years we’ve seen a handful of new facilities that are redefining the scope of modern data centers. These include Microsoft’s new facility in Chicago, the SuperNAP in Las Vegas and the Phoenix ONE colocation center in Phoenix. All of these facilities house at least 400,000 square feet of space. These data centers are designed to support an enormous volume of data, and reflect the acceleration of the transition to a digital economy. All those digital assets – email, images, video and now virtual machines – drive demand for more and larger data centers.
CoM: Why did Apple choose NC? Are there particularly big pipes in NC? A big power plant nearby?
Miller: The choice of rural North Carolina suggests that the bottom line for Apple is cost, rather than connectivity. The site in Maiden, NC is not far from a large data center by Google, which usually chases cheap power and tax incentives. Power from Duke Energy is about 4 to 5 cents per kilowatt hour, compared to 7 to 12 cents in California. The company also maximized its incentives by pitting Virginia and North Carolina against one another in trying to wring the best tax incentives out of both states (a popular strategy in data center site location).
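Miller’s rate comparison is easy to put in dollar terms. A quick sketch of my own (the 20 MW load is a hypothetical figure for a facility this size, and I use the midpoints of his quoted ranges):

```python
# Rough annual power-bill comparison at the rates Miller quotes.
load_mw = 20                 # hypothetical draw for a large facility
hours_per_year = 24 * 365

def annual_cost(rate_per_kwh):
    # MW -> kW, times hours per year, times the utility rate.
    return load_mw * 1000 * hours_per_year * rate_per_kwh

nc = annual_cost(0.045)      # midpoint of 4-5 cents (Duke Energy)
ca = annual_cost(0.095)      # midpoint of 7-12 cents (California)
print(f"NC: ${nc:,.0f}  CA: ${ca:,.0f}  savings: ${ca - nc:,.0f}/yr")
```

At those rates a 20 MW site would save on the order of $8-9 million per year in North Carolina versus California, which is why cheap power dominates site selection for facilities like this.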
Some large companies use distributed data centers to manage their latency and content delivery costs. That may be part of Apple’s thinking, since they’re a major customer for CDNs (I believe they use both Akamai and Limelight Networks). Facebook cited latency to Europe as a key factor in its decision to add data centers in Virginia. Before that, MySpace added a data center in Los Angeles to reduce its reliance on CDNs. But in both cases, those companies sought out Internet hubs where they could connect with dozens of other networks to manage their Internet traffic. You don’t get that in rural North Carolina, so Apple seems more focused on cost and scale than on connectivity – which again would suggest a cloud focus.
Unfortunately, this interview does not reveal or hint at what Apple may use the data center for; it only speculates that it will handle a lot of data.
Lots of Data? (IMPORTANT READ HERE)
Yes, lots. What uses a lot of data? Hmm, video comes to mind. With the iPhone 4G unmasked, its front camera and 5-megapixel rear camera (Apple is also said to have an 8-megapixel camera version in the field), and considering that Apple is an innovator in digital lifestyle applications, it is not far-fetched to think that Apple may have a new “Video Social Network” planned that could put Facebook, Twitter, Foursquare, Ustream, and YouTube to rest. With Vimeo-quality HD video (the iPhone will most likely be named “iPhone HD” for a reason, not a fad), face and location recognition, and AR and AR (Artificial Reality and Augmented Reality), this would be a killer offering if it came true. Apple, with its new iPhone and SaaS, could store everyone’s video, tag it, add face, voice, and text recognition, and use this data to create and suggest relationships between people. Apple may end up with more information on its users than the FBI has collected over the decades.
Video is the missing link in SOCIAL. Yesterday I did an event and streamed it live. I had 4,410 viewers (2,638 unique) of the show. I am just amazed that someone like me can go live and get 4,000 people watching at one time or another. Twitter of course enhances Ustream, the platform which was used for the event. All I needed was a MacBook Pro notebook, 2 video cameras, 4 microphones, and an audio mixer.
VIDEO is KING of ALL THINGS DIGITAL
I said this before and I will say it again: there is nothing so clear and easy to understand as moving pictures. If 2 people read a book, their experiences may be completely different. With moving pictures, it is always the same. If geo-tagging, short messages, moods, and AR and AR can be part of a “what are you doing now” Twitter posting, then there will be no more need for any of the SNS services, which are now fragmented and difficult to understand. YouTube and Ustream are showing Apple what a good potential business this is, but both lack ease of use and depth of content, since neither makes hardware or operating systems.
Apple has been in a unique position to watch closely the development of VIDEO SOCIAL (which so far is available only in fragments, spread across multiple services, hardware, and operating system platforms, causing chaos for some users), and it is singlehandedly in a position to offer a seamless solution via its OS, hardware, and delivery platform. One button and you’re done with video, geo, and text.
If Apple really wants to embark on video, some analysts say, it may need more of these $1 billion data centers around the world to fight latency and handle the sheer number of uploads each iPhone HD will generate.
Who else ?
Oracle and the US government are also speeding up the construction of MDCs in 2010. Oracle stopped construction of its MDC in Salt Lake City but has recently resumed work on the 240,000 square foot, $285 million project. Oracle’s main business is CRM SaaS (software as a service), and it needs these centers to take on more speed and customer demand. MDCs are being built in London, Wales, Tokyo, Tsukuba, and many other cities around the world.
Make no mistake about it: Apple and Google are at war. Come June 7th, Apple may throw in some new weapons to combat the announcements Google made at Google I/O – or it may go further and announce a service so ambitious and so close to our daily lives that our method of sharing is notched up to a higher level. In that case, others will play catch-up with new service offerings and data centers, but once Apple takes the lead, it may be hard for Google to organize HTC and the other manufacturers around a common user interface that provides the seamless user experience of a company that makes everything.
Ever since 1985, I have been involved with Apple, not because I am an Apple freak (in honesty I am, but) but because it makes good business sense to trust a company that has a solid vision (not during Jobs’ absence, I must say) and owns and can control the direction it wants to go. To me, the HTC Desire is like a Ferrari 430 with a Toyota engine, while the Nexus One is like a Ferrari California with a Nissan engine: not bad, but in the end I will no longer buy a Ferrari, since what I am buying is neither a Ferrari, a Toyota, nor a Nissan.
It may be a really good time to be a construction company, or a company making container unit modules for data centers.