How To Develop A Simple Version Of The Google Search Engine?

Roby Widjaja
May 26, 2020

“It is important knowledge for all Search Engine Optimization (SEO) specialists and beginner software developers. You don’t need to be a genius computer scientist or have a PhD in Computer Science to develop it.”

Image source: https://unsplash.com/photos/c4aT8MfEzdw

When I first learned to develop a website, everything was primitive compared with the computer hardware and software available in 2020. There were no Google or Bing search engines; there were only Lycos, AltaVista, and Infoseek. There were no visual website development tools. I had to use Notepad or a similar simple text editor as my only website development tool. So I had to type the raw HTML code (see the picture below) character by character, word by word, and line by line.

Image source: https://www.w3schools.com/html/default.asp

How did I add an image to a web page I developed? Notepad and similar editors have no “drag-and-drop by mouse” feature for adding and moving an image on a web page during editing. I had to type a one-line HTML <img> tag to add an image file to a web page.

There was no faster way to develop a website than memorizing all the important, most-used HTML codes, so that I didn’t need to look at books or other websites (such as www.w3schools.com) too often while developing the website.

There was no database management server software such as Microsoft SQL Server, Oracle, or even the simple MySQL. I had to use a text file as the website's database. For example, the website could receive input data from users: “Name: _____________”, “Address: ______________”, and “Age: _____”. My website first converted the input data into a string such as “Somebody, somewhere in this world, 25” before storing it in the text file “database”. People call this a Comma-Separated Values (CSV) file today.
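Here is a minimal sketch in Python of that text file “database” idea. It is not the code I used back then; the function names to_csv_line and from_csv_line are made up for this illustration, and it relies on Python's standard csv module so that a comma inside an address does not break the record.

```python
import csv
import io


def to_csv_line(name, address, age):
    """Turn one form submission into a single CSV line (no trailing newline)."""
    buffer = io.StringIO()
    # csv.writer quotes any field that contains a comma, so it stays one field.
    csv.writer(buffer).writerow([name, address, age])
    return buffer.getvalue().rstrip("\r\n")


def from_csv_line(line):
    """Read the fields back from one stored line (all values come back as text)."""
    return next(csv.reader([line]))


line = to_csv_line("Somebody", "somewhere in this world", 25)
print(line)
print(from_csv_line(line))
```

Note that everything comes back as text when you read the line again, so the age must be converted back to a number if you need to calculate with it.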

The only web browser I could use was Netscape. There was no Internet Explorer, Google Chrome, or Microsoft Edge.

That’s how I developed a website many years ago. There were no cloud-based website development platforms such as www.wix.com or WordPress at that time. So all the finished web page files edited in Notepad had to be saved with the “*.htm” or “*.html” file extension before I uploaded them to the web server with FTP software.

How primitive, difficult, and time-consuming it was to develop a website compared with what I do now on www.wix.com or similar cloud-based website development platforms.

This Is How To Develop A Simple Version Of The Google Search Engine

Please stop imagining something complex. I know most of you think that it is too difficult or too complex to develop a simple version of the Google search engine. NO, it is NOT difficult or complex. But I said the simple version, not the Google Search Engine as it exists now in 2020.

I believe most of you have smartphones. On your smartphone, there must be a simple built-in “Contact List” or “Address Book” mobile app, the app that you usually use for saving other people’s phone numbers, emails, addresses, names, etc. I can say that developing a simple version of the Google search engine is as easy as developing that simple “Contact List” database management mobile app.

The knowledge I share in this article is based on a simple experiment I did many years ago. I spent only about 7 days, at 6 to 8 hours a day, developing it.

A simple version of the Google search engine has three must-have parts: the Crawling Engine, the Indexing Engine, and the Query Engine.

The Crawling Engine

Let’s go back to the “Contact List” database management mobile app for a moment. In the beginning, that “Contact List” is empty. You must input all your contact data manually, one by one. For example, if you have 200 family, friend, and acquaintance contacts, you must input data into the Contact List software 200 times. Is that right? I am not talking about the “Import Contacts” feature; let’s assume there is no such feature.

The Crawling Engine is like collecting ALL of your contact data, one by one, before you input it into your “Contact List” mobile app.

The Crawling Engine is software that runs 24 hours a day, 7 days a week on a server connected to the Internet. This software visits ALL web pages on the Internet, page by page, automatically and continuously. As we know, ALL website addresses (for example www.imarketology.net) are “alias” addresses of the real numerical addresses (for example 192.0.2.1). So you can make an automatic address counter to discover ALL existing website addresses on the Internet.
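The “automatic address counter” idea can be sketched in Python like this. It is only an illustration under my own assumptions: the function name candidate_addresses is made up, the sketch only produces the numerical addresses and does not connect to anything, and on today's Internet many websites share one numerical address, so a real crawler also has to follow the links it finds on each page.

```python
import ipaddress


def candidate_addresses(start, count):
    """Yield `count` consecutive IPv4 addresses, beginning at `start`."""
    first = ipaddress.IPv4Address(start)
    for offset in range(count):
        # Adding an integer to an IPv4Address gives the next address in order.
        yield str(first + offset)


# 192.0.2.0 belongs to a range reserved for documentation examples.
for address in candidate_addresses("192.0.2.0", 3):
    print(address)
```

The Crawling Engine would take each address from this counter, try to open a web page there, and move on to the next address when nothing answers.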

The second part of the Crawling Engine is scanning, or “reading”, each web page it finds on the Internet. Yes, just like human eyes read a web page. Let’s say we want to scan only one factor from each web page: the 10 most often used words or phrases on the page. This is only a simple text-based string manipulation algorithm that you must code into the Crawling Engine. The current Google or Bing search engines may scan for more than 200 factors on a single web page. So this part is like making your search engine understand what the web page is about, the main topic of the web page.

Store those 10 most often used words or phrases in temporary variables before the Indexing Engine stores them in the text file database.
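The “reading” step can be sketched with Python's standard library only. This is a simplified illustration, not the original experiment: the names TextExtractor and top_words are made up, the sketch counts single words only (not phrases), and it does not filter out very common words such as “the”, which would dominate on a real page.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of a page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style block.
        if not self._skip_depth:
            self.chunks.append(data)


def top_words(html, limit=10):
    """Return up to `limit` of the most often used words in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
    return [word for word, _ in Counter(words).most_common(limit)]


page = "<h1>Computer class</h1><p>Computer software class. Computer!</p>"
print(top_words(page))
```

The list returned by top_words plays the role of the “temporary variables” that are handed over to the Indexing Engine.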

The Indexing Engine

Let’s go back to the “Contact List” database management mobile app for a moment again. Every time you read a contact from the paper on which you wrote all the contact data, you input it into your “Contact List” mobile app, one by one.

The Indexing Engine is just like inputting your contact data one by one into the mobile app.

This engine also runs 24 hours a day, 7 days a week on the same server as the Crawling Engine. The Crawling Engine always passes the 10 most often used words or phrases stored in temporary variables to this engine. Every time this engine receives that data from the Crawling Engine, it stores the data in the text file database (yes, today you can store it in heavy-duty database management software such as Microsoft SQL Server or Oracle).

As an example, you can store it in this record format:

“www.abc.com”, “computer”, “software”, “computer training”, “computer education”, “computer software”, “computer class”, “learning computer”, “education”, “class”, “computer software learning”.

Based on that example record, you can see that the Indexing Engine always saves the web page address and all 10 most often used words or phrases.
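A minimal Python sketch of the Indexing Engine's storing step could look like this. The function name index_page and the file name index.csv are made up for this illustration, and the demo writes into a temporary folder so that nothing is left on your disk.

```python
import csv
import os
import tempfile


def index_page(index_path, address, keywords):
    """Append one record: the page address followed by its keywords."""
    # Mode "a" appends, so records from earlier pages are kept.
    with open(index_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
        writer.writerow([address, *keywords])


# Demo in a temporary folder (the file name is only an example).
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "index.csv")
    index_page(path, "www.abc.com", ["computer", "software", "computer training"])
    with open(path, encoding="utf-8") as handle:
        print(handle.read().strip())
```

Every field is written inside quotation marks, as in the record format above, so a phrase that contains a comma still stays in one field.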

Both the Crawling and Indexing Engines must run automatically and continuously, 24 hours a day, 7 days a week, and never stop, because there are new web pages on the Internet every day.

The Query Engine

Let’s go back to the “Contact List” database management mobile app once more. After you input and save all of your contact data in the mobile app, you will search for a contact in it whenever you need one, right?

So the Query Engine is just like searching for a contact in the mobile app based on name, address, or job.

It is the engine that receives a keyword typed by your search engine user. Then it queries the database (the one into which the Indexing Engine stores data) based on that keyword. So it is a very basic database manipulation algorithm.

After the data query process, this engine shows the query result on the search engine screen. These are called Search Engine Result Pages (SERPs) today.
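The Query Engine can be sketched in Python in the same spirit. The function name search, the file name index.csv, and the sample records are assumptions made for this illustration; the sketch only checks whether the keyword exactly matches one of the stored words or phrases (ignoring upper and lower case) and does no ranking.

```python
import csv
import os
import tempfile


def search(index_path, keyword):
    """Return the addresses of pages whose stored keywords include `keyword`."""
    wanted = keyword.strip().lower()
    results = []
    with open(index_path, newline="", encoding="utf-8") as handle:
        for record in csv.reader(handle):
            if not record:
                continue  # skip empty lines
            address, *keywords = record
            if wanted in [word.lower() for word in keywords]:
                results.append(address)
    return results


# Demo with a tiny sample index in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "index.csv")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows([
            ["www.abc.com", "computer", "software", "class"],
            ["www.example.org", "cooking", "recipes", "class"],
        ])
    print(search(path, "Class"))
    print(search(path, "computer"))
```

The list of addresses returned by search is what the search engine would show on its result page.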

Tada! The magic just happened on your search engine user's screen. Yes, for many people it feels like magic when they use the Google or Bing search engines. For some software developers? No, it’s not magic. It’s only simple database management software, as simple as your “Contact List” management mobile app.

What Are The Difficult Parts in Building a Google Company?

Although it is that simple to develop a simple version of the Google Search Engine, and you don’t need to be a genius computer scientist or have a PhD in Computer Science to do it, there are fewer than 100 people in this world, out of more than 7 billion people on Earth, who can build a company such as Google today.

Why is it so difficult for the majority of people in this world to build a company such as Google today?

As most of us know, Google started its business with the Google Search Engine service. It has been a free service for global users until now. The Google Search Engine now generates many billions of US dollars in revenue for the company every month, from paid advertisements. As we also know, almost all Google products and services are free for users.

Too few people in this world can accumulate many billions of US dollars in equity capital from investors while the company generates no revenue or profit for many years, as happens with many other technology companies in this world.
Too few people in this world can accumulate many billions of US dollars in equity capital from investors to develop multi-billion-dollar “Super Computer” or “Quantum Computer” class server hardware, including paying the Quantum Computer’s daily maintenance costs. Google announced in 2019 that its quantum computer had outperformed the fastest classical supercomputers on one specific task.

The Simple Version Google Vs The Real Current Existing Google

Of course, what has been explained above, at a conceptual tutorial level, is just for experimental or educational purposes. Nobody in this world wants to use that simple version of a search engine when they can freely choose Google or Bing.

The first big difference between that simple version and the real Google is that the real one scans and saves more than 200 factors from a web page with its Crawling and Indexing Engines.

The second difference: the real one can also understand the context of your keyword. For example, when you input just the one word “apple” as the keyword, Google can understand whether you are a farmer or somebody closely related to the agriculture industry, or whether you are searching for information related to the Apple company and its products and services. That’s why Google will show different search results to farmer and non-farmer users although both use the same keyword “apple”.

The third difference: Google also has a feature that predicts the keyword you are typing in the search bar and corrects typos in it too.

The fourth difference: Google also has many different algorithms to counter Black Hat SEO practices.

There are many other differences between them. Once again, what has been explained in this article is for experimental and educational purposes only, not for building commercial search engines such as Google or Bing.

Why Do I Feel Lucky as a Computer Scientist who First Learned Computer Programming with ALL of Those Primitive Tools…

There was a time in my life, many years ago, when I felt that all millennial-generation computer scientists were very lucky. They first learned computing with all of the current sophisticated, user-friendly tools, for both computer hardware and software development.

But from a few years after that until now, I have felt that I am luckier than all millennial computer scientists, although I first learned computer hardware and software development with all of those primitive tools.

What makes me feel luckier than all millennial computer scientists?

ALL of those primitive tools forced me to learn the very basics of computer hardware and software development. They forced me to gain all the knowledge that I can use to develop general-purpose or specific-purpose computer hardware from scratch, including developing a simple operating system for it in machine code, or bare-metal programming (programming with only two numbers, “1” and “0”, in many different combinations).