Database Applications for Managers Spring 2008  BU4040


Journal

Dr. Alireza Ebrahimi

Journal Wednesday April 21st, 2008

 

1) How does the Google database or Yahoo database work?

Both are search engines, but each has to have a database behind it. They have to add, remove, and update their information.

The captain cannot be late.

----------------------------------------------------------------------------------------------------------------------------

Conclusions

Google is designed to be a scalable search engine. The primary goal is to provide high quality search results over a rapidly growing World Wide Web. Google employs a number of techniques to improve search quality including page rank, anchor text, and proximity information. Furthermore, Google is a complete architecture for gathering web pages, indexing them, and performing search queries over them.

---------------------------------------------------------------------------------------------------

A large-scale web search engine is a complex system and much remains to be done. Our immediate goals are to improve search efficiency and to scale to approximately 100 million web pages. Some simple improvements to efficiency include query caching, smart disk allocation, and subindices. Another area which requires much research is updates. We must have smart algorithms to decide what old web pages should be recrawled and what new ones should be crawled. Work toward this goal has been done in [Cho 98]. One promising area of research is using proxy caches to build search databases, since they are demand driven. We are planning to add simple features supported by commercial search engines like boolean operators, negation, and stemming. However, other features are just starting to be explored such as relevance feedback and clustering (Google currently supports a simple hostname based clustering). We also plan to support user context (like the user's location), and result summarization. We are also working to extend the use of link structure and link text. Simple experiments indicate PageRank can be personalized by increasing the weight of a user's home page or bookmarks. As for link text, we are experimenting with using text surrounding links in addition to the link text itself. A Web search engine is a very rich environment for research ideas. We have far too many to list here so we do not expect this Future Work section to become much shorter in the near future.
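The excerpt says PageRank can be personalized by giving more weight to a user's home page or bookmarks, but it does not give a formula. The sketch below assumes the commonly described damped random-surfer formulation: the `teleport` array decides where a random jump lands, so raising one page's entry favours that page. The class name, the three-page link graph, the damping value of 0.85, and the iteration count are all illustrative assumptions, not taken from the paper excerpt.

```java
import java.util.Arrays;

class PageRankSketch {
    /**
     * links[i] lists the pages that page i links to.
     * teleport[i] is the chance that a random jump lands on page i;
     * the entries are assumed to sum to 1.
     */
    static double[] rank(int[][] links, double[] teleport, double damping, int iterations) {
        int n = links.length;
        double[] r = teleport.clone();
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no out-links
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += r[i];
                    continue;
                }
                // Each page splits its current rank evenly among its links.
                double share = r[i] / links[i].length;
                for (int j : links[i]) {
                    next[j] += share;
                }
            }
            for (int j = 0; j < n; j++) {
                next[j] = damping * (next[j] + dangling * teleport[j])
                        + (1.0 - damping) * teleport[j];
            }
            r = next;
        }
        return r;
    }

    public static void main(String[] args) {
        int[][] links = { {1, 2}, {2}, {0} };          // hypothetical 3-page web
        double[] uniform = { 1.0 / 3, 1.0 / 3, 1.0 / 3 };
        double[] favourPage1 = { 0.1, 0.8, 0.1 };      // pretend page 1 is the user's home page
        System.out.println(Arrays.toString(rank(links, uniform, 0.85, 50)));
        System.out.println(Arrays.toString(rank(links, favourPage1, 0.85, 50)));
    }
}
```

Comparing the two printed arrays shows page 1 gaining rank when the jump weights favour it, which is the effect the excerpt describes.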

---------------------------------------------------------------------------------------------------

Farzad contributed this information from: http://www.readwriteweb.com/archives/yahoo_pipes_web_database.php

The Web is just a vast database of information. Every day, we interact with it without thinking about that too much. We simply take our best query tool, usually called Google, and fire away. Yet decades before the web made its way into our lives, a different kind of database revolutionized our lives. The Relational Database qualifies as one of our best computer science inventions. Lesser known to the non-techie crowd, it nowadays quietly stores terabytes of information behind most familiar ecommerce and corporate sites.

 

---------------------------------------------------------------------------------------------

 

Christian contributed this from: Wikipedia, the free encyclopedia

Googlebot


A Googlebot is a search bot used by Google. It collects documents from the web to build a searchable index for the Google search engine.

If a webmaster wishes to restrict the information on their site available to a Googlebot, or another well-behaved spider, they can do so with the appropriate directives in a robots.txt file,[1] or by adding the meta tag <meta name="Googlebot" content="noindex"> to the webpage. [2] Googlebot requests to Web servers are discernible from their user-agent string 'Googlebot'.

Googlebot has two versions, deepbot and freshbot. Deepbot, the deep crawler, tries to follow every link on the web and download as many pages as it can to the Google indexers. It completes this process about once a month. Freshbot crawls the web looking for fresh content. It visits websites that change frequently, according to how frequently they change. Currently Googlebot only follows HREF links and SRC links. [3]

Googlebot discovers pages by harvesting all of the links on every page it finds. It then follows these links to other web pages. New web pages must be linked to from another known page on the web in order to be crawled and indexed.
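The discovery process described above can be sketched as a breadth-first walk over links. This is only an illustration, not Googlebot's actual code: the in-memory map stands in for downloading a page and extracting its links, and the class name, method name, and page names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CrawlSketch {
    /** Returns every page reachable from the seed, in the order it was discovered. */
    static List<String> crawl(Map<String, List<String>> web, String seed) {
        Set<String> seen = new LinkedHashSet<>();   // remembers discovery order
        Deque<String> frontier = new ArrayDeque<>(); // pages waiting to be visited
        seen.add(seed);
        frontier.add(seed);
        while (!frontier.isEmpty()) {
            String page = frontier.remove();
            // Harvest the links on this page; unknown pages have no links.
            List<String> links = web.getOrDefault(page, Collections.<String>emptyList());
            for (String link : links) {
                if (seen.add(link)) {                // true only for newly found pages
                    frontier.add(link);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        Map<String, List<String>> web = new HashMap<>();
        web.put("home", Arrays.asList("syllabus", "journal"));
        web.put("syllabus", Arrays.asList("journal"));
        web.put("journal", Arrays.asList("home"));
        web.put("orphan", Arrays.asList("home"));    // no known page links here
        System.out.println(crawl(web, "home"));
    }
}
```

The "orphan" page links out but nothing links to it, so the walk never reaches it. That matches the point that a new page must be linked from a known page before it can be crawled.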

A problem which webmasters have often noted with the Googlebot is that it takes up an enormous amount of bandwidth. This can cause websites to exceed their bandwidth limit and be taken down temporarily. This is especially troublesome for mirror sites which host many gigabytes of data. Google provides "Webmaster Tools" that allow website owners to throttle the crawl rate. [1]

-----------------------------------------------------------------------------------------------

When you search the keywords "dr ebrahimi" on different engines, why is www.drebrahimi.com on top?

Is it the number of hits?

----------------------------------------------------------------------------------------------------

Steve contributed this from: http://nsgd.gso.uri.edu/searchguide.html#Truncation

Advanced Searching
For those desiring more flexibility or additional search terms, you may
now string 2 or more words in each term box and link them with the search
operator of your choice in the pull-down menu to the right of each term
box. The default (word or phrase) indicates ADJ (adjacent) so no
change is necessary during simpler searches. Search terms must be
separated by a space or a comma. The search operators allowed are:

AND (two or more terms)

OR (two or more terms)

AND NOT (two terms only)

NEAR (two terms only)
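One way to picture how AND, OR, and AND NOT could work is as set operations on an inverted index, which maps each word to the documents that contain it. The sketch below is an assumption about a typical approach, not the implementation behind the search guide; the class, method names, and sample documents are hypothetical. NEAR and ADJ are left out because they would also need word positions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class BooleanSearchSketch {
    /** Maps each lower-cased word to the ids of the documents containing it. */
    static Map<String, Set<Integer>> buildIndex(String[] docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.length; id++) {
            for (String word : docs[id].toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    /** Documents containing the term (a copy, so callers may modify it). */
    static Set<Integer> lookup(Map<String, Set<Integer>> index, String term) {
        Set<Integer> hits = index.getOrDefault(term.toLowerCase(), Collections.<Integer>emptySet());
        return new TreeSet<>(hits);
    }

    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.retainAll(b);   // keep documents present in both sets
        return r;
    }

    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.addAll(b);      // keep documents present in either set
        return r;
    }

    static Set<Integer> andNot(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.removeAll(b);   // drop documents that match the second term
        return r;
    }

    public static void main(String[] args) {
        String[] docs = { "sea grant research", "sea turtle", "grant funding" };
        Map<String, Set<Integer>> index = buildIndex(docs);
        Set<Integer> sea = lookup(index, "sea");
        Set<Integer> grant = lookup(index, "grant");
        System.out.println("sea AND grant: " + and(sea, grant));
        System.out.println("sea OR grant: " + or(sea, grant));
        System.out.println("sea AND NOT grant: " + andNot(sea, grant));
    }
}
```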

-------------------------------------------------------------------------------------------

JAVA DATABASE EQUIVALENT TO THE C++ DATABASE

 

The following Java database program demonstrates how your knowledge of C++ will help you write Java applications. The control structures are the same (while, if/else, for); however, file handling and user input differ. File handling in Java is a little more involved because files are accessed through stream classes, and input from the keyboard has to be parsed into the desired format. Running a Java program takes two stages: compilation and interpretation. Compiling the .java file creates a .class bytecode file that a Java Virtual Machine can interpret on virtually any platform. Two benefits of Java are its portability and its ability to create applets that run in browsers. Java does not have pointers, and applets cannot directly access files due to security restrictions. To read input values, StreamTokenizer is used; after each token it exposes either a string (sval) or a number (nval) as instance variables. The Cin class is used for input from the keyboard; once it is compiled, the class can be used in other applications as long as it is in the same directory.
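As a small illustration of the sval and nval fields mentioned above, the sketch below tokenizes one line of text from a StringReader instead of a file. The class name, the `describe` method, and the sample record are hypothetical; only StreamTokenizer and its fields come from the standard library.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TokenizerSketch {
    /** Labels each token in the input as a word (sval) or a number (nval). */
    static String describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        StringBuilder sb = new StringBuilder();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                String piece = null;
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    piece = "word:" + st.sval;       // text tokens arrive in sval
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    piece = "number:" + st.nval;     // numeric tokens arrive in nval as a double
                }
                if (piece != null) {                 // other token types are skipped
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(piece);
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same layout as one record in the program below: name, hourly rate, hours worked.
        System.out.println(describe("Ann 12.5 40"));
    }
}
```

Because the tokenizer splits on whitespace, a name containing a space would be read as two separate words, which is worth keeping in mind when entering names for a file in this format.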

----------------------------------------------------------------------------------------

Demo this Java program to receive extra points.


import java.io.*;

// Simple flat-file payroll "database" held in parallel arrays.
public class Database {
    private static final int MAX = 100;              // capacity of each array
    private Cin cin = new Cin();                     // keyboard input helper
    private String[] name = new String[MAX];
    private double[] hourlyrate = new double[MAX];
    private double[] hoursworked = new double[MAX];
    private double[] grosspay = new double[MAX];
    private String searchName;
    private String fileName;
    private int n = 0;                               // number of records in use
    // Read name, hourly rate, and hours worked for each employee from employee.dat.
    public void load() throws IOException {
        try (Reader fi = new BufferedReader(
                new InputStreamReader(
                    new FileInputStream("employee.dat")))) {
            StreamTokenizer fin = new StreamTokenizer(fi);
            int tokenType = fin.nextToken();
            // Stop at end of file or when the arrays are full.
            while (tokenType != StreamTokenizer.TT_EOF && n < MAX) {
                name[n] = fin.sval;
                fin.nextToken();
                hourlyrate[n] = fin.nval;
                fin.nextToken();
                hoursworked[n] = fin.nval;
                grosspay[n] = hoursworked[n] * hourlyrate[n];
                tokenType = fin.nextToken();
                n++;
            }//WHILE
        }//TRY
        catch (IOException ex) { System.out.println(ex); }//CATCH
    }//LOAD
    // Append a new employee record entered at the keyboard.
    public void insert() {
        if (n >= MAX) {                              // avoid writing past the arrays
            System.out.println("Database is full");
            return;
        }
        System.out.print("Enter the employee name: ");
        name[n] = cin.readString();
        System.out.print("What is employee hourly rate? ");
        hourlyrate[n] = cin.readDouble();
        System.out.print("How many hours did the employee work? ");
        hoursworked[n] = cin.readDouble();
        grosspay[n] = hoursworked[n] * hourlyrate[n];
        n++;
    }//INSERT
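    // Print every record that has not been removed.
    public void display() {
        for (int i = 0; i < n; i++) {
            // Compare string contents with equals(); != only compares references.
            if (!"".equals(name[i])) {
                System.out.println(name[i] + " " + grosspay[i]);
            }//IF
        }//FOR
    }//DISPLAY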


    // Look up an employee by name (case-insensitive); returns true if found.
    public boolean search() {
        System.out.print("Enter the search name: ");
        String searchName = cin.readString();
        for (int i = 0; i < n; i++) {
            if (searchName.equalsIgnoreCase(name[i])) {
                System.out.println("Found: " + name[i]);
                System.out.println("Grosspay: " + grosspay[i]);
                return true;
            }//IF
        }//FOR
        System.out.println("Name not found");
        return false;
    }//SEARCH
    // Re-enter rate and hours for an existing employee; returns true if found.
    public boolean modify() {
        System.out.print("Enter the search name: ");
        String searchName = cin.readString();
        for (int i = 0; i < n; i++) {
            if (searchName.equalsIgnoreCase(name[i])) {
                System.out.print("What is employee hourly rate? ");
                hourlyrate[i] = cin.readDouble();
                System.out.print("How many hours did the employee work? ");
                hoursworked[i] = cin.readDouble();
                grosspay[i] = hoursworked[i] * hourlyrate[i];
                return true;
            }//IF
        }//FOR
        System.out.println("Name not found");
        return false;
    }//MODIFY

    // Mark a record as removed by blanking its name; returns true if found.
    public boolean remove() {
        System.out.print("Enter the search name: ");
        String searchName = cin.readString();
        for (int i = 0; i < n; i++) {
            if (searchName.equalsIgnoreCase(name[i])) {
                name[i] = "";
                return true;
            }//IF
        }//FOR
        System.out.println("Name not found");
        return false;
    }//REMOVE

    // Save the records and end the program.
    public void quit() throws IOException {
        this.store();
        System.out.println("Thank you");
        System.exit(0);
    }//QUIT
    // Write all records that have not been removed back to employee.dat, tab-separated.
    public void store() throws IOException {
        try (PrintStream ps = new PrintStream(new FileOutputStream("employee.dat"))) {
            for (int i = 0; i < n; i++) {
                // Skip removed records; equals() compares contents, unlike !=.
                if (!"".equals(name[i]))
                    ps.println(name[i] + "\t" + hourlyrate[i] +
                               "\t" + hoursworked[i]);
            }//FOR
        }//TRY
        catch (IOException ex) {
            System.out.println(ex);
        }//CATCH
    }//STORE


    public static void main(String[] args) throws IOException {
        Database db = new Database();
        Cin read = new Cin();
        db.load();
        int choice = 0;
        do {
            System.out.println("\t1-Insert");
            System.out.println("\t2-Display");
            System.out.println("\t3-Search");
            System.out.println("\t4-Modify");
            System.out.println("\t5-Remove");
            System.out.println("\t6-Quit");
            choice = read.readInt();
            if (choice == 1) db.insert();
            else if (choice == 2) db.display();
            else if (choice == 3) db.search();
            else if (choice == 4) db.modify();
            else if (choice == 5) db.remove();
            else if (choice == 6) db.quit();
            else System.out.println("Enter correct number");
        } while (true);   // quit() ends the program through System.exit
    }//MAIN
}//DATABASE
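The program above depends on a Cin class that is not listed in this journal. The following is a minimal sketch of what such a class might look like, assuming it only needs the three methods the program calls (readString, readInt, readDouble) and that each answer is typed on its own line. The constructor that takes a Reader is a hypothetical addition so the class can be tried without a keyboard; the course's actual Cin may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class Cin {
    private final BufferedReader in;

    /** Reads from the keyboard (standard input). */
    Cin() {
        this(new InputStreamReader(System.in));
    }

    /** Hypothetical extra constructor: reads from any Reader, handy for testing. */
    Cin(Reader source) {
        in = new BufferedReader(source);
    }

    /** Returns the next line with surrounding spaces removed, or "" at end of input. */
    String readString() {
        try {
            String line = in.readLine();
            return line == null ? "" : line.trim();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Parses the next line as a whole number; throws NumberFormatException otherwise. */
    int readInt() {
        return Integer.parseInt(readString());
    }

    /** Parses the next line as a decimal number; throws NumberFormatException otherwise. */
    double readDouble() {
        return Double.parseDouble(readString());
    }

    public static void main(String[] args) {
        Cin cin = new Cin(new StringReader("Ann\n12.5\n40\n"));
        String name = cin.readString();
        double rate = cin.readDouble();
        int hours = cin.readInt();
        System.out.println(name + " earns " + (rate * hours));
    }
}
```

With this sketch, each Cin buffers its own input, so a program that creates two Cin objects on the keyboard (as the listing does) works when typing interactively but may lose lines if input is piped from a file; sharing one Cin avoids that.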

-----------------------------------------------------------------------------------------------