Tuesday, June 4, 2019

Adopting MapReduce and Hummingbird for Information Retrieval

Adopting MapReduce and Hummingbird for Information Retrieval in Dedicated Cloud Environment

Dr. Piyush Gupta, Chandelkar Kashinath K.

Abstract

Data collected in section 3 indicates the number of active internet users across the globe. The collected chunks of information, termed Big Data, not only consume physical resources in the network but also lead to an increase in human and financial resources. Cloud computing, a technology offering IaaS (Infrastructure as a Service), PaaS (Platform as a Service) and SaaS (Software as a Service), provides virtual resources on a pay-per-use policy. MapReduce, a widely used algorithm, is used in line with the Hummingbird search engine for information retrieval.

Keywords: MapReduce, SaaS, IaaS, PaaS, Hummingbird, Big Data

1. Introduction

One of the papers published at an international conference at Jaipur, entitled "The Need and Impact of Hummingbird Algorithm on Cloud based Content Management System" [21], elaborates on the introduction of the Hummingbird algorithm on the 15th birthday of Google. In comparison with previous Google algorithms such as Panda 3.5, PageRank and Penguin, Hummingbird is a replacement of the full engine instead of a repair of individual modules. This has affected 90% of data across the globe.

Migrating the MapReduce algorithm to a cloud environment using Hadoop not only improves performance due to cloud features but also increases efficiency while minimizing cost.

2. Problem

Fig-1 Data center (Source: IBM Enterprise System)

Fig-1 shows a snapshot of engineers working at data centers who manage information from diverse platforms and resources. Managing hardware and network with virtualized resources needs dedicated young talent. The end user gets only an average service as a result of improper management of data centers.
MapReduce is one of the best known algorithms used for IR (Information Retrieval), in addition to the existing algorithms explained in section 7.

The exponential increase in smart devices that support voice based search creates a definite need for a fast and efficient searching algorithm for information retrieval. Voice based search helps users make smart decisions in real time applications such as place identification, weather forecast and medical assistance using Android based applications.

3. Why problem is important

Fig-2 Global Internet users (Source: W3 Foundation)

Looking at the data increase across the globe as shown in Fig-2 (data collected till July 1, 2014) [19], the content piled up in repositories is increasing worldwide. It requires a huge amount of hardware resources running for years to extract information and knowledge for decision making. The big challenge in big data is the continuously increasing content, which consumes human resources and cost to create chunks in available networks across the globe, and which needs attention.

4. It is an unsolved problem

The relevant reviewed literature (Table-1) indicates that the problem remains unsolved. The authors have either focused on cloud components [6][11] or used traditional Google components during the analysis. Since the Hummingbird algorithm [10] is not keyword based, the searching criteria have changed. Combining it with MapReduce [1][3][15] in a cloud environment is expected to yield efficient results with minimum cost and resources.

Table-1 Existing Systems compared

5. Here is my idea

Fig-3 Proposed Information Retrieval System

Cloud computing [4][6], an upcoming technology discussed in section 7.2, is a good source of virtualized resources that helps to manage content on diverse platforms irrespective of geographical boundaries.
An instance of Hadoop that supports the MapReduce algorithm (elaborated in section 7) is migrated to the cloud environment using SaaS (Software as a Service), and input is diverted to it for processing. The Hummingbird algorithm (more in section 7), a brand new search engine designed to derive meaning from the whole query instead of individual words, is used to collect the output from the MapReduce instance. The output collected on the Amazon S3 cluster is delivered efficiently to the end user in response to voice based requests, in addition to traditional systems, for efficient decision making in fields such as medicine and scientific research.

6. My idea works

To confirm the working of the proposed idea, a hosted instance of Hadoop that supports the MapReduce algorithm was used together with an S3 data cluster from Amazon. It also has a Qubole [20] managed database to test the instance in the cloud environment. Qubole has an API (Application Programming Interface) that gives an overview of running instances through a dashboard. A user can give input as a database or manually select a file, in addition to using the query wizard.

Once the input is given to the MapReduce cluster, data analysis is done using a Hive query in addition to a Pig script.

The following results were collected using an existing database.

Fig-4 Cloud based Hadoop Instance (Source: Qubole)

Fig-4 shows a dashboard running a Hadoop instance in which 2 queries have finished data analysis. It communicates at runtime with the Amazon S3 bucket where the input data is stored. The mapper [13][15] scans the data files from the source and passes its output to the reducer. The reducer processes the data further and sends the result back to the S3 cluster for further processing. The end user accesses this information through the web with the support of the Hummingbird algorithm.

Fig-5 Running Hadoop Cluster (Source: Qubole)

Fig-5 shows a running Hadoop instance in the cloud environment.
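The mapper-to-reducer flow described for Fig-4 can be illustrated with a minimal in-memory sketch. It assumes a simple word-count job, which the paper does not specify, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical. It is plain Python for illustration only and does not use the Hadoop, Hive, Pig or Qubole APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: scan each input line and emit a (word, 1) pair per word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group the emitted values by key before reduction."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: aggregate the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory input."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input standing in for files read from storage.
    sample = ["cloud search engine", "cloud data"]
    print(word_count(sample))
```

In a hosted cluster the three phases run on separate nodes and the input and output live in external storage; here they are ordinary function calls so the data flow is easy to follow.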
Qubole supports metrics for instances running simultaneously, which enhances performance and thereby increases efficiency. The graph in Fig-5 indicates the time spent to complete a single job. Every task is monitored by a master DNS having a unique ID. Each DNS is given a list of queries as input for further analysis.

Fig-6 shows a process getting started on the Hadoop cluster that combines the map and reduce sessions. The jobs use a batch processing system for a single instance. Running multiple instances on different clusters in the cloud environment makes the process more efficient without investing much in physical infrastructure. As a result, the end user enjoys the benefits of information retrieval with minimum time, cost and physical resources. As the cloud supports pay-per-use policies, resource allocation as per requirements becomes easier.

Fig-6 Hadoop Master DNS (Source: Qubole)

7. Detailed account of concepts

7.1 Existing algorithms used for information retrieval

BFS (Breadth First Search)
Redundant BFS
ISN (Intelligent Search Machine)
Directed BFS
Random walker search
Randomized Gossiping
Centralized approach
Distributed Information retrieval
Searching Object identifier

The following explanations elaborate on specific areas.

7.2 Cloud architecture

Fig-7 Cloud Architecture (Source: NIST)

Cloud is an upcoming technology that supports IaaS (Infrastructure as a Service), PaaS (Platform as a Service) and SaaS (Software as a Service) as shown in Fig-7.

For any hosted instance in the cloud, open source software is used as a server that supports virtualization and grid technology. A virtual private network is used in addition to a broadband network [13][16]. An SLA (Service Level Agreement) is signed between an organization and the service provider.
Distributed computing is one of the known components, as data transferred across the network requires secure, authentic and efficient service in a given network.

The types of cloud include public, private, community and hybrid cloud [2]. Private clouds are hosted in a dedicated environment with a firewall and other authentication features. Updating the existing system and taking backups remain the responsibility of the owner. Hybrid clouds may be hosted in a private environment in synchronization with public resources. The end user is held responsible for resources used in the public cloud, which offers minimum security.

7.3 MapReduce Algorithm

Fig-8 MapReduce Algorithm (Source: Jimmy Lin, University of Maryland)

The algorithm takes data input as a file or database in the form of a query. A list of mapper instances is activated, which traverse the database in search of information. The jobs or data values are shuffled based on keys and aggregated as input to the reducers. The reducers interpret the key inputs and reshuffle them to obtain unique, relevant information for further processing, as shown in Fig-8 [1].

7.4 Hummingbird Algorithm

The Hummingbird algorithm [10][21] is the latest birthday gift from Google. Panda 3.5 and Penguin were basically filters applied to searching criteria in the form of web pages and hyperlinks.

The traditional search engine extracts information based on keywords. Consider the sentence "How many times does hummingbird flap their wings per second?" The traditional search engine, being keyword based, tries to extract words like times, flap and per second. Based on the collected keywords, web pages are searched in the database. The collected content undergoes filtering by Panda and Penguin. The results are displayed to the user in the form of hyperlinks.

Because Hummingbird is new in the field of search and meant for voice based information retrieval, it accepts the query as a single sentence instead of keywords.
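The keyword based extraction described above can be sketched as follows. This is only an illustration of the traditional approach: the stop-word list and the function name extract_keywords are assumptions, not taken from any search engine, so the extracted words differ slightly from the ones named in the text. No attempt is made to sketch Hummingbird itself, since its internals are not described here.

```python
import re
from typing import List

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"how", "many", "does", "their"}


def extract_keywords(query: str) -> List[str]:
    """Split a query into lowercase words and drop the stop words."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    query = "How many times does hummingbird flap their wings per second?"
    # A keyword based engine would look up pages matching these words.
    print(extract_keywords(query))
```

A sentence based engine would instead receive the full query string unchanged, so the relation between the words is kept rather than discarded at this step.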
The Hummingbird engine tries to understand the meaning and creates a knowledge base from the provided information or query.

Fig-9 Hummingbird Search (Source: Google.com)

In Fig-9, the query asked to Google using voice search was "where am i?". The search engine located my current position based on IP address or physical location and displayed a map for the same.

8. Conclusion and future work

The paper is an extension to the Hummingbird algorithm paper [21] that supports the MapReduce algorithm with the Hummingbird search engine in a dedicated cloud environment. Qubole, a hosted Hadoop instance, is used to confirm the working of MapReduce in support with Amazon S3 for data storage. A single Hive query instance on a single DNS is tested; as future work this will be extended to test multiple instances of Hive and Pig scripts simultaneously.

References

[1] Rahul Prasad Kanu, Shabeera T P, S D Madhu Kumar (2014). Dynamic Cluster Configuration Algorithm in MapReduce Cloud. International Journal of Computer Science and Information Technologies, Vol. 5 (3), 4028-4033.
[2] Mr. Kulkarni N. N., Dr. Pawar V. P., Dr. K. K. Deshmukh (2014). Evaluation of Information Retrieval in Cloud computing based services. Asian Journal of Management Sciences, 02 (03 (Special Issue)).
[3] Brian Hellig, Stephen Turner, generative collier, Long Zheng (2014). Beyond MapReduce: the next generation of big data analytics. HAMR.Eti.com.
[4] Ismail Hmeidi, Maryan Yatim, Ala Ibrahim, Mai Abujazouh (2014). Survey of Cloud Computing Web Services for Healthcare Information Retrieval Systems. International Conference on Computing Technology and Information Management, Dubai, UAE.
[5] Anil Radhakrishnan, Kiran Kalmadi (2013). Big Data Medical Engine in the Cloud. Infosys Lab Briefings, Vol. 11, No. 1.
[6] Dr. Sanjay Mishra, Dr. Arun Tiwari (2013). A Novel Technique for Information Retrieval Based on Cloud Computing. International Journal of Information Technology.
[7] Yu Mon Zaw, Nay Min Tun (2013). Web Services Based Information Retrieval Agent System for Cloud Computing. International Journal of Computer Applications Technology and Research, Volume 2, Issue 1, 67-71.
[8] Gautam Vemuganti (2013). Metadata Management in Big Data. Infosys Lab Briefings.
[9] Aaditya Prakash (2013). Nature Inspired Visualization of Unstructured Big Data. Infosys Lab Briefings, Vol. 11, No. 1.
[10] Xinxin Fan, Guang Gong, Honggang Hu (2011). Remedying the Hummingbird Cryptographic Algorithm. IEEE.
[11] Mosashi Inoue (2009). Image retrieval research and use in the information retrieval. National Institute of Informatics.
[12] Jeff Dean, Google Fellow (2009). Challenges in Building Large-Scale Information Retrieval Systems.
[13] Tsungnan Lin, Pochiang Lin, Hsinping Wang, Chiahung Chen (2009). Dynamic Search Algorithm in Unstructured Peer-to-Peer Networks. IEEE.
[14] William Hersh (2008). Future perspectives: Ubiquitous but unfinished: grand challenges for information retrieval. Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon, USA.
[15] Jeffrey Dean, Sanjay Ghemawat (2004). MapReduce: Simplified Data Processing on Large Clusters. Google.com.
[16] Mehran Sahami, Vibhu Mittal, Shumeet Baluja, Henry Rowley (2003). The Happy Searcher: Challenges in Web Information Retrieval. google.com.
[17] James Allan (2002). Challenges in Information Retrieval and Language Modeling. Report of a workshop held at the Center for Intelligent Information Retrieval, University of Massachusetts Amherst.
[18] Amit Singhal (2001). Modern Information Retrieval: A Brief Overview. IEEE Computer Society Technical Committee on Data Engineering.
[19] http://www.internetlivestats.com
[20] https://api.qubole.com
[21] Dr. Piyush Gupta, Kashinath Chandelkar (2012). The Need and Impact of Hummingbird Algorithm on Cloud based Content Management System. IJARCSSE journal, Vol. 2, Issue 12.
