IJRETS, ICICSIT 2015

AN EFFICIENT METHOD TO GENERATE CAPTION FROM THE ELECTRONIC DOCUMENT FOR DOMAIN MODULE

Mounica N and Vijay Kumar Damera
Department of Information Technology, MGIT, Hyderabad, Telangana, INDIA

ABSTRACT: This paper is concerned with the task of automatically generating captions for images, which is important for many image-related applications. Examples include video and image retrieval as well as the development of tools that help visually impaired individuals access pictorial information. Our approach leverages the vast resource of images available on the web and the fact that many of them are captioned and co-located with thematically related documents. Our model learns to create captions from a database of news articles, the images embedded in them, and their captions, and consists of two stages. Content selection identifies what the image and the accompanying article are about, whereas surface realization determines how to verbalize the chosen content. We approximate content selection with a probabilistic image annotation model that suggests keywords for an image. The model postulates that images and their textual descriptions are generated by a shared set of latent variables (topics) and is trained on a weakly labeled dataset (which treats the captions and associated news articles as image labels). Inspired by recent work in summarization, we propose extractive and abstractive surface realization models. Experimental results show that it is viable to generate captions that are pertinent to the specific content of an image and its associated article, while permitting creativity in the description. Indeed, the output of our abstractive model compares favorably to human-written captions and is often superior to that of extractive methods.

Index Terms: Image labels, pictorial data, article and dataset.
[1] INTRODUCTION

Recent years have witnessed an unprecedented growth in the amount of digital information available on the Internet. Flickr, one of the best-known photo sharing websites, hosts over three billion images, with roughly 2.5 million images being uploaded every day. Many online news sites such as CNN, Yahoo!, and BBC publish images with their stories and even provide photo feeds related to current events. Browsing and finding pictures in large-scale and heterogeneous collections is an important problem that has attracted much interest within information retrieval. Many of the search engines deployed on the web retrieve images without analyzing their content, simply by matching user queries against collocated textual information. Examples include metadata (e.g., the image's file name and format), user-annotated tags, captions, and, generally, text surrounding the image. As this limits the applicability of search engines (images that do not coincide with textual data cannot be retrieved), a great deal of work has focused on the development of methods that generate description words for an image automatically. The literature is littered with various attempts to learn the associations between image features and words using supervised classification, instantiations of the noisy-channel model, latent variable models, and models inspired by information retrieval. Although keyword-based classification