Data Mining and Optimization for Effective Decision Making

 

S. Kannan1, A. Albert Martin Ruban2

1Associate Professor, Dept. of IT, Kings College of Engineering, Tamil Nadu, India.

2Associate Professor, Dept. of EEE, Kings College of Engineering, Tamil Nadu, India.

 

ABSTRACT:

Business intelligence (BI) is a broad category of applications and technologies for gathering, providing access to, and analyzing data in order to help enterprise users make better business decisions. BI is about getting the right information to the right decision makers at the right time. Dynamic decision making is often handled through an instinctive approach, but it is dealt with more effectively and precisely when based on analytical methodologies and mathematical models. This paper describes the basic concepts of business intelligence and the optimization techniques suitable for optimal and dynamic decision making in the current business world. It aims at analyzing BI systems in the context of opportunities for improving decision making in a contemporary organization.

 

KEY WORDS: Business Intelligence, Decision making, Optimization techniques, Data mining.

 

I. INTRODUCTION:

Business intelligence, or BI, is an umbrella term that refers to a variety of software applications used to analyze an organization’s raw data for intelligent decision making and business success. BI as a discipline is made up of several related activities, including data mining, online analytical processing, querying and reporting. Techniques include multidimensional analyses, mathematical projection, modeling, ad-hoc queries and ‘canned’ reporting. BI leads to fact-based decision making and a “single version of the truth”.

 

The main purpose of BI systems is to provide decision makers with tools and methodologies that allow them to make effective and timely decisions. With the help of mathematical models and algorithms, it is possible to analyze a large number of alternative actions, achieve more accurate conclusions and reach effective and timely decisions. We may conclude that the major advantage deriving from the adoption of a business intelligence system is found in the increased effectiveness of the decision-making process [1-5].

 

II. LITERATURE REVIEW:

In a 1958 article, IBM researcher Hans Peter Luhn used the term business intelligence [12]. He defined intelligence as "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."

 

Business intelligence as it is understood today is said to have evolved from the decision support systems which began in the 1960s and developed throughout the mid-1980s. Decision Support Systems (DSS) originated in the computer-aided models created to assist with decision making and planning. From DSS, data warehouses, Executive Information Systems, OLAP and business intelligence came into focus beginning in the late 1980s.

 

In 1989 Howard Dresner (later a Gartner Group analyst) proposed "business intelligence" as an umbrella term to describe "concepts and methods to improve business decision making by using fact-based support systems" [6]. It was not until the late 1990s that this usage became widespread. Following these studies, a great number of researchers examined how organizations create a huge amount of valuable information in the form of e-mails, memos, notes from call centers, news, user groups, chats, reports, web pages, presentations, image files, video files and marketing material. However, organizations often use these documents only once [26-29].

 

A. Demerits:

There are several problems and challenges when trying to develop BI with semi-structured data. According to Inmon and Nesavich (2008) [14], some of those are:

 

1.   Physically accessing unstructured textual data – unstructured data is stored in a huge variety of formats.

2.   Terminology – among researchers and analysts, there is a need to develop a standardized terminology.

3.   Volume of data – as stated earlier, up to 85% of all data exists as semi-structured data. Couple that with the need for word-to-word and semantic analysis.

4.   Searchability of unstructured textual data.

 

III. METHODOLOGY:

Optimization is the procedure or set of procedures used to make a system or design as effective or functional as possible, especially the mathematical techniques involved. The approaches to optimizing systems are varied and depend on the type of system involved, but the goal of all optimization procedures is to obtain the best results possible, subject to the restrictions or constraints that are imposed [7-11,13].

 

The first step in modern optimization is to obtain a mathematical description of the process or the system to be optimized. A mathematical model of the process or system is then formed on the basis of this description. Depending on the application, the model complexity can range from very simple to extremely complex. System models used in optimization are classified in various ways, such as linear versus nonlinear and static versus dynamic [21-25].

 

Certain models lend themselves to rapid and well-developed solution algorithms, whereas other models may not. When choosing between equally valid models, therefore, those that are cast in standard optimization forms are to be preferred. The techniques considered here are:

 

·       Genetic Algorithm

·       Ant colony optimization

·       Particle swarm optimization

 

A.  Genetic Algorithm (GA):

A genetic algorithm is a heuristic search technique [15-20] used in computing and artificial intelligence to find exact or approximate solutions to optimization and search problems. It uses techniques inspired by evolutionary biology: mutation, selection, reproduction (inheritance) and recombination.

 

Algorithm 1: Genetic Algorithm:

STEP 1: [Start] Generate a random population of n chromosomes (suitable solutions for the problem)

STEP 2: [Fitness] Evaluate the fitness f(x) of each chromosome x in the population

STEP 3: [New population] Create a new population by repeating the following steps until the new population is complete

STEP 3.1: [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance to be selected)

STEP 3.2: [Crossover] With a crossover probability, cross over the parents to form new offspring (children). If no crossover was performed, the offspring is an exact copy of the parents.

STEP 3.3: [Mutation] With a mutation probability, mutate the new offspring at each locus (position in the chromosome).

STEP 3.4: [Accepting] Place the new offspring in the new population

STEP 4: [Replace] Use the newly generated population for a further run of the algorithm

STEP 5: [Test] If the end condition is satisfied, stop, and return the best solution in the current population

STEP 6: [Loop] Go to STEP 2

 

Each iteration of this process is called a generation. A GA is typically iterated for anywhere from 50 to 500 or more generations. The entire set of generations is called a run. At the end of a run there are often one or more highly fit chromosomes in the population. Since randomness plays a large role in each run, two runs with different random-number seeds will generally produce different detailed behaviors. GA researchers often report statistics (such as the best fitness found in a run and the generation at which the individual with that best fitness was discovered) averaged over many different runs of the GA on the same problem.
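The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the bit-string encoding, the `one_max` fitness function, roulette-wheel selection, single-point crossover and all parameter values are assumptions chosen for brevity.

```python
import random


def one_max(chromosome):
    """Illustrative fitness (an assumption): the number of 1-bits."""
    return sum(chromosome)


def select(population, fitnesses, rng):
    """Roulette-wheel selection: better fitness gives a bigger chance."""
    # A tiny constant keeps the weights valid even if every fitness is 0.
    weights = [f + 1e-9 for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]


def genetic_algorithm(fitness, n_bits=20, pop_size=30, p_cross=0.7,
                      p_mut=0.01, generations=100, seed=0):
    rng = random.Random(seed)
    # STEP 1: random initial population of bit-string chromosomes
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)[:]
    for _ in range(generations):          # STEP 5/6: fixed generation budget
        # STEP 2: evaluate the fitness of each chromosome
        fitnesses = [fitness(c) for c in population]
        # STEP 3: build the new population
        new_population = []
        while len(new_population) < pop_size:
            # STEP 3.1: select two parents
            p1 = select(population, fitnesses, rng)
            p2 = select(population, fitnesses, rng)
            # STEP 3.2: single-point crossover, or plain copies of the parents
            if rng.random() < p_cross:
                point = rng.randint(1, n_bits - 1)
                c1 = p1[:point] + p2[point:]
                c2 = p2[:point] + p1[point:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                # STEP 3.3: flip each locus with probability p_mut
                for i in range(n_bits):
                    if rng.random() < p_mut:
                        child[i] = 1 - child[i]
                # STEP 3.4: accept the offspring
                new_population.append(child)
        # STEP 4: replace the old population
        population = new_population[:pop_size]
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate[:]
    return best, fitness(best)


if __name__ == "__main__":
    solution, score = genetic_algorithm(one_max)
    print("best chromosome:", solution, "fitness:", score)
```

Because the random generator is seeded, repeating a run with the same `seed` reproduces the same result, while different seeds generally give different detailed behavior, as noted above.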

 

B.  Ant Colony Optimization Algorithm:

This algorithm is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. The original idea has since diversified to solve a wider class of numerical problems, and as a result, several problems have emerged, drawing on various aspects of the behavior of ants. Pheromone trails evaporate over time, so the pheromone on a long path has more time to fade before it is reinforced. A short path, by comparison, gets marched over faster, and thus its pheromone density remains high.

 

Ant Colony Optimization (ACO) studies artificial systems that take inspiration from the behavior of real ant colonies and which are used to solve discrete optimization problems. In 1999, the Ant Colony Optimization metaheuristic was defined by Dorigo, Di Caro and Gambardella. The ants move from vertex to vertex along the edges of the construction graph, exploiting information provided by the pheromone values and in this way incrementally building a solution. Additionally, the ants deposit a certain amount of pheromone on the components, that is, either on the vertices or on the edges that they traverse. The amount of pheromone deposited may depend on the quality of the solution found. Subsequent ants utilize the pheromone information as a guide towards more promising regions of the search space.

 

The ACO metaheuristic is:

Set parameters, initialize pheromone trails
SCHEDULE_ACTIVITIES
    ConstructAntSolutions
    DaemonActions    {optional}
    UpdatePheromones
END_SCHEDULE_ACTIVITIES

 

The metaheuristic consists of an initialization step and of three algorithmic components whose activation is regulated by the SCHEDULE_ACTIVITIES construct. This construct is repeated until a termination criterion is met. Typical criteria are a maximum number of iterations or a maximum CPU time. The SCHEDULE_ACTIVITIES construct does not specify how the three algorithmic components are scheduled and synchronized. In most applications of ACO to NP-hard problems, however, the three algorithmic components undergo a loop that consists of

 

(a) the construction of solutions by all ants

(b) the (optional) improvement of these solutions via the use of a local search algorithm, and

(c) the update of the pheromones.

 

Ants (blind) navigate from the nest to a food source, and the shortest path is discovered via pheromone trails: each ant moves at random; pheromone is deposited on its path; other ants detect the lead ant’s path and are inclined to follow it; and more pheromone on a path increases the probability of that path being followed.
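The loop described above (solution construction by all ants, followed by a pheromone update) can be sketched in Python as below. This is a simplified Ant System variant under stated assumptions, not the authors' code: the four-city tour problem, the parameter names `alpha`, `beta`, `evaporation` and `q`, and their values are all hypothetical, and the optional local search (daemon action) is omitted.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))


def construct_tour(dist, pheromone, alpha, beta, rng):
    """One ant builds a tour vertex by vertex, guided by pheromone values."""
    n = len(dist)
    tour = [rng.randrange(n)]
    unvisited = set(range(n)) - {tour[0]}
    while unvisited:
        current = tour[-1]
        candidates = sorted(unvisited)
        # Edge attractiveness: pheromone^alpha * (1 / distance)^beta
        weights = [(pheromone[current][j] ** alpha)
                   * ((1.0 / dist[current][j]) ** beta)
                   for j in candidates]
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def ant_colony(dist, n_ants=10, iterations=50, alpha=1.0, beta=2.0,
               evaporation=0.1, q=1.0, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    pheromone = [[1.0] * n for _ in range(n)]   # initialize pheromone trails
    best_tour, best_len = None, math.inf
    for _ in range(iterations):                 # termination: max iterations
        # (a) construction of solutions by all ants
        tours = [construct_tour(dist, pheromone, alpha, beta, rng)
                 for _ in range(n_ants)]
        # (c) pheromone update: evaporation first, then deposits
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= 1.0 - evaporation
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                # Shorter tours deposit more pheromone on their edges.
                pheromone[a][b] += q / length
                pheromone[b][a] += q / length
    return best_tour, best_len


# Hypothetical example: four cities on the corners of a unit square.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dist = [[math.hypot(px - qx, py - qy) for (qx, qy) in points]
        for (px, py) in points]
best_tour, best_len = ant_colony(dist)
```

In this toy instance the shortest closed tour follows the perimeter of the square, so the deposits on perimeter edges accumulate faster than those on the diagonals.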

 

C.  Particle Swarm Optimization:

Particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. Such methods are commonly known as metaheuristics, as they make few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. However, metaheuristics such as PSO do not guarantee that an optimal solution is ever found.

 

The conventional PSO is used to discover the optimal solution in a static environment. Conventional PSO has poor tracking characteristics when the optimal solution is moving. When the environment is dynamic, the task of the optimization is not only to acquire the extremum but also to track its trajectory as closely as possible. A human social adaptive based PSO is used to discover and track the optimal solution in a dynamic environment. Each particle evaluates the knowledge it has received from its previous experience and from its neighbors’ experience. Outdated knowledge is gradually forgotten by the particle and new knowledge is learned.

 

PSO is a robust stochastic optimization technique based on the movement and intelligence of swarms. PSO applies the concept of social interaction to problem solving. It was developed in 1995 by James Kennedy (social psychologist) and Russell Eberhart (electrical engineer). It uses a number of agents (particles) that constitute a swarm moving around in the search space looking for the best solution.

 

 

 

In PSO, there have been two basic topologies:

Ring Topology (neighborhood of 3)

Star Topology (global neighborhood)

 

THE ANATOMY OF THE PARTICLE:

·       A particle (individual) is composed of:

 

Three vectors:

·       The x-vector records the current position (location) of the particle in the search space,

·       The p-vector records the location of the best solution found so far by the particle, and

·       The v-vector contains a gradient (direction) in which the particle will travel if undisturbed.

 

Two fitness values:

·       The x-fitness records the fitness of the x-vector, and

·       The p-fitness records the fitness of the p-vector.

 

Each particle is treated as a point in an N-dimensional space which adjusts its “flying” according to its own flying experience as well as the flying experience of other particles. Each particle keeps track of its coordinates in the solution space which are associated with the best solution (fitness) that it has achieved so far. This value is called the personal best, pbest. Another best value that is tracked by the PSO is the best value obtained so far by any particle in the neighborhood of that particle. This value is called gbest. The basic concept of PSO lies in accelerating each particle toward its pbest and the gbest locations, with a random weighted acceleration at each time step.
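The particle anatomy and the pbest/gbest acceleration described above can be sketched in Python as follows. This is an illustrative global-best (star topology) variant, not the authors' implementation: the `sphere` objective, the inertia weight `w`, the acceleration coefficients `c1` and `c2`, and the search bounds are assumptions.

```python
import random


def sphere(x):
    """Illustrative objective to minimize (an assumption): sum of squares."""
    return sum(v * v for v in x)


def pso(objective, dim=2, n_particles=20, iterations=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # x-vectors: current positions; v-vectors: current velocities
    x = [[rng.uniform(lo, hi) for _ in range(dim)]
         for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    # p-vectors and p-fitness: best position found so far by each particle
    p = [xi[:] for xi in x]
    p_fit = [objective(xi) for xi in x]
    # gbest: best position found so far by any particle (star topology)
    g_idx = min(range(n_particles), key=lambda i: p_fit[i])
    g, g_fit = p[g_idx][:], p_fit[g_idx]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Randomly weighted acceleration toward pbest and gbest
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p[i][d] - x[i][d])
                           + c2 * r2 * (g[d] - x[i][d]))
                x[i][d] += v[i][d]
            x_fit = objective(x[i])      # x-fitness of the new position
            if x_fit < p_fit[i]:         # update personal best
                p[i], p_fit[i] = x[i][:], x_fit
                if x_fit < g_fit:        # update global best
                    g, g_fit = x[i][:], x_fit
    return g, g_fit


if __name__ == "__main__":
    position, value = pso(sphere)
    print("best position:", position, "objective:", value)
```

A ring topology would replace the single `g` with the best p-vector among each particle's immediate neighbors; the star topology is used here only because it is the shorter of the two to write down.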

 

IV. EXPERIMENT AND RESULTS:

A key step in the formulation of any optimization problem is the assignment of performance measures that are to be optimized. The success of any optimization result is critically dependent on the selection of meaningful performance measures. In many cases, the actual computational solution approach is secondary. Ways in which multiple performance measures can be incorporated in the optimization process are varied.

 

BIG BANK is a flourishing bank in the financial sector with 10 million account holders. It is going to introduce a credit card to the market. The current financial market has a lot of competition in the credit card sector. Finding the valued customers will be a big success and a backbone for the bank. Since the bank is flourishing, it does not want to take risks in the financial market.

 

The bank has therefore decided to sell the credit card only to valued and trustworthy internal customers. Plenty of account holders are in the bank’s data warehouse, and we need to identify the valued customers from it.

 

V. PERFORMANCE AND ANALYSIS:

In the banking and finance sectors, customer profitability analysis serves several purposes. It determines the overall profitability of individual customers, current and long term; provides the basis for high-profit sales and relationship banking; maximizes sales to high-value customers; reduces costs to low-value customers; and provides the means to maximize the profitability of new products and services. It can also establish patterns of credit problem progression by customer class and type, warn customers to avoid credit problems, help manage credit limits, evaluate the bank’s credit portfolio, and reduce credit losses. Finally, it can improve customer service and account selling, facilitate cross selling, improve customer support, and strengthen customer loyalty.

 

In data envelopment analysis the units being compared are called decision-making units (DMUs) [23], since they enjoy a certain decisional autonomy. Assuming that we wish to evaluate the efficiency of n units, let $N = \{1, 2, \ldots, n\}$ denote the set of units being compared. If the units produce a single output using a single input only, the efficiency of the jth decision-making unit DMU$_j$, $j \in N$, is defined as $\theta_j = y_j / x_j$, in which $y_j$ is the output value produced by DMU$_j$ and $x_j$ the input value used.

 

If the units produce multiple outputs using various input factors, the efficiency of DMU$_j$ is defined as the ratio between a weighted sum of the outputs and a weighted sum of the inputs. Denote by $H = \{1, 2, \ldots, s\}$ the set of production factors and by $K = \{1, 2, \ldots, m\}$ the corresponding set of outputs. Let $x_{ij}$, $i \in H$, denote the quantity of input $i$ used by DMU$_j$ and $y_{rj}$, $r \in K$, the quantity of output $r$ obtained. Efforts undertaken to develop BI systems have resulted in many business solutions that allow for effective support of managers’ work.
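The ratio definition of efficiency given above can be sketched in Python as follows. The weight vectors `u` (outputs) and `v` (inputs) and all numeric values are hypothetical: in data envelopment analysis the weights are normally chosen by solving an optimization problem for each DMU, whereas this sketch simply takes them as given.

```python
def efficiency(outputs, inputs, u, v):
    """Efficiency of one DMU: weighted output sum over weighted input sum.

    outputs: quantities y_rj of each output r for the unit
    inputs:  quantities x_ij of each input i for the unit
    u, v:    assumed (hypothetical) output and input weights
    """
    if len(outputs) != len(u) or len(inputs) != len(v):
        raise ValueError("each quantity needs exactly one weight")
    weighted_out = sum(ur * yr for ur, yr in zip(u, outputs))
    weighted_in = sum(vi * xi for vi, xi in zip(v, inputs))
    if weighted_in <= 0:
        raise ValueError("weighted input sum must be positive")
    return weighted_out / weighted_in


# Single input, single output: reduces to theta_j = y_j / x_j.
single = efficiency(outputs=[10.0], inputs=[5.0], u=[1.0], v=[1.0])

# Two outputs and two inputs with hypothetical weights.
multi = efficiency(outputs=[4.0, 2.0], inputs=[2.0, 1.0],
                   u=[0.5, 1.0], v=[1.0, 2.0])
```

With unit weights the single-input, single-output case reduces to the simple output-to-input ratio, which is a quick consistency check on the general formula.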

 

VI. CONCLUSION:

Business intelligence needs to provide us with feedback information that can be used to evaluate a decision. It can provide both the foundational information and the feedback information. Key Performance Indicators (KPIs) are highly summarized measures designed to quickly relay the status of that measure. They usually reflect the most vital aspects of the organization. By bringing discipline to strategic financial modeling, facilitating worldwide operational planning and forecasting, linking strategies with operations, and letting management, finance, and operating staff focus on analyzing information rather than gathering and processing it, such solutions provide organizations with the agility they need to capitalize on business opportunities, optimize resources, and link strategic goals to operational plans.

 

Contemporary organizations face a necessity for complex and semi-structured decision making. Dispersion of information sources and decentralization of the decision-making process make present information management models insufficient. Metaheuristic algorithms are used for both static and dynamic combinatorial optimization problems. Convergence is guaranteed, although the speed is unknown. Hybrid algorithms that combine solutions built by probabilistic constructive methods with local search algorithms yield significantly improved solutions. This offers a new way of approaching the solution of complex nonlinear problems.

 

VII. REFERENCES:

1.     G.E. Kersten, Z. Mikolajuk, and A. Gar-on Yeh (Eds.), Decision support systems for sustainable development. A resource book of methods and applications. Kluwer Academic Publishers. 2000

2.     Christian Blum, Xiaodong Li: Swarm Intelligence in Optimization. Swarm Intelligence 2008: 43-85

3.     Clemen R. Making Hard Decisions: An Introduction to Decision Analysis. Duxbury Press. 1997

4.     Davis, L., Ed. Genetic Algorithms and Simulated Annealing. Morgan Kaufmann Publishers, 1987.

5.     Davis, L.D.: Handbook of Genetic Algorithms. Van Nostrand Reinhold, 1991.

6.     Dresner, H.J., Buytendijk, F., Linden, A., Friedman, T., Strange, K.H., Knox, M., and Camn, M. The business intelligence center: An essential business strategy. Gartner Research. 2002.

7.     Christian Blum, Andrea Roli: Hybrid Metaheuristics: An Introduction. Hybrid Metaheuristics 2008: 1-30

8.     Giudici P. (2003). Applied Data Mining: Statistical Methods for Business and Industry. Wiley.

9.     Gray, P., and Watson, H. Decision support in the data warehouse. Prentice Hall. 1998

10.  Gray, P., "The SMU decision room project", Transactions of the 1st International Conference on Decision Support Systems (Atlanta, Ga.), 1981, pp. 122-129.

11.  J. Gołuchowski, (Eds.), DSS in the uncertainty of the Internet age. Katowice: University of Economics.

12.  Hauke, K.; Owoc, M.L. and Pondel, M. Building Data Mining Models in the Oracle 9i Environment, Informing Science, pp. 1183-1191. 2003

13.  H.P. Luhn, "A Business Intelligence System" (PDF). IBM Journal. October 1958.

14.  Inmon, W.H. Building the data warehouse. New York: J. Wiley, 1992.

15.  Kantardzic, M., Data mining: Concepts, models, methods and algorithms. New York: J. Wiley, 2002

16.  Kersten, G.E., Decision making and decision support. In G.E. Kersten, Z. Mikolajuk, and A. Gar-on Yeh (Eds.), Decision support systems for sustainable development. A resource book of methods and applications. Kluwer Academic Publishers. 2000

17.  Marco Dorigo, Thomas Stützle: Ant colony optimization. MIT Press 2004.

18.  Marshall B., McDonald D., Chen H., Chung W., Ebizport: collecting and analyzing business intelligence information. Journal of the American Society for Information Science and Technology, 55, 873–891. 2004.

19.  Mendenhall W., Beaver R., Beaver B., A Brief Course in Business Statistics. South-Western College Pub. 2000.

20.  Miller H., Han J., Geographic Data Mining and Knowledge Discovery. Taylor and Francis. 2000

21.  Moss, L.T. and Atre, S. Business intelligence roadmap – The complete project lifecycle for decision support applications. Addison-Wesley. 2003

22.  Olszak, C.M., and Ziemba, E., Business intelligence systems as a new generation of decision support systems. Proceedings of PISTA 2004, International Conference on Politics and Information Systems: Technologies and Applications. Orlando: The International Institute of Informatics and Systemics. 2004.

23.  R.S. Parpinelli, H.S. Lopes, and A.A. Freitas. Data mining with an ant colony optimization algorithm. IEEE Transactions on Evolutionary Computation, 6(4): 321-332, 2002.

24.  Rasmussen, N., Goldy, P.S., and Solli, P.O. Financial business intelligence. Trends, technology, software selection, and implementation. John Wiley and Sons. 2002.

25.  Reinschmidt, J., and Francoise, A., Business intelligence certification guide. IBM, International Technical Support Organization. 2000.

26.  Silva, R., and Rahimi, I. Issues in implementing CRM: A case study. Journal of Issues in Informing Science and Information Technology 2004

27.  Turban, E., and Aronson, J.E. Decision support systems and intelligent systems. Prentice Hall. 1998.

28.  Wells, J.D., and Hess, T.J. Understanding decision-making in data warehousing and related decision support systems. An explanatory study of a customer relationship management application. In M. Raisinghani (Ed.) Business intelligence in the digital economy. London: Idea Group Publishing. 2004

29.  Wijnhoven, F. Models of information markets: Analysis of markets, identification of services, and design models. Informing Science: The International Journal of an Emerging Discipline, 4(4). 2001.

 

 

 

 

Received on 29.08.2016

Modified on 06.09.2016

Accepted on 20.12.2016

© A&V Publications all rights reserved

Research J. Humanities and Social Sciences. 8(1): January - March, 2017, 32-36.

DOI: 10.5958/2321-5828.2017.00005.5