Hi friends! I hope you liked my last post about the survey done by Oracle and ITtoolbox. I had forgotten to put the URL for the actual survey in the HTML, so the link did not appear, but I have corrected it now.
Yeah, I have just finished my lecture and now I am writing a post about it. Today's post is very basic and deals with my concerns about some terms used in the lecture notes.
My concerns:
1) "Data Mining is the process of exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules" (Berry & Linoff 2000)
2) What comes after discovering meaningful patterns and rules? Going by Berry and Linoff's definition, do we sit back and stop?
3) Data sampling is very important for data mining... but how do we do the sampling?
4) Who will "decide" the success and validity of "the" model?
5) Will "automation" replace human involvement? Is it a risk? Who will take responsibility?
My Views/Opinions
1) What do you understand from the definition given by Berry and Linoff? I have some questions about it. Is data mining an automatic process? Is it semiautomatic? Do we need large quantities of data to do data mining? Do we sit back and stop after discovering meaningful patterns and rules?
I think data mining is not a purely automatic process, and it cannot be one. It is a mixture of "automatic" and "manual" (human involvement) processes, and that is why I used red for that term. Yeah, you can call it semiautomatic because it involves both automatic and human work. If any of you has come across a fully "automatic" data mining process, please contact me.
Large quantities of data? I think they meant something different but somehow worded it this way. From their definition I understand that data mining is performed on a large volume of data. I may be wrong in my interpretation. But if I am right, then I disagree: we do not use large quantities of data for data mining, because we perform data mining on "selected" (sampled) data sets, and this sample must be small compared with the full organizational data relevant to the business problem. So I think what they wanted to say is that organizations have large quantities of data and then perform data mining on selected data relevant to their problem. But again, they have not used the words "selected", "sampled" or "relevant" anywhere in their definition.
I asked our lecturer the same question, and he agreed that there is an ongoing debate about whether it is "automated" or "semiautomated".
In my opinion, the definition of data mining should be:
It is a cyclical process of going through all the data we have, finding the relevant data (and adding relevant data along the way), extracting the most relevant data (sampling, by human plus machine), discovering patterns and rules, and, based on those, predicting "future" patterns and rules that will add positive value (money or service) to the business. The whole process is driven by our business problem.
Now let me explain my definition. Take the example of "predicting television sales for the next Cricket World Cup for the countries India, Pakistan and Australia" for a company called "TV4you". This company might have data from all kinds of sources: the ICC (International Cricket Council), internal data (data warehouse plus employee views) and external data. They should go through all of it and find the relevant data, then do the sampling and discover patterns and rules for sales during the Cricket World Cup in those countries, and from these they can predict "future trends and patterns" for TV sales there. The prediction might be at the individual customer level or more general, at the country level. If data mining is used for CRM purposes, it should be at the individual customer level.
2) Berry and Linoff talk about discovering rules and patterns. But what comes then? They say nothing about predicting future rules and patterns. I think the main purpose of data mining is predicting the future based on what we have now: in which areas we can step in or increase our business value.
3) Sampling is very important in data mining. But how are we going to do it? Is it a fully automatic process? Does it involve humans? What is the metadata for the sampling process? I think sampling should involve both humans and machines. It cannot be a purely statistical or automatic machine-based process. A business must have "at least" good statistics skills and business knowledge among its employees for sampling. This suggests that the organization's overall success depends on the person who does the sampling.
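To make the "human plus machine" idea about sampling a little more concrete, here is a minimal sketch of stratified sampling in Python. It is only one possible way to sample, not a method from the lecture notes. The function name `stratified_sample`, the record layout and all the numbers are made up for illustration; only the country names come from the TV4you example. The human part is choosing which field to group by and what fraction to keep; the machine part is the random draw.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw roughly `fraction` of the records from each group defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    sample = []
    for name in sorted(groups):
        rows = groups[name]
        # Keep at least one row per group so small groups are not lost.
        k = max(1, round(len(rows) * fraction))
        sample.extend(rng.sample(rows, k))
    return sample

# Hypothetical sales records: the unit counts are invented for this sketch.
records = (
    [{"country": "India", "units": i} for i in range(60)]
    + [{"country": "Pakistan", "units": i} for i in range(30)]
    + [{"country": "Australia", "units": i} for i in range(10)]
)

# Human decisions: group by country, keep about 10% of each group.
sample = stratified_sample(records, key="country", fraction=0.1)

counts = {}
for rec in sample:
    counts[rec["country"]] = counts.get(rec["country"], 0) + 1
print(counts)  # {'Australia': 1, 'India': 6, 'Pakistan': 3}
```

Grouping by country keeps every market represented in the sample, which a plain random draw over all records would not guarantee for the smaller groups.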
4) Validating the model is very important. But who will decide that "the" model is "valid" and "successful"? Is it only an automatic process based on previous models, or does it involve humans? I think validation is not possible without human support. It is a subjective area.
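As a rough illustration of how the machine part and the human part of validation could fit together, here is a small Python sketch. It assumes we already have predictions and the actual values for some held-out period. The function names, the error measure (mean absolute error) and all the figures are my own assumptions for illustration. The machine only computes the error; the acceptable limit `max_error` is a subjective business decision that a person has to supply.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def review_model(actual, predicted, max_error):
    """Compute the error and compare it with a limit chosen by a human."""
    error = mean_absolute_error(actual, predicted)
    return {"error": error, "within_threshold": error <= max_error}

# Invented weekly TV sales for a held-out period, and a model's predictions.
actual = [100, 120, 90, 110]
predicted = [90, 125, 100, 105]

# The limit of 10 units is a human judgement, not something the data provides.
report = review_model(actual, predicted, max_error=10)
print(report)  # {'error': 7.5, 'within_threshold': True}
```

Even when the check passes, someone still has to decide whether an average miss of 7.5 units is good enough for the business, which is the subjective part.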
5) Will "automation" replace "humans"? Is it a risk? Who takes responsibility? Nowadays many data mining organizations are talking about fully automatic data mining. My concern is whether fully automatic data mining is even possible. I think it is not. And if it is done, I think it will be very risky. Data mining is done for businesses, and after all businesses are run by humans, for humans (customers), to make money (value, service: subjective to us). So how can "machines", "automatic processes" and "rules and patterns" alone decide about future trends, patterns and customers? Certainly they can help us (humans) to look where we cannot look, or at what our eyes have not discovered so far, but they cannot predict alone.

And if we do get "automated" data mining tools, who will take responsibility if the business loses value because of their predictions? Will it be the "developer", the "client" (at the business level) or the "vendors"? I say at the business level because no employee directly adds anything to the data mining. So the business cannot blame its marketing statistics person for a sales drop or business loss during a Cricket World Cup match if he has not given his opinion or any input at any stage of the data mining process. Oh, I remember: managers (the business) can blame the Indian cricket team for the TV sales drop in India during Cricket World Cup 2007.

By the way, from decision support systems (about managers, managerial decisions and how they make decisions) I can say that managers trust employees' gossip about sales predictions more than computer-generated data. If employees sound confident when giving predictions to their managers, the managers believe that the business will surely earn money or increase its value or services. If they are not, the managers will consider both sources (employees and machines) plus their own experience, and then develop a prediction that satisfies all three together (a middle solution).
Anyway, I have been talking too much... I think I should stop now, and you should take a rest.
Catch you later for sure. Cheers!
Thanks,
Anki
Tuesday, August 7, 2007
1 comment:
Good questions, particularly with regard to 'what do you do with a pattern once you've discovered it?' Data mining is useless unless it results in some kind of business change, such as new business processes, changed approach to customers, some sort of impact on strategic decision-making, etc. Data mining has a fair bit of potential for CRM in that it can be used for a wide variety of business changing activities, but there is not a lot out there on how to incorporate the results into the organisation (we'll have a lecture on that later, actually). Have a look at the sample chapter from the Groth book in the recommended text list for some insight.