Predicting Software Reselling Profits. Tayko Software is a software catalog firm that sells games and educational software. It started out as a software manufacturer and then added third-party titles to its offerings. It recently revised its collection of items in a new catalog, which it mailed out to its customers. This mailing yielded 1,000 purchases. Tayko wants to devise a model for predicting the spending amount that a purchasing customer will yield. The file Tayko.xls contains information on 1,000 purchases (found in the “Purchasers only” tab of the Excel file). Using the “Purchasers only” data: Explore the spending amount by creating pivot tables for the categorical variables — web order, gender, US address, and residential address — and computing the average and standard deviation of spending in each category. Explore the spending amount by source catalog by creating a single pivot table for all source catalogs. (To do this, you’ll need to use your Excel skills to create a single variable out of the multiple source columns. Hint: a series of nested if statements is one straightforward way to accomplish this.) Explore the relationship between spending and each of the two continuous predictors by creating two scatter plots (SPENDING vs. FREQ, and SPENDING vs. LAST_UPDATE). Does there seem to be a linear relationship? Given what you’ve found in 1-3 above, make some initial observations about relationships you might find in the data, e.g. relationships between spending and other continuous variables based on the scatter plots or differences that the pivot tables show for categorical variables that might be meaningful. To fit a predictive model for SPENDING: Partition the 1000 records into training and validation sets. You will use the partition variable in the “Purchasers only” tab for this. When setting up partitioning, instead of selecting “Random Partition”, select “Use Partition Variable” and select partition as the variable to use. This is how you would utilize a data set in XLMiner that was previously partitioned. Run a multiple linear regression model for SPENDING versus the six predictors (4 categorical variables above, FREQ, and LAST_UPDATE_DAYS_AGO; do not use the sources or 1ST_UPDATE_DAYS_AGO.). Assess your model by comparing fit results between the training and validation results. You can ignore the test results. Starting with all predictors including the catalog sources (exclude PURCHASE which is a dependent variable) run a Best Subsets selection model? What variables will you use in your selected model? What is your rationale? Assess your model by comparing fit results between the training and validation results. You can ignore the test results. Compare your initial model with your Best Subsets model. Which model would you recommend? Explain your rationale? Create histograms of both models’ residuals. Do they appear to follow a normal distribution? How does this affect the predictive performance of the model? 

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>