oserooms.blogg.se - Weka 3.8.1 download

How to handle large datasets with Weka is a question that crops up frequently on the Weka mailing list and forums. This post is the first of three that outlines what's available, in terms of distributed processing functionality, in several new packages for Weka 3.7. This series of posts is continued in part 2 and part 3.

The first new package is called distributedWekaBase. It provides base "map" and "reduce" tasks that are not tied to any specific distributed platform. The second, called distributedWekaHadoop, provides Hadoop-specific wrappers and jobs for these base tasks. In the future there could be other wrappers - one based on the Spark platform would be cool.

Base map and reduce tasks

distributedWekaBase version 1.0 provides tasks for:

- Determining a unified ARFF header from separate data chunks in CSV format. This is particularly important because, as Weka users know, Weka is quite particular about metadata - especially when it comes to nominal attributes. At the same time, this task computes some handy summary statistics (stored as additional "meta attributes" in the header), such as count, sum, sum squared, min, max, num missing, mean, standard deviation and frequency counts for nominal values. These summary statistics come in useful for some of the other tasks listed below.
- Computing a correlation or covariance matrix. Once the ARFF header job has been run, computing a correlation matrix can be completed in just one pass over the data, given our handy summary stats. Map tasks compute a partial matrix of covariance sums. The reduce tasks aggregate individual rows of the matrix in order to produce the final matrix. This means that parallelism can be exploited in the reduce phase by using as many reducers as there are rows in the matrix. The matrix produced by this job can be read by Weka's Matrix class.
- Training a Weka classifier (or regressor).
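
To make the summary-statistics and correlation-matrix tasks more concrete, here is a minimal sketch in plain Python of how the map and reduce steps could fit together. It is not the distributedWekaBase API: every function name below is hypothetical, the data chunks are made up, and it assumes purely numeric columns with no missing values (the real tasks also handle nominal attributes and missing-value counts).

```python
import math
from functools import reduce

def map_summary(chunk):
    """Map step (hypothetical): per-column partial statistics for one chunk."""
    return [
        {"count": len(col), "sum": sum(col), "sumsq": sum(v * v for v in col),
         "min": min(col), "max": max(col)}
        for col in zip(*chunk)
    ]

def reduce_summary(a, b):
    """Reduce step (hypothetical): merge two partial summaries column by column."""
    return [
        {"count": x["count"] + y["count"], "sum": x["sum"] + y["sum"],
         "sumsq": x["sumsq"] + y["sumsq"],
         "min": min(x["min"], y["min"]), "max": max(x["max"], y["max"])}
        for x, y in zip(a, b)
    ]

def finalize(stats):
    """Derive mean and sample standard deviation from the merged sums."""
    out = []
    for s in stats:
        n = s["count"]
        mean = s["sum"] / n
        var = (s["sumsq"] - n * mean * mean) / (n - 1)
        out.append({**s, "mean": mean, "stddev": math.sqrt(max(var, 0.0))})
    return out

def map_cov_sums(chunk, means):
    """Map step (hypothetical): partial matrix of co-deviation sums for one chunk.

    The means come from the earlier summary pass, so one pass over the data suffices.
    """
    k = len(means)
    partial = [[0.0] * k for _ in range(k)]
    for row in chunk:
        dev = [v - m for v, m in zip(row, means)]
        for i in range(k):
            for j in range(k):
                partial[i][j] += dev[i] * dev[j]
    return partial

def reduce_row(i, partials, stats):
    """Reduce step (hypothetical): aggregate row i and scale it to correlations.

    Each row is independent, so up to one reducer per matrix row can run in parallel.
    """
    n = stats[0]["count"]
    sums = [sum(p[i][j] for p in partials) for j in range(len(stats))]
    return [s / ((n - 1) * stats[i]["stddev"] * stats[j]["stddev"])
            for j, s in enumerate(sums)]

# Two made-up "chunks" of a two-column numeric dataset.
chunks = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.5)],
]

# Pass 1: summary statistics (what the header job would store as meta attributes).
stats = finalize(reduce(reduce_summary, map(map_summary, chunks)))
means = [s["mean"] for s in stats]

# Pass 2: partial matrices in the map phase, one row per reducer in the reduce phase.
partials = [map_cov_sums(c, means) for c in chunks]
corr = [reduce_row(i, partials, stats) for i in range(len(stats))]
```

Dropping the division by the two standard deviations in `reduce_row` would yield the covariance matrix instead of the correlation matrix.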