Tuesday, September 12, 2017

Python Knowledge Discovery and Data Mining Research Group KDDRG

https://web.cs.wpi.edu/~ruiz/KDDRG/Resources/Python/


Miscellaneous Notes on Python 

Lots of the text and materials posted on this page were produced by Ahmedul Kabir (thanks Kabir!)
------------------------------------------

General Information about Python:


Python Tutorials:


Python Books:


Python Environments:


Python Data Mining Packages:

Python has many open source packages available specifically for Data Mining and Knowledge Management. Here is a list of the most widely used ones, along with brief descriptions:
  • Scikit-learn: Simple and efficient tools for data mining and data analysis. It implements algorithms for Preprocessing, Classification, Regression, Clustering, Dimensionality Reduction and Model Selection, and is built on the commonly used NumPy and SciPy packages. Scikit-learn is usually the default choice when it comes to Data Mining in Python.
  • Pandas: Python Data Analysis Library: A slightly more advanced library than Scikit-learn, with a very good API. Pandas introduces some useful data structures, such as "DataFrames". However, Pandas doesn't provide all of the predictive modelling tools. Pandas is used when more control is needed while working directly on raw data.
  • Orange: The best thing about Orange is that it has a Graphical User Interface. Has quite a comprehensive collection of algorithms for Classification, Clustering and feature selection. It also has add-ons for Bioinformatics and Text mining.
  • MLPy: Machine Learning Python: MLPy is a Machine Learning package similar to Scikit-Learn. It has most of the algorithms necessary for Data mining, but is not as comprehensive as Scikit-learn. MLPy can be used for both Python 2 and 3.
Note: Python Package Index: All Python packages can be searched by name or keyword in the Python Package Index.
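
As a rough illustration of how the first two packages in the list can fit together, the sketch below uses Pandas to clean a small table of raw data and Scikit-learn to preprocess it and fit a model. The data, the column names (height_cm, weight_kg, label) and the choice of a decision tree are assumptions made up for this example; they do not come from the original notes.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; the column names are invented for illustration.
raw = pd.DataFrame({
    "height_cm": [150.0, 160.0, None, 180.0, 190.0, 155.0],
    "weight_kg": [50.0, 55.0, 70.0, 85.0, 95.0, 52.0],
    "label": ["small", "small", "large", "large", "large", "small"],
})

# Pandas does the raw-data work: here, dropping the row with a missing value.
clean = raw.dropna()

# Scikit-learn does the preprocessing and modelling.
features = clean[["height_cm", "weight_kg"]].to_numpy()
scaled = StandardScaler().fit_transform(features)
model = DecisionTreeClassifier(random_state=0).fit(scaled, clean["label"])

# Predictions on the training rows, only to show the API shape.
predictions = model.predict(scaled)
```

The split of responsibilities is the point here: the DataFrame holds and filters the raw records, and only plain numeric arrays are handed to the Scikit-learn estimators.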

Data Preprocessing:


Model Evaluation:


Decision Trees:


Linear Regression:


Regression using Trees:


Association Rules:


Clustering:

Scikit-learn:
- K-means: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
- Hierarchical: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html
- DBSCAN: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
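
A minimal sketch of the three Scikit-learn estimators linked above, run on the same data. The toy points and the parameter values (n_clusters=2, eps=1.0, min_samples=2) are assumptions chosen so that the two groups are easy to separate; real data will usually need different settings.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

# Hypothetical toy data: two well-separated groups of 2-D points.
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],
])

# K-means: the number of clusters is fixed in advance.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)

# Hierarchical (agglomerative): merges points bottom-up until 2 clusters remain.
hier_labels = AgglomerativeClustering(n_clusters=2).fit_predict(points)

# DBSCAN: density-based; the number of clusters is not given, and points
# that fit no cluster would be labelled -1 (noise).
dbscan_labels = DBSCAN(eps=1.0, min_samples=2).fit_predict(points)

for name, labels in [("K-means", kmeans_labels),
                     ("Hierarchical", hier_labels),
                     ("DBSCAN", dbscan_labels)]:
    print(name, labels)
```

All three estimators share the fit_predict method, so swapping one algorithm for another mostly means changing the constructor call. The cluster numbers themselves are arbitrary: what matters is which points share a label.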

Orange:
- K-Means: http://orange.biolab.si/docs/latest/reference/rst/Orange.clustering.kmeans.html
- Hierarchical: http://orange.biolab.si/docs/latest/reference/rst/Orange.clustering.hierarchical.html

MLPy:
- K-means and hierarchical: http://mlpy.sourceforge.net/docs/3.3/cluster.html

- An independent implementation of DBSCAN: http://iamtawit.blogspot.in/2012/12/dbscan.html
