Thursday, 23 May 2019

Data science tools Part 2

Numba – 

This tool is an open-source optimizing compiler that uses the LLVM compiler infrastructure to compile a subset of Python and NumPy code to machine code.
The main advantage of working with Numba in data science applications is its speed on code that uses NumPy arrays, since Numba is a NumPy-aware compiler. Just like Scikit-Learn, Numba is also suitable for machine learning applications, and its speedups can be even greater on hardware built specifically for machine learning or data science workloads.

HPAT – 
High-Performance Analytics Toolkit (HPAT) is a compiler-based framework for big data. 
It automatically scales analytics and machine learning code written in Python to bare-metal cluster or cloud performance, and can optimize specific functions marked with the @jit decorator.
Cython – 
When working with math-heavy code or code that runs in tight loops, Cython is your best choice.
Cython is a source code translator based on Pyrex that allows you to easily write C extensions for Python. What’s more, with the addition of support for integration with IPython/Jupyter notebooks, code compiled with Cython can be used in Jupyter notebooks via inline annotations just like any other Python code.
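
To give a feel for what a tight loop with C type annotations looks like, here is a minimal sketch in Cython's "pure Python" mode. It assumes the `cython` package is installed; the function `sum_of_squares` is a made-up example. Run as ordinary Python it works but gains no speed; the benefit only comes once Cython compiles it, for example in a notebook cell that starts with the `%%cython` magic after running `%load_ext Cython`.

```python
import cython


def sum_of_squares(n: cython.int) -> cython.double:
    # The annotations tell Cython which C types to use when the
    # function is compiled; uncompiled, they are simply ignored.
    total: cython.double = 0.0
    i: cython.int
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))
```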

Data science tools Part 1

SciPy – 
This is a Python-based ecosystem of open-source software for mathematics, science, and engineering. 
SciPy uses various packages like NumPy, IPython or Pandas to provide libraries for common math- and science-oriented programming tasks. This tool is a great option when you want to manipulate numbers on a computer and display or publish the results, and it is free as well.
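
Two of the common math tasks mentioned above, numerical integration and minimization, might look like the following sketch. The integrand and the quadratic function are arbitrary choices for this example, not something taken from the SciPy project itself.

```python
import numpy as np
from scipy import integrate, optimize

# Integrate sin(x) from 0 to pi; quad returns the estimate and an error bound.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Find the minimum of a simple one-dimensional function.
best = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

print(area)
print(best.x)
```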

Dask –

 Dask is a tool providing parallelism for analytics by integrating into other community projects like NumPy, Pandas and Scikit-Learn.

With this tool, you can quickly parallelize existing code by changing only a few lines, since its DataFrame closely mirrors the one in the Pandas library and its Array object works like NumPy’s. It can also parallelize jobs written in pure Python.
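
The two ideas above can be sketched as follows: a chunked Dask array used like a NumPy array, and `dask.delayed` wrapping a pure Python function. The helper `square`, the array size and the chunk size are made up for this example.

```python
import dask.array as da
from dask import delayed

# NumPy-like array split into chunks that Dask can process in parallel.
x = da.arange(1000, chunks=100)
array_total = (x ** 2).sum().compute()  # nothing runs until compute()


@delayed
def square(n):
    # Ordinary Python; Dask only records the call in a task graph.
    return n * n


# Build the graph lazily, then execute it.
py_total = delayed(sum)([square(i) for i in range(10)]).compute()

print(array_total)
print(py_total)
```

In both cases the work is described first and only executed when `compute()` is called, which is what lets Dask schedule the pieces in parallel.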

Monday, 20 May 2019

Python for data science

When you sign up, you get free access to Watson Studio. If you want to learn Python from scratch, this free course is for you.
You can start creating your own data science projects and collaborating with other data scientists using IBM Watson Studio. Start now and take advantage of this platform. This introduction to Python will kickstart your learning of Python for data science, as well as programming in general. This beginner-friendly Python course will take you from zero to programming in Python in a matter of hours.

Upon its completion, you'll be able to write your own Python scripts and perform basic hands-on data analysis using our Jupyter-based lab environment.