Thursday, January 11, 2018

Installing Python: Anaconda and Miniconda


It is possible to program in Python on your own computer, but first you need to install it. The purpose of this page is to help you install Python and various Python packages on your own computer. Even though Python can be installed from python.org, we highly recommend using Anaconda, an open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing that aims to simplify package management and deployment. In short, it makes life much easier when installing new tools for Python.

System requirements

  • 32- or 64-bit computer.
  • For Miniconda—400 MB disk space.
  • For Anaconda—Minimum 3 GB disk space to download and install.
  • Windows, macOS or Linux.
  • Python 3.4, 3.5 or 3.6.
  • pycosat.
  • PyYAML.
  • Requests.
NOTE: You do not need administrative or root permissions to install Anaconda if you select a user-writable install location.

Install Python on Windows

The following steps have been tested on Windows 7 and 10 with Anaconda3 version 5.0.1.

  1. Download and save the installer file.
  2. Double-click the installer and install Anaconda into the directory you want (installing for all users needs admin rights). Install it for all users and use the default settings.
  3. Click Next and accept the license agreement.
  4. Select the installation location.
  5. Click Next and then Finish.
  6. After finishing, you should see Anaconda in the Start menu.



Note: You need to set the installation location as C:\Python.
Test that Anaconda's package manager, called conda, works by opening a command prompt as an admin user and running the command conda --version. If the command returns a conda version number (e.g. conda 4.3.23), everything is working correctly.

Install Python on macOS

There is now a convenient graphical installer that can be used to install Anaconda for Mac. For the IntroQG people, we recommend you install Anaconda 4.4.0 by visiting the Anaconda downloads page and clicking on the button to install the latest Python 3 version of Anaconda.
Note
Anaconda for Mac (version 5.0.1) can also be downloaded from the Anaconda software repository.

Installing on macOS

  1. Download the installer:
  2. Install:
    • Miniconda—In your Terminal window, run:
      bash Miniconda3-latest-MacOSX-x86_64.sh
      
    • Anaconda—Double-click the .pkg file.
  3. Follow the prompts on the installer screens.
    If you are unsure about any setting, accept the defaults. You can change them later.
  4. To make the changes take effect, close and then re-open your Terminal window.

Installing in silent mode

NOTE: The following instructions are for Miniconda. For Anaconda, substitute Anaconda for Miniconda in all of the commands.
To run the silent installation of Miniconda for macOS or Linux, specify the -b and -p arguments of the bash installer. The following arguments are supported:
  • -b—Batch mode with no PATH modifications to ~/.bashrc. Assumes that you agree to the license agreement. Does not edit the .bashrc or .bash_profile files.
  • -p—Installation prefix/path.
  • -f—Force installation even if prefix -p already exists.
EXAMPLE:
wget http://repo.continuum.io/miniconda/Miniconda3-3.7.0-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
NOTE: This sets the PATH only for the current session, not permanently. Trying to use conda when conda is not in your PATH causes errors such as “command not found.”
In each new bash session, before using conda, set the PATH and run the activation scripts of your conda packages by running:
source $HOME/miniconda3/bin/activate
NOTE: Replace $HOME/miniconda3/bin/activate with the path to the activate script in your conda installation.
To set the PATH permanently, you can add a line to your .bashrc file. However, this makes it possible to use conda without running the activation scripts of your conda packages, which may produce errors.
EXAMPLE:
export PATH="$HOME/miniconda3/bin:$PATH"

Updating Anaconda or Miniconda

  1. Open a Terminal window.
  2. Navigate to the anaconda directory.
  3. Run conda update conda.

Uninstalling Anaconda or Miniconda

  1. Open a Terminal window.
  2. Remove the entire Miniconda install directory with:
    rm -rf ~/miniconda
    
  3. OPTIONAL: Edit ~/.bash_profile to remove the Miniconda directory from your PATH environment variable.
  4. OPTIONAL: Remove the following hidden file and folders that may have been created in the home directory:
    • .condarc file
    • .conda directory
    • .continuum directory
    by running:
    rm -rf ~/.condarc ~/.conda ~/.continuum

Installing on Linux

  1. Download the installer:
  2. In your Terminal window, run:
    • Miniconda:
      bash Miniconda3-latest-Linux-x86_64.sh
      
    • Anaconda:
      bash Anaconda-latest-Linux-x86_64.sh
      
  3. Follow the prompts on the installer screens.
    If you are unsure about any setting, accept the defaults. You can change them later.
  4. To make the changes take effect, close and then re-open your Terminal window.

Using with fish shell

To use conda with the fish shell, add the following line to your ~/.config/fish/config.fish file:
source (conda info --root)/etc/fish/conf.d/conda.fish

Installing in silent mode

See the instructions for installing in silent mode on macOS.

Updating Anaconda or Miniconda

  1. Open a Terminal window.
  2. Run conda update conda.

Uninstalling Anaconda or Miniconda

  1. Open a Terminal window.
  2. Remove the entire miniconda install directory with:
    rm -rf ~/miniconda
    
  3. OPTIONAL: Edit ~/.bash_profile to remove the Miniconda directory from your PATH environment variable.
  4. OPTIONAL: Remove the following hidden file and folders that may have been created in the home directory:
    • .condarc file
    • .conda directory
    • .continuum directory
    By running:
    rm -rf ~/.condarc ~/.conda ~/.continuum
    

How to find out which conda command to use when installing a package

The easiest way

The first thing to try when installing a new module is to run the following command in a command prompt (as admin); here we try to install a hypothetical module called X:
conda install X
In most cases this approach works, but sometimes you get errors like the following (an example when installing a module called shapely):
C:\WINDOWS\system32>conda install shapely
Using Anaconda API: https://api.anaconda.org
Fetching package metadata .........
Solving package specifications: .
Error: Package missing in current win-64 channels:
  - shapely

You can search for packages on anaconda.org with

    anaconda search -t conda shapely
In this case, conda was not able to find the shapely module in the default channel it uses for downloading packages.

An alternative way to install packages if the typical one doesn't work

If the conda install command was not able to install the package you were interested in, there is an alternative way: take advantage of the different conda distribution channels maintained by the developers themselves. An easy way to find the right command to install a package from these alternative channels is to Google it.
Let's find our way to install the Shapely module by typing the following query into Google:
[Screenshot: Google search results for the conda install command for Shapely]
Here, we can see several pages showing how to install Shapely using the conda package manager.
Which one of them is the correct one to use?
Check the operating system banners: if a page shows the logo of your computer's operating system, that is the one to use. In our case, the first page Google gives does not work on Windows, but the second one does, as it has a Windows logo on it:
[Screenshot: anaconda.org page for Shapely showing a Windows banner]
From here we can get the correct installation command for conda and it works!
[Screenshot: installing Shapely with the channel-specific conda command]
You can follow these steps in the same way for any other Python module you are interested in installing.
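These channel-specific commands typically follow the pattern conda install -c <channel> <package>. For instance, Shapely is commonly available from the community-maintained conda-forge channel (check anaconda.org to confirm the channel that actually hosts the package for your platform):
conda install -c conda-forge shapely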

Top Python Libraries

Note that most of these are third-party packages, not part of Python's standard library.


1. Pipenv


Pipenv, originally started as a weekend project by the awesome Kenneth Reitz, aims to bring ideas from other package managers (such as npm or yarn) into the Python world. Forget about installing virtualenv and virtualenvwrapper, managing requirements.txt files and ensuring reproducibility with regards to versions of dependencies of the dependencies (read here for more info about this). With Pipenv, you specify all your dependencies in a Pipfile, which is normally built by using commands for adding, removing, or updating dependencies. The tool can generate a Pipfile.lock file, enabling your builds to be deterministic and helping you avoid those difficult-to-catch bugs caused by some obscure dependency you didn't even think you needed.
Of course, Pipenv comes with many other perks and has great documentation, so make sure to check it out and start using it for all your Python projects, as we do at Tryolabs :)

2. PyTorch


If there is a library whose popularity has boomed this year, especially in the Deep Learning (DL) community, it’s PyTorch, the DL framework introduced by Facebook this year.
PyTorch builds on and improves the (once?) popular Torch framework, especially since it’s Python based — in contrast with Lua. Given how people have been switching to Python for doing data science in the last couple of years, this is an important step forward to make DL more accessible.
Most notably, PyTorch has become one of the go-to frameworks for many researchers because of its implementation of the novel Dynamic Computational Graph paradigm. When writing code using other frameworks like TensorFlow, CNTK or MXNet, one must first define something called a computational graph. This graph specifies all the operations that will be run by our code, and it is later compiled and potentially optimized by the framework so that it can run even faster, in parallel on a GPU. This paradigm is called the Static Computational Graph, and it is great since you can leverage all sorts of optimizations, and the graph, once built, can potentially run on different devices (since execution is separate from building). However, in many tasks such as Natural Language Processing, the amount of "work" to do is often variable: you can resize images to a fixed resolution before feeding them to an algorithm, but you cannot do the same with sentences, which come in variable lengths. This is where PyTorch and dynamic graphs shine: by letting you use standard Python control instructions in your code, the graph is defined as it executes, giving you a lot of freedom, which is essential for several tasks.
Of course, PyTorch also computes gradients for you (as you would expect from any modern DL framework), is very fast, and extensible, so why not give it a try?
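To make the dynamic graph idea concrete, here is a minimal sketch (assuming a recent PyTorch release where tensors accept requires_grad directly):

import torch

x = torch.ones(3, requires_grad=True)
y = x
while y.sum() < 30:       # plain Python control flow shapes the graph
    y = y * 2
loss = y.sum()
loss.backward()           # gradients flow through the graph that actually ran
print(x.grad)

The number of multiplications here depends on the data itself, which is exactly what a static graph cannot express as directly.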

3. Caffe2



It might sound crazy, but Facebook also released another great DL framework this year.
The original Caffe framework has been widely used for years, and known for unparalleled performance and battle-tested codebase. However, recent trends in DL made the framework stagnate in some directions. Caffe2 is the attempt to bring Caffe to the modern world.
It supports distributed training, deployment (even in mobile platforms), the newest CPUs and CUDA-capable hardware. While PyTorch may be better for research, Caffe2 is suitable for large scale deployments as seen on Facebook.
Also, check out the recent ONNX effort. You can build and train your models in PyTorch, while using Caffe2 for deployment! Isn’t that great?

4. Pendulum


Previously, Arrow, a library that aims to make your life easier while working with datetimes in Python, made the list. This year, it is the turn of Pendulum.
One of Pendulum's strengths is that it is a drop-in replacement for Python's standard datetime class, so you can easily integrate it with your existing code and leverage its functionality only when you actually need it. The authors have put special care into ensuring timezones are handled correctly, making every instance timezone-aware and UTC by default. You also get an extended timedelta to make datetime arithmetic easier.
Unlike other existing libraries, it strives to have an API with predictable behavior, so you know what to expect. If you are doing any non-trivial work involving datetimes, this will make you happier! Check out the docs for more.
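A small sketch of what this looks like in practice (API names per the Pendulum docs):

import pendulum

now = pendulum.now('Europe/Paris')          # timezone-aware out of the box
print(now.in_timezone('UTC'))
print(now.add(days=2).diff_for_humans())    # e.g. "in 2 days"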

5. Dash



You are doing data science, for which you use the excellent available tools in the Python ecosystem like Pandas and scikit-learn. You use Jupyter Notebooks for your workflow, which is great for you and your colleagues. But how do you share the work with people who do not know how to use those tools? How do you build an interface so people can easily play around with the data, visualizing it in the process? It used to be the case that you needed a dedicated frontend team, knowledgeable in Javascript, for building these GUIs. Not anymore.
Dash, announced this year, is an open source library for building web applications, especially those that make good use of data visualization, in pure Python. It is built on top of Flask, Plotly.js and React, and provides abstractions that free you from having to learn those frameworks and let you become productive quickly. The apps are rendered in the browser and are responsive, so they are usable on mobile devices.
If you would like to know more about what is possible with Dash, the Gallery is a great place for some eye-candy.
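To give an idea of the shape of a Dash app, here is a minimal sketch (package names dash, dash_html_components and dash_core_components as distributed at the time of writing):

import dash
import dash_core_components as dcc
import dash_html_components as html

app = dash.Dash()
app.layout = html.Div([
    html.H1('Hello Dash'),
    dcc.Graph(figure={'data': [{'x': [1, 2, 3], 'y': [4, 1, 2], 'type': 'bar'}]}),
])

if __name__ == '__main__':
    app.run_server(debug=True)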

6. PyFlux


There are many libraries in Python for doing data science and ML, but few that specifically target data points that are metrics evolving over time (such as stock prices or measurements obtained from instruments).
PyFlux is an open source library in Python built specifically for working with time series. The study of time series is a subfield of statistics and econometrics, and the goals can be describing how time series behave (in terms of latent components or features of interest) and predicting how they will behave in the future.
PyFlux allows for a probabilistic approach to time series modeling, and has implementations for several modern time series models like GARCH. Neat stuff.
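As a rough sketch of the workflow (model and method names per the PyFlux docs; the series here is random toy data):

import numpy as np
import pandas as pd
import pyflux as pf

data = pd.DataFrame(np.random.randn(200).cumsum(), columns=['series'])
model = pf.ARIMA(data=data, ar=2, ma=2, target='series')
results = model.fit('MLE')       # maximum likelihood estimation
results.summary()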

7. Fire


It is often the case that you need to make a Command Line Interface (CLI) for your project. Beyond the traditional argparse, Python has some great tools like click or docopt. Fire, announced by Google this year, has a different take on solving this same problem.
Fire is an open source library that can automatically generate a CLI for any Python project. The key here is automatically: you almost don’t need to write any code or docstrings to build your CLI! To do the job, you only need to call a Fire method and pass it whatever you want turned into a CLI: a function, an object, a class, a dictionary, or even pass no arguments at all (which will turn your entire code into a CLI).
Make sure to read the guide so you understand how it works with examples. Keep it under your radar, because this library can definitely save you a lot of time in the future.
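A minimal sketch of what Fire does (the greet function is just an illustration):

import fire

def greet(name='World', shout=False):
    """Fire turns the arguments into CLI flags automatically."""
    message = 'Hello {}!'.format(name)
    return message.upper() if shout else message

if __name__ == '__main__':
    fire.Fire(greet)    # run as: python greet.py --name=Ada --shout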

8. imbalanced-learn



In an ideal world, we would have perfectly balanced datasets, and we would all train models and be happy. Unfortunately, the real world is not like that, and certain tasks involve very imbalanced data. For example, when predicting fraud in credit card transactions, you would expect that the vast majority of the transactions (+99.9%?) are actually legit. Training ML algorithms naively will lead to dismal performance, so extra care is needed when working with these types of datasets.
Fortunately, this is a studied research problem and a variety of techniques exist. Imbalanced-learn is a Python package which offers implementations of some of those techniques, to make your life much easier. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects. Useful!
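For instance, here is a sketch of oversampling a skewed dataset with SMOTE (fit_resample is the method name in recent releases; older versions call it fit_sample):

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
print(Counter(y))                                # heavily skewed classes
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))                            # balanced after resampling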

9. FlashText


When you need to search for some text and replace it with something else, as is standard in most data-cleaning work, you usually turn to regular expressions. They will get the job done, but sometimes the number of terms you need to search for is in the thousands, and then regular expressions can become painfully slow to use.
FlashText is a better alternative just for this purpose. In the author’s initial benchmark, it improved the runtime of the entire operation by a huge margin: from 5 days to 15 minutes. The beauty of FlashText is that the runtime is the same no matter how many search terms you have, in contrast with regexp in which the runtime will increase almost linearly with the number of terms.
FlashText is a testimony to the importance of the design of algorithms and data structures, showing that, even for simple problems, better algorithms can easily outdo even the fastest CPUs running naive implementations.
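A quick sketch of the API (the keyword mappings here are just examples):

from flashtext import KeywordProcessor

kp = KeywordProcessor()
kp.add_keyword('Big Apple', 'New York')   # map 'Big Apple' to 'New York'
kp.add_keyword('NYC', 'New York')
print(kp.replace_keywords('I love the Big Apple and NYC.'))
# -> 'I love the New York and New York.'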

10. Luminoth

Disclaimer: this library was built by Tryolabs’ R&D area.
Images are everywhere nowadays, and understanding their content can be critical for several applications. Thankfully, image processing techniques have advanced a lot, fueled by the advancements in DL.
Luminoth is an open source Python toolkit for computer vision, built using TensorFlow and Sonnet. Currently, it out-of-the-box supports object detection in the form of a model called Faster R-CNN.
But Luminoth is not only an implementation of a particular model. It is built to be modular and extensible, so customizing the existing pieces or extending it with new models to tackle different problems should be straightforward, with as much code reuse as possible. It provides tools for easily doing the engineering work needed when building DL models: converting your data (in this case, images) to an adequate format for feeding your data pipeline (TensorFlow's tfrecords), doing data augmentation, running the training on one or multiple GPUs (distributed training will be a must when working with large datasets), running evaluation metrics, easily visualizing things in TensorBoard, and deploying your trained model with a simple API or browser interface so people can play around with it.
Moreover, Luminoth has straightforward integration with Google Cloud's ML Engine, so even if you don't own a powerful GPU, you can train in the cloud with a single command, just as you do on your own local machine.
If you are interested in learning more about what’s behind the scenes, you can read the announcement blog post and watch the video of our talk at ODSC.

Bonus: watch out for these

PyVips

You may have never heard of the libvips library. In that case, you must know that it's an image processing library, like Pillow or ImageMagick, and supports a wide range of formats. However, compared to other libraries, libvips is faster and uses less memory. For example, some benchmarks show it to be about 3x faster than ImageMagick while using roughly 15x less memory. You can read more about why libvips is nice here.
PyVips is a recently released Python binding for libvips, which is compatible with Python 2.7-3.6 (and even PyPy), easy to install with pip and drop-in compatible with the old binding, so if you are using that, you don’t have to modify your code.
If you are doing any sort of image processing in your app, this is definitely something to keep an eye on.
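A short sketch of the binding in action (file names are placeholders):

import pyvips

image = pyvips.Image.new_from_file('input.jpg', access='sequential')
image = image.resize(0.5)             # shrink to half size
image.write_to_file('output.jpg')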

Requestium

Disclaimer: this library was built by Tryolabs.
Sometimes, you need to automate some actions on the web. Whether scraping sites, doing application testing, or filling out web forms to perform actions on sites that do not expose an API, automation is always necessary. Python has the excellent Requests library, which allows you to perform some of this work, but unfortunately (or not?) many sites make heavy client-side use of Javascript. This means that the HTML code that Requests fetches, in which you could be trying to find a form to fill for your automation task, may not even contain the form itself! Instead, it will be something like an empty div that gets generated in the browser by a modern frontend library such as React or Vue.
One way to solve this is to reverse-engineer the requests that the Javascript code makes, which will mean many hours of debugging and fiddling around with (probably) uglified JS code. No thanks. Another option is to turn to libraries like Selenium, which allow you to programmatically interact with a web browser and run the Javascript code. With this, the problems are no more, but it is still slower than using plain Requests, which adds very little overhead.
Wouldn’t it be cool if there was a library that let you start out with Requests and seamlessly switch to Selenium, only adding the overhead of a web browser when actually needing it? Meet Requestium, which acts as a drop-in replacement for Requests and does just that. It also integrates Parsel, so writing all those selectors for finding the elements in the page is much cleaner than it would otherwise be, and has helpers around common operations like clicking elements and making sure stuff is actually rendered in the DOM. Another time saver for your web automation projects!
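A sketch following the project README (the chromedriver path and URLs are placeholders):

from requestium import Session

s = Session(webdriver_path='./chromedriver', browser='chrome', default_timeout=15)
s.get('https://example.com')                # plain Requests, no browser started
s.transfer_session_cookies_to_driver()      # switch to Selenium only when needed
s.driver.get('https://example.com/js-page')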

skorch

You like the awesome API of scikit-learn, but need to do work using PyTorch? Worry not, skorch is a wrapper which will give PyTorch an interface like sklearn. If you are familiar with those libraries, the syntax should be straightforward and easy to understand. With skorch, you will get some code abstracted away, so you can focus more on the things that really matter, like doing your data science.
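A minimal sketch of the wrapper (the module follows the skorch tutorial convention of ending in a softmax; the data is random):

import numpy as np
import torch.nn as nn
from skorch import NeuralNetClassifier

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Linear(20, 10), nn.ReLU(),
            nn.Linear(10, 2), nn.Softmax(dim=-1),
        )

    def forward(self, X):
        return self.seq(X)

X = np.random.randn(100, 20).astype(np.float32)
y = np.random.randint(0, 2, size=100).astype(np.int64)

net = NeuralNetClassifier(Net, max_epochs=5, lr=0.1)
net.fit(X, y)                 # the familiar sklearn-style API
print(net.predict(X[:5]))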

11. Zappa

Since the release of AWS Lambda (and others that have followed), all the rage has been about serverless architectures. These allow microservices to be deployed in the cloud, in a fully managed environment where one doesn’t have to care about managing any server, but is assigned stateless, ephemeral computing containers that are fully managed by a provider. With this paradigm, events (such as a traffic spike) can trigger the execution of more of these containers and therefore give the possibility to handle “infinite” horizontal scaling.
Zappa is the serverless framework for Python, although (at least for the moment) it only has support for AWS Lambda and AWS API Gateway. It makes building apps architected this way very simple, freeing you from most of the tedious setup you would otherwise have to do through the AWS Console or API, and it has all sorts of commands to ease deployment and managing different environments.

12. Sanic + uvloop

Who said Python couldn’t be fast? Apart from competing for the best name of a software library ever, Sanic also competes for the fastest Python web framework ever, and appears to be the winner by a clear margin. It is a Flask-like Python 3.5+ web server that is designed for speed. Another library, uvloop, is an ultra fast drop-in replacement for asyncio’s event loop that uses libuv under the hood. Together, these two things make a great combination!
According to the Sanic author's benchmark, uvloop could power this beast to handle more than 33k requests/s, which is just insane (and faster than node.js). Your code can benefit from the new async/await syntax so it will look neat too; besides, we love the Flask-style API. Make sure to give Sanic a try, and if you are using asyncio, you can surely benefit from uvloop with very little change to your code!
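A minimal Sanic app sketch (uvloop is picked up automatically on supported platforms):

from sanic import Sanic
from sanic.response import json

app = Sanic(__name__)

@app.route('/')
async def index(request):
    return json({'hello': 'world'})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)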

13. asyncpg

In line with recent developments in the asyncio framework, the folks from MagicStack bring us this efficient asynchronous (currently CPython 3.5 only) database interface library designed specifically for PostgreSQL. It has zero dependencies, meaning there is no need to have libpq installed. In contrast with psycopg2 (the most popular PostgreSQL adapter for Python), which exchanges data with the database server in text format, asyncpg implements the PostgreSQL binary I/O protocol, which not only allows support for generic types but also comes with numerous performance benefits.
The benchmarks are clear: asyncpg is, on average, at least 3x faster than psycopg2 (or aiopg), and faster than the node.js and Go implementations.
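A short sketch of the interface (the connection parameters and the users table are placeholders):

import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect(user='user', password='secret',
                                 database='mydb', host='127.0.0.1')
    rows = await conn.fetch('SELECT * FROM users WHERE id = $1', 42)
    await conn.close()
    return rows

print(asyncio.get_event_loop().run_until_complete(main()))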

14. boto3

If you have your infrastructure on AWS or otherwise make use of their services (such as S3), you should be very happy that boto, the Python interface for the AWS API, got a complete rewrite from the ground up. The great thing is that you don't need to migrate your app all at once: you can use boto3 and boto (2) at the same time, for example using boto3 only for new parts of your application.
The new implementation is much more consistent between different services, and since it uses a data-driven approach to generate classes at runtime from JSON description files, it will always get fast updates. No more lagging behind new Amazon API features, move to boto3!
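For a taste of the new interface, here is a small sketch listing objects in an S3 bucket ('my-bucket' is a placeholder name):

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
for obj in bucket.objects.limit(10):
    print(obj.key, obj.size)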

15. TensorFlow

Do we even need an introduction here? Since it was released by Google in November 2015, this library has gained huge momentum and has become the #1 trendiest GitHub Python repository. In case you have been living under a rock for the past year, TensorFlow is a library for numerical computation using data flow graphs, which can run on GPU or CPU.
We have quickly witnessed it become a trend in the Machine Learning community (especially Deep Learning, see our post on 10 main takeaways from MLconf), not only growing its uses in research but also being widely used in production applications. If you are doing Deep Learning and want to use it through a higher level interface, you can try using it as a backend for Keras (which made it to last year's post) or the newer TensorFlow-Slim.
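A tiny sketch of the data flow graph idea, using the graph-and-session API current when this post was written:

import tensorflow as tf

a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
c = a * b                       # build the graph first...

with tf.Session() as sess:      # ...then run it
    print(sess.run(c, feed_dict={a: 3.0, b: 4.0}))   # 12.0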

16. gym + universe

If you are into AI, you surely have heard about the OpenAI non-profit artificial intelligence research company (backed by Elon Musk et al.). The researchers have open sourced some Python code this year! Gym is a toolkit for developing and comparing reinforcement learning algorithms. It consists of an open-source library with a collection of test problems (environments) that can be used to test reinforcement learning algorithms, and a site and API that allow comparing the performance of trained algorithms (agents). Since it doesn't care about the implementation of the agent, you can build agents with the computation library of your choice: bare numpy, TensorFlow, Theano, etc.
We also have the recently released universe, a software platform for researching general intelligence across games, websites and other applications. universe fits perfectly with gym, since it allows any real-world application to be turned into a gym environment. Researchers hope that this limitless possibility will accelerate research into smarter agents that can solve general purpose tasks.
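The classic gym loop looks roughly like this (a random agent on the CartPole-v0 environment from the gym docs):

import gym

env = gym.make('CartPole-v0')
observation = env.reset()
for _ in range(100):
    action = env.action_space.sample()      # a random agent
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()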

17. Bokeh

You may be familiar with some of the libraries Python has to offer for data visualization, the most popular of which are matplotlib and seaborn. Bokeh, however, is created for interactive visualization and targets modern web browsers for the presentation. This means Bokeh can create a plot which lets you explore the data from a web browser. The great thing is that it integrates tightly with Jupyter Notebooks, so you can use it with your probable go-to tool for your research. There is also an optional server component, bokeh-server, with many powerful capabilities like server-side downsampling of large datasets (no more slow network transfers or overloaded browsers!), streaming data, transformations, etc.
Make sure to check the gallery for examples of what you can create. They look awesome!
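A minimal sketch of a standalone Bokeh plot (it writes an interactive HTML file you can open in a browser):

from bokeh.plotting import figure, output_file, show

p = figure(title='simple line', x_axis_label='x', y_axis_label='y')
p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], line_width=2)
output_file('lines.html')
show(p)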

18. Blaze

Sometimes, you want to run analytics over a dataset too big to fit in your computer's RAM. If you cannot rely on numpy or Pandas, you usually turn to other tools like PostgreSQL, MongoDB, Hadoop, Spark, or many others. Depending on the use case, one or more of these tools can make sense, each with its own strengths and weaknesses. The problem? There is a big overhead here, because you need to learn how each of these systems works and how to insert data in the proper form.
Blaze provides a uniform interface that abstracts you away from several database technologies. At its core, the library provides a way to express computations. Blaze itself doesn't actually do any computation: it just knows how to instruct a specific backend that will be in charge of performing it. There is much more to the Blaze ecosystem, including libraries that have come out of its development. For example, Dask implements a drop-in replacement for the NumPy array that can handle content larger than memory and leverage multiple cores, and also comes with dynamic task scheduling. Interesting stuff.
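To illustrate the Dask part, here is a sketch of a NumPy-like array that is computed in chunks rather than loaded whole:

import dask.array as da

x = da.random.random((100000, 1000), chunks=(10000, 1000))
result = x.mean(axis=0)          # lazily builds a task graph
print(result.compute()[:5])      # computation happens only at .compute()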

19. arrow

There is a famous saying that there are only two hard problems in Computer Science: cache invalidation and naming things. I think the saying is clearly missing one thing: managing datetimes. If you have ever tried to do that in Python, you will know that the standard library has a gazillion modules and types: datetime, date, calendar, tzinfo, timedelta, relativedelta, pytz, etc. Worse, it is timezone-naive by default.
Arrow is “datetime for humans”, offering a sensible approach to creating, manipulating, formatting and converting dates, times, and timestamps. It is a replacement for the datetime type that supports Python 2 or 3, and provides a much nicer interface as well as filling the gaps with new functionality (such as humanize). Even if you don’t really need arrow, using it can greatly reduce the boilerplate in your code.
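A quick sketch of the API (names per the arrow docs):

import arrow

utc = arrow.utcnow()
local = utc.to('US/Pacific')
print(local.format('YYYY-MM-DD HH:mm:ss'))
print(local.humanize())          # e.g. 'just now'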

20. hug

Expose your internal API externally, drastically simplifying Python API development. Hug is a next-generation Python 3 (only) library that provides you with the cleanest way to create HTTP REST APIs in Python. It is not a web framework per se (although that is a function it performs exceptionally well); it focuses on exposing idiomatically correct and standard internal Python APIs externally. The idea is simple: you define logic and structure once, and you can expose your API through multiple means. Currently, it supports exposing a REST API or a command line interface.
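A minimal endpoint sketch, close to the example in hug's README (run it with: hug -f app.py):

import hug

@hug.get('/happy_birthday')
def happy_birthday(name, age: hug.types.number = 1):
    """Says happy birthday to a user."""
    return 'Happy {} Birthday {}!'.format(age, name)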

21. jupyter

How hard would it be for a painter to paint without immediately seeing the results of what he is doing? Jupyter Notebooks make it easy to interact with code, plots and results, and are becoming one of the preferred tools of data scientists. Notebooks are documents which combine live code and documentation. For this reason, it is our go-to tool for creating fast prototypes or tutorials.
Although we use Jupyter only for writing Python code, nowadays it has added support for other programming languages such as Julia or Haskell.

22. retrying

The retrying library helps you avoid reinventing the wheel: it implements retrying behavior for you. It provides a generic decorator which makes giving retrying abilities to any method effortless, and it also has a bunch of parameters you can set in order to get the desired retrying behavior, such as the maximum number of attempts, delay, backoff sleeping, error conditions, etc. Small and simple.
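A small sketch of the decorator (parameters per the retrying README; the failure here is simulated):

import random
from retrying import retry

@retry(stop_max_attempt_number=5, wait_fixed=1000)   # up to 5 tries, 1 s apart
def flaky():
    if random.random() < 0.8:
        raise IOError('transient failure')
    return 'ok'

print(flaky())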

23. aiohttp

As of 2015, the most important libraries had all been ported to Python 3, so we started embracing it. We really liked asyncio for writing concurrent code using coroutines, so we needed an HTTP client (such as requests) and server using the same concurrency paradigm. The aiohttp library is exactly that, providing a clean and easy-to-use HTTP client/server for asyncio.
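A minimal client sketch:

import asyncio
import aiohttp

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

html = asyncio.get_event_loop().run_until_complete(fetch('https://example.com'))
print(html[:200])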

24. plumbum

We have tried several subprocess wrappers for calling other scripts or executables from Python programs, but the model of plumbum blows them all away. With an easy-to-use syntax, you can execute local or remote commands and get the output or error codes in a cross-platform way, and if that were not enough, you get composability (a la shell pipes) and an interface for building command line applications. Give it a try!
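A sketch of the shell-pipe style composition (assumes the Unix ls, grep and wc commands are available):

from plumbum import local

ls, grep, wc = local['ls'], local['grep'], local['wc']
chain = ls['-a'] | grep['.py'] | wc['-l']
print(chain())      # number of Python files in the current directory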

25. phonenumbers

Working with and validating phone numbers can be a real pain, as there are international prefixes and area codes to take into account, and possibly other things depending on the country. The phonenumbers Python library is a port of Google's libphonenumber, which thankfully simplifies this. It can be used to parse, format and validate phone numbers with very little code involved. Most importantly, phonenumbers can tell whether a phone number is unique or not (following the E.164 format). It also works on both Python 2 and Python 3.
We have used this library extensively in many projects, mostly through its adaptation django-phonenumber-field, as a way to solve this tedious problem that pretty much always pops up.
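A quick sketch of parsing and validating (the number below is one of libphonenumber's documentation examples):

import phonenumbers

n = phonenumbers.parse('+442083661177', None)
print(phonenumbers.is_valid_number(n))
print(phonenumbers.format_number(n, phonenumbers.PhoneNumberFormat.E164))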

26. networkx

Graphs and networks are often used for many different tasks, such as organizing data, showing its flow, or representing relations between entities. NetworkX allows the creation and manipulation of graphs and networks. The algorithms used in NetworkX make it highly scalable, making it ideal when working with large graphs is required. Moreover, there are tons of options for rendering graphs, making it an awesome visualization tool too.
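A small sketch of building and querying a graph:

import networkx as nx

G = nx.Graph()
G.add_edges_from([('a', 'b'), ('b', 'c'), ('a', 'c'), ('c', 'd')])
print(nx.shortest_path(G, 'a', 'd'))     # ['a', 'c', 'd']
print(nx.degree_centrality(G))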

27. influxdb

If you are thinking about storing loads of data on a time-series basis, then you have to consider using InfluxDB. InfluxDB is a time-series database we have been using to store measurements over time. Through a RESTful API, it's super easy to use and very efficient, which is a must when talking about a lot of data. Additionally, retrieving and grouping data is painless due to its built-in clustering functionalities. The official client abstracts away most of the work of invoking the API, although we would really like to see it improved by implementing a Pythonic way to create queries instead of writing raw JSON.
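A short sketch of the official client (host, database and measurement names are placeholders, and the database is assumed to exist):

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='metrics')
client.write_points([{
    'measurement': 'cpu_load',
    'fields': {'value': 0.64},
}])
print(client.query('SELECT * FROM cpu_load LIMIT 5'))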

28. elasticsearch-dsl

If you have ever used Elasticsearch you surely have suffered going over those long queries in JSON format, wasting time trying to find out where the parsing error is. The Elasticsearch DSL client is built upon the official Elasticsearch client and frees you from having to worry about JSONs again: you simply write everything using Python defined classes or queryset-like expressions. It also provides wrappers for working with documents as Python objects, mappings, etc.
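A sketch of a queryset-like query (index and field names are placeholders):

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch()
s = Search(using=client, index='blog') \
    .filter('term', category='python') \
    .query('match', title='conda')
for hit in s.execute():
    print(hit.title)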

29. keras

Deep learning is the new trend, and here is where keras shines. It can run on top of Theano and allows fast experimentation with a variety of neural network architectures. Highly modular and minimalistic, it runs seamlessly on CPU and GPU. Having something like keras was key for some of the R&D projects we tackled in 2015.
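A tiny sketch of the API on random data (Keras-2-era layer and compile calls):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(32, activation='relu', input_dim=100),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

X = np.random.random((1000, 100))
y = np.random.randint(2, size=(1000, 1))
model.fit(X, y, epochs=2, batch_size=32)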

30. gensim

If you are into NLP (Natural Language Processing) and haven't heard about Gensim, you are living under a rock. It provides fast and scalable (memory-independent) implementations of some of the most used algorithms, such as tf-idf, word2vec, doc2vec, LSA, etc., as well as an easy-to-use and well-documented interface.
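A minimal Word2Vec sketch on a toy corpus (the size parameter is the pre-4.0 name; later releases call it vector_size):

from gensim.models import Word2Vec

sentences = [['python', 'is', 'fun'],
             ['gensim', 'makes', 'nlp', 'easy'],
             ['python', 'nlp', 'with', 'gensim']]
model = Word2Vec(sentences, size=50, min_count=1)
print(model.wv.most_similar('python'))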

Python Bokeh

Creating Interactive Web Visualizations

Step 1: Install Bokeh
Note: Python must be installed and PATH must be set.

Open cmd and type the following command:
 pip install bokeh



Step 2: We need Jupyter Notebook to write Bokeh code

Installing Jupyter Notebook

  • While Jupyter runs code in many programming languages, Python (Python 3.6 or greater, or Python 2.7) is a requirement for installing the Jupyter Notebook.
  • We recommend using the Anaconda distribution to install Python and Jupyter.
If you are experienced, you can instead install Jupyter using pip. Open cmd and type the following command:
pip install jupyter

Now type jupyter notebook in cmd.

The browser will open automatically.

  Then click on New -> Python 3

Now we can type our code in the Python notebook and run it.
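Once the notebook is open, a minimal sketch like the following renders an interactive Bokeh plot inline (assuming Bokeh was installed in Step 1):

from bokeh.io import output_notebook, show
from bokeh.plotting import figure

output_notebook()                  # direct Bokeh output to the notebook
p = figure(title='first plot')
p.circle([1, 2, 3, 4], [4, 6, 2, 5], size=12)
show(p)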
Step 3: Install Anaconda
Anaconda conveniently installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.
Use the following installation steps:
  1. Download Anaconda. We recommend downloading Anaconda's latest Python 3 version (currently Python 3.6).


The open source Anaconda Distribution is the easiest way to do Python data science and machine learning. It includes hundreds of popular data science packages and the conda package and virtual environment manager for Windows, Linux, and macOS. Conda makes it quick and easy to install, run, and upgrade complex data science and machine learning environments like scikit-learn, TensorFlow, and SciPy. Anaconda Distribution is the foundation of millions of data science projects as well as Amazon Web Services' Machine Learning AMIs and Anaconda for Microsoft on Azure and Windows.







Reproducible Data Science and Machine Learning

The Python and R conda packages in the Anaconda Repository are curated and compiled in a secure environment, so you get optimized binaries that "just work" on your system. Combined with conda's virtual environments and deep dependency management, you can easily reproduce exactly the same data science results across Windows, Linux, and macOS systems.