Ruby script to download multiple XML files from URLs























Before we start, I would like to give a small introduction to the modules that I am going to use in my Python script. You can install the modules with pip, and you might need administrative privileges to do so. The first step is to import the necessary modules in the Python script or shell.
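Before importing anything, a quick check along these lines can report which third-party modules still need to be installed. This is only a sketch: the names selenium and wget are assumptions based on the tools this post mentions, and missing_modules is a hypothetical helper, so adjust the tuple to match your own script.

```python
import importlib.util

# Assumed third-party modules (not confirmed by the post); edit as needed.
REQUIRED_MODULES = ("selenium", "wget")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_modules() returns an empty list when everything is available; otherwise the returned names can be passed to pip install.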

For a single video URL that is loaded using an AJAX call, we first need to get the web page using the Selenium webdriver. When you execute this step in the Python shell or via the script (after you import the modules), a Firefox browser window will pop up and the page will be loaded into it.
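Loading the page could look roughly like the sketch below. The function name fetch_page_source and the driver_factory parameter are hypothetical and are there only so a different browser class can be swapped in; by default it assumes Selenium is installed and uses webdriver.Firefox.

```python
def fetch_page_source(url, driver_factory=None):
    """Open `url` in a browser session and return the page's HTML source."""
    if driver_factory is None:
        # Imported lazily so the module loads even without Selenium installed.
        from selenium import webdriver
        driver_factory = webdriver.Firefox  # opens a visible Firefox window

    driver = driver_factory()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails
```

Note that the source returned right after get() may not yet contain content that the page loads later through AJAX.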

If you want to use PhantomJS and stop the browser from popping up, just replace webdriver.Firefox with webdriver.PhantomJS. Here is the tricky part: you need to extract the video URLs from the web page.

You need to manually check for a pattern, or for the video element that is dynamically loaded. In the scenario mentioned previously, let's say the video is loaded by an AJAX call 1 second after you visit the website. You would then need to wait until the video is loaded before getting the element. The script for this step takes two lines. When executed in the Python shell, they tell the browser to wait up to 50 seconds until the element with the specific id appears or is visible on the screen, and then get the HTML source.
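The waiting step can be sketched with Selenium's explicit-wait helpers. The function name wait_for_element_source and its parameters are hypothetical; the 50-second timeout mirrors the value mentioned above, and driver is assumed to be an already running webdriver instance.

```python
def wait_for_element_source(driver, element_id, timeout=50):
    """Wait until the element with `element_id` exists, then return the HTML."""
    # Imported lazily; these are Selenium's standard explicit-wait helpers.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Polls the page; raises TimeoutException if `timeout` seconds pass first.
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.ID, element_id))
    )
    return driver.page_source
```

presence_of_element_located only requires the element to be in the DOM; if it must also be visible, EC.visibility_of_element_located takes the same locator tuple.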

Let's say the element has a certain class instead; then you can just replace By.ID with By.CLASS_NAME. The By class provides the following locator attributes [2]: ID, XPATH, LINK_TEXT, PARTIAL_LINK_TEXT, NAME, TAG_NAME, CLASS_NAME, and CSS_SELECTOR. Once you get the HTML source, you need to parse it and extract the video tag from it.
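A minimal sketch of the parsing step follows. It uses the standard library's html.parser rather than whichever parser the script relies on, and it returns a flat list of URLs instead of a multi-dimensional array. The names VideoSourceParser and extract_video_urls are hypothetical.

```python
from html.parser import HTMLParser


class VideoSourceParser(HTMLParser):
    """Collect the src of every <video> tag and of <source> tags inside one."""

    def __init__(self):
        super().__init__()
        self.sources = []
        self._in_video = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video":
            self._in_video = True
            if attrs.get("src"):
                self.sources.append(attrs["src"])
        elif tag == "source" and self._in_video and attrs.get("src"):
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "video":
            self._in_video = False


def extract_video_urls(html):
    """Return the video URLs found in `html`, in document order."""
    parser = VideoSourceParser()
    parser.feed(html)
    return parser.sources
```

Feeding it the page source obtained after the wait yields the candidate video URLs.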

Parsing the source generates a multi-dimensional array, which is stored in the tag variable. The next step is to get the URL from the video tag and finally download it using wget. Depending on the number of videos loaded in the web page, you can specify which video you want to download by changing the value of n. Putting all the pieces together, along with some additional functionality to log in to a website, gives the complete script.
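Selecting the n-th video and downloading it might be sketched as follows. The function download_nth_video and its downloader parameter are hypothetical; by default it assumes the third-party wget package for Python is installed and calls wget.download, and n is treated as a 0-based index.

```python
def download_nth_video(video_urls, n=0, downloader=None):
    """Download the n-th URL (0-based) and return the downloader's result."""
    if not 0 <= n < len(video_urls):
        raise IndexError(f"n={n} is out of range for {len(video_urls)} video(s)")
    if downloader is None:
        import wget  # third-party package: pip install wget
        downloader = wget.download  # saves the file and returns its filename
    return downloader(video_urls[n])
```

Keeping the downloader as a parameter makes it easy to try the selection logic without touching the network.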

In this tutorial we created a small script that automates the process of downloading a file that is dynamically loaded. The script works for a single URL.

If you want to download multiple files, you need to manually grab the tags and dynamic content information of each website and store them in a JSON or XML file. You then read that file and pass each entry through a for loop.
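The read-and-loop idea can be sketched like this, assuming a JSON file that holds a list of objects such as {"url": "...", "element_id": "...", "n": 0}. That layout, and the names load_sites, process_sites and handle_site, are hypothetical; handle_site stands for whatever per-site download routine you already have.

```python
import json


def load_sites(path):
    """Read a JSON file containing a list of per-site entries."""
    with open(path, encoding="utf-8") as handle:
        sites = json.load(handle)
    if not isinstance(sites, list):
        raise ValueError("expected a JSON list of site entries")
    return sites


def process_sites(sites, handle_site):
    """Call handle_site(entry) for each entry; collect results and failures."""
    results, failures = [], []
    for entry in sites:
        try:
            results.append(handle_site(entry))
        except Exception as error:  # keep going when one site fails
            failures.append((entry.get("url"), str(error)))
    return results, failures
```

Collecting failures instead of stopping at the first error lets one broken site be retried later without repeating the downloads that succeeded.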

I created another small script that does this job. It's not foolproof, but it is a good starting point for getting an idea of how to do it.


