# coding: utf-8
from __future__ import unicode_literals

import base64
import functools
import hashlib
import itertools
import json
import random
import re
import string

from .common import InfoExtractor
from ..compat import compat_struct_pack
from ..utils import (
    determine_ext,
    error_to_compat_str,
    ExtractorError,
    int_or_none,
    mimetype2ext,
    OnDemandPagedList,
    parse_iso8601,
    sanitized_Request,
    str_to_int,
    unescapeHTML,
    urlencode_postdata,
)


class DailymotionBaseInfoExtractor(InfoExtractor):
    @staticmethod
    def _build_request(url):
        """Build a request with the family filter disabled"""
        request = sanitized_Request(url)
        request.add_header('Cookie', 'family_filter=off; ff=off')
        return request

    def _download_webpage_handle_no_ff(self, url, *args, **kwargs):
        request = self._build_request(url)
        return self._download_webpage_handle(request, *args, **kwargs)

    def _download_webpage_no_ff(self, url, *args, **kwargs):
        request = self._build_request(url)
        return self._download_webpage(request, *args, **kwargs)


class DailymotionIE(DailymotionBaseInfoExtractor):
    _VALID_URL = r'(?i)https?://(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|#)/)?video|swf)/(?P<id>[^/?_]+)'
    IE_NAME = 'dailymotion'

    # (key in the old player's info dict, format_id) pairs, lowest quality first
    _FORMATS = [
        ('stream_h264_ld_url', 'ld'),
        ('stream_h264_url', 'standard'),
        ('stream_h264_hq_url', 'hq'),
        ('stream_h264_hd_url', 'hd'),
        ('stream_h264_hd1080_url', 'hd1080'),
    ]

    _TESTS = [{
        'url': 'http://www.dailymotion.com/video/x5kesuj_office-christmas-party-review-jason-bateman-olivia-munn-t-j-miller_news',
        'md5': '074b95bdee76b9e3654137aee9c79dfe',
        'info_dict': {
            'id': 'x5kesuj',
            'ext': 'mp4',
            'title': 'Office Christmas Party Review – Jason Bateman, Olivia Munn, T.J. Miller',
            'description': 'Office Christmas Party Review - Jason Bateman, Olivia Munn, T.J. Miller',
            'thumbnail': r're:^https?:.*\.(?:jpg|png)$',
            'duration': 187,
            'timestamp': 1493651285,
            'upload_date': '20170501',
            'uploader': 'Deadline',
            'uploader_id': 'x1xm8ri',
            'age_limit': 0,
        },
    }, {
        'url': 'https://www.dailymotion.com/video/x2iuewm_steam-machine-models-pricing-listed-on-steam-store-ign-news_videogames',
        'md5': '2137c41a8e78554bb09225b8eb322406',
        'info_dict': {
            'id': 'x2iuewm',
            'ext': 'mp4',
            'title': 'Steam Machine Models, Pricing Listed on Steam Store - IGN News',
            'description': 'Several come bundled with the Steam Controller.',
            'thumbnail': r're:^https?:.*\.(?:jpg|png)$',
            'duration': 74,
            'timestamp': 1425657362,
            'upload_date': '20150306',
            'uploader': 'IGN',
            'uploader_id': 'xijv66',
            'age_limit': 0,
            'view_count': int,
        },
        'skip': 'video gone',
    }, {
        # Vevo video
        'url': 'http://www.dailymotion.com/video/x149uew_katy-perry-roar-official_musi',
        'info_dict': {
            'title': 'Roar (Official)',
            'id': 'USUV71301934',
            'ext': 'mp4',
            'uploader': 'Katy Perry',
            'upload_date': '20130905',
        },
        'params': {
            'skip_download': True,
        },
        'skip': 'VEVO is only available in some countries',
    }, {
        # age-restricted video
        'url': 'http://www.dailymotion.com/video/xyh2zz_leanna-decker-cyber-girl-of-the-year-desires-nude-playboy-plus_redband',
        'md5': '0d667a7b9cebecc3c89ee93099c4159d',
        'info_dict': {
            'id': 'xyh2zz',
            'ext': 'mp4',
            'title': 'Leanna Decker - Cyber Girl Of The Year Desires Nude [Playboy Plus]',
            'uploader': 'HotWaves1012',
            'age_limit': 18,
        },
        'skip': 'video gone',
    }, {
        # geo-restricted, player v5
        'url': 'http://www.dailymotion.com/video/xhza0o',
        'only_matching': True,
    }, {
        # with subtitles
        'url': 'http://www.dailymotion.com/video/x20su5f_the-power-of-nightmares-1-the-rise-of-the-politics-of-fear-bbc-2004_news',
        'only_matching': True,
    }, {
        'url': 'http://www.dailymotion.com/swf/video/x3n92nf',
        'only_matching': True,
    }, {
        'url': 'http://www.dailymotion.com/swf/x3ss1m_funny-magic-trick-barry-and-stuart_fun',
        'only_matching': True,
    }]

    @staticmethod
    def _extract_urls(webpage):
        # Look for embedded Dailymotion player
        matches = re.findall(
            r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
        return list(map(lambda m: unescapeHTML(m[1]), matches))

    def _real_extract(self, url):
        video_id = self._match_id(url)

        webpage = self._download_webpage_no_ff(
            'https://www.dailymotion.com/video/%s' % video_id, video_id)

        age_limit = self._rta_search(webpage)

        description = self._og_search_description(
            webpage, default=None) or self._html_search_meta(
            'description', webpage, 'description')

        view_count_str = self._search_regex(
            (r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
             r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
            webpage, 'view count', default=None)
        if view_count_str:
            view_count_str = re.sub(r'\s', '', view_count_str)
        view_count = str_to_int(view_count_str)
        comment_count = int_or_none(self._search_regex(
            r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
            webpage, 'comment count', default=None))

        player_v5 = self._search_regex(
            [r'buildPlayer\(({.+?})\);\n',  # See https://github.com/rg3/youtube-dl/issues/7826
             r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
             r'buildPlayer\(({.+?})\);',
             r'var\s+config\s*=\s*({.+?});',
             # New layout regex (see https://github.com/rg3/youtube-dl/issues/13580)
             r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
            webpage, 'player v5', default=None)
        if player_v5:
            player = self._parse_json(player_v5, video_id)
            metadata = player['metadata']

            if metadata.get('error', {}).get('type') == 'password_protected':
                password = self._downloader.params.get('videopassword')
                if password:
                    r = int(metadata['id'][1:], 36)
                    us64e = lambda x: base64.urlsafe_b64encode(x).decode().strip('=')
                    t = ''.join(random.choice(string.ascii_letters) for i in range(10))
                    n = us64e(compat_struct_pack('I', r))
                    i = us64e(hashlib.md5(('%s%d%s' % (password, r, t)).encode()).digest())
                    metadata = self._download_json(
                        'http://www.dailymotion.com/player/metadata/video/p' + i + t + n, video_id)

            self._check_error(metadata)

            formats = []
            for quality, media_list in metadata['qualities'].items():
                for media in media_list:
                    media_url = media.get('url')
                    if not media_url:
                        continue
                    type_ = media.get('type')
                    if type_ == 'application/vnd.lumberjack.manifest':
                        continue
                    ext = mimetype2ext(type_) or determine_ext(media_url)
                    if ext == 'm3u8':
                        m3u8_formats = self._extract_m3u8_formats(
                            media_url, video_id, 'mp4', preference=-1,
                            m3u8_id='hls', fatal=False)
                        for f in m3u8_formats:
                            f['url'] = f['url'].split('#')[0]
                            formats.append(f)
                    elif ext == 'f4m':
                        formats.extend(self._extract_f4m_formats(
                            media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
                    else:
                        f = {
                            'url': media_url,
                            'format_id': 'http-%s' % quality,
                            'ext': ext,
                        }
                        m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
                        if m:
                            f.update({
                                'width': int(m.group('width')),
                                'height': int(m.group('height')),
                            })
                        formats.append(f)
            self._sort_formats(formats)

            title = metadata['title']
            duration = int_or_none(metadata.get('duration'))
            timestamp = int_or_none(metadata.get('created_time'))
            thumbnail = metadata.get('poster_url')
            uploader = metadata.get('owner', {}).get('screenname')
            uploader_id = metadata.get('owner', {}).get('id')

            subtitles = {}
            subtitles_data = metadata.get('subtitles', {}).get('data', {})
            if subtitles_data and isinstance(subtitles_data, dict):
                for subtitle_lang, subtitle in subtitles_data.items():
                    subtitles[subtitle_lang] = [{
                        'ext': determine_ext(subtitle_url),
                        'url': subtitle_url,
                    } for subtitle_url in subtitle.get('urls', [])]

            return {
                'id': video_id,
                'title': title,
                'description': description,
                'thumbnail': thumbnail,
                'duration': duration,
                'timestamp': timestamp,
                'uploader': uploader,
                'uploader_id': uploader_id,
                'age_limit': age_limit,
                'view_count': view_count,
                'comment_count': comment_count,
                'formats': formats,
                'subtitles': subtitles,
            }

        # vevo embed
        vevo_id = self._search_regex(
            r'<link rel="video_src" href="[^"]*?vevo\.com[^"]*?video=(?P<id>[\w]*)',
            webpage, 'vevo embed', default=None)
        if vevo_id:
            return self.url_result('vevo:%s' % vevo_id, 'Vevo')

        # fallback old player
        embed_page = self._download_webpage_no_ff(
            'https://www.dailymotion.com/embed/video/%s' % video_id,
            video_id, 'Downloading embed page')

        timestamp = parse_iso8601(self._html_search_meta(
            'video:release_date', webpage, 'upload date'))

        info = self._parse_json(
            self._search_regex(
                r'var info = ({.*?}),$', embed_page,
                'video info', flags=re.MULTILINE),
            video_id)

        self._check_error(info)

        formats = []
        for (key, format_id) in self._FORMATS:
            video_url = info.get(key)
            if video_url is not None:
                m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
                if m_size is not None:
                    width, height = map(int_or_none, (m_size.group(1), m_size.group(2)))
                else:
                    width, height = None, None
                formats.append({
                    'url': video_url,
                    'ext': 'mp4',
                    'format_id': format_id,
                    'width': width,
                    'height': height,
                })
        self._sort_formats(formats)

        # subtitles
        video_subtitles = self.extract_subtitles(video_id, webpage)

        title = self._og_search_title(webpage, default=None)
        if title is None:
            title = self._html_search_regex(
                r'(?s)<span\s+id="video_title"[^>]*>(.*?)</span>', webpage,
                'title')

        return {
            'id': video_id,
            'formats': formats,
            'uploader': info['owner.screenname'],
            'timestamp': timestamp,
            'title': title,
            'description': description,
            'subtitles': video_subtitles,
            'thumbnail': info['thumbnail_url'],
            'age_limit': age_limit,
            'view_count': view_count,
            'duration': info['duration']
        }

    def _check_error(self, info):
        error = info.get('error')
        if error:
            title = error.get('title') or error['message']
            # See https://developer.dailymotion.com/api#access-error
            if error.get('code') == 'DM007':
                self.raise_geo_restricted(msg=title)
            raise ExtractorError(
                '%s said: %s' % (self.IE_NAME, title), expected=True)

    def _get_subtitles(self, video_id, webpage):
        try:
            sub_list = self._download_webpage(
                'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
                video_id, note=False)
        except ExtractorError as err:
            self._downloader.report_warning('unable to download video subtitles: %s' % error_to_compat_str(err))
            return {}
        info = json.loads(sub_list)
        if (info['total'] > 0):
            sub_lang_list = dict((l['language'], [{'url': l['url'], 'ext': 'srt'}]) for l in info['list'])
            return sub_lang_list
        self._downloader.report_warning('video doesn\'t have subtitles')
        return {}


class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
    IE_NAME = 'dailymotion:playlist'
    _VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
    _TESTS = [{
        'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
        'info_dict': {
            'title': 'SPORT',
            'id': 'xv4bw',
        },
        'playlist_mincount': 20,
    }]
    _PAGE_SIZE = 100

    def _fetch_page(self, playlist_id, authorization, page):
        # OnDemandPagedList passes a 0-based page index; the GraphQL API is 1-based
        page += 1
        videos = self._download_json(
            'https://graphql.api.dailymotion.com',
            playlist_id, 'Downloading page %d' % page,
            data=json.dumps({
                'query': '''{
  collection(xid: "%s") {
    videos(first: %d, page: %d) {
      pageInfo {
        hasNextPage
        nextPage
      }
      edges {
        node {
          xid
          url
        }
      }
    }
  }
}''' % (playlist_id, self._PAGE_SIZE, page)
            }).encode(), headers={
                'Authorization': authorization,
                'Origin': 'https://www.dailymotion.com',
            })['data']['collection']['videos']
        for edge in videos['edges']:
            node = edge['node']
            yield self.url_result(
                node['url'], DailymotionIE.ie_key(), node['xid'])

    def _real_extract(self, url):
        playlist_id = self._match_id(url)
        webpage = self._download_webpage(url, playlist_id)
        api = self._parse_json(self._search_regex(
            r'__PLAYER_CONFIG__\s*=\s*({.+?});',
            webpage, 'player config'), playlist_id)['context']['api']
        auth = self._download_json(
            api.get('auth_url', 'https://graphql.api.dailymotion.com/oauth/token'),
            playlist_id, data=urlencode_postdata({
                'client_id': api.get('client_id', 'f1a362d288c1b98099c7'),
                'client_secret': api.get('client_secret', 'eea605b96e01c796ff369935357eca920c5da4c5'),
                'grant_type': 'client_credentials',
            }))
        authorization = '%s %s' % (auth.get('token_type', 'Bearer'), auth['access_token'])
        entries = OnDemandPagedList(functools.partial(
            self._fetch_page, playlist_id, authorization), self._PAGE_SIZE)
        return self.playlist_result(
            entries, playlist_id,
            self._og_search_title(webpage))


class DailymotionUserIE(DailymotionBaseInfoExtractor):
    IE_NAME = 'dailymotion:user'
    _VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
    _MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
    _PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
    _TESTS = [{
        'url': 'https://www.dailymotion.com/user/nqtv',
        'info_dict': {
            'id': 'nqtv',
            'title': 'Rémi Gaillard',
        },
        'playlist_mincount': 100,
    }, {
        'url': 'http://www.dailymotion.com/user/UnderProject',
        'info_dict': {
            'id': 'UnderProject',
            'title': 'UnderProject',
        },
        'playlist_mincount': 1800,
        'expected_warnings': [
            'Stopped at duplicated page',
        ],
        'skip': 'Takes too long time',
    }]

    def _extract_entries(self, id):
        video_ids = set()
        processed_urls = set()
        for pagenum in itertools.count(1):
            page_url = self._PAGE_TEMPLATE % (id, pagenum)
            webpage, urlh = self._download_webpage_handle_no_ff(
                page_url, id, 'Downloading page %s' % pagenum)
            if urlh.geturl() in processed_urls:
                self.report_warning('Stopped at duplicated page %s, which is the same as %s' % (
                    page_url, urlh.geturl()), id)
                break

            processed_urls.add(urlh.geturl())

            for video_id in re.findall(r'data-xid="(.+?)"', webpage):
                if video_id not in video_ids:
                    yield self.url_result(
                        'http://www.dailymotion.com/video/%s' % video_id,
                        DailymotionIE.ie_key(), video_id)
                    video_ids.add(video_id)

            if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
                break

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        user = mobj.group('user')
        webpage = self._download_webpage(
            'https://www.dailymotion.com/user/%s' % user, user)
        full_user = unescapeHTML(self._html_search_regex(
            r'<a class="nav-image" title="([^"]+)" href="/%s">' % re.escape(user),
            webpage, 'user'))

        return {
            '_type': 'playlist',
            'id': user,
            'title': full_user,
            'entries': self._extract_entries(user),
        }
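
The password-protected branch of `DailymotionIE._real_extract` builds the metadata URL suffix from the video xid, the password, and a random salt. The sketch below isolates that derivation so it can be read and tested without the rest of the extractor. It is an illustration under stated assumptions, not part of the module: `build_password_token` and its `salt` parameter are hypothetical names, and `struct.pack` is used where the extractor calls `compat_struct_pack`, which is assumed to behave the same on Python 3.

```python
import base64
import hashlib
import random
import string
import struct


def us64e(data):
    """URL-safe base64 with the '=' padding stripped, as in the extractor."""
    return base64.urlsafe_b64encode(data).decode().strip('=')


def build_password_token(video_xid, password, salt=None):
    """Return the suffix appended to .../player/metadata/video/ (hypothetical helper).

    video_xid: the Dailymotion xid, e.g. 'x5kesuj'; the leading 'x' is dropped
               and the rest is read as a base-36 integer.
    salt:      optional fixed salt for reproducible output; the extractor
               always draws 10 random ASCII letters.
    """
    r = int(video_xid[1:], 36)
    t = salt or ''.join(random.choice(string.ascii_letters) for _ in range(10))
    # Native unsigned int, standing in for compat_struct_pack('I', r)
    n = us64e(struct.pack('I', r))
    # MD5 over password + numeric id + salt
    i = us64e(hashlib.md5(('%s%d%s' % (password, r, t)).encode()).digest())
    return 'p' + i + t + n
```

Passing a fixed `salt` makes the result deterministic, which is only useful for testing; with the default, each call produces a different token for the same password.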