# coding: utf-8
from __future__ import unicode_literals

import base64
import functools
import hashlib
import itertools
import json
import random
import re
import string

from .common import InfoExtractor
from ..compat import compat_struct_pack
from ..utils import (
    determine_ext,
    error_to_compat_str,
    ExtractorError,
    int_or_none,
    mimetype2ext,
    OnDemandPagedList,
    parse_iso8601,
    sanitized_Request,
    str_to_int,
    try_get,
    unescapeHTML,
    update_url_query,
    url_or_none,
    urlencode_postdata,
)


class DailymotionBaseInfoExtractor(InfoExtractor):
    @staticmethod
    def _build_request(url):
        """Build a request with the family filter disabled"""
        request = sanitized_Request(url)
        request.add_header('Cookie', 'family_filter=off; ff=off')
        return request

    def _download_webpage_handle_no_ff(self, url, *args, **kwargs):
        request = self._build_request(url)
        return self._download_webpage_handle(request, *args, **kwargs)

    def _download_webpage_no_ff(self, url, *args, **kwargs):
        request = self._build_request(url)
        return self._download_webpage(request, *args, **kwargs)


class DailymotionIE(DailymotionBaseInfoExtractor):
    _VALID_URL = r'''(?ix)
                    https?://
                        (?:
                            (?:(?:www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(?:(?:embed|swf|\#)/)?video|swf)|
                            (?:www\.)?lequipe\.fr/video
                        )
                        /(?P<id>[^/?_]+)
                    '''
    IE_NAME = 'dailymotion'

    _FORMATS = [
        ('stream_h264_ld_url', 'ld'),
        ('stream_h264_url', 'standard'),
        ('stream_h264_hq_url', 'hq'),
        ('stream_h264_hd_url', 'hd'),
        ('stream_h264_hd1080_url', 'hd180'),
    ]

    _TESTS = [{
        'url': 'http://www.dailymotion.com/video/x5kesuj_office-christmas-party-review-jason-bateman-olivia-munn-t-j-miller_news',
        'md5': '074b95bdee76b9e3654137aee9c79dfe',
        'info_dict': {
            'id': 'x5kesuj',
            'ext': 'mp4',
            'title': 'Office Christmas Party Review – Jason Bateman, Olivia Munn, T.J. Miller',
            'description': 'Office Christmas Party Review - Jason Bateman, Olivia Munn, T.J. Miller',
            'thumbnail': r're:^https?:.*\.(?:jpg|png)$',
            'duration': 187,
            'timestamp': 1493651285,
            'upload_date': '20170501',
            'uploader': 'Deadline',
            'uploader_id': 'x1xm8ri',
            'age_limit': 0,
        },
    }, {
        'url': 'https://www.dailymotion.com/video/x2iuewm_steam-machine-models-pricing-listed-on-steam-store-ign-news_videogames',
        'md5': '2137c41a8e78554bb09225b8eb322406',
        'info_dict': {
            'id': 'x2iuewm',
            'ext': 'mp4',
            'title': 'Steam Machine Models, Pricing Listed on Steam Store - IGN News',
            'description': 'Several come bundled with the Steam Controller.',
            'thumbnail': r're:^https?:.*\.(?:jpg|png)$',
            'duration': 74,
            'timestamp': 1425657362,
            'upload_date': '20150306',
            'uploader': 'IGN',
            'uploader_id': 'xijv66',
            'age_limit': 0,
            'view_count': int,
        },
        'skip': 'video gone',
    }, {
        # Vevo video
        'url': 'http://www.dailymotion.com/video/x149uew_katy-perry-roar-official_musi',
        'info_dict': {
            'title': 'Roar (Official)',
            'id': 'USUV71301934',
            'ext': 'mp4',
            'uploader': 'Katy Perry',
            'upload_date': '20130905',
        },
        'params': {
            'skip_download': True,
        },
        'skip': 'VEVO is only available in some countries',
    }, {
        # age-restricted video
        'url': 'http://www.dailymotion.com/video/xyh2zz_leanna-decker-cyber-girl-of-the-year-desires-nude-playboy-plus_redband',
        'md5': '0d667a7b9cebecc3c89ee93099c4159d',
        'info_dict': {
            'id': 'xyh2zz',
            'ext': 'mp4',
            'title': 'Leanna Decker - Cyber Girl Of The Year Desires Nude [Playboy Plus]',
            'uploader': 'HotWaves1012',
            'age_limit': 18,
        },
        'skip': 'video gone',
    }, {
        # geo-restricted, player v5
        'url': 'http://www.dailymotion.com/video/xhza0o',
        'only_matching': True,
    }, {
        # with subtitles
        'url': 'http://www.dailymotion.com/video/x20su5f_the-power-of-nightmares-1-the-rise-of-the-politics-of-fear-bbc-2004_news',
        'only_matching': True,
    }, {
        'url': 'http://www.dailymotion.com/swf/video/x3n92nf',
        'only_matching': True,
    }, {
        'url': 'http://www.dailymotion.com/swf/x3ss1m_funny-magic-trick-barry-and-stuart_fun',
        'only_matching': True,
    }, {
        'url': 'https://www.lequipe.fr/video/x791mem',
        'only_matching': True,
    }, {
        'url': 'https://www.lequipe.fr/video/k7MtHciueyTcrFtFKA2',
        'only_matching': True,
    }]

    @staticmethod
    def _extract_urls(webpage):
        urls = []
        # Look for embedded Dailymotion player
        # https://developer.dailymotion.com/player#player-parameters
        for mobj in re.finditer(
                r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage):
            urls.append(unescapeHTML(mobj.group('url')))
        for mobj in re.finditer(
                r'(?s)DM\.player\([^,]+,\s*{.*?video[\'"]?\s*:\s*["\']?(?P<id>[0-9a-zA-Z]+).+?}\s*\);', webpage):
            urls.append('https://www.dailymotion.com/embed/video/' + mobj.group('id'))
        return urls
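
    # Illustrative embed markup picked up by _extract_urls() above (assumed
    # examples inferred from the regexes, not copied from a specific page):
    #
    #   <iframe src="https://www.dailymotion.com/embed/video/x5kesuj"></iframe>
    #   DM.player(el, {video: 'x5kesuj'});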

    def _real_extract(self, url):
        video_id = self._match_id(url)

        webpage = self._download_webpage_no_ff(
            'https://www.dailymotion.com/video/%s' % video_id, video_id)

        age_limit = self._rta_search(webpage)

        description = self._og_search_description(
            webpage, default=None) or self._html_search_meta(
            'description', webpage, 'description')

        view_count_str = self._search_regex(
            (r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:([\s\d,.]+)"',
             r'video_views_count[^>]+>\s+([\s\d\,.]+)'),
            webpage, 'view count', default=None)
        if view_count_str:
            view_count_str = re.sub(r'\s', '', view_count_str)
        view_count = str_to_int(view_count_str)
        comment_count = int_or_none(self._search_regex(
            r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
            webpage, 'comment count', default=None))

        player_v5 = self._search_regex(
            [r'buildPlayer\(({.+?})\);\n',  # See https://github.com/ytdl-org/youtube-dl/issues/7826
             r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
             r'buildPlayer\(({.+?})\);',
             r'var\s+config\s*=\s*({.+?});',
             # New layout regex (see https://github.com/ytdl-org/youtube-dl/issues/13580)
             r'__PLAYER_CONFIG__\s*=\s*({.+?});'],
            webpage, 'player v5', default=None)
        if player_v5:
            player = self._parse_json(player_v5, video_id, fatal=False) or {}
            metadata = try_get(player, lambda x: x['metadata'], dict)
            if not metadata:
                metadata_url = url_or_none(try_get(
                    player, lambda x: x['context']['metadata_template_url1']))
                if metadata_url:
                    metadata_url = metadata_url.replace(':videoId', video_id)
                else:
                    metadata_url = update_url_query(
                        'https://www.dailymotion.com/player/metadata/video/%s'
                        % video_id, {
                            'embedder': url,
                            'integration': 'inline',
                            'GK_PV5_NEON': '1',
                        })
                metadata = self._download_json(
                    metadata_url, video_id, 'Downloading metadata JSON')

            if try_get(metadata, lambda x: x['error']['type']) == 'password_protected':
                password = self._downloader.params.get('videopassword')
                if password:
                    # Derive a token-protected metadata path from the numeric
                    # video id, a random salt and the supplied password.
                    r = int(metadata['id'][1:], 36)
                    us64e = lambda x: base64.urlsafe_b64encode(x).decode().strip('=')
                    t = ''.join(random.choice(string.ascii_letters) for i in range(10))
                    n = us64e(compat_struct_pack('I', r))
                    i = us64e(hashlib.md5(('%s%d%s' % (password, r, t)).encode()).digest())
                    metadata = self._download_json(
                        'http://www.dailymotion.com/player/metadata/video/p' + i + t + n, video_id)

            self._check_error(metadata)

            formats = []
            for quality, media_list in metadata['qualities'].items():
                for media in media_list:
                    media_url = media.get('url')
                    if not media_url:
                        continue
                    type_ = media.get('type')
                    if type_ == 'application/vnd.lumberjack.manifest':
                        continue
                    ext = mimetype2ext(type_) or determine_ext(media_url)
                    if ext == 'm3u8':
                        m3u8_formats = self._extract_m3u8_formats(
                            media_url, video_id, 'mp4', preference=-1,
                            m3u8_id='hls', fatal=False)
                        for f in m3u8_formats:
                            f['url'] = f['url'].split('#')[0]
                            formats.append(f)
                    elif ext == 'f4m':
                        formats.extend(self._extract_f4m_formats(
                            media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
                    else:
                        f = {
                            'url': media_url,
                            'format_id': 'http-%s' % quality,
                            'ext': ext,
                        }
                        m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
                        if m:
                            f.update({
                                'width': int(m.group('width')),
                                'height': int(m.group('height')),
                            })
                        formats.append(f)
            self._sort_formats(formats)

            title = metadata['title']
            duration = int_or_none(metadata.get('duration'))
            timestamp = int_or_none(metadata.get('created_time'))
            thumbnail = metadata.get('poster_url')
            uploader = metadata.get('owner', {}).get('screenname')
            uploader_id = metadata.get('owner', {}).get('id')

            subtitles = {}
            subtitles_data = metadata.get('subtitles', {}).get('data', {})
            if subtitles_data and isinstance(subtitles_data, dict):
                for subtitle_lang, subtitle in subtitles_data.items():
                    subtitles[subtitle_lang] = [{
                        'ext': determine_ext(subtitle_url),
                        'url': subtitle_url,
                    } for subtitle_url in subtitle.get('urls', [])]

            return {
                'id': video_id,
                'title': title,
                'description': description,
                'thumbnail': thumbnail,
                'duration': duration,
                'timestamp': timestamp,
                'uploader': uploader,
                'uploader_id': uploader_id,
                'age_limit': age_limit,
                'view_count': view_count,
                'comment_count': comment_count,
                'formats': formats,
                'subtitles': subtitles,
            }

        # vevo embed
        vevo_id = self._search_regex(
            r'<link rel="video_src" href="[^"]*?vevo\.com[^"]*?video=(?P<id>[\w]*)',
            webpage, 'vevo embed', default=None)
        if vevo_id:
            return self.url_result('vevo:%s' % vevo_id, 'Vevo')

        # fallback old player
        embed_page = self._download_webpage_no_ff(
            'https://www.dailymotion.com/embed/video/%s' % video_id,
            video_id, 'Downloading embed page')

        timestamp = parse_iso8601(self._html_search_meta(
            'video:release_date', webpage, 'upload date'))

        info = self._parse_json(
            self._search_regex(
                r'var info = ({.*?}),$', embed_page,
                'video info', flags=re.MULTILINE),
            video_id)

        self._check_error(info)

        formats = []
        for (key, format_id) in self._FORMATS:
            video_url = info.get(key)
            if video_url is not None:
                m_size = re.search(r'H264-(\d+)x(\d+)', video_url)
                if m_size is not None:
                    width, height = map(int_or_none, (m_size.group(1), m_size.group(2)))
                else:
                    width, height = None, None
                formats.append({
                    'url': video_url,
                    'ext': 'mp4',
                    'format_id': format_id,
                    'width': width,
                    'height': height,
                })
        self._sort_formats(formats)

        # subtitles
        video_subtitles = self.extract_subtitles(video_id, webpage)

        title = self._og_search_title(webpage, default=None)
        if title is None:
            title = self._html_search_regex(
                r'(?s)<span\s+id="video_title"[^>]*>(.*?)</span>', webpage,
                'title')

        return {
            'id': video_id,
            'formats': formats,
            'uploader': info['owner.screenname'],
            'timestamp': timestamp,
            'title': title,
            'description': description,
            'subtitles': video_subtitles,
            'thumbnail': info['thumbnail_url'],
            'age_limit': age_limit,
            'view_count': view_count,
            'duration': info['duration']
        }

    def _check_error(self, info):
        error = info.get('error')
        if error:
            title = error.get('title') or error['message']
            # See https://developer.dailymotion.com/api#access-error
            if error.get('code') == 'DM007':
                self.raise_geo_restricted(msg=title)
            raise ExtractorError(
                '%s said: %s' % (self.IE_NAME, title), expected=True)

    def _get_subtitles(self, video_id, webpage):
        try:
            sub_list = self._download_webpage(
                'https://api.dailymotion.com/video/%s/subtitles?fields=id,language,url' % video_id,
                video_id, note=False)
        except ExtractorError as err:
            self._downloader.report_warning('unable to download video subtitles: %s' % error_to_compat_str(err))
            return {}
        info = json.loads(sub_list)
        if (info['total'] > 0):
            sub_lang_list = dict((l['language'], [{'url': l['url'], 'ext': 'srt'}]) for l in info['list'])
            return sub_lang_list
        self._downloader.report_warning('video doesn\'t have subtitles')
        return {}
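
    # Expected shape of the subtitles API response consumed by _get_subtitles()
    # (inferred from the request's fields=id,language,url and the keys read
    # above, not from Dailymotion documentation):
    #
    #   {"total": 1, "list": [{"id": "...", "language": "en", "url": "https://..."}]}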


class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
    IE_NAME = 'dailymotion:playlist'
    _VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
    _TESTS = [{
        'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
        'info_dict': {
            'title': 'SPORT',
            'id': 'xv4bw',
        },
        'playlist_mincount': 20,
    }]
    _PAGE_SIZE = 100

    def _fetch_page(self, playlist_id, authorization, page):
        page += 1
        videos = self._download_json(
            'https://graphql.api.dailymotion.com',
            playlist_id, 'Downloading page %d' % page,
            data=json.dumps({
                'query': '''{
  collection(xid: "%s") {
    videos(first: %d, page: %d) {
      pageInfo {
        hasNextPage
        nextPage
      }
      edges {
        node {
          xid
          url
        }
      }
    }
  }
}''' % (playlist_id, self._PAGE_SIZE, page)
            }).encode(), headers={
                'Authorization': authorization,
                'Origin': 'https://www.dailymotion.com',
            })['data']['collection']['videos']
        for edge in videos['edges']:
            node = edge['node']
            yield self.url_result(
                node['url'], DailymotionIE.ie_key(), node['xid'])
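
    # Expected GraphQL response shape, as unpacked above (inferred from the
    # ['data']['collection']['videos'] lookup and the query, not from
    # Dailymotion documentation):
    #
    #   {"data": {"collection": {"videos": {
    #       "pageInfo": {"hasNextPage": true, "nextPage": 2},
    #       "edges": [{"node": {"xid": "...", "url": "https://www.dailymotion.com/video/..."}}]}}}}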

    def _real_extract(self, url):
        playlist_id = self._match_id(url)
        webpage = self._download_webpage(url, playlist_id)
        api = self._parse_json(self._search_regex(
            r'__PLAYER_CONFIG__\s*=\s*({.+?});',
            webpage, 'player config'), playlist_id)['context']['api']
        auth = self._download_json(
            api.get('auth_url', 'https://graphql.api.dailymotion.com/oauth/token'),
            playlist_id, data=urlencode_postdata({
                'client_id': api.get('client_id', 'f1a362d288c1b98099c7'),
                'client_secret': api.get('client_secret', 'eea605b96e01c796ff369935357eca920c5da4c5'),
                'grant_type': 'client_credentials',
            }))
        authorization = '%s %s' % (auth.get('token_type', 'Bearer'), auth['access_token'])
        entries = OnDemandPagedList(functools.partial(
            self._fetch_page, playlist_id, authorization), self._PAGE_SIZE)
        return self.playlist_result(
            entries, playlist_id,
            self._og_search_title(webpage))


class DailymotionUserIE(DailymotionBaseInfoExtractor):
    IE_NAME = 'dailymotion:user'
    _VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
    _MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
    _PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
    _TESTS = [{
        'url': 'https://www.dailymotion.com/user/nqtv',
        'info_dict': {
            'id': 'nqtv',
            'title': 'Rémi Gaillard',
        },
        'playlist_mincount': 100,
    }, {
        'url': 'http://www.dailymotion.com/user/UnderProject',
        'info_dict': {
            'id': 'UnderProject',
            'title': 'UnderProject',
        },
        'playlist_mincount': 1800,
        'expected_warnings': [
            'Stopped at duplicated page',
        ],
        'skip': 'Takes too long',
    }]

    def _extract_entries(self, id):
        video_ids = set()
        processed_urls = set()
        for pagenum in itertools.count(1):
            page_url = self._PAGE_TEMPLATE % (id, pagenum)
            webpage, urlh = self._download_webpage_handle_no_ff(
                page_url, id, 'Downloading page %s' % pagenum)
            if urlh.geturl() in processed_urls:
                self.report_warning('Stopped at duplicated page %s, which is the same as %s' % (
                    page_url, urlh.geturl()), id)
                break
            processed_urls.add(urlh.geturl())
            for video_id in re.findall(r'data-xid="(.+?)"', webpage):
                if video_id not in video_ids:
                    yield self.url_result(
                        'http://www.dailymotion.com/video/%s' % video_id,
                        DailymotionIE.ie_key(), video_id)
                    video_ids.add(video_id)
            if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
                break

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        user = mobj.group('user')
        webpage = self._download_webpage(
            'https://www.dailymotion.com/user/%s' % user, user)

        full_user = unescapeHTML(self._html_search_regex(
            r'<a class="nav-image" title="([^"]+)" href="/%s">' % re.escape(user),
            webpage, 'user'))

        return {
            '_type': 'playlist',
            'id': user,
            'title': full_user,
            'entries': self._extract_entries(user),
        }
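

# Usage sketch (illustrative, not part of the extractor module): run from a
# separate script with youtube-dl installed; the DailymotionIE extractor above
# is selected automatically from the URL.
#
#     import youtube_dl
#
#     with youtube_dl.YoutubeDL({'skip_download': True}) as ydl:
#         info = ydl.extract_info(
#             'https://www.dailymotion.com/video/x5kesuj', download=False)
#         print(info.get('title'), info.get('duration'))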