
Requests 104 Error on tweets search in api 2.0 #140

Open
@markowanga

Description


Describe the bug
When I search tweets with API 2.0, the requests library fails with a connection error. For the first few days I did not get this error; since it appeared I have restarted my PC and rebuilt the Docker containers, without effect. Here are the logs:

swps_worker | 2021-11-05 06:29:57,447 [searchtweets.result_stream    ] INFO     using bearer token for authentication
swps_worker | 2021-11-05 06:29:57,447 [searchtweets.result_stream    ] DEBUG    sending request
swps_worker | 2021-11-05 06:29:57,450 [urllib3.connectionpool        ] DEBUG    Starting new HTTPS connection (1): api.twitter.com:443
swps_worker | 2021-11-05 06:29:58,540 [urllib3.connectionpool        ] DEBUG    https://api.twitter.com:443 "GET /2/tweets/search/all?query=%28%22%23absolwenci%22+OR+%22%23covid%22+OR+%22%23COVID-19%22+OR+%22%23Covid19%22+OR+%22%23doros%C5%82o%C5%9B%C4%87%22+OR+%22%23generacjaZ%22+OR+%22%23genX%22+OR+%22%23genZ%22+OR+%22%23koronawirus%22+OR+%22%23koronawiruspolska%22+OR+%22%23liceum%22+OR+%22%23lockdown%22+OR+%22%23matura%22+OR+%22%23matura2020%22+OR+%22%23matura2021%22+OR+%22%23millenialsi%22+OR+%22%23m%C5%82odzi%22+OR+%22%23pandemia%22+OR+%22%23pierwszami%C5%82o%C5%9B%C4%87%22+OR+%22%23praca2021%22+OR+%22%23pracazdalna%22+OR+%22%23rekrutacja2020%22+OR+%22%23rekrutacja2021%22+OR+%22%23rodzina%22+OR+%22%23siedznadupie%22+OR+%22%23solidarno%C5%9B%C4%87%22+OR+%22%23strajkkobiet%22+OR+%22%23studia2020%22+OR+%22%23studia2021%22+OR+%22%23studiazdalne%22+OR+%22%23szko%C5%82a%22+OR+%22%23zdalne%22+OR+%22%23zdalnenauczanie%22+OR+%22%23zostanwdomu%22%29+lang%3Apl&start_time=2020-03-03T00%3A00%3A00Z&end_time=2020-03-04T00%3A00%3A00Z&max_results=100&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Creply_settings%2Csource%2Ctext%2Cwithheld&user.fields=created_at%2Cdescription%2Centities%2Cid%2Clocation%2Cname%2Cpinned_tweet_id%2Cprofile_image_url%2Cprotected%2Cpublic_metrics%2Curl%2Cusername%2Cverified%2Cwithheld&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics%2Calt_text&place.fields=contained_within%2Ccountry%2Ccountry_code%2Cfull_name%2Cgeo%2Cid%2Cname%2Cplace_type&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id HTTP/1.1" 200 61537
swps_worker | 2021-11-05 06:29:58,672 [searchtweets.result_stream    ] INFO     paging; total requests read so far: 1
swps_worker | 2021-11-05 06:30:00,674 [searchtweets.result_stream    ] DEBUG    sending request
swps_worker | Traceback (most recent call last):
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
swps_worker |     httplib_response = self._make_request(
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
swps_worker |     six.raise_from(e, None)
swps_worker |   File "<string>", line 3, in raise_from
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
swps_worker |     httplib_response = conn.getresponse()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
swps_worker |     response.begin()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
swps_worker |     version, status, reason = self._read_status()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 277, in _read_status
swps_worker |     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
swps_worker |   File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
swps_worker |     return self._sock.recv_into(b)
swps_worker |   File "/usr/local/lib/python3.8/ssl.py", line 1241, in recv_into
swps_worker |     return self.read(nbytes, buffer)
swps_worker |   File "/usr/local/lib/python3.8/ssl.py", line 1099, in read
swps_worker |     return self._sslobj.read(len, buffer)
swps_worker | ConnectionResetError: [Errno 104] Connection reset by peer
swps_worker | 
swps_worker | During handling of the above exception, another exception occurred:
swps_worker | 
swps_worker | Traceback (most recent call last):
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
swps_worker |     resp = conn.urlopen(
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
swps_worker |     retries = retries.increment(
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
swps_worker |     raise six.reraise(type(error), error, _stacktrace)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
swps_worker |     raise value.with_traceback(tb)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
swps_worker |     httplib_response = self._make_request(
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
swps_worker |     six.raise_from(e, None)
swps_worker |   File "<string>", line 3, in raise_from
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
swps_worker |     httplib_response = conn.getresponse()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
swps_worker |     response.begin()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
swps_worker |     version, status, reason = self._read_status()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 277, in _read_status
swps_worker |     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
swps_worker |   File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
swps_worker |     return self._sock.recv_into(b)
swps_worker |   File "/usr/local/lib/python3.8/ssl.py", line 1241, in recv_into
swps_worker |     return self.read(nbytes, buffer)
swps_worker |   File "/usr/local/lib/python3.8/ssl.py", line 1099, in read
swps_worker |     return self._sslobj.read(len, buffer)
swps_worker | urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
swps_worker | 
swps_worker | During handling of the above exception, another exception occurred:
swps_worker | 
swps_worker | Traceback (most recent call last):
swps_worker |   File "app/main.py", line 34, in <module>
swps_worker |     single_work()
swps_worker |   File "app/main.py", line 25, in single_work
swps_worker |     worker_service.run()
swps_worker |   File "/app/app/application/worker_service.py", line 53, in run
swps_worker |     tweets = self._scrap_service.scrap(
swps_worker |   File "/app/app/infrastructure/official_twitter_scrap_service.py", line 54, in scrap
swps_worker |     tweets = collect_results(
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 467, in collect_results
swps_worker |     return list(rs.stream())
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 375, in stream
swps_worker |     self.execute_request()
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 415, in execute_request
swps_worker |     resp = request(session=self.session,
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 77, in retried_func
swps_worker |     raise exc
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 73, in retried_func
swps_worker |     resp = func(*args, **kwargs)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 140, in request
swps_worker |     result = session.get(url, **kwargs)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/requests/sessions.py", line 555, in get
swps_worker |     return self.request('GET', url, **kwargs)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
swps_worker |     resp = self.send(prep, **send_kwargs)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
swps_worker |     r = adapter.send(request, **kwargs)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
swps_worker |     raise ConnectionError(err, request=request)
swps_worker | requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
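searchtweets already retries internally (the `retried_func` frames in the traceback) but evidently gives up and re-raises. Until the root cause is clear, the reset can sometimes be papered over with transport-level retries on the `requests` Session that `ResultStream` passes around (`request(session=self.session, ...)` in the traceback). This is only a sketch of the general technique, not searchtweets API; the `make_resilient_session` name and the retry numbers are my own choices:

```python
# Workaround sketch (not part of searchtweets): build a requests Session
# whose HTTPS adapter transparently retries dropped connections
# (errno 104) and 429/5xx responses, with exponential backoff.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_resilient_session(total: int = 5, backoff: float = 2.0) -> requests.Session:
    retry = Retry(
        total=total,
        connect=total,
        read=total,
        backoff_factor=backoff,              # exponential sleep between tries
        status_forcelist=(429, 500, 502, 503, 504),
        allowed_methods=frozenset({"GET"}),  # urllib3 >= 1.26; older: method_whitelist
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session
```

Applying it would mean replacing the session the `ResultStream` object uses before streaming; connection-level retries are invisible to searchtweets, so its own retry counter is not consumed.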

To Reproduce
I'm running the library wrapped in this service:

# Imports shown for context; ScrapService, RawJsonTwitterResponse and
# logger come from the app's own modules.
import json
from typing import Any, Dict, List

from arrow import Arrow
from searchtweets import collect_results, gen_request_parameters, load_credentials


class OfficialTwitterScrapService(ScrapService):
    _config_file: str
    _premium_search_args: Dict[str, Any]

    def __init__(self, config_file: str):
        self._config_file = config_file
        self._premium_search_args = load_credentials(self._config_file,
                                                     yaml_key="search_tweets_premium",
                                                     env_overwrite=False)

    def scrap(
            self,
            query: str,
            since: Arrow,
            until: Arrow
    ) -> List[RawJsonTwitterResponse]:
        logger.info(
            f'run scrap query :: {query}'
            f' | since :: {since.isoformat()}'
            f' | until :: {until.isoformat()}'
        )
        query = gen_request_parameters(
            query=query,
            granularity=None,
            results_per_call=100,
            start_time=self._get_string_time_from_arrow(since),
            end_time=self._get_string_time_from_arrow(until),
            expansions='attachments.poll_ids,attachments.media_keys,author_id,'
                       'entities.mentions.username,geo.place_id,in_reply_to_user_id,'
                       'referenced_tweets.id,referenced_tweets.id.author_id',
            media_fields='duration_ms,height,media_key,preview_image_url,type,url,width,'
                         'public_metrics,alt_text',
            place_fields='contained_within,country,country_code,full_name,geo,id,name,place_type',
            tweet_fields='attachments,author_id,context_annotations,conversation_id,created_at,'
                         'entities,geo,id,in_reply_to_user_id,lang,public_metrics,'
                         'possibly_sensitive,referenced_tweets,reply_settings,source,'
                         'text,withheld',
            user_fields='created_at,description,entities,id,location,name,pinned_tweet_id,'
                        'profile_image_url,protected,public_metrics,url,username,verified,withheld'
        )
        tweets = collect_results(
            query,
            max_tweets=10_000_000,
            result_stream_args=self._premium_search_args
        )
        return [RawJsonTwitterResponse(json.dumps(it)) for it in tweets]

    @staticmethod
    def _get_string_time_from_arrow(time: Arrow) -> str:
        # Drop the seconds and UTC offset, leaving YYYY-MM-DDTHH:MM
        return time.isoformat()[:-9]
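As a stopgap at the caller level, the `collect_results` call can be wrapped in a retry loop. `retry_on_reset` below is my own helper, not part of searchtweets; it retries only on requests' `ConnectionError`, which is what the traceback above ends with:

```python
# Stopgap sketch (helper name is mine, not searchtweets API): retry a
# callable with exponential backoff when requests reports a reset peer.
import time
from typing import Callable, TypeVar

import requests

T = TypeVar("T")


def retry_on_reset(func: Callable[[], T], attempts: int = 3,
                   base_delay: float = 2.0) -> T:
    for attempt in range(attempts):
        try:
            return func()
        except requests.exceptions.ConnectionError:
            if attempt == attempts - 1:
                raise                                # out of attempts
            time.sleep(base_delay * 2 ** attempt)    # 2s, 4s, 8s, ...
    raise RuntimeError("unreachable")
```

In `scrap` this would wrap the existing call, e.g. `retry_on_reset(lambda: collect_results(query, max_tweets=10_000_000, result_stream_args=self._premium_search_args))`. Note that a retry restarts the whole paging loop from the first page, so duplicate tweets are possible across attempts.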

Expected behavior
The search should page through all results and complete without connection errors.

Environment

  • Ubuntu 20.10 host -> python:3.8 Docker image
  • searchtweets-v2 = "^1.1.1"
