          WIP: Reimplementing search_dates
          #945
        
Open
gavishpoddar wants to merge 44 commits into scrapinghub:master from gavishpoddar:search_dates
Commits (44 commits; this view shows changes from 34 commits)
02220da Implimenting new search_dates (gavishpoddar)
f933d3a Fixing DATE_ORDER, implimenting deep_search, tests (gavishpoddar)
77727b5 Unproving _joint_parse with data_carry accurate_return_text,  deep_se… (gavishpoddar)
e7f38e8 implementing  _final_text_clean() (gavishpoddar)
962066c Simplifying text_clean and modifying tests (gavishpoddar)
624ac8e Implementing relative date (gavishpoddar)
42ca6f6 Fixing tests (gavishpoddar)
51749a2 secondary_split_implimentation (gavishpoddar)
f5e4635 positional args to keyword argument (gavishpoddar)
121b15f Micro fixes (gavishpoddar)
2cd93f0 Removing codes now part of #953 (gavishpoddar)
006d2a5 adding check_settings (gavishpoddar)
10404c9 implimenting double_punctuation_split (gavishpoddar)
22596e0 Updating docs and removing test (TMP) (gavishpoddar)
b799dfb cleaning code, adding tests, improving coverage (gavishpoddar)
42c984a Merge branch 'scrapinghub:master' into search_dates (gavishpoddar)
8fc5e0d Improving  codecov (gavishpoddar)
74b6ec4 temporary commit to get diff (gavishpoddar)
56e0505 Merge branch 'search_dates' of https://github.com/gavishpoddar/datepa… (gavishpoddar)
5a1b1c5 temporary file change for review (gavishpoddar)
aa2aa8f reverting the previous commit (gavishpoddar)
41eff6a improvements (gavishpoddar)
f65531b formatting code (gavishpoddar)
982fc08 formatting code (gavishpoddar)
3621b2d improvements in text filter (gavishpoddar)
8a9496b Merge branch 'scrapinghub:master' into search_dates (gavishpoddar)
45996b4 removing previous search_dates (gavishpoddar)
2ac88c6 Merge branch 'search_dates' of https://github.com/gavishpoddar/datepa… (gavishpoddar)
5dabc62 adding test (gavishpoddar)
ab1778d fixing doc string (gavishpoddar)
14adf89 fixing doc string (gavishpoddar)
d57223a Merge branch 'search_dates' of https://github.com/gavishpoddar/datepa… (gavishpoddar)
88afa30 updating xfail (gavishpoddar)
9209f3d updating tests (gavishpoddar)
85254e0 Apply suggestions from code review (gavishpoddar)
e4604e6 Merge branch 'master' into search_dates (gavishpoddar)
4f119dd Updates (gavishpoddar)
e6da4be Fixing upstraem merges (gavishpoddar)
f6116bf DateSearch -> DateSearchWithDetection (gavishpoddar)
0525cdc Merge branch 'scrapinghub:master' into search_dates (gavishpoddar)
96b91c0 updating test with xfail (gavishpoddar)
b9d12f3 Merge branch 'search_dates' of https://github.com/gavishpoddar/datepa… (gavishpoddar)
99e66c6 minor fixes (gavishpoddar)
2935aae Merge branch 'master' into search_dates (serhii73)
              | Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -1,57 +1,119 @@ | ||
| from dateparser.search.search import DateSearchWithDetection | ||
| from dateparser.search.search import DateSearch | ||
| from dateparser.conf import apply_settings | ||
|  | ||
|  | ||
| _search_with_detection = DateSearchWithDetection() | ||
| _search_dates = DateSearch() | ||
|  | ||
|  | ||
| @apply_settings | ||
| def search_dates(text, languages=None, settings=None, add_detected_language=False): | ||
| """Find all substrings of the given string which represent date and/or time and parse them. | ||
| :param text: | ||
| A string in a natural language which may contain date and/or time expressions. | ||
| :type text: str | ||
| :param languages: | ||
| A list of two letters language codes.e.g. ['en', 'es']. If languages are given, it will | ||
| not attempt to detect the language. | ||
| :type languages: list | ||
| :param settings: | ||
| Configure customized behavior using settings defined in :mod:`dateparser.conf.Settings`. | ||
| :type settings: dict | ||
| :param add_detected_language: | ||
| Indicates if we want the detected language returned in the tuple. | ||
| :type add_detected_language: bool | ||
| :return: Returns list of tuples containing: | ||
| substrings representing date and/or time, corresponding :mod:`datetime.datetime` | ||
| object and detected language if *add_detected_language* is True. | ||
| Returns None if no dates that can be parsed are found. | ||
| :rtype: list | ||
| :raises: ValueError - Unknown Language | ||
| >>> from dateparser.search import search_dates | ||
| >>> search_dates('The first artificial Earth satellite was launched on 4 October 1957.') | ||
| [('on 4 October 1957', datetime.datetime(1957, 10, 4, 0, 0))] | ||
| >>> search_dates('The first artificial Earth satellite was launched on 4 October 1957.', | ||
| >>> add_detected_language=True) | ||
| [('on 4 October 1957', datetime.datetime(1957, 10, 4, 0, 0), 'en')] | ||
| >>> search_dates("The client arrived to the office for the first time in March 3rd, 2004 " | ||
| >>> "and got serviced, after a couple of months, on May 6th 2004, the customer " | ||
| >>> "returned indicating a defect on the part") | ||
| [('in March 3rd, 2004 and', datetime.datetime(2004, 3, 3, 0, 0)), | ||
| ('on May 6th 2004', datetime.datetime(2004, 5, 6, 0, 0))] | ||
| """ | ||
| result = _search_with_detection.search_dates( | ||
| :param text: | ||
| A string in a natural language which may contain the date and/or time expressions. | ||
| :type text: str | ||
| :param languages: | ||
| A list of two letters language codes.e.g. ['en', 'es']. If languages are given, it will | ||
| not attempt to detect the language. | ||
| :type languages: list | ||
| :param settings: | ||
| Configure customized behavior using settings defined in :mod:`dateparser.conf.Settings`. | ||
| :type settings: dict | ||
| :param add_detected_language: | ||
| Indicates if we want the detected language returned in the tuple. | ||
| :type add_detected_language: bool | ||
| :return: Returns tuples containing: | ||
| substrings representing date and/or time, corresponding :mod:`datetime.datetime` | ||
| object and detected language if *add_detected_language* is True. | ||
| Returns None if no dates that can be parsed are found. | ||
| :rtype: list | ||
| :raises: ValueError - Unknown Language | ||
| >>> from dateparser.search import search_dates | ||
| >>> search_dates('The first artificial Earth satellite was launched on 4 October 1957.') | ||
| [('on 4 October 1957', datetime.datetime(1957, 10, 4, 0, 0))] | ||
| >>> search_dates('The first artificial Earth satellite was launched on 4 October 1957.', | ||
| >>> add_detected_language=True) | ||
| [('on 4 October 1957', datetime.datetime(1957, 10, 4, 0, 0), 'en')] | ||
| >>> search_dates("The client arrived to the office for the first time in March 3rd, 2004 " | ||
| >>> "and got serviced, after a couple of months, on May 6th 2004, the customer " | ||
| >>> "returned indicating a defect on the part") | ||
| [('in March 3rd, 2004 and', datetime.datetime(2004, 3, 3, 0, 0)), | ||
| ('on May 6th 2004', datetime.datetime(2004, 5, 6, 0, 0))] | ||
| """ | ||
|  | ||
| result = _search_dates.search_dates( | ||
| text=text, languages=languages, settings=settings | ||
| ) | ||
| dates = result.get('Dates') | ||
|  | ||
| dates = result.get("Dates") | ||
| if dates: | ||
| if add_detected_language: | ||
| language = result.get('Language') | ||
| dates = [date + (language, ) for date in dates] | ||
| language = result.get("Language") | ||
| dates = [date + (language,) for date in dates] | ||
| return dates | ||
|  | ||
|  | ||
| @apply_settings | ||
| def search_first_date(text, languages=None, settings=None, add_detected_language=False): | ||
| """Find first substring of the given string which represent date and/or time and parse it. | ||
| :param text: | ||
| A string in a natural language which may contain the date and/or time expression. | ||
| :type text: str | ||
| :param languages: | ||
| A list of two letters language codes.e.g. ['en', 'es']. If languages are given, it will | ||
| not attempt to detect the language. | ||
| :type languages: list | ||
| :param settings: | ||
| Configure customized behavior using settings defined in :mod:`dateparser.conf.Settings`. | ||
| :type settings: dict | ||
| :param add_detected_language: | ||
| Indicates if we want the detected language returned in the tuple. | ||
| :type add_detected_language: bool | ||
| :return: Returns tuples containing: | ||
| substrings representing date and/or time, corresponding :mod:`datetime.datetime` | ||
| object and detected language if *add_detected_language* is True. | ||
| Returns None if no dates that can be parsed are found. | ||
| :rtype: tuple | ||
| :raises: ValueError - Unknown Language | ||
| >>> from dateparser.search import search_first_date | ||
| >>> search_first_date('The first artificial Earth satellite was launched on 4 October 1957.') | ||
| ('on 4 October 1957', datetime.datetime(1957, 10, 4, 0, 0)) | ||
| >>> from dateparser.search import search_first_date | ||
| >>> search_first_date('Caesar Augustus, also known as Octavian') | ||
| None | ||
| >>> search_first_date('The first artificial Earth satellite was launched on 4 October 1957.', | ||
| >>> add_detected_language=True) | ||
| ('on 4 October 1957', datetime.datetime(1957, 10, 4, 0, 0), 'en') | ||
| >>> search_first_date("The client arrived to the office for the first time in March 3rd, 2004 " | ||
| >>> "and got serviced, after a couple of months, on May 6th 2004, the customer " | ||
| >>> "returned indicating a defect on the part") | ||
| ('in March 3rd, 2004 and', datetime.datetime(2004, 3, 3, 0, 0)) | ||
| """ | ||
|  | ||
| result = _search_dates.search_dates( | ||
| text=text, languages=languages, limit_date_search_results=1, settings=settings | ||
| ) | ||
| dates = result.get("Dates") | ||
| if dates: | ||
| if add_detected_language: | ||
| language = result.get("Language") | ||
| dates = [date + (language,) for date in dates] | ||
| return dates[0] | ||
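
Both wrappers in the diff above end the same way: read `Dates` from the result dictionary, optionally append the detected `Language` to each tuple, then return either the whole list or only its first element. The following is a minimal sketch of that pattern, not the PR's code. It assumes that the `return` statements sit inside `if dates:` (the page lost the indentation), so an empty result yields `None` as the docstrings describe. The names `attach_language`, `first_match`, and `sample` are hypothetical stand-ins, and the dictionary shape is inferred from the `result.get("Dates")` and `result.get("Language")` calls.

```python
from datetime import datetime


def attach_language(result, add_detected_language=False):
    """Return the list of date tuples, or None when nothing was found.

    Hypothetical helper mirroring the tail of search_dates in the diff.
    """
    dates = result.get("Dates")
    if dates:
        if add_detected_language:
            language = result.get("Language")
            # Each match is a tuple, so the language is appended as a 1-tuple.
            dates = [date + (language,) for date in dates]
        return dates
    return None


def first_match(result, add_detected_language=False):
    """Return only the first tuple, as search_first_date does with dates[0]."""
    dates = attach_language(result, add_detected_language)
    return dates[0] if dates else None


# Assumed shape of the dictionary returned by the search backend.
sample = {
    "Dates": [("on 4 October 1957", datetime(1957, 10, 4, 0, 0))],
    "Language": "en",
}

print(attach_language(sample, add_detected_language=True))
# [('on 4 October 1957', datetime.datetime(1957, 10, 4, 0, 0), 'en')]
print(first_match(sample))
# ('on 4 October 1957', datetime.datetime(1957, 10, 4, 0, 0))
print(first_match({"Dates": [], "Language": None}))
# None
```

Callers of either wrapper therefore need to handle `None` before indexing or iterating over the result.
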
  
    
              | Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| from collections.abc import Set | ||
|  | ||
| from dateparser.search.text_detection import FullTextLanguageDetector | ||
| from dateparser.languages.loader import LocaleDataLoader | ||
|  | ||
|  | ||
| class SearchLanguages: | ||
| def __init__(self): | ||
| self.loader = LocaleDataLoader() | ||
| self.available_language_map = self.loader.get_locale_map() | ||
| self.language = None | ||
|  | ||
| def get_current_language(self, language_shortname): | ||
| if self.language is None or self.language.shortname != language_shortname: | ||
| self.language = self.loader.get_locale(language_shortname) | ||
|  | ||
| def translate_objects(self, language_shortname, text, settings): | ||
| self.get_current_language(language_shortname) | ||
| result = self.language.translate_search(text, settings=settings) | ||
| return result | ||
|  | ||
| def detect_language(self, text, languages): | ||
| if isinstance(languages, (list, tuple, Set)): | ||
|  | ||
| if all([language in self.available_language_map for language in languages]): | ||
| languages = [ | ||
| self.available_language_map[language] for language in languages | ||
| ] | ||
| else: | ||
| unsupported_languages = set(languages) - set( | ||
| self.available_language_map.keys() | ||
| ) | ||
| raise ValueError( | ||
| "Unknown language(s): %s" | ||
| % ", ".join(map(repr, unsupported_languages)) | ||
| ) | ||
| elif languages is not None: | ||
| raise TypeError( | ||
| "languages argument must be a list (%r given)" % type(languages) | ||
| ) | ||
|  | ||
| if languages: | ||
| self.language_detector = FullTextLanguageDetector(languages=languages) | ||
| else: | ||
| self.language_detector = FullTextLanguageDetector( | ||
| list(self.available_language_map.values()) | ||
| ) | ||
|  | ||
| return self.language_detector._best_language(text) | 
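
The `languages` handling in `detect_language` above can be read as three cases: a list, tuple, or set of known codes is mapped to locale objects; an unknown code raises `ValueError`; any other non-`None` type raises `TypeError`. When `languages` is `None` or empty, every available language is used for detection. The sketch below isolates only that validation step under those assumptions. `AVAILABLE` is a hypothetical stand-in for the map returned by `LocaleDataLoader().get_locale_map()`, and `resolve_languages` is not a function in the PR.

```python
from collections.abc import Set

# Hypothetical stand-in for the locale map; real values would be locale objects.
AVAILABLE = {"en": "<en locale>", "es": "<es locale>", "fr": "<fr locale>"}


def resolve_languages(languages, available=AVAILABLE):
    """Validate language codes and return the candidates for detection."""
    if languages is None:
        return list(available.values())
    if not isinstance(languages, (list, tuple, Set)):
        raise TypeError(
            "languages argument must be a list (%r given)" % type(languages)
        )
    unsupported = set(languages) - set(available)
    if unsupported:
        raise ValueError(
            "Unknown language(s): %s" % ", ".join(map(repr, sorted(unsupported)))
        )
    if not languages:
        # An empty collection falls back to all languages, like None.
        return list(available.values())
    return [available[code] for code in languages]


print(resolve_languages(["en", "es"]))
print(resolve_languages(None))
```

Note that a plain string such as `"en"` is rejected with `TypeError` rather than being treated as a one-element list, which matches the `isinstance` check in the diff.
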
      