|
4749 | 4749 | "# <ins>8.</ins> Advanced string manipulation" |
4750 | 4750 | ] |
4751 | 4751 | }, |
| 4752 | + { |
| 4753 | + "cell_type": "markdown", |
| 4754 | + "id": "e34aedb6", |
| 4755 | + "metadata": { |
| 4756 | + "editable": false, |
| 4757 | + "slideshow": { |
| 4758 | + "slide_type": "slide" |
| 4759 | + }, |
| 4760 | + "tags": [] |
| 4761 | + }, |
| 4762 | + "source": [ |
| 4763 | + "## String Methods - Working with String Objects" |
| 4764 | + ] |
| 4765 | + }, |
| 4766 | + { |
| 4767 | + "cell_type": "markdown", |
| 4768 | + "id": "ffef8f84", |
| 4769 | + "metadata": { |
| 4770 | + "editable": false, |
| 4771 | + "slideshow": { |
| 4772 | + "slide_type": "" |
| 4773 | + } |
| 4774 | + }, |
| 4775 | + "source": [ |
| 4776 | + "In Python, strings are objects, and objects have **methods** - functions that belong to and operate on that object.\n", |
| 4777 | + "\n", |
| 4778 | + "**Syntax**: `string_object.method_name(arguments)`\n", |
| 4779 | + "\n", |
| 4780 | + "Key characteristics:\n", |
| 4781 | + "- Methods are called using _dot notation_: `variable.method()`\n", |
| 4782 | + "- Methods that transform a string return a **new string** (strings are immutable)\n", |
| 4783 | + "- The original string is never modified\n", |
| 4784 | + "- You must assign the result if you want to keep it\n", |
| 4785 | + "\n", |
| 4786 | + "**Example:**" |
| 4787 | + ] |
| 4788 | + }, |
| 4789 | + { |
| 4790 | + "cell_type": "code", |
| 4791 | + "execution_count": null, |
| 4792 | + "id": "d5bd4280", |
| 4793 | + "metadata": { |
| 4794 | + "editable": true, |
| 4795 | + "slideshow": { |
| 4796 | + "slide_type": "" |
| 4797 | + }, |
| 4798 | + "tags": [] |
| 4799 | + }, |
| 4800 | + "outputs": [], |
| 4801 | + "source": [ |
| 4802 | + "text = \"hello world\"\n", |
| 4803 | + "result = text.upper() # Creates new string\n", |
| 4804 | + "print(text) # Still \"hello world\" \n", |
| 4805 | + "print(result) # \"HELLO WORLD\"" |
| 4806 | + ] |
| 4807 | + }, |
4752 | 4808 | { |
4753 | 4809 | "cell_type": "markdown", |
4754 | 4810 | "id": "c9016a2e-dee1-4030-a1a0-8e519a84be6f", |
|
4760 | 4816 | "tags": [] |
4761 | 4817 | }, |
4762 | 4818 | "source": [ |
4763 | | - "* Adjusting case\n", |
4764 | | - "* Formatting strings\n", |
4765 | | - " - _Note_: Modification requires assignment, because these functions return a copy, not modifying the original string\n", |
4766 | | - "* Quering the existence, replacing, splitting" |
| 4819 | + "Common categories of string methods we'll explore:\n", |
| 4820 | + "- **Finding/searching**: `find()`, `index()`\n", |
| 4821 | + "- **Checking**: `startswith()`, `endswith()`\n", |
| 4822 | + "- **Transforming**: `replace()`, `upper()`, `lower()`, `capitalize()`, `strip()`\n", |
| 4823 | + "- **Splitting/joining**: `split()`, `join()`" |
4767 | 4824 | ] |
4768 | 4825 | }, |
4769 | 4826 | { |
|
4777 | 4834 | "tags": [] |
4778 | 4835 | }, |
4779 | 4836 | "source": [ |
4780 | | - "## Finding values \n", |
4781 | | - "* `find()` and `index()` both return index of a substring,\n", |
4782 | | - " - `index()` raises a `ValueError` exception when not found (_exception handling_)\n", |
4783 | | - " - `find()` returns `-1` when a values was not found\n" |
| 4837 | + "## Finding/searching string methods" |
| 4838 | + ] |
| 4839 | + }, |
| 4840 | + { |
| 4841 | + "cell_type": "markdown", |
| 4842 | + "id": "a4fa9b5b", |
| 4843 | + "metadata": { |
| 4844 | + "editable": false, |
| 4845 | + "slideshow": { |
| 4846 | + "slide_type": "" |
| 4847 | + } |
| 4848 | + }, |
| 4849 | + "source": [ |
| 4850 | + "These methods help you locate substrings within a string, returning their position or checking for their presence:\n", |
| 4851 | + "* `find()` - returns `-1` if the substring is not found\n", |
| 4852 | + "* `index()` - raises a `ValueError` if the substring is not found" |
4784 | 4853 | ] |
4785 | 4854 | }, |
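
A minimal sketch of the difference between the two search methods described in the cell above. The variable `sentence` and its contents are hypothetical, chosen only for illustration; they are not part of the notebook.

```python
# Hypothetical sample text, used only for this illustration
sentence = "the quick brown fox"

# find() returns the index of the first match, or -1 when there is none
found_at = sentence.find("quick")
missing = sentence.find("wombat")
print(found_at)   # 4
print(missing)    # -1

# index() returns the same position when the substring exists...
print(sentence.index("quick"))  # 4

# ...but raises ValueError when it does not, so it needs a try/except
try:
    sentence.index("wombat")
    raised = False
except ValueError as err:
    raised = True
    print(f"index() failed: {err}")
```

Which one to use is a judgment call: `find()` suits a quick check against `-1`, while `index()` suits code where a missing substring should be treated as an error.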
4786 | 4855 | { |
|
4817 | 4886 | "print(line.find('wombat'))" |
4818 | 4887 | ] |
4819 | 4888 | }, |
| 4889 | + { |
| 4890 | + "cell_type": "markdown", |
| 4891 | + "id": "714fc569", |
| 4892 | + "metadata": { |
| 4893 | + "editable": false, |
| 4894 | + "slideshow": { |
| 4895 | + "slide_type": "" |
| 4896 | + } |
| 4897 | + }, |
| 4898 | + "source": [ |
| 4899 | + "The key difference is how they handle missing substrings - `find()` returns `-1` while `index()` raises an exception:" |
| 4900 | + ] |
| 4901 | + }, |
4820 | 4902 | { |
4821 | 4903 | "cell_type": "code", |
4822 | 4904 | "execution_count": null, |
|
4847 | 4929 | "tags": [] |
4848 | 4930 | }, |
4849 | 4931 | "source": [ |
4850 | | - "## Checking conditions on strings" |
| 4932 | + "## Checking string methods" |
4851 | 4933 | ] |
4852 | 4934 | }, |
4853 | 4935 | { |
|
4861 | 4943 | "tags": [] |
4862 | 4944 | }, |
4863 | 4945 | "source": [ |
4864 | | - "* Checking whether a string starts or ends a certain way is really common and easy" |
| 4946 | + "These methods return boolean values (`True` or `False`) to verify if a string matches certain patterns or criteria:\n", |
| 4947 | + "* `startswith(substring)`\n", |
| 4948 | + "* `endswith(substring)`" |
4865 | 4949 | ] |
4866 | 4950 | }, |
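
A short sketch of the two checking methods listed above, assuming a hypothetical file name. Both methods also accept a tuple of candidates, which can replace a chain of `or` conditions.

```python
# Hypothetical file name, used only for this illustration
filename = "results_2024.csv"

# Each call returns a plain bool
is_results = filename.startswith("results_")
is_csv = filename.endswith(".csv")

# A tuple of suffixes matches if any one of them fits
is_table = filename.endswith((".csv", ".tsv"))

print(is_results, is_csv, is_table)  # True True True

# The checks are case-sensitive
print(filename.endswith(".CSV"))  # False
```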
4867 | 4951 | { |
|
4908 | 4992 | "metadata": { |
4909 | 4993 | "editable": false, |
4910 | 4994 | "slideshow": { |
4911 | | - "slide_type": "slide" |
| 4995 | + "slide_type": "" |
4912 | 4996 | }, |
4913 | 4997 | "tags": [] |
4914 | 4998 | }, |
4915 | 4999 | "source": [ |
4916 | | - "* The canonical way to search a string (if not interested in the index):" |
| 5000 | + "The canonical way to search a string (if not interested in the index):" |
4917 | 5001 | ] |
4918 | 5002 | }, |
4919 | 5003 | { |
|
4944 | 5028 | "tags": [] |
4945 | 5029 | }, |
4946 | 5030 | "source": [ |
4947 | | - "## Replacing a value:" |
| 5031 | + "## Transforming string methods" |
| 5032 | + ] |
| 5033 | + }, |
| 5034 | + { |
| 5035 | + "cell_type": "markdown", |
| 5036 | + "id": "80b12b8b", |
| 5037 | + "metadata": { |
| 5038 | + "editable": false, |
| 5039 | + "slideshow": { |
| 5040 | + "slide_type": "" |
| 5041 | + } |
| 5042 | + }, |
| 5043 | + "source": [ |
| 5044 | + "These methods create modified versions of strings by replacing content, changing case, or removing unwanted characters. _**Remember**: strings are immutable, so these methods always return a new string._\n", |
| 5045 | + "* `replace(old, new)`\n", |
| 5046 | + "* `upper()`\n", |
| 5047 | + "* `lower()`\n", |
| 5048 | + "* `capitalize()`\n", |
| 5049 | + "* `strip()`" |
4948 | 5050 | ] |
4949 | 5051 | }, |
4950 | 5052 | { |
|
4966 | 5068 | ] |
4967 | 5069 | }, |
4968 | 5070 | { |
4969 | | - "cell_type": "markdown", |
4970 | | - "id": "bb4faee5-8c3f-4919-acbe-1b7b2cf7b700", |
| 5071 | + "cell_type": "code", |
| 5072 | + "execution_count": null, |
| 5073 | + "id": "a95784ad-44fc-4e7c-9b16-caa09fedbc60", |
4971 | 5074 | "metadata": { |
4972 | | - "editable": false, |
| 5075 | + "editable": true, |
4973 | 5076 | "slideshow": { |
4974 | | - "slide_type": "slide" |
| 5077 | + "slide_type": "" |
4975 | 5078 | }, |
4976 | 5079 | "tags": [] |
4977 | 5080 | }, |
| 5081 | + "outputs": [], |
4978 | 5082 | "source": [ |
4979 | | - "## Bring all words to a common case" |
| 5083 | + "arc_update = \"ThE HAmILton suPERcompUTER is beiNg UPGraded\"\n", |
| 5084 | + "print(arc_update)\n", |
| 5085 | + "print(arc_update.upper())\n", |
| 5086 | + "print(arc_update.title())\n", |
| 5087 | + "print(arc_update.capitalize())" |
4980 | 5088 | ] |
4981 | 5089 | }, |
4982 | 5090 | { |
4983 | 5091 | "cell_type": "code", |
4984 | 5092 | "execution_count": null, |
4985 | | - "id": "a95784ad-44fc-4e7c-9b16-caa09fedbc60", |
| 5093 | + "id": "22763b5f-7576-43c5-a237-925dce9b88aa", |
4986 | 5094 | "metadata": { |
4987 | 5095 | "editable": true, |
4988 | 5096 | "slideshow": { |
|
4992 | 5100 | }, |
4993 | 5101 | "outputs": [], |
4994 | 5102 | "source": [ |
4995 | | - "arc_update = \"ThE HAmILton suPERcompUTER is beiNg UPGraded\"\n", |
4996 | | - "print(arc_update)\n", |
4997 | | - "print(arc_update.upper())\n", |
4998 | | - "print(arc_update.title())\n", |
4999 | | - "print(arc_update.capitalize())" |
| 5103 | + "user_input = \" [email protected] \"\n", |
| 5104 | + "email = user_input.strip()\n", |
| 5105 | + "print(f\"Original: '{user_input}'\")\n", |
| 5106 | + "print(f\"Cleaned: '{email}'\")" |
5000 | 5107 | ] |
5001 | 5108 | }, |
5002 | 5109 | { |
5003 | 5110 | "cell_type": "markdown", |
5004 | | - "id": "7a25cc7f-6040-4c03-ac38-b5f9a7baef41", |
| 5111 | + "id": "2fd63835", |
5005 | 5112 | "metadata": { |
5006 | 5113 | "editable": false, |
5007 | 5114 | "slideshow": { |
5008 | 5115 | "slide_type": "slide" |
5009 | | - }, |
5010 | | - "tags": [] |
| 5116 | + } |
5011 | 5117 | }, |
5012 | 5118 | "source": [ |
5013 | | - "## Removing white space" |
| 5119 | + "These methods work with any string and don't raise exceptions, but you may want to validate the results." |
5014 | 5120 | ] |
5015 | 5121 | }, |
5016 | 5122 | { |
5017 | 5123 | "cell_type": "code", |
5018 | 5124 | "execution_count": null, |
5019 | | - "id": "22763b5f-7576-43c5-a237-925dce9b88aa", |
5020 | | - "metadata": { |
5021 | | - "editable": true, |
5022 | | - "slideshow": { |
5023 | | - "slide_type": "" |
5024 | | - }, |
5025 | | - "tags": [] |
5026 | | - }, |
| 5125 | + "id": "8f1ceba1", |
| 5126 | + "metadata": {}, |
5027 | 5127 | "outputs": [], |
5028 | 5128 | "source": [ |
5029 | | - "user_input = \" [email protected] \"\n", |
5030 | | - "email = user_input.strip()\n", |
5031 | | - "print(f\"Original: '{user_input}'\")\n", |
5032 | | - "print(f\"Cleaned: '{email}'\")" |
| 5129 | + "# Transforming methods don't raise exceptions, but you can validate transformed results\n", |
| 5130 | + "def process_username(username):\n", |
| 5131 | + " \"\"\"Process and validate a username.\"\"\"\n", |
| 5132 | + " if not isinstance(username, str):\n", |
| 5133 | + " raise TypeError(f\"Username must be a string, got {type(username).__name__}\")\n", |
| 5134 | + " \n", |
| 5135 | + " # Transform: strip whitespace and convert to lowercase\n", |
| 5136 | + " processed = username.strip().lower()\n", |
| 5137 | + " \n", |
| 5138 | + " # Validate the result\n", |
| 5139 | + " if not processed:\n", |
| 5140 | + " raise ValueError(\"Username cannot be empty after processing\")\n", |
| 5141 | + " \n", |
| 5142 | + " if len(processed) < 3:\n", |
| 5143 | + " raise ValueError(f\"Username must be at least 3 characters, got {len(processed)}\")\n", |
| 5144 | + " \n", |
| 5145 | + " return processed\n", |
| 5146 | + "\n", |
| 5147 | + "# Valid case\n", |
| 5148 | + "try:\n", |
| 5149 | + " result = process_username(\" JohnDoe \")\n", |
| 5150 | + " print(f\"\u2713 Processed username: '{result}'\")\n", |
| 5151 | + "except ValueError as e:\n", |
| 5152 | + " print(f\"Error: {e}\")\n", |
| 5153 | + "\n", |
| 5154 | + "# Invalid case\n", |
| 5155 | + "try:\n", |
| 5156 | + " result = process_username(\" AB \")\n", |
| 5157 | + " print(f\"\u2713 Processed username: '{result}'\")\n", |
| 5158 | + "except ValueError as e:\n", |
| 5159 | + " print(f\"Error: {e}\")" |
5033 | 5160 | ] |
5034 | 5161 | }, |
5035 | 5162 | { |
|
5043 | 5170 | "tags": [] |
5044 | 5171 | }, |
5045 | 5172 | "source": [ |
5046 | | - "## Extracting/concatenating the individual words or parts" |
| 5173 | + "## Splitting/joining string methods" |
| 5174 | + ] |
| 5175 | + }, |
| 5176 | + { |
| 5177 | + "cell_type": "markdown", |
| 5178 | + "id": "71a4546c", |
| 5179 | + "metadata": { |
| 5180 | + "editable": false, |
| 5181 | + "slideshow": { |
| 5182 | + "slide_type": "" |
| 5183 | + } |
| 5184 | + }, |
| 5185 | + "source": [ |
| 5186 | + "These methods allow you to break strings apart into lists of substrings, or combine lists of strings into a single string. They're particularly useful for parsing text data or formatting output:\n", |
| 5187 | + "* `split(separator)` - splits string into a list at each occurrence of separator (defaults to whitespace)\n", |
| 5188 | + "* `join(iterable)` - joins elements of an iterable into a single string with the string as separator" |
5047 | 5189 | ] |
5048 | 5190 | }, |
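
As a rough illustration of the two methods described above, the sketch below splits a comma-separated record and joins it back with a different separator. The `record` string is a made-up example, not data from the notebook.

```python
# Hypothetical comma-separated record, used only for this illustration
record = "alice,42,durham"

# split(separator) breaks the string at every occurrence of the separator
fields = record.split(",")
print(fields)  # ['alice', '42', 'durham']

# join() is called on the separator and takes the list of strings
rejoined = " | ".join(fields)
print(rejoined)  # alice | 42 | durham

# With no argument, split() breaks on runs of whitespace and drops empty pieces
words = "  spaced   out  text ".split()
print(words)  # ['spaced', 'out', 'text']
```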
5049 | 5191 | { |
|
5079 | 5221 | "tags": [] |
5080 | 5222 | }, |
5081 | 5223 | "source": [ |
5082 | | - "The operation in the other direction is `a_string.join()` where `a_string` is placed between every string of a list" |
| 5224 | + "The reverse operation is `a_string.join(iterable)`, which places `a_string` between every string in the iterable (for example, a list)" |
5083 | 5225 | ] |
5084 | 5226 | }, |
5085 | 5227 | { |
|
5116 | 5258 | "print(\"\\n\".join(line_list))" |
5117 | 5259 | ] |
5118 | 5260 | }, |
| 5261 | + { |
| 5262 | + "cell_type": "markdown", |
| 5263 | + "id": "0deaf239", |
| 5264 | + "metadata": { |
| 5265 | + "editable": false, |
| 5266 | + "slideshow": { |
| 5267 | + "slide_type": "" |
| 5268 | + } |
| 5269 | + }, |
| 5270 | + "source": [ |
| 5271 | + "`join()` will raise a `TypeError` if the iterable contains non-string elements" |
| 5272 | + ] |
| 5273 | + }, |
| 5274 | + { |
| 5275 | + "cell_type": "code", |
| 5276 | + "execution_count": null, |
| 5277 | + "id": "225bdd40", |
| 5278 | + "metadata": { |
| 5279 | + "editable": true, |
| 5280 | + "slideshow": { |
| 5281 | + "slide_type": "" |
| 5282 | + }, |
| 5283 | + "tags": [] |
| 5284 | + }, |
| 5285 | + "outputs": [], |
| 5286 | + "source": [ |
| 5287 | + "flags = [16384, 1048576]\n", |
| 5288 | + "\n", |
| 5289 | + "# join() raises a TypeError here because the list items are ints, not strings\n", |
| 5290 | + "flags_str = \"+\".join(flags)\n", |
| 5291 | + "#flags_str = \"+\".join(str(flag) for flag in flags)  # fix: convert each item to str first\n", |
| 5292 | + "#print(flags_str)" |
| 5293 | + ] |
| 5294 | + }, |
5119 | 5295 | { |
5120 | 5296 | "cell_type": "markdown", |
5121 | 5297 | "id": "a965b5d9-59f4-4ed5-b7e9-89f5f0bcf60f", |
|
5932 | 6108 | "name": "python", |
5933 | 6109 | "nbconvert_exporter": "python", |
5934 | 6110 | "pygments_lexer": "ipython3", |
5935 | | - "version": "3.11.5" |
| 6111 | + "version": "3.13.0" |
5936 | 6112 | } |
5937 | 6113 | }, |
5938 | 6114 | "nbformat": 4, |
|